# Trimmed Mean

A trimmed mean is an average computed after discarding (i.e. trimming off) a fixed proportion of the most extreme values. The goal is to make the mean more robust to outliers by leaving those values out of the calculation.

SciPy offers several functions for calculating trimmed means.

## Preliminaries

```
# Import libraries
import pandas as pd
from scipy import stats
```

## Create DataFrame

```
# Create dataframe with two extreme values
data = {
    'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Bob', 'Jack', 'Jill', 'Kelly', 'Mark', 'Kao', 'Dillon'],
    'score': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 100]
}
df = pd.DataFrame(data)
df
```

| | name | score |
|---|---|---|
| 0 | Jason | 1 |
| 1 | Molly | 2 |
| 2 | Tina | 3 |
| 3 | Jake | 4 |
| 4 | Amy | 5 |
| 5 | Bob | 6 |
| 6 | Jack | 7 |
| 7 | Jill | 8 |
| 8 | Kelly | 9 |
| 9 | Mark | 10 |
| 10 | Kao | 100 |
| 11 | Dillon | 100 |

## Calculate Non-Trimmed Mean

```
# Calculate non-trimmed mean
df['score'].mean()
```

```
21.25
```

## Calculate Mean After Trimming Off Highest And Lowest

```
# Trim off the lowest 20% and the highest 20% of scores, then average the rest
stats.trim_mean(df['score'], proportiontocut=0.2)
```

```
6.5
```
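To make the calculation concrete, here is a minimal sketch of the same trimmed mean done by hand with NumPy. It assumes the number of values dropped from each end is the proportion times the sample size, truncated to an integer, which reproduces the result above. The names `scores`, `n_cut`, and `kept` are illustrative only and are not part of SciPy.

```python
import numpy as np

# Same scores as in the DataFrame above
scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 100])
proportion = 0.2

# Number of values to drop from EACH end (assumption: fraction is truncated)
n_cut = int(proportion * len(scores))
print(n_cut)  # 2

# Sort, drop n_cut values from each tail, and average what remains
kept = np.sort(scores)[n_cut:len(scores) - n_cut]
print(kept.mean())  # 6.5
```

With 12 scores, 20% is 2.4, so two values are removed from each end and the mean is taken over the remaining eight.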

We can use `trimboth` to see which values are used to calculate the trimmed mean (the returned values are not necessarily in sorted order):

```
# Trim 20% of the scores from each end and view the values that remain
stats.trimboth(df['score'], proportiontocut=0.2)
```

```
array([ 3, 5, 4, 6, 7, 8, 9, 10])
```
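As a quick sanity check, the sketch below sorts the values returned by `trimboth` for easier reading and confirms that their plain mean agrees with `trim_mean`. The variable names are illustrative, and the scores are re-declared as a NumPy array so the snippet runs on its own.

```python
import numpy as np
from scipy import stats

# Same scores as in the DataFrame above
scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 100])

# Values that survive trimming 20% from each end
kept = stats.trimboth(scores, proportiontocut=0.2)

# Sort only for display; the order returned by trimboth is not guaranteed
kept_sorted = np.sort(kept)
print(kept_sorted)

# The mean of the kept values should match trim_mean
same = np.isclose(kept.mean(), stats.trim_mean(scores, proportiontocut=0.2))
print(same)
```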

## Calculate Mean After Trimming Only Highest Extremes

The `right` tail refers to the highest values in the array and `left` refers to the lowest values in the array.

```
# Trim off the highest 20% of values and view trimmed mean
stats.trim1(df['score'], proportiontocut=0.2, tail='right').mean()
```

```
5.5
```

```
# Trim off the highest 20% of values and view non-trimmed values
stats.trim1(df['score'], proportiontocut=0.2, tail='right')
```

```
array([ 1, 3, 2, 4, 5, 6, 7, 9, 8, 10])
```
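For comparison, here is a small sketch of the opposite case, trimming only the `left` tail. It uses the same `trim1` function with `tail='left'`; the scores are re-declared as a NumPy array so the snippet is self-contained, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Same scores as in the DataFrame above
scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 100])

# Trim off the lowest 20% of values and keep everything else
left_trimmed = stats.trim1(scores, proportiontocut=0.2, tail='left')

# Sort only for display; the returned order is not guaranteed
print(np.sort(left_trimmed))

# The two 100s are still included, so the mean stays high
print(left_trimmed.mean())
```

Because the two extreme scores of 100 sit in the right tail, trimming only the left tail does not make the mean more robust for this data.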