
# Histograms in Matplotlib

## Preliminaries

```
# Render plots inline (Jupyter/IPython notebooks only)
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Show up to 1000 rows when displaying a dataframe
pd.set_option('display.max_rows', 1000)
# Show up to 50 columns when displaying a dataframe
pd.set_option('display.max_columns', 50)
```

## Create dataframe

```
df = pd.read_csv('https://www.dropbox.com/s/52cb7kcflr8qm2u/5kings_battles_v1.csv?dl=1')
df.head()
```

| | name | year | battle_number | attacker_king | defender_king | attacker_1 | attacker_2 | attacker_3 | attacker_4 | defender_1 | defender_2 | defender_3 | defender_4 | attacker_outcome | battle_type | major_death | major_capture | attacker_size | defender_size | attacker_commander | defender_commander | summer | location | region | note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battle of the Golden Tooth | 298 | 1 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Tully | NaN | NaN | NaN | win | pitched battle | 1 | 0 | 15000 | 4000 | Jaime Lannister | Clement Piper, Vance | 1 | Golden Tooth | The Westerlands | NaN |
| 1 | Battle at the Mummer's Ford | 298 | 2 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Baratheon | NaN | NaN | NaN | win | ambush | 1 | 0 | NaN | 120 | Gregor Clegane | Beric Dondarrion | 1 | Mummer's Ford | The Riverlands | NaN |
| 2 | Battle of Riverrun | 298 | 3 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Tully | NaN | NaN | NaN | win | pitched battle | 0 | 1 | 15000 | 10000 | Jaime Lannister, Andros Brax | Edmure Tully, Tytos Blackwood | 1 | Riverrun | The Riverlands | NaN |
| 3 | Battle of the Green Fork | 298 | 4 | Robb Stark | Joffrey/Tommen Baratheon | Stark | NaN | NaN | NaN | Lannister | NaN | NaN | NaN | loss | pitched battle | 1 | 1 | 18000 | 20000 | Roose Bolton, Wylis Manderly, Medger Cerwyn, H... | Tywin Lannister, Gregor Clegane, Kevan Lannist... | 1 | Green Fork | The Riverlands | NaN |
| 4 | Battle of the Whispering Wood | 298 | 5 | Robb Stark | Joffrey/Tommen Baratheon | Stark | Tully | NaN | NaN | Lannister | NaN | NaN | NaN | win | ambush | 1 | 1 | 1875 | 6000 | Robb Stark, Brynden Tully | Jaime Lannister | 1 | Whispering Wood | The Riverlands | NaN |

## Make plot with bins of fixed size

```
# Select attacker and defender sizes, leaving out battles with
# 90,000 or more attackers (or with an unknown attacker size)
data1 = df['attacker_size'][df['attacker_size'] < 90000]
data2 = df['defender_size'][df['attacker_size'] < 90000].dropna()

# Create bins 2000 troops wide that span both series.
# np.arange excludes its stop value, so extend one bin past the maximum.
bin_width = 2000
lower = min(data1.min(), data2.min())
upper = max(data1.max(), data2.max())
bins = np.arange(lower, upper + bin_width, bin_width)

# Plot a histogram of attacker size
plt.hist(data1,
         bins=bins,
         alpha=0.5,
         color='#EDD834',
         label='Attacker')

# Plot a histogram of defender size on the same bins
plt.hist(data2,
         bins=bins,
         alpha=0.5,
         color='#887E43',
         label='Defender')

# Set the y-axis limits of the figure
plt.ylim([0, 10])

# Set the title, axis labels, and legend
plt.title('Histogram of Attacker and Defender Size')
plt.xlabel('Number of troops')
plt.ylabel('Number of battles')
plt.legend(loc='upper right')
plt.show()
```
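To make the fixed-width binning easier to inspect on its own, here is a minimal sketch of building the bin edges. The helper name `fixed_width_edges` and the sample troop counts are made up for illustration and are not part of the battles file. It assumes missing values should be ignored and that the last edge should reach the largest value, which means extending the stop value because `np.arange` excludes it.

```python
import numpy as np

def fixed_width_edges(values, width):
    """Return bin edges spaced `width` apart that cover every non-missing value."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing sizes
    lower = values.min()
    # np.arange excludes its stop value, so go one width past the maximum
    upper = values.max() + width
    return np.arange(lower, upper, width)

# Hypothetical troop counts (not taken from the battles file)
sizes = np.array([120, 1875, 4000, 6000, 10000, 15000, np.nan, 20000])
edges = fixed_width_edges(sizes, 2000)

# Count how many non-missing values fall into each bin
counts, _ = np.histogram(sizes[~np.isnan(sizes)], bins=edges)
print(edges)
print(counts)
```

The same `edges` array can be passed to `plt.hist` through its `bins` argument, so several series share identical bin boundaries.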

## Make plot with fixed number of bins

```
# Select attacker and defender sizes, leaving out battles with
# 90,000 or more attackers (or with an unknown attacker size)
data1 = df['attacker_size'][df['attacker_size'] < 90000]
data2 = df['defender_size'][df['attacker_size'] < 90000].dropna()

# Create 10 equal-width bins running from the smallest to the
# largest value across both series (10 bins need 11 edges)
lower = min(data1.min(), data2.min())
upper = max(data1.max(), data2.max())
bins = np.linspace(lower, upper, 11)

# Plot a histogram of attacker size
plt.hist(data1,
         bins=bins,
         alpha=0.5,
         color='#EDD834',
         label='Attacker')

# Plot a histogram of defender size on the same bins
plt.hist(data2,
         bins=bins,
         alpha=0.5,
         color='#887E43',
         label='Defender')

# Set the y-axis limits of the figure
plt.ylim([0, 10])

# Set the title, axis labels, and legend
plt.title('Histogram of Attacker and Defender Size')
plt.xlabel('Number of troops')
plt.ylabel('Number of battles')
plt.legend(loc='upper right')
plt.show()
```
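One detail worth checking when fixing the number of bins is that `np.linspace` returns bin edges, not bins, so `n` bins require `n + 1` edges. The sketch below illustrates this with made-up troop counts (not taken from the battles file) and uses `np.histogram` to confirm how many bins result. The variable names are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical troop counts (not taken from the battles file)
attackers = np.array([1000.0, 1875.0, 6000.0, 15000.0, 18000.0])
defenders = np.array([120.0, 4000.0, 10000.0, 20000.0])

n_bins = 10

# Take the overall range from both series combined, not their element-wise sum
combined = np.concatenate([attackers, defenders])

# n_bins bins need n_bins + 1 edges
edges = np.linspace(combined.min(), combined.max(), n_bins + 1)

# Both series are counted against the same edges
attacker_counts, _ = np.histogram(attackers, bins=edges)
defender_counts, _ = np.histogram(defenders, bins=edges)

print(len(edges) - 1)  # number of bins
print(attacker_counts)
print(defender_counts)
```

Note that adding two pandas Series with `+` sums them element by element, so concatenating them (or comparing their separate minima and maxima) is the safer way to find a shared range.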