# Histograms in Matplotlib

**Note:** Based on work by Sebastian Raschka.


## Preliminaries

    # Render plots inline (IPython/Jupyter magic command)
    %matplotlib inline

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import math

    # Show up to 1000 rows when displaying a DataFrame
    pd.set_option('display.max_rows', 1000)

    # Show up to 50 columns when displaying a DataFrame
    pd.set_option('display.max_columns', 50)

## Create dataframe

    # Load the battles dataset from a remote CSV file
    df = pd.read_csv('https://www.dropbox.com/s/52cb7kcflr8qm2u/5kings_battles_v1.csv?dl=1')

    # Preview the first five rows
    df.head()

| | name | year | battle_number | attacker_king | defender_king | attacker_1 | attacker_2 | attacker_3 | attacker_4 | defender_1 | defender_2 | defender_3 | defender_4 | attacker_outcome | battle_type | major_death | major_capture | attacker_size | defender_size | attacker_commander | defender_commander | summer | location | region | note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battle of the Golden Tooth | 298 | 1 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Tully | NaN | NaN | NaN | win | pitched battle | 1 | 0 | 15000 | 4000 | Jaime Lannister | Clement Piper, Vance | 1 | Golden Tooth | The Westerlands | NaN |
| 1 | Battle at the Mummer's Ford | 298 | 2 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Baratheon | NaN | NaN | NaN | win | ambush | 1 | 0 | NaN | 120 | Gregor Clegane | Beric Dondarrion | 1 | Mummer's Ford | The Riverlands | NaN |
| 2 | Battle of Riverrun | 298 | 3 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Tully | NaN | NaN | NaN | win | pitched battle | 0 | 1 | 15000 | 10000 | Jaime Lannister, Andros Brax | Edmure Tully, Tytos Blackwood | 1 | Riverrun | The Riverlands | NaN |
| 3 | Battle of the Green Fork | 298 | 4 | Robb Stark | Joffrey/Tommen Baratheon | Stark | NaN | NaN | NaN | Lannister | NaN | NaN | NaN | loss | pitched battle | 1 | 1 | 18000 | 20000 | Roose Bolton, Wylis Manderly, Medger Cerwyn, H... | Tywin Lannister, Gregor Clegane, Kevan Lannist... | 1 | Green Fork | The Riverlands | NaN |
| 4 | Battle of the Whispering Wood | 298 | 5 | Robb Stark | Joffrey/Tommen Baratheon | Stark | Tully | NaN | NaN | Lannister | NaN | NaN | NaN | win | ambush | 1 | 1 | 1875 | 6000 | Robb Stark, Brynden Tully | Jaime Lannister | 1 | Whispering Wood | The Riverlands | NaN |

## Make plot with bins of fixed size

    # Make two variables of the attacker and defender size, leaving out
    # battles with 90,000 or more attackers (or a missing attacker size)
    data1 = df['attacker_size'][df['attacker_size'] < 90000]
    data2 = df['defender_size'][df['attacker_size'] < 90000].dropna()

    # Create bins of 2000 troops each that span both series.
    # np.arange excludes the stop value, so extend it by one bin width.
    bin_width = 2000
    lower = min(data1.min(), data2.min())
    upper = max(data1.max(), data2.max())
    bins = np.arange(lower, upper + bin_width, bin_width)  # fixed bin size

    # Plot a histogram of attacker size
    plt.hist(data1, bins=bins, alpha=0.5, color='#EDD834', label='Attacker')

    # Plot a histogram of defender size
    plt.hist(data2, bins=bins, alpha=0.5, color='#887E43', label='Defender')

    # Set the y-axis limits
    plt.ylim([0, 10])

    # Set the title and labels
    plt.title('Histogram of Attacker and Defender Size')
    plt.xlabel('Number of troops')
    plt.ylabel('Number of battles')
    plt.legend(loc='upper right')

    plt.show()
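One detail of fixed-width binning is easy to miss: `np.arange` never includes its stop value, so the largest observation can fall outside the last bin. The sketch below illustrates this with made-up troop counts (the `sizes` array and `bin_width` name are assumptions for illustration, not part of the dataset) and uses `np.histogram` to count how many values land in the bins.

```python
import numpy as np

# Hypothetical troop counts, used only for illustration
sizes = np.array([1000, 3500, 6000, 20000])
bin_width = 2000

# Stopping at the maximum leaves the largest value outside the last bin
short_edges = np.arange(sizes.min(), sizes.max(), bin_width)
short_counts, _ = np.histogram(sizes, bins=short_edges)

# Extending the stop by one bin width covers every observation
edges = np.arange(sizes.min(), sizes.max() + bin_width, bin_width)
counts, _ = np.histogram(sizes, bins=edges)

print(short_counts.sum(), counts.sum())
```

Comparing the two totals against `len(sizes)` is a quick way to check whether any observations were silently dropped by the bin edges.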

## Make plot with fixed number of bins

    # Make two variables of the attacker and defender size, leaving out
    # battles with 90,000 or more attackers (or a missing attacker size)
    data1 = df['attacker_size'][df['attacker_size'] < 90000]
    data2 = df['defender_size'][df['attacker_size'] < 90000].dropna()

    # Create 10 equal-width bins (11 edges), running from the
    # smallest value in either series
    bins = np.linspace(min(data1.min(), data2.min()),
                       # to the largest value in either series
                       max(data1.max(), data2.max()),
                       # 10 bins need 11 edges
                       11)

    # Plot a histogram of attacker size
    plt.hist(data1,
             # with bins defined as
             bins=bins,
             # with alpha
             alpha=0.5,
             # with color
             color='#EDD834',
             # labeled attacker
             label='Attacker')

    # Plot a histogram of defender size
    plt.hist(data2,
             # with bins defined as
             bins=bins,
             # with alpha
             alpha=0.5,
             # with color
             color='#887E43',
             # labeled defender
             label='Defender')

    # Set the y-axis limits
    plt.ylim([0, 10])

    # Set the title and labels
    plt.title('Histogram of Attacker and Defender Size')
    plt.xlabel('Number of troops')
    plt.ylabel('Number of battles')
    plt.legend(loc='upper right')

    plt.show()
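Two points about a fixed number of bins are worth checking on a small example. Adding two pandas Series with `+` sums them row by row, so it does not give the combined range of both, and `n` bins require `n + 1` edges from `np.linspace`. The following is a minimal sketch using made-up values (the `attackers`, `defenders`, and `n_bins` names are assumptions for illustration); it pools the values with `pd.concat` to find the overall minimum and maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical troop counts, used only for illustration
attackers = pd.Series([1000.0, 6000.0, 20000.0])
defenders = pd.Series([500.0, 4000.0, np.nan])

# `attackers + defenders` would add the two Series row by row,
# so pool the values instead to find the overall range
pooled = pd.concat([attackers, defenders]).dropna()

n_bins = 10
# n_bins bins need n_bins + 1 evenly spaced edges
edges = np.linspace(pooled.min(), pooled.max(), n_bins + 1)

counts, _ = np.histogram(pooled, bins=edges)
print(len(edges), len(counts), counts.sum())
```

Because `np.linspace` includes both endpoints, the smallest and largest pooled values sit exactly on the first and last edges, and every non-missing observation is counted.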