Binning Data In Pandas

Import modules

import pandas as pd

Create dataframe

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
df
      regiment  company      name  preTestScore  postTestScore
0   Nighthawks      1st    Miller             4             25
1   Nighthawks      1st  Jacobson            24             94
2   Nighthawks      2nd       Ali            31             57
3   Nighthawks      2nd    Milner             2             62
4     Dragoons      1st     Cooze             3             70
5     Dragoons      1st     Jacon             4             25
6     Dragoons      2nd    Ryaner            24             94
7     Dragoons      2nd      Sone            31             57
8       Scouts      1st     Sloan             2             62
9       Scouts      1st     Piger             3             70
10      Scouts      2nd     Riani             2             62
11      Scouts      2nd       Ali             3             70

Define bins as 0 to 25, 25 to 50, 50 to 75, and 75 to 100 (pd.cut closes each bin on the right by default, so a score of 25 falls in the first bin)

bins = [0, 25, 50, 75, 100]
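How the bin edges are treated matters for scores that land exactly on an edge. The sketch below is an illustration only: `edge_scores` is a made-up series, not part of the dataset above, and it shows how the `include_lowest` and `right` parameters of `pd.cut` change which edge values get a bin.

```python
import pandas as pd

bins = [0, 25, 50, 75, 100]

# Hypothetical scores chosen to sit on or next to the bin edges.
edge_scores = pd.Series([0, 25, 26, 100])

# Default: right-closed bins (0, 25], (25, 50], ... so 0 belongs to no bin.
default_cut = pd.cut(edge_scores, bins)

# include_lowest=True also closes the first bin on the left, so 0 is binned.
with_lowest = pd.cut(edge_scores, bins, include_lowest=True)

# right=False gives left-closed bins [0, 25), [25, 50), ... so 100 belongs to no bin.
left_closed = pd.cut(edge_scores, bins, right=False)

print(default_cut.isna().tolist())  # [True, False, False, False]
print(with_lowest.isna().tolist())  # [False, False, False, False]
print(left_closed.isna().tolist())  # [False, False, False, True]
```

Values that fall outside every bin come back as NaN rather than raising an error, so it can be worth checking for missing values after cutting.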

Create names for the four groups

group_names = ['Low', 'Okay', 'Good', 'Great']

Cut postTestScore into the bins, then add a labeled categories column and an interval-valued scoresBinned column

categories = pd.cut(df['postTestScore'], bins, labels=group_names)
df['categories'] = categories
df['scoresBinned'] = pd.cut(df['postTestScore'], bins)
categories
0       Low
1     Great
2      Good
3      Good
4      Good
5       Low
6     Great
7      Good
8      Good
9      Good
10     Good
11     Good
Name: postTestScore, dtype: category
Categories (4, object): [Low < Okay < Good < Great]
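In recent pandas versions the result of `pd.cut` is an ordered categorical whose categories follow the order of the labels passed in, so the labels can be compared by bin position. A minimal sketch, using a shortened copy of the scores (the `scores` series here is an assumption for illustration, not the full dataframe):

```python
import pandas as pd

# A shortened copy of the postTestScore values, for illustration only.
scores = pd.Series([25, 94, 57, 62, 70], name='postTestScore')
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

categories = pd.cut(scores, bins, labels=group_names)

# The categorical is ordered, and the categories keep the label order.
print(categories.cat.ordered)           # True
print(list(categories.cat.categories))  # ['Low', 'Okay', 'Good', 'Great']

# Because the categories are ordered, comparisons work on bin position.
good_or_better = scores[categories >= 'Good']
print(good_or_better.tolist())          # [94, 57, 62, 70]
```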

Count the number of observations in each category

df['categories'].value_counts()
Good     8
Low      2
Great    2
Okay     0
Name: categories, dtype: int64
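The counts above are sorted by frequency. If the bin order is more useful, one option is to sort the result by its index, which for a categorical follows the category order. A small sketch that rebuilds the binned scores on its own (`post_scores` and `binned` are names introduced here for illustration):

```python
import pandas as pd

# The postTestScore values from the dataframe above.
post_scores = pd.Series([25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70])
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

binned = pd.cut(post_scores, bins, labels=group_names)

# value_counts() orders by frequency; sort_index() restores the bin order.
counts_in_bin_order = binned.value_counts().sort_index()

print(list(counts_in_bin_order.index))  # ['Low', 'Okay', 'Good', 'Great']
print(counts_in_bin_order.tolist())     # [2, 0, 8, 2]
```

The empty 'Okay' bin is still listed with a count of 0, because a categorical keeps track of categories that have no observations.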

View the dataframe

df
      regiment  company      name  preTestScore  postTestScore  categories  scoresBinned
0   Nighthawks      1st    Miller             4             25         Low       (0, 25]
1   Nighthawks      1st  Jacobson            24             94       Great     (75, 100]
2   Nighthawks      2nd       Ali            31             57        Good      (50, 75]
3   Nighthawks      2nd    Milner             2             62        Good      (50, 75]
4     Dragoons      1st     Cooze             3             70        Good      (50, 75]
5     Dragoons      1st     Jacon             4             25         Low       (0, 25]
6     Dragoons      2nd    Ryaner            24             94       Great     (75, 100]
7     Dragoons      2nd      Sone            31             57        Good      (50, 75]
8       Scouts      1st     Sloan             2             62        Good      (50, 75]
9       Scouts      1st     Piger             3             70        Good      (50, 75]
10      Scouts      2nd     Riani             2             62        Good      (50, 75]
11      Scouts      2nd       Ali             3             70        Good      (50, 75]
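Once the binned column exists it can be used like any other grouping key. As one possible follow-up, not something the steps above require, this sketch computes the mean preTestScore within each postTestScore bin; `scores_df` and `mean_pre_by_bin` are names made up for this example.

```python
import pandas as pd

# Only the two score columns from the dataframe above.
scores_df = pd.DataFrame({
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']
scores_df['categories'] = pd.cut(scores_df['postTestScore'], bins, labels=group_names)

# observed=True drops bins with no rows (here 'Okay') from the result.
mean_pre_by_bin = scores_df.groupby('categories', observed=True)['preTestScore'].mean()

print(mean_pre_by_bin.to_dict())  # {'Low': 4.0, 'Good': 9.625, 'Great': 24.0}
```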