Convert A Categorical Variable Into Dummy Variables


# Import modules
import pandas as pd
# Create a dataframe
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
        'sex': ['male', 'female', 'male', 'female', 'female']}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'sex'])
df
  first_name last_name     sex
0      Jason    Miller    male
1      Molly  Jacobson  female
2       Tina       Ali    male
3       Jake    Milner  female
4        Amy     Cooze  female

5 rows × 3 columns

# Create a set of dummy variables from the sex variable
# (dtype=int keeps the columns as 0/1; pandas 2.0+ returns True/False by default)
df_sex = pd.get_dummies(df['sex'], dtype=int)
# Join the dummy variables to the main dataframe
df_new = pd.concat([df, df_sex], axis=1)
df_new
  first_name last_name     sex  female  male
0      Jason    Miller    male       0     1
1      Molly  Jacobson  female       1     0
2       Tina       Ali    male       0     1
3       Jake    Milner  female       1     0
4        Amy     Cooze  female       1     0

5 rows × 5 columns

# Alternative for joining the new columns
df_new = df.join(df_sex)
df_new
  first_name last_name     sex  female  male
0      Jason    Miller    male       0     1
1      Molly  Jacobson  female       1     0
2       Tina       Ali    male       0     1
3       Jake    Milner  female       1     0
4        Amy     Cooze  female       1     0

5 rows × 5 columns
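
Both approaches above build the dummy columns separately and then attach them. As a rough sketch of a shortcut (not part of the original walkthrough), pd.get_dummies can also take the whole dataframe along with a columns argument, which encodes the listed columns in one call. Note that this variant replaces the original sex column and prefixes the new column names with sex_. The drop_first option shown afterwards is an optional extra that keeps only one of the two dummy columns, which some models prefer. The names df_encoded and df_male are illustrative only.

```python
import pandas as pd

# Same example data as above
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
        'sex': ['male', 'female', 'male', 'female', 'female']}
df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'sex'])

# Encode the sex column in one call; the original column is replaced
# by sex_female and sex_male (dtype=int keeps them as 0/1)
df_encoded = pd.get_dummies(df, columns=['sex'], dtype=int)

# Optional: drop the first category (female) so only the male column remains
df_male = pd.get_dummies(df['sex'], drop_first=True, dtype=int)
```

If the original sex column is still needed next to the dummy variables, the concat or join approach shown earlier is the better fit.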