Create A Pipeline In Pandas

Pandas' pipe method lets you chain Python functions together to build a data processing pipeline. Each function receives the dataframe as its first argument, and whatever it returns is handed to the next step.
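As a minimal sketch of what pipe does: calling df.pipe(func, ...) is equivalent to calling func(df, ...). The helper add_to_column and the scores dataframe below are hypothetical names made up for this illustration; they are not part of the example that follows.

```python
import pandas as pd

# Hypothetical helper (an assumption for illustration): adds a constant to one column
def add_to_column(dataframe, col, amount):
    # Work on a copy so the input dataframe is left unchanged
    result = dataframe.copy()
    result[col] = result[col] + amount
    return result

# Hypothetical sample data
scores = pd.DataFrame({'score': [1, 2, 3]})

# These two expressions produce the same dataframe
piped = scores.pipe(add_to_column, 'score', amount=10)
direct = add_to_column(scores, 'score', amount=10)
```

The pipe form pays off once several steps are chained, because the steps then read top to bottom instead of inside out.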

import pandas as pd

Create Dataframe

# Create empty dataframe
df = pd.DataFrame()

# Create three columns
df['name'] = ['John', 'Steve', 'Sarah']
df['gender'] = ['Male', 'Male', 'Female']
df['age'] = [31, 32, 19]

# View dataframe
df

    name  gender  age
0   John    Male   31
1  Steve    Male   32
2  Sarah  Female   19

Create Functions To Process Data

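# Create a function that groups the data by a column
# and returns the mean age per group
def mean_age_by_group(dataframe, col):
    # Select 'age' explicitly so non-numeric columns such as 'name' are not averaged
    return dataframe.groupby(col)[['age']].mean()

# Create a function that uppercases all the column headers
def uppercase_column_name(dataframe):
    # Note: this changes the columns of the dataframe passed in
    dataframe.columns = dataframe.columns.str.upper()
    # Return the dataframe with the renamed columns
    return dataframe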
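
Because uppercase_column_name assigns to dataframe.columns, it modifies the dataframe it is given. If that side effect is unwanted, one possible alternative is to return a renamed copy instead. The names uppercase_column_name_copy and people below are hypothetical and used only for this sketch.

```python
import pandas as pd

# Hypothetical variant (an assumption, not the original function)
def uppercase_column_name_copy(dataframe):
    # rename returns a new dataframe and leaves the input untouched
    return dataframe.rename(columns=str.upper)

# Hypothetical sample data
people = pd.DataFrame({'name': ['John'], 'age': [31]})

renamed = people.pipe(uppercase_column_name_copy)
```

After this runs, renamed has uppercase headers while people keeps its original lowercase ones.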

Create A Pipeline Of Those Functions

# Create a pipeline that applies the mean_age_by_group function
(df.pipe(mean_age_by_group, col='gender')
   # then applies the uppercase column name function
   .pipe(uppercase_column_name)
)

         AGE
gender
Female  19.0
Male    31.5