Selecting Pandas DataFrame Rows Based On Conditions

Want to learn more? I recommend these Python books: Python for Data Analysis, Python Data Science Handbook, and Introduction to Machine Learning with Python.

Preliminaries

# Import modules
import pandas as pd
import numpy as np
# Create a dataframe
raw_data = {'first_name': ['Jason', 'Molly', np.nan, np.nan, np.nan],
        'nationality': ['USA', 'USA', 'France', 'UK', 'UK'],
        'age': [42, 52, 36, 24, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'nationality', 'age'])
df
first_name nationality age
0 Jason USA 42
1 Molly USA 52
2 NaN France 36
3 NaN UK 24
4 NaN UK 70

Method 1: Using Boolean Variables

# Create variable with TRUE if nationality is USA
american = df['nationality'] == "USA"

# Create variable with TRUE if age is greater than 50
elderly = df['age'] > 50

# Select all casess where nationality is USA and age is greater than 50
df[american & elderly]
first_name nationality age
1 Molly USA 52

Method 2: Using variable attributes

# Select all cases where the first name is not missing and nationality is USA
df[df['first_name'].notnull() & (df['nationality'] == "USA")]
first_name nationality age
0 Jason USA 42
1 Molly USA 52