Random Sampling Dataframe

Want to learn more? I recommend these Python books: Python for Data Analysis, Python Data Science Handbook, and Introduction to Machine Learning with Python.

import modules

import pandas as pd
import numpy as np

Create dataframe

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df
first_name last_name age preTestScore postTestScore
0 Jason Miller 42 4 25
1 Molly Jacobson 52 24 94
2 Tina Ali 36 31 57
3 Jake Milner 24 2 62
4 Amy Cooze 73 3 70

5 rows × 5 columns

Select a random subset of 2 without replacement

df.take(np.random.permutation(len(df))[:2])
first_name last_name age preTestScore postTestScore
3 Jake Milner 24 2 62
0 Jason Miller 42 4 25

2 rows × 5 columns