v Loading A CSV Into Pandas - Python

Loading A CSV Into Pandas

import modules

import pandas as pd
import numpy as np

Create dataframe (that we will be importing)

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'last_name': ['Miller', 'Jacobson', ".", 'Milner', 'Cooze'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, ".", "."],
        'postTestScore': ["25,000", "94,000", 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df
first_name last_name age preTestScore postTestScore
0 Jason Miller 42 4 25,000
1 Molly Jacobson 52 24 94,000
2 Tina . 36 31 57
3 Jake Milner 24 . 62
4 Amy Cooze 73 . 70

Save dataframe as csv in the working director

df.to_csv('pandas_dataframe_importing_csv/example.csv')

Load a csv

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv')
df
Unnamed: 0 first_name last_name age preTestScore postTestScore
0 0 Jason Miller 42 4 25,000
1 1 Molly Jacobson 52 24 94,000
2 2 Tina . 36 31 57
3 3 Jake Milner 24 . 62
4 4 Amy Cooze 73 . 70

Load a csv with no headers

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', header=None)
df
0 1 2 3 4 5
0 NaN first_name last_name age preTestScore postTestScore
1 0.0 Jason Miller 42 4 25,000
2 1.0 Molly Jacobson 52 24 94,000
3 2.0 Tina . 36 31 57
4 3.0 Jake Milner 24 . 62
5 4.0 Amy Cooze 73 . 70

Load a csv while specifying column names

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score'])
df
UID First Name Last Name Age Pre-Test Score Post-Test Score
0 NaN first_name last_name age preTestScore postTestScore
1 0.0 Jason Miller 42 4 25,000
2 1.0 Molly Jacobson 52 24 94,000
3 2.0 Tina . 36 31 57
4 3.0 Jake Milner 24 . 62
5 4.0 Amy Cooze 73 . 70

Load a csv with setting the index column to UID

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', index_col='UID', names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score'])
df
First Name Last Name Age Pre-Test Score Post-Test Score
UID
NaN first_name last_name age preTestScore postTestScore
0.0 Jason Miller 42 4 25,000
1.0 Molly Jacobson 52 24 94,000
2.0 Tina . 36 31 57
3.0 Jake Milner 24 . 62
4.0 Amy Cooze 73 . 70

Load a csv while setting the index columns to First Name and Last Name

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', index_col=['First Name', 'Last Name'], names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score'])
df
UID Age Pre-Test Score Post-Test Score
First Name Last Name
first_name last_name NaN age preTestScore postTestScore
Jason Miller 0.0 42 4 25,000
Molly Jacobson 1.0 52 24 94,000
Tina . 2.0 36 31 57
Jake Milner 3.0 24 . 62
Amy Cooze 4.0 73 . 70

Load a csv while specifying "." as missing values

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', na_values=['.'])
pd.isnull(df)
Unnamed: 0 first_name last_name age preTestScore postTestScore
0 False False False False False False
1 False False False False False False
2 False False True False False False
3 False False False False True False
4 False False False False True False

Load a csv while specifying "." and "NA" as missing values in the Last Name column and "." as missing values in Pre-Test Score column

sentinels = {'Last Name': ['.', 'NA'], 'Pre-Test Score': ['.']}
df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', na_values=sentinels)
df
Unnamed: 0 first_name last_name age preTestScore postTestScore
0 0 Jason Miller 42 4 25,000
1 1 Molly Jacobson 52 24 94,000
2 2 Tina . 36 31 57
3 3 Jake Milner 24 . 62
4 4 Amy Cooze 73 . 70

Load a csv while skipping the top 3 rows

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', na_values=sentinels, skiprows=3)
df
2 Tina . 36 31 57
0 3 Jake Milner 24 . 62
1 4 Amy Cooze 73 . 70

Load a csv while interpreting "," in strings around numbers as thousands seperators

df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', thousands=',')
df
Unnamed: 0 first_name last_name age preTestScore postTestScore
0 0 Jason Miller 42 4 25000
1 1 Molly Jacobson 52 24 94000
2 2 Tina . 36 31 57
3 3 Jake Milner 24 . 62
4 4 Amy Cooze 73 . 70