Easy tabular data reading and manipulation with pandas
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
Reading a tabular data file into pandas
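Tabular data file examples
- CSV (comma-separated values) files
- Excel spreadsheets
- other table-like text formats, such as tab-separated (TSV) or pipe-separated files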
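As a quick offline sketch, the same small table can be stored as comma- or tab-separated text. `pd.read_csv` splits on commas by default and `pd.read_table` splits on tabs by default, so both calls below produce the same DataFrame. The sample rows are made up, and `io.StringIO` stands in for a real file or URL. Excel files are read with `pd.read_excel`, which needs an extra engine package (e.g. openpyxl) and is not shown here.

```python
import io

import pandas as pd

# Made-up sample table written in two delimited text formats
csv_text = "name,qty\napple,3\npear,5\n"
tsv_text = "name\tqty\napple\t3\npear\t5\n"

# read_csv splits on commas by default; read_table splits on tabs by default
from_csv = pd.read_csv(io.StringIO(csv_text))
from_tsv = pd.read_table(io.StringIO(tsv_text))

print(from_csv.equals(from_tsv))  # True
```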
In [1]:
# import pandas
import pandas as pd
In [8]:
# reading a well-formatted .tsv file
url = 'http://bit.ly/chiporders'
orders = pd.read_table(url)
orders.head()
read_table default assumptions
- the file is separated by tabs
- the first row is a header row containing the column names
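To see these defaults without downloading anything, here is a minimal sketch that parses a few made-up tab-separated lines from an in-memory buffer (`io.StringIO` replaces the URL). It also spells the defaults out explicitly with `read_csv(sep="\t", header=0)` to show they give the same result.

```python
import io

import pandas as pd

# Made-up tab-separated text whose first line holds the column names
tsv_text = "order_id\titem\tquantity\n1\tChips\t2\n1\tSalsa\t1\n2\tSoda\t3\n"

# With no extra arguments, read_table splits on tabs and uses line 1 as the header
df = pd.read_table(io.StringIO(tsv_text))

# Writing the same defaults out explicitly with read_csv
same = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=0)

print(df.columns.tolist())  # ['order_id', 'item', 'quantity']
print(df.equals(same))      # True
```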
In [10]:
# read a pipe-separated file with no header row using only the defaults
url2 = 'http://bit.ly/movieusers'
users = pd.read_table(url2)
users.head()
Issues with this file
- The separator is a pipe character (|), so we tell pandas with sep='|'
- There is no header row, so we pass header=None to stop pandas from treating the first data row as column names
- We can supply our own column names with names=user_cols
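The effect of these issues can be reproduced offline. This sketch uses two made-up rows in the same pipe-separated, headerless layout (read from `io.StringIO` instead of the URL). The default read finds no tabs, so each line lands in a single column and the first row is consumed as the header; the fixed read splits on `|` and keeps both rows.

```python
import io

import pandas as pd

# Made-up pipe-separated rows with no header line
raw = "1|30|F|engineer|12345\n2|45|M|writer|67890\n"

# Default read: no tabs found, so each line becomes one field,
# and the first data row is used as the column name
naive = pd.read_table(io.StringIO(raw))
print(naive.shape)  # (1, 1)

# Fixed read: split on '|', treat every line as data, and name the columns
user_cols = ["user_id", "age", "gender", "occupation", "zip_code"]
users = pd.read_table(io.StringIO(raw), sep="|", header=None, names=user_cols)
print(users.shape)  # (2, 5)
```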
In [15]:
# set the separator, disable header inference, and name the columns ourselves
user_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_table(url2, sep='|', header=None, names=user_cols)
users.head()
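Tips
- If your data file has extra text at the top or bottom (for example a title or a note), you can skip those lines:
  - skiprows skips rows at the top of the file (default None)
  - skipfooter skips rows at the bottom of the file (default 0); only the Python parser supports it, so pass engine='python'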
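A minimal sketch of both options together, using a made-up file with one title line above the table and one note line below it (held in `io.StringIO` rather than on disk). `skiprows=1` drops the title, `skipfooter=1` drops the note, and `engine="python"` is passed explicitly because the default C parser does not support `skipfooter`.

```python
import io

import pandas as pd

# Made-up file contents: a title line on top and a note line at the bottom
text = (
    "Quarterly sales export\n"
    "region,units\n"
    "north,10\n"
    "south,7\n"
    "Generated by a reporting tool"
)

# skiprows=1 drops the title line; skipfooter=1 drops the trailing note.
# skipfooter requires the Python parser, so request it explicitly.
sales = pd.read_csv(io.StringIO(text), skiprows=1, skipfooter=1, engine="python")
print(sales)
```

skiprows also accepts a list of row indices (e.g. `skiprows=[0, 2]`) when the lines to drop are not all at the top.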