Reading a subset of columns or rows, iterating through a Series or DataFrame, dropping all non-numeric columns, and passing arguments
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
1. Reading a subset of columns or rows
In [1]:
import pandas as pd
In [2]:
link = 'http://bit.ly/uforeports'
ufo = pd.read_csv(link)
In [3]:
ufo.columns
Out[3]:
In [4]:
# select columns by name (strings)
cols = ['City', 'State']
ufo = pd.read_csv(link, usecols=cols)
In [5]:
ufo.head()
Out[5]:
In [6]:
# select columns by position (integers, zero-based)
cols2 = [0, 4]
ufo = pd.read_csv(link, usecols=cols2)
In [7]:
ufo.head()
Out[7]:
In [8]:
# read only the first few rows (here, 3)
ufo = pd.read_csv(link, nrows=3)
In [9]:
ufo
Out[9]:
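The usecols and nrows arguments can also be combined in a single read_csv call. The sketch below is only an illustration: it reads a small made-up CSV from memory instead of the UFO link, and the column names and values in csv_text are assumptions, not the real file contents.

```python
import io
import pandas as pd

# Made-up sample data; the column names are assumed for illustration only
csv_text = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Springfield,,TRIANGLE,AA,1/1/2000 22:00\n"
    "Shelbyville,RED,OTHER,BB,1/2/2000 20:00\n"
    "Ogdenville,,OVAL,CC,1/3/2000 14:00\n"
    "Capital City,,DISK,DD,1/4/2000 13:00\n"
)

# Select two columns by name and stop after the first two rows
subset = pd.read_csv(io.StringIO(csv_text), usecols=["City", "State"], nrows=2)
print(subset)

# Select the same two columns by zero-based position
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])
print(by_position.columns.tolist())
```

Reading only the columns and rows you need keeps memory use down, which helps when you just want a quick look at a large file.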
2. Iterating through a Series and DataFrame
In [11]:
# intuitive method: loop over a Series like any Python iterable
for c in ufo.City:
    print(c)
In [12]:
# pandas method: iterrows() yields (index, row) pairs,
# where each row is a Series
for index, row in ufo.iterrows():
    print(index, row.City, row.State)
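As a possible alternative to iterrows(), pandas also provides itertuples() for DataFrames and items() for Series. The sketch below uses a small made-up DataFrame (ufo_sample is a hypothetical stand-in, not the real data) to show what each one yields.

```python
import pandas as pd

# Made-up stand-in for a few rows of the ufo DataFrame
ufo_sample = pd.DataFrame({
    "City": ["Springfield", "Shelbyville", "Ogdenville"],
    "State": ["AA", "BB", "CC"],
})

# itertuples() yields one namedtuple per row; the index is in the Index field
pairs = []
for row in ufo_sample.itertuples():
    pairs.append((row.Index, row.City, row.State))
print(pairs)

# Series.items() yields (index, value) pairs for a single column
for index, city in ufo_sample["City"].items():
    print(index, city)
```

itertuples() is usually faster than iterrows() because it does not build a Series for every row, though attribute access like row.City only works when the column names are valid Python identifiers.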
3. Dropping non-numeric columns from a DataFrame
In [13]:
link = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(link)
In [14]:
# you have 2 non-numeric columns
drinks.dtypes
Out[14]:
In [17]:
import numpy as np
drinks.select_dtypes(include=[np.number]).dtypes
Out[17]:
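select_dtypes also accepts the string 'number' and has an exclude argument for the opposite selection. The following is a minimal sketch on a made-up DataFrame; drinks_sample, its column names, and its values are assumptions for illustration, not the real drinks data.

```python
import pandas as pd

# Made-up stand-in for the drinks DataFrame
drinks_sample = pd.DataFrame({
    "country": ["X", "Y"],
    "beer_servings": [10, 89],
    "total_litres_of_pure_alcohol": [0.5, 4.9],
    "continent": ["P", "Q"],
})

# Keep only the numeric columns (the string 'number' works like np.number)
numeric_only = drinks_sample.select_dtypes(include="number")
print(numeric_only.columns.tolist())

# Keep everything except the numeric columns
non_numeric = drinks_sample.select_dtypes(exclude="number")
print(non_numeric.columns.tolist())
```

Note that select_dtypes returns a new DataFrame and leaves the original unchanged, so assign the result if you want to keep it.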
4. Passing arguments: when to use a list or a string
In [19]:
drinks.describe(include='all')
Out[19]:
In [21]:
# here include takes a list of dtypes rather than a single string
# in Jupyter, press Shift+Tab inside the parentheses to see the accepted arguments
list_include = ['object', 'float64']
drinks.describe(include=list_include)
Out[21]:
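To make the string-versus-list distinction concrete, the sketch below calls describe() three ways on a made-up DataFrame (drinks_sample and its contents are assumptions for illustration). The special string 'all' summarizes every column, while a list restricts the summary to the listed dtypes.

```python
import pandas as pd

# Made-up stand-in for the drinks DataFrame
drinks_sample = pd.DataFrame({
    "country": ["X", "Y", "Z"],
    "beer_servings": [10, 89, 25],
    "total_litres_of_pure_alcohol": [0.5, 4.9, 0.7],
    "continent": ["P", "Q", "P"],
})

# Default: only numeric columns are summarized
default_summary = drinks_sample.describe()

# The special string 'all' summarizes every column
all_summary = drinks_sample.describe(include="all")

# A list limits the summary to the listed dtypes
float_summary = drinks_sample.describe(include=["float64"])

print(default_summary.columns.tolist())
print(all_summary.columns.tolist())
print(float_summary.columns.tolist())
```

As a rough rule of thumb, a parameter that selects several things at once (dtypes, column names) tends to accept a list, while a single option or keyword such as 'all' is passed as a plain string; the docstring is the final authority for any given method.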