This introduction to pandas is derived from Data School's pandas Q&A, with my own notes and code added.

Reading a subset of columns or rows, iterating through a Series or DataFrame, dropping all non-numeric columns, and passing arguments

1. Reading a subset of columns or rows

import pandas as pd

link = 'http://bit.ly/uforeports'
ufo = pd.read_csv(link)

ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

# reference columns by name (strings)
cols = ['City', 'State']

ufo = pd.read_csv(link, usecols=cols)

ufo.head()

# reference columns by position (integers, zero-based)
cols2 = [0, 4]

ufo = pd.read_csv(link, usecols=cols2)

ufo.head()

# if you only want the first few rows (here, the first 3)
ufo = pd.read_csv(link, nrows=3)

ufo
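The options above can also be combined in one call. The following is a minimal sketch that avoids the network by reading from an in-memory string; the variable names (csv_text, by_name, by_position, combined) are my own, and the three rows are the ones shown in this notebook's output.

```python
import io
import pandas as pd

# Inline stand-in for the UFO CSV: same header, first three rows only
csv_text = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n"
    "Willingboro,,OTHER,NJ,6/30/1930 20:00\n"
    "Holyoke,,OVAL,CO,2/15/1931 14:00\n"
)

# Select columns by name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['City', 'State'])

# Select columns by zero-based position
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 4])

# Combine column selection with a row limit
combined = pd.read_csv(io.StringIO(csv_text), usecols=['City', 'Time'], nrows=2)

print(list(by_name.columns))
print(list(by_position.columns))
print(combined.shape)
```

Selecting columns at read time, rather than dropping them afterwards, means the unwanted columns are never loaded into memory, which can matter for large files.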

2. Iterating through a Series and a DataFrame

# intuitive method: a plain Python loop over a Series
for c in ufo.City:
    print(c)

Ithaca
Willingboro
Holyoke

# pandas method
# iterrows() yields an (index, row) pair for each row of the DataFrame
for index, row in ufo.iterrows():
    print(index, row.City, row.State)

0 Ithaca NY
1 Willingboro NJ
2 Holyoke CO
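As a possible alternative to iterrows(), pandas also provides itertuples(), which yields one named tuple per row, and Series.items(), which yields (index, value) pairs. The sketch below uses a small hand-built DataFrame (ufo_sample is a hypothetical name) with the same three cities rather than the downloaded file.

```python
import pandas as pd

# Small stand-in for the three-row ufo DataFrame used above
ufo_sample = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'State': ['NY', 'NJ', 'CO'],
})

# itertuples() yields a named tuple per row; the index is available as .Index
lines = []
for row in ufo_sample.itertuples():
    lines.append(f"{row.Index} {row.City} {row.State}")
print("\n".join(lines))

# Series.items() yields (index, value) pairs for a single column
pairs = list(ufo_sample['City'].items())
print(pairs)
```

Note that itertuples() exposes columns as attributes, so a column name containing a space (such as 'Colors Reported') is replaced by a positional name and cannot be accessed by its original name.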

3. Dropping non-numeric columns from a DataFrame

link = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(link)

# two columns (country and continent) are non-numeric
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

import numpy as np
# keep only the numeric columns
drinks.select_dtypes(include=[np.number]).dtypes

beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
dtype: object
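select_dtypes() also accepts the string 'number' in place of np.number, and an exclude argument for the opposite selection. Here is a minimal sketch on a made-up two-row frame; drinks_sample and its values are illustrative assumptions, not the real drinks data.

```python
import pandas as pd

# Hypothetical miniature of the drinks DataFrame (values are illustrative only)
drinks_sample = pd.DataFrame({
    'country': ['A', 'B'],
    'beer_servings': [0, 89],
    'total_litres_of_pure_alcohol': [0.0, 4.9],
    'continent': ['Asia', 'Europe'],
})

# Keep only numeric columns, without importing numpy
numeric = drinks_sample.select_dtypes(include='number')

# Keep everything except the numeric columns
non_numeric = drinks_sample.select_dtypes(exclude='number')

print(list(numeric.columns))
print(list(non_numeric.columns))
```

select_dtypes() returns a new DataFrame, so assign the result back (for example, drinks = drinks.select_dtypes(include='number')) if the non-numeric columns should be dropped for good.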

4. Passing arguments: when to use a list or a string

# here you pass a string
drinks.describe(include='all')

# here you pass a list of dtypes
# in Jupyter, press Shift + Tab inside the parentheses to see which arguments a method accepts
list_include = ['object', 'float64']
drinks.describe(include=list_include)
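To make the string-versus-list distinction concrete, the sketch below calls describe() both ways on a made-up frame. The name drinks_sample and its values are illustrative assumptions; the point is only that 'all' is a special string, while specific dtypes go in a list.

```python
import pandas as pd

# Hypothetical miniature of the drinks DataFrame (values are illustrative only)
drinks_sample = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'beer_servings': [0, 89, 25],
    'total_litres_of_pure_alcohol': [0.0, 4.9, 0.7],
    'continent': ['Asia', 'Europe', 'Africa'],
})

# The special string 'all' summarizes every column, whatever its dtype
everything = drinks_sample.describe(include='all')

# A list restricts the summary to the listed dtypes
floats_only = drinks_sample.describe(include=['float64'])
numbers_only = drinks_sample.describe(include=['number'])

print(list(everything.columns))
print(list(floats_only.columns))
print(list(numbers_only.columns))
```

A list is the natural container here because a call may name several dtypes at once, as in the ['object', 'float64'] example above.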

Output of ufo.head() after reading columns 0 and 4 (City and Time):

	City	Time
0	Ithaca	6/1/1930 22:00
1	Willingboro	6/30/1930 20:00
2	Holyoke	2/15/1931 14:00
3	Abilene	6/1/1931 13:00
4	New York Worlds Fair	4/18/1933 19:00

Output of drinks.describe(include='all'):

	country	beer_servings	spirit_servings	wine_servings	total_litres_of_pure_alcohol	continent
count	193	193.000000	193.000000	193.000000	193.000000	193
unique	193	NaN	NaN	NaN	NaN	6
top	Bahrain	NaN	NaN	NaN	NaN	Africa
freq	1	NaN	NaN	NaN	NaN	53
mean	NaN	106.160622	80.994819	49.450777	4.717098	NaN
std	NaN	101.143103	88.284312	79.697598	3.773298	NaN
min	NaN	0.000000	0.000000	0.000000	0.000000	NaN
25%	NaN	20.000000	4.000000	1.000000	1.300000	NaN
50%	NaN	76.000000	56.000000	8.000000	4.200000	NaN
75%	NaN	188.000000	128.000000	59.000000	7.200000	NaN
max	NaN	376.000000	438.000000	370.000000	14.400000	NaN

Output of drinks.describe(include=['object', 'float64']):

	country	total_litres_of_pure_alcohol	continent
count	193	193.000000	193
unique	193	NaN	6
top	Bahrain	NaN	Africa
freq	1	NaN	53
mean	NaN	4.717098	NaN
std	NaN	3.773298	NaN
min	NaN	0.000000	NaN
25%	NaN	1.300000	NaN
50%	NaN	4.200000	NaN
75%	NaN	7.200000	NaN
max	NaN	14.400000	NaN

Output of ufo after reading with nrows=3:

	City	Colors Reported	Shape Reported	State	Time
0	Ithaca	NaN	TRIANGLE	NY	6/1/1930 22:00
1	Willingboro	NaN	OTHER	NJ	6/30/1930 20:00
2	Holyoke	NaN	OVAL	CO	2/15/1931 14:00
