Using Pandas groupby
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
Using "groupby" in pandas
In [1]:
import pandas as pd
In [2]:
url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)
In [3]:
drinks.head()
Out[3]:
In [5]:
# get the mean of the beer_servings column across all countries
drinks.beer_servings.mean()
Out[5]:
In [6]:
# using .groupby: mean of beer_servings for each continent
drinks.groupby('continent').beer_servings.mean()
Out[6]:
In [9]:
# select only the rows where the "continent" column is 'Africa'
drinks[drinks.continent=='Africa'].head()
Out[9]:
In [10]:
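# numeric_only=True skips text columns such as country (pandas 2.0+ raises an error without it)
drinks[drinks.continent=='Africa'].mean(numeric_only=True)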
Out[10]:
In [11]:
drinks[drinks.continent=='Africa'].beer_servings.mean()
Out[11]:
In [14]:
drinks[drinks.continent=='Europe'].beer_servings.mean()
Out[14]:
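These are the same values that .groupby returned for Africa and Europe.
- This is because .groupby splits the rows by continent and then computes the beer_servings mean within each group, so it does every one of these filters in a single step.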
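To make that equivalence concrete, here is a minimal sketch on a made-up miniature table. The name toy, the country labels, and all the numbers are hypothetical and are not taken from the drinks dataset; the point is only that filtering each continent by hand and calling .groupby give the same means.

```python
import pandas as pd

# Hypothetical miniature version of the drinks table (values are made up).
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D', 'E'],
    'continent': ['Africa', 'Africa', 'Europe', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200, 300],
})

# Manual approach: filter the rows for each continent, then take the mean.
manual = {
    name: toy[toy.continent == name].beer_servings.mean()
    for name in toy.continent.unique()
}

# .groupby does the same split, apply, and combine steps in one call.
grouped = toy.groupby('continent').beer_servings.mean()

print(manual)
print(grouped)
```

The manual version needs one filter per continent, while .groupby discovers the groups from the column itself, which is why it scales better as the number of groups grows.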
.groupby max and min
In [15]:
drinks.groupby('continent').beer_servings.max()
Out[15]:
In [16]:
drinks.groupby('continent').beer_servings.min()
Out[16]:
Aggregate findings
In [18]:
drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean'])
Out[18]:
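A related option, not covered in the original notes, is pandas' named aggregation, which lets you choose the output column names yourself. The sketch below uses a hypothetical toy table with made-up values, and the output names (countries, lowest, highest, average) are my own labels.

```python
import pandas as pd

# Hypothetical toy data; the numbers are made up for illustration.
toy = pd.DataFrame({
    'continent': ['Africa', 'Africa', 'Europe', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200, 300],
})

# Each keyword is an output column name; each tuple is (source column, function).
summary = toy.groupby('continent').agg(
    countries=('beer_servings', 'count'),
    lowest=('beer_servings', 'min'),
    highest=('beer_servings', 'max'),
    average=('beer_servings', 'mean'),
)
print(summary)
```

Passing a list of function names, as in the cell above, is shorter; named aggregation is mainly useful when the default column names would be ambiguous or when you aggregate several source columns at once.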
You can also get the mean of every numeric column at once instead of specifying beer_servings
In [19]:
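# numeric_only=True skips the text column country (pandas 2.0+ raises an error without it)
drinks.groupby('continent').mean(numeric_only=True)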
Out[19]:
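One caveat worth noting: as far as I know, from pandas 2.0 onward a grouped .mean() no longer silently drops text columns, so a table that still contains a column like country needs either numeric_only=True or an explicit column selection. The sketch below shows both on a hypothetical toy table (made-up values; wine_servings is included only to have a second numeric column).

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns.
toy = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'continent': ['Africa', 'Africa', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200],
    'wine_servings': [0, 4, 50, 150],
})

# Option 1: ask pandas to skip non-numeric columns such as 'country'.
all_numeric = toy.groupby('continent').mean(numeric_only=True)

# Option 2: select the columns of interest explicitly before aggregating.
chosen = toy.groupby('continent')[['beer_servings', 'wine_servings']].mean()

print(all_numeric)
print(chosen)
```

Explicit selection is a little more typing, but it keeps the result stable if new numeric columns are added to the data later.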
Visualization
In [20]:
# allow plots to appear in notebook using matplotlib
%matplotlib inline
In [24]:
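# numeric_only=True skips the text column country (pandas 2.0+ raises an error without it)
data = drinks.groupby('continent').mean(numeric_only=True)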
data
Out[24]:
In [22]:
data.plot(kind='bar')
Out[22]:
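When the grouped table mixes columns on very different scales, the combined bar chart can be hard to read, so it sometimes helps to plot one column at a time. This is a minimal sketch under stated assumptions: the toy table and its values are hypothetical, and the Agg backend line is only there so the snippet runs outside a notebook (inside a notebook with %matplotlib inline you would leave it out).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line inside a notebook
import pandas as pd

# Hypothetical toy data; the numbers are made up for illustration.
toy = pd.DataFrame({
    'continent': ['Africa', 'Africa', 'Europe', 'Europe', 'Europe'],
    'beer_servings': [10, 30, 100, 200, 300],
})

# Group, aggregate a single column, and sort so the bars read low to high.
means = toy.groupby('continent').beer_servings.mean().sort_values()

# Series.plot returns a matplotlib Axes that can be labelled further.
ax = means.plot(kind='bar', title='Mean beer servings by continent')
ax.set_ylabel('beer_servings')
```

Sorting before plotting is a small design choice: it makes the ranking of the groups visible at a glance rather than leaving them in alphabetical order.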