Apply a function to a Series or DataFrame using .map(), .apply(), and .applymap()
Applying a function to a pandas Series or DataFrame
In [1]:
import pandas as pd
In [4]:
url = 'http://bit.ly/kaggletrain'
train = pd.read_csv(url)
train.head(3)
Out[4]:
map() function as a Series method
Mostly used for mapping categorical data to numerical data
In [8]:
# create new column
train['Sex_num'] = train.Sex.map({'female':0, 'male':1})
In [9]:
# let's compare the Sex and Sex_num columns
# here we can see that male was mapped to 1 and female to 0
train.loc[0:4, ['Sex', 'Sex_num']]
Out[9]:
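A minimal sketch of how a dictionary passed to map() behaves, using a small made-up Series rather than the Titanic data (the values and the name sex_num are assumptions for illustration). It shows that any value missing from the dictionary comes back as NaN.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file
sex = pd.Series(['male', 'female', 'female', 'unknown'])

# Values found in the dict are replaced; anything else becomes NaN
sex_num = sex.map({'female': 0, 'male': 1})
print(sex_num)
```

Because of the NaN, the result here is a float column. If every value is covered by the dictionary, the result stays integer.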
apply() function as a Series method
Applies a function to each element in the Series
In [10]:
# say we want the length of each string in the "Name" column
# we apply Python's built-in len function to every element
# and store the result in a new column
train['Name_length'] = train.Name.apply(len)
In [12]:
# the apply() method applies the function to each element
train.loc[0:4, ['Name', 'Name_length']]
Out[12]:
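For string lengths specifically, the Series .str accessor offers the same result without apply(). A short sketch on made-up names (the data is hypothetical):

```python
import pandas as pd

# Hypothetical names in the same "Last, Title. First" layout
names = pd.Series(['Smith, Mr. John', 'Doe, Mrs. Jane Ann'])

by_apply = names.apply(len)   # Python's len called on each element
by_str = names.str.len()      # pandas string method, same values

print(by_apply.tolist(), by_str.tolist())
```

One practical difference: .str.len() passes missing values through as NaN, while apply(len) raises a TypeError on them.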
In [16]:
import numpy as np
# say we look at the "Fare" column and we want to round it up
# we will use numpy's ceil function to round up the numbers
train['Fare_ceil'] = train.Fare.apply(np.ceil)
In [17]:
train.loc[0:4, ['Fare', 'Fare_ceil']]
Out[17]:
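NumPy functions such as np.ceil also accept a Series directly, so apply() is optional in this case. A small sketch with made-up fares (the numbers are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical fare values
fare = pd.Series([7.25, 71.2833, 8.05])

via_apply = fare.apply(np.ceil)   # same pattern as in the cell above
direct = np.ceil(fare)            # the ufunc works on the whole Series

print(direct.tolist())
```

Both forms return a float Series with the same index as the input.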
In [19]:
# let's extract the last name of each person
# we will use the str.split() method, splitting on the comma
# the result is still a Series, but each element is now a list of strings
# each cell holds 2 strings in a list, as you can see below
train.Name.str.split(',').head()
Out[19]:
In [22]:
# we just want the first string from each list
# we create a function that returns the element at a given position
def get_element(my_list, position):
    return my_list[position]
In [23]:
# use our created get_element function
# we pass position=0
train.Name.str.split(',').apply(get_element, position=0).head()
Out[23]:
In [25]:
# instead of above, we can use a lambda function
# input x (the list in this case)
# output x[0] (the first string of the list in this case)
train.Name.str.split(',').apply(lambda x: x[0]).head()
Out[25]:
In [27]:
# getting the second string
train.Name.str.split(',').apply(lambda x: x[1]).head()
Out[27]:
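The same extraction can be sketched with the .str accessor, which supports indexing into each list. The names below are made up for illustration; note that the second piece keeps the space that followed the comma, so it is stripped here.

```python
import pandas as pd

# Hypothetical names in the same "Last, Title. First" layout
names = pd.Series(['Smith, Mr. John', 'Doe, Mrs. Jane Ann'])
parts = names.str.split(',')

last_lambda = parts.apply(lambda x: x[0])   # lambda version
last_str = parts.str[0]                     # .str indexing version
rest = parts.str[1].str.strip()             # second piece, whitespace removed

print(last_str.tolist(), rest.tolist())
```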
apply() function as a DataFrame method
Applies a function on either axis of the DataFrame
In [30]:
url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)
drinks.head()
Out[30]:
In [32]:
drinks.loc[:, 'beer_servings':'wine_servings'].head()
Out[32]:
In [33]:
# axis=0 moves down the rows, so the function is applied to each column
# apply Python's max() function to get one maximum per column
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=0)
Out[33]:
In [34]:
# axis=1 moves across the columns, so the function is applied to each row
# apply Python's max() function to get one maximum per row
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=1)
Out[34]:
In [35]:
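# find the name of the column that holds each row's maximum
# idxmax() returns the column label; in recent pandas versions np.argmax returns an integer position instead
drinks.loc[:, 'beer_servings':'wine_servings'].idxmax(axis=1)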
Out[35]:
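A compact sketch of the two axis directions on a tiny made-up table (the numbers and the index labels A, B, C are assumptions, not rows from the drinks file). It also shows idxmax(), which returns the column label of each row's maximum.

```python
import pandas as pd

# Hypothetical servings table with the same column names as above
servings = pd.DataFrame(
    {'beer_servings': [0, 89, 25],
     'spirit_servings': [0, 132, 0],
     'wine_servings': [0, 54, 14]},
    index=['A', 'B', 'C'],
)

col_max = servings.apply(max, axis=0)   # one result per column
row_max = servings.apply(max, axis=1)   # one result per row
top_col = servings.idxmax(axis=1)       # label of the largest column per row

print(col_max.tolist(), row_max.tolist(), top_col.tolist())
```

When a row has ties, as in row A where every value is 0, idxmax() returns the first column that holds the maximum.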
applymap() as a DataFrame method
Applies a function to every element of the DataFrame (since pandas 2.1, applymap() is deprecated in favor of DataFrame.map())
In [37]:
drinks.loc[:, 'beer_servings': 'wine_servings'].applymap(float).head()
Out[37]:
In [41]:
# overwrite the selected columns in the existing table with the converted values
drinks.loc[:, 'beer_servings': 'wine_servings'] = drinks.loc[:, 'beer_servings': 'wine_servings'].applymap(float)
drinks.head()
Out[41]:
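A hedged sketch of element-wise conversion that runs on both older and newer pandas, using a tiny made-up table. The hasattr() check is just one way to pick between DataFrame.map (pandas 2.1 and later) and applymap (earlier versions); astype(float) is shown as a column-wise alternative that produces the same values here.

```python
import pandas as pd

# Hypothetical two-column table
servings = pd.DataFrame({'beer_servings': [0, 89], 'wine_servings': [0, 54]})

# Use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = servings.map if hasattr(servings, 'map') else servings.applymap
as_float = elementwise(float)

# astype converts whole columns at once
as_float_cols = servings.astype(float)

print(as_float.dtypes.tolist())
```

For a plain type conversion astype() is usually the simpler choice; element-wise application is more useful when the function is not a simple cast.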