Applying a function to a pandas Series or DataFrame using .map(), .apply(), and .applymap()
In [1]:
import pandas as pd
In [4]:
url = 'http://bit.ly/kaggletrain'
train = pd.read_csv(url)
train.head(3)
Out[4]:
                        
map() function as a Series method

Mostly used for mapping categorical data to numerical data
In [8]:
# create a new column by mapping each category to a number
train['Sex_num'] = train.Sex.map({'female': 0, 'male': 1})
In [9]:
# compare the Sex and Sex_num columns
# female is mapped to 0 and male is mapped to 1
train.loc[0:4, ['Sex', 'Sex_num']]
Out[9]:
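As a minimal, self-contained sketch of the same mapping, the snippet below uses a small made-up Series in place of the Titanic data (the values and the names `sex` and `sex_num` are illustrative assumptions). It also shows what happens to a value that has no key in the dictionary.

```python
import pandas as pd

# Hypothetical toy data standing in for train.Sex
sex = pd.Series(['male', 'female', 'female', 'unknown'])

# Map each category to a number; values without a matching key become NaN
sex_num = sex.map({'female': 0, 'male': 1})

print(sex_num)
```

Because of the unmapped value, the result holds floats rather than integers, so it is worth checking for NaN after mapping.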
                        
apply() function as a Series method

Applies a function to each element in the Series
In [10]:
# say we want the length of each string in the "Name" column
# create a new column to hold it
# we are applying Python's built-in len function
train['Name_length'] = train.Name.apply(len)
In [12]:
# the apply() method called len on each element
train.loc[0:4, ['Name', 'Name_length']]
Out[12]:
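A small sketch of the same step on made-up data (the two names below are only sample strings, and `by_apply` and `by_str` are illustrative variable names). It compares apply(len) with the `.str.len()` accessor, which is another way to get string lengths.

```python
import pandas as pd

# Two sample strings in the same "Last, Title. First" shape as the Name column
names = pd.Series(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'])

# apply() calls len once per element
by_apply = names.apply(len)

# the .str accessor offers the same result without an explicit function
by_str = names.str.len()

print(by_apply.tolist())
print(by_str.tolist())
```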
                        
In [16]:
import numpy as np
# say we want to round each value in the "Fare" column up to the next whole number
# we will use NumPy's ceil function
train['Fare_ceil'] = train.Fare.apply(np.ceil)
In [17]:
train.loc[0:4, ['Fare', 'Fare_ceil']]
Out[17]:
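A short sketch with made-up fares (the values and variable names are assumptions for illustration). Since np.ceil is a NumPy universal function, it can also be called on the whole Series directly, and both forms should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical fares standing in for train.Fare
fare = pd.Series([7.25, 71.2833, 8.05])

# round up through apply()
ceil_apply = fare.apply(np.ceil)

# round up by calling the NumPy function on the Series itself
ceil_direct = np.ceil(fare)

print(ceil_apply.tolist())
print(ceil_direct.tolist())
```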
                        
In [19]:
# let's extract the last name of each person
# we will use a str method to split each name at the comma
# the result is a Series in which each cell holds a list of strings
# each list has 2 strings: the last name and the rest of the name
train.Name.str.split(',').head()
Out[19]:
                        
In [22]:
# we just want the first string from each list
# we create a function that returns the element at a given position
def get_element(my_list, position):
    return my_list[position]
In [23]:
# use our get_element function
# position=0 is passed through apply() to get_element as a keyword argument
train.Name.str.split(',').apply(get_element, position=0).head()
Out[23]:
                        
In [25]:
# instead of the above, we can use a lambda function
# input: x (the list in this case)
# output: x[0] (the first string of the list in this case)
train.Name.str.split(',').apply(lambda x: x[0]).head()
Out[25]:
                        
In [27]:
# getting the second string (it keeps the space that followed the comma)
train.Name.str.split(',').apply(lambda x: x[1]).head()
Out[27]:
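As a possible alternative to apply() for this step, the `.str` accessor can index into each list as well. The sketch below uses two sample names (the data and the variable names `parts`, `last_name`, and `rest` are illustrative assumptions) and strips the leading space from the second piece.

```python
import pandas as pd

# Two sample strings in the same shape as the Name column
names = pd.Series(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'])

# each cell becomes a list of strings
parts = names.str.split(',')

# .str[0] picks the first element of each list, like lambda x: x[0]
last_name = parts.str[0]

# .str[1] picks the second element; strip() removes the space after the comma
rest = parts.str[1].str.strip()

print(last_name.tolist())
print(rest.tolist())
```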
                        
apply() function as a DataFrame method

Applies a function along either axis of the DataFrame
In [30]:
url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)
drinks.head()
Out[30]:
                        
In [32]:
drinks.loc[:, 'beer_servings':'wine_servings'].head()
Out[32]:
                        
In [33]:
# axis=0 makes apply() travel downwards, passing each column to the function
# apply Python's max() function to get the maximum of each column
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=0)
Out[33]:
                        
In [34]:
# axis=1 makes apply() travel to the right, passing each row to the function
# apply Python's max() function to get the maximum of each row
drinks.loc[:, 'beer_servings':'wine_servings'].apply(max, axis=1)
Out[34]:
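To make the two axes concrete, here is a minimal sketch on a three-row DataFrame (the numbers are illustrative, and `servings`, `col_max`, and `row_max` are assumed names). It uses a lambda that calls each Series' own max() method rather than passing Python's built-in max, which is a choice made for this sketch only.

```python
import pandas as pd

# Illustrative values; the column names follow the drinks dataset
servings = pd.DataFrame({
    'beer_servings': [0, 89, 25],
    'spirit_servings': [0, 132, 0],
    'wine_servings': [0, 54, 14],
})

# axis=0: the function receives each column, so there is one result per column
col_max = servings.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row, so there is one result per row
row_max = servings.apply(lambda row: row.max(), axis=1)

print(col_max.to_dict())
print(row_max.tolist())
```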
                        
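In [35]:
# find the name of the column that holds each row's maximum
# idxmax returns the column label; in recent pandas versions np.argmax returns an integer position instead
drinks.loc[:, 'beer_servings':'wine_servings'].idxmax(axis=1)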
Out[35]:
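A small sketch of looking up the column label of each row's maximum with `DataFrame.idxmax(axis=1)` (the values and the names `servings` and `top_column` are illustrative assumptions).

```python
import pandas as pd

# Illustrative values; the column names follow the drinks dataset
servings = pd.DataFrame({
    'beer_servings': [89, 25],
    'spirit_servings': [132, 0],
    'wine_servings': [54, 14],
})

# for each row, return the label of the column holding the largest value
top_column = servings.idxmax(axis=1)

print(top_column.tolist())
```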
                        
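applymap() as a DataFrame method

Applies a function to every element of the DataFrame
In [37]:
# note: pandas 2.1 deprecated DataFrame.applymap in favor of the equivalent DataFrame.map
drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float).head()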
Out[37]:
                        
In [41]:
# overwrite the existing columns with the converted values
drinks.loc[:, 'beer_servings':'wine_servings'] = drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float)
drinks.head()
Out[41]:
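The element-wise conversion can be sketched on a tiny DataFrame as follows (the data and the names `servings`, `elementwise`, `as_float`, and `as_float_astype` are illustrative assumptions). The sketch picks `DataFrame.map` when the installed pandas provides it and falls back to `applymap` otherwise, then compares the result with `astype(float)`, which is another way to do a plain type conversion.

```python
import pandas as pd

# Illustrative integer data
servings = pd.DataFrame({'beer_servings': [0, 89], 'wine_servings': [0, 54]})

# DataFrame.map is available from pandas 2.1; older versions only have applymap
elementwise = servings.map if hasattr(pd.DataFrame, 'map') else servings.applymap

# call float() on every element
as_float = elementwise(float)

# for a pure type conversion, astype gives the same values
as_float_astype = servings.astype(float)

print(as_float.dtypes)
```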