Changing Data Type in Pandas
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
Changing data type of a pandas Series
In [1]:
import pandas as pd
In [2]:
url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(url)
In [3]:
drinks.head()
Out[3]:
In [5]:
drinks.dtypes
Out[5]:
Data type summary
- 3 integers (int64)
- 1 floating (float64)
- 2 objects (object)
Method 1: Change the data type after reading the CSV
In [8]:
# use .astype() to convert a Series to another data type, then assign the result back
drinks['beer_servings'] = drinks.beer_servings.astype(float)
In [10]:
drinks.dtypes
Out[10]:
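As a minimal, self-contained sketch of Method 1, the cell below applies .astype() to a small in-memory DataFrame. The DataFrame `df` and its values are made up for illustration and are not the real drinks data.

```python
import pandas as pd

# Hypothetical stand-in for the drinks data (values invented for illustration)
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'beer_servings': [0, 89, 25],
})

# The column starts out with an integer dtype
before = df['beer_servings'].dtype

# .astype() returns a new Series, so assign it back to keep the change
df['beer_servings'] = df['beer_servings'].astype(float)
after = df['beer_servings'].dtype

print(before, '->', after)
```

Note that .astype() does not modify the column in place; without the assignment, the DataFrame would keep its original dtype.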
Method 2: Change the data type while reading the CSV
In [11]:
drinks = pd.read_csv(url, dtype={'beer_servings':float})
In [12]:
drinks.dtypes
Out[12]:
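The effect of the dtype argument can be checked without a network connection. This is a small sketch that assumes a made-up CSV string (`csv_text`) in place of the remote file; only the column named in the dtype dictionary should change.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the remote file
csv_text = "country,beer_servings,wine_servings\nA,0,0\nB,89,54\n"

# Without dtype, pandas infers an integer type for both numeric columns
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype, the named column is parsed as float while the file is read
typed = pd.read_csv(io.StringIO(csv_text), dtype={'beer_servings': float})

print(inferred.dtypes)
print(typed.dtypes)
```

Columns that are not listed in the dtype dictionary are still inferred as usual.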
In [13]:
url = 'http://bit.ly/chiporders'
orders = pd.read_table(url)
In [14]:
orders.head()
Out[14]:
In [15]:
orders.dtypes
Out[15]:
The issue here is that pandas does not recognize item_price as a float: the values contain a dollar sign, so the column is read as object (string)
In [18]:
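# strip the dollar sign with .str.replace, then convert to float
# regex=False treats '$' as a literal character, not as the regex end-of-string anchor
orders['item_price'] = orders.item_price.str.replace('$', '', regex=False).astype(float)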
In [19]:
orders.dtypes
Out[19]:
In [20]:
# we can now calculate the mean
orders.item_price.mean()
Out[20]:
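The same cleanup can be tried on a tiny Series. This sketch assumes a few invented price strings (`prices`) formatted like item_price. Passing regex=False makes the intent explicit, because '$' is otherwise a special character in regular expressions.

```python
import pandas as pd

# Hypothetical prices formatted like the item_price column
prices = pd.Series(['$2.39', '$3.39', '$16.98'])

# Strings cannot be averaged, so remove the literal '$' and convert to float
cleaned = prices.str.replace('$', '', regex=False).astype(float)

print(cleaned.dtype)
print(cleaned.mean())
```

Once the column is numeric, aggregations such as .mean() and .sum() work as expected.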
To check whether each value in a column contains a certain string, use .str.contains(), which returns True or False for every row
In [22]:
orders['item_name'].str.contains('Chicken').head()
Out[22]:
In [23]:
# convert the boolean values to integers (True -> 1, False -> 0)
orders['item_name'].str.contains('Chicken').astype(int).head()
Out[23]:
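A short sketch of the boolean-to-integer conversion on made-up item names (the Series `items` is hypothetical). Summing the 0/1 flags is one simple way to count the matching rows.

```python
import pandas as pd

# Hypothetical item names similar to the item_name column
items = pd.Series(['Chicken Bowl', 'Izze', 'Chicken Burrito', 'Chips'])

# .str.contains() returns one boolean per row
has_chicken = items.str.contains('Chicken')

# .astype(int) maps True to 1 and False to 0
flags = has_chicken.astype(int)

print(flags.tolist())  # [1, 0, 1, 0]
print(flags.sum())     # 2
```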