Selecting and manipulating pandas Series
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
Selecting a pandas Series from a DataFrame
What is a Series?
- It is an m x 1 vector (pandas stores it as one-dimensional, with shape (m,))
- m is the number of rows
- 1 is the number of columns
- Each column in a DataFrame is a pandas Series
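A quick way to check these points is to build a small DataFrame by hand and inspect one of its columns. The city and state values below are hypothetical toy data, not read from the UFO file. The sketch shows that a column is a Series of length m, and that `to_frame()` turns it into the m x 1 shape described above.

```python
import pandas as pd

# Hypothetical toy data standing in for the UFO dataset
df = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'State': ['NY', 'NJ', 'CO'],
})

city = df['City']              # one column of the DataFrame
print(type(city))              # <class 'pandas.core.series.Series'>
print(city.shape)              # (3,): m = 3 rows, stored as one-dimensional
print(city.to_frame().shape)   # (3, 1): the same data viewed as an m x 1 table
print(df.shape)                # (3, 2): 3 rows, 2 columns (2 Series)
```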
In [1]:
import pandas as pd
In [11]:
# The CSV file is comma-separated
url = 'http://bit.ly/uforeports'
# Method 1: read_table (its default separator is a tab, so pass sep=',')
ufo = pd.read_table(url, sep=',')
# Method 2: read_csv
# read_csv is a shortcut here because its default separator is already a comma
ufo = pd.read_csv(url)
ufo.head()
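To compare the two reading methods without a network connection, the sketch below parses a small in-memory CSV with both functions. The sample rows are hypothetical, written with the same column names as the UFO file; `io.StringIO` simply lets pandas read a string as if it were a file.

```python
import io

import pandas as pd

# Hypothetical sample with the same columns as the UFO reports file
csv_text = """City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,6/1/1930 22:00
Willingboro,,OTHER,NJ,6/30/1930 20:00
Holyoke,,OVAL,CO,2/15/1931 14:00
"""

# read_table expects tab-separated input by default, so the comma is passed explicitly
ufo_table = pd.read_table(io.StringIO(csv_text), sep=',')

# read_csv already uses a comma as its default separator
ufo_csv = pd.read_csv(io.StringIO(csv_text))

print(ufo_table.equals(ufo_csv))  # True: both methods produce the same DataFrame
print(ufo_csv.head())
```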
In [15]:
# Method 1: select the City Series with bracket notation (this always works)
ufo['City']
# Method 2: select the City Series with dot (attribute) notation
ufo.City
# Column names are case-sensitive: 'City' works, 'city' does not
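The sketch below checks that both methods return the same Series and shows what happens when the case of a column name is wrong. The two-row DataFrame is hypothetical toy data. Note that the two methods fail differently: bracket notation raises a `KeyError`, while dot notation raises an `AttributeError`.

```python
import pandas as pd

# Hypothetical toy data
ufo = pd.DataFrame({'City': ['Ithaca', 'Willingboro'], 'State': ['NY', 'NJ']})

# Bracket and dot notation return the same Series
same = ufo['City'].equals(ufo.City)
print(same)  # True

# Wrong case with bracket notation: KeyError
bracket_error = None
try:
    ufo['city']
except KeyError as err:
    bracket_error = err
    print('KeyError:', err)

# Wrong case with dot notation: AttributeError
dot_error = None
try:
    ufo.city
except AttributeError as err:
    dot_error = err
    print('AttributeError:', err)
```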
In [10]:
# Confirm that both methods return a pandas Series
type(ufo['City'])
type(ufo.City)
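In a notebook cell, only the value of the last expression is displayed, so `type(ufo['City'])` on its own line shows nothing. Printing both results makes the check explicit, and `isinstance` is the usual way to test the type in code. The DataFrame here is hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data
ufo = pd.DataFrame({'City': ['Ithaca', 'Willingboro'], 'State': ['NY', 'NJ']})

# print() shows every result, not just the last expression in the cell
bracket_type = type(ufo['City'])
dot_type = type(ufo.City)
print(bracket_type)
print(dot_type)

# isinstance is the idiomatic programmatic check
print(isinstance(ufo['City'], pd.Series))  # True
print(isinstance(ufo, pd.DataFrame))       # True
```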
How do you select a column whose name contains a space?
- You cannot use method 2: ufo.Colors Reported is not valid Python
- You have to use method 1: ufo['Colors Reported']
In [16]:
ufo['Colors Reported']
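Bracket notation accepts any column label, including labels with spaces. If dot notation is preferred, one common workaround is to rename the columns so the spaces become underscores. The sketch below uses hypothetical toy data and works on a copy so the original column names are left unchanged.

```python
import pandas as pd

# Hypothetical toy data with a column name that contains a space
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro'],
    'Colors Reported': ['RED', 'ORANGE'],
})

# Bracket notation works for any label
colors = ufo['Colors Reported']

# Workaround: replace spaces with underscores so dot notation becomes possible
renamed = ufo.copy()
renamed.columns = renamed.columns.str.replace(' ', '_')
print(list(renamed.columns))             # ['City', 'Colors_Reported']
print(renamed.Colors_Reported.tolist())  # ['RED', 'ORANGE']
```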
How do I create a new pandas Series (column) in a DataFrame?
In [18]:
# Example: the + operator concatenates strings
'ab' + 'cd'
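The same + operator works on Series that hold strings: it concatenates element by element, row by row, and a plain string such as `', '` is applied to every row. The two short Series below are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data
city = pd.Series(['Ithaca', 'Willingboro'])
state = pd.Series(['NY', 'NJ'])

# Element-wise concatenation; the ', ' separator is broadcast to every row
location = city + ', ' + state
print(location.tolist())  # ['Ithaca, NY', 'Willingboro, NJ']
```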
In [22]:
# Create a new column called "Location" by concatenating "City", ', ' and "State"
# (a new column must be created with bracket notation on the left-hand side)
ufo['Location'] = ufo.City + ', ' + ufo.State
ufo.head()
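Two details are worth checking when creating a column this way. First, if either City or State is missing in a row, the concatenated value for that row is also missing (NaN). Second, assigning with dot notation does not create a column: pandas sets an ordinary attribute instead and issues a warning. The sketch below demonstrates both with hypothetical toy data, and the warning is silenced only to keep the output short.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data; the second City value is missing
ufo = pd.DataFrame({
    'City': ['Ithaca', np.nan],
    'State': ['NY', 'NJ'],
})

# Bracket notation creates the new column
ufo['Location'] = ufo.City + ', ' + ufo.State
print(ufo['Location'].tolist())  # the second entry is NaN because City is missing

# Dot notation sets a plain attribute, not a column (pandas warns about this)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    ufo.Location2 = ufo.City + ', ' + ufo.State
print('Location2' in ufo.columns)  # False
```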