Selecting multiple rows and columns in pandas
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
Selecting multiple rows and columns from a pandas DataFrame¶
- .loc
- .iloc
- .ix
In [1]:
import pandas as pd
In [3]:
url = 'http://bit.ly/uforeports'
ufo = pd.read_csv(url)
In [5]:
# show first 3 shows
ufo.head(3)
Out[5]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
.loc usage
This is a really powerful and flexible method
In [6]:
# .loc DataFrame method
# filtering rows and selecting columns by label
# format
# ufo.loc[rows, columns]
# row 0, all columns
ufo.loc[0, :]
Out[6]:
City Ithaca Colors Reported NaN Shape Reported TRIANGLE State NY Time 6/1/1930 22:00 Name: 0, dtype: object
In [10]:
# rows 0, 1, 2
# all columns
ufo.loc[[0, 1, 2], :]
# more efficient code
ufo.loc[0:2, :]
Out[10]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
In [12]:
# if you leave off ", :" pandas would assume it's there
# but you should leave it there to improve code readability
ufo.loc[0:2]
Out[12]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
In [13]:
# all rows
# column: City
ufo.loc[:, 'City']
Out[13]:
0 Ithaca
1 Willingboro
2 Holyoke
3 Abilene
4 New York Worlds Fair
5 Valley City
6 Crater Lake
7 Alma
8 Eklutna
9 Hubbard
10 Fontana
11 Waterloo
12 Belton
13 Keokuk
14 Ludington
15 Forest Home
16 Los Angeles
17 Hapeville
18 Oneida
19 Bering Sea
20 Nebraska
21 NaN
22 NaN
23 Owensboro
24 Wilderness
25 San Diego
26 Wilderness
27 Clovis
28 Los Alamos
29 Ft. Duschene
...
18211 Holyoke
18212 Carson
18213 Pasadena
18214 Austin
18215 El Campo
18216 Garden Grove
18217 Berthoud Pass
18218 Sisterdale
18219 Garden Grove
18220 Shasta Lake
18221 Franklin
18222 Albrightsville
18223 Greenville
18224 Eufaula
18225 Simi Valley
18226 San Francisco
18227 San Francisco
18228 Kingsville
18229 Chicago
18230 Pismo Beach
18231 Pismo Beach
18232 Lodi
18233 Anchorage
18234 Capitola
18235 Fountain Hills
18236 Grant Park
18237 Spirit Lake
18238 Eagle River
18239 Eagle River
18240 Ybor
Name: City, dtype: object
In [15]:
# all rows
# column: City, State
ufo.loc[:, ['City', 'State']]
# similar code for City through State
ufo.loc[:, 'City':'State']
Out[15]:
| City | Colors Reported | Shape Reported | State | |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |
| 3 | Abilene | NaN | DISK | KS |
| 4 | New York Worlds Fair | NaN | LIGHT | NY |
| 5 | Valley City | NaN | DISK | ND |
| 6 | Crater Lake | NaN | CIRCLE | CA |
| 7 | Alma | NaN | DISK | MI |
| 8 | Eklutna | NaN | CIGAR | AK |
| 9 | Hubbard | NaN | CYLINDER | OR |
| 10 | Fontana | NaN | LIGHT | CA |
| 11 | Waterloo | NaN | FIREBALL | AL |
| 12 | Belton | RED | SPHERE | SC |
| 13 | Keokuk | NaN | OVAL | IA |
| 14 | Ludington | NaN | DISK | MI |
| 15 | Forest Home | NaN | CIRCLE | CA |
| 16 | Los Angeles | NaN | NaN | CA |
| 17 | Hapeville | NaN | NaN | GA |
| 18 | Oneida | NaN | RECTANGLE | TN |
| 19 | Bering Sea | RED | OTHER | AK |
| 20 | Nebraska | NaN | DISK | NE |
| 21 | NaN | NaN | NaN | LA |
| 22 | NaN | NaN | LIGHT | LA |
| 23 | Owensboro | NaN | RECTANGLE | KY |
| 24 | Wilderness | NaN | DISK | WV |
| 25 | San Diego | NaN | CIGAR | CA |
| 26 | Wilderness | NaN | DISK | WV |
| 27 | Clovis | NaN | DISK | NM |
| 28 | Los Alamos | NaN | DISK | NM |
| 29 | Ft. Duschene | NaN | DISK | UT |
| ... | ... | ... | ... | ... |
| 18211 | Holyoke | NaN | DIAMOND | MA |
| 18212 | Carson | NaN | DISK | CA |
| 18213 | Pasadena | GREEN | FIREBALL | CA |
| 18214 | Austin | NaN | FORMATION | TX |
| 18215 | El Campo | NaN | OTHER | TX |
| 18216 | Garden Grove | ORANGE | LIGHT | CA |
| 18217 | Berthoud Pass | NaN | TRIANGLE | CO |
| 18218 | Sisterdale | NaN | DIAMOND | TX |
| 18219 | Garden Grove | NaN | CHEVRON | CA |
| 18220 | Shasta Lake | BLUE | DISK | CA |
| 18221 | Franklin | NaN | DISK | NH |
| 18222 | Albrightsville | NaN | OTHER | PA |
| 18223 | Greenville | NaN | NaN | SC |
| 18224 | Eufaula | NaN | DISK | OK |
| 18225 | Simi Valley | NaN | FORMATION | CA |
| 18226 | San Francisco | NaN | FORMATION | CA |
| 18227 | San Francisco | NaN | TRIANGLE | CA |
| 18228 | Kingsville | NaN | LIGHT | TX |
| 18229 | Chicago | NaN | DISK | IL |
| 18230 | Pismo Beach | NaN | OVAL | CA |
| 18231 | Pismo Beach | NaN | OVAL | CA |
| 18232 | Lodi | NaN | NaN | WI |
| 18233 | Anchorage | RED | VARIOUS | AK |
| 18234 | Capitola | NaN | TRIANGLE | CA |
| 18235 | Fountain Hills | NaN | NaN | AZ |
| 18236 | Grant Park | NaN | TRIANGLE | IL |
| 18237 | Spirit Lake | NaN | DISK | IA |
| 18238 | Eagle River | NaN | NaN | WI |
| 18239 | Eagle River | RED | LIGHT | WI |
| 18240 | Ybor | NaN | OVAL | FL |
18241 rows × 4 columns
In [17]:
# multiple rows and multiple columns
ufo.loc[0:2, 'City':'State']
Out[17]:
| City | Colors Reported | Shape Reported | State | |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |
In [18]:
# filter using City=='Oakland'
ufo[ufo.City=='Oakland']
Out[18]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 1694 | Oakland | NaN | CIGAR | CA | 7/21/1968 14:00 |
| 2144 | Oakland | NaN | DISK | CA | 8/19/1971 0:00 |
| 4686 | Oakland | NaN | LIGHT | MD | 6/1/1982 0:00 |
| 7293 | Oakland | NaN | LIGHT | CA | 3/28/1994 17:00 |
| 8488 | Oakland | NaN | NaN | CA | 8/10/1995 21:45 |
| 8768 | Oakland | NaN | NaN | CA | 10/10/1995 22:40 |
| 10816 | Oakland | NaN | LIGHT | OR | 10/1/1997 21:30 |
| 10948 | Oakland | NaN | DISK | CA | 11/14/1997 19:55 |
| 11045 | Oakland | NaN | TRIANGLE | CA | 12/10/1997 1:30 |
| 12322 | Oakland | NaN | FIREBALL | CA | 10/9/1998 19:40 |
| 12941 | Oakland | NaN | CYLINDER | CA | 1/23/1999 21:30 |
| 16803 | Oakland | NaN | TRIANGLE | MD | 7/4/2000 23:00 |
| 17322 | Oakland | NaN | CYLINDER | CA | 9/1/2000 21:35 |
In [20]:
# easier-to-read code
# here you specify the rows and columns you want
# ufo.loc[rows, columns]
ufo.loc[ufo.City=='Oakland', :]
Out[20]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 1694 | Oakland | NaN | CIGAR | CA | 7/21/1968 14:00 |
| 2144 | Oakland | NaN | DISK | CA | 8/19/1971 0:00 |
| 4686 | Oakland | NaN | LIGHT | MD | 6/1/1982 0:00 |
| 7293 | Oakland | NaN | LIGHT | CA | 3/28/1994 17:00 |
| 8488 | Oakland | NaN | NaN | CA | 8/10/1995 21:45 |
| 8768 | Oakland | NaN | NaN | CA | 10/10/1995 22:40 |
| 10816 | Oakland | NaN | LIGHT | OR | 10/1/1997 21:30 |
| 10948 | Oakland | NaN | DISK | CA | 11/14/1997 19:55 |
| 11045 | Oakland | NaN | TRIANGLE | CA | 12/10/1997 1:30 |
| 12322 | Oakland | NaN | FIREBALL | CA | 10/9/1998 19:40 |
| 12941 | Oakland | NaN | CYLINDER | CA | 1/23/1999 21:30 |
| 16803 | Oakland | NaN | TRIANGLE | MD | 7/4/2000 23:00 |
| 17322 | Oakland | NaN | CYLINDER | CA | 9/1/2000 21:35 |
In [21]:
# again, specifying the rows and columns you want
# this would be the best way to do it compared to chain indexing
ufo.loc[ufo.City=='Oakland', 'State']
Out[21]:
1694 CA 2144 CA 4686 MD 7293 CA 8488 CA 8768 CA 10816 OR 10948 CA 11045 CA 12322 CA 12941 CA 16803 MD 17322 CA Name: State, dtype: object
In [24]:
# chain indexing
# there may be issues in some cases
# try not to use this
ufo[ufo.City=='Oakland'].State
Out[24]:
1694 CA 2144 CA 4686 MD 7293 CA 8488 CA 8768 CA 10816 OR 10948 CA 11045 CA 12322 CA 12941 CA 16803 MD 17322 CA Name: State, dtype: object
iloc usage
In [25]:
ufo.iloc[:, [0, 3]]
Out[25]:
| City | State | |
|---|---|---|
| 0 | Ithaca | NY |
| 1 | Willingboro | NJ |
| 2 | Holyoke | CO |
| 3 | Abilene | KS |
| 4 | New York Worlds Fair | NY |
| 5 | Valley City | ND |
| 6 | Crater Lake | CA |
| 7 | Alma | MI |
| 8 | Eklutna | AK |
| 9 | Hubbard | OR |
| 10 | Fontana | CA |
| 11 | Waterloo | AL |
| 12 | Belton | SC |
| 13 | Keokuk | IA |
| 14 | Ludington | MI |
| 15 | Forest Home | CA |
| 16 | Los Angeles | CA |
| 17 | Hapeville | GA |
| 18 | Oneida | TN |
| 19 | Bering Sea | AK |
| 20 | Nebraska | NE |
| 21 | NaN | LA |
| 22 | NaN | LA |
| 23 | Owensboro | KY |
| 24 | Wilderness | WV |
| 25 | San Diego | CA |
| 26 | Wilderness | WV |
| 27 | Clovis | NM |
| 28 | Los Alamos | NM |
| 29 | Ft. Duschene | UT |
| ... | ... | ... |
| 18211 | Holyoke | MA |
| 18212 | Carson | CA |
| 18213 | Pasadena | CA |
| 18214 | Austin | TX |
| 18215 | El Campo | TX |
| 18216 | Garden Grove | CA |
| 18217 | Berthoud Pass | CO |
| 18218 | Sisterdale | TX |
| 18219 | Garden Grove | CA |
| 18220 | Shasta Lake | CA |
| 18221 | Franklin | NH |
| 18222 | Albrightsville | PA |
| 18223 | Greenville | SC |
| 18224 | Eufaula | OK |
| 18225 | Simi Valley | CA |
| 18226 | San Francisco | CA |
| 18227 | San Francisco | CA |
| 18228 | Kingsville | TX |
| 18229 | Chicago | IL |
| 18230 | Pismo Beach | CA |
| 18231 | Pismo Beach | CA |
| 18232 | Lodi | WI |
| 18233 | Anchorage | AK |
| 18234 | Capitola | CA |
| 18235 | Fountain Hills | AZ |
| 18236 | Grant Park | IL |
| 18237 | Spirit Lake | IA |
| 18238 | Eagle River | WI |
| 18239 | Eagle River | WI |
| 18240 | Ybor | FL |
18241 rows × 2 columns
In [28]:
# iloc excludes 4 (compared to loc where it includes 4)
# iloc includes 0
ufo.iloc[:, 0:4]
Out[28]:
| City | Colors Reported | Shape Reported | State | |
|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY |
| 1 | Willingboro | NaN | OTHER | NJ |
| 2 | Holyoke | NaN | OVAL | CO |
| 3 | Abilene | NaN | DISK | KS |
| 4 | New York Worlds Fair | NaN | LIGHT | NY |
| 5 | Valley City | NaN | DISK | ND |
| 6 | Crater Lake | NaN | CIRCLE | CA |
| 7 | Alma | NaN | DISK | MI |
| 8 | Eklutna | NaN | CIGAR | AK |
| 9 | Hubbard | NaN | CYLINDER | OR |
| 10 | Fontana | NaN | LIGHT | CA |
| 11 | Waterloo | NaN | FIREBALL | AL |
| 12 | Belton | RED | SPHERE | SC |
| 13 | Keokuk | NaN | OVAL | IA |
| 14 | Ludington | NaN | DISK | MI |
| 15 | Forest Home | NaN | CIRCLE | CA |
| 16 | Los Angeles | NaN | NaN | CA |
| 17 | Hapeville | NaN | NaN | GA |
| 18 | Oneida | NaN | RECTANGLE | TN |
| 19 | Bering Sea | RED | OTHER | AK |
| 20 | Nebraska | NaN | DISK | NE |
| 21 | NaN | NaN | NaN | LA |
| 22 | NaN | NaN | LIGHT | LA |
| 23 | Owensboro | NaN | RECTANGLE | KY |
| 24 | Wilderness | NaN | DISK | WV |
| 25 | San Diego | NaN | CIGAR | CA |
| 26 | Wilderness | NaN | DISK | WV |
| 27 | Clovis | NaN | DISK | NM |
| 28 | Los Alamos | NaN | DISK | NM |
| 29 | Ft. Duschene | NaN | DISK | UT |
| ... | ... | ... | ... | ... |
| 18211 | Holyoke | NaN | DIAMOND | MA |
| 18212 | Carson | NaN | DISK | CA |
| 18213 | Pasadena | GREEN | FIREBALL | CA |
| 18214 | Austin | NaN | FORMATION | TX |
| 18215 | El Campo | NaN | OTHER | TX |
| 18216 | Garden Grove | ORANGE | LIGHT | CA |
| 18217 | Berthoud Pass | NaN | TRIANGLE | CO |
| 18218 | Sisterdale | NaN | DIAMOND | TX |
| 18219 | Garden Grove | NaN | CHEVRON | CA |
| 18220 | Shasta Lake | BLUE | DISK | CA |
| 18221 | Franklin | NaN | DISK | NH |
| 18222 | Albrightsville | NaN | OTHER | PA |
| 18223 | Greenville | NaN | NaN | SC |
| 18224 | Eufaula | NaN | DISK | OK |
| 18225 | Simi Valley | NaN | FORMATION | CA |
| 18226 | San Francisco | NaN | FORMATION | CA |
| 18227 | San Francisco | NaN | TRIANGLE | CA |
| 18228 | Kingsville | NaN | LIGHT | TX |
| 18229 | Chicago | NaN | DISK | IL |
| 18230 | Pismo Beach | NaN | OVAL | CA |
| 18231 | Pismo Beach | NaN | OVAL | CA |
| 18232 | Lodi | NaN | NaN | WI |
| 18233 | Anchorage | RED | VARIOUS | AK |
| 18234 | Capitola | NaN | TRIANGLE | CA |
| 18235 | Fountain Hills | NaN | NaN | AZ |
| 18236 | Grant Park | NaN | TRIANGLE | IL |
| 18237 | Spirit Lake | NaN | DISK | IA |
| 18238 | Eagle River | NaN | NaN | WI |
| 18239 | Eagle River | RED | LIGHT | WI |
| 18240 | Ybor | NaN | OVAL | FL |
18241 rows × 4 columns
In [31]:
# this is the major difference
# exclusive of 3
ufo.iloc[0:3, :]
Out[31]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
In [38]:
# non-explicit code
ufo[['City', 'State']]
# explicit code
ufo.loc[:, ['City', 'State']]
Out[38]:
| City | State | |
|---|---|---|
| 0 | Ithaca | NY |
| 1 | Willingboro | NJ |
| 2 | Holyoke | CO |
| 3 | Abilene | KS |
| 4 | New York Worlds Fair | NY |
| 5 | Valley City | ND |
| 6 | Crater Lake | CA |
| 7 | Alma | MI |
| 8 | Eklutna | AK |
| 9 | Hubbard | OR |
| 10 | Fontana | CA |
| 11 | Waterloo | AL |
| 12 | Belton | SC |
| 13 | Keokuk | IA |
| 14 | Ludington | MI |
| 15 | Forest Home | CA |
| 16 | Los Angeles | CA |
| 17 | Hapeville | GA |
| 18 | Oneida | TN |
| 19 | Bering Sea | AK |
| 20 | Nebraska | NE |
| 21 | NaN | LA |
| 22 | NaN | LA |
| 23 | Owensboro | KY |
| 24 | Wilderness | WV |
| 25 | San Diego | CA |
| 26 | Wilderness | WV |
| 27 | Clovis | NM |
| 28 | Los Alamos | NM |
| 29 | Ft. Duschene | UT |
| ... | ... | ... |
| 18211 | Holyoke | MA |
| 18212 | Carson | CA |
| 18213 | Pasadena | CA |
| 18214 | Austin | TX |
| 18215 | El Campo | TX |
| 18216 | Garden Grove | CA |
| 18217 | Berthoud Pass | CO |
| 18218 | Sisterdale | TX |
| 18219 | Garden Grove | CA |
| 18220 | Shasta Lake | CA |
| 18221 | Franklin | NH |
| 18222 | Albrightsville | PA |
| 18223 | Greenville | SC |
| 18224 | Eufaula | OK |
| 18225 | Simi Valley | CA |
| 18226 | San Francisco | CA |
| 18227 | San Francisco | CA |
| 18228 | Kingsville | TX |
| 18229 | Chicago | IL |
| 18230 | Pismo Beach | CA |
| 18231 | Pismo Beach | CA |
| 18232 | Lodi | WI |
| 18233 | Anchorage | AK |
| 18234 | Capitola | CA |
| 18235 | Fountain Hills | AZ |
| 18236 | Grant Park | IL |
| 18237 | Spirit Lake | IA |
| 18238 | Eagle River | WI |
| 18239 | Eagle River | WI |
| 18240 | Ybor | FL |
18241 rows × 2 columns
In [40]:
# ambiguous code again, are we referring to rows or columns?
ufo[0:2]
# use iloc!
ufo.iloc[0:2, :]
Out[40]:
| City | Colors Reported | Shape Reported | State | Time | |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
.ix usage
Mix labels and integers when using selection.
In [41]:
drinks_url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(drinks_url, index_col='country')
In [42]:
drinks.head()
Out[42]:
| beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent | |
|---|---|---|---|---|---|
| country | |||||
| Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
| Albania | 89 | 132 | 54 | 4.9 | Europe |
| Algeria | 25 | 0 | 14 | 0.7 | Africa |
| Andorra | 245 | 138 | 312 | 12.4 | Europe |
| Angola | 217 | 57 | 45 | 5.9 | Africa |
In [43]:
drinks.ix['Albania', 0]
Out[43]:
89
In [44]:
drinks.ix[1, 'beer_servings']
Out[44]:
89
In [46]:
# for .ix, columns are exclusive of 2
drinks.ix['Albania':'Andorra', 0:2]
Out[46]:
| beer_servings | spirit_servings | |
|---|---|---|
| country | ||
| Albania | 89 | 132 |
| Algeria | 25 | 0 |
| Andorra | 245 | 138 |
In [48]:
# for rows, .ix is inclusive from start to end
# for columns, .ix is exclusive of end but inclusive of start
ufo.ix[0:2, 0:2]
Out[48]:
| City | Colors Reported | |
|---|---|---|
| 0 | Ithaca | NaN |
| 1 | Willingboro | NaN |
| 2 | Holyoke | NaN |
Tags:
pandas