Selecting multiple rows and columns in pandas
This introduction to pandas is derived from Data School's pandas Q&A with my own notes and code.
Selecting multiple rows and columns from a pandas DataFrame¶
- .loc
- .iloc
- .ix
In [1]:
import pandas as pd
In [3]:
url = 'http://bit.ly/uforeports'
ufo = pd.read_csv(url)
In [5]:
# show first 3 shows
ufo.head(3)
Out[5]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
.loc usage
This is a really powerful and flexible method
In [6]:
# .loc DataFrame method
# filtering rows and selecting columns by label
# format
# ufo.loc[rows, columns]
# row 0, all columns
ufo.loc[0, :]
Out[6]:
City Ithaca Colors Reported NaN Shape Reported TRIANGLE State NY Time 6/1/1930 22:00 Name: 0, dtype: object
In [10]:
# rows 0, 1, 2
# all columns
ufo.loc[[0, 1, 2], :]
# more efficient code
ufo.loc[0:2, :]
Out[10]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
In [12]:
# if you leave off ", :" pandas would assume it's there
# but you should leave it there to improve code readability
ufo.loc[0:2]
Out[12]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
In [13]:
# all rows
# column: City
ufo.loc[:, 'City']
Out[13]:
0 Ithaca 1 Willingboro 2 Holyoke 3 Abilene 4 New York Worlds Fair 5 Valley City 6 Crater Lake 7 Alma 8 Eklutna 9 Hubbard 10 Fontana 11 Waterloo 12 Belton 13 Keokuk 14 Ludington 15 Forest Home 16 Los Angeles 17 Hapeville 18 Oneida 19 Bering Sea 20 Nebraska 21 NaN 22 NaN 23 Owensboro 24 Wilderness 25 San Diego 26 Wilderness 27 Clovis 28 Los Alamos 29 Ft. Duschene ... 18211 Holyoke 18212 Carson 18213 Pasadena 18214 Austin 18215 El Campo 18216 Garden Grove 18217 Berthoud Pass 18218 Sisterdale 18219 Garden Grove 18220 Shasta Lake 18221 Franklin 18222 Albrightsville 18223 Greenville 18224 Eufaula 18225 Simi Valley 18226 San Francisco 18227 San Francisco 18228 Kingsville 18229 Chicago 18230 Pismo Beach 18231 Pismo Beach 18232 Lodi 18233 Anchorage 18234 Capitola 18235 Fountain Hills 18236 Grant Park 18237 Spirit Lake 18238 Eagle River 18239 Eagle River 18240 Ybor Name: City, dtype: object
In [15]:
# all rows
# column: City, State
ufo.loc[:, ['City', 'State']]
# similar code for City through State
ufo.loc[:, 'City':'State']
Out[15]:
City | Colors Reported | Shape Reported | State | |
---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY |
1 | Willingboro | NaN | OTHER | NJ |
2 | Holyoke | NaN | OVAL | CO |
3 | Abilene | NaN | DISK | KS |
4 | New York Worlds Fair | NaN | LIGHT | NY |
5 | Valley City | NaN | DISK | ND |
6 | Crater Lake | NaN | CIRCLE | CA |
7 | Alma | NaN | DISK | MI |
8 | Eklutna | NaN | CIGAR | AK |
9 | Hubbard | NaN | CYLINDER | OR |
10 | Fontana | NaN | LIGHT | CA |
11 | Waterloo | NaN | FIREBALL | AL |
12 | Belton | RED | SPHERE | SC |
13 | Keokuk | NaN | OVAL | IA |
14 | Ludington | NaN | DISK | MI |
15 | Forest Home | NaN | CIRCLE | CA |
16 | Los Angeles | NaN | NaN | CA |
17 | Hapeville | NaN | NaN | GA |
18 | Oneida | NaN | RECTANGLE | TN |
19 | Bering Sea | RED | OTHER | AK |
20 | Nebraska | NaN | DISK | NE |
21 | NaN | NaN | NaN | LA |
22 | NaN | NaN | LIGHT | LA |
23 | Owensboro | NaN | RECTANGLE | KY |
24 | Wilderness | NaN | DISK | WV |
25 | San Diego | NaN | CIGAR | CA |
26 | Wilderness | NaN | DISK | WV |
27 | Clovis | NaN | DISK | NM |
28 | Los Alamos | NaN | DISK | NM |
29 | Ft. Duschene | NaN | DISK | UT |
... | ... | ... | ... | ... |
18211 | Holyoke | NaN | DIAMOND | MA |
18212 | Carson | NaN | DISK | CA |
18213 | Pasadena | GREEN | FIREBALL | CA |
18214 | Austin | NaN | FORMATION | TX |
18215 | El Campo | NaN | OTHER | TX |
18216 | Garden Grove | ORANGE | LIGHT | CA |
18217 | Berthoud Pass | NaN | TRIANGLE | CO |
18218 | Sisterdale | NaN | DIAMOND | TX |
18219 | Garden Grove | NaN | CHEVRON | CA |
18220 | Shasta Lake | BLUE | DISK | CA |
18221 | Franklin | NaN | DISK | NH |
18222 | Albrightsville | NaN | OTHER | PA |
18223 | Greenville | NaN | NaN | SC |
18224 | Eufaula | NaN | DISK | OK |
18225 | Simi Valley | NaN | FORMATION | CA |
18226 | San Francisco | NaN | FORMATION | CA |
18227 | San Francisco | NaN | TRIANGLE | CA |
18228 | Kingsville | NaN | LIGHT | TX |
18229 | Chicago | NaN | DISK | IL |
18230 | Pismo Beach | NaN | OVAL | CA |
18231 | Pismo Beach | NaN | OVAL | CA |
18232 | Lodi | NaN | NaN | WI |
18233 | Anchorage | RED | VARIOUS | AK |
18234 | Capitola | NaN | TRIANGLE | CA |
18235 | Fountain Hills | NaN | NaN | AZ |
18236 | Grant Park | NaN | TRIANGLE | IL |
18237 | Spirit Lake | NaN | DISK | IA |
18238 | Eagle River | NaN | NaN | WI |
18239 | Eagle River | RED | LIGHT | WI |
18240 | Ybor | NaN | OVAL | FL |
18241 rows × 4 columns
In [17]:
# multiple rows and multiple columns
ufo.loc[0:2, 'City':'State']
Out[17]:
City | Colors Reported | Shape Reported | State | |
---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY |
1 | Willingboro | NaN | OTHER | NJ |
2 | Holyoke | NaN | OVAL | CO |
In [18]:
# filter using City=='Oakland'
ufo[ufo.City=='Oakland']
Out[18]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
1694 | Oakland | NaN | CIGAR | CA | 7/21/1968 14:00 |
2144 | Oakland | NaN | DISK | CA | 8/19/1971 0:00 |
4686 | Oakland | NaN | LIGHT | MD | 6/1/1982 0:00 |
7293 | Oakland | NaN | LIGHT | CA | 3/28/1994 17:00 |
8488 | Oakland | NaN | NaN | CA | 8/10/1995 21:45 |
8768 | Oakland | NaN | NaN | CA | 10/10/1995 22:40 |
10816 | Oakland | NaN | LIGHT | OR | 10/1/1997 21:30 |
10948 | Oakland | NaN | DISK | CA | 11/14/1997 19:55 |
11045 | Oakland | NaN | TRIANGLE | CA | 12/10/1997 1:30 |
12322 | Oakland | NaN | FIREBALL | CA | 10/9/1998 19:40 |
12941 | Oakland | NaN | CYLINDER | CA | 1/23/1999 21:30 |
16803 | Oakland | NaN | TRIANGLE | MD | 7/4/2000 23:00 |
17322 | Oakland | NaN | CYLINDER | CA | 9/1/2000 21:35 |
In [20]:
# easier-to-read code
# here you specify the rows and columns you want
# ufo.loc[rows, columns]
ufo.loc[ufo.City=='Oakland', :]
Out[20]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
1694 | Oakland | NaN | CIGAR | CA | 7/21/1968 14:00 |
2144 | Oakland | NaN | DISK | CA | 8/19/1971 0:00 |
4686 | Oakland | NaN | LIGHT | MD | 6/1/1982 0:00 |
7293 | Oakland | NaN | LIGHT | CA | 3/28/1994 17:00 |
8488 | Oakland | NaN | NaN | CA | 8/10/1995 21:45 |
8768 | Oakland | NaN | NaN | CA | 10/10/1995 22:40 |
10816 | Oakland | NaN | LIGHT | OR | 10/1/1997 21:30 |
10948 | Oakland | NaN | DISK | CA | 11/14/1997 19:55 |
11045 | Oakland | NaN | TRIANGLE | CA | 12/10/1997 1:30 |
12322 | Oakland | NaN | FIREBALL | CA | 10/9/1998 19:40 |
12941 | Oakland | NaN | CYLINDER | CA | 1/23/1999 21:30 |
16803 | Oakland | NaN | TRIANGLE | MD | 7/4/2000 23:00 |
17322 | Oakland | NaN | CYLINDER | CA | 9/1/2000 21:35 |
In [21]:
# again, specifying the rows and columns you want
# this would be the best way to do it compared to chain indexing
ufo.loc[ufo.City=='Oakland', 'State']
Out[21]:
1694 CA 2144 CA 4686 MD 7293 CA 8488 CA 8768 CA 10816 OR 10948 CA 11045 CA 12322 CA 12941 CA 16803 MD 17322 CA Name: State, dtype: object
In [24]:
# chain indexing
# there may be issues in some cases
# try not to use this
ufo[ufo.City=='Oakland'].State
Out[24]:
1694 CA 2144 CA 4686 MD 7293 CA 8488 CA 8768 CA 10816 OR 10948 CA 11045 CA 12322 CA 12941 CA 16803 MD 17322 CA Name: State, dtype: object
iloc usage
In [25]:
ufo.iloc[:, [0, 3]]
Out[25]:
City | State | |
---|---|---|
0 | Ithaca | NY |
1 | Willingboro | NJ |
2 | Holyoke | CO |
3 | Abilene | KS |
4 | New York Worlds Fair | NY |
5 | Valley City | ND |
6 | Crater Lake | CA |
7 | Alma | MI |
8 | Eklutna | AK |
9 | Hubbard | OR |
10 | Fontana | CA |
11 | Waterloo | AL |
12 | Belton | SC |
13 | Keokuk | IA |
14 | Ludington | MI |
15 | Forest Home | CA |
16 | Los Angeles | CA |
17 | Hapeville | GA |
18 | Oneida | TN |
19 | Bering Sea | AK |
20 | Nebraska | NE |
21 | NaN | LA |
22 | NaN | LA |
23 | Owensboro | KY |
24 | Wilderness | WV |
25 | San Diego | CA |
26 | Wilderness | WV |
27 | Clovis | NM |
28 | Los Alamos | NM |
29 | Ft. Duschene | UT |
... | ... | ... |
18211 | Holyoke | MA |
18212 | Carson | CA |
18213 | Pasadena | CA |
18214 | Austin | TX |
18215 | El Campo | TX |
18216 | Garden Grove | CA |
18217 | Berthoud Pass | CO |
18218 | Sisterdale | TX |
18219 | Garden Grove | CA |
18220 | Shasta Lake | CA |
18221 | Franklin | NH |
18222 | Albrightsville | PA |
18223 | Greenville | SC |
18224 | Eufaula | OK |
18225 | Simi Valley | CA |
18226 | San Francisco | CA |
18227 | San Francisco | CA |
18228 | Kingsville | TX |
18229 | Chicago | IL |
18230 | Pismo Beach | CA |
18231 | Pismo Beach | CA |
18232 | Lodi | WI |
18233 | Anchorage | AK |
18234 | Capitola | CA |
18235 | Fountain Hills | AZ |
18236 | Grant Park | IL |
18237 | Spirit Lake | IA |
18238 | Eagle River | WI |
18239 | Eagle River | WI |
18240 | Ybor | FL |
18241 rows × 2 columns
In [28]:
# iloc excludes 4 (compared to loc where it includes 4)
# iloc includes 0
ufo.iloc[:, 0:4]
Out[28]:
City | Colors Reported | Shape Reported | State | |
---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY |
1 | Willingboro | NaN | OTHER | NJ |
2 | Holyoke | NaN | OVAL | CO |
3 | Abilene | NaN | DISK | KS |
4 | New York Worlds Fair | NaN | LIGHT | NY |
5 | Valley City | NaN | DISK | ND |
6 | Crater Lake | NaN | CIRCLE | CA |
7 | Alma | NaN | DISK | MI |
8 | Eklutna | NaN | CIGAR | AK |
9 | Hubbard | NaN | CYLINDER | OR |
10 | Fontana | NaN | LIGHT | CA |
11 | Waterloo | NaN | FIREBALL | AL |
12 | Belton | RED | SPHERE | SC |
13 | Keokuk | NaN | OVAL | IA |
14 | Ludington | NaN | DISK | MI |
15 | Forest Home | NaN | CIRCLE | CA |
16 | Los Angeles | NaN | NaN | CA |
17 | Hapeville | NaN | NaN | GA |
18 | Oneida | NaN | RECTANGLE | TN |
19 | Bering Sea | RED | OTHER | AK |
20 | Nebraska | NaN | DISK | NE |
21 | NaN | NaN | NaN | LA |
22 | NaN | NaN | LIGHT | LA |
23 | Owensboro | NaN | RECTANGLE | KY |
24 | Wilderness | NaN | DISK | WV |
25 | San Diego | NaN | CIGAR | CA |
26 | Wilderness | NaN | DISK | WV |
27 | Clovis | NaN | DISK | NM |
28 | Los Alamos | NaN | DISK | NM |
29 | Ft. Duschene | NaN | DISK | UT |
... | ... | ... | ... | ... |
18211 | Holyoke | NaN | DIAMOND | MA |
18212 | Carson | NaN | DISK | CA |
18213 | Pasadena | GREEN | FIREBALL | CA |
18214 | Austin | NaN | FORMATION | TX |
18215 | El Campo | NaN | OTHER | TX |
18216 | Garden Grove | ORANGE | LIGHT | CA |
18217 | Berthoud Pass | NaN | TRIANGLE | CO |
18218 | Sisterdale | NaN | DIAMOND | TX |
18219 | Garden Grove | NaN | CHEVRON | CA |
18220 | Shasta Lake | BLUE | DISK | CA |
18221 | Franklin | NaN | DISK | NH |
18222 | Albrightsville | NaN | OTHER | PA |
18223 | Greenville | NaN | NaN | SC |
18224 | Eufaula | NaN | DISK | OK |
18225 | Simi Valley | NaN | FORMATION | CA |
18226 | San Francisco | NaN | FORMATION | CA |
18227 | San Francisco | NaN | TRIANGLE | CA |
18228 | Kingsville | NaN | LIGHT | TX |
18229 | Chicago | NaN | DISK | IL |
18230 | Pismo Beach | NaN | OVAL | CA |
18231 | Pismo Beach | NaN | OVAL | CA |
18232 | Lodi | NaN | NaN | WI |
18233 | Anchorage | RED | VARIOUS | AK |
18234 | Capitola | NaN | TRIANGLE | CA |
18235 | Fountain Hills | NaN | NaN | AZ |
18236 | Grant Park | NaN | TRIANGLE | IL |
18237 | Spirit Lake | NaN | DISK | IA |
18238 | Eagle River | NaN | NaN | WI |
18239 | Eagle River | RED | LIGHT | WI |
18240 | Ybor | NaN | OVAL | FL |
18241 rows × 4 columns
In [31]:
# this is the major difference
# exclusive of 3
ufo.iloc[0:3, :]
Out[31]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |
In [38]:
# non-explicit code
ufo[['City', 'State']]
# explicit code
ufo.loc[:, ['City', 'State']]
Out[38]:
City | State | |
---|---|---|
0 | Ithaca | NY |
1 | Willingboro | NJ |
2 | Holyoke | CO |
3 | Abilene | KS |
4 | New York Worlds Fair | NY |
5 | Valley City | ND |
6 | Crater Lake | CA |
7 | Alma | MI |
8 | Eklutna | AK |
9 | Hubbard | OR |
10 | Fontana | CA |
11 | Waterloo | AL |
12 | Belton | SC |
13 | Keokuk | IA |
14 | Ludington | MI |
15 | Forest Home | CA |
16 | Los Angeles | CA |
17 | Hapeville | GA |
18 | Oneida | TN |
19 | Bering Sea | AK |
20 | Nebraska | NE |
21 | NaN | LA |
22 | NaN | LA |
23 | Owensboro | KY |
24 | Wilderness | WV |
25 | San Diego | CA |
26 | Wilderness | WV |
27 | Clovis | NM |
28 | Los Alamos | NM |
29 | Ft. Duschene | UT |
... | ... | ... |
18211 | Holyoke | MA |
18212 | Carson | CA |
18213 | Pasadena | CA |
18214 | Austin | TX |
18215 | El Campo | TX |
18216 | Garden Grove | CA |
18217 | Berthoud Pass | CO |
18218 | Sisterdale | TX |
18219 | Garden Grove | CA |
18220 | Shasta Lake | CA |
18221 | Franklin | NH |
18222 | Albrightsville | PA |
18223 | Greenville | SC |
18224 | Eufaula | OK |
18225 | Simi Valley | CA |
18226 | San Francisco | CA |
18227 | San Francisco | CA |
18228 | Kingsville | TX |
18229 | Chicago | IL |
18230 | Pismo Beach | CA |
18231 | Pismo Beach | CA |
18232 | Lodi | WI |
18233 | Anchorage | AK |
18234 | Capitola | CA |
18235 | Fountain Hills | AZ |
18236 | Grant Park | IL |
18237 | Spirit Lake | IA |
18238 | Eagle River | WI |
18239 | Eagle River | WI |
18240 | Ybor | FL |
18241 rows × 2 columns
In [40]:
# ambiguous code again, are we referring to rows or columns?
ufo[0:2]
# use iloc!
ufo.iloc[0:2, :]
Out[40]:
City | Colors Reported | Shape Reported | State | Time | |
---|---|---|---|---|---|
0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
.ix usage
Mix labels and integers when using selection.
In [41]:
drinks_url = 'http://bit.ly/drinksbycountry'
drinks = pd.read_csv(drinks_url, index_col='country')
In [42]:
drinks.head()
Out[42]:
beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent | |
---|---|---|---|---|---|
country | |||||
Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
Albania | 89 | 132 | 54 | 4.9 | Europe |
Algeria | 25 | 0 | 14 | 0.7 | Africa |
Andorra | 245 | 138 | 312 | 12.4 | Europe |
Angola | 217 | 57 | 45 | 5.9 | Africa |
In [43]:
drinks.ix['Albania', 0]
Out[43]:
89
In [44]:
drinks.ix[1, 'beer_servings']
Out[44]:
89
In [46]:
# for .ix, columns are exclusive of 2
drinks.ix['Albania':'Andorra', 0:2]
Out[46]:
beer_servings | spirit_servings | |
---|---|---|
country | ||
Albania | 89 | 132 |
Algeria | 25 | 0 |
Andorra | 245 | 138 |
In [48]:
# for rows, .ix is inclusive from start to end
# for columns, .ix is exclusive of end but inclusive of start
ufo.ix[0:2, 0:2]
Out[48]:
City | Colors Reported | |
---|---|---|
0 | Ithaca | NaN |
1 | Willingboro | NaN |
2 | Holyoke | NaN |
Tags:
pandas