Central Tendency

Calculate mean and median

Measures of Central Tendency

Describing a distribution using measures of center

Mode
- Value (on the x-axis) at which frequency is highest
- Other cases
  - May be a range (bin) that occurred with the highest frequency
  - No mode for uniform distributions
  - May have multiple modes
  - May be a categorical mode
    - X-axis: categories (plain and peanut)
    - Y-axis: frequencies (plain = 60,000, peanut = 10,000)
      - Mode = plain (a value on the x-axis)
      - 60,000 and 10,000 are frequencies, not modes
  - Not every score in the dataset affects the mode
    - [2, 2, 3, 4, 100]
    - Mode is still 2 even if we add a very large number such as 10000
  - Mode can change from sample to sample
    - May not be the same as the population's mode
  - Mode of a histogram depends on the bin size
  - There is no equation for calculating the mode
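
The mode properties listed above can be checked with a short sketch using only the Python standard library. The helper `modal_bin` and the `values` list are hypothetical, made up here for illustration; `multimode` needs Python 3.8 or later.

```python
from collections import Counter
from statistics import multimode

scores = [2, 2, 3, 4, 100]

# The mode ignores how extreme the other scores are.
print(multimode(scores))            # [2]
print(multimode(scores + [10000]))  # [2]

# A distribution can have more than one mode.
print(multimode([1, 1, 2, 3, 3]))   # [1, 3]

# The mode also works for categorical data.
print(multimode(["plain", "plain", "peanut"]))  # ['plain']


def modal_bin(values, width):
    """Return the (start, end) of the most frequent bin of the given width."""
    counts = Counter((v // width) * width for v in values)
    start, _ = counts.most_common(1)[0]
    return (start, start + width)


# Illustrative data: the modal bin moves when the bin width changes.
values = [6, 7, 8, 9, 10, 11, 20, 21, 22]
print(modal_bin(values, 4))  # (8, 12)
print(modal_bin(values, 5))  # (5, 10)
```

Note that `multimode` returns every value when all values are equally frequent, which is one way of expressing that a uniform distribution has no single mode.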
Median
- Middle value of the sorted data when the number of values is odd
- Mean of the 2 middle values of the sorted data when the number of values is even
- Properties
  - Not affected by outliers
  - Does not take every score in the distribution into account
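
As a minimal sketch of the median rule above, the function below sorts the data and picks the middle value (odd count) or averages the two middle values (even count). The name `median_of` is hypothetical; in practice `statistics.median` or pandas' `median()` does the same job.

```python
def median_of(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)          # the median is defined on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values


print(median_of([2, 2, 3, 4, 100]))         # 3
print(median_of([2, 2, 3, 4, 5]))           # 3, the outlier made no difference
print(median_of([2, 2, 3, 4, 100, 10000]))  # 3.5
```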
Mean
- Arithmetic average: sum of all scores divided by the number of scores
- Properties
  - All scores of a distribution affect the mean
  - Mean can be represented by a formula
  - Means of different samples from the same population tend to be similar
  - Mean will be affected by outliers
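
The formula for the mean is the sum of the scores divided by their count. The sketch below applies it to the example list from the mode notes and compares it with the median, to illustrate how one outlier pulls the mean but barely moves the median. The variable names are illustrative only.

```python
from statistics import median

scores = [2, 2, 3, 4, 100]     # 100 is the outlier
without_outlier = [2, 2, 3, 4]

# Mean = sum of scores / number of scores
mean_with = sum(scores) / len(scores)
mean_without = sum(without_outlier) / len(without_outlier)

print(mean_with)                 # 22.2
print(mean_without)              # 2.75
print(median(scores))            # 3
print(median(without_outlier))   # 2.5
```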

Calculating measures of central tendency in Pandas

In [55]:

import pandas as pd

In [56]:

url = './fb_data.csv'
data = pd.read_csv(url, header=None)

In [64]:

data

Out[64]:

	0
0	0
1	69
2	123
3	137
4	174
5	240
6	241
7	256
8	258
9	322
10	366
11	376
12	408
13	479
14	555
15	589
16	600
17	777
18	784
19	822
20	850
21	863
22	1116
23	1143
24	1214
25	1250
26	1776

In [58]:

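# sorted(data) would only iterate over the column labels, not the rows;
# sort_values() sorts the rows by the values in column 0
data.sort_values(by=0)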

Out[58]:

	0
0	0
1	69
2	123
3	137
4	174
5	240
6	241
7	256
8	258
9	322
10	366
11	376
12	408
13	479
14	555
15	589
16	600
17	777
18	784
19	822
20	850
21	863
22	1116
23	1143
24	1214
25	1250
26	1776

In [59]:

# Since this is a pandas DataFrame, we can use mean() and median() methods
type(data)

Out[59]:

pandas.core.frame.DataFrame

In [60]:

data.mean()

Out[60]:

0    584.740741
dtype: float64

In [61]:

data.median()

Out[61]:

0    479
dtype: float64
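
As a cross-check that does not need the CSV file, the 27 values printed above can be typed in directly and run through the standard library's `statistics` module. This is only a sanity check of the pandas results, assuming the printed table is the complete dataset.

```python
from statistics import mean, median

# Values copied from the DataFrame output above
values = [0, 69, 123, 137, 174, 240, 241, 256, 258, 322, 366, 376, 408, 479,
          555, 589, 600, 777, 784, 822, 850, 863, 1116, 1143, 1214, 1250, 1776]

print(len(values))              # 27
print(sum(values))              # 15788
print(round(mean(values), 6))   # 584.740741
print(median(values))           # 479, the 14th of 27 sorted values
```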

In [63]:

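# Every value occurs exactly once, so no value is more frequent than the others;
# this pandas version returns an empty result in that case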
data.mode()

Out[63]:

	0
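
The empty result can be explained by counting how often each value occurs. The sketch below uses `collections.Counter` on the values copied from the DataFrame output above. As far as I know, newer pandas releases may list every value instead of returning an empty frame, so treat the exact `mode()` output as version dependent.

```python
from collections import Counter

# Values copied from the DataFrame output above
values = [0, 69, 123, 137, 174, 240, 241, 256, 258, 322, 366, 376, 408, 479,
          555, 589, 600, 777, 784, 822, 850, 863, 1116, 1143, 1214, 1250, 1776]

counts = Counter(values)

# Highest frequency of any single value
print(max(counts.values()))        # 1

# Every value is distinct, so there is no single mode
print(len(counts) == len(values))  # True
```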
