Calculate mean and median
Measures of Central Tendency
Describing a distribution using measures of center
- Mode
  - Value (on the x-axis) at which the frequency is highest
  - Other cases
    - May be a range (bin) that occurred with the highest frequency
    - No mode for uniform distributions
    - May have multiple modes
    - May be a categorical mode
      - X-axis: categories (plain, peanut)
      - Y-axis: frequencies (plain = 60,000, peanut = 10,000)
      - Mode = plain (a value on the x-axis)
      - 60,000 and 10,000 are frequencies, not modes
  - Not every score in the dataset affects the mode
    - [2, 2, 3, 4, 100]
    - The mode is still 2 even if we add a large number such as 10000
  - The mode can change from sample to sample
    - A sample's mode may not be the same as the population's mode
  - The mode of a histogram changes with the bin size
  - There is no equation for calculating the mode
- Median
  - Middle value of the sorted data when the number of values is odd
  - Mean of the 2 middle values of the sorted data when the number of values is even
  - Properties
    - Barely affected by outliers
    - Does not use the value of every score in the distribution
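- Mean
  - Arithmetic average: the sum of all scores divided by the number of scores
  - Properties
    - Every score in the distribution affects the mean
    - The mean can be expressed by a formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
    - Means of many samples from the same population tend to be similar
    - The mean is affected by outliers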
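The properties listed above can be checked on the small list from the notes. This is a minimal sketch using Python's standard `statistics` module; the names `scores`, `with_outlier`, and `summarize` are made up for illustration and are not part of the original notebook.

```python
import statistics

scores = [2, 2, 3, 4, 100]        # example list from the notes
with_outlier = scores + [10000]   # same data plus one very large value


def summarize(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )


base = summarize(scores)
shifted = summarize(with_outlier)

print("without outlier:", base)
print("with outlier:   ", shifted)
```

Adding 10000 leaves the mode at 2 and only nudges the median from 3 to 3.5, while the mean jumps from 22.2 to roughly 1685.17.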
Calculating measures of central tendency in Pandas
In [55]:
import pandas as pd
In [56]:
url = './fb_data.csv'
data = pd.read_csv(url, header=None)
In [64]:
data
Out[64]:
In [58]:
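# sorted(data) only sorts the column labels; sort_values() sorts the rows.
# With header=None the columns are labelled 0, 1, ...; sort by the first one.
data.sort_values(by=0)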
Out[58]:
In [59]:
# Since this is a pandas DataFrame, we can use mean() and median() methods
type(data)
Out[59]:
In [60]:
data.mean()
Out[60]:
In [61]:
data.median()
Out[61]:
In [63]:
# mode() returns every value tied for the highest count;
# getting all values back means the distribution is uniform
data.mode()
Out[63]:
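Because the cell outputs are not shown here, the following self-contained sketch may help illustrate what these calls return. The two small DataFrames are hypothetical stand-ins, not the contents of `fb_data.csv`, and the integer column label `0` mirrors what `header=None` would produce for a single-column file (an assumption about the file's shape).

```python
import pandas as pd

# Hypothetical stand-in data; NOT the contents of fb_data.csv.
sample = pd.DataFrame({0: [5, 1, 3, 3, 8]})
unique = pd.DataFrame({0: [4, 1, 3, 2]})

sample_mean = sample[0].mean()
sample_median = sample[0].median()
sample_modes = sample[0].mode().tolist()

# When no value repeats, every value ties for the highest frequency,
# so mode() returns all of them in sorted order.
unique_modes = unique[0].mode().tolist()

# sorted() on a DataFrame iterates over its column labels, not its rows.
labels = sorted(sample)
# sort_values() is what actually orders the rows by a column.
ordered = sample.sort_values(by=0)[0].tolist()

print(sample_mean, sample_median, sample_modes)
print(unique_modes)
print(labels, ordered)
```

Note that `mode()` always returns a collection rather than a single number, since several values can tie for the highest frequency.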