Calculate mean and median
   
Measures of Central Tendency
Describing a distribution using measures of center
- Mode
  - Value (on the x-axis) at which the frequency is highest
  - Other cases
    - May be a range of values that occurred with the highest frequency
    - A uniform distribution has no mode
    - A distribution may have multiple modes
    - The mode may be categorical
      - x-axis: the categories (plain and peanut)
      - y-axis: the frequencies (plain = 60,000, peanut = 10,000)
      - Mode = plain, a value on the x-axis
      - 60,000 and 10,000 are frequencies, not modes
  - Not every score in the dataset affects the mode
    - Example: [2, 2, 3, 4, 100]
    - The mode is still 2 even if we add a large value such as 10000
  - The mode changes from sample to sample
    - A sample's mode may differ from the population's mode
  - The mode changes with the bin size
  - There is no equation for calculating the mode
 
 
- Median
  - The middle value of the sorted data when the number of values is odd
  - The mean of the 2 middle values of the sorted data when the number of values is even
  - Properties
    - It is largely unaffected by outliers
    - It does not take every score in the distribution into account
 
 
- Mean
  - The average: the sum of all scores divided by the number of scores
  - Properties
    - All scores in a distribution affect the mean
    - The mean can be expressed as a formula
    - Many samples from the same population tend to have similar means
    - The mean is affected by outliers
 
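The formula for the mean is usually written as below. The symbols are notation assumed here, not taken from these notes: the mean is written as x-bar, n is the number of scores, and x_i is the i-th score.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

For the scores [2, 2, 3, 4, 100], this gives (2 + 2 + 3 + 4 + 100) / 5 = 111 / 5 = 22.2.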
 
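The properties listed above can be checked with a minimal sketch using Python's standard `statistics` module. It uses the example scores [2, 2, 3, 4, 100] and the added value 10000 from the notes. The variable names are illustrative only.

```python
import statistics

scores = [2, 2, 3, 4, 100]        # example scores from the notes
with_outlier = scores + [10000]   # same scores plus one very large value

# Mode: the most frequent value; the added outlier does not change it.
mode_before = statistics.mode(scores)
mode_after = statistics.mode(with_outlier)

# Median: middle value (odd count) or mean of the two middle values (even count).
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

# Mean: every score contributes, so the outlier pulls it up sharply.
mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)

# multimode returns every value tied for the highest frequency.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
all_tied = statistics.multimode([1, 2, 3])

print(mode_before, mode_after)
print(median_before, median_after)
print(mean_before, round(mean_after, 2))
print(two_modes, all_tied)
```

The mode stays at 2 and the median moves only from 3 to 3.5, while the mean jumps from 22.2 to roughly 1685. When all values occur equally often, `multimode` returns all of them, which is one way of seeing why a uniform distribution has no single mode.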
 
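The claim that the mode changes with the bin size can be illustrated with a small sketch. The helper `modal_bin` and the data are hypothetical, made up for this example. The helper groups values into equal-width bins and reports the bin holding the most values.

```python
from collections import Counter


def modal_bin(values, width):
    """Return (start, end) of the half-open bin [start, end) holding the most values."""
    # Integer division maps each value to the index of its bin.
    counts = Counter(v // width for v in values)
    index, _ = counts.most_common(1)[0]
    return index * width, (index + 1) * width


values = [1, 2, 2, 3, 10, 11, 12, 13, 14]  # hypothetical data

narrow = modal_bin(values, 1)  # one bin per integer
wide = modal_bin(values, 5)    # bins of width 5

print(narrow)
print(wide)
```

With bins of width 1 the modal bin is [2, 3), because 2 is the only repeated value. With bins of width 5 it becomes [10, 15), which holds five values against four in [0, 5).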
Calculating measures of central tendency in Pandas
In [55]:
import pandas as pd
In [56]:
url = './fb_data.csv'
data = pd.read_csv(url, header=None)
In [64]:
data
Out[64]:
In [58]:
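# sorted(data) would only return the sorted column labels; sort the rows by value instead
data = data.sort_values(by=data.columns.tolist())
data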
Out[58]:
In [59]:
# Since this is a pandas DataFrame, we can use mean() and median() methods
type(data)
Out[59]:
In [60]:
data.mean()
Out[60]:
In [61]:
data.median()
Out[61]:
In [63]:
# This is a uniform distribution, so there is no single mode:
# mode() returns every value that ties for the highest frequency
data.mode()
Out[63]:
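Because the contents of fb_data.csv are not shown here, the following self-contained sketch repeats the same steps on a small hypothetical DataFrame. The values and variable names are assumptions for illustration. The column label 0 mirrors what `read_csv(..., header=None)` assigns to the first column.

```python
import pandas as pd

# Hypothetical values standing in for fb_data.csv, whose contents are not shown.
frame = pd.DataFrame({0: [5, 1, 3, 2, 4]})

# Sort the rows by value and renumber the index.
ordered = frame.sort_values(by=0).reset_index(drop=True)

mean_value = frame[0].mean()
median_value = frame[0].median()

# Every value occurs once, so all of them tie and mode() returns them all.
uniform_modes = frame[0].mode().tolist()

# With a repeated value, mode() returns just that value.
single_mode = pd.Series([2, 2, 3, 4, 100]).mode().tolist()

print(ordered[0].tolist())
print(mean_value, median_value)
print(uniform_modes, single_mode)
```

When `mode()` returns as many values as there are rows, no value is more frequent than any other, which matches the uniform case described in the notes.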