The motivation for dimensionality reduction, Principal Component Analysis (PCA), and how to apply PCA.
1. Motivation
I would like to give full credit to the respective authors, as these are my personal Python notebooks taken from deep learning courses by Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted generously through GitHub Pages, part of my main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. The notes are meant for my personal review, but I have open-sourced the repository as a lot of people found it useful.
1a. Motivation I: Data Compression
- You are able to reduce the dimension of the data from 2D to 1D
- For example, pilot skill and pilot happiness can be reduced to pilot’s aptitude
- Generally, you can reduce x1 and x2 to z1 (see the sketch after this list)
- You are able to reduce the dimension of the data from 3D to 2D
- Project the data such that they lie on a plane
- Specify two axes
- z1
- z2
- You would then be able to reduce the data’s dimension from 3D to 2D
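As a minimal sketch of the 2D → 1D compression above (my own illustration, not from the course; the "skill" and "happiness" features are made up), using scikit-learn's PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up 2D data: two highly correlated "pilot" features (hypothetical example)
rng = np.random.RandomState(0)
skill = rng.rand(100)                       # x1: pilot skill
happiness = skill + 0.05 * rng.randn(100)   # x2: pilot happiness, nearly redundant

X = np.column_stack([skill, happiness])     # shape (100, 2)

# Project onto a single direction z1 (roughly "pilot aptitude")
pca = PCA(n_components=1)
Z = pca.fit_transform(X)                    # shape (100, 1)

print(X.shape, "->", Z.shape)
```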
1b. Motivation II: Visualization
- Given a set of data, how are we able to examine it visually?
- We can reduce the data’s dimensionality from 50D to 2D
- Typically we do not know what the 2 dimensions’ meanings are
- But we can make sense of the 2 dimensions after plotting (see the sketch after this list)
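A rough sketch of the 50D → 2D visualization idea (my own code; the random data is only a placeholder for a real 50-dimensional dataset):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder 50-dimensional dataset (500 examples x 50 features)
rng = np.random.RandomState(42)
X = rng.randn(500, 50)

# Reduce 50D -> 2D purely for visualization
Z = PCA(n_components=2).fit_transform(X)

# z1 and z2 have no predefined meaning; we interpret them after plotting
plt.scatter(Z[:, 0], Z[:, 1], s=10)
plt.xlabel("z1")
plt.ylabel("z2")
plt.show()
```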
2. Principal Component Analysis (PCA)
2a. PCA Problem Formulation
- Let’s say we have the following 2D data
- We can project the data onto a diagonal line (red line)
- PCA minimizes the lengths of the blue line segments (the projection errors)
- Before performing PCA, perform mean normalization (mean = 0) and feature scaling
- We can also project with another diagonal line (magenta)
- But the projection errors are much larger
- Hence PCA would choose the red line instead of this magenta line
- Goal of PCA
- It tries to find a lower-dimensional surface onto which to project the data so as to minimize the squared projection error
- That is, to minimize the squared distance between each point and the location onto which it gets projected
- PCA is not linear regression
- Linear regression minimizes the vertical distances (errors in predicting y), whereas PCA minimizes the orthogonal projection distances (see the sketch after this list)
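To make the contrast concrete, here is a small NumPy sketch (mine, not from the lecture) comparing the least-squares slope, which minimizes vertical errors, with the first principal direction, which minimizes orthogonal projection errors:

```python
import numpy as np

rng = np.random.RandomState(1)
x = rng.randn(200)
y = 0.8 * x + 0.2 * rng.randn(200)

# Mean normalization (mean = 0) before comparing the two fits
x = x - x.mean()
y = y - y.mean()

# Linear regression slope: minimizes the vertical squared errors in y
slope_regression = (x * y).sum() / (x * x).sum()

# First principal direction: minimizes the orthogonal (projection) errors
X = np.column_stack([x, y])
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u1 = Vt[0]                                  # unit vector along the first principal axis
slope_pca = u1[1] / u1[0]

print("regression slope:", slope_regression)
print("PCA direction slope:", slope_pca)    # similar here, but not the same in general
```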
2b. Principal Component Analysis Algorithm
- Data pre-processing step
- You must always perform mean normalization (and feature scaling if the features are on different scales) before running PCA
- PCA intuition
- You need to compute the vector or vectors onto which to project the data
- Left graph (2D → 1D): compute the direction vector u(1)
- Right graph (3D → 2D): compute the direction vectors u(1) and u(2)
- Procedure
- First compute the covariance matrix Sigma, then decompose it
- You can use eig (eigendecomposition) or svd (singular value decomposition), but the latter is numerically more stable
- You can use any library in other languages that provides singular value decomposition
- You will get 3 matrices: U, S and V
- We only need the matrix U: take its first k columns as U_reduce, and z = U_reduce' * x is a k x 1 vector
- Summary of the PCA algorithm in Octave (a rough NumPy equivalent follows below)
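The lecture's summary is in Octave; below is a rough NumPy equivalent I keep alongside it (the input matrix X and the helper name run_pca are my own; Sigma, U_reduce and z follow the lecture's notation):

```python
import numpy as np

def run_pca(X, k):
    """Rough NumPy translation of the Octave PCA summary from the lecture."""
    m, n = X.shape

    # Pre-processing: mean normalization (and feature scaling if scales differ)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma

    # Covariance matrix: Sigma = (1/m) * X' * X, an n x n matrix
    Sigma = (X_norm.T @ X_norm) / m

    # svd is preferred over eig here because it is numerically more stable
    U, S, _ = np.linalg.svd(Sigma)

    # Keep the first k columns of U, then project: z = U_reduce' * x for each example
    U_reduce = U[:, :k]
    Z = X_norm @ U_reduce                   # shape (m, k)
    return Z, U_reduce, S

# Example usage on made-up data
X = np.random.RandomState(0).randn(100, 5)
Z, U_reduce, S = run_pca(X, k=2)
print(Z.shape)                              # (100, 2)
```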
3. Applying PCA
3a. Reconstruction from Compressed Representation
- We can go back from the lower-dimensional representation z to an approximation of the original data (see the sketch below)
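A minimal self-contained sketch of the reconstruction step (my own code; the data is made up), where x_approx = U_reduce * z:

```python
import numpy as np

# Made-up data; in practice X, U_reduce and Z come from the PCA step above
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

U, S, _ = np.linalg.svd((X_norm.T @ X_norm) / X_norm.shape[0])
k = 2
U_reduce = U[:, :k]

Z = X_norm @ U_reduce              # compress: 5D -> 2D
X_approx = Z @ U_reduce.T          # reconstruct: 2D -> 5D (lies on a 2D subspace)

# The reconstruction is lossy: it only approximates the normalized data
print(np.abs(X_norm - X_approx).mean())
```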
3b. Choosing the Number of Principal Components
- k is the number of principal components
- But how do we choose k?
- There is a more efficient method (the right side of the slide) than re-running PCA for every candidate k (the left side)
- We use the singular values in the S matrix for the calculation: choose the smallest k for which the top-k values account for at least 99% of their total, i.e. 99% of the variance is retained (see the sketch after this list)
- You would realise that PCA can retain a high percentage of the variance even after compressing the number of dimensions of the data
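A hedged sketch of the efficient method (the helper name choose_k is my own, not from the course): take S from svd of the covariance matrix and pick the smallest k whose singular values retain at least the desired fraction of the variance.

```python
import numpy as np

def choose_k(S, variance_to_retain=0.99):
    """Smallest k such that sum(S[:k]) / sum(S) >= variance_to_retain."""
    ratios = np.cumsum(S) / np.sum(S)
    return int(np.argmax(ratios >= variance_to_retain)) + 1

# Example: singular values of the covariance matrix from svd (made-up numbers)
S = np.array([5.2, 2.1, 0.4, 0.2, 0.1])
print(choose_k(S, 0.95), "components retain >= 95% of the variance")
print(choose_k(S, 0.99), "components retain >= 99% of the variance")
```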
3c. Advice for Applying PCA
- Supervised learning speed-up
- For many data sets, we can easily reduce the dimensionality by 5-10x, so our learning algorithm runs much faster (see the sketch after this list)
- Application of PCA
- Compression
- Reduce memory or disk needed to store data
- Speed up learning algorithm
- We choose k by percentage of variance retained
- Visualization
- We choose only k = 2 or k = 3
- Bad uses of PCA
- To prevent over-fitting
- Regularization is a better choice: PCA throws away information without looking at the labels y and may discard something valuable, whereas regularization uses the labels
- Running PCA without consideration
- Before using PCA, first try running the learning algorithm on the original/raw data; only introduce PCA if that does not give what you need
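For the supervised-learning speed-up, the key point is that the x → z mapping should be defined by running PCA only on the training set and then reused on the cross-validation and test sets. A rough scikit-learn sketch (the dataset and classifier are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder dataset standing in for a high-dimensional supervised problem
X, y = make_classification(n_samples=1000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Define the x -> z mapping by running PCA on the training set only...
pca = PCA(n_components=20).fit(X_train)
Z_train = pca.transform(X_train)

# ...then reuse the same mapping on the test set (do not refit PCA on it)
Z_test = pca.transform(X_test)

clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```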