Principal Component Analysis
Overview
PCA is an unsupervised learning algorithm that takes data with many features (50, 1,000, etc.) and reduces the number of features considerably, typically to two or three, so the data can be plotted and visualized.
Example 1: Given many features of a car (like length, width, etc.), the PCA algorithm will recognize that the width doesn't change much across instances, and will keep the length.
Example 2: Given two important features (height and length), we might take a combination of the two that captures most of the information the two features convey.
PCA: reduce a large number of features down to two or three $z$-axes, essentially compressing the original features.
Algorithm
Before running the PCA algorithm steps, the data need to be normalized to have zero mean and scaled to account for features on very different scales (square feet vs. number of rooms).
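As a minimal sketch of this pre-processing step (assuming NumPy and a hypothetical feature matrix `X_raw` with square footage and room count), mean normalization and scaling can be done directly, or with scikit-learn's StandardScaler:

import numpy as np

# Hypothetical raw features: [square feet, number of rooms]
X_raw = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]])

# Mean normalization and scaling by standard deviation (z-score)
X_norm = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Equivalent using scikit-learn
from sklearn.preprocessing import StandardScaler
X_norm_sk = StandardScaler().fit_transform(X_raw)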
For a two-variable dataset, PCA finds an axis (the $z$-axis, also called the principal component) onto which the points are projected. The principal component is the axis that captures the largest amount of variation in the data.
Let's say that for two variables, the PCA algorithm found the $z$-axis to be the unit vector $[0.71,\ 0.71]$. To find the projection of the point $(2,3)$ onto this principal component, we take the dot product of the two vectors:
\[\begin{bmatrix} 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix} = 2(0.71) + 3(0.71) = 3.55\]
This gives the coordinate along the first principal component. We can also find the second, third, etc. components, each of which is at $90^\circ$ to the previous axis.
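A quick check of this arithmetic in NumPy, using the axis vector and point from the example above:

import numpy as np

z_axis = np.array([0.71, 0.71])   # (approximately) unit-length principal component
point = np.array([2.0, 3.0])

projection = point @ z_axis       # dot product = 2*0.71 + 3*0.71
print(projection)                 # 3.55

# Mapping the projection back into the original 2-D space gives the
# approximation of the point that lies on the principal component axis
approx = projection * z_axis      # approx. [2.52, 2.52]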
Implementation
- Optional pre-processing: perform feature scaling.
- "Fit" the data to obtain 2 or 3 new axes (principal components); `fit` does mean normalization by default.
- Optionally examine how much information/variance is explained by each principal component, given by `explained_variance_ratio_`.
- Transform (project) the data onto the new axes, done by `transform`.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([
    [1, 1], [2, 1], [3, 2],
    [-1, -1], [-2, -1], [-3, -2]
])
pca_1 = PCA(n_components=1)
pca_1.fit(X)
pca_1.explained_variance_ratio_  # approx. [0.992] (kept 99.2% of the original variability)
# projects the data onto the new axis; returns a (6, 1) array whose values are
# the signed distances from the origin along the principal component
X_trans_1 = pca_1.transform(X)
# reconstructs an approximation of the original data from the 1-D projections
X_reduced_1 = pca_1.inverse_transform(X_trans_1)
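For comparison, a sketch of fitting the same data with two components: since the original data are 2-D, the two principal components together explain all of the variance (for this X the ratios are roughly 0.992 and 0.008), and inverse-transforming recovers X up to floating-point error.

pca_2 = PCA(n_components=2)
pca_2.fit(X)
pca_2.explained_variance_ratio_  # approx. [0.992, 0.008]; the ratios sum to 1

X_trans_2 = pca_2.transform(X)                    # shape (6, 2): coordinates on both axes
X_approx_2 = pca_2.inverse_transform(X_trans_2)   # recovers X (up to floating-point error)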