PCA solved example step by step

In this blog post we will walk through how PCA works using a simple dataset, so that everyone can understand it. I will also solve a PCA problem with an example, step by step. PCA is a dimensionality reduction algorithm used to transform a large, complex dataset into a simpler, more understandable one while keeping as much of the important information as possible. As you go through this blog you will find a detailed definition of PCA.

What is PCA?

PCA, also called Principal Component Analysis, is used to reduce a dataset to a lower dimension where it can be analysed more easily, while keeping as much of the relevant information in the dataset as possible.

PCA is used in the data cleaning and data preprocessing stage of machine learning. PCA can be applied to unsupervised datasets, that is, datasets that do not have any labels. Consider an example where the dataset has n features. Some features have a high correlation with the target variable and some have less, so PCA converts the high-dimensional data into a lower-dimensional subspace. We can only picture data in up to three dimensions, but real-world data has N dimensions, so PCA helps us project the features down to two or three dimensions that we can visualize.
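As a quick illustration (separate from the worked example below), here is a minimal sketch of PCA as a preprocessing step. It assumes NumPy and scikit-learn are available, and the random 10-feature dataset is purely hypothetical.

```python
# Minimal sketch: PCA as a preprocessing step (assumes NumPy and scikit-learn are installed)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))    # hypothetical unlabeled dataset: 100 examples, 10 features

pca = PCA(n_components=3)         # keep only 3 principal components
X_reduced = pca.fit_transform(X)  # shape (100, 3), small enough to visualize in 3-D
print(X_reduced.shape)
```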


To understand how PCA is used in different domains, explore one of those domains: how PCA is used in LLM reasoning for zero-shot learning.

Dataset

Feature    EX1    EX2    EX3    EX4
X1           4      8     13      7
X2          11      4      5     14

Here is a simple dataset with 2 features (x1 and x2) and 4 examples.

PCA solved example step by step

Step 1: Calculate the Mean

In step 1, calculate the mean for each feature, that is, x1 and x2.

    \[ \overline{x1} = \frac{1}{4}(4+8+13+7) = 8 \]

    \[ \overline{x2} = \frac{1}{4}(11+4+5+14) = 8.5 \]

    \[ \overline{x1} = 8 \]

    \[ \overline{x2} = 8.5 \]

The calculated mean for feature x1 is 8 and for x2 is 8.5.
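If you want to verify the means programmatically, here is a small NumPy sketch (NumPy is an assumption; the post itself does the arithmetic by hand).

```python
import numpy as np

# Each row is one feature (x1, x2); each column is one example from the dataset above
X = np.array([[4.0, 8.0, 13.0, 7.0],    # x1
              [11.0, 4.0, 5.0, 14.0]])  # x2

means = X.mean(axis=1)  # average across the 4 examples for each feature
print(means)            # [8.  8.5]
```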

Step 2: Calculation of the Covariance Matrix

In this step we calculate the covariance matrix. The formula for the covariance matrix is

    \[s = \begin{bmatrix} cov(x1,x1) & cov(x1,x2)\\ cov(x2,x1) & cov(x2,x2) \end{bmatrix} \]

The cov(x1,x1) can be calculated by

    \[cov(x_{1},x_{1}) = \frac{1}{N-1}\sum_{K=1}^{N}(x_{1k} - \overline{x1})(x_{1k}-\overline{x1}) \]

    \[ cov(x_{1},x_{1}) = \frac{1}{3}((4-8)^{2}+(8-8)^{2}+(13-8)^{2}+(7-8)^{2}) \]

    \[ = 14 \]

    \[cov(x_{1},x_{2}) = \frac{1}{N-1}\sum_{K=1}^{N}(x_{1k} - \overline{x1})(x_{2k}-\overline{x2}) \]

    \[ cov(x_{1},x_{2}) = \frac{1}{3}((4-8)(11-8.5)+(8-8)(4-8.5)+(13-8)(5-8.5)+(7-8)(14-8.5)) \]

    \[ = -11 \]

cov(x2,x1) is equal to cov(x1,x2), so cov(x2,x1) is also -11.

    \[ cov(x_{2},x_{1}) = cov(x_{1},x_{2}) \]

    \[=  -11\]

    \[cov(x_{2},x_{2}) = \frac{1}{N-1}\sum_{K=1}^{N}(x_{2k} - \overline{x2})(x_{2k}-\overline{x2}) \]

    \[ cov(x_{2},x_{2}) = \frac{1}{3}((11-8.5)^{2}+(4-8.5)^{2}+(5-8.5)^{2}+(14-8.5)^{2}) \]

    \[ = 23\]

    \[s = \begin{bmatrix} 14 & -11\\ -11 & 23 \end{bmatrix} \]
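The same covariance matrix can be checked with NumPy's np.cov, which also uses the 1/(N-1) normalization (a verification sketch, not part of the original derivation).

```python
import numpy as np

X = np.array([[4.0, 8.0, 13.0, 7.0],    # x1
              [11.0, 4.0, 5.0, 14.0]])  # x2

S = np.cov(X)  # rows are treated as variables; divides by N-1 by default
print(S)
# [[ 14. -11.]
#  [-11.  23.]]
```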

Step 3: Eigenvalues of the Covariance Matrix

The characteristic equation of the covariance matrix is

    \[0 = det(s-\lambda I) \]

    \[ = \begin{vmatrix} 14-\lambda & -11 \\ -11 & 23-\lambda \end{vmatrix} \]

    \[ = (14-\lambda)(23-\lambda) - (-11)(-11) \]

    \[ = \lambda^{2} - 37\lambda+201 \]

The roots of this equation are

    \[ \lambda = \frac{1}{2}(37\pm \sqrt{565}) \]

    \[\lambda_{1} = 30.3849\]

    \[\lambda_{2} = 6.6151 \]
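As a quick check, the eigenvalues can be computed numerically. np.linalg.eigvalsh works for symmetric matrices like the covariance matrix and returns the eigenvalues in ascending order (a verification sketch, assuming NumPy).

```python
import numpy as np

S = np.array([[14.0, -11.0],
              [-11.0, 23.0]])

eigenvalues = np.linalg.eigvalsh(S)  # ascending order for symmetric matrices
print(eigenvalues)                   # approximately [ 6.6151 30.3849]
```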

Step 4: Computation of the Eigenvectors

    \[ u = \begin{bmatrix} u1 \\ u2 \end{bmatrix} \]

    \[\begin{bmatrix} 0 \\ 0 \end{bmatrix} = (s-\lambda I)u \]

    \[ = \begin{bmatrix} 14-\lambda & -11 \\ -11 & 23- \lambda \end{bmatrix}\begin{bmatrix} u1 \\ u2 \end{bmatrix} \]

which gives the two equations

    \[ (14- \lambda)u1 - 11u2 = 0 \]

    \[ -11u1 + (23-\lambda)u2 = 0  \]

From the first equation,

    \[(14-\lambda)u1 = 11u2  \]

    \[\frac{u1}{11} = \frac{u2}{14-\lambda} = t \]

    \[u1 = 11t \]

    \[ u2 = (14-\lambda)t\]

Assume t = 1. Then the (unnormalized) eigenvector is

    \[ u = \begin{bmatrix} 11 \\ 14-\lambda \end{bmatrix} \]

To find the unit eigenvector for λ1 = 30.3849, we compute the length of u, which is given by

    \[ \left \| U \right \| = \sqrt{11^{2} + (14- \lambda)^{2}} \]

    \[= \sqrt{11^{2} + (14- 30.3849)^{2}} \]

    \[ = 19.7348 \]

    \[ e1 = \begin{bmatrix} \frac{11}{\left \| U \right \|}\\ \frac{(14-\lambda)}{\left \| U \right \|} \end{bmatrix} \]

    \[ e1 = \begin{bmatrix} \frac{11}{19.7348}\\ \frac{(14-30.3849)}{19.7348} \end{bmatrix} \]

    \[ e1 = \begin{bmatrix} 0.5574\\ -0.8303 \end{bmatrix} \]

Similarly, using λ2 = 6.6151, the second unit eigenvector is

    \[ e2 = \begin{bmatrix} 0.8303\\ 0.5574 \end{bmatrix} \]
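The unit eigenvectors can also be obtained numerically with np.linalg.eigh. Note that an eigenvector is only defined up to sign, so the numerical result may be the hand-calculated vector multiplied by -1 (a verification sketch, assuming NumPy).

```python
import numpy as np

S = np.array([[14.0, -11.0],
              [-11.0, 23.0]])

vals, vecs = np.linalg.eigh(S)  # eigenvalues ascending; unit eigenvectors as columns
e1 = vecs[:, 1]                 # eigenvector for the largest eigenvalue (~30.3849)
e2 = vecs[:, 0]                 # eigenvector for the smaller eigenvalue (~6.6151)
print(e1, e2)                   # e.g. [ 0.5574 -0.8303] and [0.8303 0.5574], up to sign
```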

Step 5: Computation of the First Principal Component

The first principal component score of the k-th example is obtained by projecting the mean-centered data onto e1:

    \[ e_{1}^{T}\begin{bmatrix} x_{1k}-\overline{x_{1}}\\ x_{2k}-\overline{x_{2}} \end{bmatrix} \]

For the first example (k = 1), this is

    \[ = \begin{bmatrix} 0.5574 & -0.8303 \end{bmatrix} \begin{bmatrix} x_{11}-\overline{x_{1}}\\ x_{21}-\overline{x_{2}} \end{bmatrix} \]

    \[  =0.5574(x_{11} - \overline{x_{1}}) - 0.8303(x_{21} - \overline{x_{2}}) \]

    \[ =0.5574(4 - 8) - 0.8303(11-8.5) \]

    \[ =-4.30535 \]

Repeating this projection for every example gives the first principal component (PC1) for the whole dataset:

Feature     EX1       EX2      EX3       EX4
X1            4         8       13         7
X2           11         4        5        14
PC1     -4.3052    3.7361   5.6928   -5.1238

Hence, the PCA example is solved step by step.
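To reproduce the PC1 row of the table above, center each feature and project every example onto the first eigenvector (a verification sketch, assuming NumPy; small differences come from rounding the eigenvector).

```python
import numpy as np

X = np.array([[4.0, 8.0, 13.0, 7.0],    # x1
              [11.0, 4.0, 5.0, 14.0]])  # x2

e1 = np.array([0.5574, -0.8303])        # first unit eigenvector from Step 4

centered = X - X.mean(axis=1, keepdims=True)  # subtract each feature's mean
pc1 = e1 @ centered                           # project every example onto e1
print(pc1)  # approximately [-4.305  3.736  5.693 -5.124]
```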

Advantages of PCA

Dimensionality reduction

As we discussed at the beginning of the blog, PCA helps us reduce the dimension of the dataset while preserving the most important information. It reduces the complexity of the dataset by transforming the features into a smaller set of uncorrelated features known as principal components.

Data Visualization 

As humans we can see only three dimensions, so PCA helps us visualize higher-dimensional data in a lower-dimensional space. This makes it easier for a data scientist to understand and interpret the relationships between data points.

Noise Reduction

PCA helps in reducing noise, that is, irrelevant information in the data, by keeping only the components that capture the most variance. This makes the dataset cleaner and more robust.

Multicollinearity Handling

Sometimes the features in a dataset are highly correlated with each other, which causes major problems in statistical analyses. To handle this multicollinearity, PCA transforms the original variables into a set of linearly uncorrelated components.

Feature Engineering

In feature engineering, PCA is also used to extract new features from the existing ones, which helps us find the underlying patterns in the data.

Disadvantages of PCA

Loss of Interpretability

The calculated principal components may not have a clear meaning in the original feature space. This makes it challenging to map the transformed components back to a real-world context.

Assumption of Linearity

PCA works best when there are linear relationships between the existing features. If the features have nonlinear relationships, PCA may not effectively capture the variance, and the quality of the dimensionality reduction may suffer.

Sensitivity to Outliers

PCA is sensitive to outliers in the data. Outliers can have a large influence on the principal components, which leads to a distorted representation of the variance and decreases the effectiveness of the dimensionality reduction.

Computational Intensity

Computing PCA involves complex mathematical operations, especially in the case of large datasets. This may not be feasible for very large datasets and can slow down analysis.

Not suitable for All Data types

PCA does not work with categorical or binary data but works well with numerical data. We need to apply some technique, such as encoding, to handle categorical variables before using PCA.

Why is PCA used?

PCA is used to simplify complex datasets by reducing the number of features while making sure the essential information is preserved. It promotes more effective modelling, reduces multicollinearity problems, removes noise, and helps with data visualisation. PCA improves analysis by converting variables into principal components, which is especially useful for visualising and comprehending high-dimensional data.

What are the key features of PCA?

The main features of PCA are dimensionality reduction, the generation of uncorrelated principal components, and noise reduction. Data visualization and feature engineering can also be done effectively with it. It is frequently used to improve performance on analytics and modelling tasks.

Is PCA used for feature selection?

PCA is not used for feature selection; instead it is used for dimensionality reduction, which transforms a higher-dimensional space into a lower-dimensional one while making sure the main information is not lost.

How does PCA reduce dimensionality?

PCA reduces the dimensions by transforming the existing features into principal components, which are uncorrelated with each other. These principal components capture the maximum variance in the dataset, so the most important information is retained in fewer dimensions.
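For instance, scikit-learn's PCA does exactly this: it centers the data, finds the principal components, and reports how much variance each one keeps. This is a hedged sketch using the same toy dataset from the worked example; scikit-learn is an assumption, and the sign of the projection may differ from the hand calculation.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy dataset as the worked example, with rows as examples and columns as features
X = np.array([[4.0, 11.0],
              [8.0, 4.0],
              [13.0, 5.0],
              [7.0, 14.0]])

pca = PCA(n_components=1)              # keep only the first principal component
X_reduced = pca.fit_transform(X)       # centered data projected onto PC1
print(X_reduced.ravel())               # projections (sign may be flipped)
print(pca.explained_variance_ratio_)   # ~0.82 of the total variance is retained
```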
