How to calculate explained variance ratio in PCA using SVD?

Mohit Sharma
3 min read · Apr 9, 2023

source: https://thespacedoctors.com/index.php/2021/03/01/fractal-patterns-have-remarkable-benefits/

The ultimate goal of my articles is to provide maximum information in minimum time, so let's get straight to the main content.

What is PCA?

PCA is a dimensionality reduction algorithm that aims to reduce the number of features in a dataset. This may help you visualize high-dimensional datasets, make models run faster, aid in feature selection, etc.

The word “may” here is deliberate: PCA is not guaranteed to deliver all of these benefits on every dataset.

Important: PCA does not simply remove features from the dataset; rather, it identifies a smaller number of uncorrelated variables that explain the maximum amount of variance in the original dataset. These variables are called principal components, and they help identify the most important patterns and relationships in the data.
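To make this concrete, here is a minimal sketch using sklearn's PCA to reduce a 5-feature dataset to its first 2 principal components (the data is a synthetic assumption, purely for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples with 5 features (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Keep the 2 uncorrelated components that capture the most variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance share of each component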

SVD

Now, SVD is a matrix factorization technique that decomposes your data into the product of three matrices, A = UΣVᵗ, where A is your data matrix, U and V are orthogonal matrices, and Σ is a diagonal matrix containing the singular values of A (your data).
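As a quick sanity check (the matrix here is a random assumption), NumPy's np.linalg.svd returns exactly these three factors, and multiplying them back together recovers A:

import numpy as np

# A random 5x3 data matrix, assumed here for illustration
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Economy-size SVD: U is 5x3, S holds the 3 singular values, Vt is 3x3
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True

(full_matrices=False gives the compact factorization, which is all PCA needs.)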

If you want to know why SVD is used to perform PCA, or what the relationship between the two is, see https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca

What are the singular values of A?

The singular values of a matrix A are the square roots of the eigenvalues of the matrix AᵀA (where Aᵀ is the transpose of A).

To find them, solve the characteristic equation det(AᵀA − λI) = 0 to get the eigenvalues λ₁, λ₂, λ₃, then take the square roots of these values to get σ₁ = √λ₁, σ₂ = √λ₂, and σ₃ = √λ₃, which are the singular values of the matrix A.
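Here is a small check of that definition (the matrix is a random assumption): the square roots of the eigenvalues of AᵀA match the singular values NumPy reports.

import numpy as np

# A small random matrix, chosen purely for illustration
rng = np.random.default_rng(42)
A = rng.normal(size=(6, 3))

# Eigenvalues of the symmetric matrix AᵀA, sorted in descending order
eigenvalues = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

# Their square roots are the singular values of A
print(np.sqrt(eigenvalues))
print(np.linalg.svd(A, compute_uv=False))  # same values

(np.linalg.eigvalsh returns eigenvalues in ascending order, hence the flip before comparing.)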

Why are the singular values of A important?

Since the main goal of this post is to compute the explained variance ratio, the singular values play a central role: they tell you how much of the variation in the data each principal component captures. σ₁ corresponds to the variation captured by the first principal component, σ₂ to the second, and so on. More precisely, the variance captured by the i-th component is σᵢ² / (n − 1), where n is the number of samples, which is exactly what the code below computes.

How to compute the explained variance ratio using this logic?

import numpy as np

# Center the data so each feature has zero mean (see the note below)
X_centered = X - X.mean(axis=0)
# Economy-size SVD of the centered data matrix
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Variance captured by each principal component: σᵢ² / (n − 1)
variances = S**2 / (X.shape[0] - 1)
# Fraction of the total variance explained by each component
explained_variances = variances / np.sum(variances)

Important to know!
You need your data to be centered around the mean, which is why the snippet above subtracts the column means before taking the SVD. Wanna know why? Stay tuned for my next article.

Conclusion - The explained_variances variable does the same job as pca.explained_variance_ratio_ when PCA is performed using the sklearn.decomposition package.
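If you have sklearn installed, a quick check on toy data (assumed here just to demonstrate the equivalence) confirms the two agree:

import numpy as np
from sklearn.decomposition import PCA

# Toy data, assumed purely for this comparison
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Manual computation via SVD of the centered data
X_centered = X - X.mean(axis=0)
S = np.linalg.svd(X_centered, compute_uv=False)
variances = S**2 / (X.shape[0] - 1)
explained_variances = variances / np.sum(variances)

# sklearn's answer (PCA centers the data internally)
pca = PCA().fit(X)
print(np.allclose(explained_variances, pca.explained_variance_ratio_))  # True

Both vectors sum to 1, and each entry is the share of the total variance explained by the corresponding principal component.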


I would like to extend a special thanks to my friend Vaibhav for providing me with valuable insight and inspiration for this post.



Written by Mohit Sharma

I talk machines and to machines.
