How to calculate explained variance ratio in PCA using SVD?

Mohit Sharma
3 min read · Apr 9, 2023

source: https://thespacedoctors.com/index.php/2021/03/01/fractal-patterns-have-remarkable-benefits/

The ultimate goal of my articles is to provide maximum information in minimum time, so let's get straight to the main content.

What is PCA?

PCA is a dimensionality reduction algorithm that aims to reduce the number of features in a dataset. This may help you visualize high-dimensional datasets, make models run faster, aid in feature selection, etc.

The word “may” here is deliberate: PCA is not guaranteed to deliver all of these benefits on every dataset.

Important: PCA does not simply remove features from the dataset; rather, it identifies a smaller number of uncorrelated variables that explain the maximum amount of variance in the original dataset. These variables are called principal components, and they help identify the most important patterns and relationships in the data.
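To make this concrete, here is a minimal sketch using sklearn's PCA to reduce a 5-feature dataset to its first 2 principal components (the data is a synthetic assumption, purely for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples with 5 features (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Keep the 2 uncorrelated components that capture the most variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance share of each component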

SVD

Now, SVD is a matrix factorization technique that decomposes your data into the product of three matrices, A = UΣVᵗ, where A is your data matrix, U and V are orthogonal matrices, and Σ is a diagonal matrix containing the singular values of A (your data).
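As a quick sanity check (the matrix here is a random assumption), NumPy's np.linalg.svd returns exactly these three factors, and multiplying them back together recovers A:

import numpy as np

# A random 5x3 data matrix, assumed here for illustration
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Economy-size SVD: U is 5x3, S holds the 3 singular values, Vt is 3x3
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True

(full_matrices=False gives the compact factorization, which is all PCA needs.)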

If you want to know why SVD is used to perform PCA, or what the relationship between the two is, see https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca

What are the singular values of A?

The singular values of a matrix A are the square roots of the eigenvalues of the matrix AᵀA (where Aᵀ is the transpose of A).

To find them, solve the characteristic equation det(AᵀA − λI) = 0 to get the eigenvalues λ₁, λ₂, λ₃, then take the square roots of these values to get σ₁ = √λ₁, σ₂ = √λ₂, and σ₃ = √λ₃, which are the singular values of the matrix A.
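Here is a small check of that definition (the matrix is a random assumption): the square roots of the eigenvalues of AᵀA match the singular values NumPy reports.

import numpy as np

# A small random matrix, chosen purely for illustration
rng = np.random.default_rng(42)
A = rng.normal(size=(6, 3))

# Eigenvalues of the symmetric matrix AᵀA, sorted in descending order
eigenvalues = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

# Their square roots are the singular values of A
print(np.sqrt(eigenvalues))
print(np.linalg.svd(A, compute_uv=False))  # same values

(np.linalg.eigvalsh returns eigenvalues in ascending order, hence the flip before comparing.)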

Why are the singular values of A important?

Since the main goal of this post is to compute the explained variance ratio, the singular values play a central role: they tell you how much of the variation in the data each principal component captures. σ₁ corresponds to the variation captured by the first principal component, σ₂ to the second, and so on. More precisely, the variance captured by the i-th component is σᵢ² / (n − 1), where n is the number of samples, which is exactly what the code below computes.

How to compute the explained variance ratio using this logic?

import numpy as np

# Center the data so each feature has zero mean (see the note below)
X_centered = X - X.mean(axis=0)
# Economy-size SVD of the centered data matrix
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Variance captured by each principal component: σᵢ² / (n − 1)
variances = S**2 / (X.shape[0] - 1)
# Fraction of the total variance explained by each component
explained_variances = variances / np.sum(variances)

Important to know!
You need your data to be centered around the mean, which is why the snippet above subtracts the column means before taking the SVD. Wanna know why? Stay tuned for my next article.

Conclusion - The explained_variances variable does the same job as pca.explained_variance_ratio_ when PCA is performed using the sklearn.decomposition package.
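If you have sklearn installed, a quick check on toy data (assumed here just to demonstrate the equivalence) confirms the two agree:

import numpy as np
from sklearn.decomposition import PCA

# Toy data, assumed purely for this comparison
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Manual computation via SVD of the centered data
X_centered = X - X.mean(axis=0)
S = np.linalg.svd(X_centered, compute_uv=False)
variances = S**2 / (X.shape[0] - 1)
explained_variances = variances / np.sum(variances)

# sklearn's answer (PCA centers the data internally)
pca = PCA().fit(X)
print(np.allclose(explained_variances, pca.explained_variance_ratio_))  # True

Both vectors sum to 1, and each entry is the share of the total variance explained by the corresponding principal component.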


I would like to extend a special thanks to my friend Vaibhav for providing me with valuable insight and inspiration for this post.



Written by Mohit Sharma

I talk machines and to machines.
