Principal Component Analysis (PCA) is an unsupervised technique for dimensionality reduction, feature extraction, and visualization that is widely used in machine learning and data analysis. It is often referred to as a linear technique because new features are obtained by multiplying the original features by the matrix of PCA eigenvectors. PCA works by identifying the hyperplane that lies closest to the data and then projecting the data onto it so that as much of the variance as possible is preserved. Thanks to this simple approach, PCA is widely applied in data mining, bioinformatics, psychology, and many other fields. What is less well known is that several variations of the algorithm exist that improve on the conventional approach. Let's look at them one by one.
Several variations of PCA have been developed to address specific challenges or to improve performance. Some of the variations of PCA available in Python include:
Kernel PCA: This variation of PCA uses a kernel trick to transform the data into a higher-dimensional space where it is more easily linearly separable. This can be useful for handling non-linearly separable data.
Incremental PCA: This variation of PCA allows for the analysis of large datasets that cannot be fit into memory all at once. It is useful for handling big data problems.
Sparse PCA: This variation of PCA adds a sparsity constraint to the PCA problem, which encourages the algorithm to find a lower-dimensional representation of the data with fewer non-zero components (a short usage sketch follows this list).
Robust PCA: This variation of PCA is designed to handle datasets with outliers or noise. It separates the data into a low-rank component and a sparse component, where the sparse component represents the outliers or noise.
Non-negative Matrix Factorization (NMF): Strictly a closely related technique rather than a variation of PCA, NMF factorizes a non-negative matrix into the product of two non-negative matrices, which keeps the resulting factors interpretable for inherently non-negative data.
PCA with L1-Regularization: This variation of PCA adds an L1 regularization term to the PCA optimization problem, which can be useful for handling high-dimensional datasets with many correlated features.
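As a small illustration of the sparse variant mentioned above, the sketch below fits scikit-learn's SparsePCA on the same dummy data used in the later examples; the alpha value is an arbitrary choice for demonstration, not a recommended setting.
Python3
# Illustrative sketch of Sparse PCA with scikit-learn;
# alpha here is an arbitrary value chosen only for demonstration
import numpy as np
from sklearn.decomposition import SparsePCA

# dummy data (same as in the examples below)
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])

# alpha controls the strength of the sparsity (L1) penalty:
# larger alpha tends to push more component entries to zero
sparse_pca = SparsePCA(n_components=2, alpha=1, random_state=0)
X_reduced = sparse_pca.fit_transform(X)

# the sparsity penalty can drive some entries of components_ to exactly zero,
# unlike ordinary PCA
print(sparse_pca.components_)
print(X_reduced)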
Randomized PCA:
This is an extension of PCA that uses an approximate, randomized Singular Value Decomposition (SVD) of the data. Conventional PCA runs in O(n·p²) + O(p³) time, where n is the number of data points and p is the number of features, whereas the randomized version runs in O(n·d²) + O(d³), where d is the number of principal components. Thus, it is blazing fast when d is much smaller than p.
sklearn provides a function randomized_svd in sklearn.utils.extmath which can be used to perform randomized SVD directly. For an m x n data matrix and n_components=k, it returns three truncated matrices: U of shape (m, k), the vector S of the k largest singular values, and V^T of shape (k, n). Another way is to use sklearn.decomposition.PCA and set the svd_solver hyperparameter to 'randomized' (setting it to 'full' forces the exact full SVD instead). When svd_solver is left at its default 'auto', Scikit-learn automatically uses randomized PCA if the data has more than 500 rows or columns and the number of requested components is less than 80% of the smaller dimension; otherwise it computes the full SVD.
Randomized PCA is a variation of Principal Component Analysis (PCA) that is designed to approximate the first k principal components of a large dataset efficiently. Instead of computing the eigenvectors of the covariance matrix of the data, as is done in traditional PCA, randomized PCA uses a random projection matrix to map the data to a lower-dimensional subspace. The first k principal components of the data can then be approximated by computing the eigenvectors of the covariance matrix of the projected data.
Randomized PCA has several advantages over traditional PCA:
- Scalability: Randomized PCA scales to large datasets that would be prohibitively expensive to process with traditional (full-SVD) PCA.
- Speed: Randomized PCA is much faster than traditional PCA for large datasets, making it more suitable for real-time applications.
- Sparsity: The underlying randomized SVD can operate directly on sparse matrices, which traditional PCA does not handle well.
- Low-rank approximation: Randomized PCA can be used to obtain a low-rank approximation of a large dataset, which can then be used for further analysis or visualization (see the short sketch after this list).
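As a quick illustration of the last point, the following sketch builds a rank-k approximation of the data matrix from the truncated factors returned by randomized_svd; the data is the same dummy array used in the example below, and k = 1 is an arbitrary choice.
Python3
# Illustrative sketch: rank-k approximation from randomized SVD
import numpy as np
from sklearn.utils.extmath import randomized_svd

# dummy data (same as in the example below)
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])

# k = 1 is an arbitrary choice for this tiny dataset
k = 1
U, S, VT = randomized_svd(X, n_components=k)

# reconstruct a rank-k approximation of X: U * diag(S) * V^T
X_approx = U @ np.diag(S) @ VT

# the approximation error shrinks as k grows
print(X_approx)
print("Frobenius error:", np.linalg.norm(X - X_approx))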
Code:
Python3
# Python3 program to show the working of
# randomized PCA

# importing libraries
import numpy as np
from sklearn.decomposition import PCA
from sklearn.utils.extmath import randomized_svd

# dummy data
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])

# create an instance of PCA with the randomized svd_solver
pca = PCA(n_components=2, svd_solver='randomized')

# randomized_svd returns the truncated factors:
# U of shape (m, k), the k largest singular values S,
# and V^T of shape (k, n)
U, S, VT = randomized_svd(X, n_components=2)

# matrices returned by randomized_svd
print(f"Matrix U of shape (m, k):\n{U}\n")
print(f"Singular values S:\n{S}\n")
print(f"Matrix V^T of shape (k, n):\n{VT}\n")

# fitting the pca model
pca.fit(X)

# printing the explained variance ratio
print("Explained Variance using PCA with randomized svd_solver:",
      pca.explained_variance_ratio_)
Output:
Matrix U of shape (m, k):
[[ 0.21956688 -0.53396977]
 [ 0.35264795  0.45713538]
 [ 0.57221483 -0.07683439]
 [-0.21956688  0.53396977]
 [-0.35264795 -0.45713538]
 [-0.57221483  0.07683439]]

Singular values S:
[6.30061232 0.54980396]

Matrix V^T of shape (k, n):
[[-0.83849224 -0.54491354]
 [-0.54491354  0.83849224]]

Explained Variance using PCA with randomized svd_solver: [0.99244289 0.00755711]
Incremental PCA:
The major problem with PCA and most other dimensionality reduction algorithms is that they require the whole dataset to fit in memory at once, which becomes difficult when the data is very large.
Fortunately, there is an algorithm called Incremental PCA which is useful for large training datasets: it splits the data into mini-batches and feeds them to the algorithm one batch at a time. This is known as on-the-fly (online) learning. Since only one mini-batch has to be in memory at a time, memory usage stays under control.
Scikit-Learn provides the class sklearn.decomposition.IncrementalPCA, whose partial_fit method is called once per mini-batch to implement this.
Code:
Python3
# Python3 program to show the working of
# incremental PCA

# importing libraries
import numpy as np
from sklearn.decomposition import IncrementalPCA

# dummy data
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])

# specify the number of batches
no_of_batches = 3

# create an instance of IncrementalPCA
incremental_pca = IncrementalPCA(n_components=2)

# feed the data one mini-batch at a time using partial_fit
for batch in np.array_split(X, no_of_batches):
    incremental_pca.partial_fit(batch)

# transform the data with the incrementally fitted model
final = incremental_pca.transform(X)

# print the projected data (rounded for readability)
print(np.round(final, 4))
Output:
[[-1.3834 -0.2936]
 [-2.2219  0.2513]
 [-3.6053 -0.0422]
 [ 1.3834  0.2936]
 [ 2.2219 -0.2513]
 [ 3.6053  0.0422]]
Kernel PCA:
Kernel PCA is yet another extension of PCA, this time using a kernel. A kernel is a mathematical technique that implicitly maps instances into a very high-dimensional space called the feature space, which is what enables non-linear classification and regression with Support Vector Machines (SVMs). Kernel PCA uses the same trick to perform complex non-linear projections for dimensionality reduction, and it is often employed in novelty detection and image denoising.
Scikit-Learn provides a class KernelPCA in sklearn.decomposition which can be used to perform Kernel PCA.
Code:
Python3
# Python3 program to show the working of
# Kernel PCA

# importing libraries
import numpy as np
from sklearn.decomposition import KernelPCA

# dummy data
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])

# creating an instance of KernelPCA using the rbf kernel
kernel_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.03)

# fit and transform the data
final = kernel_pca.fit_transform(X)

# prints a 2d-array (as n_components = 2)
print(final)
Output:
[[-0.3149893  -0.17944928]
 [-0.46965347 -0.0475298 ]
 [-0.62541667  0.22697909]
 [ 0.3149893  -0.17944928]
 [ 0.46965347 -0.0475298 ]
 [ 0.62541667  0.22697909]]
Kernel PCA is unsupervised, so there is no obvious performance measure for choosing the best kernel and its hyperparameters. However, since dimensionality reduction is usually a preparation step for a supervised learning task, we can build a pipeline that ends in a classifier and use GridSearchCV to select the kernel and gamma that give the best classification accuracy.
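As a rough sketch of this idea (the binary labels y, the choice of LogisticRegression as the downstream classifier, and the parameter grid are all made up for illustration), a pipeline combined with GridSearchCV might look like this:
Python3
# Hypothetical sketch: tuning KernelPCA's kernel and gamma through a
# downstream classifier. The labels y and the parameter grid are
# invented for illustration only.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# dummy data and made-up binary labels
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])

# pipeline: reduce with KernelPCA, then classify
clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])

# search over the kernel and gamma of the KernelPCA step
param_grid = {
    "kpca__kernel": ["rbf", "sigmoid"],
    "kpca__gamma": np.linspace(0.03, 0.05, 5),
}

grid_search = GridSearchCV(clf, param_grid, cv=3)
grid_search.fit(X, y)

# kernel and gamma giving the best cross-validated accuracy
print(grid_search.best_params_)

After fitting, grid_search.best_params_ reports the kernel and gamma that produced the highest cross-validated accuracy, and grid_search.best_estimator_ holds the corresponding fitted pipeline.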