watex.analysis.extract_pca

watex.analysis.extract_pca(X)

A naive approach to extracting principal components from the training set X.

Parameters: X (ndarray of shape (M, N), with \(M\) samples and \(N\) features) – Training set. Denotes data that is observed at training and prediction time and used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When it is a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, in which case a feature extractor or a pairwise metric may be required to turn it into one before learning a model.
Returns: Eigenvalues, eigenvectors, and Xsc, the scaled (standardized) training set
Return type: Tuple (eigen_vals, eigen_vecs, Xsc)

Examples

>>> from watex.exlib.sklearn import SimpleImputer
>>> from watex.utils import selectfeatures
>>> from watex.datasets import fetch_data
>>> from watex.analysis import extract_pca
>>> data = fetch_data("bagoue original").get('data=dfy1')  # encoded flow categories
>>> y = data.flow
>>> X = data.drop(columns='flow')
>>> # select the numerical features
>>> X = selectfeatures(X, include='number')
>>> # impute the missing data
>>> X = SimpleImputer().fit_transform(X)
>>> eigval, eigvecs, _ = extract_pca(X)
>>> eigval
array([2.09220756, 1.43940464, 0.20251943, 1.08913226, 0.97512157,
       0.85749283, 0.64907948, 0.71364687])

Notes

Each successive principal component (PC) has the largest possible variance under the constraint that it is uncorrelated (orthogonal) to the other PCs. Even if the input features are correlated, the resulting PCs are mutually orthogonal (uncorrelated). Note that the PCA directions are highly sensitive to data scaling: the features need to be standardized prior to PCA if they were measured on different scales and all features are to be given equal importance.
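
The standardization and orthogonality points above can be illustrated with a minimal sketch. It is not the watex implementation: `naive_pca` and the synthetic `X_demo` array are hypothetical names introduced here, and the sketch assumes the decomposition is applied to the covariance matrix of the standardized features, with eigenvalues sorted in descending order.

```python
import numpy as np


def naive_pca(X):
    """Hypothetical helper (not the watex implementation).

    Standardize X, then eigendecompose the covariance matrix of the
    standardized features.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature to zero mean and unit (sample) variance.
    Xsc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized features, shape (N, N).
    cov = np.cov(Xsc, rowvar=False)
    # The covariance matrix is symmetric, so eigh is applicable.
    eigen_vals, eigen_vecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    order = np.argsort(eigen_vals)[::-1]
    return eigen_vals[order], eigen_vecs[:, order], Xsc


# Synthetic data: two correlated features on very different scales,
# plus one independent feature.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_demo = np.hstack([
    base * 1000.0 + rng.normal(size=(200, 1)) * 100.0,  # large scale
    base * 0.01 + rng.normal(size=(200, 1)) * 0.005,    # small scale
    rng.normal(size=(200, 1)),                          # independent
])

eigen_vals, eigen_vecs, Xsc = naive_pca(X_demo)

# Project the standardized data onto the principal directions.
scores = Xsc @ eigen_vecs
# Off-diagonal entries are ~0: the components are uncorrelated even
# though the first two input features are correlated.
scores_cov = np.cov(scores, rowvar=False)
```

In this sketch the eigenvectors form an orthonormal basis, and the covariance matrix of the projected data is diagonal with the eigenvalues on its diagonal. Because every standardized feature has unit variance, the eigenvalues sum to the number of features.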

numpy.linalg.eig was designed to operate on both symmetric and non-symmetric square matrices. However, you may find that it returns complex eigenvalues in certain cases. A related function, numpy.linalg.eigh, has been implemented to decompose Hermitian matrices; it is numerically more stable when working with symmetric matrices such as the covariance matrix. numpy.linalg.eigh always returns real eigenvalues.
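
The difference between the two NumPy routines can be seen in a short sketch. The matrices below are illustrative assumptions chosen for this example, not data from watex: a 90-degree rotation matrix, which is non-symmetric and has purely imaginary eigenvalues, and the covariance matrix of random data, which is symmetric.

```python
import numpy as np

# A non-symmetric real matrix (90-degree rotation): the general-purpose
# solver returns complex eigenvalues for it.
rotation = np.array([[0.0, -1.0],
                     [1.0, 0.0]])
vals_general, _ = np.linalg.eig(rotation)

# A covariance matrix is symmetric, so eigh applies. It returns real
# eigenvalues in ascending order.
rng = np.random.default_rng(42)
data = rng.normal(size=(50, 4))
cov = np.cov(data, rowvar=False)
vals_sym, vecs_sym = np.linalg.eigh(cov)

print(vals_general.dtype)  # a complex dtype
print(vals_sym.dtype)      # a real floating-point dtype
```

Note that numpy.linalg.eig does not guarantee any ordering of the eigenvalues, whereas numpy.linalg.eigh returns them in ascending order. Either way, the eigenvalues usually need to be sorted in descending order before ranking the components.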