Decomposition

Steps behind principal component analysis (PCA) and matrix decomposition.

watex.analysis.decomposition.decision_region(X, y, clf, Xt=None, yt=None, random_state=42, test_size=0.3, scaling=True, split=False, n_components=2, view='X', resolution=0.02, return_expl_variance_ratio=False, return_axe=False, axe=None, **kws)

View decision regions for the training data reduced to two principal component axes.

Parameters:
  • X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M is the number of samples) – Train target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • Xt (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.

  • yt (array-like of shape (M,), where M is the number of samples) – Test target; the target values corresponding to the test set Xt.

  • clf (classifier estimator) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.

  • random_state (int, default=42) – Seed controlling the shuffling of the data before splitting.

  • test_size (float < 1, default=0.3) – Proportion of the data held out for the test set.

  • split (bool, default=False) – Split (X, y) into training and test sets, the test set being (Xt, yt). When set to True, (X, y) is assumed to be the whole dataset with target y.

  • n_components (int or float, default=2) – Number of principal components to retain. If given as a float ratio such as 0.95, it is the fraction of variance to keep (here 95%). The resulting number of components can then be read from the fitted scikit-learn estimator attribute `<estimator>.n_components_`.

  • view (str, {'X', 'Xt', None}, default='X') – Kind of visualization. 'X' and 'Xt' show the decision regions of the training and test sets, respectively. If None, the plot is muted.

  • resolution (float, default=0.02) – Step size of the NumPy meshgrid over which the decision regions are drawn; smaller values give a finer plot.

  • return_expl_variance_ratio (bool, default=False) – If True, return the PCA explained variance ratio of all principal components.

  • return_axe (bool, default=False) – If True, return the Matplotlib Axes object.

  • axe (matplotlib.axes.Axes, optional) – Axes on which to draw. If not supplied, it is created.

  • kws (dict) – Additional keyword arguments passed to the scikit-learn function sklearn.model_selection.train_test_split().

Returns:

The PCA-transformed training set, or the PCA explained variance ratio if return_expl_variance_ratio=True.

Return type:

ndarray | array-like (if return_expl_variance_ratio=True)

Examples

>>> from watex.datasets import fetch_data
>>> from watex.exlib.sklearn import SimpleImputer, LogisticRegression
>>> from watex.utils import selectfeatures
>>> from watex.analysis.decomposition import decision_region
>>> data = fetch_data("bagoue original").get('data=dfy1') # encoded flow categories
>>> y = data.flow ; X = data.drop(columns='flow')
>>> # select the numerical features
>>> X = selectfeatures(X, include='number')
>>> # impute the missing data
>>> X = SimpleImputer().fit_transform(X)
>>> lr_clf = LogisticRegression(multi_class='ovr', random_state=1, solver='lbfgs')
>>> Xpca = decision_region(X, y, clf=lr_clf, split=True, view='Xt') # test set view
>>> Xpca[0]
array([-1.02925449,  1.42195127])
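A decision-region plot of this kind typically uses the meshgrid technique:

1. Project the data onto two principal axes.
2. Predict a label at every node of a grid whose step is resolution.
3. Shade each node by its predicted class.

The following is a minimal NumPy sketch of that idea under stated assumptions. It is not the watex implementation:

- pca_2d, decision_grid and ToyNearestCentroid are hypothetical names.
- The toy nearest-centroid classifier stands in for clf.
- The one-unit margin around the data is an arbitrary choice.

```python
import numpy as np


def pca_2d(X):
    """Standardize X and project it onto its two leading principal axes."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    eig_vals, eig_vecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
    order = np.argsort(eig_vals)[::-1]      # largest variance first
    W = eig_vecs[:, order[:2]]              # d x 2 projection matrix
    return X_std @ W


class ToyNearestCentroid:
    """Hypothetical stand-in for `clf`: predicts the class with the closest mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every point to every class centroid.
        dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


def decision_grid(X2, clf, resolution=0.02):
    """Predict a label at each node of a mesh covering the 2-D points X2."""
    # Pad the data range by one unit on each side (arbitrary margin).
    x1_min, x1_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
    x2_min, x2_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = clf.predict(np.c_[xx1.ravel(), xx2.ravel()])
    return xx1, xx2, Z.reshape(xx1.shape)


# Two synthetic, well-separated classes in four dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.repeat([0, 1], 50)

X2 = pca_2d(X)
clf = ToyNearestCentroid().fit(X2, y)
xx1, xx2, Z = decision_grid(X2, clf, resolution=0.1)
```

To draw the result with Matplotlib, plt.contourf(xx1, xx2, Z, alpha=0.3) would shade the regions, and a scatter plot of X2 colored by y would overlay the examples. Halving resolution roughly quadruples the number of predictions, so coarser values are cheaper to draw.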
watex.analysis.decomposition.extract_pca(X)

A naive approach to extracting the principal components from the training set X.

Parameters:

X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Returns:

Eigenvalues, eigenvectors and the scaled (standardized) training set Xsc.

Return type:

Tuple (eigen_vals, eigen_vecs, Xsc)

Examples

>>> from watex.exlib.sklearn import SimpleImputer
>>> from watex.utils import selectfeatures
>>> from watex.datasets import fetch_data
>>> from watex.analysis import extract_pca
>>> data = fetch_data("bagoue original").get('data=dfy1') # encoded flow categories
>>> y = data.flow ; X = data.drop(columns='flow')
>>> # select the numerical features
>>> X = selectfeatures(X, include='number')
>>> # impute the missing data
>>> X = SimpleImputer().fit_transform(X)
>>> eigval, eigvecs, _ = extract_pca(X)
>>> eigval
array([2.09220756, 1.43940464, 0.20251943, 1.08913226, 0.97512157,
       0.85749283, 0.64907948, 0.71364687])

Notes

Each subsequent principal component (PC) has the largest variance possible under the constraint that it is uncorrelated with (orthogonal to) the preceding PCs. Even if the input features are correlated, the resulting PCs are mutually orthogonal (uncorrelated). PCA directions are highly sensitive to data scaling. If the features were measured on different scales and all features should be given equal importance, standardize them before applying PCA.

The numpy.linalg.eig function is designed to operate on both symmetric and non-symmetric square matrices. However, it may return complex eigenvalues in certain cases. The related function numpy.linalg.eigh decomposes Hermitian matrices, which makes it numerically more stable for symmetric matrices such as the covariance matrix. numpy.linalg.eigh always returns real eigenvalues.
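Following the notes above, a naive PCA extraction can apply numpy.linalg.eigh to the covariance matrix of the standardized data. The sketch below is an assumption about how such a function may look; it is not the watex source. Two points to note:

- extract_pca_sketch is a hypothetical name.
- The features are standardized with the population standard deviation, as scikit-learn's StandardScaler does.

```python
import numpy as np


def extract_pca_sketch(X):
    """Return (eigen_vals, eigen_vecs, X_std) for the covariance of standardized X."""
    # Standardize each feature to zero mean and unit (population) variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # d x d covariance matrix of the features; it is symmetric.
    cov_mat = np.cov(X_std, rowvar=False)
    # eigh targets symmetric/Hermitian matrices: real eigenvalues in ascending
    # order, with the matching eigenvectors stored as columns.
    eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)
    return eigen_vals, eigen_vecs, X_std


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
vals, vecs, X_std = extract_pca_sketch(X)
```

Because eigh returns the eigenvalues in ascending order, reverse them (for example with np.argsort(vals)[::-1]) before ranking the principal components. The sum of the eigenvalues equals the trace of the covariance matrix, which is a convenient sanity check.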

watex.analysis.decomposition.feature_transformation(X, y=None, n_components=2, positive_class=1, view=False)

Transform X into new principal components after decomposing the covariance matrix.

Parameters:
  • X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M is the number of samples) – Train target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • n_components (int, default=2) – Number of principal components to keep, ranked by their share of the total explained variance.

  • positive_class (int) – Class label, given as an integer identifier within the class representation.

  • view (bool, default=False) – If True, plot an overview of the total explained variance.

Returns:

X_transf – The training set X transformed onto the principal components.

Return type:

nd-array

Examples

>>> from watex.analysis import feature_transformation
>>> # Use the X, y value in the example of `extract_pca` function
>>> Xtransf = feature_transformation(X, y=y, positive_class=2, view=True)
>>> Xtransf[0]
array([-1.0168034 ,  2.56417088])
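The transformation amounts to two steps. First, build a projection matrix W from the n_components eigenvectors with the largest eigenvalues. Second, compute X_std @ W. The sketch below illustrates this under the assumption that feature_transformation follows the standard covariance-eigendecomposition approach. project_onto_pcs is a hypothetical helper, and it ignores y and positive_class.

```python
import numpy as np


def project_onto_pcs(X, n_components=2):
    """Project standardized X onto its n_components leading principal axes."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    eigen_vals, eigen_vecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
    # Rank eigenpairs by decreasing eigenvalue (explained variance).
    order = np.argsort(eigen_vals)[::-1][:n_components]
    W = eigen_vecs[:, order]            # d x k projection matrix
    return X_std @ W, eigen_vals[order]


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))  # correlated features
X_transf, top_vals = project_onto_pcs(X, n_components=2)
```

The covariance of the projected data is diagonal, with the kept eigenvalues on the diagonal. This illustrates that the principal components are mutually uncorrelated.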
watex.analysis.decomposition.linear_discriminant_analysis(X, y, n_components=2, view=False, verbose=0, return_X=True)

Linear Discriminant Analysis (LDA).

LDA is used as a feature extraction technique to increase computational efficiency and to reduce the degree of overfitting caused by the curse of dimensionality in non-regularized models. The general concept behind LDA is very similar to principal component analysis (PCA). However, PCA attempts to find the orthogonal component axes of maximum variance in a dataset, whereas the goal of LDA is to find the feature subspace that optimizes class separability. The main steps required to perform LDA are summarized below:

  • Standardize the d-dimensional dataset (d is the number of features).

  • For each class \(i\), compute the d-dimensional mean vector \(m_i\), which stores the mean feature values \(\mu_m\) with respect to the examples of class \(i\):

    \[m_i = \frac{1}{n_i} \sum_{x\in D_i} x_m\]
  • Construct the between-class scatter matrix \(S_B\) and the within-class scatter matrix \(S_W\). The individual scatter matrices \(S_i\) of each class are scaled by the number of class examples \(n_i\) before we sum them up as the scatter matrix \(S_W\):

    \[\Sigma_i = \frac{1}{n_i} S_i = \frac{1}{n_i} \sum_{x\in D_i} (x-m_i)(x-m_i)^T\]

    The scaled within-class scatter matrix is therefore the same as the covariance matrix. Next, compute the between-class scatter matrix \(S_B\):

    \[S_B = \sum_{i=1}^{c} n_i (m_i-m)(m_i-m)^T\]

    where \(m\) is the overall mean computed from the examples of all classes, \(c\) is the number of classes, and \(n_i\) is the number of examples in class \(i\).

  • Compute the eigenvectors and corresponding eigenvalues of the matrix \(S_W^{-1}S_B\).

  • Sort the eigenvalues in decreasing order to rank the corresponding eigenvectors.

  • Choose the \(k\) eigenvectors that correspond to the \(k\) largest eigenvalues to construct a \(d \times k\)-dimensional transformation matrix \(W\); the eigenvectors are the columns of this matrix.

  • Project the examples onto the new feature subspace using the transformation matrix \(W\).
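The steps above can be sketched in NumPy as follows. This is a minimal illustration that assumes standardized inputs and uses the class-wise covariance matrices as the scaled scatter matrices. lda_sketch is a hypothetical name and not the watex implementation, which may differ in details such as plotting and return values. numpy.linalg.eig is used because \(S_W^{-1}S_B\) is not symmetric in general.

```python
import numpy as np


def lda_sketch(X, y, n_components=2):
    """Return (X_lda, W) following the LDA steps listed above."""
    # Step 1: standardize the d-dimensional dataset.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    d = X_std.shape[1]
    m = X_std.mean(axis=0)                        # overall mean vector
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        X_c = X_std[y == c]
        m_i = X_c.mean(axis=0)                    # step 2: class mean vector
        # Step 3: scaled within-class scatter (1/n_i) S_i, i.e. the class covariance.
        S_W += np.cov(X_c, rowvar=False, bias=True)
        diff = (m_i - m).reshape(d, 1)
        S_B += X_c.shape[0] * diff @ diff.T       # between-class scatter term
    # Step 4: eigen-decompose S_W^{-1} S_B.
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    # Steps 5-6: rank by decreasing eigenvalue and keep the top-k columns as W.
    order = np.argsort(eig_vals.real)[::-1][:n_components]
    W = eig_vecs[:, order].real                   # d x k transformation matrix
    # Step 7: project the examples onto the new feature subspace.
    return X_std @ W, W


rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]])
X = np.vstack([rng.normal(mu, 1.0, (60, 4)) for mu in means])
y = np.repeat([0, 1, 2], 60)
X_lda, W = lda_sketch(X, y, n_components=2)
```

Because \(S_B\) is built from the class means around the overall mean, its rank is at most the number of classes minus one. Requesting more discriminants than that therefore adds no class-separation information.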

Parameters:
  • X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M is the number of samples) – Train target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • n_components (int, default=2) – Number of the most discriminative eigenvectors to keep.

  • return_X (bool, default=True) – If True, return the training set transformed onto the n_components discriminants; otherwise, return the transformation matrix W.

  • view (bool, default=False) – If True, visualize the LDA plot.

Returns:

X or W – The transformed training set (X), or the transformation matrix (W) whose columns are the most discriminative eigenvectors.

Return type:

ndarray of shape (n_samples, n_components) for X, or (n_features, n_components) for W

Examples

>>> from watex.datasets import fetch_data
>>> from watex.exlib.sklearn import SimpleImputer
>>> from watex.utils import selectfeatures
>>> from watex.analysis.decomposition import linear_discriminant_analysis
>>> data = fetch_data("bagoue original").get('data=dfy1') # encoded flow
>>> y = data.flow ; X = data.drop(columns='flow')
>>> # select the numerical features
>>> X = selectfeatures(X, include='number')
>>> # impute the missing data
>>> X = SimpleImputer().fit_transform(X)
>>> Xtr = linear_discriminant_analysis(X, y, view=True)
watex.analysis.decomposition.total_variance_ratio(X, view=False)

Compute the total variance ratio.

The explained variance ratio of an eigenvalue \(\lambda_j\) is simply the fraction of \(\lambda_j\) over the total sum of the eigenvalues:

\[\text{explained variance ratio} = \frac{\lambda_j}{\sum_{k=1}^{d} \lambda_k}\]

Using NumPy's cumsum function, we can then calculate the cumulative sum of the explained variances. It is plotted with Matplotlib when view is set to True.
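The ratio above and its cumulative sum take only a few lines of NumPy. The sketch below makes these assumptions:

- variance_ratios and n_components_for are hypothetical helpers.
- The input eigenvalues are the ones printed in the extract_pca example.
- n_components_for shows one way to turn a variance ratio such as 0.95 into a number of components, as described for the n_components parameter of decision_region.

```python
import numpy as np


def variance_ratios(eigen_vals):
    """Per-component and cumulative explained variance ratios, largest first."""
    vals = np.sort(np.asarray(eigen_vals, dtype=float))[::-1]
    var_exp = vals / vals.sum()          # lambda_j / sum_k lambda_k
    return var_exp, np.cumsum(var_exp)


def n_components_for(cum_var_exp, ratio=0.95):
    """Smallest number of leading components whose cumulative ratio reaches `ratio`."""
    k = int(np.searchsorted(cum_var_exp, ratio)) + 1
    return min(k, len(cum_var_exp))      # guard against round-off when ratio ~ 1


# Eigenvalues printed in the extract_pca example
eigval = [2.09220756, 1.43940464, 0.20251943, 1.08913226, 0.97512157,
          0.85749283, 0.64907948, 0.71364687]
var_exp, cum_var_exp = variance_ratios(eigval)
```

With these eigenvalues, the cumulative ratios match the total_variance_ratio example output. n_components_for(cum_var_exp, 0.95) returns 7, because 0.97474381 is the first cumulative value at or above 0.95.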

Parameters:
  • X (ndarray of shape (M, N)) – Training set with M examples and N features.

  • view (bool, default=False) – If True, plot an overview of the total explained variance.

Returns:

cum_var_exp – Cumulative sum of the explained variance ratios.

Return type:

array-like

Examples

>>> from watex.analysis import total_variance_ratio
>>> # Use the X value in the example of `extract_pca` function
>>> cum_var = total_variance_ratio(X, view=True)
>>> cum_var
array([0.26091916, 0.44042728, 0.57625294, 0.69786032, 0.80479823,
       0.89379712, 0.97474381, 1.        ])