watex.cases.prepare.base_transform

watex.cases.prepare.base_transform(X, n_components=0.95, attr_names=None, attr_indexes=None, operator=None, view=False, **kws)

Transform X using PCA and plot the variance ratio by experimenting with attribute combinations.

Create new attributes using feature indexes or a literal string operator, and prepare the data for the PCA variance plot.
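The attribute combination described above can be pictured with a short NumPy sketch. This is not the watex implementation: the helper name `combine_features` and its arguments are hypothetical, and the sketch only assumes that a literal operator string is mapped to an element-wise operation between two feature columns.

```python
import operator

import numpy as np

# Map each literal operator string to an element-wise function.
# The set of symbols mirrors the ones listed for the `operator` parameter.
_OPERATORS = {
    "/": operator.truediv,
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "%": operator.mod,
}


def combine_features(X, i, j, op="/"):
    """Hypothetical helper: append the column X[:, i] <op> X[:, j] to X."""
    if op not in _OPERATORS:
        raise ValueError(f"Unsupported operator: {op!r}")
    X = np.asarray(X, dtype=float)
    new_col = _OPERATORS[op](X[:, i], X[:, j])
    # Keep the original features and add the combined one as a last column.
    return np.column_stack([X, new_col])


X = np.array([[2.0, 4.0],
              [9.0, 3.0]])
X_new = combine_features(X, 0, 1, op="/")
print(X_new.shape)  # two samples, three features after the combination
```

Selecting the two columns by name instead of by position would work the same way on a pandas DataFrame; positional indexes are used here only to keep the sketch short.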

Parameters:
  • X (ndarray, M x N matrix where M = number of samples and N = number of features) – Training set. Denotes data that is observed at training and prediction time and used as independent variables in learning. When X is a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • n_components (float or int) – Number of dimensions to preserve. If `n_components` is a float between 0. and 1., it indicates the ratio of variance to preserve. If None, 95% of the variance is preserved; the default value in the signature, 0.95, has the same effect.

  • attr_names (list of str, optional) – List of features to combine. The new feature values are computed with the operation given by the operator parameter. By default, the combination is the ratio of the given numerical attributes. For instance, attr_names=['lwi', 'ohmS'] divides the feature 'lwi' by 'ohmS'.

  • attr_indexes (list of int, optional) – Indexes of the features to combine. A user warning is raised if any index does not match the columns of the dataframe or array.

  • operator (str, default='/') – Type of operation to perform when combining features. Can be one of ['/', '+', '-', '*', '%'].
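To make the float form of n_components concrete, the sketch below picks the smallest number of leading components whose cumulative explained-variance ratio reaches a threshold. It illustrates the general rule only and is not taken from watex: the function name `components_for_variance` and the sample ratios are hypothetical, and the exact comparison used at the boundary is an assumption.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Hypothetical helper: smallest k whose cumulative ratio reaches threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in the interval (0, 1]")
    # Walk the running total of the ratios, which are assumed to be sorted
    # from the most to the least informative component.
    for k, total in enumerate(accumulate(explained_variance_ratio), start=1):
        if total >= threshold:
            return k
    # The threshold was never reached: keep every component.
    return len(explained_variance_ratio)


# Made-up ratios for four components; the running totals are 0.7, 0.9, 0.98, 1.0.
ratios = [0.7, 0.2, 0.08, 0.02]
n_kept = components_for_variance(ratios, threshold=0.95)
print(n_kept)  # 3
```

An integer n_components needs no such search, since it states the number of dimensions to keep directly.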

Returns:

  • X (ndarray or pd.DataFrame) – New array or dataframe with the new combined attributes.

Examples

>>> from watex.view.mlplot import MLPlots
>>> from watex.datasets import fetch_data
>>> from watex.analysis import pcaVarianceRatio
>>> plot_kws = {'lc': (.9, 0., .8),
...             'lw': 3.,             # line width
...             'font_size': 7.,
...             'show_grid': True,    # visualize grid
...             'galpha': 0.2,        # grid alpha
...             'glw': .5,            # grid line width
...             'gwhich': 'major',    # which ticks the grid follows
...             # 'fs': 3.,           # coeff to manage font_size
...             }
>>> X, _ = fetch_data('Bagoue data analysis')
>>> mlObj = MLPlots(**plot_kws)
>>> pcaVarianceRatio(mlObj, X, plot_var_ratio=True)