watex.models.get_best_kPCA_params

watex.models.get_best_kPCA_params(X, n_components=2, *, y=None, param_grid=None, clf=None, cv=7, **grid_kws)

Select the kernel and hyperparameters that lead to the best performance, using GridSearchCV.

kPCA is an unsupervised learning algorithm, so there is no obvious performance measure to help select the best kernel and hyperparameter values. However, dimensionality reduction is often a preparation step for a supervised task (e.g. classification), so grid search can be used to select the kernel and hyperparameters that lead to the best performance on that task. By default, a two-step pipeline is created: it first reduces dimensionality to two dimensions using kPCA, then applies LogisticRegression for classification. GridSearchCV is then used to find the kernel and gamma value for kPCA that give the best classification accuracy at the end of the pipeline.
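The default behaviour described above can be approximated directly with scikit-learn. The following is a minimal sketch, not the watex implementation: the synthetic `make_moons` data, the variable names, and the grid values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (an assumption; not the watex dataset).
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Step 1: reduce to two dimensions with kPCA. Step 2: classify.
pipeline = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])

# Candidate kernels and gamma values for the kPCA step.
param_grid = [{
    "kpca__gamma": np.linspace(0.03, 0.05, 10),
    "kpca__kernel": ["rbf", "sigmoid"],
}]

# Cross-validated search, scored on classification accuracy.
search = GridSearchCV(pipeline, param_grid, cv=7, scoring="accuracy")
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

The classifier only serves as a yardstick here: the quantity of interest is the kPCA configuration stored in `best_params`.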

Parameters:

X (ndarray of shape (M, N), where \(M\) is the number of samples and \(N\) the number of features) – Training set. Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M,), where \(M\) is the number of samples) – Training target. Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
n_components (int) – Number of dimensions to preserve. If n_components is between 0 and 1, it indicates the ratio of variance to preserve.

param_grid (list) –

List of parameter grids. For instance:

param_grid=[dict(
    kpca__gamma=np.linspace(0.03, 0.05, 10),
    kpca__kernel=["rbf", "sigmoid"]
    )]
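The `kpca__` prefix follows the scikit-learn convention `<step name>__<parameter>`, so each key targets the pipeline step named `kpca`. As a small sketch, `ParameterGrid` from scikit-learn can be used to enumerate the candidates such a grid produces; the variable names below are illustrative only.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = [dict(
    kpca__gamma=np.linspace(0.03, 0.05, 10),
    kpca__kernel=["rbf", "sigmoid"],
)]

# Expand the grid into the individual parameter combinations.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)  # 10 gamma values x 2 kernels

# The part before the double underscore is the pipeline step name.
step_names = {key.split("__")[0] for key in candidates[0]}
print(n_candidates, step_names)
```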

clf (classifier estimator) –
A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:
```
* fit
* predict
* score
```
It may also be appropriate to implement decision_function, predict_proba and predict_log_proba. It can also be a base estimator or a composite estimator built with a pipeline. For instance:

clf = Pipeline([
    ('kpca', KernelPCA(n_components=2)),
    ('log_reg', LogisticRegression())
    ])
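One way to check that a candidate `clf` meets these requirements before passing it in is sketched below. It uses scikit-learn's `is_classifier` and a plain attribute check; the names `required`, `has_required`, and `pipeline_is_classifier` are illustrative and not part of watex.

```python
from sklearn.base import is_classifier
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])

# The three methods the estimator must implement.
required = ("fit", "predict", "score")
has_required = all(callable(getattr(clf, name, None)) for name in required)

# A pipeline takes its estimator type from its final step.
pipeline_is_classifier = is_classifier(clf)
print(has_required, pipeline_is_classifier)
```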

cv (int, cross-validation splitter or iterable) –

A cross-validation splitting strategy. It is used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another, so as not to overfit the training supervision. Possible inputs for cv are usually:

* An integer, specifying the number of folds in K-fold cross validation.
    K-fold will be stratified over classes if the estimator is a classifier
    (determined by base.is_classifier) and the targets may represent a
    binary or multiclass (but not multioutput) classification problem
    (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for
    splitters available within scikit-learn.
* An iterable yielding train/test splits.

For this function, cv defaults to 7 (see the signature above), i.e. 7-fold cross-validation when an integer is used.
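To illustrate how an integer `cv` is interpreted for a classifier, the sketch below uses scikit-learn's `check_cv` on toy binary labels. The value 7 mirrors the `cv=7` in the signature; the label array is made up for the example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, check_cv

# Toy binary target: 35 samples per class (illustrative only).
y = np.array([0, 1] * 35)

# For a classifier with binary or multiclass targets, an integer
# is resolved to a stratified K-fold splitter.
splitter = check_cv(7, y, classifier=True)
n_splits = splitter.get_n_splits()
is_stratified = isinstance(splitter, StratifiedKFold)

# Size of each held-out fold.
X_dummy = np.zeros((len(y), 1))
test_sizes = [len(test) for _, test in splitter.split(X_dummy, y)]
print(n_splits, is_stratified, test_sizes)
```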

grid_kws (dict) – Additional keyword arguments passed to GridSearchCV.
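The keyword forwarding can be pictured with a small wrapper. This is a sketch under assumptions: `run_grid_search` is a hypothetical helper, not a watex function, and the data and grid are synthetic. It only shows how extra keywords such as `scoring` might reach `GridSearchCV`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def run_grid_search(estimator, param_grid, X, y, cv=7, **grid_kws):
    """Hypothetical helper: forward extra keywords to GridSearchCV."""
    search = GridSearchCV(estimator, param_grid, cv=cv, **grid_kws)
    return search.fit(X, y)


# Synthetic data, for illustration only.
X, y = make_classification(n_samples=140, n_features=5, random_state=0)

search = run_grid_search(
    LogisticRegression(), {"C": [0.1, 1.0]}, X, y,
    scoring="accuracy", return_train_score=True,
)
print(search.best_params_)
```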

Examples

>>> import numpy as np
>>> from sklearn.decomposition import KernelPCA
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.pipeline import Pipeline
>>> from watex.analysis.dimensionality import get_best_kPCA_params
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue analysis data')
>>> param_grid = [dict(
...     kpca__gamma=np.linspace(0.03, 0.05, 10),
...     kpca__kernel=["rbf", "sigmoid"]
...     )]
>>> clf = Pipeline([
...     ('kpca', KernelPCA(n_components=2)),
...     ('log_reg', LogisticRegression())
...     ])
>>> kpca_best_params = get_best_kPCA_params(
...     X, y=y, scoring='accuracy',
...     n_components=2, clf=clf,
...     param_grid=param_grid)
>>> kpca_best_params
{'kpca__gamma': 0.03, 'kpca__kernel': 'rbf'}
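Once the best parameters are known, they can be applied back to a pipeline with `set_params`. The sketch below assumes the returned value is a dict keyed like the example output, and it uses synthetic `make_moons` data instead of the Bagoue dataset so that it runs without watex.

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Dict shaped like the example output (values copied from it).
best_params = {"kpca__gamma": 0.03, "kpca__kernel": "rbf"}

# Synthetic stand-in data (an assumption; not the watex dataset).
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

clf = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("log_reg", LogisticRegression()),
])

# Route each "<step>__<param>" key to the matching pipeline step.
clf.set_params(**best_params)
clf.fit(X, y)

# Project the data with the tuned kPCA step.
X_reduced = clf.named_steps["kpca"].transform(X)
print(X_reduced.shape)
```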