watex.analysis.decision_region

watex.analysis.decision_region(X, y, clf, Xt=None, yt=None, random_state=42, test_size=0.3, scaling=True, split=False, n_components=2, view='X', resolution=0.02, return_expl_variance_ratio=False, return_axe=False, axe=None, **kws)

View decision regions for the training data reduced to two principal component axes.

Parameters:
  • X (ndarray of shape (M, N), where M is the number of samples and N the number of features) – Training set. Data observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or by a vector of precomputed (dis)similarities with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

  • y (array-like of shape (M,), where M is the number of samples) – Training target. Data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • Xt (ndarray of shape (M, N), where M is the number of samples and N the number of features, default None) – Test set. Data observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.

  • yt (array-like of shape (M,), where M is the number of samples, default None) – Test target. Data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

  • clf (estimator object, a classifier) –

    A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:

    * fit
    * predict
    * score
    

    It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.

  • random_state (int, default 42) – Seed for the random shuffling of the data.

  • test_size (float < 1, default 0.3) – Proportion of the data held out for the test set.

  • split (bool, default False) – Split (X, y) into training and test sets (Xt, yt). When set to True, the given (X, y) is assumed to be the whole dataset, with y as its target.

  • n_components (int or float, default 2) – Number of principal components to retrieve. If given as a ratio, for instance 0.95, it is the fraction of variance to keep (here 95%); the resulting number of components can then be read from the fitted scikit-learn estimator attribute <estimator>.n_components_.

  • view (str, one of 'X', 'Xt' or None, default 'X') – Kind of visualization. 'X' and 'Xt' show the decision regions for the training set and the test set respectively. If set to None, no plot is drawn.

  • resolution (float, default 0.02) – Step of the NumPy meshgrid over which the decision regions are drawn; a smaller step gives a finer plot.

  • return_expl_variance_ratio (bool, default False) – If True, return the explained variance ratio of all principal components.

  • return_axe (bool, default False) – If True, return the matplotlib Axes object.

  • axe (matplotlib.axes.Axes, optional) – Axes to draw on. If not supplied, one is created.

  • kws (dict) – Additional keyword arguments passed to the scikit-learn function sklearn.model_selection.train_test_split().
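
The parameters above describe a sequence of steps: an optional split, scaling, reduction to two principal components, and a classifier fitted on the reduced data. The following is a minimal sketch of such a sequence written directly with scikit-learn. It is not the watex implementation: the synthetic data, the use of StandardScaler for scaling=True, and all variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not a watex dataset): 200 samples, 6 features, 3 classes.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)

# split=True: hold out test_size of the data as (Xt, yt).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# scaling=True is assumed here to mean standardization fitted on the training set.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# n_components=2: project both sets onto the first two principal axes.
pca = PCA(n_components=2).fit(X_train_s)
X_train_pca = pca.transform(X_train_s)
X_test_pca = pca.transform(X_test_s)

# Fit the classifier on the reduced training data and score it on the test set.
clf = LogisticRegression(random_state=1).fit(X_train_pca, y_train)
test_accuracy = clf.score(X_test_pca, y_test)
```

Fitting the scaler and the PCA on the training portion only, then applying them to the test portion, keeps information from the test set out of the fitted transforms.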

Returns:

The training set X transformed by PCA, or the PCA explained variance ratio if return_expl_variance_ratio=True.

Return type:

ndarray | array-like (when return_expl_variance_ratio=True)
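
To make the two possible return values concrete, here is a hedged sketch using scikit-learn's PCA directly, on hypothetical random data. It shows the transformed array obtained with an integer n_components, and the explained variance ratio together with n_components_ when n_components is given as a variance fraction. Whether decision_region delegates to PCA in exactly this way is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 5 features with very different spreads.
X = rng.normal(size=(100, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])

# Integer n_components: the transformed data has exactly that many columns.
X_2d = PCA(n_components=2).fit_transform(X)

# Float n_components: keep as many components as needed to explain
# at least that fraction of the variance.
pca_ratio = PCA(n_components=0.95).fit(X)
n_kept = pca_ratio.n_components_                       # number of components retained
variance_ratio = pca_ratio.explained_variance_ratio_   # one entry per retained component
cumulative_ratio = variance_ratio.sum()
```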

Examples

>>> from watex.datasets import fetch_data
>>> from watex.exlib.sklearn import SimpleImputer, LogisticRegression
>>> from watex.analysis.decomposition import decision_region
>>> from watex.utils import selectfeatures  # assumed import path; missing from the original example
>>> data = fetch_data("bagoue original").get('data=dfy1')  # encoded flow categories
>>> y = data.flow; X = data.drop(columns='flow')
>>> # select the numerical features
>>> X = selectfeatures(X, include='number')
>>> # impute the missing data
>>> X = SimpleImputer().fit_transform(X)
>>> lr_clf = LogisticRegression(multi_class='ovr', random_state=1, solver='lbfgs')
>>> Xpca = decision_region(X, y, clf=lr_clf, split=True, view='Xt')  # test set view
>>> Xpca[0]
array([-1.02925449,  1.42195127])
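
A decision-region plot is commonly produced by evaluating the fitted classifier on a regular grid covering the two principal axes. The sketch below illustrates that idea with NumPy only. It assumes that resolution is the grid step, and both NearestMeanClassifier and decision_grid are hypothetical names introduced for illustration, not part of watex or scikit-learn. The toy classifier also shows the fit, predict and score methods that clf is required to implement.

```python
import numpy as np

class NearestMeanClassifier:
    """Hypothetical toy classifier exposing fit, predict and score."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One mean vector per class.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class mean; pick the closest.
        dist = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[dist.argmin(axis=1)]

    def score(self, X, y):
        return float(np.mean(self.predict(X) == y))

def decision_grid(clf, X2d, resolution=0.02, margin=1.0):
    """Predict a class label at every node of a grid spanning the 2-D data."""
    x1 = np.arange(X2d[:, 0].min() - margin, X2d[:, 0].max() + margin, resolution)
    x2 = np.arange(X2d[:, 1].min() - margin, X2d[:, 1].max() + margin, resolution)
    xx1, xx2 = np.meshgrid(x1, x2)
    Z = clf.predict(np.column_stack([xx1.ravel(), xx2.ravel()]))
    return xx1, xx2, Z.reshape(xx1.shape)

# Two well-separated synthetic clusters standing in for PCA-reduced data.
rng = np.random.default_rng(42)
X2d = np.vstack([rng.normal(-2, 0.5, size=(30, 2)),
                 rng.normal(2, 0.5, size=(30, 2))])
labels = np.repeat([0, 1], 30)

clf = NearestMeanClassifier().fit(X2d, labels)
xx1, xx2, Z = decision_grid(clf, X2d, resolution=0.1)
```

The arrays xx1, xx2 and Z can then be passed to matplotlib's contourf(xx1, xx2, Z) to shade the regions, with the samples drawn on top as a scatter plot. A smaller resolution gives smoother boundaries at the cost of more predictions.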