watex.analysis.decision_region#

watex.analysis.decision_region(X, y, clf, Xt=None, yt=None, random_state=42, test_size=0.3, scaling=True, split=False, n_components=2, view='X', resolution=0.02, return_expl_variance_ratio=False, return_axe=False, axe=None, **kws)[source]#

View decision regions for the training data reduced to two principal component axes.

Parameters

X (Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)) – training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M, ) :math:`M=m-samples) – train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
Xt (Ndarray ( M x N matrix where M=m-samples, & N=n-features)) – Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning.The notation is uppercase to denote that it is ordinarily a matrix.
yt (array-like, shape (M, ) M=m-samples,) – test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
clf (callable, always as a function, classifier estimator) –
A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:
```
* fit
* predict
* score
```
It may also be appropriate to implement decision_function, predict_proba and predict_log_proba.
random_state (int, default {42}) – state of shuffling the data
test_size (float < 1 , default {.3}) – the size to keep remainder data into the test set .
split (bool, False) – Split (X,y) data into a training and test sets(Xt, yt). Here, it value is triggered to True, we assume (X, y) previously given are all the whole dataset with target y.
n_components (int, float 2 , default {2}) – the number of principal component to retrieve. If value is given as a ratio for instance ‘.95’ i.e. the ratio of keeping variance is 95% and the n_components can be get using the attributes scikit-learn getter as `<estimator>.n_components_
view (str , ['X', 'Xt', None]) – the kind of vizualization. ‘X’, ‘Xt’ mean the training and test set decision region visualization respectively. If set to ``None``(default), the view are muted.
resolution (float, default{.02}) – level of the extension of numpy meshgrip to tighting layout the plot.
return_expl_variance_ratio (bool, default is {False}) – returns the PCA variance ratio explaines of all principal components.
return_axes (bool, default=False,) – Return matplotlib object axe
ax (Matplotlib.Axes object, optional) – If not supplied, it is created.
kws (dict) – Additional keywords arguments passed to the scikit-learn function sklearn.model_selection.train_test_split()

Returns

X PCA training set transformed or PCA explained variance ratio.

Return type

nd-array | arraylike (return_expl_variance_ratio=True)

Examples

>>> from watex.datasets import fetch_data
>>> from watex.exlib.sklearn import SimpleImputer, LogisticRegression
>>> from watex.analysis.decomposition import decision_region
>>> data= fetch_data("bagoue original").get('data=dfy1') # encoded flow categories
>>> y = data.flow ; X= data.drop(columns='flow')
>>> # select the numerical features
>>> X =selectfeatures(X, include ='number')
>>> # imputed the missing data
>>> X = SimpleImputer().fit_transform(X)
>>> lr_clf = LogisticRegression(multi_class ='ovr', random_state =1, solver ='lbfgs')
>>> Xpca= decision_region(X, y, clf=lr_clf, split = True, view ='Xt') # test set view
>>> Xpca[0]
... array([-1.02925449,  1.42195127])