watex.models package
The models sub-package focuses on the training and validation phases. It also
provides a set of grid-search utilities for fine-tuning model hyperparameters
and helpers for fetching pretrained models, from the validation and
premodels modules respectively. Modules of the 'models' sub-package
expect the predictor \(X\) and the target \(y\) to be preprocessed.
- class watex.models.BaseEvaluation(base_estimator, cv=4, pipeline=None, prefit=False, scoring='nmse', random_state=42)
Bases: object
Evaluation of a dataset using a base estimator.
Quick evaluation of the data after preparation and pipeline construction.
- Parameters:
base_estimator (callable) – Estimator used to evaluate the training set and labels; typically a class that implements a fit method. Refer to https://scikit-learn.org/stable/modules/classes.html
cv (int, cross-validation splitter or iterable, default=4) – The cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the Scikit-learn User Guide for the available splitters.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
scoring (str, default='nmse') – Specifies the score function to be maximized (usually by cross validation), or, in some cases, multiple score functions to be reported. The score function can be a string accepted by sklearn.metrics.get_scorer() or a callable scorer, not to be confused with an evaluation metric, as the latter have a more diverse API. scoring may also be set to None, in which case the estimator's score method is used. See the scoring parameter section in the Scikit-learn User Guide.
pipeline (callable or Pipeline object) – If a pipeline is given, X is transformed accordingly; otherwise the evaluation uses only the base estimator with the given X. Refer to https://scikit-learn.org/stable/modules/classes.html#module-sklearn.pipeline for further details.
kind (str, default='GridSearchCV') – Kind of grid search method. Can be GridSearchCV or RandomizedSearchCV.
prefit (bool, default=False) – If False, the cross-validation score does not need to be computed again; if True, it does.
random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- property base_estimator
- fit(X, y, sample_weight=0.75)
Quick method used to evaluate the estimator and display the error results as well as the sample model predictions.
- Parameters:
X (ndarray of shape (M, N), where M = m-samples and N = n-features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M,), where M = m-samples) – Training target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
sample_weight (float, default=0.75) – The ratio used to sample X and y. The default samples 3/4 of the data. If given, X and y are sampled at that ratio. If None, half of the data is sampled.
- Returns:
self – BaseEvaluation object.
- Return type:
BaseEvaluation
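To make the integer form of cv concrete, here is a minimal pure-Python sketch of plain, unshuffled and unstratified K-fold index splitting. The helper name `kfold_indices` is hypothetical and is not part of watex or scikit-learn; the splitters actually used by the library may shuffle or stratify.

```python
def kfold_indices(n_samples, n_splits=4):
    """Yield (train, test) index lists for plain, unshuffled K-fold.

    Hypothetical helper for illustration only; not a watex function.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# With 10 samples and cv=4, every sample lands in exactly one test fold.
folds = list(kfold_indices(10, n_splits=4))
for train, test in folds:
    print(len(train), test)
```

Each sample is used for validation once and for training in the remaining folds, which is why a larger cv costs more fits but gives more score estimates.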
- class watex.models.GridSearch(base_estimator, grid_params, cv=4, kind='GridSearchCV', scoring='nmse', verbose=0, **grid_kws)
Bases: object
Fine-tune hyperparameters using grid search methods.
Search Grid will be able to fiddle with the hyperparameters until to
- Parameters:
base_estimator (callable) – Estimator used to evaluate the training set and labels; typically a class that implements a fit method. Refer to https://scikit-learn.org/stable/modules/classes.html
grid_params (list of dict) – List of hyperparameter grids to be fine-tuned. For instance:
    param_grid=[dict( kpca__gamma=np.linspace(0.03, 0.05, 10), kpca__kernel=["rbf", "sigmoid"] )]
pipeline (callable or Pipeline object) – If a pipeline is given, X is transformed accordingly; otherwise the evaluation uses only the base estimator with the given X.
prefit (bool, default=False) – If False, the cross-validation score does not need to be computed again; if True, it does.
cv (int, cross-validation splitter or iterable, default=4) – The cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the Scikit-learn User Guide for the available splitters.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
kind (str, default='GridSearchCV' or '1') – Kind of grid parameter search. Can be 1 for GridSearchCV or 2 for RandomizedSearchCV.
scoring (str) – Specifies the score function to be maximized (usually by cross validation), or, in some cases, multiple score functions to be reported. The score function can be a string accepted by sklearn.metrics.get_scorer() or a callable scorer, not to be confused with an evaluation metric, as the latter have a more diverse API. scoring may also be set to None, in which case the estimator's score method is used. See the scoring parameter section in the Scikit-learn User Guide.
random_state (int, RandomState instance or None, default=None) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
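A list-of-dict grid such as grid_params describes several independent sub-grids, and an exhaustive search tries every combination within each one. The sketch below expands such a list with the standard library only. The function name `expand_grid` is hypothetical, and the random-forest values are illustrative rather than a recommendation.

```python
from itertools import product


def expand_grid(grid_params):
    """Return every parameter combination described by a list of dict grids.

    Hypothetical helper for illustration; an exhaustive grid search
    enumerates candidates in this spirit, one sub-grid at a time.
    """
    candidates = []
    for grid in grid_params:
        names = sorted(grid)
        # Cartesian product of the value lists of this sub-grid only.
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


grid_params = [
    dict(n_estimators=[3, 10, 30], max_features=[2, 4, 6, 8]),
    dict(bootstrap=[False], n_estimators=[3, 10], max_features=[2, 3, 4]),
]
candidates = expand_grid(grid_params)
print(len(candidates))  # 3*4 + 1*2*3 = 18 candidates
```

With cv=4, each of these 18 candidates is fitted four times, so the cost of an exhaustive search grows with the product of the list lengths in each sub-grid.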
Examples
>>> from pprint import pprint
>>> from watex.datasets import fetch_data
>>> from watex.models.validation import GridSearch
>>> from watex.exlib.sklearn import RandomForestClassifier
>>> X_prepared, y_prepared = fetch_data('bagoue prepared')
>>> grid_params = [
...     dict(n_estimators=[3, 10, 30], max_features=[2, 4, 6, 8]),
...     dict(bootstrap=[False], n_estimators=[3, 10],
...          max_features=[2, 3, 4]),
... ]
>>> forest_clf = RandomForestClassifier()
>>> grid_search = GridSearch(forest_clf, grid_params)
>>> grid_search.fit(X=X_prepared, y=y_prepared)
>>> pprint(grid_search.best_params_)
{'max_features': 8, 'n_estimators': 30}
>>> pprint(grid_search.cv_results_)
- property base_estimator
Return the base estimator class.
- best_estimator_
- best_params_
- cv
- cv_results_
- feature_importances_
- fit(X, y)
Fit method using the base estimator; populates the GridSearch attributes.
- Parameters:
X (ndarray of shape (M, N), where M = m-samples and N = n-features) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M,), where M = m-samples) – Training target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
- Returns:
self – Returns the GridSearch instance.
- Return type:
GridSearch
- grid_kws
- grid_params
- property kind
Kind of search: RandomizedSearchCV or GridSearchCV.
- scoring
- verbose
- class watex.models.GridSearchMultiple(estimators, scoring, grid_params, *, kind='GridSearchCV', cv=7, random_state=42, savejob=False, filename=None, verbose=0, **grid_kws)
Bases: object
Search for and find the best parameters of multiple estimators.
- Parameters:
estimators (list of callable obj) – List of estimator objects whose hyperparameters are to be fine-tuned. For instance:
    random_state = 42
    # build estimators
    logreg_clf = LogisticRegression(random_state=random_state)
    linear_svc_clf = LinearSVC(random_state=random_state)
    sgd_clf = SGDClassifier(random_state=random_state)
    svc_clf = SVC(random_state=random_state)
    estimators = (svc_clf, linear_svc_clf, logreg_clf, sgd_clf)
grid_params (list) – List of parameter grids, one entry per estimator. For instance:
    grid_params = (
        [dict(C=[1e-2, 1e-1, 1, 10, 100], gamma=[5, 2, 1, 1e-1, 1e-2, 1e-3], kernel=['rbf']),
         dict(kernel=['poly'], degree=[1, 3, 5, 7], coef0=[1, 2, 3], C=[1e-2, 1e-1, 1, 10, 100])],
        [dict(C=[1e-2, 1e-1, 1, 10, 100], loss=['hinge'])],
        [dict()],
        [dict()],
    )
cv (int, cross-validation splitter or iterable, default=7) – The cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the Scikit-learn User Guide for the available splitters.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 7-fold.
scoring (str) – Specifies the score function to be maximized (usually by cross validation), or, in some cases, multiple score functions to be reported. The score function can be a string accepted by sklearn.metrics.get_scorer() or a callable scorer, not to be confused with an evaluation metric, as the latter have a more diverse API. scoring may also be set to None, in which case the estimator's score method is used. See the scoring parameter section in the Scikit-learn User Guide.
kind (str, default='GridSearchCV' or '1') – Kind of grid parameter search. Can be 1 for GridSearchCV or 2 for RandomizedSearchCV.
random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
savejob (bool, default=False) – Save the model parameters to an external file using 'joblib' or Python's persistent 'pickle' module. Defaults to the 'joblib' format.
verbose (int, default=0) – Controls the level of verbosity. Higher values lead to more messages.
grid_kws (dict) – Additional keyword arguments passed to the grid search method.
Examples
>>> from watex.datasets import fetch_data
>>> from watex.models import GridSearchMultiple, displayFineTunedResults
>>> from watex.exlib import LinearSVC, SGDClassifier, SVC, LogisticRegression
>>> X, y = fetch_data('bagoue prepared')
>>> X
<344x18 sparse matrix of type '<class 'numpy.float64'>'
    with 2752 stored elements in Compressed Sparse Row format>
>>> # As an example, we build four estimators and provide their
>>> # grid parameter ranges for fine-tuning:
>>> random_state = 42
>>> logreg_clf = LogisticRegression(random_state=random_state)
>>> linear_svc_clf = LinearSVC(random_state=random_state)
>>> sgd_clf = SGDClassifier(random_state=random_state)
>>> svc_clf = SVC(random_state=random_state)
>>> estimators = (svc_clf, linear_svc_clf, logreg_clf, sgd_clf)
>>> grid_params = (
...     [dict(C=[1e-2, 1e-1, 1, 10, 100],
...           gamma=[5, 2, 1, 1e-1, 1e-2, 1e-3], kernel=['rbf']),
...      dict(kernel=['poly'], degree=[1, 3, 5, 7], coef0=[1, 2, 3],
...           C=[1e-2, 1e-1, 1, 10, 100])],
...     [dict(C=[1e-2, 1e-1, 1, 10, 100], loss=['hinge'])],
...     [dict()],  # no parameters provided, for demonstration
...     [dict()],
... )
>>> # Now we can call GridSearchMultiple for training and self-validation:
>>> gobj = GridSearchMultiple(
...     estimators=estimators,
...     grid_params=grid_params,
...     cv=4,
...     scoring='accuracy',
...     verbose=1,      # > 7 gives more verbose output
...     savejob=False,  # set to True to save the job in a binary file on disk
...     kind='GridSearchCV',
... ).fit(X, y)
>>> # Once the parameters are fine-tuned, display the fine-tuning
>>> # results using the displayFineTunedResults function
>>> displayFineTunedResults(gobj.models.values_)
MODEL NAME = SVC
BEST PARAM = {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
BEST ESTIMATOR = SVC(C=100, gamma=0.01, random_state=42)
MODEL NAME = LinearSVC
BEST PARAM = {'C': 100, 'loss': 'hinge'}
BEST ESTIMATOR = LinearSVC(C=100, loss='hinge', random_state=42)
MODEL NAME = LogisticRegression
BEST PARAM = {}
BEST ESTIMATOR = LogisticRegression(random_state=42)
MODEL NAME = SGDClassifier
BEST PARAM = {}
BEST ESTIMATOR = SGDClassifier(random_state=42)
Notes
Call get_scorers() or use sklearn.metrics.SCORERS.keys() to get all the metrics used to evaluate model errors. scoring can be any other metric in sklearn.metrics.SCORERS.keys(). Furthermore, if scoring is set to None, 'nmse' is used as the default value, standing for 'neg_mean_squared_error'.
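The note above implies that 'nmse' is a short alias for scikit-learn's 'neg_mean_squared_error' scorer name, and that it stands in when scoring is None. A minimal sketch of that resolution step follows. The table `SCORING_ALIASES` and the function `resolve_scoring` are hypothetical names for illustration, not watex internals.

```python
# Hypothetical alias table; only the 'nmse' entry is stated in the note above.
SCORING_ALIASES = {"nmse": "neg_mean_squared_error"}


def resolve_scoring(scoring=None):
    """Map a scoring alias to the scorer name scikit-learn understands."""
    if scoring is None:
        # Assumption taken from the note: None falls back to 'nmse'.
        scoring = "nmse"
    return SCORING_ALIASES.get(scoring, scoring)


print(resolve_scoring())            # neg_mean_squared_error
print(resolve_scoring("accuracy"))  # accuracy
```

Names that are not aliases pass through unchanged, so any scorer string accepted by scikit-learn still works.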
- watex.models.displayCVTables(cvres, cvmodels)
Display the cross-validation results from all models at each k-fold.
- Parameters:
cvres (dict of (str, array-like)) – Cross-validation results after training the models, with the number of parameters equal to N. Each str key names a parameter stored during cross-validation, and each value is stored in a NumPy array.
cvmodels (list) – List of fine-tuned models.
Examples
>>> from watex.datasets import fetch_data
>>> from watex.models import GridSearchMultiple, displayCVTables
>>> X, y = fetch_data('bagoue prepared')
>>> # estimators and grid_params are built as in the GridSearchMultiple example
>>> gobj = GridSearchMultiple(estimators=estimators,
...                           grid_params=grid_params,
...                           cv=4, scoring='accuracy',
...                           verbose=1, savejob=False,
...                           kind='GridSearchCV')
>>> gobj.fit(X, y)
>>> displayCVTables(cvmodels=[gobj.models.SVC],
...                 cvres=[gobj.models.SVC.cv_results_])
...
- watex.models.displayFineTunedResults(cvmodels)
Display fine-tuning results.
- Parameters:
cvmodels (list) – List of fine-tuned models.
- watex.models.displayModelMaxDetails(cvres, cv=4)
Display the max details of each stored model from cross-validation.
- Parameters:
cvres (dict of (str, array-like)) – Cross-validation results after training the models, with the number of parameters equal to N. Each str key names a parameter stored during cross-validation, and each value is stored in a NumPy array.
cv (int, default=4) – The number of K-folds used when fine-tuning the model parameters.
- watex.models.getGlobalScores(cvres)
Retrieve the global mean and standard deviation scores from the cross-validation containers.
- Parameters:
cvres (dict of (str, array-like)) – Cross-validation results after training the models, with the number of parameters equal to N. Each str key names a parameter stored during cross-validation, and each value is stored in a NumPy array.
- Returns:
Scores on the CV test data and their standard deviation.
- Return type:
('mean_test_scores', 'std_test_scores')
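As a rough illustration of what a global score can mean, the sketch below averages the per-candidate `mean_test_score` and `std_test_score` entries of a cv_results_-style dictionary. Those two keys exist in scikit-learn's `cv_results_`. The helper `global_scores`, the sample numbers, and the choice of a plain average are assumptions, since the source does not state how watex aggregates these values.

```python
from statistics import fmean


def global_scores(cvres):
    """Average the per-candidate mean and std test scores.

    Hypothetical helper; a plain average is an assumption, not
    necessarily how getGlobalScores aggregates.
    """
    return fmean(cvres["mean_test_score"]), fmean(cvres["std_test_score"])


# Made-up cross-validation results for three candidates.
cvres = {
    "mean_test_score": [0.70, 0.80, 0.75],
    "std_test_score": [0.02, 0.04, 0.03],
}
mean_score, std_score = global_scores(cvres)
print(round(mean_score, 3), round(std_score, 3))
```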
- watex.models.getSplitBestScores(cvres, split=0)
Get the best score at each split from the cross-validation results.
- Parameters:
cvres (dict of (str, array-like)) – Cross-validation results after training the models, with the number of parameters equal to N. Each str key names a parameter stored during cross-validation, and each value is stored in a NumPy array.
split (int, default=0) – The index of the split from which to fetch parameters. It must not exceed the number of cross-validation folds (cv) minus one.
- Returns:
bests – Dictionary of the best parameters at the corresponding split in the cross-validation.
- Return type:
dict
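The split index is zero-based, which is why the largest valid value is cv minus one. The sketch below picks the best candidate for one split from a cv_results_-style dictionary, relying on scikit-learn's `split{k}_test_score` and `params` keys. The helper `best_at_split`, its return layout and the sample data are hypothetical, not the watex implementation.

```python
def best_at_split(cvres, split=0):
    """Return the best-scoring candidate for one cross-validation split.

    Hypothetical helper for illustration only.
    """
    key = f"split{split}_test_score"
    if key not in cvres:
        raise KeyError(f"no scores stored for split {split}")
    scores = cvres[key]
    # Index of the candidate with the highest test score on this split.
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"params": cvres["params"][best], "score": scores[best]}


# Made-up results for two candidates evaluated with cv=2.
cvres = {
    "params": [{"C": 1}, {"C": 10}],
    "split0_test_score": [0.6, 0.8],
    "split1_test_score": [0.9, 0.7],
}
print(best_at_split(cvres, split=0))
print(best_at_split(cvres, split=1))
```

The best candidate can differ from one split to the next, which is a quick way to see how stable a set of hyperparameters is across folds.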
- watex.models.get_best_kPCA_params(X, n_components=2, *, y=None, param_grid=None, clf=None, cv=7, **grid_kws)
Select the kernel and hyperparameters that lead to the best performance, using GridSearchCV.
As kPCA is an unsupervised learning algorithm, there is no obvious performance measure to help select the best kernel and hyperparameter values. However, dimensionality reduction is often a preparation step for a supervised task (e.g. classification), so we can use grid search to select the kernel and hyperparameters that lead to the best performance on that task. By default, the implementation creates a two-step pipeline: it first reduces dimensionality to two dimensions using kPCA, then applies LogisticRegression for classification. GridSearchCV is then used to find the best kernel and gamma value for kPCA in order to get the best classification accuracy at the end of the pipeline.
- Parameters:
X (ndarray of shape (M, N), where \(M=m-samples\) and \(N=n-features\)) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M,), where \(M=m-samples\)) – Training target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
n_components (int) – Number of dimensions to preserve. If n_components is between 0 and 1, it indicates the ratio of variance to preserve.
param_grid (list) – List of parameter grids. For instance:
    param_grid=[dict( kpca__gamma=np.linspace(0.03, 0.05, 10), kpca__kernel=["rbf", "sigmoid"] )]
clf (callable, classifier estimator) – A supervised (or semi-supervised) predictor with a finite set of discrete possible output values. A classifier supports modeling some of binary, multiclass, multilabel, or multiclass multioutput targets. Within scikit-learn, all classifiers support multi-class classification, defaulting to using a one-vs-rest strategy over the binary classification problem. Classifiers must store a classes_ attribute after fitting, and usually inherit from base.ClassifierMixin, which sets their _estimator_type attribute. A classifier can be distinguished from other estimators with is_classifier. It must implement:
* fit
* predict
* score
It may also be appropriate to implement decision_function, predict_proba and predict_log_proba. It can also be a base estimator or a composite estimator built with a pipeline. For instance:
    clf = Pipeline([
        ('kpca', KernelPCA(n_components=n_components)),
        ('log_reg', LogisticRegression())
    ])
cv (int, cross-validation splitter or iterable, default=7) – The cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the Scikit-learn User Guide for the available splitters.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 7-fold.
grid_kws (dict) – Additional keyword arguments passed to the grid parameters of GridSearch.
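Examples
>>> import numpy as np
>>> from sklearn.decomposition import KernelPCA
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.pipeline import Pipeline
>>> from watex.analysis.dimensionality import get_best_kPCA_params
>>> from watex.datasets import fetch_data
>>> X, y = fetch_data('Bagoue analysis data')
>>> # two-step pipeline, as described for the clf parameter
>>> clf = Pipeline([
...     ('kpca', KernelPCA(n_components=2)),
...     ('log_reg', LogisticRegression()),
... ])
>>> param_grid = [dict(
...     kpca__gamma=np.linspace(0.03, 0.05, 10),
...     kpca__kernel=["rbf", "sigmoid"]
... )]
>>> kpca_best_params = get_best_kPCA_params(
...     X, y=y, scoring='accuracy', n_components=2,
...     clf=clf, param_grid=param_grid)
>>> kpca_best_params
{'kpca__gamma': 0.03, 'kpca__kernel': 'rbf'}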
- watex.models.get_scorers(*, scorer=None, check_scorer=False, error='ignore')
Fetch the list of available metrics from scikit-learn, or verify whether the scorer exists in that list of metrics. This check is a prerequisite to model evaluation.
- Parameters:
scorer (str) – Must be a metric for model evaluation. Refer to sklearn.metrics.
check_scorer (bool, default=False) – If True, returns a bool indicating whether the scorer exists in the list of metrics for model evaluation. Note that scorer cannot be None if check_scorer is set to True.
error (str, ['raise', 'ignore']) – If 'raise', a ValueError is raised when scorer is not found in the list of metrics and check_scorer is True.
- Returns:
scorers – True if scorer is in the list of metrics, provided that scorer is not None; otherwise the tuple of scikit-learn metrics from sklearn.metrics.
- Return type:
bool or tuple
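The interplay of scorer, check_scorer and error can be sketched in a few lines of plain Python. Everything here is a stand-in: `KNOWN_SCORERS` is a small illustrative subset of scikit-learn scorer names, `check_scorer_name` is a hypothetical function, and raising a ValueError when scorer is None is an assumption about how that constraint is enforced.

```python
# Illustrative subset of scorer names; the real list comes from scikit-learn.
KNOWN_SCORERS = ("accuracy", "f1", "neg_mean_squared_error", "r2")


def check_scorer_name(scorer=None, check_scorer=False, error="ignore"):
    """Return the known scorers, or whether `scorer` is one of them.

    Hypothetical stand-in that mirrors the documented parameters.
    """
    if not check_scorer:
        return KNOWN_SCORERS
    if scorer is None:
        # Assumption: the documented constraint is enforced with ValueError.
        raise ValueError("scorer cannot be None when check_scorer is True")
    found = scorer in KNOWN_SCORERS
    if not found and error == "raise":
        raise ValueError(f"unknown scorer: {scorer!r}")
    return found


print(check_scorer_name())                                   # full tuple
print(check_scorer_name(scorer="r2", check_scorer=True))     # True
print(check_scorer_name(scorer="nope", check_scorer=True))   # False
```

Checking the name before a long grid search avoids discovering a misspelled scorer only after the fits have started.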
- class watex.models.pModels(model='svm', target='bin', kernel=None, oob_score=False, objective='fr')
Bases: object
Pretrained Models class.
The pretrained model class is composed of estimators already trained on a case study region, the Bagoue region in West Africa. Refer to Kouadio et al., 2022 for further details. It is a set of support vector machines, decision tree, k-nearest neighbors, extreme gradient boosting machines, benchmark voting classifier, and bagging classifier. Each pretrained model is considered as a class object, and its attributes hold the training parameters from the cross-validation results.
- Parameters:
- model: str
Name of the pretrained model. Note that the pretrained SVM is composed of four kernels: rbf for radial basis function, poly for polynomial, sig for sigmoid and lin for linear. Default is rbf. Each kernel is a model attribute of the SVM class. For instance, to retrieve the pretrained model with kernel = 'poly', fit the pModels class and use:
>>> pModels(model='svm', kernel='poly').fit().SVM.poly.best_estimator_
SVC(C=128.0, coef0=7, degree=5, gamma=0.00048828125, kernel='poly', tol=0.01)
>>> # or
>>> pModels(model='svm', kernel='poly').fit().estimator_
SVC(C=128.0, coef0=7, degree=5, gamma=0.00048828125, kernel='poly', tol=0.01)
- kernel: str
kernel refers to the SVM kernels. It can be rbf for radial basis function, poly for polynomial, sig for sigmoid and lin for linear. There is no need to provide it, since it can be retrieved as an attribute of the SVM model:
>>> pModels(model='svm').fit().SVM.rbf  # is an object instance
>>> # to retrieve the rbf values, use the attribute best_estimator_
>>> pModels(model='svm').fit().SVM.rbf.best_estimator_
SVC(C=2.0, coef0=0, degree=1, gamma=0.125)
- target: str
Two types of classification are predicted: the binary classification bin and the multiclass classification multi. Default is bin. When setting target to multi, be aware that only the SVMs are trained for multiclass prediction. Furthermore, bin consists of predicting the flow rate (FR) with labels {0} and {1}, where {0} means \(FR \leq 1\ m^3/hr\) and {1} means \(FR > 1\ m^3/hr\). For multi, four classes are predicted:
\[\begin{array}{lcl}
FR0 & = & FR = 0 \\
FR1 & = & 0 < FR \leq 1\ m^3/hr \\
FR2 & = & 1 < FR \leq 3\ m^3/hr \\
FR3 & = & FR > 3\ m^3/hr
\end{array}\]
- oob_score: bool
Out-of-bag score. When oob_score is set to true, the pretrained models that were trained with oob_score set to true are retrieved. The fine-tuned pretrained models trained with oob_score set to true are 'RandomForest' and 'Extratrees'.
- objective: str, default='fr'
The prediction goal, i.e. the reason for storing the pretrained models. The default objective is 'fr', i.e. flow rate prediction. Other objectives will be added as new engineering problems are solved and published.
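The flow-rate thresholds given for the bin and multi targets can be read as a simple labelling rule. The sketch below encodes those thresholds directly. The function name `fr_label` is hypothetical, and rejecting negative flow rates is an added assumption.

```python
def fr_label(flow_rate, target="bin"):
    """Map a flow rate in m^3/hr to its class label.

    Hypothetical helper that encodes the documented thresholds.
    """
    if flow_rate < 0:
        raise ValueError("flow rate cannot be negative")
    if target == "bin":
        # {0}: FR <= 1 m^3/hr, {1}: FR > 1 m^3/hr
        return 1 if flow_rate > 1 else 0
    if target == "multi":
        if flow_rate == 0:
            return 0  # FR0: FR = 0
        if flow_rate <= 1:
            return 1  # FR1: 0 < FR <= 1
        if flow_rate <= 3:
            return 2  # FR2: 1 < FR <= 3
        return 3      # FR3: FR > 3
    raise ValueError("target must be 'bin' or 'multi'")


for fr in (0, 0.5, 1, 2.5, 3, 4):
    print(fr, fr_label(fr, "bin"), fr_label(fr, "multi"))
```

Note that the boundary values 1 and 3 fall in the lower class in both schemes, since the upper bounds are inclusive.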
- fit(X=None, y=None, **fit_params)
Fit X and y with the pretrained models.
Note that to retrieve only the pretrained model, nothing needs to be passed to the fit method. For instance, to fetch the best SVM estimator with kernel = 'sigmoid', one just needs to fit the pModels class as follows:
>>> pModels(model='svm', kernel='sigmoid').fit().estimator_
SVC(C=512.0, coef0=0, degree=1, gamma=0.001953125, kernel='sigmoid', tol=1.0)
If model='svm' and no kernel is passed, rbf is used as the default.
- Parameters:
X (ndarray of shape (M, N), where \(M=m-samples\) and \(N=n-features\)) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
y (array-like of shape (M,), where \(M=m-samples\)) – Training target; denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
- Returns:
Returns self for easy method chaining.
- Return type:
pModels instance
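The kernel names appear in two spellings in this class: the short forms sig and lin in the parameter description, and the scikit-learn names sigmoid and linear in the fitted estimators. The sketch below shows one way such names could be normalised, with rbf as the fallback when no kernel is given. `KERNEL_ALIASES` and `resolve_kernel` are hypothetical and are not taken from the watex source.

```python
# Hypothetical mapping from the short names to scikit-learn kernel names.
KERNEL_ALIASES = {"sig": "sigmoid", "lin": "linear"}
VALID_KERNELS = ("rbf", "poly", "sigmoid", "linear")


def resolve_kernel(kernel=None):
    """Normalise a kernel name, defaulting to 'rbf' when none is passed."""
    if kernel is None:
        return "rbf"  # documented default for model='svm'
    name = KERNEL_ALIASES.get(kernel, kernel)
    if name not in VALID_KERNELS:
        raise ValueError(f"unknown kernel: {kernel!r}")
    return name


print(resolve_kernel())        # rbf
print(resolve_kernel("sig"))   # sigmoid
print(resolve_kernel("poly"))  # poly
```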
- property inspect
Inspect whether the object is fitted or not.
- pdefaults_ = [('xgboost', 'ExtremeGradientBoosting'), ('svc', 'SupportVectorClassifier'), ('dtc', 'DecisionTreeClassifier'), ('stc', 'StackingClassifier'), ('bag', 'BaggingClassifier'), ('logit', 'LogisticRegression'), ('vtc', 'VotingClassifier'), ('rdf', 'RandomForestClassifier'), ('ada', 'AdaBoostClassifier'), ('extree', 'ExtraTreesClassifier'), ('knn', 'KNeighborsClassifier')]
- predict(X)
Predict target values from the pretrained model.
- Parameters:
X (ndarray of shape (M, N), where \(M=m-samples\) and \(N=n-features\)) – Training set; denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
- Returns:
y_pred – The predicted target values from X.
- Return type:
array-like of shape (M,)