watex.models.BaseEvaluation
- class watex.models.BaseEvaluation(estimator, cv=4, pipeline=None, prefit=False, scoring='nmse', random_state=42, verbose=0)
Evaluation of a dataset using a base estimator.
Quick evaluation of the data after data preparation and pipeline construction.
- Parameters:
estimator (Callable) – Estimator used to evaluate the training set and labels; typically a class that implements a fit method. Refer to https://scikit-learn.org/stable/modules/classes.html
cv (int, cross-validation splitter or iterable, default=4) –
A cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:
* An integer, specifying the number of folds in K-fold cross validation. K-fold will be stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets may represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for the splitters available within Scikit-learn.
* An iterable yielding train/test splits.
With some exceptions (especially where not using cross validation at all is an option), the default is 4-fold.
scoring (str, callable or None, default='nmse') – Specifies the score function to be maximized (usually by cross validation) or, in some cases, multiple score functions to be reported. The score function can be a string accepted by sklearn.metrics.get_scorer() or a callable scorer, not to be confused with an evaluation metric, as the latter have a more diverse API. scoring may also be set to None, in which case the estimator's score method is used. See the scoring parameter section of the Scikit-learn User Guide.
pipeline (Callable or Pipeline object, default=None) – If a pipeline is given, X is transformed accordingly; otherwise the evaluation uses the base estimator alone with the given X. Refer to https://scikit-learn.org/stable/modules/classes.html#module-sklearn.pipeline for further details.
kind (str, default='GridSearchCV') – Kind of grid search method. Could be GridSearchCV or RandomizedSearchCV.
prefit (bool, default=False) – If False, the cross validation score does not need to be computed again; if True, it does.
random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
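The accepted forms of cv and scoring can be illustrated with plain scikit-learn. The sketch below does not call BaseEvaluation; it assumes that BaseEvaluation passes cv and scoring on to scikit-learn's cross-validation routines, which this page does not confirm. The synthetic data and the RandomForestClassifier settings are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption; not the Bagoue dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
clf = RandomForestClassifier(n_estimators=20, random_state=42)

# 1. An integer: the number of folds (stratified, because clf is a classifier).
scores_int = cross_val_score(clf, X, y, cv=4, scoring="accuracy")

# 2. A splitter instance: explicit control over shuffling and seeding.
splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
scores_splitter = cross_val_score(clf, X, y, cv=splitter, scoring="accuracy")

# 3. An iterable of (train_indices, test_indices) pairs.
folds = list(splitter.split(X, y))
# scoring may also be a callable scorer instead of a string name.
scores_iterable = cross_val_score(
    clf, X, y, cv=folds, scoring=make_scorer(accuracy_score)
)

print(scores_int.shape, scores_splitter.shape, scores_iterable.shape)
```

Forms 2 and 3 use the same folds and the same metric here, so they return the same scores. The iterable form is mainly useful when the splits come from outside scikit-learn, for example predefined train/test groups.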
Examples
>>> import watex as wx
>>> from watex.datasets import load_bagoue
>>> from watex.models import BaseEvaluation
>>> X, y = load_bagoue(as_frame=True)
>>> # categorizing the labels
>>> yc = wx.smart_label_classifier(y, values=[1, 3, 10],
...                                # labels=['FR0', 'FR1', 'FR2', 'FR4']
...                                )
>>> # drop the subjective columns ['num', 'name']
>>> X = X.drop(columns=['num', 'name'])
>>> # X = wx.cleaner(X, columns='num name', mode='drop')
>>> X.columns
Index(['shape', 'type', 'geol', 'east', 'north', 'power', 'magnitude', 'sfi', 'ohmS', 'lwi'], dtype='object')
>>> X = wx.naive_imputer(X, mode='bi-impute')  # impute data
>>> # create a pipeline for X
>>> pipe = wx.make_naive_pipe(X)
>>> Xtrain, Xtest, ytrain, ytest = wx.sklearn.train_test_split(X, yc)
>>> b = BaseEvaluation(estimator=wx.sklearn.RandomForestClassifier,
...                    scoring='accuracy', pipeline=pipe)
>>> b.fit(Xtrain, ytrain)  # accepts only array
>>> b.cv_scores_
array([0.75409836, 0.72131148, 0.73333333, 0.78333333])
>>> ypred = b.predict(Xtest)
>>> scores = wx.sklearn.accuracy_score(ytest, ypred)
>>> scores
0.7592592592592593
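For readers without watex installed, a rough scikit-learn-only analogue of the workflow above might look like the following. This is a sketch under assumptions, not the BaseEvaluation implementation: the synthetic data stands in for the Bagoue dataset, StandardScaler stands in for whatever wx.make_naive_pipe builds, and chaining the preprocessing with the estimator via make_pipeline is an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the Bagoue dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=42)

# Hypothetical stand-in for the preprocessing pipeline: scale, then classify.
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))

# Cross-validated accuracy on the training set (4 folds, as in cv=4).
cv_scores = cross_val_score(model, Xtrain, ytrain, cv=4, scoring="accuracy")

# Fit on the full training set, then score once on the held-out test set.
model.fit(Xtrain, ytrain)
ypred = model.predict(Xtest)
test_accuracy = accuracy_score(ytest, ypred)

print(cv_scores, test_accuracy)
```

Keeping the preprocessing inside the cross-validated pipeline means each fold fits the scaler on its own training portion only, which avoids leaking information from the validation fold.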