watex.models.BaseEvaluation

class watex.models.BaseEvaluation(estimator, cv=4, pipeline=None, prefit=False, scoring='nmse', random_state=42, verbose=0)

Evaluation of a dataset using a base estimator.

Quickly evaluates the data after preprocessing and pipeline construction.

Parameters
  • estimator (Callable) – Estimator used to evaluate the training set and its labels, typically a class that implements a fit method. Refer to https://scikit-learn.org/stable/modules/classes.html

  • cv (int, cross-validation generator or iterable, default=4) –

    A cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:

    * An integer, specifying the number of folds in K-fold cross-validation. K-fold is stratified over classes if the estimator is a classifier (determined by base.is_classifier) and the targets represent a binary or multiclass (but not multioutput) classification problem (determined by utils.multiclass.type_of_target).
    * A cross-validation splitter instance. Refer to the Scikit-learn User Guide for the available splitters.
    * An iterable yielding (train, test) splits.

    The default is 4 (4-fold cross-validation).

  • scoring (str or callable, default='nmse') – Specifies the score function to be maximized (usually by cross-validation) or, in some cases, multiple score functions to be reported. The score function can be a string accepted by sklearn.metrics.get_scorer() or a callable scorer, not to be confused with an evaluation metric, as the latter has a more diverse API. scoring may also be set to None, in which case the estimator's score method is used. See the section on the scoring parameter in the Scikit-learn User Guide.

  • pipeline (Callable or Pipeline object, default=None) – If a pipeline is given, X is transformed accordingly; otherwise, the evaluation uses the base estimator alone on the given X. Refer to https://scikit-learn.org/stable/modules/classes.html#module-sklearn.pipeline for further details.

  • kind (str, default ='GridSearchCV') – Kind of grid search method. Could be GridSearchCV or RandomizedSearchCV.

  • prefit (bool, default=False) – If False, the cross-validation score is computed when fitting; if True, it is not computed again.

  • random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
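The estimator and scoring descriptions rely on three distinct kinds of objects: an estimator (an object with fit and predict), an evaluation metric with signature (y_true, y_pred), and a scorer with signature (estimator, X, y). The pure-Python sketch below shows them side by side. MajorityClassifier, accuracy and scorer_from_metric are hypothetical names used only for illustration; they are not watex or scikit-learn functions (in scikit-learn, wrapping a metric into a scorer is done with sklearn.metrics.make_scorer).

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator (not part of watex): it always predicts
    the most frequent training label. Like scikit-learn estimators, it
    exposes ``fit`` (returning ``self``) and ``predict``."""

    def fit(self, X, y):
        # Learn the most frequent label in the training targets.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the learned majority label for every row of X.
        return [self.majority_ for _ in X]


def accuracy(y_true, y_pred):
    """Evaluation metric: signature (y_true, y_pred)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def scorer_from_metric(metric):
    """Wrap a metric into a scorer with signature (estimator, X, y)."""
    def scorer(estimator, X, y):
        # The scorer calls predict itself, then applies the metric.
        return metric(y, estimator.predict(X))
    return scorer


X_train, y_train = [[0], [1], [2], [3]], [1, 0, 1, 1]
model = MajorityClassifier().fit(X_train, y_train)
accuracy_scorer = scorer_from_metric(accuracy)
print(accuracy(y_train, model.predict(X_train)))  # 0.75
print(accuracy_scorer(model, X_train, y_train))   # 0.75
```

Both calls return the same value; the difference is only in who calls predict, which is why a scorer can be handed to cross-validation routines while a bare metric cannot.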
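With an integer cv such as the default 4, the data are split into 4 folds and each sample lands in exactly one test fold; random_state only matters when the data are shuffled first. The sketch below is a hypothetical helper, kfold_indices, written in pure Python: it omits the stratification scikit-learn applies for classifiers, and its seeded permutation will not match scikit-learn's (which uses NumPy's random generator). The list of (train, test) pairs it produces has the same shape as the "iterable yielding (train, test) splits" form of cv.

```python
import random


def kfold_indices(n_samples, n_splits=4, shuffle=False, random_state=None):
    """Yield (train, test) index lists for plain (non-stratified) K-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        # An int seed gives the same permutation on every call;
        # None seeds the generator from system entropy.
        random.Random(random_state).shuffle(indices)
    # As in scikit-learn's KFold, the first n_samples % n_splits folds
    # receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        stop = start + base + (1 if k < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


folds = list(kfold_indices(10, n_splits=4))
print([test for _, test in folds])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

# Same seed -> identical splits across calls.
a = list(kfold_indices(10, shuffle=True, random_state=42))
b = list(kfold_indices(10, shuffle=True, random_state=42))
print(a == b)  # True
```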
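The pipeline parameter describes a simple branch: transform X when a pipeline is supplied, otherwise pass X through unchanged. The following sketch illustrates that branch under stated assumptions: prepare_features and MinMaxScalerSketch are hypothetical names, and the assumption that the pipeline is applied through fit_transform is ours, not something this page specifies.

```python
class MinMaxScalerSketch:
    """Hypothetical transformer: rescales each column to [0, 1]."""

    def fit(self, X, y=None):
        cols = list(zip(*X))
        self.min_ = [min(c) for c in cols]
        # Use 1 for constant columns to avoid division by zero.
        self.range_ = [(max(c) - min(c)) or 1 for c in cols]
        return self

    def transform(self, X):
        return [[(v - lo) / r for v, lo, r in zip(row, self.min_, self.range_)]
                for row in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def prepare_features(X, pipeline=None):
    """Transform X with the pipeline if one is given, else return X as-is."""
    if pipeline is None:
        return X
    return pipeline.fit_transform(X)


X = [[1.0, 10.0], [3.0, 10.0], [5.0, 30.0]]
print(prepare_features(X))                        # unchanged input
print(prepare_features(X, MinMaxScalerSketch()))  # [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0]]
```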
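The kind parameter contrasts two search strategies: GridSearchCV tries every combination of the listed hyperparameter values, while RandomizedSearchCV evaluates a fixed number of sampled combinations. The sketch below only generates the candidate settings; grid_candidates and random_candidates are hypothetical helpers, and the parameter names are simply RandomForestClassifier hyperparameters chosen for illustration. Sampling without replacement mirrors scikit-learn's behaviour when every parameter is given as a list; scikit-learn also accepts distributions, which this sketch does not model.

```python
import itertools
import random

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}


def grid_candidates(grid):
    """Exhaustive search: every combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(grid[k] for k in keys))]


def random_candidates(grid, n_iter, seed=0):
    """Randomized search: n_iter distinct combinations sampled at random."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(grid), n_iter)


print(len(grid_candidates(param_grid)))       # 9
print(len(random_candidates(param_grid, 4)))  # 4
```

The randomized variant trades completeness for a bounded cost: its number of fits depends on n_iter rather than on the product of the grid sizes.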
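One reading of the prefit flag, consistent with the example below where the default prefit=False produces cv_scores_ after fit, is that it gates the cross-validation step. The sketch below illustrates that reading only; PrefitSketch and fake_cv are hypothetical stand-ins, not watex code, and the stub scores are placeholder values.

```python
class PrefitSketch:
    """Hypothetical stand-in (not watex code) for a prefit-gated fit."""

    def __init__(self, cv_score_fn, prefit=False):
        # cv_score_fn: callable(X, y) returning a list of per-fold scores.
        self.cv_score_fn = cv_score_fn
        self.prefit = prefit

    def fit(self, X, y):
        if not self.prefit:
            # Default path: run cross-validation and keep the fold scores.
            self.cv_scores_ = self.cv_score_fn(X, y)
        # With prefit=True the cross-validation step is skipped.
        return self


calls = []


def fake_cv(X, y):
    # Records each call so we can see when cross-validation runs.
    calls.append((len(X), len(y)))
    return [0.5, 0.5]  # placeholder scores


default = PrefitSketch(fake_cv).fit([[0], [1]], [0, 1])
skipped = PrefitSketch(fake_cv, prefit=True).fit([[0], [1]], [0, 1])
print(len(calls))                      # 1
print(hasattr(skipped, "cv_scores_"))  # False
```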

Examples

>>> import watex as wx
>>> from watex.datasets import load_bagoue
>>> from watex.models import BaseEvaluation
>>> X, y = load_bagoue (as_frame =True )
>>> # categorizing the labels
>>> yc = wx.smart_label_classifier (y , values = [1, 3, 10 ],
...                                  # labels =['FR0', 'FR1', 'FR2', 'FR4']
...                                  )
>>> # drop the subjective columns ['num', 'name']
>>> X = X.drop (columns = ['num', 'name'])
>>> # X = wx.cleaner (X , columns = 'num name', mode='drop')
>>> X.columns
Index(['shape', 'type', 'geol', 'east', 'north', 'power', 'magnitude', 'sfi',
       'ohmS', 'lwi'],
      dtype='object')
>>> X =  wx.naive_imputer ( X, mode ='bi-impute') # impute data
>>> # create a pipeline for X
>>> pipe = wx.make_naive_pipe (X)
>>> Xtrain, Xtest, ytrain, ytest = wx.sklearn.train_test_split(X, yc)
>>> b = BaseEvaluation (estimator= wx.sklearn.RandomForestClassifier,
...                     scoring = 'accuracy', pipeline = pipe)
>>> b.fit(Xtrain, ytrain ) # accepts only array
>>> b.cv_scores_
array([0.75409836, 0.72131148, 0.73333333, 0.78333333])
>>> ypred = b.predict(Xtest)
>>> scores = wx.sklearn.accuracy_score (ytest, ypred)
>>> scores
0.7592592592592593