watex.models.GridSearchMultiple

class watex.models.GridSearchMultiple(estimators, scoring, grid_params, *, kind='GridSearchCV', cv=7, random_state=42, savejob=False, filename=None, verbose=0, **grid_kws)

Search for the best hyperparameters of several estimators at once.

Parameters

estimators (list of callable obj) –
List of estimator objects whose hyperparameters are to be fine-tuned. For instance:

random_state = 42
# build estimators
logreg_clf = LogisticRegression(random_state=random_state)
linear_svc_clf = LinearSVC(random_state=random_state)
sgd_clf = SGDClassifier(random_state=random_state)
svc_clf = SVC(random_state=random_state)

estimators = (svc_clf, linear_svc_clf, logreg_clf, sgd_clf)

grid_params (list) –

List of parameter grids, one entry per estimator and in the same order as estimators. For instance:

grid_params = ([
dict(C=[1e-2, 1e-1, 1, 10, 100], gamma=[5, 2, 1, 1e-1, 1e-2, 1e-3],
     kernel=['rbf']),
dict(kernel=['poly'], degree=[1, 3, 5, 7], coef0=[1, 2, 3],
     C=[1e-2, 1e-1, 1, 10, 100])],
[dict(C=[1e-2, 1e-1, 1, 10, 100], loss=['hinge'])],
[dict()], [dict()]
)
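
To make the size of each search concrete, the sketch below counts the candidate combinations implied by the grids above. It relies on the fact that Scikit-learn's grid search explores each dict in a list as a separate grid. The helper name expand_grid is hypothetical and is not part of watex or Scikit-learn; it only uses the standard library.

```python
from itertools import product


def expand_grid(grids):
    """Enumerate every parameter combination in a list of dict grids.

    Hypothetical helper for illustration; each dict is expanded on its own,
    so combinations are never mixed across dicts.
    """
    candidates = []
    for grid in grids:
        names = sorted(grid)
        # product() over no iterables yields one empty tuple, so an empty
        # dict contributes exactly one candidate: the estimator's defaults.
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


svc_grids = [
    dict(C=[1e-2, 1e-1, 1, 10, 100], gamma=[5, 2, 1, 1e-1, 1e-2, 1e-3],
         kernel=['rbf']),
    dict(kernel=['poly'], degree=[1, 3, 5, 7], coef0=[1, 2, 3],
         C=[1e-2, 1e-1, 1, 10, 100]),
]
linear_svc_grids = [dict(C=[1e-2, 1e-1, 1, 10, 100], loss=['hinge'])]
empty_grids = [dict()]

counts = [len(expand_grid(g))
          for g in (svc_grids, linear_svc_grids, empty_grids)]
print(counts)  # [90, 5, 1]
```

Each candidate is fitted once per cross-validation fold, so the SVC entry alone accounts for 90 times the number of folds in fits.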

cv (int, cross-validation generator or iterable, default=7) –

A cross-validation splitting strategy, used in cross-validation based routines. cv is also available in estimators such as multioutput.ClassifierChain or calibration.CalibratedClassifierCV, which use the predictions of one estimator as training data for another so as not to overfit the training supervision. Possible inputs for cv are usually:

* An integer, specifying the number of folds in K-fold cross validation.
    K-fold will be stratified over classes if the estimator is a classifier
    (determined by base.is_classifier) and the targets may represent a
    binary or multiclass (but not multioutput) classification problem
    (determined by utils.multiclass.type_of_target).
* A cross-validation splitter instance. Refer to the User Guide for
    splitters available within Scikit-learn.
* An iterable yielding train/test splits.

With some exceptions (especially where not using cross-validation at all is an option), the default is 4-fold; for this class, the signature sets cv=7.
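
As an illustration of what an integer cv means, the sketch below partitions sample indices into K contiguous folds. It is a simplified, unshuffled and unstratified version; for classifiers the folds are stratified over classes, as described in the list above. The function name kfold_indices is hypothetical, and the sample count of 344 is taken from the dataset shown in the Examples section.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for plain K-fold cross-validation.

    Hypothetical helper for illustration: no shuffling, no stratification.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(kfold_indices(344, 4))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)  # [86, 86, 86, 86]
```

Every sample lands in exactly one test fold, so each candidate parameter set is scored on all of the data across the K fits.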

scoring (str,) – Specifies the score function to be maximized (usually by cross-validation) or, in some cases, multiple score functions to be reported. The score function can be a string accepted by sklearn.metrics.get_scorer() or a callable scorer, not to be confused with an evaluation metric, as the latter has a more diverse API. scoring may also be set to None, in which case the estimator’s score method is used. See the scoring parameter section of the Scikit-learn User Guide.
kind (str, default='GridSearchCV' or '1') – Kind of parameter search to run. Use 1 for GridSearchCV or 2 for RandomizedSearchCV.
random_state (int, RandomState instance or None, default=42) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
savejob (bool, default=False) – Save the model parameters to an external file using ‘joblib’ or Python's persistent ‘pickle’ module. The ‘joblib’ format is used by default.
verbose (int, default=0) – Controls the level of verbosity. Higher values lead to more messages.
grid_kws (dict) – Additional keyword arguments passed to the grid search method.
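
The numeric shorthands accepted by kind can be pictured as a simple alias lookup. The sketch below is an assumption about how such a lookup might work, not the watex implementation: the function name normalize_kind and the case-insensitive matching of class names are hypothetical.

```python
def normalize_kind(kind):
    """Map a `kind` value to the name of a Scikit-learn search class.

    Hypothetical helper for illustration: accepts 1 or 2 (int or str)
    as well as the class names themselves, ignoring case.
    """
    aliases = {
        '1': 'GridSearchCV',
        'gridsearchcv': 'GridSearchCV',
        '2': 'RandomizedSearchCV',
        'randomizedsearchcv': 'RandomizedSearchCV',
    }
    key = str(kind).strip().lower()
    try:
        return aliases[key]
    except KeyError:
        raise ValueError(f"Unknown kind: {kind!r}") from None


print(normalize_kind(1))    # GridSearchCV
print(normalize_kind('2'))  # RandomizedSearchCV
```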

Examples

>>> import watex as wx
>>> from watex.models import GridSearchMultiple, displayFineTunedResults
>>> from watex.exlib import LinearSVC, SGDClassifier, SVC, LogisticRegression
>>> X, y = wx.fetch_data('bagoue prepared')
>>> X
<344x18 sparse matrix of type '<class 'numpy.float64'>'
    with 2752 stored elements in Compressed Sparse Row format>
>>> # As an example, we build four estimators and provide their
>>> # grid parameter ranges for fine-tuning:
>>> random_state = 42
>>> logreg_clf = LogisticRegression(random_state=random_state)
>>> linear_svc_clf = LinearSVC(random_state=random_state)
>>> sgd_clf = SGDClassifier(random_state=random_state)
>>> svc_clf = SVC(random_state=random_state)
>>> estimators = (svc_clf, linear_svc_clf, logreg_clf, sgd_clf)
>>> grid_params = ([dict(C=[1e-2, 1e-1, 1, 10, 100],
...                      gamma=[5, 2, 1, 1e-1, 1e-2, 1e-3], kernel=['rbf']),
...                 dict(kernel=['poly'], degree=[1, 3, 5, 7], coef0=[1, 2, 3],
...                      C=[1e-2, 1e-1, 1, 10, 100])],
...                [dict(C=[1e-2, 1e-1, 1, 10, 100], loss=['hinge'])],
...                [dict()],  # no parameters provided, for demo purposes
...                [dict()]
...                )
>>> # Now we can call :class:`watex.models.GridSearchMultiple` for
>>> # training and self-validating:
>>> gobj = GridSearchMultiple(estimators=estimators,
...                           grid_params=grid_params,
...                           cv=4,
...                           scoring='accuracy',
...                           verbose=1,      # values > 7 produce more messages
...                           savejob=False,  # set to True to save the job to a binary file on disk
...                           kind='GridSearchCV').fit(X, y)
>>> # Once the parameters are fine-tuned, we can display the fine-tuning
>>> # results using the displayFineTunedResults function
>>> displayFineTunedResults(gobj.models.values_)
MODEL NAME = SVC
BEST PARAM = {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
BEST ESTIMATOR = SVC(C=100, gamma=0.01, random_state=42)

MODEL NAME = LinearSVC
BEST PARAM = {'C': 100, 'loss': 'hinge'}
BEST ESTIMATOR = LinearSVC(C=100, loss='hinge', random_state=42)

MODEL NAME = LogisticRegression
BEST PARAM = {}
BEST ESTIMATOR = LogisticRegression(random_state=42)

MODEL NAME = SGDClassifier
BEST PARAM = {}
BEST ESTIMATOR = SGDClassifier(random_state=42)

Notes

Call get_scorers() or use sklearn.metrics.SCORERS.keys() to list all the metrics available for evaluating model errors; any of those keys can be passed as scoring. In recent Scikit-learn releases, sklearn.metrics.get_scorer_names() provides the same list. Furthermore, if scoring is set to None, ‘nmse’ (an alias for ‘neg_mean_squared_error’) is used as the default value.