class watex.cases.modeling.BaseModel(data_fn=None, df=None, **kwargs)

Bases: object

Base model class. The most interesting and challenging part of modeling is tuning the hyperparameters after designing a composite estimator. Finding the best parameters is a good way to reorganize the created pipeline {transformers + estimators} so that it generalizes well to unseen data.
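
To make the idea of a composite estimator concrete, here is a minimal, dependency-free sketch of the {transformers + estimators} pattern described above. The classes `Standardize`, `MidpointClassifier` and `Composite` are hypothetical names invented for this illustration. They are not part of watex or scikit-learn, and the sketch assumes a single numeric feature and binary labels.

```python
class Standardize:
    """Transformer: centre and scale a 1-D feature with training statistics."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        variance = sum((v - self.mean_) ** 2 for v in X) / len(X)
        self.scale_ = variance ** 0.5 or 1.0  # fall back to 1.0 for a constant feature
        return self

    def transform(self, X):
        return [(v - self.mean_) / self.scale_ for v in X]


class MidpointClassifier:
    """Estimator: threshold halfway between the two class means."""

    def fit(self, X, y):
        mean0 = sum(v for v, t in zip(X, y) if t == 0) / y.count(0)
        mean1 = sum(v for v, t in zip(X, y) if t == 1) / y.count(1)
        self.threshold_ = (mean0 + mean1) / 2
        return self

    def predict(self, X):
        return [int(v > self.threshold_) for v in X]


class Composite:
    """Chain transformers, then an estimator, behind one fit/predict API."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def fit(self, X, y):
        for step in self.transformers:
            X = step.fit(X, y).transform(X)  # each step is fitted on training data
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        for step in self.transformers:
            X = step.transform(X)  # reuse the statistics learned in fit
        return self.estimator.predict(X)


model = Composite([Standardize()], MidpointClassifier()).fit([1, 2, 3, 4], [0, 0, 1, 1])
predictions = model.predict([0, 5])
print(predictions)
```

Because the transformers and the estimator sit behind one object, tuning can treat the parameters of every step as a single search space, which is the situation the class description refers to.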

Parameters:
  • *data_fn* (str) – Path to the analysis data file.

  • *df* (pd.core.frame.DataFrame) – DataFrame of features for analysis. It must contain the main parameters, including the target (a pd.Series or one of the columns of df).

Other optional information is held in kwargs:

  • *auto* (bool) – Trigger the composite estimator. If True, an SVC-composite estimator preprocessor is given. The default is False.

  • *pipelines* (dict) – Your own pipelines, used to trigger the model preprocessor. They should be detected automatically.

  • *estimators* (Callable) – A given estimator. If None, SVM is auto-selected as the default estimator.

  • *model_score* (float/dict) – Model test score. Observe your test model score using your composite estimator for enhancement or your own pipelines.

  • *processor* (Callable) – Composes pipelines and estimators for default model scoring as well as for the composite-estimator enhancement.

Examples

>>> import numpy as np
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.compose import make_column_selector
>>> # Supply custom preprocessing steps and a custom estimator.
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines={
...         'num_column_selector_': make_column_selector(
...             dtype_include=np.number),
...         'cat_column_selector_': make_column_selector(
...             dtype_exclude=np.number),
...         'features_engineering_': PolynomialFeatures(
...             2, include_bias=False),
...         'selectors_': SelectKBest(f_classif, k=2),
...         'encodages_': RobustScaler()
...     },
...     estimator=RandomForestClassifier()
... )
property feature_importances_

Get the bar plot of feature importances. An error is raised if the estimator has no feature_importances_ attribute.

get_learning_curve(estimator=None, X_train=None, y_train=None, learning_curve_kws=None, **kws)

Compute the training scores and the validation curve to visualize the learning curve.

Parameters:
  • estimator – The model to evaluate. If None, the default estimator is used.

  • X_train – pd.core.frame.DataFrame of the selected training set.

  • x_test – pd.DataFrame of the selected data for the test set.

  • y_train – array_like of the selected data for the evaluation set.

  • y_test – array_like of the selected data for the model test.

  • val_kws

    validation_curve keyword arguments. If None, the default is:

    val_curve_kws = {"param_name": 'C',
                     "param_range": np.arange(1, 210, 10),
                     "cv": 4}

  • switch – Turn the learning-curve or validation-curve plot on or off.

  • trigDec – Trigger the decorator.

  • N – Number of values in the parameter range used for plotting.

Returns:

  • train_score – float or dict of training-set scores.

  • val_score – float or dict of validation scores.
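
The default keywords above would evaluate 21 values of C (np.arange(1, 210, 10)) with 4 cross-validation folds, that is 21 × 4 = 84 fits, and report one mean training score and one mean validation score per value. The following dependency-free sketch illustrates that computation. It is not the watex implementation: a toy 1-D k-nearest-neighbours classifier and its parameter k stand in for the real estimator and C, and every function name is hypothetical.

```python
import random


def kfold_indices(n, cv):
    """Yield (train_idx, val_idx) for `cv` contiguous, near-equal folds."""
    sizes = [n // cv + (1 if i < n % cv else 0) for i in range(cv)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(start)) + list(range(stop, n))
        yield train_idx, val_idx
        start = stop


def knn_predict(x_train, y_train, x, k):
    """Majority vote among the k nearest 1-D training points."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return int(2 * sum(y_train[i] for i in nearest) > k)


def accuracy(x_train, y_train, x_eval, y_eval, k):
    hits = sum(knn_predict(x_train, y_train, x, k) == y
               for x, y in zip(x_eval, y_eval))
    return hits / len(y_eval)


def validation_curve(x, y, param_range, cv=4):
    """Mean train and validation accuracy for each value of k."""
    train_scores, val_scores = [], []
    for k in param_range:
        fold_train, fold_val = [], []
        for train_idx, val_idx in kfold_indices(len(x), cv):
            xt = [x[i] for i in train_idx]
            yt = [y[i] for i in train_idx]
            xv = [x[i] for i in val_idx]
            yv = [y[i] for i in val_idx]
            fold_train.append(accuracy(xt, yt, xt, yt, k))
            fold_val.append(accuracy(xt, yt, xv, yv, k))
        train_scores.append(sum(fold_train) / cv)
        val_scores.append(sum(fold_val) / cv)
    return train_scores, val_scores


# Toy data: the label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(0)
x = [rng.random() for _ in range(80)]
y = [int(v > 0.5) ^ int(rng.random() < 0.1) for v in x]

param_range = [1, 5, 15]
train_scores, val_scores = validation_curve(x, y, param_range, cv=4)
for k, tr, va in zip(param_range, train_scores, val_scores):
    print(f"k={k:2d}  train={tr:.2f}  val={va:.2f}")
```

A large gap between the training and validation scores at a given parameter value is the usual sign of overfitting, which is what plotting the two curves together is meant to reveal.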

Example:
>>> from watex.cases.modeling import BaseModel
>>> processObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx')
>>> processObj.get_learning_curve(
...     switch_plot='on', preprocessor=True)
get_model_prediction(estimator=None, X_test=None, y_test=None, **kws)

Get the model prediction and a quick plot through the decorator.

The decorator accepts many keyword arguments to customize the plot. Refer to watex.utils.decorator.predPlot.

Parameters:
  • estimator – The model to evaluate. If None, the default estimator is used.

  • x_test – pd.DataFrame of the selected data for the test set.

  • y_test – array_like of the selected data for the model test.

  • kws – Additional keyword arguments, which refer to the data_fn, df and pipelines parameters.

  • switch – Turn the decorator on or off.

Example:
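>>> import numpy as np
>>> from sklearn.compose import make_column_selector
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.svm import SVC
>>> from watex.cases.modeling import BaseModel
>>> testim = SVC(C=1, gamma=0.1)  # estimator reused for the prediction below
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines={
...         'num_column_selector_': make_column_selector(
...             dtype_include=np.number),
...         'cat_column_selector_': make_column_selector(
...             dtype_exclude=np.number),
...         'features_engineering_': PolynomialFeatures(
...             2, include_bias=False),
...         'selectors_': SelectKBest(f_classif, k=2),
...         'encodages_': RobustScaler()
...     },
...     estimator=testim)
>>> modelObj.get_model_prediction(estimator=testim, switch='on')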
property model_

Get the processor and the estimator that make up the composite model.

property model_score

Get the prediction score of your composite model.

permutation_feature_importance(estimator=None, X_train=None, y_train=None, pfi_kws=None, **kws)

Evaluate feature importance with tree estimators before and after shuffling.

Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. Refer to the scikit-learn user guide at https://scikit-learn.org/stable/modules/permutation_importance.html for more details.
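
As a rough illustration of the technique, the sketch below shuffles one feature column at a time and records the mean drop in accuracy. It is a simplified, dependency-free version written for this page, not the scikit-learn or watex implementation. The function `fitted_model` is a hypothetical stand-in for an already fitted estimator, and the toy data is generated so that only the first feature matters.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=7):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends on feature 0 only; feature 1 is noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def fitted_model(row):
    # Stand-in for a fitted classifier that has learned the true rule.
    return int(row[0] > 0.5)


importances = permutation_importance(fitted_model, X, y)
print(importances)
```

Shuffling the noise feature leaves every prediction unchanged, so its importance is zero, while shuffling the informative feature removes most of the model's accuracy. The model is never refitted, which is why the technique applies to any fitted estimator.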

Parameters:
  • estimator – The estimator used to evaluate feature importance. The default is RandomForestClassifier.

  • X_train – pd.core.frame.DataFrame of the selected training set.

  • y_train – array_like of the selected data for the evaluation set.

  • n_estimators – Number of estimators (trees) in the ensemble. The default is 100.

  • n_repeats – Number of shuffling repeats. The default is 10.

  • pfi_kws – Additional keyword arguments for the permutation_importance callable.

  • pfi_style

    Type of plot. Can be:

    - pfi for permutation feature importance before and after shuffling

    - dendro for a dendrogram plot.

    The default is pfi.

  • switch – Turn the decorator on or off.

Example:
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.ensemble import AdaBoostClassifier
>>> modelObj = BaseModel()
>>> modelObj.permutation_feature_importance(
...     estimator=AdaBoostClassifier(random_state=7),
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     switch='on', pfi_style='pfi')
property processor

Get the processor after supplying the pipelines.

tuning_hyperparameters(estimator=None, hyper_params=None, cv=4, grid_kws=None, **kws)

Tune the hyperparameters of an existing estimator to evaluate its performance, then boost the model using its best parameters.

Parameters:
  • estimator – Callable estimator or model to boost

  • hyper_params – dict of hyperparameters of the estimator

  • cv – Number of cross-validation splits. The default is 4.

  • grid_kws – dict of other grid-search parameters.

Example:
>>> import numpy as np
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.svm import SVC
>>> from sklearn.compose import make_column_selector
>>> my_own_pipelines = {
...     'num_column_selector_': make_column_selector(
...         dtype_include=np.number),
...     'cat_column_selector_': make_column_selector(
...         dtype_exclude=np.number),
...     'features_engineering_': PolynomialFeatures(
...         3, include_bias=False),
...     'selectors_': SelectKBest(f_classif, k=3),
...     'encodages_': RobustScaler()
... }
>>> my_estimator = SVC(C=1, gamma=1e-4, random_state=7)
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines=my_own_pipelines,
...     estimator=my_estimator)
>>> # Keys address nested steps of the composite model with '__'.
>>> hyperparams = {
...     'columntransformer__pipeline-1__polynomialfeatures__degree':
...         np.arange(2, 10),
...     'columntransformer__pipeline-1__selectkbest__k': np.arange(2, 7),
...     'svc__C': [1, 10, 100],
...     'svc__gamma': [1e-1, 1e-2, 1e-3]}
>>> my_compose_estimator_ = modelObj.model_
>>> modelObj.tuning_hyperparameters(
...     estimator=my_compose_estimator_,
...     hyper_params=hyperparams,
...     search='rand')
>>> modelObj.best_params_
>>> modelObj.best_score_
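
The keys in hyperparams use the double-underscore convention that scikit-learn applies to nested estimators: each `__` descends one level, and the last segment is the parameter name. The sketch below splits such keys and counts how many candidates an exhaustive search over this space would cover. Plain Python lists stand in for the NumPy ranges, and `split_param` is a hypothetical helper written for this illustration. That the size of the space is the reason the example passes search='rand' is an assumption.

```python
from itertools import product
from math import prod

# Same search space as the example above; np.arange(2, 10) covers 2..9
# and np.arange(2, 7) covers 2..6.
hyperparams = {
    "columntransformer__pipeline-1__polynomialfeatures__degree": list(range(2, 10)),
    "columntransformer__pipeline-1__selectkbest__k": list(range(2, 7)),
    "svc__C": [1, 10, 100],
    "svc__gamma": [1e-1, 1e-2, 1e-3],
}


def split_param(name):
    """Split 'step__substep__param' into (list of steps, parameter name)."""
    *steps, param = name.split("__")
    return steps, param


for name in hyperparams:
    steps, param = split_param(name)
    print(" -> ".join(steps), ":", param)

# An exhaustive grid tries every combination, each fitted once per fold.
n_candidates = prod(len(values) for values in hyperparams.values())
cv = 4
n_fits = n_candidates * cv
print(n_candidates, "candidates,", n_fits, "fits")

# Candidates can be enumerated lazily as dictionaries of parameter values.
grid = (dict(zip(hyperparams, combo)) for combo in product(*hyperparams.values()))
first_candidate = next(grid)
```

With 8 × 5 × 3 × 3 = 360 candidates and cv=4, an exhaustive search needs 1440 fits, whereas a randomized search evaluates only a sampled subset of the candidates.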