watex 0.1.6.dev220+gcf54d39.d20230309 documentation

class watex.cases.modeling.BaseModel(data_fn=None, df=None, **kwargs)[source]#

Bases: object

Base model class. The most interesting and challenging part of modeling is tuning the hyperparameters after designing a composite estimator. Finding the best parameters is a good way to reorganize the created pipeline {transformers + estimators} so that the model generalizes well to new data.

Parameters:

*data_fn* (str) – Path to the analysis data file.
*df* (pd.DataFrame) – Dataframe of features for analysis. Must contain the main parameters, including the target, which is a pd.Series among the columns of df.
Other optional information is held in kwargs:

===========  ==========  =====================================================
Attributes   Type        Description
===========  ==========  =====================================================
auto         bool        Trigger the composite estimator. If True, an
                         SVC-composite estimator preprocessor is given.
                         Default is False.
pipelines    dict        Collect your own pipeline for model preprocessor
                         triggering. It should be found automatically.
estimators   Callable    A given estimator. If None, SVM is auto-selected as
                         the default estimator.
model_score  float/dict  Model test score. Observe your test model score using
                         your composite estimator for enhancement or your own
                         pipelines.
processor    Callable    Compose pipelines and estimators for default model
                         scoring as well as the composite estimator
                         enhancement.
===========  ==========  =====================================================

Examples

>>> import numpy as np
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.compose import make_column_selector
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines={
...         'num_column_selector_': make_column_selector(
...             dtype_include=np.number),
...         'cat_column_selector_': make_column_selector(
...             dtype_exclude=np.number),
...         'features_engineering_': PolynomialFeatures(
...             2, include_bias=False),
...         'selectors_': SelectKBest(f_classif, k=2),
...         'encodages_': RobustScaler()
...     },
...     estimator=RandomForestClassifier()
... )
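
The pipelines dictionary above only lists building blocks. As a rough illustration of how such blocks could be combined into a composite estimator, the sketch below wires them together with plain scikit-learn. This is an assumption about the general approach, not the actual watex implementation: the toy DataFrame, the column names, the `OneHotEncoder` used for the categorical branch, and the variable names are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, RobustScaler

# Hypothetical toy data standing in for the real dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "power": rng.normal(size=60),
    "magnitude": rng.normal(size=60),
    "shape": rng.choice(["V", "W", "U"], size=60),
})
y = (X["power"] + X["magnitude"] > 0).astype(int)

# Same keys as in the documented example.
pipelines = {
    "num_column_selector_": make_column_selector(dtype_include=np.number),
    "cat_column_selector_": make_column_selector(dtype_exclude=np.number),
    "features_engineering_": PolynomialFeatures(2, include_bias=False),
    "selectors_": SelectKBest(f_classif, k=2),
    "encodages_": RobustScaler(),
}

# Numerical branch: expand features, keep the k best, then scale.
num_pipe = make_pipeline(
    pipelines["features_engineering_"],
    pipelines["selectors_"],
    pipelines["encodages_"],
)
# Categorical branch (assumption: one-hot encoding).
cat_pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"))

preprocessor = make_column_transformer(
    (num_pipe, pipelines["num_column_selector_"]),
    (cat_pipe, pipelines["cat_column_selector_"]),
)

# Composite model: preprocessor followed by the estimator.
model = make_pipeline(preprocessor, RandomForestClassifier(random_state=0))
model.fit(X, y)
score = model.score(X, y)
print(f"training accuracy: {score:.2f}")
```

With `make_pipeline` and `make_column_transformer`, step names are generated automatically (`columntransformer`, `pipeline-1`, `pipeline-2`), which is the naming scheme that the hyperparameter keys in the tuning example rely on.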

property feature_importances_#: Get the bar plot of feature importances. An error is raised if the estimator has no feature_importances_ attribute.

get_learning_curve(estimator=None, X_train=None, y_train=None, learning_curve_kws=None, **kws)[source]#

Compute the train score and validation curve to visualize your learning curve.

Parameters:

estimator – The created model. If None
X_train – pd.core.frame.DataFrame of the selected training set.
x_test – pd.DataFrame of the selected data for the test set.
y_train – array_like of the selected data for the evaluation set.
y_test – array_like of the selected data for the model test.

val_kws –

validation_curve keyword arguments. If None, the default is:

val_curve_kws = {"param_name": 'C',
                 "param_range": np.arange(1, 210, 10),
                 "cv": 4}

Returns:

train_score: float/dict of training set scores.
val_score: float/dict of validation scores.
switch: Turn the learning curve or validation curve plot on or off.
trigDec: Trigger the decorator.
N: Number of values in the parameter range for plotting.

Example:

>>> from watex.cases.modeling import BaseModel
>>> processObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx')
>>> processObj.get_learning_curve(
...     switch_plot='on', preprocessor=True)
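
To make the default validation curve keywords more concrete, here is a minimal sketch that calls scikit-learn's `validation_curve` directly with the same `param_name`, `param_range` and `cv` values. The synthetic dataset and the `SVC(gamma=0.1)` estimator are assumptions chosen for illustration; how watex calls this function internally is not shown in this page.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Synthetic data (assumption) in place of the real training set.
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

# Default keywords quoted in the parameter description.
val_curve_kws = {
    "param_name": "C",
    "param_range": np.arange(1, 210, 10),
    "cv": 4,
}

# One row per value of C, one column per cross-validation fold.
train_scores, val_scores = validation_curve(
    SVC(gamma=0.1), X, y, **val_curve_kws
)

# Average over folds to get one curve point per value of C.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
best_C = val_curve_kws["param_range"][val_mean.argmax()]
print("C with the highest mean validation score:", best_C)
```

Plotting `train_mean` and `val_mean` against the parameter range gives the usual picture: a growing gap between the two curves suggests overfitting as `C` increases.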

get_model_prediction(estimator=None, X_test=None, y_test=None, **kws)[source]#

Get the model prediction and a quick plot of it through the plotting decorator.

The decorator holds many keyword arguments to customize plot. Refer to watex.utils.decorator.predPlot.

Parameters:

estimator – The created model. If None
x_test – pd.DataFrame of the selected data for the test set.
y_test – array_like of the selected data for the model test.
kws – Additional keyword arguments, which refer to the data_fn, df and pipelines parameters.
switch – Turn the decorator on or off.

Example:

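>>> import numpy as np
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.svm import SVC
>>> from sklearn.compose import make_column_selector
>>> # assumption: testim is the SVC estimator passed to the model
>>> testim = SVC(C=1, gamma=0.1)
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines={
...         'num_column_selector_': make_column_selector(
...             dtype_include=np.number),
...         'cat_column_selector_': make_column_selector(
...             dtype_exclude=np.number),
...         'features_engineering_': PolynomialFeatures(
...             2, include_bias=False),
...         'selectors_': SelectKBest(f_classif, k=2),
...         'encodages_': RobustScaler()
...     }, estimator=testim)
>>> modelObj.get_model_prediction(estimator=testim, switch='on')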

property model_#: Get the set of processor and estimator that make up the composite model

property model_score#: Estimate your composite model prediction

permutation_feature_importance(estimator=None, X_train=None, y_train=None, pfi_kws=None, **kws)[source]#

Evaluate feature importance with tree estimators before and after shuffling the trees.

Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. Refer to https://scikit-learn.org/stable/modules/permutation_importance.html for more details.

Parameters:

estimator – The estimator used to evaluate the importance of features. The default is RandomForestClassifier.
X_train – pd.core.frame.DataFrame of the selected training set.
y_train – array_like of the selected data for the evaluation set.
n_estimators – Number of estimators that compose the forest. The default is 100.
n_repeats – Number of shuffling repetitions. The default is 10.
pfi_kws – Additional keyword arguments for the permutation_importance callable.
pfi_style – Type of plot. Can be:
- pfi for permutation feature importance before and after shuffling trees
- dendro for a dendrogram plot
The default is pfi.
switch – Turn the decorator on or off.

Example:

>>> from watex.cases.modeling import BaseModel
>>> from sklearn.ensemble import AdaBoostClassifier
>>> modelObj = BaseModel()
>>> modelObj.permutation_feature_importance(
...     estimator=AdaBoostClassifier(random_state=7),
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     switch='on', pfi_style='pfi')
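
For readers who want to see the underlying technique in isolation, the following sketch computes permutation feature importance with scikit-learn's `permutation_importance`, using the default values quoted above (100 trees, 10 repeats). The synthetic dataset and the train/test split are assumptions made for the example; the plotting done by the watex decorator is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic tabular data (assumption): 5 features, 2 of them informative.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2,
    n_redundant=0, random_state=7,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Fit the default estimator named in the parameter list.
forest = RandomForestClassifier(n_estimators=100, random_state=7)
forest.fit(X_train, y_train)

# Shuffle each feature 10 times and record the drop in score.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=7
)

# Rank features from most to least important.
order = result.importances_mean.argsort()[::-1]
for idx in order:
    print(
        f"feature {idx}: "
        f"{result.importances_mean[idx]:.3f} "
        f"+/- {result.importances_std[idx]:.3f}"
    )
```

Each feature gets one importance value per repeat, so `result.importances` has one row per feature and one column per repeat. A feature whose shuffling barely changes the score contributes little to the fitted model.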

property processor#: Get the processor after supplying the pipelines

tuning_hyperparameters(estimator=None, hyper_params=None, cv=4, grid_kws=None, **kws)[source]#

Tune the hyperparameters of an existing estimator to evaluate its performance, then boost the model using its best_params_.

Parameters:

estimator – Callable estimator or model to boost.
hyper_params – dict of hyperparameters of the estimator.
cv – Number of cross-validation folds. The default is 4.
grid_kws – dict of other grid search parameters.

Example:

>>> import numpy as np
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.svm import SVC
>>> from sklearn.compose import make_column_selector
>>> my_own_pipelines = {
...     'num_column_selector_': make_column_selector(
...         dtype_include=np.number),
...     'cat_column_selector_': make_column_selector(
...         dtype_exclude=np.number),
...     'features_engineering_': PolynomialFeatures(
...         3, include_bias=False),
...     'selectors_': SelectKBest(f_classif, k=3),
...     'encodages_': RobustScaler()
... }
>>> my_estimator = SVC(C=1, gamma=1e-4, random_state=7)
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines=my_own_pipelines,
...     estimator=my_estimator)
>>> hyperparams = {
...     'columntransformer__pipeline-1__polynomialfeatures__degree':
...         np.arange(2, 10),
...     'columntransformer__pipeline-1__selectkbest__k': np.arange(2, 7),
...     'svc__C': [1, 10, 100],
...     'svc__gamma': [1e-1, 1e-2, 1e-3]}
>>> my_compose_estimator_ = modelObj.model_
>>> modelObj.tuning_hyperparameters(
...     estimator=my_compose_estimator_,
...     hyper_params=hyperparams,
...     search='rand')
>>> modelObj.best_params_
>>> modelObj.best_score_
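
The hyperparameter keys in the example follow scikit-learn's `step__substep__parameter` convention for nested pipelines. The sketch below reproduces that layout with plain scikit-learn and runs a randomized search over it, on the assumption that `search='rand'` corresponds to something like `RandomizedSearchCV`. The toy data, the `OneHotEncoder` categorical branch, the reduced degree range (kept small for speed) and `n_iter=5` are all illustrative choices, not watex defaults.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, RobustScaler
from sklearn.svm import SVC

# Hypothetical toy data: three numerical columns, one categorical.
rng = np.random.default_rng(7)
X = pd.DataFrame({
    "a": rng.normal(size=80),
    "b": rng.normal(size=80),
    "c": rng.normal(size=80),
    "kind": rng.choice(["x", "y"], size=80),
})
y = (X["a"] - X["b"] > 0).astype(int)

# 'pipeline-1' (numerical) and 'pipeline-2' (categorical) are the
# names make_column_transformer generates for these two branches.
num_pipe = make_pipeline(
    PolynomialFeatures(3, include_bias=False),
    SelectKBest(f_classif, k=3),
    RobustScaler(),
)
cat_pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"))
model = make_pipeline(
    make_column_transformer(
        (num_pipe, make_column_selector(dtype_include=np.number)),
        (cat_pipe, make_column_selector(dtype_exclude=np.number)),
    ),
    SVC(C=1, gamma=1e-4, random_state=7),
)

hyper_params = {
    "columntransformer__pipeline-1__polynomialfeatures__degree":
        np.arange(2, 4),
    "columntransformer__pipeline-1__selectkbest__k": np.arange(2, 7),
    "svc__C": [1, 10, 100],
    "svc__gamma": [1e-1, 1e-2, 1e-3],
}

# Sample 5 parameter combinations and score each with 4-fold CV.
search = RandomizedSearchCV(
    model, hyper_params, n_iter=5, cv=4, random_state=7
)
search.fit(X, y)
print(search.best_params_)
print(f"best mean CV score: {search.best_score_:.3f}")
```

A randomized search evaluates only a sample of the grid, which keeps the cost manageable when several parameters are tuned at once. An exhaustive `GridSearchCV` over the same dictionary would try every combination.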