- class watex.cases.modeling.BaseModel(data_fn=None, df=None, **kwargs)[source]#
Bases: object
Base model class. The most interesting and challenging part of modeling is tuning the hyperparameters after designing a composite estimator. Finding the best parameters is a good way to reorganize the created pipeline {transformers + estimators} so that it generalizes well to new data.
- Parameters:
*data_fn* (str) – Path to the analysis data file.
*df* (pd.DataFrame) – Dataframe of features for analysis. Must contain the main parameters, including the target, given either as a pd.Series or as the name of a column of df.
Other optional information is held in the kwargs arguments:
==============  ==========  ===========
Attributes      Type        Description
==============  ==========  ===========
auto            bool        Trigger the composite estimator. If True, an SVC-composite estimator preprocessor is given. Default is False.
pipelines       dict        Collect your own pipeline for model preprocessor triggering. It should be found automatically.
estimators      Callable    A given estimator. If None, SVM is auto-selected as the default estimator.
model_score     float/dict  Model test score. Observe your test model score using your composite estimator for enhancement or your own pipelines.
processor       Callable    Composes pipelines and estimators for default model scoring, as well as for the composite estimator enhancement.
==============  ==========  ===========
Examples
>>> import numpy as np
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.compose import make_column_selector
>>> estimator2 = RandomForestClassifier()
>>> modelObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines={
...         'num_column_selector_': make_column_selector(
...             dtype_include=np.number),
...         'cat_column_selector_': make_column_selector(
...             dtype_exclude=np.number),
...         'features_engineering_': PolynomialFeatures(
...             2, include_bias=False),
...         'selectors_': SelectKBest(f_classif, k=2),
...         'encodages_': RobustScaler()
...     },
...     estimator=estimator2
... )
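The composite estimator described above, {transformers + estimators}, can be pictured with plain scikit-learn. The following is a minimal sketch, not the way BaseModel builds its model internally: the toy data, the one-hot encoding of the categorical branch and the variable names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, RobustScaler
from sklearn.svm import SVC

# Hypothetical toy data standing in for the real dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "a": rng.normal(size=60),
    "b": rng.normal(size=60),
    "kind": rng.choice(["x", "y"], size=60),
})
y = (X["a"] + X["b"] > 0).astype(int)

# Numeric branch: feature engineering, feature selection, then scaling.
numeric_pipe = make_pipeline(
    PolynomialFeatures(2, include_bias=False),
    SelectKBest(f_classif, k=2),
    RobustScaler(),
)
# Categorical branch: one-hot encoding (an assumption of this sketch).
categorical_pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"))

# Route each column to its branch by dtype.
preprocessor = make_column_transformer(
    (numeric_pipe, make_column_selector(dtype_include=np.number)),
    (categorical_pipe, make_column_selector(dtype_exclude=np.number)),
)

# Composite estimator: preprocessing followed by the final estimator.
composite = make_pipeline(preprocessor, SVC(C=1, gamma=0.1))
composite.fit(X, y)
train_accuracy = composite.score(X, y)
```

Wrapping the transformers and the estimator in one pipeline means the preprocessing is refit inside every cross-validation split, which is what makes later hyperparameter tuning of the whole chain possible.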
- property feature_importances_#
Get the bar plot of feature importances. If the estimator has no feature_importances_ attribute, an error is raised.
- get_learning_curve(estimator=None, X_train=None, y_train=None, learning_curve_kws=None, **kws)[source]#
Compute the train and validation scores to visualize the learning curve.
- Parameters:
estimator – The created model. If None, the default estimator is used.
X_train – pd.DataFrame of the selected training set.
x_test – pd.DataFrame of the selected data for the test set.
y_train – array_like of the selected data for the evaluation set.
y_test – array_like of the selected data for the model test.
val_kws – validation_curve keyword arguments. If None, the default is:
val_curve_kws = {"param_name":'C', "param_range": np.arange(1,210,10), "cv":4}
- Returns:
train_score: float|dict of training set scores.
val_score: float|dict of validation scores.
- switch: Turn on or off the plot of the learning curve or validation curve.
- trigDec: Trigger the decorator.
- N: Number of param range values for plotting.
- Example:
>>> from watex.cases.modeling import BaseModel
>>> processObj = BaseModel(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx')
>>> processObj.get_learning_curve(
...     switch_plot='on', preprocessor=True)
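The default val_curve_kws listed above map directly onto scikit-learn's validation_curve. The sketch below shows what such a call computes, assuming a plain SVC and synthetic data; it does not reproduce the plotting decorator or the way get_learning_curve prepares its inputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Synthetic training data (an assumption; the real method reads data_fn or df).
X_train, y_train = make_classification(
    n_samples=120, n_features=6, random_state=0)

# Same defaults as documented for val_curve_kws.
val_curve_kws = {
    "param_name": "C",
    "param_range": np.arange(1, 210, 10),
    "cv": 4,
}

# One row per value of C, one column per cross-validation fold.
train_scores, val_scores = validation_curve(
    SVC(gamma=0.1), X_train, y_train, **val_curve_kws)

# Average over folds to get one curve each for training and validation.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
```

When the estimator is a pipeline rather than a bare SVC, param_name would need the step prefix, for example 'svc__C'.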
- get_model_prediction(estimator=None, X_test=None, y_test=None, **kws)[source]#
Get the model prediction and a quick plot using the associated decorator.
The decorator holds many keyword arguments to customize the plot. Refer to watex.utils.decorator.predPlot.
- Parameters:
estimator – The created model. If None, the default estimator is used.
X_test – pd.DataFrame of the selected data for the test set.
y_test – array_like of the selected data for the model test.
kws – Additional keyword arguments, which refer to the data_fn, df and pipelines parameters.
switch – Turn on or off the decorator.
- Example:
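>>> import numpy as np
>>> from sklearn.compose import make_column_selector
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.svm import SVC
>>> from watex.modeling.sl import Modeling
>>> testim = SVC(C=1, gamma=0.1)
>>> modelObj = Modeling(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines={
...         'num_column_selector_': make_column_selector(
...             dtype_include=np.number),
...         'cat_column_selector_': make_column_selector(
...             dtype_exclude=np.number),
...         'features_engineering_': PolynomialFeatures(2, include_bias=False),
...         'selectors_': SelectKBest(f_classif, k=2),
...         'encodages_': RobustScaler()
...     },
...     estimator=testim)
>>> modelObj.get_model_prediction(estimator=testim, switch='on')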
- property model_#
Get the processor and estimator that make up the composite model.
- property model_score#
Estimate the prediction score of your composite model.
- permutation_feature_importance(estimator=None, X_train=None, y_train=None, pfi_kws=None, **kws)[source]#
Evaluate feature importance with tree estimators before and after shuffling.
Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. Refer to https://scikit-learn.org/stable/modules/permutation_importance.html for more details.
- Parameters:
estimator – The estimator used to evaluate the importance of features. The default is RandomForestClassifier.
X_train – pd.DataFrame of the selected training set.
y_train – array_like of the selected data for the evaluation set.
n_estimators – Number of estimators that compose the tree ensemble. The default is 100.
n_repeats – Number of shuffling repetitions. The default is 10.
pfi_kws – Additional keyword arguments for the permutation_importance callable.
pfi_style – Type of plot. Can be:
- pfi for permutation feature importance before and after shuffling trees
- dendro for dendrogram plot
The default is pfi.
switch – Turn on or off the decorator.
- Example:
>>> from watex.cases.modeling import BaseModel
>>> from sklearn.ensemble import AdaBoostClassifier
>>> modelObj = BaseModel()
>>> modelObj.permutation_feature_importance(
...     estimator=AdaBoostClassifier(random_state=7),
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     switch='on', pfi_style='pfi')
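For readers who want to see the underlying technique without the plotting decorator, here is a minimal sketch using scikit-learn's permutation_importance directly. The synthetic data and variable names are assumptions; the n_estimators and n_repeats values mirror the documented defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic tabular data (an assumption for this sketch).
X_train, y_train = make_classification(
    n_samples=200, n_features=5, n_informative=2, random_state=7)

# Fit the default-style estimator: a forest of 100 trees.
forest = RandomForestClassifier(n_estimators=100, random_state=7)
forest.fit(X_train, y_train)

# Shuffle each feature 10 times and record the drop in score each time.
result = permutation_importance(
    forest, X_train, y_train, n_repeats=10, random_state=7)

# Feature indices ordered from most to least important on average.
ranking = result.importances_mean.argsort()[::-1]
```

The result also exposes importances_std, which gives a sense of how stable each feature's importance is across the repeats.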
- property processor#
Get the processor after supplying the pipelines.
- tuning_hyperparameters(estimator=None, hyper_params=None, cv=4, grid_kws=None, **kws)[source]#
Tune the hyperparameters of an existing estimator to evaluate its performance, then boost the model using the best parameters found.
- Parameters:
estimator – Callable estimator or model to boost.
hyper_params – dict of hyperparameters of the estimator.
cv – Cross-validation splitting strategy. The default is 4.
grid_kws – dict of other grid-search parameters.
- Example:
>>> import numpy as np
>>> from watex.modeling.basics import SLModeling
>>> from sklearn.preprocessing import RobustScaler, PolynomialFeatures
>>> from sklearn.feature_selection import SelectKBest, f_classif
>>> from sklearn.svm import SVC
>>> from sklearn.compose import make_column_selector
>>> my_own_pipelines = {
...     'num_column_selector_': make_column_selector(
...         dtype_include=np.number),
...     'cat_column_selector_': make_column_selector(
...         dtype_exclude=np.number),
...     'features_engineering_': PolynomialFeatures(
...         3, include_bias=False),
...     'selectors_': SelectKBest(f_classif, k=3),
...     'encodages_': RobustScaler()
... }
>>> my_estimator = SVC(C=1, gamma=1e-4, random_state=7)
>>> modelObj = SLModeling(
...     data_fn='data/geo_fdata/BagoueDataset2.xlsx',
...     pipelines=my_own_pipelines,
...     estimator=my_estimator)
>>> hyperparams = {
...     'columntransformer__pipeline-1__polynomialfeatures__degree': np.arange(2, 10),
...     'columntransformer__pipeline-1__selectkbest__k': np.arange(2, 7),
...     'svc__C': [1, 10, 100],
...     'svc__gamma': [1e-1, 1e-2, 1e-3]}
>>> my_compose_estimator_ = modelObj.model_
>>> modelObj.tuning_hyperparameters(
...     estimator=my_compose_estimator_,
...     hyper_params=hyperparams,
...     search='rand')
>>> modelObj.best_params_
>>> modelObj.best_score_
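The long hyperparameter keys in the example follow scikit-learn's step__parameter naming for nested pipelines. The sketch below shows where names such as columntransformer__pipeline-1__selectkbest__k come from, and assumes that search='rand' corresponds to a randomized search like RandomizedSearchCV. The toy data, the one-hot categorical branch and the reduced parameter ranges are illustrative assumptions, not the library's internals.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, RobustScaler
from sklearn.svm import SVC

# Hypothetical toy data with two numeric columns and one categorical column.
rng = np.random.default_rng(7)
X = pd.DataFrame({
    "a": rng.normal(size=80),
    "b": rng.normal(size=80),
    "kind": rng.choice(["x", "y"], size=80),
})
y = (X["a"] - X["b"] > 0).astype(int)

# Two unnamed pipelines inside the column transformer are auto-named
# 'pipeline-1' and 'pipeline-2'; the outer steps become
# 'columntransformer' and 'svc'.
composite = make_pipeline(
    make_column_transformer(
        (make_pipeline(PolynomialFeatures(2, include_bias=False),
                       SelectKBest(f_classif, k=2),
                       RobustScaler()),
         make_column_selector(dtype_include=np.number)),
        (make_pipeline(OneHotEncoder(handle_unknown="ignore")),
         make_column_selector(dtype_exclude=np.number)),
    ),
    SVC(C=1, gamma=1e-4, random_state=7),
)

# Ranges kept small so that k never exceeds the number of generated features.
param_distributions = {
    "columntransformer__pipeline-1__polynomialfeatures__degree": [2, 3],
    "columntransformer__pipeline-1__selectkbest__k": [2, 3],
    "svc__C": [1, 10, 100],
    "svc__gamma": [1e-1, 1e-2, 1e-3],
}

search = RandomizedSearchCV(
    composite, param_distributions, n_iter=5, cv=4, random_state=7)
search.fit(X, y)
best_params = search.best_params_
best_score = search.best_score_
```

Calling composite.get_params().keys() lists every tunable name, which is a convenient way to check a key before passing it to a search.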