watex.cases.processing.Processing#

class watex.cases.processing.Processing(pipeline=None, estimator=None, **kws)[source]#

Processing class for managing baseline model evaluation and learning.

Manages the validation curves after fiddling a little bit an estimator hyperparameters.

Processing is usefull before modeling step. To process data, a default implementation is given for data preprocessor build. It consists of creating a model pipeline using different transformers. If None pipeline is setting and auto is set to ‘True’, a default pipeline is created though the prepocessor`to raun the base model evaluation. Indeed a `preprocessor is a set of transformers + estimators.

Parameters:

auto (bool, default is {'False'}) – trigger the composite estimator.If True a composite preprocessor is built and use for base model evaluation. default is False.
pipeline (Callable, F or dict of callable F) – preprocessing steps encapsulated. If not supplied a default pipe is used as auto is set to True.
estimator (Callable,) – An object which manages the estimation and decoding of a model. Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The core functionality of some estimators may also be available as a function.
tname (str,) – A target name or label. In supervised learning the target name is considered as the reference name of y or label variable.
drop_features (list or str, Optional) – List the useless features for predicting or list of column names to drop out.
random_state (int, default is 42) – The state of data shuffling. The default is 42.
default_estimator (callable, F or sckitlearn estimator) – The default estimator name for predicting the tname value. A predifined defaults estimators prameters are set and keep in cache for quick preprocessing like: - ‘dtc’: For DecisionTreeClassifier - ‘svc’: Support Vector Classifier - ‘sdg’: SGDClassifier - ‘knn’: KNeighborsClassifier - ‘rdf`: RandmForestClassifier - ‘ada’: AdaBoostClassifier - ‘vtc’: VotingClassifier - ‘bag’: BaggingClassifier - ‘stc’: StackingClassifier If estimator is not given the default is svm or svc.
test_size (float,) – The test set data size. Must be less than 1.The sample test size is 0.2 either 20% of dataset.
verbose (int, default is 0) – Control the level of verbosity. Higher value lead to more messages.

X#

training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.

Type:: Ndarray of shape ( M x N), \(M=m-samples\) & \(N=n-features\)

y#

train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Type:: array-like of shape (M, ) :math:`M=m-samples

Xt#

Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning.The notation is uppercase to denote that it is ordinarily a matrix.

Type:: Ndarray ( M x N matrix where M=m-samples, & N=n-features)

yt#

test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.

Type:: array-like, shape (M, ) M=m-samples,

data#

Path -like object or Dataframe. If data is given as path-like object, data is read, asserted and validated. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be a file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle e.g. via builtin open function or StringIO.

Type:: str, filepath_or_buffer or pandas.core.DataFrame

pipe_#

Pipeline can be buit by your own pipeline with different transformer. For base model prediction, it is possible to use the default pipeline. Call get_default_pipe to get the transformation list and steps.

Type:: Callable, preprocessor object from sklearn.pipeline

estimator#

Callable estimator method to fit the model:

estimators= SGDClassifier(random_state=13)

Type:: Callable, F or sklearn.metaestimator

model#

A model estimator. An object which manages the estimation and decoding of a model. The model is estimated as a deterministic function of:

parameters provided in object construction or with set_params;

the global numpy.random random state if the estimator’s random_state
parameter is set to None; and

any data or sample properties passed to the most recent call to fit,
fit_transform or fit_predict, or data similarly passed in a sequence of calls to partial_fit.

The estimated model is stored in public and private attributes on the estimator instance, facilitating decoding through prediction and transformation methods. Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The core functionality of some estimators may also be available as a function.

Type:: callable, always as a function,

cat_features_#

list of categorical features list. If not given it should be find automatically.

Type:: list or str, Optional

num_features_#

list Numerical features list. If not given, should be find automatically.

Type:: list of str, Optional

model#

Use the predifined pipelines i.e can be a Pipeline can your build by your own pipeline with different composite estimator. If model is None , use the default model from the default preprocessor and estimator.

Type:: Callable, {preprocessor + estimator },

model_score_#

Model test score. Observe your test model score using your compose estimator for enhacement

Type:: float/dict

model_prediction_#

Observe your test model prediction for as well as the compose estimator enhancement.

Type:: array_like

preprocessor_#

Compose piplenes and estimators for default model scorage.

Type:: Callable , F

Examples

>>> from watex.cases.processing  import Processing
>>> from watex.exlib.sklearn import (StandardScaler,RandomForestClassifier,
                                     make_column_selector, PolynomialFeatures,
                                     SelectKBest, f_classif)
>>> data = fetch_data ('bagoue original').get('data=dfy2')
>>> my_own_pipeline= {'num_column_selector_':
...                       make_column_selector(dtype_include=np.number),
...                'cat_column_selector_':
...                    make_column_selector(dtype_exclude=np.number),
...                'features_engineering_':
...                    PolynomialFeatures(3,include_bias=True),
...                'selectors_': SelectKBest(f_classif, k=4),
...               'encodages_': StandardScaler()
...                 }
>>> my_estimator={
...    'RandomForestClassifier':RandomForestClassifier(
...    n_estimators=200, random_state=0)
...    }
>>> processObj= Processing (tname = 'flow', drop_features =['lwi', 'name', 'num'],
                            pipeline= my_own_pipeline, estimator=my_estimator)
>>> processObj.fit(data=data )
>>> processObj.baseEvaluation (eval_metric=True )
... 0.4942528735632184 # score is an ensemble score for both model