watex.cases.processing.Processing#
- class watex.cases.processing.Processing(pipeline=None, estimator=None, **kws)[source]#
Processing class for managing baseline model evaluation and learning.
Manages the validation curves obtained after adjusting an estimator's hyperparameters.
Processing is useful before the modeling step. To process data, a default implementation is provided for building the data preprocessor. It consists of creating a model pipeline from different transformers. If pipeline is None and auto is set to True, a default pipeline is created through the preprocessor to run the base model evaluation. A preprocessor is a set of transformers + estimators.
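The idea of a preprocessor as "transformers + estimator" can be sketched with plain scikit-learn. This is a minimal illustration, not the pipeline watex builds by default: the synthetic data, the chosen steps and their settings are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic numeric data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Transformers come first and the estimator last; together they form the model.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    SelectKBest(f_classif, k=4),
    SVC(random_state=42),
)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # mean accuracy on the held-out 20%
print(f"baseline accuracy: {score:.3f}")
```

Because the transformers live inside the pipeline, they are fitted on the training split only and then reapplied unchanged to the test split.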
- Parameters:
auto (bool, default is False) – Trigger the composite estimator. If True, a composite preprocessor is built and used for base model evaluation.
pipeline (Callable, F or dict of callable F) – Encapsulated preprocessing steps. If not supplied, a default pipeline is used when auto is set to True.
estimator (Callable) – An object which manages the estimation and decoding of a model. Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The core functionality of some estimators may also be available as a function.
tname (str) – A target name or label. In supervised learning the target name is considered as the reference name of y or label variable.
drop_features (list or str, Optional) – The features that are useless for prediction, or the list of column names to drop.
random_state (int, default is 42) – The state of data shuffling.
default_estimator (callable, F or scikit-learn estimator) – The default estimator name for predicting the tname value. Predefined default estimator parameters are set and kept in cache for quick preprocessing:
- ‘dtc’: DecisionTreeClassifier
- ‘svc’: Support Vector Classifier
- ‘sdg’: SGDClassifier
- ‘knn’: KNeighborsClassifier
- ‘rdf’: RandomForestClassifier
- ‘ada’: AdaBoostClassifier
- ‘vtc’: VotingClassifier
- ‘bag’: BaggingClassifier
- ‘stc’: StackingClassifier
If estimator is not given, the default is svm or svc.
test_size (float) – The test set data size. Must be less than 1. The default test size is 0.2, i.e. 20% of the dataset.
verbose (int, default is 0) – Control the level of verbosity. Higher values lead to more messages.
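How test_size and random_state interact can be sketched in a few lines of standard-library Python. The helper split_indices below is hypothetical and only illustrates the principle; it is not the split that watex or scikit-learn performs internally, and the rounding rule is an assumption.

```python
import random


def split_indices(n_samples, test_size=0.2, random_state=42):
    """Return (train_idx, test_idx) after a seeded shuffle."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # A dedicated generator keeps the shuffle reproducible for a given seed.
    random.Random(random_state).shuffle(indices)
    n_test = int(round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```

Calling the helper twice with the same random_state returns the same partition, which is what makes a baseline score comparable from one run to the next.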
- X#
training set; Denotes data that is observed at training and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix. When a matrix, each sample may be represented by a feature vector, or a vector of precomputed (dis)similarity with each training sample. X may also not be a matrix, and may require a feature extractor or a pairwise metric to turn it into one before learning a model.
- Type:
Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)
- y#
train target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
- Type:
array-like of shape (M, ), \(M=m-samples\)
- Xt#
Shorthand for “test set”; data that is observed at testing and prediction time, used as independent variables in learning. The notation is uppercase to denote that it is ordinarily a matrix.
- Type:
Ndarray of shape (M x N), \(M=m-samples\) & \(N=n-features\)
- yt#
test target; Denotes data that may be observed at training time as the dependent variable in learning, but which is unavailable at prediction time, and is usually the target of prediction.
- Type:
array-like of shape (M, ), \(M=m-samples\)
- data#
Path-like object or DataFrame. If data is given as a path-like object, it is read, asserted and validated. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
- Type:
str, filepath_or_buffer or pandas.core.DataFrame
- pipe_#
The pipeline can be your own, built from different transformers. For base model prediction, it is possible to use the default pipeline. Call get_default_pipe to get the transformation list and steps.
- Type:
Callable, preprocessor object from
sklearn.pipeline
- estimator#
Callable estimator method to fit the model:
estimators= SGDClassifier(random_state=13)
- Type:
Callable, F or
sklearn.metaestimator
- model#
A model estimator. An object which manages the estimation and decoding of a model. The model is estimated as a deterministic function of:
- parameters provided in object construction or with set_params;
- the global numpy.random random state if the estimator’s random_state parameter is set to None; and
- any data or sample properties passed to the most recent call to fit, fit_transform or fit_predict, or data similarly passed in a sequence of calls to partial_fit.
The estimated model is stored in public and private attributes on the estimator instance, facilitating decoding through prediction and transformation methods. Estimators must provide a fit method, and should provide set_params and get_params, although these are usually provided by inheritance from base.BaseEstimator. The core functionality of some estimators may also be available as a function.
- Type:
callable, always as a function
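The estimator contract described above (fit, get_params, set_params, with learned state stored on the instance) can be illustrated by a toy class. MajorityClassifier is a hypothetical name invented for this sketch; it does not inherit from base.BaseEstimator and its get_params is simplified (no deep argument), so treat it as an illustration of the protocol rather than a drop-in scikit-learn estimator.

```python
class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def __init__(self, fallback=None):
        self.fallback = fallback  # label used if fit() sees no data

    def get_params(self):
        # Constructor parameters only; learned state is not included.
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        labels = list(y)
        # Learned state carries a trailing underscore, as in scikit-learn.
        self.majority_ = (
            max(set(labels), key=labels.count) if labels else self.fallback
        )
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


clf = MajorityClassifier().set_params(fallback=0)
clf.fit([[1.0], [2.0], [3.0]], [1, 1, 0])
print(clf.predict([[4.0], [5.0]]))  # [1, 1]
```

The fitted result depends only on the constructor parameters and the data passed to fit, which is the deterministic behaviour the description refers to.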
- cat_features_#
List of categorical features. If not given, it is found automatically.
- Type:
list or str, Optional
- num_features_#
List of numerical features. If not given, it is found automatically.
- Type:
list of str, Optional
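Automatic detection of numerical and categorical features could look like the sketch below. The helper split_feature_types and the sample records are hypothetical and use only the standard library; the rule actually applied by watex may differ (for instance, it may rely on pandas dtypes).

```python
from numbers import Number


def split_feature_types(rows):
    """Return (num_features, cat_features) for a list of dict records."""
    num_features, cat_features = [], []
    for name in rows[0]:
        # Missing values (None) are ignored when deciding the column type.
        values = [row[name] for row in rows if row[name] is not None]
        # bool is a Number subclass, so it is excluded explicitly.
        is_numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (num_features if is_numeric else cat_features).append(name)
    return num_features, cat_features


# Illustrative records only; column names and values are made up.
rows = [
    {"power": 94.0, "magnitude": 49.0, "shape": "W", "geol": "granite"},
    {"power": 80.5, "magnitude": None, "shape": "V", "geol": "schist"},
]
num_features, cat_features = split_feature_types(rows)
print(num_features)  # ['power', 'magnitude']
print(cat_features)  # ['shape', 'geol']
```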
- model#
Use a predefined pipeline, or a pipeline you build yourself with a different composite estimator. If model is None, the default model from the default preprocessor and estimator is used.
- Type:
Callable, {preprocessor + estimator}
- model_score_#
Model test score. Observe your test model score using your composite estimator for enhancement.
- Type:
float/dict
- model_prediction_#
Model test prediction. Observe your test model prediction as well as the composite estimator enhancement.
- Type:
array_like
- preprocessor_#
Composes pipelines and estimators for default model scoring.
- Type:
Callable , F
Examples
>>> import numpy as np
>>> from watex.datasets import fetch_data  # import path assumed
>>> from watex.cases.processing import Processing
>>> from watex.exlib.sklearn import (StandardScaler, RandomForestClassifier,
...     make_column_selector, PolynomialFeatures, SelectKBest, f_classif)
>>> data = fetch_data ('bagoue original').get('data=dfy2')
>>> my_own_pipeline= {'num_column_selector_':
...                       make_column_selector(dtype_include=np.number),
...                   'cat_column_selector_':
...                       make_column_selector(dtype_exclude=np.number),
...                   'features_engineering_':
...                       PolynomialFeatures(3,include_bias=True),
...                   'selectors_': SelectKBest(f_classif, k=4),
...                   'encodages_': StandardScaler()
...                   }
>>> my_estimator={
...     'RandomForestClassifier':RandomForestClassifier(
...         n_estimators=200, random_state=0)
...     }
>>> processObj= Processing (tname = 'flow',
...                         drop_features =['lwi', 'name', 'num'],
...                         pipeline= my_own_pipeline,
...                         estimator=my_estimator)
>>> processObj.fit(data=data )
>>> processObj.baseEvaluation (eval_metric=True )
... 0.4942528735632184 # score is an ensemble score for both models