watex.transformers.StratifiedWithCategoryAdder

class watex.transformers.StratifiedWithCategoryAdder(base_num_feature=None, threshold_operator=1.0, return_train=False, max_category=3, n_splits=1, test_size=0.2, random_state=42)

Stratified sampling transformer that generates a new category from a numerical attribute and returns the stratified train set and test set.

Parameters:
  • base_num_feature (str,) – Numerical feature to categorize.

  • threshold_operator (float,) – The coefficient by which the numerical feature values are divided to normalize the data.

  • max_category (int,) – Maximum category value. All values greater than it are gathered into this category.

  • return_train (bool,) – Return the whole stratified train set if set to True. Useful when the dataset is small: it is more convenient to train on the whole train set than on a small amount of stratified data. With a small dataset, the stratified subsets are sometimes not similar to one another.

Another way to stratify the dataset is to get insights from the dataset and to add a new category as additional mileage. From this new attribute, the data can be stratified after categorizing the numerical features. Once the data is stratified, the new category is dropped and the stratified train set and test set are returned. For instance:

>>> from watex.transformers import StratifiedWithCategoryAdder
>>> # df is assumed to be a pandas DataFrame with a numerical 'flow' column
>>> stratifiedNumObj = StratifiedWithCategoryAdder('flow')
>>> stratifiedNumObj.fit_transform(X=df)
>>> stats2 = stratifiedNumObj.statistics_

Usage

In this example, we first categorize the flow attribute using the ceil value (see discretizeCategoriesforStratification()) and group all values greater than the max_category value into max_category, putting the result in a temporary feature. The categorization is performed on this temporary feature, which is then used to stratify the train set and the test set.
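
The steps described above can be sketched outside of watex with NumPy, pandas and scikit-learn. This is a minimal illustration, not the library's implementation: the helper name stratified_split_by_numeric, the temporary column name _tmp_cat, the synthetic data, and the use of StratifiedShuffleSplit are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_split_by_numeric(df, feature, threshold=1.0, max_category=3,
                                test_size=0.2, random_state=42):
    """Hypothetical helper: stratify a split on a categorized numeric feature."""
    tmp = "_tmp_cat"  # temporary category column (assumed name), dropped below
    work = df.copy()
    # Divide by the threshold, then round up to obtain discrete categories.
    work[tmp] = np.ceil(work[feature] / threshold)
    # Gather every category greater than max_category into max_category.
    work[tmp] = work[tmp].clip(upper=max_category)
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=test_size,
                                      random_state=random_state)
    train_idx, test_idx = next(splitter.split(work, work[tmp]))
    # Drop the temporary category so only the original columns are returned.
    train = work.iloc[train_idx].drop(columns=tmp)
    test = work.iloc[test_idx].drop(columns=tmp)
    return train, test


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"flow": rng.uniform(0.1, 6.0, size=200),
                   "depth": rng.uniform(10.0, 100.0, size=200)})
train_set, test_set = stratified_split_by_numeric(df, "flow")
```

The defaults mirror the values shown in the class signature (max_category=3, test_size=0.2, random_state=42), but whether the class combines them in exactly this way is not confirmed by this page.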

Notes

If base_num_feature is not given, the dataset is split using random sampling.
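
For comparison, a purely random split can be sketched with scikit-learn's train_test_split. This assumes the fallback behaves like an ordinary random split, which this page does not confirm. The data below is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"flow": rng.uniform(0.1, 6.0, size=100)})

# Random split: no category is built, so the 'flow' distribution is not
# guaranteed to be preserved in the two subsets.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
```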