watex.transformers.StratifiedUsingBaseCategory
- class watex.transformers.StratifiedUsingBaseCategory(base_column=None, test_size=0.2, random_state=42)
Transformer that stratifies a dataset so that the training set and the test set are more representative of the full data, especially when the dataset is not large enough.
- Parameters:
*base_column* (str or int) – Name or index of the column holding the base category used for stratification. If base_column is None, purely random sampling is used instead.
*test_size* (float) – Proportion of the dataset to put in the test set. Default is 0.2.
*random_state* (int) – Seed used to shuffle the instances of the dataset before splitting. Default is 42.
Usage
If the dataset is not large enough, especially relative to the number of attributes, purely random sampling runs the risk of introducing a significant sampling bias. Stratified sampling is therefore a better way to avoid such a bias. For instance:
>>> from watex.transformers import StratifiedUsingBaseCategory
>>> from watex.utils.mlutils import load_data
>>> df = load_data('data/geo_fdata')
>>> stratifiedObj = StratifiedUsingBaseCategory(base_column='geol')
>>> stratifiedObj.fit_transform(X=df)
>>> stats = stratifiedObj.statistics_
Notes
Inspecting the statistics_ attribute is a good way to compare the test sets generated by purely random sampling and by stratified sampling. The stratified test set has base_column category proportions almost identical to those in the full dataset, whereas the test set generated by purely random sampling can be noticeably skewed.
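The comparison described in this note can be sketched independently of watex with scikit-learn and pandas. This is only an illustration under stated assumptions, not the transformer's actual implementation: the data are synthetic, the column name 'geol' is borrowed from the usage example, and the layout of the `comparison` table is hypothetical and need not match what statistics_ contains.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Synthetic, imbalanced data (assumption: stands in for a real dataset).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "geol": rng.choice(["granite", "schist", "gneiss"], size=200, p=[0.7, 0.2, 0.1]),
    "value": rng.normal(size=200),
})

# Purely random split: category proportions are left to chance.
_, random_test = train_test_split(df, test_size=0.2, random_state=42)

# Stratified split: the test set preserves the proportions of 'geol'.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
_, test_idx = next(splitter.split(df, df["geol"]))
strat_test = df.iloc[test_idx]


def proportions(frame):
    """Return the share of each 'geol' category in the given frame."""
    return frame["geol"].value_counts(normalize=True)


# Side-by-side category proportions for the full data and both test sets.
comparison = pd.DataFrame({
    "overall": proportions(df),
    "stratified": proportions(strat_test),
    "random": proportions(random_test),
}).fillna(0.0)

# Relative deviation (in percent) of each test set from the full dataset.
comparison["stratified_error_%"] = 100 * (comparison["stratified"] / comparison["overall"] - 1)
comparison["random_error_%"] = 100 * (comparison["random"] / comparison["overall"] - 1)

print(comparison)
```

In this sketch the stratified proportions stay within about one sample of the full-dataset proportions for every category, while the random split carries no such guarantee and may drift further, particularly for rare categories.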