watex.utils.resampling
- watex.utils.resampling(X, y, kind='over', strategy='auto', random_state=None, verbose=Ellipsis, **kws)
Combining Random Oversampling and Undersampling
Resampling involves creating a new transformed version of the training dataset in which the selected examples have a different class distribution. This is a simple and effective strategy for imbalanced classification problems.
Resampling toward a more balanced class distribution is an effective remedy for the imbalance problem. The two main approaches to random resampling for imbalanced classification are oversampling and undersampling:
Random Oversampling: Randomly duplicate examples in the minority class.
Random Undersampling: Randomly delete examples in the majority class.
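As a rough illustration of the two approaches above, the following pure-Python sketch resamples a list of labels by index. It is not the watex implementation: the helper names `random_undersample` and `random_oversample` are hypothetical, and the toy labels simply reuse the 232 versus 112 class counts shown in the Examples section.

```python
import random
from collections import Counter


def random_undersample(y, seed=None):
    """Return indices that keep every class at the minority-class size.

    Hypothetical helper for illustration; not part of watex.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    kept = []
    for label in counts:
        idx = [i for i, v in enumerate(y) if v == label]
        # Randomly delete examples by keeping a random subset of size n_min.
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def random_oversample(y, seed=None):
    """Return indices that bring every class up to the majority-class size.

    Hypothetical helper for illustration; not part of watex.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    kept = []
    for label, n in counts.items():
        idx = [i for i, v in enumerate(y) if v == label]
        # Keep all originals, then randomly duplicate (sample with replacement).
        kept.extend(idx)
        kept.extend(rng.choices(idx, k=n_max - n))
    return sorted(kept)


# Toy labels with the same imbalance as the documented example: 232 vs. 112.
y = [0] * 232 + [1] * 112

under_idx = random_undersample(y, seed=0)
over_idx = random_oversample(y, seed=0)

under_counts = Counter(y[i] for i in under_idx)  # every class has 112 samples
over_counts = Counter(y[i] for i in over_idx)    # every class has 232 samples
print(under_counts)
print(over_counts)
```

The returned indices would then be used to select the matching rows of X, so that features and labels stay aligned.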
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples, )) – Target vector where n_samples is the number of samples.
kind (str, {"over", "under"}, default="over") – Kind of sampling to perform. "over" and "under" stand for oversampling and undersampling, respectively.
strategy (float, str, dict, callable, default='auto') – Sampling information used to resample the data set.
When float, it corresponds to the desired ratio of the number of samples in the minority class over the number of samples in the majority class after resampling. Therefore, the ratio is expressed as \(\alpha_{us} = N_{m} / N_{rM}\), where \(N_{m}\) is the number of samples in the minority class and \(N_{rM}\) is the number of samples in the majority class after resampling.
Warning: float is only available for binary classification. An error is raised for multi-class classification.
When str, it specifies the class targeted by the resampling. The number of samples in the different classes will be equalized. Possible choices are:
- 'majority': resample only the majority class;
- 'not minority': resample all classes but the minority class;
- 'not majority': resample all classes but the majority class;
- 'all': resample all classes;
- 'auto': equivalent to 'not minority'.
When dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class.
When callable, it is a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples for each class.
random_state (int, RandomState instance, default=None) – Controls the randomization of the algorithm.
- If int, random_state is the seed used by the random number generator;
- If RandomState instance, random_state is the random number generator;
- If None, the random number generator is the RandomState instance used by np.random.
verbose (bool, default=False) – Display the class counts of y before and after resampling.
- Returns:
X, y – The resampled training data and target.
- Return type:
NDarray, Arraylike
Examples
>>> import watex as wx
>>> from watex.utils.mlutils import resampling
>>> data, target = wx.fetch_data('bagoue analysed', as_frame=True)
>>> data.shape, target.shape
>>> data_us, target_us = resampling(data, target, kind='under', verbose=True)
Counters: Auto
Raw counter y: Counter({0: 232, 1: 112})
UnderSampling counter y: Counter({0: 112, 1: 112})
>>> data_us.shape, target_us.shape
((224, 8), (224,))
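The float and callable forms of strategy described under Parameters can be made concrete with a little arithmetic. The sketch below solves \(\alpha_{us} = N_{m} / N_{rM}\) for \(N_{rM}\) and defines a callable that maps y to a dict of desired counts. Both function names are hypothetical, the code does not call watex, and truncating to a whole number of samples is an assumption of this sketch.

```python
from collections import Counter


def majority_size_after_undersampling(n_minority, alpha_us):
    """Solve alpha_us = N_m / N_rM for N_rM (hypothetical helper)."""
    if alpha_us <= 0:
        raise ValueError("alpha_us must be positive")
    # Assumption of this sketch: truncate to a whole number of samples.
    return int(n_minority / alpha_us)


def halve_majority(y):
    """Callable strategy: takes y and returns {class: desired sample count}.

    Hypothetical example that asks for half of the majority class.
    """
    counts = Counter(y)
    majority_label, n_majority = counts.most_common(1)[0]
    return {majority_label: n_majority // 2}


# Same class counts as the example above: 232 majority, 112 minority.
y = [0] * 232 + [1] * 112

n_rM = majority_size_after_undersampling(112, 0.5)  # 112 / 0.5 = 224
targets = halve_majority(y)                         # {0: 116}
print(n_rM, targets)
```

Going by the parameter description, a callable of this shape could presumably be passed as strategy=halve_majority together with kind='under'; that call is not exercised here.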