watex.utils.mlutils.default_data_splitting
- watex.utils.mlutils.default_data_splitting(X, y=None, *, test_size=0.2, target=None, random_state=42, fetch_target=False, **skws)
Naively split data into training and test sets.
Split data into a training set and a test set. If the target y is not given and you want to use a specific column as the target for supervised learning, set the fetch_target argument to True and set the target argument to a NumPy column index or a pandas DataFrame column name.
- Parameters
X – np.ndarray or pd.DataFrame
y – array_like
test_size – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split.
random_state – int. Controls the shuffling applied to the data before the split. Pass an int for reproducible output across multiple function calls.
fetch_target – bool. If True, retrieve the target values from the whole data X.
target – int or str. If int, it should be the index of the target column; otherwise it should be the column name in the pandas DataFrame.
skws – additional scikit-learn keyword arguments; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- Returns
list – List containing the train-test split of the inputs (unpacked as X, XT, y, yT in the example).
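The ordering of the returned list can be illustrated with scikit-learn's train_test_split, which the skws entry points to. This is a minimal sketch that calls scikit-learn directly rather than watex, on the assumption that the extra keywords are forwarded unchanged; the toy arrays are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 10 samples, 2 features, 2 balanced classes.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# test_size and random_state mirror the defaults in the signature above;
# stratify is an extra train_test_split keyword of the kind skws would carry.
parts = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# With two input arrays the list has four entries, in this order.
X_train, X_test, y_train, y_test = parts
print(len(parts), X_train.shape, X_test.shape)
```

With stratify=y the class proportions of y are preserved in both subsets, which is useful when the target is imbalanced.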
- Example
>>> from watex.datasets import fetch_data
>>> from watex.utils.mlutils import default_data_splitting
>>> data = fetch_data('Bagoue original').get('data=df')
>>> X, XT, y, yT = default_data_splitting(data.values, fetch_target=True, target=12)
>>> X, XT, y, yT = default_data_splitting(data, fetch_target=True, target='flow')
>>> X0 = data.copy()
>>> X0.drop('flow', axis=1, inplace=True)
>>> y0 = data['flow']
>>> X, XT, y, yT = default_data_splitting(X0, y0)
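The fetch_target behaviour described above can be sketched roughly as follows. This is not the watex implementation: split_with_target is a hypothetical stand-in written only from the parameter descriptions, the toy DataFrame is made up, and the handling of a missing y (splitting X alone) is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_with_target(X, y=None, *, test_size=0.2, target=None,
                      random_state=42, fetch_target=False, **skws):
    """Hypothetical stand-in for default_data_splitting (not watex code)."""
    if fetch_target:
        if target is None:
            raise ValueError("target must be given when fetch_target=True")
        if isinstance(X, pd.DataFrame):
            # Accept a column name, or an int used as a positional index.
            name = X.columns[target] if isinstance(target, int) else target
            y = X[name]
            X = X.drop(columns=name)
        else:
            # For arrays, target is a column index: pull it out of X.
            X = np.asarray(X)
            y = X[:, target]
            X = np.delete(X, target, axis=1)
    if y is None:
        # Assumption: with no target at all, only X is split.
        return train_test_split(X, test_size=test_size,
                                random_state=random_state, **skws)
    return train_test_split(X, y, test_size=test_size,
                            random_state=random_state, **skws)


# Toy data (made up for illustration; not the Bagoue dataset).
df = pd.DataFrame({"a": range(10), "b": range(10, 20), "flow": range(20, 30)})

# Target given by column name on a DataFrame.
X, XT, y, yT = split_with_target(df, fetch_target=True, target="flow")

# Target given by column index on a NumPy array.
Xa, XTa, ya, yTa = split_with_target(df.values, fetch_target=True, target=2)
```

Because random_state is fixed, both calls shuffle the rows identically, so the name-based and index-based forms select the same training and test samples.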