watex.utils.mlutils.default_data_splitting

watex.utils.mlutils.default_data_splitting(X, y=None, *, test_size=0.2, target=None, random_state=42, fetch_target=False, **skws)

Split data naively.

Split the data into a training set and a test set. If the target y is not given and you want to use a specific column of X as the target for supervised learning, set the fetch_target argument to True and set the target argument to a NumPy column index or a pandas DataFrame column name.

Parameters
  • X – np.ndarray or pd.DataFrame

  • y – array_like

  • test_size – float. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split.

  • random_state – int. Controls the shuffling applied to the data before the split. Pass an int for reproducible output across multiple function calls.

  • fetch_target – bool. Set to True to extract the target from the whole data X.

  • target – int or str. If int, it should be the column index of the target; otherwise it should be the column name in the pandas DataFrame.

  • skws – additional scikit-learn keyword arguments; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Returns

list – List containing the train-test split of the inputs.
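The interaction between fetch_target, target, test_size, and random_state can be illustrated with a small standard-library sketch. This is not the watex implementation: `sketch_split` is a hypothetical helper written only for this illustration, it handles integer column indices only, and rounding the test-set size up is an assumption borrowed from how scikit-learn treats a float test_size.

```python
import math
import random


def sketch_split(rows, y=None, *, test_size=0.2, target=None,
                 random_state=42, fetch_target=False):
    """Hypothetical stand-in for the documented behaviour (not watex code)."""
    rows = [list(r) for r in rows]
    if fetch_target:
        # Pull the target column out of X by its integer index.
        y = [r[target] for r in rows]
        rows = [r[:target] + r[target + 1:] for r in rows]
    if y is None:
        raise ValueError("pass y, or set fetch_target=True with a target")
    # Shuffle the row indices with a fixed seed so the split is reproducible.
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)
    # Assumption: the test-set size is rounded up.
    n_test = math.ceil(test_size * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Ten rows of toy data; the last column plays the role of the target.
data = [[i, i * 10, i % 2] for i in range(10)]
X, XT, y, yT = sketch_split(data, fetch_target=True, target=2)
print(len(X), len(XT), len(y), len(yT))
```

With ten rows and test_size=0.2, the sketch holds out two rows for testing and keeps eight for training, and calling it again with the same random_state yields the same partition.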

Example
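>>> from watex.datasets import fetch_data
>>> from watex.utils.mlutils import default_data_splitting
>>> data = fetch_data('Bagoue original').get('data=df')
>>> # Split a NumPy array; the target is the column at index 12.
>>> X, XT, y, yT = default_data_splitting(data.values,
...                                       fetch_target=True,
...                                       target=12)
>>> # Split a DataFrame; the target is the 'flow' column.
>>> X, XT, y, yT = default_data_splitting(data,
...                                       fetch_target=True,
...                                       target='flow')
>>> # Pass the features and the target separately.
>>> X0 = data.drop('flow', axis=1)
>>> y0 = data['flow']
>>> X, XT, y, yT = default_data_splitting(X0, y0)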