watex.utils.mlutils.split_train_test_by_id

watex.utils.mlutils.split_train_test_by_id(data, test_ratio, id_column=None, keep_colindex=True, hash=<built-in function openssl_md5>)

Split data into a training set and a test set using a hash of each instance's identifier, so that the split remains consistent across multiple runs, even if the dataset is refreshed.

For example, with test_ratio=0.2 the new test set will contain about 20% of the instances, but it will not contain any instance that was previously in the training set.
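The consistency comes from deciding test-set membership from a hash of each identifier rather than from a random draw. The sketch below illustrates that idea with the standard library only. The helper name `in_test_set`, hashing the identifier's string form, and comparing the last digest byte against `256 * test_ratio` are assumptions based on the common hash-based split recipe, not necessarily how watex's test_set_check_id() is written.

```python
import hashlib


def in_test_set(identifier, test_ratio, hash=hashlib.md5):
    """Hypothetical check: True if `identifier` belongs in the test set.

    The identifier's string form is hashed and the last byte of the
    digest (0..255) is compared with 256 * test_ratio, so the decision
    depends only on the identifier, never on row order or dataset size.
    """
    digest = hash(str(identifier).encode("utf-8")).digest()
    return digest[-1] < 256 * test_ratio


# Simulate an initial dataset and a refreshed one with new ids appended.
old_ids = list(range(1000))
new_ids = list(range(1500))

old_test = {i for i in old_ids if in_test_set(i, 0.2)}
new_test = {i for i in new_ids if in_test_set(i, 0.2)}
old_train = set(old_ids) - old_test

# Ids that were in the old training set never appear in the new test set.
print(f"old test size: {len(old_test)}, new test size: {len(new_test)}")
print("overlap with old train:", len(old_train & new_test))
```

Because the rule depends only on the identifier, appending rows never moves an earlier training instance into the test set. The test fraction is approximate: with test_ratio=0.2, ids whose last digest byte is 0 to 51 are selected, i.e. about 52/256 ≈ 20.3% on average.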

Parameters
  • data – the pandas.DataFrame to split.

  • test_ratio – fraction of the data to put in the test set (e.g. 0.2).

  • id_column – name of the identifier column. If id_column is None, the index of data is reset and the resulting index column is used as id_column.

  • hash – secure hash algorithm used to hash the identifiers (default hashlib.md5). Refer to test_set_check_id().

Returns

The training set and the test set, split consistently across runs.
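Applied to a whole DataFrame, a minimal sketch of such a split might look as follows. It assumes pandas is installed; `split_by_id_sketch` is a hypothetical name, it uses the same assumed string-hashing rule, and it deliberately omits keep_colindex, whose behavior is not described on this page. It is not the watex source.

```python
import hashlib

import pandas as pd


def split_by_id_sketch(data, test_ratio, id_column=None, hash=hashlib.md5):
    """Hypothetical hash-based train/test split (not the watex source).

    If id_column is None, the index is reset and the former index, which
    reset_index() inserts as the first column, is used as the identifier.
    """
    if id_column is None:
        data = data.reset_index()
        id_column = data.columns[0]

    def _in_test(identifier):
        # Test membership: last digest byte below 256 * test_ratio.
        digest = hash(str(identifier).encode("utf-8")).digest()
        return digest[-1] < 256 * test_ratio

    in_test = data[id_column].apply(_in_test).astype(bool)
    return data.loc[~in_test], data.loc[in_test]


# An initial dataset and a refreshed one with 100 new rows appended.
df_old = pd.DataFrame({"sample_id": range(200), "value": range(200)})
df_new = pd.DataFrame({"sample_id": range(300), "value": range(300)})

train_old, test_old = split_by_id_sketch(df_old, 0.2, id_column="sample_id")
train_new, test_new = split_by_id_sketch(df_new, 0.2, id_column="sample_id")

# No former training row ends up in the refreshed test set.
print(set(train_old["sample_id"]).isdisjoint(test_new["sample_id"]))

# Without id_column, the reset index (named "index") is the identifier.
train_idx, test_idx = split_by_id_sketch(df_old[["value"]], 0.2)
print(len(train_idx), len(test_idx))
```

When id_column is None the row position becomes the identifier, which keeps the split stable only if refreshed data are appended at the end and no rows are ever deleted; a stable, unique identifier column is the safer choice.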