watex.utils.mlutils.split_train_test_by_id
- watex.utils.mlutils.split_train_test_by_id(data, test_ratio, id_column=None, keep_colindex=True, hash=<built-in function openssl_md5>)
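Split data into a training set and a test set by hashing each instance's identifier, so that the split remains consistent across multiple runs, even if the dataset is refreshed.
The new test set will contain roughly a fraction test_ratio of the instances (for example 20% when test_ratio=0.2), but it will not contain any instance that was previously in the training set.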
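The consistency property follows from deciding test-set membership from a hash of each identifier alone. The sketch below illustrates that idea with the standard library only; the helper name in_test_set, the 8-byte little-endian encoding of the id, and the "last digest byte < 256 * test_ratio" rule are assumptions for illustration, not necessarily what watex does internally.

```python
import hashlib


def in_test_set(identifier: int, test_ratio: float, hash=hashlib.md5) -> bool:
    """Hypothetical helper: True if `identifier` belongs to the test set.

    The decision depends only on the identifier's hash, so it never
    changes between runs or when new rows are appended.
    """
    # Hash a fixed-width byte encoding of the id (assumption: 8-byte little-endian).
    digest = hash(int(identifier).to_bytes(8, "little", signed=True)).digest()
    # The last digest byte is roughly uniform on 0..255, so about
    # test_ratio of all ids fall below the threshold.
    return digest[-1] < 256 * test_ratio


old_ids = range(1000)
new_ids = range(1500)  # the dataset is "refreshed" with 500 extra rows

old_test = {i for i in old_ids if in_test_set(i, 0.2)}
new_test = {i for i in new_ids if in_test_set(i, 0.2)}

# Ids that were in the training set before are still in the training set.
assert old_test == {i for i in new_test if i < 1000}
```

Because each id is judged independently, appending rows can only add new instances to either set; it never moves an existing one.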
- Parameters
data – pandas.DataFrame
test_ratio – fraction of the data to put in the test set
id_column – identifier column. If id_column is None, the DataFrame index is reset and id_column is set to index.
hash – secure hash algorithm used to assign instances to the test set (default: hashlib.md5). Refer to test_set_check_id().
- Returns
a consistent training set and test set
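A minimal pandas sketch of the whole procedure, under stated assumptions: the names split_by_id_sketch and _in_test_set are hypothetical, the hash rule (last digest byte compared with 256 * test_ratio) is an assumed technique rather than watex's confirmed code, and keep_colindex is ignored. It mirrors the documented fallback of resetting the index and using it as the identifier when id_column is None.

```python
import hashlib

import pandas as pd


def _in_test_set(identifier, test_ratio, hash=hashlib.md5):
    # Assumed rule: hash an 8-byte little-endian encoding of the integer id;
    # the last digest byte is roughly uniform on 0..255.
    digest = hash(int(identifier).to_bytes(8, "little", signed=True)).digest()
    return digest[-1] < 256 * test_ratio


def split_by_id_sketch(data, test_ratio, id_column=None, hash=hashlib.md5):
    """Hypothetical illustration of an id-hash split; not watex's code."""
    if id_column is None:
        # Documented fallback: reset the index and use it as the identifier.
        # reset_index() names the new column "index" for an unnamed index
        # (assumes no existing column is already called "index").
        data = data.reset_index()
        id_column = "index"
    in_test = data[id_column].apply(lambda i: _in_test_set(i, test_ratio, hash))
    return data.loc[~in_test], data.loc[in_test]


# Split a dataset, then "refresh" it with 500 extra rows and split again.
train, test = split_by_id_sketch(pd.DataFrame({"x": range(1000)}), 0.2)
train2, test2 = split_by_id_sketch(pd.DataFrame({"x": range(1500)}), 0.2)

# No instance from the first training set moves into the new test set.
assert set(train["index"]).isdisjoint(test2["index"])
```

Relying on the row index as identifier only stays consistent if new rows are always appended at the end and no rows are deleted; a stable, unique id column is the safer choice for id_column.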