watex.utils.to_numeric_dtypes
- watex.utils.to_numeric_dtypes(arr, *, columns=None, return_feature_types=Ellipsis, missing_values=nan, pop_cat_features=Ellipsis, sanitize_columns=Ellipsis, regex=None, fill_pattern='_', drop_nan_columns=True, how='all', reset_index=Ellipsis, drop_index=True, verbose=Ellipsis)
Convert array to dataframe and coerce arguments to appropriate dtypes.
The function also includes tools to manipulate the transformed data, such as:
pop_cat_features to remove the categorical attributes,
sanitize_columns to clean the column names of the dataframe by removing undesirable characters,
drop_nan_columns to drop all the columns and/or rows that contain only NaN values, …
- Parameters:
arr (ndarray or DataFrame, shape (m_samples, n_features)) – Array or dataframe to create, to sanitize, or to inspect for auto-detection of feature categories (numerical or categorical).
columns (list of str, optional) – Column names, useful to create a dataframe when an array is given. The number of names must match the number of array columns (shape[1]).
return_feature_types (bool, default=False) – Return the lists of numerical and categorical features.
missing_values (float, default=NaN) – Value used to replace missing values or empty strings, if any exist in the dataframe.
pop_cat_features (bool, default=False) – Remove the categorical features from the dataframe.
sanitize_columns (bool, default=False) –
Remove undesirable characters from the data columns using the default value of the regex parameter.
New in version 0.1.9.
regex (re object) –
Regular expression object used to polish the data columns.
The default is:
>>> import re
>>> re.compile(r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)
New in version 0.1.9.
fill_pattern (str, default='_') – Pattern used to replace the non-alphabetic characters in each item of columns.
drop_nan_columns (bool, default=True) –
Remove all columns filled entirely with NaN values.
how (str, default='all') – Also drop the NaN rows, that is, the rows composed entirely of NaN or null values.
reset_index (bool, default=False) –
Reset the index of the dataframe.
drop_index (bool, default=True) –
Drop the index in the dataframe after resetting.
verbose (bool, default=False) – Output a message listing the categorical items dropped from the dataframe, if any.
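The interplay of regex and fill_pattern can be sketched with the standard re module alone. This is only an illustration under the assumption that each match of the pattern in a column name is substituted by fill_pattern; the helper sanitize_names and the sample column names are hypothetical and are not part of watex.

```python
import re

# Default pattern quoted in the description of the `regex` parameter.
DEFAULT_REGEX = re.compile(r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)


def sanitize_names(names, regex=DEFAULT_REGEX, fill_pattern='_'):
    """Hypothetical helper: replace every match of `regex` in each
    column name with `fill_pattern` (assumed behaviour, not watex code)."""
    return [regex.sub(fill_pattern, str(name).strip()) for name in names]


# Made-up column names, for illustration only.
cleaned = sanitize_names(['east (m)', 'resistivity.ohm', 'power#1'])
print(cleaned)
```

Note that consecutive unwanted characters that are not whitespace each produce their own fill_pattern, so a name such as 'east (m)' ends up with repeated underscores under this assumption.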
- Returns:
df or (df, nf, cf) – nf and cf are also returned if return_feature_types is set to True.
- Return type:
Dataframe of values cast to numeric types
Examples
>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils.funcutils import to_numeric_dtypes
>>> X, y = load_bagoue(as_frame=True)
>>> X0 = X[['shape', 'power', 'magnitude']]
>>> X0.dtypes
... shape        object
    power        object
    magnitude    object
    dtype: object
>>> df = to_numeric_dtypes(X0)
>>> df.dtypes
... shape         object
    power        float64
    magnitude    float64
    dtype: object
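For readers without the Bagoue dataset at hand, the kind of coercion shown above can be approximated with pandas alone. The sketch below is not the watex implementation: coerce_numeric is a hypothetical helper, the rule it applies (cast a column only when no value is lost) is an assumption, and the sample values are made up. The names nf and cf mirror the numerical and categorical feature lists described under Returns.

```python
import numpy as np
import pandas as pd


def coerce_numeric(df):
    """Hypothetical helper: cast a column to a numeric dtype only if every
    non-missing value parses as a number; otherwise leave it unchanged."""
    out = df.copy()
    for col in out.columns:
        converted = pd.to_numeric(out[col], errors='coerce')
        # Coercion turns unparsable entries into NaN; accept the cast only
        # when it introduces no new missing values.
        if converted.isna().sum() == out[col].isna().sum():
            out[col] = converted
    return out


# Made-up object-typed data, for illustration only.
raw = pd.DataFrame(
    {
        'shape': ['V', 'W', 'C'],
        'power': ['30.0', '60.5', np.nan],
        'magnitude': ['12', '45', '7'],
    },
    dtype=object,
)

df = coerce_numeric(raw)
nf = df.select_dtypes(include='number').columns.tolist()  # numerical features
cf = df.select_dtypes(exclude='number').columns.tolist()  # categorical features
print(df.dtypes)
print(nf, cf)
```

Here the 'shape' column stays as object because its values do not parse as numbers, while the other two columns become numeric.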