watex.utils.to_numeric_dtypes

watex.utils.to_numeric_dtypes(arr, *, columns=None, return_feature_types=Ellipsis, missing_values=nan, pop_cat_features=Ellipsis, sanitize_columns=Ellipsis, regex=None, fill_pattern='_', drop_nan_columns=True, how='all', reset_index=Ellipsis, drop_index=True, verbose=Ellipsis)

Convert an array to a DataFrame and coerce its columns to appropriate dtypes.
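The coercion step can be pictured with plain pandas. The sketch below is an assumption about the mechanism, not watex's implementation: it replaces blank strings with the missing value, then tries to cast each column to float64 and leaves any column that fails as object. coerce_numeric is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def coerce_numeric(df: pd.DataFrame, missing_values=np.nan) -> pd.DataFrame:
    """Hypothetical helper: cast each column to float64 where possible."""
    # Treat empty or whitespace-only strings as missing values.
    out = df.replace(r"^\s*$", missing_values, regex=True)
    for col in out.columns:
        try:
            out[col] = out[col].astype(np.float64)
        except (ValueError, TypeError):
            # Non-numeric column (e.g. categorical labels): keep it as object.
            continue
    return out


frame = pd.DataFrame(
    {
        "shape": ["V", "W", "K"],
        "power": ["10", " ", "30"],
        "magnitude": ["1.5", "2", "3.25"],
    }
)
result = coerce_numeric(frame)
print(result.dtypes)
```

On this toy frame, shape stays object while power and magnitude become float64, which mirrors the dtypes in the example at the end of this page.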

The function also provides options to manipulate the transformed data, such as:

  • pop_cat_features to remove the categorical attributes,

  • sanitize_columns to clean the column names by removing undesirable characters,

  • drop_nan_columns to drop the columns and/or rows that contain only NaN values, …
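The drop_nan_columns, how, reset_index and drop_index options correspond closely to pandas dropna and reset_index. A minimal sketch, assuming all-NaN columns are removed before all-NaN rows; drop_all_nan is a hypothetical helper, not part of watex.

```python
import numpy as np
import pandas as pd


def drop_all_nan(df, how="all", reset_index=False, drop_index=True):
    """Hypothetical helper: remove all-NaN columns, then all-NaN rows."""
    # Drop columns in which every value is missing.
    out = df.dropna(axis=1, how="all")
    if str(how).lower() == "all":
        # Drop rows in which every remaining value is missing.
        out = out.dropna(axis=0, how="all")
    if reset_index:
        # drop=True discards the old index instead of adding it as a column.
        out = out.reset_index(drop=drop_index)
    return out


frame = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, np.nan],
        "c": [4.0, np.nan, 6.0],
    }
)
cleaned = drop_all_nan(frame, reset_index=True)
print(cleaned)
```

Resetting the index is mainly useful after rows have been dropped, since the surviving rows otherwise keep their original, now non-contiguous labels.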

Parameters
  • arr (Ndarray or DataFrame, shape (m_samples, n_features)) – Array or DataFrame to convert, sanitize, or scan for feature categories (numerical or categorical).

  • columns (list of str, optional) – Column names used to build the DataFrame when an array is given. Their number must match the number of array columns (shape[1]).

  • return_feature_types (bool, default=False) – If True, also return the lists of numerical and categorical feature names.

  • missing_values (float, default=nan) – Value used to replace missing entries and empty strings in the DataFrame.

  • pop_cat_features (bool, default=False) – If True, remove the categorical features from the DataFrame.

  • sanitize_columns (bool, default=False) –

    If True, remove undesirable characters from the column names, using regex (or its default pattern) and fill_pattern.

    New in version 0.1.9.

  • regex (re.Pattern, optional) –

    Regular expression used to clean the column names.

    The default is:

    >>> import re
    >>> re.compile(r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)
    

    New in version 0.1.9.

  • fill_pattern (str, default='_') – String substituted for each undesirable character matched by regex in the column names.

  • drop_nan_columns (bool, default=True) –

    Remove the columns that contain only NaN values.

  • how (str, default='all') – If 'all', also drop the rows composed entirely of NaN or null values.

  • reset_index (bool, default=False) –

    Reset the index of the dataframe.

  • drop_index (bool, default=True) –

    If True, discard the old index when resetting it instead of inserting it as a column.

  • verbose (bool, default=False) – If True, print a message listing the categorical features dropped from the DataFrame, if any.
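Column sanitization can be sketched as a plain re.sub over each column name, using the default regex quoted above and fill_pattern as the replacement string. This is an approximation under that assumption, since watex may post-process the names further; sanitize_names is a hypothetical helper.

```python
import re

import pandas as pd

# Default pattern quoted in the regex parameter description.
DEFAULT_REGEX = re.compile(r"[_#&.)(*@!_,;\s-]\s*", flags=re.IGNORECASE)


def sanitize_names(df, regex=None, fill_pattern="_"):
    """Hypothetical helper: replace each regex match in the column names."""
    pattern = regex if regex is not None else DEFAULT_REGEX
    out = df.copy()
    # Column labels may not be strings, so convert them before substituting.
    out.columns = [pattern.sub(fill_pattern, str(col)) for col in out.columns]
    return out


frame = pd.DataFrame(columns=["power mV", "sfi#value", "east-west", "magnitude"])
print(list(sanitize_names(frame).columns))
```

Because the pattern also consumes any whitespace that follows a match, a name such as "power  mV" (two spaces) collapses to a single fill_pattern.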

Returns

df or (df, nf, cf) – nf and cf, the lists of numerical and categorical feature names, are also returned when return_feature_types is set to True.

Return type

DataFrame whose values are cast to numeric types where possible
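The nf and cf lists returned with return_feature_types, and the pop_cat_features option, can be approximated as follows. This sketch uses pandas' is_numeric_dtype as the test for a numerical column, which is an assumption and may not match watex's own rule exactly; split_feature_types is a hypothetical helper.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_feature_types(df, pop_cat_features=False):
    """Hypothetical helper: return (df, numerical names, categorical names)."""
    # Numerical features: columns whose dtype pandas considers numeric.
    nf = [col for col in df.columns if is_numeric_dtype(df[col])]
    # Everything else is treated as categorical.
    cf = [col for col in df.columns if col not in nf]
    if pop_cat_features:
        # Keep only the numerical columns.
        df = df.drop(columns=cf)
    return df, nf, cf


frame = pd.DataFrame(
    {"shape": ["V", "W"], "power": [10.0, 30.0], "magnitude": [1.5, 2.0]}
)
df, nf, cf = split_feature_types(frame)
print(nf, cf)
```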

Examples

>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils.funcutils import to_numeric_dtypes
>>> X, y = load_bagoue(as_frame=True)
>>> X0 = X[['shape', 'power', 'magnitude']]
>>> X0.dtypes
shape        object
power        object
magnitude    object
dtype: object
>>> df = to_numeric_dtypes(X0)
>>> df.dtypes
shape         object
power        float64
magnitude    float64
dtype: object