watex.utils.funcutils.sanitize_frame_cols

watex.utils.funcutils.sanitize_frame_cols(d, /, func=None, regex=None, pattern=None, fill_pattern=None, inplace=False)

Remove undesirable characters from column names and return the new columns.

A regular expression is used to sanitize the columns.

Parameters:
  • d (list, columns,) – Columns to sanitize, given as a list of items to polish. If a dataframe or a series is given, the dataframe columns or the series name, respectively, are polished and the dataframe or series is returned.

  • func (F, callable) – Function used to clean the columns. In the example at the end of this page it takes a column name (str) and returns the cleaned name.

  • regex (re object,) –

    Regular expression object. The default is:

    >>> import re
    >>> re.compile (r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)
    

  • pattern (str, default = '[_#&.)(*@!_,;\s-]\s*') – The base pattern used to sanitize the text in each column name.

  • fill_pattern (str, default='') – String that replaces each non-alphabetic character matched in the column names.

  • inplace (bool, default=False,) – Transform the dataframe or series in place.
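
How regex and fill_pattern interact can be illustrated with the standard re module alone. The snippet below is a minimal sketch, not watex's implementation: the helper name sketch_sanitize and the sample column names are hypothetical, and it only assumes that every match of the documented default pattern is replaced by fill_pattern.

```python
import re

# Default pattern as documented above (copied from the regex parameter).
DEFAULT_REGEX = re.compile(r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)


def sketch_sanitize(columns, regex=DEFAULT_REGEX, fill_pattern=''):
    """Hypothetical helper: replace every regex match in each name.

    This mirrors the documented parameters only; the real function may
    behave differently (for example when `func` is supplied).
    """
    return [regex.sub(fill_pattern, str(name)) for name in columns]


# Made-up column names, chosen for illustration.
cols = ['depth top', 'strata-name', 'rock, type']

# Each separator (plus any whitespace after it) becomes a single underscore.
cleaned = sketch_sanitize(cols, fill_pattern='_')
print(cleaned)   # ['depth_top', 'strata_name', 'rock_type']

# With the default empty fill_pattern the separators are simply dropped.
stripped = sketch_sanitize(cols)
print(stripped)  # ['depthtop', 'strataname', 'rocktype']
```

Note that the trailing `\s*` in the pattern makes a separator followed by spaces (as in 'rock, type') collapse into one replacement rather than two.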

Returns:

The series or dataframe if one is given; otherwise, the sanitized columns.

Return type:

columns | pd.Series | dataframe.

Examples

>>> from watex.utils.funcutils import sanitize_frame_cols
>>> from watex.utils.coreutils import read_data
>>> h502= read_data ('data/boreholes/H502.xlsx')
>>> h502 = sanitize_frame_cols (h502, fill_pattern ='_' )
>>> h502.columns[:3]
Index(['depth_top', 'depth_bottom', 'strata_name'], dtype='object')
>>> f = lambda r : r.replace ('_', "'s ")
>>> h502_f= sanitize_frame_cols( h502, func =f )
>>> h502_f.columns [:3]
Index(["depth's top", "depth's bottom", "strata's name"], dtype='object')