watex.utils.bi_selector

watex.utils.bi_selector(d, /, features=None, return_frames=False)

Automatically separates numerical attributes from categorical ones.

This is useful for separating categorical features from numerical ones, and vice versa, when a dataframe has many features. Entering the features one by one becomes tedious and error-prone.
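As a rough illustration of what such a split involves, the sketch below separates numerical from categorical columns using only pandas. The helper name `split_by_dtype` and the toy frame are hypothetical, and this is not necessarily how `bi_selector` is implemented; it only assumes that "numerical" means a numeric dtype and that every other column counts as categorical.

```python
import pandas as pd


def split_by_dtype(df):
    """Hypothetical helper: return (numerical, categorical) column names."""
    # Columns with a numeric dtype (int, float, ...) are treated as numerical.
    num = df.select_dtypes(include="number").columns.tolist()
    # Assumption: every other column (object, string, ...) is categorical.
    cat = [c for c in df.columns if c not in num]
    return num, cat


# Toy frame reusing a few column names from the hlogs example.
toy = pd.DataFrame({
    "hole_id": ["H1", "H2", "H3"],
    "depth_top": [0.0, 2.5, 7.1],
    "rock_name": ["coal", "clay", "sand"],
    "resistivity": [15, 22, 31],
})

num_features, cat_features = split_by_dtype(toy)
print(num_features)  # ['depth_top', 'resistivity']
print(cat_features)  # ['hole_id', 'rock_name']
```

With dozens of columns, a dtype-based split like this avoids typing each name by hand, which is the situation the function is meant for.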

Parameters:
  • d (pandas.DataFrame) – The input pandas dataframe.

  • features (list of str) – List of features among the dataframe columns. An error is raised if any of the features does not exist in the frame. Note that if features is None, the numerical and categorical features are returned instead.

  • return_frames (bool, default=False) – If False, the given features and the remaining columns (features) are returned as lists. If True, two frames are returned: one composed of the given features and one of the remaining features.

Returns:

  • Tuple (list, list) – Lists of the given features and of the remaining features, when return_frames is False.

  • Tuple (pd.DataFrame, pd.DataFrame) – Frames of the given features and of the remaining features, when return_frames is True.
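The two return shapes can be pictured with a small pandas-only sketch. The function `select_and_rest` below is a hypothetical stand-in written for illustration, not the watex implementation. It assumes that a missing feature raises a `ValueError` (the exact exception type used by `bi_selector` is not stated here) and that the remaining features keep the column order of the frame.

```python
import pandas as pd


def select_and_rest(df, features, return_frames=False):
    """Hypothetical stand-in mimicking the documented behaviour."""
    missing = [f for f in features if f not in df.columns]
    if missing:
        # Assumption: the exception type is illustrative only.
        raise ValueError(f"Features not found in the frame: {missing}")
    # Remaining features, kept in the frame's column order.
    rest = [c for c in df.columns if c not in features]
    if return_frames:
        # Two frames: the given features and everything else.
        return df[list(features)], df[rest]
    # Two lists of column names.
    return list(features), rest


toy = pd.DataFrame({
    "hole_id": ["H1", "H2"],
    "depth_top": [0.0, 2.5],
    "resistivity": [15, 22],
})

given, remaining = select_and_rest(toy, ["resistivity"])
frame_given, frame_rest = select_and_rest(toy, ["resistivity"], return_frames=True)
print(given, remaining)                     # ['resistivity'] ['hole_id', 'depth_top']
print(frame_given.shape, frame_rest.shape)  # (2, 1) (2, 2)
```

Returning names by default keeps the result lightweight, while `return_frames=True` is convenient when the two subsets are processed separately right away.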

Example

>>> from watex.utils.mlutils import bi_selector
>>> from watex.datasets import load_hlogs
>>> data = load_hlogs().frame # get the frame
>>> data.columns
Index(['hole_id', 'depth_top', 'depth_bottom', 'strata_name', 'rock_name',
       'layer_thickness', 'resistivity', 'gamma_gamma', 'natural_gamma', 'sp',
       'short_distance_gamma', 'well_diameter', 'aquifer_group',
       'pumping_level', 'aquifer_thickness', 'hole_depth_before_pumping',
       'hole_depth_after_pumping', 'hole_depth_loss', 'depth_starting_pumping',
       'pumping_depth_at_the_end', 'pumping_depth', 'section_aperture', 'k',
       'kp', 'r', 'rp', 'remark'],
      dtype='object')
>>> num_features, cat_features = bi_selector(data)
>>> num_features
['gamma_gamma',
 'depth_top',
 'aquifer_thickness',
 'pumping_depth_at_the_end',
 'section_aperture',
 'remark',
 'depth_starting_pumping',
 'hole_depth_before_pumping',
 'rp',
 'hole_depth_after_pumping',
 'hole_depth_loss',
 'depth_bottom',
 'sp',
 'pumping_depth',
 'kp',
 'resistivity',
 'short_distance_gamma',
 'r',
 'natural_gamma',
 'layer_thickness',
 'k',
 'well_diameter']
>>> cat_features
['hole_id', 'strata_name', 'rock_name', 'aquifer_group',
 'pumping_level']