watex.cases.features.FeatureInspection

class watex.cases.features.FeatureInspection(tname='flow', mapflow=True, sanitize=False, flow_classes=[0.0, 1.0, 3.0], set_index=False, col_name=None, **kws)

Summarizes the flow features.

It handles the categorization of data features. When numerical values are provided, a standard qualitative or quantitative analysis is performed.

Parameters
  • *data* (str or pd.core.DataFrame) – Path-like object or pandas DataFrame. Must contain the main parameters, including the target.

  • **tname** (str) – Name of the target used for prediction. For groundwater exploration, the target is named flow.

  • **flow_classes** (list or array_like) – The way to classify the flow. Provide the specific values that separate the flow classes. Different projects target different flow rates, so the classes can be specified for village hydraulic, improved village hydraulic, or urban hydraulic systems.

  • **drop_columns** (list) – Items to drop. Before analysing the data, specific columns can be dropped so that they do not corrupt the analysis. In a DataFrame collected directly from GeoFeatures, the default drop_columns refers to the coordinate positions: ['east', 'north'].

  • **mapflow** (bool) –

    If set to True, values in the target column are mapped to categorical values. Flow rate values are commonly given as a series of numerical values. For classification purposes, the flow rate must be converted to categorical values, which mainly refer to the type of hydraulic system. The type of hydraulic system is in turn mostly tied to the size of the population living in a specific area. For instance, flow classes can be ranged as follows:

    • FR = 0 is for dry boreholes

    • 0 < FR ≤ 3 m³/h for village hydraulic (≤ 2000 inhabitants)

    • 3 < FR ≤ 6 m³/h for improved village hydraulic (> 2000 to 20 000 inhabitants)

    • 6 < FR ≤ 10 m³/h for urban hydraulic (> 200 000 inhabitants).

    Note that this flow range is not exhaustive and can be modified according to the type of hydraulic required on the project.

  • **set_index** (bool) – Consider a column as the DataFrame index. If set to True, please provide col_name; otherwise the id column is used by default.

  • **sanitize** (bool) – Polish the data and remove inconsistent columns that do not refer to the predicting features. For instance, it can change the French name of water, eau, to 'water', which is related to the value of the water inflow feature lwi. This can be useful when the data is given as a path-like object and the features are not described correctly for the groundwater case. Default is False.

Examples

>>> from watex.cases.features import FeatureInspection
>>> data = 'data/geodata/main.bagciv.data.csv'
>>> fobj = FeatureInspection().fit(data)
>>> fobj.data_.columns
Index(['num', 'name', 'east', 'north', 'power', 'magnitude', 'shape', 'type',
       'sfi', 'ohmS', 'lwi', 'geol', 'flow'],
      dtype='object')
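
The flow ranges listed under mapflow can be sketched in plain Python as a lookup over the upper bound of each class. This is a minimal illustration, not watex's own implementation: the function name flow_class, the labels FR0 to FR3, and the choice to return None above 10 m³/h are assumptions made for this example.

```python
from bisect import bisect_left

# Upper bound (m3/h) of each class, taken from the ranges listed under mapflow.
UPPER_BOUNDS = [0.0, 3.0, 6.0, 10.0]
# Hypothetical labels chosen for this sketch; watex may name its classes differently.
LABELS = ["FR0", "FR1", "FR2", "FR3"]


def flow_class(fr):
    """Return the class label for a flow rate in m3/h, or None above the last bound."""
    if fr < 0:
        raise ValueError("flow rate must be non-negative")
    # bisect_left finds the first upper bound that is >= fr, so each
    # class interval is closed on the right: 0 < FR <= 3, 3 < FR <= 6, ...
    idx = bisect_left(UPPER_BOUNDS, fr)
    return LABELS[idx] if idx < len(LABELS) else None


rates = [0.0, 1.5, 3.0, 4.2, 10.0, 12.0]
classes = [flow_class(fr) for fr in rates]
print(classes)
```

Since the flow range is not exhaustive, editing UPPER_BOUNDS and LABELS is enough to adapt the sketch to the ranges required on another project.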