watex.utils.find_aquifer_groups

watex.utils.find_aquifer_groups(arr_k, /, arr_aq=None, kname=None, aqname=None, subjectivity=False, default_arr=None, keep_label_0=False, method='naive')

Fit the aquifer groups and find, for each true label in the ‘k’ array, its representative group in the aquifer group array.

The idea is to find the aquifer group that best fits each true label in ‘arr_k’.

‘arr_k’ and ‘arr_aq’ must contain class labels, not continuous values.
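Because both arrays must hold class labels, continuous ‘k’ values have to be discretized first (the examples below use classify_k for this). As a minimal sketch of that step, assuming arbitrary class boundaries chosen purely for illustration (they are not watex's default mapping), np.digitize can turn permeability values into integer labels. The helper name to_k_classes and the K_EDGES values are hypothetical.

```python
import numpy as np

# Hypothetical class boundaries for 'k', for illustration only; they are
# NOT the default mapping applied by watex's classify_k.
K_EDGES = [0.01, 0.07]


def to_k_classes(k):
    """Map continuous permeability values to class labels 1, 2, 3, ...

    Missing values (NaN) get label 0, following the convention described for
    keep_label_0 that a '0' label stands for a missing 'k' value.
    """
    k = np.asarray(k, dtype=float)
    # np.digitize returns 0 for k < 0.01, 1 for 0.01 <= k < 0.07, 2 above;
    # adding 1 makes the labels start at 1.
    labels = np.digitize(k, K_EDGES) + 1
    # Overwrite the label of missing values with 0.
    return np.where(np.isnan(k), 0, labels)
```

For instance, to_k_classes([0.005, 0.03, 0.5, np.nan]) gives array([1, 2, 3, 0]).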

Parameters:

arr_k (array-like, pandas Series or DataFrame) – Array-like that contains the permeability coefficients ‘k’. If a DataFrame is supplied, the name of the permeability coefficient column, ‘kname’, must be specified.
arr_aq (array-like, pandas Series or DataFrame) – Array-like that contains the aquifer groups. Missing values are not allowed, so if NaN values exist in the aquifer groups, impute them before feeding the array to the algorithm. If a DataFrame is supplied, the aquifer group column name ‘aqname’ must be specified.
kname (str, int) – Name of the permeability coefficient column, used to retrieve the permeability coefficient ‘k’ from a specific DataFrame. If an integer is passed, it is taken as the position of the ‘k’ column in the DataFrame; the integer must not exceed the size of the DataFrame along axis 1. Commonly, kname needs to be supplied when a DataFrame is passed as a positional or keyword argument.
aqname (str, optional) – Name of the aquifer group column, used to retrieve the aquifer group values arr_aq from a specific DataFrame. Commonly, aqname needs to be supplied when a DataFrame is passed as a positional or keyword argument.
subjectivity (bool, default=False) – Consider each class label as a naive aquifer group. Subjectivity applies when no aquifer group is found in the data; each class label is then treated as a naive aquifer group. For practical reasons, it is strongly recommended to pass a default group to the parameter default_arr to substitute for the aquifer groups, for instance the layer collected at a specific depth, such as the ‘strata’ column.
default_arr (array-like, pd.Series) – Array used by default to substitute for the aquifer groups when they are missing. This is a heuristic option because it might break the code or produce invalid results.
keep_label_0 (bool, default=False) – The prediction already includes the label 0. However, a label 0 refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here, ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded, since it can also be considered a missing ‘k’ value: a predicted ‘0’ in the target should mean a missing ‘k’ value rather than a concrete label. To avoid any confusion, ‘0’ is altered to ‘1’, i.e. all class labels are moved forward by +1, thereby excluding the ‘0’ label. To force the inclusion of 0 in the labels, set keep_label_0 to True.
method (str ['naive', 'strict'], default='naive') –
The strategy used to compute the representativity of a label in the predicted array ‘arr_aq’. It can be either:
- naive computes the importance of the label from its number of occurrences for that specific label in the array ‘k’. It does not take the occurrences of the other labels into account. This is useful for unbalanced class labels in arr_k.
- strict computes the importance of the label from its number of occurrences over the whole valid arr_k, i.e. relative to the total occurrences of all the labels in the whole ‘arr_aq’. This can give suitable analysis results if the class labels in arr_k are balanced.
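To make the difference between the two strategies concrete, here is a minimal pure-NumPy sketch of the idea, not watex's implementation. The function name label_group_rates, its output layout, and the rule of shifting labels by +1 only when a 0 label is present are assumptions based on one reading of the method and keep_label_0 descriptions above.

```python
from collections import Counter

import numpy as np


def label_group_rates(arr_k, arr_aq, method="naive", keep_label_0=False):
    """Hypothetical sketch: rate the aquifer groups found for each 'k' label.

    method='naive'  -> group counts divided by the occurrences of that label
    method='strict' -> group counts divided by the size of the whole arr_k
    """
    if method not in ("naive", "strict"):
        raise ValueError("method must be 'naive' or 'strict'")
    arr_k = np.asarray(arr_k)
    arr_aq = np.asarray(arr_aq, dtype=object)
    if arr_k.shape != arr_aq.shape:
        raise ValueError("arr_k and arr_aq must have the same shape")
    # Assumption: when 0 is present and not kept, move every label forward
    # by +1 so that no class is named '0'.
    if not keep_label_0 and 0 in arr_k:
        arr_k = arr_k + 1

    total = arr_k.size
    result = {}
    for label in np.unique(arr_k):
        mask = arr_k == label
        n_label = int(mask.sum())
        counts = Counter(arr_aq[mask])
        denom = n_label if method == "naive" else total
        # most_common() orders the groups from the most to the least frequent
        groups = {g: round(c / denom, 3) for g, c in counts.most_common()}
        top = next(iter(groups))
        result[label.item()] = {
            "rate": round(n_label / total, 3),  # share of this label in arr_k
            "groups": groups,
            "representativity": (top, groups[top]),
        }
    return result
```

In this sketch, method='naive' makes each label's group rates sum to about 1, which matches the pattern of the 'Groups' dictionaries in the first example below; with method='strict' they sum to that label's own share of arr_k instead.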

Returns:

_Group – Use the attribute .groups to retrieve the group values.

Return type:

_Group class object

Examples

(1) Use the real aquifer groups collected in the area

>>> from watex.utils import naive_imputer, read_data, reshape
>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import classify_k, find_aquifer_groups
>>> b = load_hlogs()  # only used to get the target names
>>> data = read_data('data/boreholes/hf.csv')  # read the complete data
>>> # copy the target columns so they can be modified safely
>>> y = data[b.target_names].copy()
>>> # impute the missing values found in the aquifer group column;
>>> # reshape the 1d array along axis 0 for imputation
>>> agroup_imputed = naive_imputer(reshape(y.aquifer_group, axis=0),
...                                strategy='most_frequent')
>>> # reshape back to a 1d array-like
>>> y['aquifer_group'] = reshape(agroup_imputed)
>>> # categorize the continuous 'k' values in 'y.k' using the default
>>> # 'k' mapping function
>>> y['k'] = classify_k(y.k, default_func=True)
>>> # get the group object
>>> group_obj = find_aquifer_groups(y.k, y.aquifer_group)
>>> group_obj
_Group(Label=[' 1 ',
             Preponderance( rate = '53.141  %',
                           [('Groups', {'V': 0.32, 'IV': 0.266, 'II': 0.236,
                                        'III': 0.158, 'IV&V': 0.01,
                                        'II&III': 0.005, 'III&IV': 0.005}),
                            ('Representativity', ( 'V', 0.32)),
                            ('Similarity', 'V')])],
        Label=[' 2 ',
              Preponderance( rate = ' 19.11  %',
                           [('Groups', {'III': 0.274, 'II': 0.26, 'V': 0.26,
                                        'IV': 0.178, 'III&IV': 0.027}),
                            ('Representativity', ( 'III', 0.27)),
                            ('Similarity', 'III')])],
        Label=[' 3 ',
              Preponderance( rate = '27.749  %',
                           [('Groups', {'V': 0.443, 'IV': 0.311, 'III': 0.245}),
                            ('Representativity', ( 'V', 0.44)),
                            ('Similarity', 'V')])],
             )
(2) Enable subjectivity and use the strata column as the default array

>>> find_aquifer_groups(y.k, subjectivity=True, default_arr=data.strata_name)
_Group(Label=[' 1 ',
             Preponderance( rate = '53.141  %',
                           [('Groups', {'siltstone': 0.35, 'coal': 0.227,
                                        'fine-grained sandstone': 0.158,
                                        'medium-grained sandstone': 0.094,
                                        'mudstone': 0.079,
                                        'carbonaceous mudstone': 0.054,
                                        'coarse-grained sandstone': 0.03,
                                        'coarse': 0.01}),
                            ('Representativity', ( 'siltstone', 0.35)),
                            ('Similarity', 'siltstone')])],
        Label=[' 2 ',
              Preponderance( rate = ' 19.11  %',
                           [('Groups', {'mudstone': 0.288, 'siltstone': 0.205,
                                        'coal': 0.192,
                                        'coarse-grained sandstone': 0.137,
                                        'fine-grained sandstone': 0.137,
                                        'carbonaceous mudstone': 0.027,
                                        'medium-grained sandstone': 0.014}),
                            ('Representativity', ( 'mudstone', 0.29)),
                            ('Similarity', 'mudstone')])],
        Label=[' 3 ',
              Preponderance( rate = '27.749  %',
                           [('Groups', {'mudstone': 0.245, 'coal': 0.226,
                                        'siltstone': 0.217,
                                        'fine-grained sandstone': 0.123,
                                        'carbonaceous mudstone': 0.066,
                                        'medium-grained sandstone': 0.066,
                                        'coarse-grained sandstone': 0.057}),
                            ('Representativity', ( 'mudstone', 0.24)),
                            ('Similarity', 'mudstone')])],
             )
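The fallback that subjectivity and default_arr describe, and that example (2) relies on, could be sketched as follows. The helper name resolve_groups and the 'group_<label>' naming of naive groups are hypothetical; only the decision order (real groups first, then default_arr, then one naive group per class label) follows the parameter descriptions, and raising an error when neither groups nor subjectivity are given is also an assumption.

```python
import numpy as np


def resolve_groups(arr_k, arr_aq=None, subjectivity=False, default_arr=None):
    """Hypothetical helper: pick the array that plays the aquifer-group role."""
    if arr_aq is not None:
        # Real aquifer groups always take precedence.
        return np.asarray(arr_aq, dtype=object)
    if not subjectivity:
        # Assumption: without groups and without subjectivity, stop early.
        raise ValueError("arr_aq is missing; pass it or set subjectivity=True")
    if default_arr is not None:
        # Substitute groups, e.g. the strata column used in example (2).
        if len(default_arr) != len(arr_k):
            raise ValueError("default_arr must have the same length as arr_k")
        return np.asarray(default_arr, dtype=object)
    # Naive groups: each class label becomes its own (hypothetically named) group.
    return np.asarray([f"group_{label}" for label in arr_k], dtype=object)
```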