<no title> — watex 0.1.6.dev220+gcf54d39.d20230309 documentation

Hydrogeological parameters of an aquifer are essential basic data for the design and construction of geotechnical engineering and groundwater dewatering projects, whose progress depends directly on the reliability of these parameters.

Note

For clear demonstration, many example scripts use the data file ‘hf.csv’. This data is confidential, so it is not distributed with the package. The aim is to show how the scripts work when data from many boreholes are available.

watex.utils.hydroutils.categorize_target(arr, /, func=None, labels=None, rename_labels=None, coerce=False, order='strict')[source]#

Categorize an array so that it holds the given identifier labels.

Classify numerical values according to the given label values. Labels are a list of integers, where each integer is the unique identifier of a group of samples in the dataset.

Parameters:

arr (array-like | pandas.Series) – array or series containing numerical values. If non-numerical values are given, an error is raised.
func (Callable) – Function used to categorize the target y.
labels (int, list of int) – if an integer is given, it is taken as the number of categories into which ‘y’ is split. For instance, labels=3 applied to the first ten integers yields the label values [0, 1, 2]. If labels are given as a list, its items must be contained in the target ‘y’.
rename_labels (list of str) – list of strings or values that replace the integer label identifiers.
coerce (bool, default=False) – force the new label names passed to rename_labels to appear in the target, whether or not some integer class labels remain. If coerce is True, the target array holds the dtype of new_array.

Returns:

arr – The categorized array with unique identifier labels

Return type:

array-like | pandas.Series

Examples

>>> import numpy as np
>>> from watex.utils.mlutils import cattarget
>>> def binfunc(v):
...     if v < 3 : return 0
...     else : return 1
>>> arr = np.arange (10 )
>>> arr
... array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> target = cattarget(arr, func =binfunc)
... array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int64)
>>> cattarget(arr, labels =3 )
... array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
>>> array([2, 2, 2, 2, 1, 1, 1, 0, 0, 0])
>>> cattarget(arr, labels =3 , order =None )
... array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> cattarget(arr[::-1], labels =3 , order =None )
... array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2]) # reverse does not change
>>> cattarget(arr, labels =[0 , 2,  4]  )
... array([0, 0, 0, 2, 2, 4, 4, 4, 4, 4])
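
The role of func can be pictured without watex. The following is a minimal sketch, assuming that func is simply applied to each value of the array; the helper name categorize_with is hypothetical and is not part of the library.

```python
def binfunc(v):
    """Map a value to class 0 (v < 3) or class 1 (otherwise)."""
    return 0 if v < 3 else 1


def categorize_with(values, func):
    """Hypothetical helper: apply `func` to every value and collect the labels."""
    return [func(v) for v in values]


arr = list(range(10))
target = categorize_with(arr, binfunc)
print(target)
```

The printed labels follow the same 0/1 split as the func example above, with plain Python lists instead of arrays.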

watex.utils.hydroutils.check_flow_objectivity(y, /, values, classes)[source]#

Check the objectivity of the flow rate target.

If the objective is set to flow, i.e. the prediction focuses on the flow rate, the target y must meet some conditions when values are passed for class categorization.

Parameters:

values – list of values used to encode the numerical target y, for instance values=[0, 1, 2].
objective – str, relates to the flow rate prediction. Set to None for any other prediction.
prefix –
the prefix to add to the class labels. For instance, if the prefix is FR, the class labels become:
```
[0, 1, 2] => [FR0, FR1, FR2]
```
classes –
list of class names that replace the default FR used to specify the flow rate. For instance, it can be:
```
[0, 1, 2] => [sf0, sf1, sf2]
```

Returns:

(y, classes): Tuple, where y is the 1d array-like of categorized y and classes is the list of flow rate classes.
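
The label naming described for prefix and classes can be sketched in a few lines. This is only an illustration, assuming that the class names are built by concatenating a prefix and the integer value; prefix_labels is a hypothetical helper, not a watex function.

```python
def prefix_labels(values, prefix="FR"):
    """Hypothetical helper: build class names by prepending `prefix` to each value."""
    return [f"{prefix}{v}" for v in values]


flow_classes = prefix_labels([0, 1, 2])                 # default 'FR' prefix
custom_classes = prefix_labels([0, 1, 2], prefix="sf")  # user-defined prefix
print(flow_classes, custom_classes)
```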

watex.utils.hydroutils.classify_k(o, /, func=None, kname=None, inplace=False, string=False, default_func=False)[source]#

Categorize the permeability coefficient ‘k’.

Map the continuous ‘k’ into categorical classes.

Parameters:

o (ndarray, pd.Series or DataFrame) – data containing the continuous values of the permeability coefficient k. If data is passed as a pandas dataframe, the name kname of the column containing the k-values needs to be specified.
func (callable) – Function used to map the permeability coefficient column of the dataframe or series. If not given, the default function can be enabled instead with the parameter default_func.
inplace (bool, default=False) – Modify the object in place and return None.
string (bool) – If set to True, the categories mapped from ‘k’ are prefixed by “k”. However, if a string value is given, it is used as the prefix instead.
default_func (bool) –
Set to True to use the default function for mapping k. Note that it may not fit your own data, so it is recommended to provide your own function for mapping ‘k’. The default ‘k’ mapping is as follows:
- k0 {0}: k = 0
- k1 {1}: 0 < k <= .01
- k2 {2}: .01 < k <= .07
- k3 {3}: k> .07

Returns:

o – the categorized data. None is returned only if a dataframe is given and inplace is set to True, i.e. the object is modified in place.

Return type:

None, ndarray, Series or Dataframe

Examples

>>> import numpy as np
>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import classify_k
>>> _, y0 = load_hlogs (as_frame =True)
>>> # let us look at the first four non-missing values in y0
>>> y0.k.values [ ~np.isnan (y0.k ) ][:4]
...  array([0.054, 0.054, 0.054, 0.054])
>>> classify_k (y0 , kname ='k', inplace =True, default_func=True )
>>> # let us look at the same four values in the dataframe again
>>> y0.k.values [ ~np.isnan (y0.k ) ][:4]
... array([2., 2., 2., 2.])
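
The default mapping listed in the parameters (k0 to k3) can be written as a small standalone function. This is a sketch of those thresholds only, not the library code; the name default_k_class is hypothetical, and keeping NaN as NaN and rejecting negative values are assumptions.

```python
import math


def default_k_class(k):
    """Sketch of the documented default 'k' mapping (thresholds .01 and .07)."""
    if math.isnan(k):
        return math.nan          # assumption: missing values stay missing
    if k < 0:
        raise ValueError("k must be non-negative")  # assumption
    if k == 0:
        return 0                 # k0: k = 0
    if k <= 0.01:
        return 1                 # k1: 0 < k <= .01
    if k <= 0.07:
        return 2                 # k2: .01 < k <= .07
    return 3                     # k3: k > .07


classes = [default_k_class(v) for v in (0.0, 0.005, 0.054, 0.2)]
print(classes)
```

With this sketch, the value 0.054 from the example above falls in class 2, in line with the output shown there.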

watex.utils.hydroutils.find_aquifer_groups(arr_k, /, arr_aq=None, kname=None, aqname=None, subjectivity=False, default_arr=None, keep_label_0=False, method='naive')[source]#

Fit the groups of aquifers and find the representative of each true label of array ‘k’ in the aquifer group array.

The idea is to find the aquifer group that best fits each true label ‘X’ in ‘y_true’.

‘arr_k’ and ‘arr_aq’ must contain class labels, not continuous values.

Parameters:

arr_k (array-like, pandas series or dataframe) – array-like that contains the permeability coefficients ‘k’. If a dataframe is supplied, the permeability coefficient column name ‘kname’ must be specified.
arr_aq (array-like, pandas series or dataframe) – array-like that contains the aquifer groups. If NaN values exist in the aquifer groups, impute them before feeding the data to the algorithm; missing values are not allowed. If a dataframe is supplied, the aquifer group column name ‘aqname’ must be specified.
kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
aqname (str, optional) – Name of the aquifer group column. aqname allows retrieving the aquifer group arr_aq values from a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
subjectivity (bool, default=False) – Considers each class label as a naive group of aquifer. Subjectivity occurs when no group of aquifer is found in the data, so each class label is considered as a naive group of aquifer. For practical reasons, it is strongly recommended to pass a default group to the parameter default_arr to substitute for the group of aquifers. For instance, it can be the layers collected at specific depths, such as the ‘strata’ column.
default_arr (array-like, pd.Series) – Array used as a default substitute for the group of aquifers if the latter is missing. This is a heuristic option because it might lead to breaking code or invalid results.
keep_label_0 (bool, default=False) – The prediction already includes the label 0. However, a label 0 in the prediction refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded since it can also be considered a missing ‘k’-value. If a predicted ‘0’ is in the target, it should mean a missing ‘k’-value rather than a concrete label. Therefore, to avoid any confusion, ‘0’ is altered to ‘1’: the value +1 is used to shift all class labels forward, thereby excluding the ‘0’ label. To force including 0 in the labels, set keep_label_0 to True.
method (str ['naive', 'strict'], default='naive') –
The strategy used to compute the representativity of a label in the predicted array ‘arr_aq’. It can also be ‘strict’:
- naive computes the importance of the label from the number of its occurrences for this specific label in the array ‘k’. It does not take into account the occurrences of other existing labels. This is useful for unbalanced class labels in arr_k.
- strict computes the importance of the label from the number of occurrences in the whole valid arr_k, i.e. relative to the total occurrences of all the labels that exist in the whole ‘arr_aq’. This can give suitable analysis results if the class labels in arr_k are not unbalanced.

Returns:

_Group – Use attribute .groups to find the group values.

Return type:

_Group class object

Examples

(1) Use the real aquifer groups collected in the area

>>> from watex.utils import naive_imputer, read_data, reshape
>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import classify_k, find_aquifer_groups
>>> b= load_hlogs () #just taking the target names
>>> data = read_data ('data/boreholes/hf.csv') # read complete data
>>> y = data [b.target_names]
>>> # impute the missing values found in aquifer group columns
>>> # reshape 1d array along axis 0 for imputation
>>> agroup_imputed = naive_imputer ( reshape (y.aquifer_group, axis =0 ) ,
...                                    strategy ='most_frequent')
>>> # reshape back to array_like 1d
>>> y.aquifer_group =reshape (agroup_imputed)
>>> # categorize the continuous 'k' values in 'y.k' using the default
>>> # 'k' mapping func
>>> y.k = classify_k (y.k , default_func =True)
>>> # get the group obj
>>> group_obj = find_aquifer_groups(y.k, y.aquifer_group)
>>> group_obj
_Group(Label=[' 1 ',
             Preponderance( rate = '53.141  %',
                           [('Groups', {'V': 0.32, 'IV': 0.266, 'II': 0.236,
                                        'III': 0.158, 'IV&V': 0.01,
                                        'II&III': 0.005, 'III&IV': 0.005}),
                            ('Representativity', ( 'V', 0.32)),
                            ('Similarity', 'V')])],
        Label=[' 2 ',
              Preponderance( rate = ' 19.11  %',
                           [('Groups', {'III': 0.274, 'II': 0.26, 'V': 0.26,
                                        'IV': 0.178, 'III&IV': 0.027}),
                            ('Representativity', ( 'III', 0.27)),
                            ('Similarity', 'III')])],
        Label=[' 3 ',
              Preponderance( rate = '27.749  %',
                           [('Groups', {'V': 0.443, 'IV': 0.311, 'III': 0.245}),
                            ('Representativity', ( 'V', 0.44)),
                            ('Similarity', 'V')])],
             )
(2) Use the subjectivity and set the strata column as the default array

>>> find_aquifer_groups(y.k, subjectivity=True, default_arr= X.strata_name )
_Group(Label=[' 1 ',
             Preponderance( rate = '53.141  %',
                           [('Groups', {'siltstone': 0.35, 'coal': 0.227,
                                        'fine-grained sandstone': 0.158,
                                        'medium-grained sandstone': 0.094,
                                        'mudstone': 0.079,
                                        'carbonaceous mudstone': 0.054,
                                        'coarse-grained sandstone': 0.03,
                                        'coarse': 0.01}),
                            ('Representativity', ( 'siltstone', 0.35)),
                            ('Similarity', 'siltstone')])],
        Label=[' 2 ',
              Preponderance( rate = ' 19.11  %',
                           [('Groups', {'mudstone': 0.288, 'siltstone': 0.205,
                                        'coal': 0.192,
                                        'coarse-grained sandstone': 0.137,
                                        'fine-grained sandstone': 0.137,
                                        'carbonaceous mudstone': 0.027,
                                        'medium-grained sandstone': 0.014}),
                            ('Representativity', ( 'mudstone', 0.29)),
                            ('Similarity', 'mudstone')])],
        Label=[' 3 ',
              Preponderance( rate = '27.749  %',
                           [('Groups', {'mudstone': 0.245, 'coal': 0.226,
                                        'siltstone': 0.217,
                                        'fine-grained sandstone': 0.123,
                                        'carbonaceous mudstone': 0.066,
                                        'medium-grained sandstone': 0.066,
                                        'coarse-grained sandstone': 0.057}),
                            ('Representativity', ( 'mudstone', 0.24)),
                            ('Similarity', 'mudstone')])],
             )
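
The difference between the naive and strict strategies can be illustrated on toy data. The sketch below is one possible reading of the parameter description, not the library implementation: it assumes that naive normalizes the group counts by the number of samples carrying the label, while strict normalizes by the total number of valid samples. The helper group_rates and the data are hypothetical.

```python
from collections import Counter


def group_rates(k_labels, groups, label, method="naive"):
    """Rate of each aquifer group among the samples whose 'k' label is `label`."""
    if len(k_labels) != len(groups):
        raise ValueError("k_labels and groups must have the same length")
    selected = [g for k, g in zip(k_labels, groups) if k == label]
    if not selected:
        return {}
    # naive: relative to this label only; strict: relative to all samples
    denom = len(selected) if method == "naive" else len(k_labels)
    return {g: round(c / denom, 3) for g, c in Counter(selected).most_common()}


k = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
aq = ["V", "V", "IV", "II", "III", "III", "V", "V", "V", "IV"]
naive = group_rates(k, aq, 1, method="naive")
strict = group_rates(k, aq, 1, method="strict")
print(naive, strict)
```

Under this reading, the strict rates are the naive rates scaled by the share of the label in the whole array, which is why they are smaller for every group.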

watex.utils.hydroutils.find_similar_labels(y_true, y_pred, *, categorize_k=False, threshold=None, func=None, keep_label_0=False, method='naive', return_groups=False, **kwd)[source]#

Find similarities between y_true and y_pred and return the rate.

Parameters:

y_true (array-like 1d or pandas.Series) – Array containing the true labels of ‘k’.
y_pred (array-like or pandas.Series) – Array containing the predicted naive group of aquifers (NGA).
categorize_k (bool) – If set to True, the user needs to provide a function func to map or categorize the permeability coefficient ‘k’ into integer labels.
func (callable) – Function used to map the permeability coefficient column of the dataframe or series. If not given, the default function can be enabled instead with the parameter default_func.
threshold (float, default=None) – The threshold above which a label in ‘y_true’ can be considered similar to one of the NGA labels in ‘y_pred’. The default is None, which means that no rule is applied and the label with the highest preponderance or occurrence in the data compared to the other labels is considered the most representative and similar. Setting the rule by fixing the threshold is recommended, especially for a huge dataset.
keep_label_0 (bool, default=False) –
Force including 0 in the predicted labels if keep_label_0 is set to True. Mostly, label ‘0’ refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded since it can also be considered a missing ‘k’-value. If a predicted ‘0’ is in the target, it should mean a missing ‘k’-value rather than a concrete label. Therefore, to avoid any confusion, ‘0’ is removed by default in the ‘k’ categorization. However, when the prediction ‘y_pred’ is made with an unsupervised method, the prediction straightforwardly includes ‘0’, i.e. ‘k=0’, as a first class. So the value +1 is used to shift all class labels forward, thereby excluding the ‘0’ label. To force including 0 in the labels, set keep_label_0 to True.
method (str ['naive', 'strict'], default='naive') –
The strategy used to compute the representativity of a label in the predicted array ‘y_pred’. It can also be ‘strict’:
- naive computes the importance of the label from the number of its occurrences for this specific label in the array ‘y_true’. It does not take into account the occurrences of other existing labels. This is useful for unbalanced class labels in y_true.
- strict computes the importance of the label from the number of occurrences in the whole valid y_true, i.e. relative to the total occurrences of all the labels that exist in the whole ‘y_pred’. This can give suitable analysis results if the class labels are not unbalanced.
return_groups (bool, default=False) – Return the label groups and their value counts in the predicted labels y_pred where ‘k’ values are not missing.

Returns:

g.similarity – Tuple of labels found that are considered similar in the predicted labels.
g.group – Tuple of groups that have their similarity in the true labels.

Example

>>> from watex.utils import read_data
>>> from watex.utils.hydroutils import find_similar_labels, classify_k
>>> data = read_data ('data/boreholes/hf.csv')
>>> ymap = classify_k(data.k , default_func =True)
>>> # Note that for the demo we use the aquifer group column; however,
>>> # in a practical example, y_pred must be the predicted NGA labels. This
>>> # is possible using the function <predict_NGA_labels>
>>> sim = find_similar_labels(y_true= ymap, y_pred=data.aquifer_group)
>>> sim
... ((1, 'V'), (2, 'III'), (3, 'V'))
>>> group= find_similar_labels(ymap, data.aquifer_group, return_groups=True)
>>> group
... ((1,
  {'V': 0.17,
   'IV': 0.141,
   'II': 0.126,
   'III': 0.084,
   'IV&V': 0.005,
   'II&III': 0.003,
   'III&IV': 0.003}),
 (2, {'III': 0.052, 'II': 0.05, 'V': 0.05, 'IV': 0.034, 'III&IV': 0.005}),
 (3, {'V': 0.123, 'IV': 0.086, 'III': 0.068}))
>>> find_similar_labels(y_true= ymap, y_pred=data.aquifer_group,
...                     threshold = 0.15)
... [(1, 'V')]
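
The effect of threshold can be reproduced from the group rates printed above. The following is a minimal sketch, assuming that the most frequent group of each label is retained and that it is kept only when its rate reaches the threshold (the use of >= is an assumption); similar_labels is a hypothetical helper, not the watex function.

```python
# Group rates copied from the `return_groups=True` output above.
groups = (
    (1, {"V": 0.17, "IV": 0.141, "II": 0.126, "III": 0.084,
         "IV&V": 0.005, "II&III": 0.003, "III&IV": 0.003}),
    (2, {"III": 0.052, "II": 0.05, "V": 0.05, "IV": 0.034, "III&IV": 0.005}),
    (3, {"V": 0.123, "IV": 0.086, "III": 0.068}),
)


def similar_labels(groups, threshold=None):
    """Pair each label with its most frequent group, filtered by `threshold`."""
    pairs = []
    for label, rates in groups:
        best_group, best_rate = max(rates.items(), key=lambda item: item[1])
        if threshold is None or best_rate >= threshold:
            pairs.append((label, best_group))
    return pairs


no_rule = similar_labels(groups)
with_rule = similar_labels(groups, threshold=0.15)
print(no_rule, with_rule)
```

Only label 1 survives the 0.15 threshold because its best group ‘V’ has a rate of 0.17, while the best rates of labels 2 and 3 are 0.052 and 0.123.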

watex.utils.hydroutils.get_aquifer_section(arr_k, /, zname=None, kname=None, z=None, return_index=False, return_sections=True)[source]#

Detect a single aquifer section (upper and lower) in depth.

This is a useful trick to compute the thickness of the aquifer.

Parameters:

arr_k (ndarray or dataframe) – Data that mainly contains the aquifer values. It can also contain the depth values. If the depth is included in arr_k, zname needs to be supplied to recover the depth.
zname (str, int) – Name of the depth column. zname allows retrieving the depth column from a dataframe. If an integer is passed, it is taken as the index of the depth column in the dataframe. The integer value must not exceed the dataframe size along axis 1. zname commonly needs to be supplied when a dataframe is passed as a function argument.
kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
z (array-like 1d, pandas.Series) – Array of depths or a pandas series that contains the depth values. Arrays of two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised since zname is used to fetch z from the dataframe and overwrite it.
return_index (bool, default=False) – Returns the positions (indexes) of the upper and lower sections of the aquifer found in the dataframe arr_k.
return_sections (bool, default=True) – Returns the sections (upper and lower) of the aquifers.

Returns:

up, low –

- (upix, lowix): Tuple of indexes of the upper and lower sections
- (up, low): Tuple of aquifer sections (upper and lower)
- (upix, lowix), (up, low): positions and section values of the aquifers if return_index and return_sections are True.

Return type:

list of upper and lower section values of aquifer.

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_aquifer_section
>>> data = load_hlogs ().frame # return all data including the 'depth' values
>>> get_aquifer_section (data , zname ='depth', kname ='k')
... [197.12, 369.71] # section starts from 197.12 -> 369.71 m
>>> get_aquifer_section (data , zname ='depth', kname ='k', return_index=True)
... ([16, 29], [197.12, 369.71]) # upper and lower-> position 16 and 29.
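
The idea of an upper and a lower section can be sketched on plain lists. This illustration assumes that the section spans from the first to the last sample with a non-missing k-value, which may differ from the detection rule used by the library; aquifer_section and the data are hypothetical.

```python
import math


def aquifer_section(depth, k):
    """Return ([upix, lowix], [up, low]) from the first and last valid k-values."""
    if len(depth) != len(k):
        raise ValueError("depth and k must have the same length")
    valid = [i for i, value in enumerate(k) if not math.isnan(value)]
    if not valid:
        raise ValueError("no valid k-value found")
    upix, lowix = valid[0], valid[-1]
    return [upix, lowix], [depth[upix], depth[lowix]]


nan = math.nan
depth = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]   # hypothetical depths in metres
k = [nan, nan, 0.02, nan, 0.05, nan]          # hypothetical k-values
indexes, section = aquifer_section(depth, k)
thickness = section[1] - section[0]
print(indexes, section, thickness)
```

The thickness mentioned above is then the difference between the lower and upper section depths.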

watex.utils.hydroutils.get_aquifer_sections(*data, zname, kname, return_index=False, return_data=False, error='ignore', **kws)[source]#

Get the section of each aquifer from multiple dataframes.

The unique section (‘upper’ and ‘lower’) is the valid range of the whole data to consider as valid data. The use of the index is necessary to shrink the data of the whole boreholes. Mostly, the data from the section is considered the valid data for the predictor Xr. Data outside the range of the aquifer section can be discarded or compressed to top Xr.

Returns valid section indexes if ‘return_index’ is set to True.

Parameters:

data (list of pandas dataframes) – Data that mainly contains the aquifer values. The name of the depth column zname and the name of the permeability column kname need to be specified.
zname (str, int) – Name of the depth column. zname allows retrieving the depth column from a dataframe. If an integer is passed, it is taken as the index of the depth column in the dataframe. The integer value must not exceed the dataframe size along axis 1. zname commonly needs to be supplied when a dataframe is passed as a function argument.
kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
z (array-like 1d, pandas.Series) – Array of depths or a pandas series that contains the depth values. Arrays of two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised since zname is used to fetch z from the dataframe and overwrite it.
return_index (bool, default=False) – Returns the positions (indexes) of the upper and lower sections of each aquifer found in each dataframe.
error (str, default='ignore') – Raise errors if trouble occurs when computing the section of each aquifer. If ‘ignore’, a UserWarning is displayed if invalid data is found. Any other value of error will set error to raise.
return_data (bool, default=False) – Return the valid data. It is useful when ‘error’ is set to ‘ignore’ to collect the valid data.
kws (dict) – Additional keyword arguments passed to get_aquifer_sections().

Returns:

up, low –

- (upix, lowix): Tuple of indexes of the upper and lower sections
- (up, low): Tuple of aquifer sections (upper and lower)
- (upix, lowix), (up, low): positions and section values of the aquifers if return_index and return_sections are True.

Return type:

list of upper and lower section values of aquifer.

See also

predict_NGA_labels: Predicts Naive group of Aquifers labels.

Examples

>>> from watex.datasets import load_hlogs
>>> from watex.utils import read_data
>>> from watex.utils.hydroutils import classify_k, make_MXS_labels
>>> data = load_hlogs ().frame
>>> # map data.k to categorize k values
>>> ymap = classify_k(data.k , default_func =True)
>>> y_mxs = make_MXS_labels (ymap, data.aquifer_group)
>>> y_mxs[14:24]
...  array(['I', 'I', 2, 2, 2, 2, 2, 2, 2, 2], dtype=object)
>>> mxs_obj = make_MXS_labels (ymap, data.aquifer_group, return_obj=True )
>>> mxs_obj.mxs_labels_[14: 24]
... array(['I', 'I', 2, 2, 2, 2, 2, 2, 2, 2], dtype=object)
>>> # now we do the same task using the private data 'hf.csv'
>>> # composed of 11 boreholes. By default, we alternatively use
>>> # the aquifer groups as fake NGA labels
>>> data = read_data ('data/boreholes/hf.csv')
>>> ymap =  classify_k(data.k , default_func =True)
>>> y_mxs= make_MXS_labels (ymap, data.aquifer_group)
>>> np.unique (y_mxs)
... array(['1', '1V', '2', '2III', '3', 'I', 'II', 'III&IV', 'IV'],
      dtype='<U6')
>>> # *comments:
    # label '1V' means that the group V (expected to be a cluster)
    # and label 1 (true label) have a similarity;
    # the same holds for label '2III', while the remaining label 3 does not
    # have any similarity with the other labels in 'y_pred', expected
    # to be NGA labels.

watex.utils.hydroutils.predict_NGA_labels(X, /, n_clusters, random_state=0, keep_label_0=False, return_cluster_centers=False, **kws)[source]#

Predict the Naive Group of Aquifer (NGA) labels.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it’s not in CSR format.
n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.
random_state (int, RandomState instance or None, default=0) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.
keep_label_0 (bool, default=False) – The prediction already includes the label 0. However, a label 0 in the prediction refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded since it can also be considered a missing ‘k’-value. If a predicted ‘0’ is in the target, it should mean a missing ‘k’-value rather than a concrete label. Therefore, to avoid any confusion, ‘0’ is altered to ‘1’: the value +1 is used to shift all class labels forward, thereby excluding the ‘0’ label. To force including 0 in the labels, set keep_label_0 to True.
return_cluster_centers (bool, default=False) – Export the array of cluster centers if True.
kws (dict) – Additional keyword arguments passed to sklearn.cluster.KMeans.

Returns:

NGA (array_like of shape (n_samples, n_features)) – Predicted NGA labels.
(NGA, cluster_centers) (Tuple of array-like) – NGA and cluster centers if return_cluster_centers is set to True.
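
The label shift described for keep_label_0 can be illustrated on a list of cluster labels such as those returned by a clustering step. This is a sketch only, assuming that the +1 shift is applied when a label 0 is present and keep_label_0 is False; shift_labels is a hypothetical helper, and the clustering itself is not shown.

```python
def shift_labels(labels, keep_label_0=False):
    """Shift all labels by +1 so that no class is named 0, unless it is kept."""
    if keep_label_0 or 0 not in labels:
        return list(labels)
    return [label + 1 for label in labels]


cluster_labels = [0, 2, 1, 0, 2]          # hypothetical clustering output
nga = shift_labels(cluster_labels)
nga_kept = shift_labels(cluster_labels, keep_label_0=True)
print(nga, nga_kept)
```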

watex.utils.hydroutils.reduce_samples(*data, sname, zname=None, kname=None, section_indexes=None, error='raise', strategy='average', verify_integrity=False, ignore_index=False, **kws)[source]#

Create a new dataframe by squeezing/compressing the invalid data.

The m-samples reduction is necessary for datasets with many missing k-values. Shrinking the number of k0-values (missing k-values) seems a relevant idea. It consists in compressing the samples with missing \(k\)-values, from the top (depth equal to 0) to the upper section of the first aquifer with the lowest depth, into a single vector \(x_r\) of dimension \((1 \times n)\), i.e. containing the n features.

Parameters:

data (list of dataframes) – Data that mainly contains the aquifer values. It must contain the depth values in the column whose name is passed to zname and the permeability coefficient k in the column passed to kname. Both arguments need to be supplied when dataframes are passed as positional arguments.
sname (str, optional) – Name of the column in the dataframe that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of the valid layer/rock in the array/Series of strata.
zname (str, int) – Name of the depth column. zname allows retrieving the depth column from a dataframe. If an integer is passed, it is taken as the index of the depth column in the dataframe. The integer value must not exceed the dataframe size along axis 1. zname commonly needs to be supplied when a dataframe is passed as a function argument.
kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
z (array-like 1d, pandas.Series) – Array of depths or a pandas series that contains the depth values. Arrays of two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised since zname is used to fetch z from the dataframe and overwrite it.
strategy (str, default='average' or 'mean') – Strategy used to select or compute the numerical data into a single series. It can also be ‘naive’; in that case, a single series is randomly picked from the base strata data.
section_indexes (tuple or list of int) – A list of pair tuples or a list of integers. It gives the indexes of the valid section (upper and lower) of the aquifer. If the depth range z_range and zname are supplied, section_indexes can be None. Note that the last index is considered the last position, the bottom of the section; therefore, its value is included in the data.
error (str, default='raise') – Raise errors if trouble occurs when computing the section of each aquifer. If ‘ignore’, a UserWarning is displayed when invalid data is found. Any other value of error will set error to raise.
verify_integrity (bool, default=False) –
Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method. If True, remove the duplicate rows from the DataFrame.

- subset: By default, rows that have the same values in all the columns are considered duplicates. This parameter specifies the only columns to consider for identifying duplicates.
- keep: Determines which duplicates (if any) to keep. It takes ‘first’ (drop duplicates except for the first occurrence; this is the default behavior), ‘last’ (drop duplicates except for the last occurrence) or False (drop all duplicates).
- inplace: Boolean flag, False by default, that specifies whether to return a new DataFrame or update the existing one.
ignore_index (bool, default=False) – Boolean flag indicating whether the row index should be reset after dropping duplicate rows. False keeps the original row index. True resets the index, and the resulting rows are labeled 0, 1, …, n – 1.

Returns:

df_new – new dataframes with reduced samples.

Return type:

List of pandas.dataframes

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import reduce_samples
>>> data = load_hlogs ().frame # get the frames
>>> # add explicitly the aquifer section indices
>>> dfnew= reduce_samples (data.copy(), sname='strata_name',
                             section_indexes = (16, 29 ),)
>>> dfnew[0]
...    hole_number               strata_name     rock_name  ...      r     rp  remark
    0         H502                  mudstone           J2z  ...    NaN    NaN     NaN
    16        H502                 siltstone           NaN  ...  35.74  59.23     NaN
    17        H502    fine-grained sandstone           NaN  ...  35.74  59.23     NaN
    18        H502                 siltstone           NaN  ...  35.74  59.23     NaN
    19        H502    fine-grained sandstone           NaN  ...  35.74  59.23     NaN
    20        H502                  mudstone           NaN  ...  35.74  59.23     NaN
    21        H502                 siltstone           NaN  ...  35.74  59.23     NaN
    22        H502    fine-grained sandstone           NaN  ...  59.61  59.23     NaN
    23        H502                 siltstone           NaN  ...  59.61  59.23     NaN
    24        H502    fine-grained sandstone           NaN  ...  59.61  59.23     NaN
    25        H502  Coarse-grained sandstone           NaN  ...  59.61  59.23     NaN
    26        H502                  mudstone           NaN  ...  82.33  59.23     NaN
    27        H502    fine-grained sandstone           NaN  ...  82.33  59.23     NaN
    28        H502  Coarse-grained sandstone           J2z  ...  82.33  59.23     NaN
    29        H502                      coal  (J2y)  2coal  ...  82.33  59.23     NaN
    0         H502                 siltstone           NaN  ...    NaN    NaN     NaN

[16 rows x 23 columns]
>>> # specify the column name and kname without section indexes
>>> dfnew = reduce_samples (data.copy(), sname='strata_name', zname='depth',
                            kname='k', ignore_index=True)[0]
>>> dfnew.index # index is reset
... RangeIndex(start=0, stop=16, step=1)
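The effect of the ignore_index flag can be illustrated with plain pandas. This is only a sketch of the flag's semantics using `DataFrame.drop_duplicates`. It does not show how reduce_samples is implemented, and the toy frame below is hypothetical data, not taken from load_hlogs.

```python
import pandas as pd

# Hypothetical toy log: the labels 0..4 stand in for original row positions.
df = pd.DataFrame({
    "strata_name": ["mudstone", "siltstone", "siltstone", "coal", "coal"],
    "k": [0.1, 0.5, 0.5, 0.2, 0.2],
})

# ignore_index=False (default): surviving rows keep their original labels.
kept = df.drop_duplicates(ignore_index=False)

# ignore_index=True: surviving rows are relabeled 0, 1, ..., n - 1.
reset = df.drop_duplicates(ignore_index=True)

print(list(kept.index))   # [0, 1, 3]
print(list(reset.index))  # [0, 1, 2]
```

Keeping the original labels is useful when the reduced rows must be traced back to positions in the full log.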

watex.utils.hydroutils.rename_labels_in(arr, new_names, coerce=False)[source]#

Rename labels with new names.

Parameters:

arr – array-like | pandas.Series; array or series containing numerical values. If non-numerical values are given, an error is raised.
new_names – list of str; list of strings or values to replace the integer label identifiers.
coerce – bool, default=False; forces ‘new_names’ to appear in the target whether or not it includes some integer class-label identifiers. If coerce is True, the target array takes the dtype of new_array, and coercing the label names will not raise an error. Consequently, it can introduce unexpected results.

Returns:

array-like; an array with the new label names applied.
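The idea of replacing integer label identifiers with names can be sketched in NumPy. The helper `rename_labels_sketch` is hypothetical and is not the library's code. It assumes the names are matched to the sorted unique labels, and it does not reproduce the coerce behavior.

```python
import numpy as np

def rename_labels_sketch(arr, new_names):
    """Map each sorted unique label of `arr` to the name at the same position."""
    arr = np.asarray(arr)
    labels = np.unique(arr)  # sorted unique integer labels
    if len(new_names) != len(labels):
        raise ValueError("expected one new name per unique label")
    mapping = dict(zip(labels.tolist(), new_names))
    # Object dtype lets the result hold arbitrary replacement values.
    return np.array([mapping[v] for v in arr.tolist()], dtype=object)

y = np.array([0, 2, 1, 0, 2])
renamed = rename_labels_sketch(y, ["low", "medium", "high"])
print(renamed.tolist())  # ['low', 'high', 'medium', 'low', 'high']
```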

watex.utils.hydroutils.select_base_stratum(d, /, sname=None, stratum=None, return_rate=False, return_counts=False)[source]#

Selects the base stratum from the strata column in the logging data.

Find the most recurrent stratum in the data and compute the rate of occurrence.

Parameters:

d (array-like 1D , pandas.Series or DataFrame) – Valid data containing the strata. If dataframe is passed, ‘sname’ is needed to fetch strata values.
sname (str, optional) – Name of the column in the dataframe that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of a valid layer/rock in the array/Series of strata.
stratum (str, optional) – Name of the base stratum. It must be an item of the strata data. Note that if stratum is passed, the auto-detection of the base stratum is not triggered. The same stratum is returned; however, its rate and occurrences can still be obtained if return_rate or return_counts is set to True.
return_rate (bool, default=False,) – Returns the rate of occurrence of the base stratum in the data.
return_counts (bool, default=False,) – Returns each stratum name and its occurrences (count) in the data.

Returns:

bs (str) – base stratum, contained in the data
r (float) – rate of occurrence of the base stratum in the data
c (tuple (str, int)) – Tuple of each stratum with its occurrence in the data.

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import select_base_stratum
>>> data = load_hlogs().frame # get only the frame
>>> select_base_stratum(data, sname ='strata_name')
... 'siltstone'
>>> select_base_stratum(data, sname ='strata_name', return_rate =True)
... 0.287292817679558
>>> select_base_stratum(data, sname ='strata_name', return_counts=True)
... [('siltstone', 52),
     ('fine-grained sandstone', 40),
     ('mudstone', 37),
     ('coal', 24),
     ('Coarse-grained sandstone', 15),
     ('carbonaceous mudstone', 9),
     ('medium-grained sandstone', 2),
     ('topsoil', 1),
     ('gravel layer', 1)]
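The base stratum and its rate can be cross-checked with the standard library. This is a minimal sketch, not the library's implementation: it rebuilds a flat list of strata from the counts printed above and takes the most frequent item with `collections.Counter`.

```python
from collections import Counter

# Counts copied from the return_counts output above.
counts_from_doc = [
    ("siltstone", 52), ("fine-grained sandstone", 40), ("mudstone", 37),
    ("coal", 24), ("Coarse-grained sandstone", 15),
    ("carbonaceous mudstone", 9), ("medium-grained sandstone", 2),
    ("topsoil", 1), ("gravel layer", 1),
]

# Expand the counts into one stratum name per sample.
strata = [name for name, n in counts_from_doc for _ in range(n)]

# The most recurrent stratum is the base stratum.
counts = Counter(strata).most_common()
base_stratum, occurrences = counts[0]

# Rate of occurrence = occurrences of the base stratum / number of samples.
rate = occurrences / len(strata)

print(base_stratum, occurrences, len(strata))  # siltstone 52 181
```

With 52 occurrences out of 181 samples, the rate agrees with the value returned when return_rate=True.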

watex.utils.hydroutils.transmissibility(s, d, time)[source]#

Transmissibility T represents the ability of an aquifer to conduct water.

It is numerically equal to the product of the hydraulic conductivity and the aquifer thickness (T = KM), that is, the seepage flow under a unit hydraulic gradient, per unit time and per unit width.
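The relation T = KM can be illustrated with a short sketch. The helper `transmissivity_from_k` is hypothetical and unrelated to the `transmissibility(s, d, time)` signature, whose arguments are not described here. The units (m/day and m) and the numeric values are illustrative assumptions.

```python
def transmissivity_from_k(k, thickness):
    """Return T = K * M for conductivity `k` (m/day) and thickness (m)."""
    if k < 0 or thickness < 0:
        raise ValueError("k and thickness must be non-negative")
    return k * thickness

# Illustrative values only: K = 2.5 m/day, M = 12 m.
T = transmissivity_from_k(k=2.5, thickness=12.0)
print(T)  # 30.0 (m^2/day)
```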

watex.utils.hydroutils.validate_labels(t, /, labels, return_bool=False)[source]#

Assert the validity of the labels in the target and return either the labels or a boolean indicating whether all label items are in the target.

Parameters:

t – array-like; target that is expected to contain the labels.
labels – int, str or list of (str or int); labels that are supposed to be in the target t.
return_bool – bool, default=False; returns ‘True’ or ‘False’ rather than the labels if set to True.

Returns:

bool or labels; ‘True’ or ‘False’ if return_bool is set to True, and the labels otherwise.

Example:

>>> from watex.datasets import fetch_data
>>> from watex.utils.mlutils import cattarget
>>> from watex.utils.hydroutils import validate_labels
>>> _, y = fetch_data ('bagoue', return_X_y=True, as_frame=True)
>>> # binarize target y into [0 , 1]
>>> ybin = cattarget(y, labels=2 )
>>> validate_labels (ybin, [0, 1])
... [0, 1] # all labels exist.
>>> validate_labels (y, [0, 1, 3])
... ValueError: Value '3' is missing in the target.
>>> validate_labels (ybin, 0 )
... [0]
>>> validate_labels (ybin, [0, 5], return_bool=True ) # no raise error
... False
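The membership check behind the example above can be sketched with NumPy. The function `validate_labels_sketch` is a hypothetical stand-in, not the library's code. It assumes a missing label raises ValueError unless return_bool is True, as the example output suggests, and its error message is illustrative.

```python
import numpy as np

def validate_labels_sketch(t, labels, return_bool=False):
    """Check that every item of `labels` occurs in the target `t`."""
    # Accept a single label as well as a list of labels.
    labels = [labels] if np.isscalar(labels) else list(labels)
    present = np.isin(labels, np.unique(t))
    if return_bool:
        return bool(present.all())
    if not present.all():
        missing = [lab for lab, ok in zip(labels, present) if not ok]
        raise ValueError(f"Missing in the target: {missing}")
    return labels

ybin = np.array([0, 1, 1, 0])  # hypothetical binarized target
print(validate_labels_sketch(ybin, [0, 1]))                    # [0, 1]
print(validate_labels_sketch(ybin, 0))                         # [0]
print(validate_labels_sketch(ybin, [0, 5], return_bool=True))  # False
```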