Hydrogeological parameters of aquifers are essential basic data in the design and construction of geotechnical engineering and groundwater dewatering projects, the success of which depends directly on the reliability of these parameters.

Note

For clear demonstration, many example scripts use the data file ‘hf.csv’. This data is confidential, so it is not distributed with the package. The idea is to show how the scripts work when data from many boreholes are available.

watex.utils.hydroutils.categorize_target(arr, /, func=None, labels=None, rename_labels=None, coerce=False, order='strict')[source]#

Categorize array to hold the given identifier labels.

Classify numerical values according to the given label values. Labels are a list of integers where each integer is the unique identifier of a group of samples in the dataset.

Parameters:
  • arr (array-like | pandas.Series) – array or series containing numerical values. If non-numerical values are given, an error is raised.

  • func (Callable,) – Function to categorize the target y.

  • labels (int, list of int,) – if an integer is given, it is taken as the number of categories into which to split ‘y’. For instance, with labels=3 applied to the first ten integers, the label values are [0, 1, 2]. If labels are given as a list, its items must be contained in the target ‘y’.

  • rename_labels (list of str;) – list of string or values to replace the label integer identifier.

  • coerce (bool, default=False) – force the new label names passed to rename_labels to appear in the target, whether or not some integer class identifiers remain. If coerce is True, the target array takes the dtype of new_array.

Returns:

arr – The categorized array with unique identifier labels

Return type:

Arraylike |pandas.Series

Examples

>>> import numpy as np
>>> from watex.utils.mlutils import cattarget
>>> def binfunc(v):
...     if v < 3 : return 0
...     else : return 1
>>> arr = np.arange (10 )
>>> arr
... array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> cattarget(arr, func =binfunc)
... array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int64)
>>> cattarget(arr, labels =3 )
... array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
>>> array([2, 2, 2, 2, 1, 1, 1, 0, 0, 0])
>>> cattarget(arr, labels =3 , order =None )
... array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> cattarget(arr[::-1], labels =3 , order =None )
... array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2]) # reverse does not change
>>> cattarget(arr, labels =[0 , 2,  4]  )
... array([0, 0, 0, 2, 2, 4, 4, 4, 4, 4])
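
The func-based categorization shown in the first call above can be pictured as an element-wise mapping. The following is a minimal pure-Python sketch, not the watex implementation; the helper name `categorize_with_func` is hypothetical and only mirrors the role of the func parameter.

```python
from typing import Callable, Iterable, List


def categorize_with_func(values: Iterable[float],
                         func: Callable[[float], int]) -> List[int]:
    """Apply a user-supplied mapping to every value (hypothetical helper)."""
    return [func(v) for v in values]


def binfunc(v: float) -> int:
    # Same rule as in the example above: values below 3 go to class 0.
    return 0 if v < 3 else 1


labels = categorize_with_func(range(10), binfunc)
print(labels)  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

The result matches the class pattern of the documented func example; the integer and list forms of labels follow rules that are not described here, so they are not sketched.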
watex.utils.hydroutils.check_flow_objectivity(y, /, values, classes)[source]#

Check the flow rate objectivity.

If objective is set to flow, i.e. the prediction focuses on the flow rate, the target y must meet some conditions when values are passed for class categorization.

Parameters:
  • values – list of values used to encode the numerical target y, for instance values=[0, 1, 2].

  • objective – str, relate to the flow rate prediction. Set to None for any other predictions.

  • prefix

    the prefix to add to the class labels. For instance, if the prefix is FR, the class labels become:

    [0, 1, 2] => [FR0, FR1, FR2]

  • classes

    list of class names to replace the default FR that is used to specify the flow rate. For instance, it can be:

    [0, 1, 2] => [sf0, sf1, sf2]

Returns:

(y, classes) – Tuple:

  • y: array-like 1d of categorized y

  • classes: list of flow rate classes
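
The label renaming described for prefix and classes, [0, 1, 2] => [FR0, FR1, FR2], amounts to simple string formatting. A minimal sketch follows; the helper `prefix_class_labels` is hypothetical and is not part of watex.

```python
from typing import List, Sequence


def prefix_class_labels(values: Sequence[int], prefix: str = "FR") -> List[str]:
    """Build class names by prepending a prefix to each integer label."""
    return [f"{prefix}{v}" for v in values]


fr_classes = prefix_class_labels([0, 1, 2])               # default 'FR' prefix
sf_classes = prefix_class_labels([0, 1, 2], prefix="sf")  # custom prefix
print(fr_classes)  # ['FR0', 'FR1', 'FR2']
print(sf_classes)  # ['sf0', 'sf1', 'sf2']
```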

watex.utils.hydroutils.classify_k(o, /, func=None, kname=None, inplace=False, string=False, default_func=False)[source]#

Categorize the permeability coefficient ‘k’

Map the continuous ‘k’ into categorical classes.

Parameters:
  • o (ndarray, pd.Series or Dataframe) – data containing the continuous values of the permeability coefficient k. If data is passed as a pandas dataframe, the name kname of the column containing the k-values must be specified.

  • func (callable) – Function used to map the permeability coefficient column of the dataframe or series. If not given, the default function can be enabled instead with the parameter default_func.

  • inplace (bool, default=False) – Modify the object in place and return None.

  • string (bool,) – If set to “True”, the categories mapped from ‘k’ are prefixed by “k”. However, if a string value is given, it is used as the prefix instead.

  • default_func (bool,) –

    If set to True, the default function is used to map k. Note that this mapping may not fit your own data, so it is recommended to provide your own function for mapping ‘k’. The default ‘k’ mapping is as follows:

    • k0 {0}: k = 0

    • k1 {1}: 0 < k <= .01

    • k2 {2}: .01 < k <= .07

    • k3 {3}: k> .07
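
The default mapping listed above can be written as a small threshold function. This is a sketch under stated assumptions rather than the library code: the name `default_k_class` is hypothetical, NaN is assumed to stay NaN, and negative values are assumed to be invalid.

```python
import math


def default_k_class(k: float) -> float:
    """Map a permeability coefficient to the class index of the listed bins."""
    if math.isnan(k):
        return math.nan   # assumption: missing values stay missing
    if k < 0:
        raise ValueError("k must be non-negative")  # assumption, not documented
    if k == 0:
        return 0          # k0: k = 0
    if k <= 0.01:
        return 1          # k1: 0 < k <= .01
    if k <= 0.07:
        return 2          # k2: .01 < k <= .07
    return 3              # k3: k > .07


print([default_k_class(v) for v in (0, 0.005, 0.054, 0.2)])  # [0, 1, 2, 3]
```

A function of this shape could be passed as func when the default bins do not suit the data.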

Returns:

o – the categorized data. None is returned only if a dataframe is given and inplace is set to True, i.e. the object is modified in place.

Return type:

None, ndarray, Series or Dataframe

Examples

>>> import numpy as np
>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import classify_k
>>> _, y0 = load_hlogs (as_frame =True)
>>> # let's look at the first four non-NaN values in y0
>>> y0.k.values [ ~np.isnan (y0.k ) ][:4]
...  array([0.054, 0.054, 0.054, 0.054])
>>> classify_k (y0 , kname ='k', inplace =True, default_func=True )
>>> # let's look again at the same four values in the dataframe
>>> y0.k.values [ ~np.isnan (y0.k ) ][:4]
... array([2., 2., 2., 2.])
watex.utils.hydroutils.find_aquifer_groups(arr_k, /, arr_aq=None, kname=None, aqname=None, subjectivity=False, default_arr=None, keep_label_0=False, method='naive')[source]#

Fit the aquifer groups and find the representative of each true label of array ‘k’ in the aquifer group array.

The idea is to find the aquifer group that best fits the true label ‘X’ in ‘y_true’.

‘arr_k’ and ‘arr_aq’ must contain class labels, not continuous values.

Parameters:
  • arr_k (array_like, pandas series or dataframe) – array-like that contains the permeability coefficients ‘k’. If a dataframe is supplied, the permeability coefficient column name ‘kname’ must be specified.

  • arr_aq (array-like, pandas series or dataframe) – array-like that contains the aquifer groups. Missing values are not allowed: if NaN values exist in the aquifer groups, impute them before feeding the array to the algorithm. If a dataframe is supplied, the aquifer group column name ‘aqname’ must be specified.

  • kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a dataframe. If an integer is passed, it is taken as the position of the ‘k’ column; the integer must not exceed the dataframe size along axis 1. kname usually needs to be supplied when a dataframe is passed as a positional or keyword argument.

  • aqname (str, optional,) – Name of the aquifer group column. aqname is used to retrieve the aquifer group values arr_aq from a dataframe. aqname usually needs to be supplied when a dataframe is passed as a positional or keyword argument.

  • subjectivity (bool, default=False) – Considers each class label as a naive group of aquifer. Subjectivity applies when no aquifer group is found in the data; each class label is then considered as a naive group of aquifer. For practical reasons, it is strongly recommended to pass a default group to the parameter default_arr to substitute for the aquifer groups. For instance, it can be the layers collected at specific depths, like the ‘strata’ column.

  • default_arr (array-like, pd.Series) – Array used as a default to substitute for the aquifer groups when the latter are missing. This is a heuristic option and might lead to breaking code or invalid results.

  • keep_label_0 (bool, default=False) – Whether to keep the label 0 in the prediction. Label ‘0’ in the predicted labels refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded since it can also be considered a missing ‘k’-value. If a predicted ‘0’ is in the target, it should mean a missing ‘k’-value rather than a concrete label. Therefore, to avoid any confusion, ‘0’ is altered to ‘1’: the value +1 is added to move all class labels forward, thereby excluding the ‘0’ label. To force the inclusion of 0 in the labels, set keep_label_0 to True.

  • method (str ['naive', 'strict'], default='naive') –

    The strategy used to compute the representativity of a label in the predicted array ‘arr_aq’:

    • naive computes the importance of a label from its number of occurrences for this specific label in the array ‘k’. It does not take into account the occurrences of other existing labels. This is useful for unbalanced class labels in arr_k.

    • strict computes the importance of a label from its number of occurrences in the whole valid arr_k, i.e. relative to the total number of occurrences of all the labels that exist in the whole ‘arr_aq’. This can give suitable analysis results if the class labels in arr_k are not unbalanced.
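
One way to read the two strategies is as two different denominators for the same counts. The sketch below illustrates that reading on made-up labels; `group_rates` is a hypothetical helper, and the exact normalization used by watex may differ.

```python
from collections import Counter, defaultdict


def group_rates(k_labels, groups, method="naive"):
    """Rate of each aquifer group per 'k' label (illustrative only)."""
    if len(k_labels) != len(groups):
        raise ValueError("k_labels and groups must have the same length")
    if method not in ("naive", "strict"):
        raise ValueError("method must be 'naive' or 'strict'")
    per_label = defaultdict(Counter)
    for label, group in zip(k_labels, groups):
        per_label[label][group] += 1
    total = len(k_labels)
    rates = {}
    for label, counts in per_label.items():
        # naive: normalize within the label; strict: normalize by all samples
        denominator = sum(counts.values()) if method == "naive" else total
        rates[label] = {g: c / denominator for g, c in counts.most_common()}
    return rates


# Made-up class labels and groups, for illustration only.
k_labels = [1, 1, 1, 1, 2, 2]
aquifer_groups = ["V", "V", "IV", "II", "III", "II"]
naive = group_rates(k_labels, aquifer_groups, method="naive")
strict = group_rates(k_labels, aquifer_groups, method="strict")
print(naive[1])   # {'V': 0.5, 'IV': 0.25, 'II': 0.25}
print(strict[1])
```

With the naive reading, the rates of one label sum to 1, so a rare label is judged on its own samples; with the strict reading, the rates of all labels together sum to 1, so rare labels get small rates.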

Returns:

_Group – Use attribute .groups to find the group values.

Return type:

_Group class object

Examples

  1. Use the real aquifer group collected in the area

>>> from watex.utils import naive_imputer, read_data, reshape
>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import classify_k, find_aquifer_groups
>>> b= load_hlogs () #just taking the target names
>>> data = read_data ('data/boreholes/hf.csv') # read complete data
>>> y = data [b.target_names]
>>> # impute the missing values found in aquifer group columns
>>> # reshape 1d array along axis 0 for imputation
>>> agroup_imputed = naive_imputer ( reshape (y.aquifer_group, axis =0 ) ,
...                                    strategy ='most_frequent')
>>> # reshape back to array_like 1d
>>> y.aquifer_group =reshape (agroup_imputed)
>>> # categorize the 'k' continuous value in 'y.k' using the default
>>> # 'k' mapping func
>>> y.k = classify_k (y.k , default_func =True)
>>> # get the group obj
>>> group_obj = find_aquifer_groups(y.k, y.aquifer_group)
>>> group_obj
_Group(Label=[' 1 ',
             Preponderance( rate = '53.141  %',
                           [('Groups', {'V': 0.32, 'IV': 0.266, 'II': 0.236,
                                        'III': 0.158, 'IV&V': 0.01,
                                        'II&III': 0.005, 'III&IV': 0.005}),
                            ('Representativity', ( 'V', 0.32)),
                            ('Similarity', 'V')])],
        Label=[' 2 ',
              Preponderance( rate = ' 19.11  %',
                           [('Groups', {'III': 0.274, 'II': 0.26, 'V': 0.26,
                                        'IV': 0.178, 'III&IV': 0.027}),
                            ('Representativity', ( 'III', 0.27)),
                            ('Similarity', 'III')])],
        Label=[' 3 ',
              Preponderance( rate = '27.749  %',
                           [('Groups', {'V': 0.443, 'IV': 0.311, 'III': 0.245}),
                            ('Representativity', ( 'V', 0.44)),
                            ('Similarity', 'V')])],
             )

  2. Use the subjectivity and set the strata column as the default array

>>> find_aquifer_groups(y.k, subjectivity=True, default_arr= data.strata_name )
_Group(Label=[' 1 ',
             Preponderance( rate = '53.141  %',
                           [('Groups', {'siltstone': 0.35, 'coal': 0.227,
                                        'fine-grained sandstone': 0.158,
                                        'medium-grained sandstone': 0.094,
                                        'mudstone': 0.079,
                                        'carbonaceous mudstone': 0.054,
                                        'coarse-grained sandstone': 0.03,
                                        'coarse': 0.01}),
                            ('Representativity', ( 'siltstone', 0.35)),
                            ('Similarity', 'siltstone')])],
        Label=[' 2 ',
              Preponderance( rate = ' 19.11  %',
                           [('Groups', {'mudstone': 0.288, 'siltstone': 0.205,
                                        'coal': 0.192,
                                        'coarse-grained sandstone': 0.137,
                                        'fine-grained sandstone': 0.137,
                                        'carbonaceous mudstone': 0.027,
                                        'medium-grained sandstone': 0.014}),
                            ('Representativity', ( 'mudstone', 0.29)),
                            ('Similarity', 'mudstone')])],
        Label=[' 3 ',
              Preponderance( rate = '27.749  %',
                           [('Groups', {'mudstone': 0.245, 'coal': 0.226,
                                        'siltstone': 0.217,
                                        'fine-grained sandstone': 0.123,
                                        'carbonaceous mudstone': 0.066,
                                        'medium-grained sandstone': 0.066,
                                        'coarse-grained sandstone': 0.057}),
                            ('Representativity', ( 'mudstone', 0.24)),
                            ('Similarity', 'mudstone')])],
             )
watex.utils.hydroutils.find_similar_labels(y_true, y_pred, *, categorize_k=False, threshold=None, func=None, keep_label_0=False, method='naive', return_groups=False, **kwd)[source]#

Find similarities between y_true and y_pred and return the rate.

Parameters:
  • y_true (array-like 1d or pandas.Series) – Array containing the true labels of ‘k’

  • y_pred (array_like, or pandas.Series) – array containing the predicted naive group of aquifers (NGA)

  • categorize_k (bool,) – If set to True, the user needs to provide a function func to map or categorize the permeability coefficient ‘k’ into integer labels.

  • func (callable) – Function used to map the permeability coefficient column of the dataframe or series. If not given, the default function can be enabled instead with the parameter default_func.

  • threshold (float, default=None) – The threshold from which a label in ‘y_true’ can be considered similar to one of the NGA labels in ‘y_pred’. The default is ‘None’, which means that no rule is applied and the label with the highest preponderance or occurrence in the data compared to the other labels is considered the most representative and similar. Fixing a threshold is recommended, especially for a huge dataset.

  • keep_label_0 (bool, default=False) – Force the inclusion of 0 in the predicted labels if keep_label_0 is set to True. Label ‘0’ mostly refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded since it can also be considered a missing ‘k’-value. If a predicted ‘0’ is in the target, it should mean a missing ‘k’-value rather than a concrete label. Therefore, to avoid any confusion, ‘0’ is removed by default in the ‘k’ categorization. However, when the prediction ‘y_pred’ is made with an unsupervised method, the prediction straightforwardly includes ‘0’, i.e. ‘k=0’, as a first class. So the value +1 is added to move all class labels forward, thereby excluding the ‘0’ label. To force the inclusion of 0 in the labels, set keep_label_0 to True.

  • method (str ['naive', 'strict'], default='naive') –

    The strategy used to compute the representativity of a label in the predicted array ‘y_pred’:

    • naive computes the importance of a label from its number of occurrences for this specific label in the array ‘y_true’. It does not take into account the occurrences of other existing labels. This is useful for unbalanced class labels in y_true.

    • strict computes the importance of a label from its number of occurrences in the whole valid y_true, i.e. relative to the total number of occurrences of all the labels that exist in the whole ‘y_pred’. This can give suitable analysis results if the class labels in y_pred are not unbalanced.

  • return_groups (bool, default=False) – Returns the label groups and their value counts in the predicted labels y_pred where ‘k’ values are not missing.

Returns:

  • g.similarity – Tuple of the labels found that are considered similar in the predicted labels.

  • g.group – Tuple of the groups that have their similarity in the true labels.

Example

>>> from watex.utils import read_data
>>> from watex.utils.hydroutils import find_similar_labels, classify_k
>>> data = read_data ('data/boreholes/hf.csv')
>>> ymap = classify_k(data.k , default_func =True)
>>> # Note that for the demo we use the aquifer group column; however,
>>> # in a practical example, y_pred must be predicted NGA labels. This
>>> # is possible using the function <predict_NGA_labels>
>>> sim = find_similar_labels(y_true= ymap, y_pred=data.aquifer_group)
>>> sim
... ((1, 'V'), (2, 'III'), (3, 'V'))
>>> group= find_similar_labels(ymap, data.aquifer_group, return_groups=True)
>>> group
... ((1,
  {'V': 0.17,
   'IV': 0.141,
   'II': 0.126,
   'III': 0.084,
   'IV&V': 0.005,
   'II&III': 0.003,
   'III&IV': 0.003}),
 (2, {'III': 0.052, 'II': 0.05, 'V': 0.05, 'IV': 0.034, 'III&IV': 0.005}),
 (3, {'V': 0.123, 'IV': 0.086, 'III': 0.068}))
>>> find_similar_labels(y_true= ymap, y_pred=data.aquifer_group,
                              threshold = 0.15)
... [(1, 'V')]
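
The effect of threshold in the last call can be reproduced from the group rates printed in the example above. The sketch below is an interpretation, not the watex code: `similar_labels` is a hypothetical helper, and the comparison is assumed to be "greater than or equal to".

```python
# Group rates copied from the return_groups=True output shown above.
group_rates = {
    1: {"V": 0.17, "IV": 0.141, "II": 0.126, "III": 0.084,
        "IV&V": 0.005, "II&III": 0.003, "III&IV": 0.003},
    2: {"III": 0.052, "II": 0.05, "V": 0.05, "IV": 0.034, "III&IV": 0.005},
    3: {"V": 0.123, "IV": 0.086, "III": 0.068},
}


def similar_labels(rates, threshold=None):
    """Pair each label with its dominant group, optionally filtered by rate."""
    pairs = []
    for label, groups in rates.items():
        best_group, best_rate = max(groups.items(), key=lambda item: item[1])
        # assumption: a label is kept when its best rate reaches the threshold
        if threshold is None or best_rate >= threshold:
            pairs.append((label, best_group))
    return pairs


print(similar_labels(group_rates))                  # [(1, 'V'), (2, 'III'), (3, 'V')]
print(similar_labels(group_rates, threshold=0.15))  # [(1, 'V')]
```

Only label 1 survives the 0.15 threshold because its dominant group ‘V’ has a rate of 0.17, whereas the dominant groups of labels 2 and 3 stay below 0.15.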
watex.utils.hydroutils.get_aquifer_section(arr_k, /, zname=None, kname=None, z=None, return_index=False, return_sections=True)[source]#

Detect a single aquifer section (upper and lower) in depth.

This is a useful trick to compute the thickness of the aquifer.

Parameters:
  • arr_k (ndarray or dataframe) – Data that mainly contains the aquifer values. It can also contain the depth values. If the depth is included in arr_k, zname needs to be supplied to recover the depth.

  • zname (str, int) – Name of the depth column. zname is used to retrieve the depth column from a dataframe. If an integer is passed, it is taken as the position of the depth column; the integer must not exceed the dataframe size along axis 1. zname usually needs to be supplied when a dataframe is passed as a function argument.

  • kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a dataframe. If an integer is passed, it is taken as the position of the ‘k’ column; the integer must not exceed the dataframe size along axis 1. kname usually needs to be supplied when a dataframe is passed as a positional or keyword argument.

  • z (array-like 1d, pandas.Series) – Array of depth or a pandas series that contains the depth values. Arrays with two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised, since zname is used to fetch z from the dataframe and overwrite it.

  • return_index (bool, default=False) – Returns the positions (indexes) of the upper and lower sections of the aquifer found in the dataframe arr_k.

  • return_sections (bool, default=True,) – Returns the sections (upper and lower) of the aquifers.

Returns:

up, low

  • (upix, lowix): Tuple of indexes of the upper and lower sections

  • (up, low): Tuple of aquifer sections (upper and lower)

  • (upix, lowix), (up, low): positions and section values of the aquifers if return_index and return_sections are True.

Return type:

list of upper and lower section values of aquifer.

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_aquifer_section
>>> data = load_hlogs ().frame # return all data including the 'depth' values
>>> get_aquifer_section (data , zname ='depth', kname ='k')
... [197.12, 369.71] # section starts from 197.12 -> 369.71 m
>>> get_aquifer_section (data , zname ='depth', kname ='k', return_index=True)
... ([16, 29], [197.12, 369.71]) # upper and lower-> position 16 and 29.
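
A plausible way to picture the detection is to take the first and last depths at which ‘k’ is defined. This is an assumption made for illustration; the sketch below uses made-up values, and `aquifer_section` is a hypothetical helper, not the watex function.

```python
import math


def aquifer_section(depth, k):
    """Return ((upix, lowix), (up, low)) from the first and last valid k."""
    if len(depth) != len(k):
        raise ValueError("depth and k must have the same length")
    valid = [i for i, value in enumerate(k) if not math.isnan(value)]
    if not valid:
        raise ValueError("no valid permeability coefficient found")
    upix, lowix = valid[0], valid[-1]
    return (upix, lowix), (depth[upix], depth[lowix])


# Made-up log: k is only measured between 20 m and 40 m.
depth = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
k = [math.nan, math.nan, 0.054, math.nan, 0.02, math.nan]

indexes, sections = aquifer_section(depth, k)
thickness = sections[1] - sections[0]
print(indexes, sections, thickness)  # (2, 4) (20.0, 40.0) 20.0
```

The thickness mentioned above is then the difference between the lower and the upper section.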
watex.utils.hydroutils.get_aquifer_sections(*data, zname, kname, return_index=False, return_data=False, error='ignore', **kws)[source]#

Get the section of each aquifer from multiple dataframes.

The unique section (‘upper’, ‘lower’) is the valid range of the whole data to consider as valid data. The indexes are needed to shrink the data of the whole boreholes. The data from the section is mostly considered the valid data used as the predictor Xr. Outside the aquifer section range, data can be discarded or compressed to the top of Xr.

Returns valid section indexes if ‘return_index’ is set to True.

Parameters:
  • data (list of pandas dataframe) – Data that mainly contains the aquifer values. The name of the depth column zname and the name of the permeability column kname must be specified.

  • zname (str, int) – Name of the depth column. zname is used to retrieve the depth column from a dataframe. If an integer is passed, it is taken as the position of the depth column; the integer must not exceed the dataframe size along axis 1. zname usually needs to be supplied when a dataframe is passed as a function argument.

  • kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a dataframe. If an integer is passed, it is taken as the position of the ‘k’ column; the integer must not exceed the dataframe size along axis 1. kname usually needs to be supplied when a dataframe is passed as a positional or keyword argument.

  • z (array-like 1d, pandas.Series) – Array of depth or a pandas series that contains the depth values. Arrays with two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised, since zname is used to fetch z from the dataframe and overwrite it.

  • return_index (bool, default=False) – Returns the positions (indexes) of the upper and lower sections of each aquifer found in each dataframe.

  • error (str, default='ignore') – Raise errors if trouble occurs when computing the section of each aquifer. If ‘ignore’, a UserWarning is displayed when invalid data is found. Any other value of error is treated as ‘raise’.

  • return_data (bool, default=False,) – Return valid data. It is useful when ‘error’ is set to ‘ignore’ to collect the valid data.

  • kws (dict,) – Additional keyword arguments passed to get_aquifer_sections().

Returns:

up, low

  • (upix, lowix): Tuple of indexes of the upper and lower sections

  • (up, low): Tuple of aquifer sections (upper and lower)

  • (upix, lowix), (up, low): positions and section values of the aquifers if return_index and return_sections are True.

Return type:

list of upper and lower section values of aquifer.

See also

watex.utils.hydroutils.get_aquifer_sections

compute multiple aquifer sections

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_aquifer_sections
>>> data = load_hlogs ().frame
>>> get_aquifer_sections (data, data , zname ='depth', kname ='k' )
... [[197.12, 369.71], [197.12, 369.71]]
>>> get_aquifer_sections (data, data , zname ='depth', kname ='k' ,
                           return_index =True )
...  [[16, 29], [16, 29]]
watex.utils.hydroutils.get_compressed_vector(d, /, sname, stratum=None, strategy='average', as_frame=False, random_state=None)[source]#

Compress the base stratum data into a single vector composed of all the features in the targeted data d.

Parameters:
  • d (pandas DataFrame) – Valid data containing the strata. If a dataframe is passed, ‘sname’ is needed to fetch the strata values.

  • sname (str, optional) – Name of the column in the dataframe that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of a valid layer/rock in the array/Series of strata.

  • stratum (str, optional) – Name of the base stratum. It must be an item of the strata data. Note that if stratum is passed, the auto-detection of the base stratum is not triggered. The same stratum is returned; however, its rate and occurrence can be given if return_rate or return_counts is set to True.

  • strategy (str, default='average' or 'mean',) – strategy used to select or compute the numerical data into a single series. It can also be ‘naive’; in that case, a single series is randomly picked from the base stratum data.

  • as_frame (bool, default=False) – Returns the compressed vector as a dataframe rather than keeping it as a series.

  • random_state (int, optional,) – State for randomly selecting a compressed vector when naive is passed as strategy.

Returns:

ms – a compressed vector, as a pandas series composed of all features. Note that the vector here does not refer to a mathematical vector composed of numerical values only. A compressed vector here is a series that results from averaging the numerical features of the base stratum and including its corresponding categorical values. Note that ms can therefore contain categorical values and has the same number of features as the original frame d.

Return type:

pandas series/dataframe

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_compressed_vector
>>> data = load_hlogs().frame # get only the frame
>>> get_compressed_vector (data, sname='strata_name')[:4]
... hole_number           H502
    strata_name      siltstone
    aquifer_group           II
    pumping_level       ZFSAII
    dtype: object
>>> get_compressed_vector (data, sname='strata_name', as_frame=True )
...   hole_number strata_name aquifer_group  ...        r     rp remark
    0        H502   siltstone            II  ...  41.7075  59.23    NaN
    [1 rows x 23 columns]
>>> get_compressed_vector (data, sname='strata_name', strategy='naive')
... hole_number          H502
    depth_top          379.15
    depth_bottom        379.7
    strata_name     siltstone
    Name: 39, dtype: object
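
The ‘average’ strategy described in the Returns section can be pictured as follows. This sketch rests on two assumptions that the text does not confirm: the base stratum is auto-detected as the most frequent stratum, and categorical features take their most frequent value. The helper `compress_stratum` and the rows are made up for illustration.

```python
from collections import Counter
from statistics import mean


def compress_stratum(rows, sname, stratum=None):
    """Collapse the rows of one stratum into a single record."""
    if stratum is None:
        # assumption: the base stratum is the most frequent one
        stratum = Counter(row[sname] for row in rows).most_common(1)[0][0]
    base = [row for row in rows if row[sname] == stratum]
    if not base:
        raise ValueError(f"stratum {stratum!r} not found in column {sname!r}")
    compressed = {}
    for column in base[0]:
        values = [row[column] for row in base]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in values)
        # numerical features are averaged; categorical ones keep the mode
        compressed[column] = (mean(values) if numeric
                              else Counter(values).most_common(1)[0][0])
    return compressed


# Made-up rows, for illustration only.
rows = [
    {"strata_name": "siltstone", "depth_top": 10.0, "aquifer_group": "II"},
    {"strata_name": "coal", "depth_top": 20.0, "aquifer_group": "III"},
    {"strata_name": "siltstone", "depth_top": 30.0, "aquifer_group": "II"},
]
vector = compress_stratum(rows, sname="strata_name")
print(vector)
```

The result keeps every feature of the original rows, which is why the compressed vector can hold both numerical and categorical values.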
watex.utils.hydroutils.get_sections_from_depth(z, z_range, return_index=False)[source]#

Gets aquifer sections (‘upper’, ‘lower’) in data ‘z’ from the depth range.

This might be useful to compute the thickness of the aquifer.

Parameters:
  • z (array-like 1d or pd.Series) – Array or pandas series containing the depth values.

  • z_range (tuple (float),) – Section [‘upper’, ‘lower’] of the aquifer at different depths. The depth range must be a pair of values and cannot be greater than the maximum depth of the well.

  • return_index (bool, default=False) – returns the indices of the sections [‘upper’, ‘lower’] of the aquifer and non-valid sections too.

Returns:

  • sections (Tuple (float, float)) – Real values of the upper and lower sections of the aquifer.

  • If return_index is ‘True’, function returns –

    (upix, lowix): Tuple (int, int )

    indices of upper and lower sections in the depth array z

    (invix): list of Tuple (int, int)

    list of indices of invalid sections

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_sections_from_depth
>>> data= load_hlogs().frame
>>> # get real sections from depth 16.25 to 125.83 m
>>> get_sections_from_depth ( data.depth_top, ( 16.25, 125.83))
...  (22.46, 128.23)
>>> # aquifer depth from 16.25 m to the end
>>> get_sections_from_depth ( data.depth_top, ( 16.25,))
... (22.46, 693.37)
>>> get_sections_from_depth ( data.depth_top, ( 16.25, 125.83),
                             return_index =True )
... ((3, 11), [(0, 3), (11, 180)])
>>> get_sections_from_depth ( data.depth_top, ( 16.25,),
                             return_index =True )
... ((3, 181), [(0, 3)])
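
In the first example above, the requested range (16.25, 125.83) is returned as (22.46, 128.23), which suggests that each bound is moved to the next recorded depth. The sketch below implements that reading with a binary search. It assumes depths sorted in ascending order; `sections_from_depth` is a hypothetical helper, and the last two depths of the toy array are made up.

```python
from bisect import bisect_left


def sections_from_depth(z, z_range):
    """Snap a (upper, lower) depth range to the recorded depths in z."""
    upper, lower = z_range
    if upper > lower:
        raise ValueError("upper bound must not exceed lower bound")
    last = len(z) - 1
    # index of the first recorded depth that is >= the requested bound
    upix = min(bisect_left(z, upper), last)
    lowix = min(bisect_left(z, lower), last)
    return (upix, lowix), (z[upix], z[lowix])


# First five depths as printed for depth_top in this document; the rest is made up.
z = [0.0, 2.30, 8.24, 22.46, 44.76, 60.0, 80.0]
result = sections_from_depth(z, (16.25, 50.0))
print(result)  # ((3, 5), (22.46, 60.0))
```

As in the documented output, the upper bound 16.25 lands on index 3, the depth 22.46.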
watex.utils.hydroutils.get_unique_section(*data, zname, kname, return_index=False, return_data=False, error='raise', **kws)[source]#

Get the section to consider unique in multiple aquifers.

The unique section (‘upper’, ‘lower’) is the valid range covering the sections of all the aquifers. It is considered the main valid section, whose data must be neither compressed nor altered. For instance, the indexes are needed to shrink the data outside this valid section. The data from the section is mostly considered the valid data used as the predictor Xr. Outside the aquifer section range, data can be discarded or compressed to the top of Xr.

Returns valid section indexes if ‘return_index’ is set to True.

Parameters:
  • data (list of pandas dataframe) – Data that mainly contains the aquifer values. The name of the depth column zname and the name of the permeability column kname must be specified.

  • zname (str, int) – Name of the depth column. zname is used to retrieve the depth column from a dataframe. If an integer is passed, it is taken as the position of the depth column; the integer must not exceed the dataframe size along axis 1. zname usually needs to be supplied when a dataframe is passed as a function argument.

  • kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a dataframe. If an integer is passed, it is taken as the position of the ‘k’ column; the integer must not exceed the dataframe size along axis 1. kname usually needs to be supplied when a dataframe is passed as a positional or keyword argument.

  • z (array-like 1d, pandas.Series) – Array of depth or a pandas series that contains the depth values. Arrays with two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised, since zname is used to fetch z from the dataframe and overwrite it.

  • return_index (bool, default=False) – Returns the positions (indexes) of the upper section of the shallowest aquifer and the lower section of the deepest aquifer found in the whole set of dataframes.

  • return_data (bool, default=False,) – Return valid data. It is useful when ‘error’ is set to ‘ignore’ to collect the valid data.

  • error (str, default='raise') – Raise errors if trouble occurs when computing the section of each aquifer. If ‘ignore’, a UserWarning is displayed when invalid data is found. Any other value of error is treated as ‘raise’.

  • kws (dict,) – Additional keyword arguments passed to get_aquifer_sections().

Returns:

up, low

  • (upix, lowix): Tuple of indexes of the upper and lower sections

  • (up, low): Tuple of aquifer sections (upper and lower)

  • (upix, lowix), (up, low): positions and section values of the aquifers if return_index and return_sections are True.

Return type:

list of upper and lower section values of aquifer.

See also

watex.utils.hydroutils.get_aquifer_section

compute single section

watex.utils.hydroutils.get_aquifer_sections

compute multiple sections

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_unique_section
>>> data = load_hlogs ().frame
>>> get_unique_section (data.copy() , zname ='depth', kname ='k', )
... array([197.12, 369.71], dtype=float32)
>>> get_unique_section (data.copy() , zname ='depth', kname ='k',
                                return_index =True)
... array([16, 29])
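
Since the unique section is described as spanning the shallowest and the deepest aquifers, one reading is that it takes the smallest upper bound and the largest lower bound of all per-borehole sections. The sketch below shows that reading only; `unique_section` is a hypothetical helper, and the second section is made up.

```python
def unique_section(sections):
    """Merge (upper, lower) sections into one enclosing section."""
    if not sections:
        raise ValueError("at least one section is required")
    uppers = [upper for upper, _ in sections]
    lowers = [lower for _, lower in sections]
    # shallowest upper bound and deepest lower bound
    return min(uppers), max(lowers)


# The first section is the one printed above; the second is hypothetical.
sections = [(197.12, 369.71), (150.0, 300.0)]
merged = unique_section(sections)
print(merged)  # (150.0, 369.71)
```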
watex.utils.hydroutils.get_xs_xr_splits(data, /, z_range=None, zname=None, section_indexes=None)[source]#

Split data into matrix \(X_s\) of samples \(m_s\) (unwanted data) and \(X_r\) of samples \(m_r\) (valid aquifer data).

Parameters:
  • data (pandas dataframe) – Dataframe for compressing.

  • zname (str, int) – the name of the depth column. ‘zname’ needs to be supplied when section_indexes is not provided.

  • z_range (tuple (float),) – Section [‘upper’, ‘lower’] of the aquifer at different depths. The depth range must be a pair of values and cannot be greater than the maximum depth of the well.

  • section_indexes (tuple or list of int) – pair of integers, as a tuple or a list, giving the indexes of the valid section (upper and lower) of the aquifer. If the depth range z_range and zname are supplied, section_indexes can be None. Note that the last index is taken as the last position, the bottom of the section; its value is therefore included in the data.

Returns:

  • xs (list of pandas dataframe) – shrinking part of the data, meant for compressing. Note that it is a list because it holds the dataframes that correspond to the non-valid sections.

  • xr (pandas dataframe) – valid data corresponding to the aquifer part, or including the aquifer data.

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_xs_xr_splits
>>> data = load_hlogs ().frame
>>> xs, xr = get_xs_xr_splits (data, 3.11, section_indexes = (17, 20 ) )
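
The split by section indexes described above can be sketched with plain pandas slicing. This is only an illustration on a toy frame: the helper name `split_by_section` and the data are assumptions, not the watex implementation. It shows the stated convention that the last index (the bottom of the section) is included in the valid part.

```python
import pandas as pd

def split_by_section(data, section_indexes):
    """Hypothetical helper: split rows into unwanted parts (xs) and the aquifer part (xr)."""
    upper, lower = section_indexes
    # iloc slicing excludes the stop position, so add 1 to keep the bottom row.
    xr = data.iloc[upper:lower + 1]
    # Rows above and below the section; empty pieces are dropped.
    xs = [part for part in (data.iloc[:upper], data.iloc[lower + 1:]) if len(part)]
    return xs, xr

# Toy data (illustrative values only).
frame = pd.DataFrame({
    "depth": [0.0, 2.3, 8.2, 22.5, 44.8, 60.1],
    "k": [None, None, 0.1, 0.2, 0.1, None],
})
xs, xr = split_by_section(frame, (2, 4))
print(len(xr), [len(part) for part in xs])
```

Returning `xs` as a list mirrors the documented return value, since the unwanted rows may sit both above and below the aquifer section.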
watex.utils.hydroutils.is_valid_depth(z, /, zname=None, return_z=False)[source]#

Assert whether the depth is valid in the dataframe or two-dimensional array passed to the z argument.

Parameters:
  • z (ndarray, pandas series or dataframe) – If a dataframe is given, ‘zname’ must be supplied to fetch the depth or assert its existence in z.

  • zname (str, int) – The name of the depth column. ‘zname’ needs to be supplied when z is a dataframe, whereas an integer index is needed when z is a two-dimensional ndarray.

  • return_z (bool, default=False) – Returns the z series or array if set to True.

Returns:

z0, is_z – A one-dimensional array-like of z, or ‘True/False’ depending on whether z exists.

Return type:

array or bool

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import is_valid_depth
>>> d= load_hlogs ()
>>> X= d.frame
>>> is_valid_depth(X, zname='depth') # is dataframe , need to pass 'zname'
... True
>>> is_valid_depth (X, zname = 'depth', return_z = True)
... 0        0.00
    1        2.30
    2        8.24
    3       22.46
    4       44.76
            ...
    176    674.02
    177    680.18
    178    681.68
    179    692.97
    180    693.37
    Name: depth_top, Length: 181, dtype: float64

watex.utils.hydroutils.label_importance(label, arr_k, arr_aq, *, method='naive')[source]#

Compute the score for the label and its representativity in the valid array ‘arr_k’.

Parameters:
  • label (int or str) – Class label from the true labels array of the permeability coefficient ‘k’. If a string is given, make sure the array is converted to hold the dtype str. It is recommended to provide data with no NaN to keep full control of the occurrence results.

  • arr_k (array-like 1d) – True labels of the array containing the permeability coefficient ‘k’.

  • arr_aq (array-like 1d) – True labels of the groups of aquifers, or the predicted naive group of aquifer (NGA) labels. See predict_NGA_labels().

  • method (str [‘naive’, ‘strict’], default=’naive’) –

    The kind of strategy used to compute the representativity of a label in the predicted array ‘arr_aq’:

    • ‘naive’ computes the importance of the label from the number of occurrences of this specific label in the array ‘k’. It does not take into account the occurrences of the other existing labels. This is useful for unbalanced class labels in ‘arr_k’.

    • ‘strict’ computes the importance of the label from the number of its occurrences relative to the whole valid ‘arr_k’, i.e. over the total number of occurrences of all the labels that exist in the whole ‘arr_aq’. This can give suitable analysis results if the class labels in ‘arr_k’ are not unbalanced.

Returns:

label_dict_group_rate – Dictionary of the label and its rate of occurrence in arr_aq. Thus each group in arr_aq has its rate of representativity of the label in arr_k.

Return type:

dict

label k = 1 :
{'V': 0.17, 'IV': 0.141, 'II': 0.126, 'III': 0.084, 'IV&V': 0.005, 'II&III': 0.003, 'III&IV': 0.003}
label k = 2 :
{'III': 0.052, 'II': 0.05, 'V': 0.05, 'IV': 0.034, 'III&IV': 0.005}
label k = 3 :
{'V': 0.123, 'IV': 0.086, 'III': 0.068}

>>> # **comments:
    # label k=1 has 17% importance for group V and 12.6% for group II, whereas
    # label k=2 has a weak rate in the whole dataset, about 19% for all groups.
    # The most dominant labels are k=1 and k=3 with 53.14% and 27.74%
    # respectively in the dataset.
    # If the threshold of representativity is set to 50%, none of the true
    # labels k will fit any aquifer group, since the maximum representativity
    # score is 17%, obtained for group V with k=1.
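
One possible reading of the ‘naive’ and ‘strict’ strategies is that they differ only in the denominator used to turn per-group counts into rates. The sketch below follows that reading on toy data. The helper name `label_group_rates`, the handling of missing values as `None`, and the data are all assumptions for illustration; it is not the watex source and may not reproduce its exact numbers.

```python
from collections import Counter

def label_group_rates(label, arr_k, arr_aq, method="naive"):
    """Hypothetical sketch: rate of `label` in each aquifer group."""
    # Keep only the samples with a valid (non-missing) k label.
    pairs = [(k, g) for k, g in zip(arr_k, arr_aq) if k is not None]
    # Count, per aquifer group, the samples carrying the requested label.
    groups = Counter(g for k, g in pairs if k == label)
    if method == "naive":
        total = sum(groups.values())  # occurrences of this label only
    elif method == "strict":
        total = len(pairs)            # occurrences of all valid labels
    else:
        raise ValueError("method must be 'naive' or 'strict'")
    if total == 0:
        return {}
    return {g: round(n / total, 3) for g, n in groups.most_common()}

# Toy labels (illustrative only); None marks a missing k-value.
arr_k = [1, 1, 1, 2, 2, None]
arr_aq = ["V", "V", "IV", "V", "II", "II"]
naive = label_group_rates(1, arr_k, arr_aq, method="naive")
strict = label_group_rates(1, arr_k, arr_aq, method="strict")
print(naive)
print(strict)
```

Under this reading, the ‘naive’ rates of one label sum to 1 across groups, while the ‘strict’ rates of all labels together sum to 1.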
watex.utils.hydroutils.make_MXS_labels(y_true, y_pred, threshold=None, similar_labels=None, sep=None, prefix=None, method='naive', trailer='*', return_obj=False, **kws)[source]#

Create Mixture Learning Strategy (MXS) labels from the true labels ‘y_true’ and the predicted Naive Group of Aquifer (NGA) labels ‘y_pred’.

Parameters:
  • y_true (array-like 1d, pandas.Series) – Array composed of valid k-values and possible missing k-values.

  • y_pred (array-like 1d, pandas.Series) – Array composed of the valid NGA labels. Note that NGA labels are predicted labels, mostly obtained with unsupervised learning.

  • threshold (float, default=None) – The threshold from which a label in ‘y_true’ can be considered similar to one in the NGA labels ‘y_pred’. The default is None, which means that no rule is applied: the label with the highest preponderance (occurrence) in the data compared to the other labels is considered the most representative and similar. Fixing the threshold to set a rule is recommended, especially for a huge dataset.

  • similar_labels (list of tuple, optional) – List of (label, similar group) tuples. If given, the similar group must be a label existing in the predicted NGA. If None, the auto-similarity is triggered.

  • sep (str, default='') –

    Separator between the true labels ‘y_true’ and the predicted NGA labels. sep is used to rewrite the MXS labels. Mostly, an MXS label is a combination of the true label of the permeability coefficient ‘k’ and the NGA label, composing a new similarity label. For instance:

    >>> true_labels=['k1', 'k2', 'k3'] ; NGA_labels =['II', 'I', 'UV']
    >>> # gives
    >>> MXS_labels= ['k1_II', 'k2_I', 'k3_UV']
    

    where the separator sep is set to _. This happens especially when one of the labels (NGA or true_labels) is not a numeric datatype and a similarity is found between ‘k1’ and ‘II’, ‘k2’ and ‘I’, and so on.

  • prefix (str, default='') –

    prefix is used to rename the true_labels, i.e. the true valid k. For instance:

    >>> k_valid = [1, 2, ...] # becomes k_new = ['k1', 'k2', ...]
    

    where ‘k’ is the prefix.

  • method (str ['naive', 'strict'], default='naive') –

    The kind of strategy used to compute the representativity of a label in the predicted array ‘y_pred’:

    • naive computes the importance of the label from the number of occurrences of this specific label in the array ‘y_true’. It does not take into account the occurrences of the other existing labels. This is useful for unbalanced class labels in y_true.

    • strict computes the importance of the label from the number of its occurrences relative to the whole valid y_true, i.e. over the total number of occurrences of all the labels that exist in the whole ‘y_pred’. This can give suitable analysis results if the data is not unbalanced for each label in y_pred.

  • trailer (str, default='*') –

    The Mixture strategy marker used to differentiate the existing class labels in ‘y_true’ from the predicted labels ‘y_pred’, especially when the same class labels are also present in the true labels with the same label-identifier name. This is useful to avoid any confusion between the labels in y_true and y_pred, for better demarcation and distinction. Note that if trailer is set to None and both y_true and y_pred are numeric data, the labels in y_pred are systematically renamed to be distinct from the ones in ‘y_true’. For instance:

    >>> true_labels=[1, 2, 3] ; NGA_labels =[0, 1, 2]
    >>> # with trailer , MXS labels should be
    >>>  MXS_labels= ['0', '1*', '2*', '3'] # 1 and 2 are in true_labels
    >>> # with no trailer
    >>> MXS_labels= [0, 4, 5, 3] # 1 and 2 have been changed to [4, 5]
    

  • return_obj (bool, default=False) –

    If True, returns a MXS object (watex.utils.box.Boxspace) with useful attributes such as:

    • mxs_classes_: the MXS class labels.

    • mxs_labels_: the array-like of MXS labels. It also includes some non-similar labels from NGA.

    • mxs_map_classes_: a dict of the original class labels of the array ‘k’ (‘y_true’) and their temporary integer class labels. Indeed, if the ‘y_true’ class labels are not of a numeric dtype, new labels with an integer dtype are created. The dict is used to wrap the true labels (the original ones) during the MXS creation. Thus, the original labels are not altered and are mapped back at the end to recover their positions in the new MXS array. It is set to ‘None’ if ‘y_true’ has a numeric dtype.

    • mxs_group_classes_: a dict of all the similar group labels with the related MXS labels from the modified existing groups of NGA. Note that the non-similar groups are modified if their labels are also found in the true_labels, to avoid any confusion. Thus the dict wraps the non-similar labels with their new temporary labels.

    • mxs_similar_groups_: list of the similar labels found in y_true that have a similarity in NGA.

    • mxs_similarity_: tuple of similarity pairs (label, group) existing between the class labels in y_true and NGA.

    • mxs_group_labels_: list of the similar groups found in the predicted NGA that have a similarity in the true labels ‘y_true’.

Returns:

MXS – array-like of MXS labels, or MXS object containing the useful attributes.

Return type:

array-like 1d or Boxspace

See also

predict_NGA_labels

Predicts Naive group of Aquifers labels.

Examples

>>> import numpy as np
>>> from watex.datasets import load_hlogs
>>> from watex.utils import read_data
>>> from watex.utils.hydroutils import classify_k, make_MXS_labels
>>> data = load_hlogs ().frame
>>> # map data.k to categorize k values
>>> ymap = classify_k(data.k , default_func =True)
>>> y_mxs = make_MXS_labels (ymap, data.aquifer_group)
>>> y_mxs[14:24]
...  array(['I', 'I', 2, 2, 2, 2, 2, 2, 2, 2], dtype=object)
>>> mxs_obj = make_MXS_labels (ymap, data.aquifer_group, return_obj=True )
>>> mxs_obj.mxs_labels_[14: 24]
... array(['I', 'I', 2, 2, 2, 2, 2, 2, 2, 2], dtype=object)
>>> # now we do the same task using the private data 'hf.csv',
>>> # composed of 11 boreholes. By default, we alternatively use
>>> # the aquifer groups as fake NGA labels
>>> data = read_data ('data/boreholes/hf.csv')
>>> ymap =  classify_k(data.k , default_func =True)
>>> y_mxs= make_MXS_labels (ymap, data.aquifer_group)
>>> np.unique (y_mxs)
... array(['1', '1V', '2', '2III', '3', 'I', 'II', 'III&IV', 'IV'],
      dtype='<U6')
>>> # *comments:
    # label '1V' means that the group V (expected to be a cluster)
    # and the label 1 (true labels) have a similarity.
    # The same holds for label '2III', while the remaining label 3 does
    # not have any similarity with the other labels in 'y_pred', expected
    # to be NGA labels.
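
The naming scheme described for sep and prefix can be sketched in a few lines. The helper `compose_mxs_name` below is hypothetical and only illustrates how a true label and its similar NGA group could be concatenated into one MXS label; it does not reproduce the similarity search or the trailer handling of make_MXS_labels.

```python
def compose_mxs_name(label, group, prefix="", sep=""):
    """Hypothetical helper: build one MXS label from a true label and its similar group."""
    return f"{prefix}{label}{sep}{group}"

# (label, similar group) pairs, as in the sep example of the parameter list.
similarity = [(1, "II"), (2, "I"), (3, "UV")]
names = [compose_mxs_name(lab, grp, prefix="k", sep="_") for lab, grp in similarity]
print(names)

# With the default empty prefix and separator, label 1 and group 'V' give '1V'.
print(compose_mxs_name(1, "V"))
```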
watex.utils.hydroutils.predict_NGA_labels(X, /, n_clusters, random_state=0, keep_label_0=False, return_cluster_centers=False, **kws)[source]#

Predict the Naive Group of Aquifer (NGA) labels.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it’s not in CSR format.

  • n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.

  • random_state (int, RandomState instance or None, default=0) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic.

  • keep_label_0 (bool, default=False) – The prediction already includes the label 0. However, including 0 in the predicted labels would refer to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered as an undefined permeability coefficient. A predicted ‘0’ in the target would therefore mean a missing ‘k’-value rather than a concrete label. To avoid any confusion, ‘0’ is altered to ‘1’: the value +1 is used to shift all class labels forward, thereby excluding the ‘0’ label. To force the inclusion of 0 in the labels, set keep_label_0 to True.

  • return_cluster_centers (bool, default=False) – Export the array of cluster centers if True.

  • kws (dict) – Additional keyword arguments passed to sklearn.cluster.KMeans.

Returns:

  • NGA (array-like of shape (n_samples,)) – Predicted NGA labels.

  • (NGA, cluster_centers) (tuple of array-like) – NGA labels and cluster centers if return_cluster_centers is set to True.
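
The label shift described for keep_label_0 can be illustrated directly with scikit-learn's KMeans. This is a minimal sketch on synthetic data, not the watex implementation: the blob data and variable names are assumptions, and only the +1 shift mirrors the documented behavior.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs standing in for borehole features.
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])

km = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = km.fit_predict(X)  # raw cluster labels start at 0

# Shift every label by +1 so that 0 stays reserved for missing k-values.
nga = labels + 1
print(sorted(set(nga.tolist())), km.cluster_centers_.shape)
```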

watex.utils.hydroutils.reduce_samples(*data, sname, zname=None, kname=None, section_indexes=None, error='raise', strategy='average', verify_integrity=False, ignore_index=False, **kws)[source]#

Create a new dataframe by squeezing/compressing the non-valid data.

The m-samples reduction is necessary for datasets with a lot of missing k-values. The technique of shrinking the number of k0-values (missing k-values) seems a relevant idea. It consists in compressing the missing k-values, from the top (depth equal to 0) down to the upper section of the first aquifer (the one with the lowest depth), into a single vector \(x_r\) of dimension \((1 \times n)\), i.e. a vector that contains the n features.

Parameters:
  • data (list of dataframes) – Data that mainly contains the aquifer values. It must contain the depth values, referred to by the column name passed to zname, and the permeability coefficient k, passed to kname. Both arguments need to be supplied when dataframes are passed as positional arguments.

  • sname (str) – Name of the column in the dataframe that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of the valid layer/rock in the array/Series of strata.

  • zname (str, int) – Name of the depth column. zname allows retrieving the depth column in a dataframe. If an integer is passed, it is taken as the index of the depth column in the dataframe. The integer value must not exceed the dataframe size along axis 1. Commonly, zname needs to be supplied when a dataframe is passed to a function argument.

  • kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ in a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. Commonly, kname needs to be supplied when a dataframe is passed as a positional or keyword argument.

  • z (array-like 1d, pandas.Series) – Array of depth or a pandas series that contains the depth values. Arrays of two or more dimensions are not allowed. However, when z is given as a dataframe and zname is not supplied, an error is raised, since zname is used to fetch z from the dataframe and overwrite it.

  • strategy (str, default='average' or 'mean') – Strategy used to select or compute the numerical data into a single series. It can also be ‘naive’, in which case a single series is randomly picked from the base strata data.

  • section_indexes (tuple or list of int) – A pair of integers given as a tuple or list. They are the valid section indexes (upper and lower) of the aquifer. If the depth range z_range and zname are supplied, section_indexes can be None. Note that the last index is considered as the last position, i.e. the bottom of the section; therefore its value is included in the data.

  • error (str, default='raise') – Raise errors if trouble occurs when computing the section of each aquifer. If ‘ignore’, a UserWarning is displayed when invalid data is found. Any other value of error will set error to raise.

  • verify_integrity (bool, default=False) –

    Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method. If True, remove the duplicate rows from the DataFrame.

    • subset: By default, rows that have the same values in all the columns are considered duplicates. This parameter specifies the only columns that need to be considered for identifying duplicates.

    • keep: Determines which duplicates (if any) to keep. It takes: first, to drop duplicates except for the first occurrence (the default behavior); last, to drop duplicates except for the last occurrence; False, to drop all duplicates.

    • inplace: Boolean flag, False by default, specifying whether to return a new DataFrame or update the existing one.

  • ignore_index (bool, default=False) – Boolean flag indicating whether the row index should be reset after dropping duplicate rows. False keeps the original row index. True resets the index, and the resulting rows are labeled 0, 1, …, n – 1.

Returns:

df_new – new dataframes with reduced samples.

Return type:

list of pandas.DataFrame

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import reduce_samples
>>> data = load_hlogs ().frame # get the frames
>>> # add explicitly the aquifer section indices
>>> dfnew= reduce_samples (data.copy(), sname='strata_name',
                             section_indexes = (16, 29 ),)
>>> dfnew[0]
...    hole_number               strata_name     rock_name  ...      r     rp  remark
    0         H502                  mudstone           J2z  ...    NaN    NaN     NaN
    16        H502                 siltstone           NaN  ...  35.74  59.23     NaN
    17        H502    fine-grained sandstone           NaN  ...  35.74  59.23     NaN
    18        H502                 siltstone           NaN  ...  35.74  59.23     NaN
    19        H502    fine-grained sandstone           NaN  ...  35.74  59.23     NaN
    20        H502                  mudstone           NaN  ...  35.74  59.23     NaN
    21        H502                 siltstone           NaN  ...  35.74  59.23     NaN
    22        H502    fine-grained sandstone           NaN  ...  59.61  59.23     NaN
    23        H502                 siltstone           NaN  ...  59.61  59.23     NaN
    24        H502    fine-grained sandstone           NaN  ...  59.61  59.23     NaN
    25        H502  Coarse-grained sandstone           NaN  ...  59.61  59.23     NaN
    26        H502                  mudstone           NaN  ...  82.33  59.23     NaN
    27        H502    fine-grained sandstone           NaN  ...  82.33  59.23     NaN
    28        H502  Coarse-grained sandstone           J2z  ...  82.33  59.23     NaN
    29        H502                      coal  (J2y)  2coal  ...  82.33  59.23     NaN
    0         H502                 siltstone           NaN  ...    NaN    NaN     NaN

[16 rows x 23 columns]
>>> # specify the column name and kname without section indexes
>>> dfnew = reduce_samples (data.copy(), sname='strata_name', zname='depth',
                            kname='k', ignore_index=True )[0]
>>> dfnew.index # index is reset
... RangeIndex(start=0, stop=16, step=1)
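
The idea of compressing the rows above the aquifer into a single row can be sketched with pandas. This is a hedged illustration of an averaging strategy on a toy frame, not the watex implementation: the helper name `compress_top_rows`, the toy data, and the choice of keeping the most frequent value for non-numeric columns are all assumptions.

```python
import pandas as pd

def compress_top_rows(data, upper_index):
    """Hypothetical sketch: squeeze the rows above `upper_index` into one row."""
    top = data.iloc[:upper_index]
    summary = {}
    for col in top.columns:
        if pd.api.types.is_numeric_dtype(top[col]):
            # Numeric columns are averaged (NaN when every value is missing).
            summary[col] = top[col].mean()
        else:
            # Assumption: keep the most frequent value for non-numeric columns.
            summary[col] = top[col].mode().iloc[0]
    # The compressed row comes first, followed by the untouched aquifer rows.
    rows = [summary] + data.iloc[upper_index:].to_dict("records")
    return pd.DataFrame(rows, columns=data.columns)

# Toy log (illustrative values only): the first three rows have no k-value.
frame = pd.DataFrame({
    "strata_name": ["mudstone", "siltstone", "mudstone", "coal", "siltstone"],
    "depth": [0.0, 2.0, 4.0, 6.0, 8.0],
    "k": [float("nan"), float("nan"), float("nan"), 0.1, 0.2],
})
reduced = compress_top_rows(frame, 3)
print(reduced)
```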

watex.utils.hydroutils.rename_labels_in(arr, new_names, coerce=False)[source]#

Rename the labels with new names.

Parameters:
  • arr – array-like or pandas.Series; array or series containing numerical values. If non-numerical values are given, an error is raised.

  • new_names – list of str; list of strings or values to replace the label integer identifiers.

  • coerce – bool, default=False; force the ‘new_names’ to appear in the target, whether or not it includes some integer identifier class labels. If coerce is True, the target array holds the dtype of new_array; coercing the label names will not yield an error. Consequently, it can introduce unexpected results.

Returns:

array-like; an array-like with full new label names.
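
A minimal sketch of the renaming idea, assuming that the new names are matched to the sorted unique integer labels. That ordering rule and the helper name `rename_by_order` are assumptions for illustration; the coerce behavior is not covered.

```python
import numpy as np

def rename_by_order(arr, new_names):
    """Hypothetical helper: map sorted unique labels to `new_names`, in order."""
    arr = np.asarray(arr)
    old = np.unique(arr).tolist()  # sorted unique labels
    if len(new_names) != len(old):
        raise ValueError("expected one new name per existing label")
    mapping = dict(zip(old, new_names))
    return np.array([mapping[v] for v in arr.tolist()])

renamed = rename_by_order([0, 1, 1, 2, 0], ["low", "medium", "high"])
print(renamed)
```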

watex.utils.hydroutils.select_base_stratum(d, /, sname=None, stratum=None, return_rate=False, return_counts=False)[source]#

Select the base stratum from the strata column in the logging data.

Find the most recurrent stratum in the data and compute the rate of occurrence.

Parameters:
  • d (array-like 1D, pandas.Series or DataFrame) – Valid data containing the strata. If a dataframe is passed, ‘sname’ is needed to fetch the strata values.

  • sname (str, optional) – Name of the column in the dataframe that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of the valid layer/rock in the array/Series of strata.

  • stratum (str, optional) – Name of the base stratum. It must be an item of the strata data. Note that if stratum is passed, the auto-detection of the base stratum is not triggered. The same stratum is returned; however, its rate and occurrences can be given if return_rate or return_counts is set to True.

  • return_rate (bool, default=False) – Returns the rate of occurrence of the base stratum in the data.

  • return_counts (bool, default=False) – Returns each stratum name and its occurrences (count) in the data.

Returns:

  • bs (str) – base stratum, contained in the data

  • r (float) – rate of occurrence of the base stratum in the data

  • c (tuple (str, int)) – Tuple of each stratum with its occurrences in the data.

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import select_base_stratum
>>> data = load_hlogs().frame # get only the frame
>>> select_base_stratum(data, sname ='strata_name')
... 'siltstone'
>>> select_base_stratum(data, sname ='strata_name', return_rate =True)
... 0.287292817679558
>>> select_base_stratum(data, sname ='strata_name', return_counts=True)
... [('siltstone', 52),
     ('fine-grained sandstone', 40),
     ('mudstone', 37),
     ('coal', 24),
     ('Coarse-grained sandstone', 15),
     ('carbonaceous mudstone', 9),
     ('medium-grained sandstone', 2),
     ('topsoil', 1),
     ('gravel layer', 1)]
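
The same quantities (most recurrent stratum, its rate, and the per-stratum counts) can be sketched with the standard library. The helper `base_stratum` and the toy list are illustrative assumptions, not the watex code.

```python
from collections import Counter

def base_stratum(strata):
    """Hypothetical helper: most frequent stratum, its rate, and all counts."""
    counts = Counter(strata)
    name, n = counts.most_common(1)[0]
    return name, n / len(strata), counts.most_common()

# Toy strata column (illustrative only).
strata = ["siltstone", "mudstone", "siltstone", "coal"]
bs, rate, counts = base_stratum(strata)
print(bs, rate, counts)
```

For the load_hlogs example above, the reported rate is consistent with this definition: 52 siltstone rows out of 181 give 52 / 181 ≈ 0.2873.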
watex.utils.hydroutils.transmissibility(s, d, time)[source]#

Transmissibility T represents the ability of the aquifer to conduct water.

It is numerically equal to the product of the hydraulic conductivity and the aquifer thickness (T = KM), which means it is the seepage flow under the conditions of unit hydraulic gradient, unit time, and unit width.
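
As a quick numeric illustration of T = KM, the sketch below multiplies a hydraulic conductivity by an aquifer thickness. The function name, units, and values are illustrative assumptions and are unrelated to the arguments of transmissibility(s, d, time).

```python
def transmissivity_from_k(k, thickness):
    """Hypothetical helper: T = K * M for a conductivity K and a thickness M."""
    return k * thickness

# K = 2.5 m/day and M = 12 m give T in m^2/day.
t = transmissivity_from_k(2.5, 12.0)
print(t)
```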

watex.utils.hydroutils.validate_labels(t, /, labels, return_bool=False)[source]#

Assert the validity of the labels in the target and return either the labels or a boolean telling whether all items of labels are in the target.

Parameters:
  • t – array-like, target that is expected to contain the labels.

  • labels – int, str or list of (str or int) that are supposed to be in the target t.

  • return_bool – bool, default=False; returns ‘True’ or ‘False’ rather than the labels if set to True.

Returns:

bool or labels; ‘True’ or ‘False’ if return_bool is set to True, and the labels otherwise.

Example:

>>> from watex.datasets import fetch_data
>>> from watex.utils.mlutils import cattarget
>>> from watex.utils.hydroutils import validate_labels
>>> _, y = fetch_data ('bagoue', return_X_y=True, as_frame=True)
>>> # binarize target y into [0 , 1]
>>> ybin = cattarget(y, labels=2 )
>>> validate_labels (ybin, [0, 1])
... [0, 1] # all labels exist.
>>> validate_labels (y, [0, 1, 3])
... ValueError: Value '3' is missing in the target.
>>> validate_labels (ybin, 0 )
... [0]
>>> validate_labels (ybin, [0, 5], return_bool=True ) # no raise error
... False
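
The validation logic shown in the example can be sketched with NumPy. The helper `check_labels` is a hypothetical stand-in that mimics the documented behavior (return the labels, raise on a missing one, or return a boolean); it is not the watex source.

```python
import numpy as np

def check_labels(target, labels, return_bool=False):
    """Hypothetical sketch: check that every label occurs in the target."""
    # Accept a single label as well as a list of labels.
    labels = list(labels) if isinstance(labels, (list, tuple)) else [labels]
    present = set(np.unique(target).tolist())
    missing = [lab for lab in labels if lab not in present]
    if return_bool:
        return not missing
    if missing:
        raise ValueError(f"Value {missing[0]!r} is missing in the target.")
    return labels

ybin = np.array([0, 1, 1, 0, 1])
print(check_labels(ybin, [0, 1]))
print(check_labels(ybin, [0, 5], return_bool=True))
```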