<no title> — watex 0.3.3 documentation

hydro computes the hydrogeological parameters of aquifers, which are essential basic data for the design and construction stages of geotechnical engineering and groundwater dewatering.

class watex.methods.hydro.AqGroup(kname=None, aqname=None, method='naive', keep_label_0=False, **kws)[source]#

Bases: HData

An aquifer group is mostly tied to area-wide information gathered after multiple boreholes have been collected.

However, when ‘k’ is predicted from data with missing k-values using the Mixture Learning Strategy (MXS), a Naive Group of Aquifer (NGA) is created to compensate for the missing k-values in the dataset. This limits the bias introduced, since the aquifer group is mostly tied to the permeability coefficient ‘k’. To do this, unsupervised learning predicts the NGA labels, and the NGA labels are then used to fill the missing k-values. The strategy is to measure, at each depth, how strongly each true k-value is associated with its corresponding aquifer group, and to find the most representative group. Once the most representative group is found for each true label ‘k’, the aquifer group can be renamed after its naive similarity with the true k-label. For instance, if the true k-value is the label 1 and label 1 is most representative of the aquifer group ‘IV’, this group can be replaced throughout the column with ‘k1’ + ‘IV’, i.e. ‘k14’. This new label is used to fill the true label ‘y_true’, which becomes the MXS target (including the NGA labels). Note that true labels with a valid k-value remain unchanged. The same process is repeated for labels 2, 3 and so on. The selection of an MXS label from the NGA strongly depends on its preponderance (importance rate) in the whole dataset.

The example below demonstrates how to compute the group representativity in a dataset.

Parameters:

kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe and must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
aqname (str, optional) – Name of the aquifer group column. aqname is used to retrieve the aquifer group values arr_aq from a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument. Note that an aquifer group is not mandatory in the log data; it is needed only if the label similarity is to be calculated.
g (dict) – Dictionary of the co-occurrence between the true labels and the aquifer groups, expressed in terms of occurrence and representativity.

Example

>>> from watex.methods.hydro import AqGroup
>>> hg = AqGroup (kname ='k', aqname='aquifer_group').fit(hdata )
>>> hg.findGroups ()
Out[25]:
 _Group(Label=[' 0 ',
                   Preponderance( rate = ' 100.0  %',
                                [('Groups', {'II': 1.0}),
                                 ('Representativity', ( 'II', 1.0)),
                                 ('Similarity', 'II')])],
             )
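
The group representativity described for this class can be sketched in plain Python. This is a minimal illustration, not the watex implementation: the helper names group_rates and most_representative, the use of None for a missing k label, and the sample data are all assumptions made for this example. It contrasts the two counting strategies, 'naive' (rate within one k label) and 'strict' (rate over all valid samples).

```python
from collections import Counter

def group_rates(k_labels, groups, method="naive"):
    """Return {k_label: {group: rate}}; samples whose k label is None are skipped.

    Hypothetical helper: 'naive' divides by the count of the same k label,
    'strict' divides by the count of all valid samples.
    """
    if method not in ("naive", "strict"):
        raise ValueError("method must be 'naive' or 'strict'")
    pairs = [(k, g) for k, g in zip(k_labels, groups) if k is not None]
    per_label = Counter(k for k, _ in pairs)
    rates = {}
    for (k, g), n in Counter(pairs).items():
        denom = per_label[k] if method == "naive" else len(pairs)
        rates.setdefault(k, {})[g] = n / denom
    return rates

def most_representative(rates):
    """Pick, for each k label, the group with the highest rate."""
    return {k: max(by_group, key=by_group.get) for k, by_group in rates.items()}

# Made-up data: six valid k labels and two missing ones (None).
k_labels = [1, 1, 1, 2, None, 2, 1, None]
groups = ["IV", "IV", "II", "II", "II", "II", "IV", "IV"]

naive = group_rates(k_labels, groups, method="naive")
strict = group_rates(k_labels, groups, method="strict")
best = most_representative(naive)
```

With this data, label 1 is most representative of group 'IV' under either strategy; only the rates differ (0.75 for naive, 0.5 for strict).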

findGroups(method='naive', default_arr=None, **g_kws)[source]#

Find the existing groups between the permeability coefficient k and the aquifer groups.

It computes the co-occurrence between the true labels and the aquifer groups in terms of occurrence and representativity.

Parameters:

keep_label_0 (bool, default=False) – The prediction may already include the label 0. However, a predicted label 0 would refer to ‘k=0’, i.e. a permeability coefficient equal to 0, which does not hold in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is treated as an undefined permeability coefficient, so a predicted ‘0’ in the target means a missing ‘k’-value rather than a concrete label. To avoid confusion, ‘0’ is altered to ‘1’: +1 is added to all class labels, thereby excluding the ‘0’ label. To force 0 to be kept in the labels, set keep_label_0 to True.
method (str ['naive', 'strict'], default='naive') – The strategy used to compute the representativity of a label in the predicted array ‘y_pred’:
- naive computes the importance of the label from the number of occurrences of this specific label in the array ‘y_true’. It does not take the occurrences of other existing labels into account. This is useful for unbalanced class labels in y_true.
- strict computes the importance of the label from the number of occurrences in the whole valid y_true, i.e. relative to the total occurrences of all the labels that exist in the whole ‘arr_aq’. This can give suitable results if the labels in y_pred are not unbalanced.

Returns:

g – Use attribute .groups to find the group values.

Return type:

_Group: _Group class object
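
The keep_label_0 behaviour described in the parameters can be illustrated with a short sketch. The function name shift_labels is hypothetical, and the choice to shift only when a 0 is actually present is an assumption; the source only states that +1 is added to all class labels to exclude the label 0.

```python
def shift_labels(labels, keep_label_0=False):
    """Add 1 to every label when label 0 is present, unless keep_label_0 is True.

    Assumption: labels are integers and nothing is shifted when no 0 occurs.
    """
    if keep_label_0 or 0 not in labels:
        return list(labels)
    return [label + 1 for label in labels]

shifted = shift_labels([0, 1, 2, 0])
kept = shift_labels([0, 1, 2, 0], keep_label_0=True)
```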

class watex.methods.hydro.AqSection(aqname=None, kname=None, zname=None, **kws)[source]#

Bases: HData

Aquifer section class

Get the section of each aquifer from dataframe.

The unique ‘upper’ and ‘lower’ section bounds delimit the range of the whole dataset that is considered valid. Computing the aquifer section is necessary to shrink the data of the whole boreholes. The data within the section is mostly taken as the valid data for the predictor Xr. Data outside the aquifer section can be discarded or compressed to the top of Xr.

Parameters:

aqname (str, optional) – Name of the aquifer group column. aqname is used to retrieve the aquifer group values arr_aq from a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument. Note that an aquifer group is not mandatory in the log data; it is needed only if the label similarity is to be calculated.
kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe and must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
zname (str, int) – Name of the depth column. zname is used to retrieve the depth column from a dataframe. If an integer is passed, it is taken as the index of the depth column in the dataframe and must not exceed the dataframe size along axis 1. zname commonly needs to be supplied when a dataframe is passed as a function argument.

findSection(z=None, depth_unit='m')[source]#

Find the valid aquifer section (upper and lower bounds).

Parameters: z (array-like 1d, pandas.Series) – Array of depths or a pandas Series that contains the depth values. Arrays of two or more dimensions are not allowed. When z is given as a dataframe and zname is not supplied, an error is raised, since zname is used to fetch the depth column from the dataframe and overwrite z.
Returns: self.section_ – Valid upper and lower section bounds, in meters (SI units) if the depth values are given in meters.
Return type: list of float
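
As a rough illustration of what an upper and lower section could look like, the sketch below takes the shallowest and deepest depths at which an aquifer group is recorded. This rule is an assumption made for the example, as are the function name find_section and the use of None for unlabelled samples; the actual criterion used by findSection is not stated here.

```python
def find_section(depths, aquifer_groups):
    """Return [upper, lower]: the shallowest and deepest depths that carry an aquifer group.

    Assumed rule for illustration only; None marks a sample without an aquifer group.
    """
    valid = [z for z, g in zip(depths, aquifer_groups) if g is not None]
    if not valid:
        raise ValueError("no sample is labelled with an aquifer group")
    return [min(valid), max(valid)]

depths = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]            # made-up depths in meters
aquifer_groups = [None, None, "II", "II", "III", None]   # made-up group labels
section = find_section(depths, aquifer_groups)
```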

class watex.methods.hydro.Hydrogeology(**kwd)[source]#

Bases: ABC

A branch of geology concerned with the occurrence, use, and functions of surface water and groundwater.

Hydrogeology is the study of groundwater – it is sometimes referred to as geohydrology or groundwater hydrology. Hydrogeology deals with how water gets into the ground (recharge), how it flows in the subsurface (through aquifers) and how groundwater interacts with the surrounding soil and rock (the geology).

Hydrogeologists apply this knowledge to many practical uses. They might:

- Design and construct water wells for drinking water supply, irrigation schemes and other purposes;
- Try to discover how much water is available to sustain water supplies so that these do not adversely affect the environment, for example by depleting natural baseflows to rivers and important wetland ecosystems;
- Investigate the quality of the water to ensure that it is fit for its intended use;
- Design schemes to try to clean up pollution where the groundwater is polluted;
- Design construction dewatering schemes and deal with groundwater problems associated with mining;
- Help to harness geothermal energy through groundwater-based heat pumps.

class watex.methods.hydro.Logging(zname=None, kname=None, verbose=0)[source]#

Bases: object

Logging class

Only numerical values are handled. Categorical values found in the logging dataset are discarded.

Parameters:

zname (str, default='depth' or 'None') – The name of the depth column in data. If the depth column is not named ‘depth’, another column name that holds the depth can be given, so that this column is set aside as the depth column for plotting purposes. If set to None, zname defaults to ‘depth’ and assumes that a ‘depth’ column exists in data.
kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe and must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.

Examples

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import Logging
>>> # get the logging data
>>> h = load_hlogs ()
>>> h.feature_names
Out[29]:
['hole_id',
 'depth_top',
 'depth_bottom',
 'strata_name',
 'rock_name',
 'layer_thickness',
 'resistivity',
 'gamma_gamma',
 'natural_gamma',
 'sp',
 'short_distance_gamma',
 'well_diameter']
>>> # we can fit to collect the valid logging data
>>> log= Logging(kname ='k', zname='depth_top' ).fit(h.frame[h.feature_names])
>>> log.feature_names_in_ # categorical features should be discarded.
Out[33]:
['depth_top',
 'depth_bottom',
 'layer_thickness',
 'resistivity',
 'gamma_gamma',
 'natural_gamma',
 'sp',
 'short_distance_gamma',
 'well_diameter']
>>> log.plot ()
Out[34]: Logging(zname= depth_top, kname= k, verbose= 0)
>>> # plot log including the target y
>>> log.plot (y = h.frame.k , posiy =0 )# first position
Logging(zname= depth_top, kname= k, verbose= 0)

fit(data, **fit_params)[source]#

Fit logging data and populate attributes

Parameters:

data (DataFrame of shape (n_samples, n_features)) – Logging data, where n_samples is the number of samples, expected to be collected at different depths, and n_features is the number of columns (features) to be plotted. Note that data must include the depth column. If it is not given, a relative depth should be created according to the number of samples in data.
fit_params (dict) – Additional keyword arguments passed to to_numeric_dtypes().

Returns:

self

Return type:

Instantiated object, for method chaining.

property inspect#: Inspect whether the object is fitted or not.

plot(normalize=False, impute_nan=True, log10=False, posiy=None, fill_value=None, **plot_kws)[source]#

Plot the logging data

Parameters:

normalize (bool, default=False) – Normalize all the data, except the depth, to the range (0, 1).
impute_nan (bool, default=True) – Replace the NaN values in the dataframe. By default, NaN values are replaced with the mean. If fill_value is provided, it is used instead to replace NaN in the data.
log10 (bool, default=False) – Convert values to log10. This can be useful when working with logarithmic data. However, not all data supports this operation, for instance negative values. In that case, the column_to_skip argument can be provided to skip those columns when converting values to log10.
fill_value (str or numerical value, optional) – When strategy == “constant”, fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value is 0 when imputing numerical data and “missing_value” for strings or object data types. If not given and impute_nan is True, the mean strategy is used instead.
posiy (int, optional) – The position at which to place the target plot y. By default, the target plot, if given, is placed last, after the logging plots.
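
The preprocessing options above (mean imputation, a custom fill_value, and scaling to the 0 to 1 range) can be pictured with the following plain-Python sketch. The helper names fill_missing and min_max and the sample values are hypothetical; this is not how watex implements plot, only an illustration of the described behaviour on a single column.

```python
import math

def fill_missing(values, fill_value=None):
    """Replace NaN with fill_value, or with the mean of the valid values when fill_value is None."""
    valid = [v for v in values if not math.isnan(v)]
    if fill_value is None:
        if not valid:
            raise ValueError("cannot compute a mean from an all-NaN column")
        fill_value = sum(valid) / len(valid)
    return [fill_value if math.isnan(v) else v for v in values]

def min_max(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

resistivity = [10.0, float("nan"), 30.0, 50.0]    # made-up column with one missing value
filled = fill_missing(resistivity)                 # NaN replaced by the mean of 10, 30, 50
scaled = min_max(filled)
constant_fill = fill_missing(resistivity, fill_value=0.0)
```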

class watex.methods.hydro.MXS(kname=None, aqname=None, threshold=None, method='naive', trailer='*', keep_label_0=False, random_state=42, n_groups=3, sep=None, prefix=None, **kws)[source]#

Bases: HData

Mixture Learning Strategy (MXS)

Using machine learning to predict the k-parameter is an alternative way to reduce the cost of data collection. However, borehole data comes with many missing k-values, since the parameter is strongly tied to the aquifer after the pumping test. In other words, the k-parameter can only be collected if the layer in the well is an aquifer. Predicting k from a few valid samples within a large set of missing data remains an issue for classical supervised learning methods. We therefore propose an alternative approach, called the mixture learning strategy (MXS), to address both issues. It entails first predicting a naive group of aquifers (NGA), which is then combined with the real k-values to counterbalance the missing values and yield an optimal prediction score. The method first relies on the K-Means and Hierarchical Agglomerative Clustering (HAC) algorithms, which are used to predict the NGA labels needed for merging the MXS labels.

Parameters:

kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe and must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
aqname (str, optional) – Name of the aquifer group column. aqname is used to retrieve the aquifer group values arr_aq from a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument. Note that an aquifer group is not mandatory in the log data; it is needed only if the label similarity is to be calculated.
threshold (float, default=None) – The threshold above which a label in the ‘k’ array is considered similar to one of the NGA labels in ‘y_pred’. The default is None, meaning that no rule is applied and the label with the highest preponderance or occurrence in the data compared to the other labels is considered the most representative and similar. Setting a rule by fixing the threshold is recommended, especially for a large dataset.
n_groups (int, default=3) – The number of aquifer groups to form, as well as the number of centroids to generate. If the number of aquifer groups in the area is known, it should be used instead. However, it is recommended to validate this number using the ‘elbow plot’, the ‘silhouette plot’ or the Hierarchical Agglomerative Clustering dendrogram. Refer to plot_elbow(), plotSilhouette() or watex.view.plotDendrogram() for plotting.
keep_label_0 (bool, default=False) – The prediction may already include the label 0. However, a predicted label 0 would refer to ‘k=0’, i.e. a permeability coefficient equal to 0, which does not hold in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is treated as an undefined permeability coefficient, so a predicted ‘0’ in the target means a missing ‘k’-value rather than a concrete label. To avoid confusion, ‘0’ is altered to ‘1’: +1 is added to all class labels, thereby excluding the ‘0’ label. To force 0 to be kept in the labels, set keep_label_0 to True.

sep (str, default='') – Separator between the true labels ‘y_true’ and the predicted NGA labels. sep is used to rewrite the MXS labels. The MXS labels are mostly a combination of the true label of the permeability coefficient ‘k’ and the NGA label, composing new similarity labels. For instance:
>>> true_labels = ['k1', 'k2', 'k3'] ; NGA_labels = ['II', 'I', 'UV']
>>> # gives
>>> MXS_labels = ['k1_II', 'k2_I', 'k3_UV']
where the separator sep is set to _. This happens especially when one of the labels (NGA or true_labels) is not of a numeric datatype and a similarity is found between ‘k1’ and ‘II’, ‘k2’ and ‘I’ and so on.
prefix (str, default='') – Prefix used to rename the true_labels, i.e. the true valid k. For instance:
>>> k_valid =[1, 2, ..] -> k_new = [k1, k2, ...]
where ‘k’ is the prefix.
method (str ['naive', 'strict'], default='naive') – The strategy used to compute the representativity of a label in the predicted array ‘y_pred’:
- naive computes the importance of the label from the number of occurrences of this specific label in the array ‘y_true’. It does not take the occurrences of other existing labels into account. This is useful for unbalanced class labels in y_true.
- strict computes the importance of the label from the number of occurrences in the whole valid y_true, i.e. relative to the total occurrences of all the labels that exist in the whole ‘arr_aq’. This can give suitable results if the labels in y_pred are not unbalanced.
trailer (str, default='*') – The mixture strategy marker used to differentiate the existing class labels in ‘y_true’ from the predicted labels ‘y_pred’, especially when the same class labels also appear in the true labels under the same label identifier. This is useful to keep the labels in y_true and y_pred distinct. Note that if trailer is set to None and both y_true and y_pred are numeric, the labels in y_pred are systematically renamed to be distinct from the ones in ‘y_true’. For instance:
>>> true_labels = [1, 2, 3] ; NGA_labels = [0, 1, 2]
>>> # with trailer, MXS labels should be
>>> MXS_labels = ['0', '1*', '2*', '3'] # 1 and 2 are in true_labels
>>> # with no trailer
>>> MXS_labels = [0, 4, 5, 3] # 1 and 2 have been changed to [4, 5]
verbose (int, default=0) – Controls the level of verbosity. Higher values lead to more messages.
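
The label-naming options sep, prefix and trailer can be illustrated with the sketch below, which reproduces the small examples given in the parameter descriptions. The function names similarity_label and rename_nga are hypothetical, and the rule used when trailer is None (colliding NGA labels take the next unused integers) is an assumption chosen to match the documented output, not a statement about the watex internals.

```python
def similarity_label(k_label, group, prefix="", sep=""):
    """Join a true k label and an NGA group into one label, e.g. 'k1_II'."""
    return f"{prefix}{k_label}{sep}{group}"

def rename_nga(true_labels, nga_labels, trailer="*"):
    """Make NGA labels distinct from the true labels.

    With a trailer, colliding NGA labels get the trailer appended (as strings).
    Without one, colliding numeric labels are moved to the next unused integers
    (assumed rule for this sketch).
    """
    taken = set(true_labels)
    if trailer is not None:
        return [f"{g}{trailer}" if g in taken else str(g) for g in nga_labels]
    next_free = max(taken | set(nga_labels)) + 1
    mapping = {}
    for g in sorted(set(nga_labels)):
        if g in taken:
            mapping[g] = next_free
            next_free += 1
        else:
            mapping[g] = g
    return [mapping[g] for g in nga_labels]

joined = similarity_label(1, "II", prefix="k", sep="_")
with_trailer = rename_nga([1, 2, 3], [0, 1, 2])
without_trailer = rename_nga([1, 2, 3], [0, 1, 2], trailer=None)
```

Here the true label 3 is left out of the results because only the NGA labels are renamed; the true labels stay unchanged.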

Examples

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import MXS
>>> hdata= load_hlogs (as_frame =True)
>>> # drop the 'remark' columns since there is no valid data
>>> hdata.drop (columns ='remark', inplace =True)
>>> mxs = MXS (kname ='k').fit(hdata)
>>> # predict the default NGA
>>> mxs.predictNGA() # default prediction with n_groups =3
>>> # make MXS labels using the default 'k' categorization
>>> ymxs=mxs.makeyMXS(categorize_k=True, default_func=True)
>>> mxs.yNGA_ [62:74]
Out[43]: array([1, 2, 2, 2, 3, 1, 2, 1, 2, 2, 1, 2])
>>> ymxs[62:74]
Out[44]: array([ 1, 22, 22, 22,  3,  1, 22,  1, 22, 22,  1, 22])
>>> # to get the label similarity, provide the column name
>>> # of the aquifer group and fit again:
>>> mxs = MXS (kname ='k', aqname ='aquifer_group').fit(hdata)
>>> sim = mxs.labelSimilarity()
>>> sim
Out[47]: [(0, 'II')] # group II and label 0 are very similar

aqname = 'aquifer_group'#

kname = 'k'#

labelSimilarity(func=None, categorize_k=False, default_func=False, **sm_kws)[source]#

Find label similarities

Parameters:

func (callable) – Function used to map the permeability coefficient column in the dataframe or series. If not given, the default function can be enabled with the default_func parameter.
string (bool) – If set to True, the categories mapped from ‘k’ are prefixed with “k”. However, if a string value is given, the prefix is changed to that string.
default_func (bool) – If set to True, the default function is used for mapping ‘k’. Note that it may not fit your own data, so it is recommended to provide your own function for mapping ‘k’. The default ‘k’ mapping is as follows:
- k0 {0}: k = 0
- k1 {1}: 0 < k <= .01
- k2 {2}: .01 < k <= .07
- k3 {3}: k > .07
sm_kws (dict) – Additional keyword arguments passed to find_similar_labels().
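
The default ‘k’ mapping listed above translates directly into a small function. The sketch below is only an illustration of those four intervals: the name categorize_k is hypothetical, and rejecting negative values is an assumption, since the listed intervals do not cover them. A function of this shape is also the kind of callable that could be passed as func.

```python
def categorize_k(k):
    """Map a permeability coefficient to class 0, 1, 2 or 3 using the default bounds."""
    if k < 0:
        # Assumption: negative values are outside the documented intervals.
        raise ValueError("k is expected to be non-negative")
    if k == 0:
        return 0          # k0: k = 0
    if k <= 0.01:
        return 1          # k1: 0 < k <= .01
    if k <= 0.07:
        return 2          # k2: .01 < k <= .07
    return 3              # k3: k > .07

classes = [categorize_k(k) for k in (0, 0.005, 0.01, 0.05, 0.07, 0.2)]
```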

makeyMXS(y_pred=None, func=None, categorize_k=False, default_func=False, **mxs_kws)[source]#

Construct the MXS target \(y*\)

Parameters:

y_pred (array-like 1d, pandas.Series) – Array of the valid NGA labels. Note that NGA labels are predicted labels, mostly obtained by unsupervised learning. See also predict_NGA_labels() for further details.
func (callable) – Function used to map the permeability coefficient column in the dataframe or series. If not given, the default function can be enabled with the default_func parameter.
string (bool) – If set to True, the categories mapped from ‘k’ are prefixed with “k”. However, if a string value is given, the prefix is changed to that string.
default_func (bool) – If set to True, the default function is used for mapping ‘k’. Note that it may not fit your own data, so it is recommended to provide your own function for mapping ‘k’. The default ‘k’ mapping is as follows:
- k0 {0}: k = 0
- k1 {1}: 0 < k <= .01
- k2 {2}: .01 < k <= .07
- k3 {3}: k > .07
mxs_kws (dict) – Additional keyword arguments passed to make_MXS_labels().

Returns:

MXS.mxs_labels_ – array like of MXS labels

Return type:

array-like 1d

Example

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import MXS
>>> hdata = load_hlogs ().frame
>>> # drop the 'remark' columns since there is no valid data
>>> hdata.drop (columns ='remark', inplace=True)
>>> mxs =MXS (kname ='k').fit(hdata) # specify the 'k'columns
>>> # we can predict the NGA labels and yMXS with single line
>>> # of code snippet using the default 'k' classification.
>>> ymxs = mxs.predictNGA().makeyMXS(categorize_k=True, default_func=True)
>>> mxs.yNGA_[:7]
... array([2, 2, 2, 2, 2, 2, 2])
>>> ymxs[:7]
Out[40]: array([22, 22, 22, 22, 22, 22, 22])
>>> mxs.mxs_group_classes_
Out[56]: {1: 1, 2: 22, 3: 3} # transform classes
>>> mxs.mxs_group_labels_
Out[57]: (2,)
>>> # comment: only the label '2' is transformed to '22' since
>>> # it is the only one that has similarity with the true label 2
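
One way to read the example output above is that samples with a valid true label keep it, while samples with a missing ‘k’ take their NGA label translated through a class mapping such as mxs_group_classes_. The sketch below illustrates that reading only; the function name make_mxs_target, the use of None for a missing ‘k’, and the sample arrays are assumptions, and the actual merging rules in makeyMXS may differ.

```python
def make_mxs_target(y_true, y_nga, group_classes):
    """Keep valid true labels; elsewhere use the NGA label translated through group_classes.

    Assumed reading for illustration: None marks a missing k label.
    """
    return [group_classes[g] if k is None else k for k, g in zip(y_true, y_nga)]

y_true = [None, None, 2, None, 1]       # made-up true labels with missing entries
y_nga = [1, 2, 2, 3, 2]                 # made-up NGA labels
group_classes = {1: 1, 2: 22, 3: 3}     # same shape as the mapping shown in the example
y_mxs = make_mxs_target(y_true, y_nga, group_classes)
```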

predictNGA(n_components=2, return_label=False, **NGA_kws)[source]#

Predicts Naive Group of Aquifer from Hydro-Log data.

Parameters:

n_components (int, default=2) – Number of dimensions to preserve. If n_components is a float between 0 and 1, it indicates the variance ratio to preserve. If None, 95% of the variance is preserved.
return_label (bool, default=False) – If True, return the predicted NGA labels; otherwise return the MXS instance. If False, the NGA labels can be fetched from the attribute watex.hydro.MXS.yNGA_.
NGA_kws (dict) – Keyword arguments passed to watex.utils.predict_NGA_labels().

Returns:

yNGA_ or self – The predicted NGA labels, or the MXS instance.

Return type:

array-like 1d of naive group of aquifer labels, or the MXS instance

Example

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import MXS
>>> hdata = load_hlogs ().frame
>>> # drop the 'remark' columns since there is no valid data
>>> hdata.drop (columns ='remark', inplace=True)
>>> mxs =MXS (kname ='k').fit(hdata) # specify the 'k' column
>>> y_pred = mxs.predictNGA(return_label=True )
>>> y_pred [-12:]
Out[52]: array([1, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3])

sname = None#

verbose = 0#

zname = None#