watex package

Machine learning research in water exploration

watex stands for WAT-er EX-ploration. Its packages and modules are written to solve engineering problems in the field of groundwater exploration (GWE). It currently deals with:

geophysics (from DC-electrical to electromagnetic methods);
hydrogeology (from drilling to parameter calculation);
hydrogeophysics (predicting the permeability coefficient (k) and flow rate);
EM (processing noisy NSAMT data and recovering missing tensors);
geology (for stratigraphic model generation);
more…

WATex helps minimize the risk of unsuccessful drillings and unsustainable boreholes, and could greatly reduce the cost of collecting hydrogeological parameters.

watex.bi_selector(d, /, features=None, return_frames=False)

Automatically separate numerical from categorical attributes.

This is useful to separate the categorical features from the numerical features (and vice versa) when a dataframe has many features. Entering features individually becomes tedious and error-prone.

Parameters:

d (pandas.DataFrame) – Input pandas dataframe.
features (list of str) – List of features in the dataframe columns. An error is raised if any feature does not exist in the frame. If features is None, the numerical and categorical features are returned instead.
return_frames (bool, default=False) – If False, return the given features and the remaining (difference) columns as lists. If True, return two frames composed of the given features and the remaining features.

Returns:

- Tuple (list, list) – list of features and remaining features.
- Tuple (pd.DataFrame, pd.DataFrame) – frames of the features and of the remaining features.

Example

>>> from watex.utils.mlutils import bi_selector
>>> from watex.datasets import load_hlogs
>>> data = load_hlogs().frame # get the frame
>>> data.columns
Index(['hole_id', 'depth_top', 'depth_bottom', 'strata_name', 'rock_name',
       'layer_thickness', 'resistivity', 'gamma_gamma', 'natural_gamma', 'sp',
       'short_distance_gamma', 'well_diameter', 'aquifer_group',
       'pumping_level', 'aquifer_thickness', 'hole_depth_before_pumping',
       'hole_depth_after_pumping', 'hole_depth_loss', 'depth_starting_pumping',
       'pumping_depth_at_the_end', 'pumping_depth', 'section_aperture', 'k',
       'kp', 'r', 'rp', 'remark'],
      dtype='object')
>>> num_features, cat_features = bi_selector (data)
>>> num_features
...['gamma_gamma',
     'depth_top',
     'aquifer_thickness',
     'pumping_depth_at_the_end',
     'section_aperture',
     'remark',
     'depth_starting_pumping',
     'hole_depth_before_pumping',
     'rp',
     'hole_depth_after_pumping',
     'hole_depth_loss',
     'depth_bottom',
     'sp',
     'pumping_depth',
     'kp',
     'resistivity',
     'short_distance_gamma',
     'r',
     'natural_gamma',
     'layer_thickness',
     'k',
     'well_diameter']
>>> cat_features
... ['hole_id', 'strata_name', 'rock_name', 'aquifer_group',
     'pumping_level']
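The split above can be approximated with pandas dtype selection. The following is a minimal sketch, not watex's implementation: `split_features` is a hypothetical helper that assumes numeric dtypes mark numerical features and every other column is categorical, so its column order can differ from the output of bi_selector.

```python
import pandas as pd


def split_features(df: pd.DataFrame, features=None, return_frames=False):
    """Hypothetical helper sketching the bi_selector idea (not watex's code)."""
    if features is None:
        # No features given: split the columns by dtype.
        num = df.select_dtypes(include="number").columns.tolist()
        cat = [c for c in df.columns if c not in num]
        return num, cat
    if isinstance(features, str):
        features = [features]
    missing = [f for f in features if f not in df.columns]
    if missing:
        raise KeyError(f"Features not found in the frame: {missing}")
    # Remaining (difference) columns, keeping the original column order.
    remaining = [c for c in df.columns if c not in features]
    if return_frames:
        return df[features], df[remaining]
    return list(features), remaining


frame = pd.DataFrame({
    "hole_id": ["H1", "H2"],
    "depth_top": [0.0, 5.0],
    "rock_name": ["granite", "schist"],
    "k": [0.1, 0.3],
})
num, cat = split_features(frame)
print(num)  # ['depth_top', 'k']
print(cat)  # ['hole_id', 'rock_name']
```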

watex.cleaner(data, /, columns=None, inplace=False, labels=None, func=None, mode='clean', **kws)

Sanitize data or columns by dropping specified labels from rows or columns.

If data is not a pandas dataframe, it is converted to one, and the index is used to drop the labels.

Parameters:

data (pd.DataFrame or array-like 2D) – Pandas dataframe or NumPy two-dimensional array. If a 2D array is passed, it is first converted to a dataframe by default, and rows are dropped using the index labels.
columns (single label or list-like) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
labels (single label or list-like) – Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.
func (F, callable) – Universal function used to clean the columns. It is applied only when mode='clean'.
inplace (bool, default False) – If False, return a copy. Otherwise, do operation inplace and return None.
mode (str, default='clean') – Mode of operation applied to the data: 'clean' or 'drop'. If 'drop', it behaves like the pandas DataFrame.drop method.

Returns:

The cleaned DataFrame without the removed index or column labels, None if inplace=True, or an array if data is passed as an array.

Return type:

DataFrame, array2D or None
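The 'drop' mode can be pictured with pandas DataFrame.drop. A minimal sketch, assuming a 2D array is wrapped in a dataframe with a default integer index; `drop_labels` is a hypothetical helper that covers only mode='drop' (the func-based 'clean' mode is not shown):

```python
import numpy as np
import pandas as pd


def drop_labels(data, columns=None, labels=None, inplace=False):
    """Hypothetical sketch of cleaner(..., mode='drop'); not watex's code."""
    was_array = not isinstance(data, pd.DataFrame)
    if was_array:
        # Arrays have no column names: wrap them and drop rows by index label.
        data = pd.DataFrame(np.asarray(data))
        inplace = False  # the temporary frame is returned as an array instead
    # As in pandas, pass either labels (rows) or columns, not both.
    out = data.drop(labels=labels, columns=columns, inplace=inplace)
    if inplace:
        return None
    return out.to_numpy() if was_array else out


frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(list(drop_labels(frame, columns="b").columns))  # ['a']
print(drop_labels(np.arange(6).reshape(3, 2), labels=[0]).tolist())  # [[2, 3], [4, 5]]
```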

watex.erpSelector(f, columns=Ellipsis, force=False, utm_zone=None, epsg=None, verbose=0.0, **kws)

Read and sanitize the data collected from the survey.

data should be an array, a dataframe, a series, or a file in .csv or .xlsx format. Be sure to provide a header for each column in the worksheet. If a file is given, the header columns should be arranged as ['station','resistivity' ,'longitude', 'latitude']. Note that the coordinate columns (longitude and latitude) are not compulsory.

Parameters:

f (Path-like object, ndarray, Series or Dataframe) – If a path-like object is given, only the .csv and .xlsx file formats can be parsed. If an ndarray is given and its shape along axis 1 is greater than 4, the ndarray is shrunk to 4 columns.
columns (list) – List of the relevant columns. It can be used to select specific values along axis 1 of the array. It should contain the prefix or the whole name of each item in ['station','resistivity' ,'longitude', 'latitude'].
force (bool, default=False) – If vertical electrical sounding (VES) data are passed while ERP data are expected, setting force to True treats the VES data as ERP data and uses only the resistivity values. This may yield invalid results, especially when parameters need to be computed.
verbose (int) – Controls verbosity; outputs more messages if True.
utm_zone (string, optional) – Zone number and 'S' or 'N', e.g. '55S'. Defaults to the centre point of the provided points. If given, the longitude/latitude are computed from valid easting/northing coordinates.

New in version 0.2.1.
epsg (int) – EPSG number defining the projection (see http://spatialreference.org/ref/ for more info). Overrides utm_zone if both are provided.
kws (dict) – Additional keyword arguments passed to the pandas pd.read_csv and pd.read_excel methods when reading f. Be sure to provide the right arguments: for instance, passing sep=',' when the file to read is in .xlsx format will raise an error, since the sep parameter is only accepted for parsing .csv files.

Return type:

DataFrame with the relevant column(s).

Notes

The number of acceptable columns is 4. If there are more than 4 columns, the data are shrunk to match the expected columns. Furthermore, if the header is not specified in f, the default column arrangement is used; therefore, the second column is considered the resistivity column.

Examples

>>> import numpy as np
>>> from watex.utils.coreutils import erpSelector
>>> df = erpSelector ('data/erp/testsafedata.csv')
>>> df.shape
... (45, 4)
>>> list(df.columns)
... ['station','resistivity', 'longitude', 'latitude']
>>> df = erpSelector('data/erp/testunsafedata.xlsx')
>>> list(df.columns)
... ['easting', 'station', 'resistivity', 'northing']
>>> df = erpSelector(np.random.randn(7, 7))
>>> df.shape
... (7, 4)
>>> list(df.columns)
... ['station', 'resistivity', 'longitude', 'latitude']

watex.erpSmartDetector(constr, erp, station=None, coerce=False, return_cz=False, view=False, raise_warn=True, **plot_kws)

Automatically detect the drilling location by involving the constraints observed in the survey area.

Take into account the constraints on the survey area and detect the suitable drilling location. Commonly, the station is not needed when using constraints, since providing a station indicates that the user already knows why this station was selected. However, in case of doubt, the user can set the parameter coerce to True.

Parameters:

constr (list, dict) – List of restricted stations. Constraint (restricted) stations are stations to ignore when selecting the best drilling location. This is useful since, in DWSC, not all stations are presumed to be technically suitable for drilling. For instance, stations close to a household waste site must be listed and ignored.

If constr is passed as a dictionary, its keys are the restricted stations and its values are the reasons why each station is restricted. For instance:
```
constr = {"s02": "station close to the household waste",
          "S25": "station is located in a marsh area."
          }
```
erp (array-like 1d) – DC profiling ERP resistivity values
station (str, optional) – The station of the presumed location for drilling operations. Commonly, the station is not needed when using constraints. If station is given while coerce=False, an error is raised to warn the user. To force the station to be considered in the auto-detection, set coerce to True.
coerce (bool, default=False) – Allow the station to be considered in the auto-detection.
raise_warn (bool, default=True) – Warn the user whether or not a suitable location is found; None is returned if no location is found.
view (bool, default=False) – Plot the conductive zone and restricted stations.
plot_kws (dict) – Additional plotting keyword arguments passed to plotAnomaly().

Returns:

(station | None) or cz, cs – Station for the drilling operations, detected automatically. If no station is detected, None is returned. If return_cz is True, the conductive zone and the restricted station position numbers are returned along with the station.

Return type:

str or None (a tuple when return_cz=True)

See also

watex.plotAnomaly: Plot DC profiling ERP and conductive zone.

Examples

>>> import numpy as np
>>> from watex.datasets import make_erp
>>> from watex.utils.coreutils import erpSmartDetector
>>> resistivity = make_erp (n_stations =50 , as_frame=True, seed=125).resistivity
>>> # get the min value of the resistivity
>>> resmin_index = int(np.argmin(resistivity))
>>> resmin_index
42
>>> erpSmartDetector(constr=['s42'], erp=resistivity)
'S13'
>>> # S42 is rejected and another zone, presumed to be better, is selected.
>>> constraints ={"S00": "Marsh area. ",
                  "S10": " Municipality square, no authorization to make drill",
                  "S29": "Heritage site",
                  "S46": "Household waste site",
                  "S42": "Household waste site"
                  }
>>> erpSmartDetector (constraints, resistivity)
'S16'
>>> erpSmartDetector (['s12', 's40'], resistivity)
'S29'
>>> # S42, close to S40, is rejected too.
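The role of the constraints can be illustrated with a deliberately simplified sketch: restricted stations are masked and the lowest-resistivity remaining station is proposed. Assumptions: stations are named S00, S01, and so on by position, and `pick_station` is hypothetical. erpSmartDetector itself evaluates conductive zones rather than a single minimum, so its answers (such as 'S13' above) will generally differ.

```python
import numpy as np


def pick_station(constr, erp):
    """Hypothetical sketch: lowest resistivity among unrestricted stations."""
    erp = np.asarray(erp, dtype=float)
    # A list or a {station: reason} dict both iterate over station names.
    restricted = {str(s).upper() for s in constr}
    names = [f"S{i:02d}" for i in range(len(erp))]  # assumed naming scheme
    allowed = np.array([name not in restricted for name in names])
    if not allowed.any():
        return None  # every station is restricted
    candidates = np.where(allowed, erp, np.inf)
    return names[int(np.argmin(candidates))]


rho = [120.0, 35.0, 80.0, 20.0, 300.0]
print(pick_station(["s03"], rho))           # S01
print(pick_station({"S01": "marsh"}, rho))  # S03
```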

watex.fetch_data(tag, **kws)

Fetch dataset from tag.

A tag corresponds to the name of the area where the data were collected, optionally followed by the level of data processing.

Parameters:

tag (str, ['bagoue', 'tankesse', 'semien', 'iris', 'boundiali', 'gbalo']) –

Name of the area of data to fetch. For instance, setting the tag to bagoue loads the bagoue datasets. If the tag name is followed by a suffix, the latter specifies the stage of data processing. As an example, bagoue original or bagoue prepared retrieves the original data and the data transformed with the default transformers, respectively.

There are different options to retrieve data such as:

- ['original'] => original or raw data; returns a dict of details. Combine it with the get method to retrieve the dataframe, for example:
  >>> fetch_data ('bagoue original').get ('data=df')
- ['stratified'] => stratified data.
- ['mid'|'semi'|'preprocess'|'fit'] => data cleaned, with experimental attribute combinations.
- ['pipe'] => default pipeline created during data preparation.
- ['analyses'|'pca'|'reduce dimension'] => data with only the text attributes encoded using the ordinal encoder, plus attribute combinations.
- ['test'] => stratified test set data.

Returns:

dict, X, y – If tag is followed by a suffix (for the 'bagoue' area), it returns:

- data: original data.
- X, y: stratified training set and training target.
- X0, y0: data cleaned after dropping useless features, combined with numerical attribute combinations if True.
- X_prepared, y_prepared: data prepared after applying all the transformations via the transformer (pipeline).
- XT, yT: stratified test set and test labels.
- _X: stratified training set for data analysis, containing no sparse matrix. The text (categorical) attributes are converted using the ordinal encoder.
- _pipeline: the default pipeline.

Return type:

frame or Boxspace object

Examples

>>> from watex.datasets import fetch_data
>>> b = fetch_data('bagoue' ) # no suffix returns 'Boxspace' object
>>> b.tnames
... array(['flow'], dtype='<U4')
>>> b.feature_names
... ['num',
     'name',
     'east',
     'north',
     'power',
     'magnitude',
     'shape',
     'type',
     'sfi',
     'ohmS',
     'lwi',
     'geol']
>>> X, y = fetch_data('bagoue prepared' )
>>> X  # transformed, ready for prediction
>>> X[0]
... <1x18 sparse matrix of type '<class 'numpy.float64'>'
        with 8 stored elements in Compressed Sparse Row format>
>>> y
... array([2, 1, 2, 2, 1, 0, ... , 3, 2, 3, 3, 2], dtype=int64)
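The tag convention (area name, optionally followed by a processing stage) can be sketched as a split on the first space. `parse_tag` and the `AREAS` tuple are hypothetical illustrations built from the area names listed above, not the parser used by fetch_data.

```python
AREAS = ("bagoue", "tankesse", "semien", "iris", "boundiali", "gbalo")


def parse_tag(tag: str):
    """Hypothetical sketch: split a tag into (area, stage or None)."""
    area, _, stage = tag.strip().lower().partition(" ")
    if area not in AREAS:
        raise ValueError(f"unknown area {area!r}; expected one of {AREAS}")
    # No suffix means no processing stage (fetch_data returns a Boxspace).
    return area, stage.strip() or None


print(parse_tag("bagoue original"))          # ('bagoue', 'original')
print(parse_tag("bagoue"))                   # ('bagoue', None)
print(parse_tag("bagoue reduce dimension"))  # ('bagoue', 'reduce dimension')
```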

watex.fittensor(refreq, compfreq, z, fill_value=nan)

Fit each tensor component to the complete frequency range.

The complete frequency is the frequency range with clean data; it contains all the frequencies recorded at the site. During the survey, missing frequencies lead to missing tensor data, so the function indicates where the tensor data are missing and fits them to the prior (reference) frequencies.

Parameters:

refreq (ArrayLike) – Reference frequency - Should be the complete frequency collected in the field.
compfreq (array-like) – The specific frequencies collected at the site. Because of interference, the frequencies at an individual site can differ from the complete set. However, the frequency values at the individual site must be included in the complete frequency refreq.
z (array-like) – Tensor values (real or imaginary part) of one component: xx, xy, yx or yy.
fill_value (float, default=NaN) – Value used to replace the missing data in tensors.

Returns:

Z – New Z filled with fill_value (NaN by default) where the frequency is missing in the data.

Return type:

Arraylike

Examples

>>> import numpy as np
>>> from watex.utils.exmath import fittensor
>>> refreq = np.linspace(7e7, 1e0, 20) # 20 frequencies as reference
>>> freq_ = np.hstack ((refreq.copy()[:7], refreq.copy()[12:] ))
>>> z = np.random.randn(len(freq_)) * 10  # tensor Z has the same length as freq_
>>> zn  = fittensor (refreq, freq_, z)
>>> z # some frequency values are missing but not visible.
...array([-23.23448367,   2.93185982,  10.81194723, -12.46326732,
         1.57312908,   7.23926576, -14.65645799,   9.85956253,
         3.96269863, -10.38325124,  -4.29739755,  -8.2591703 ,
        21.7930423 ,   0.21709129,   4.07815217])
>>> # zn shows where the frequencies are missing:
>>> # NaN marks a missing value in tensor Z at a specific frequency
>>> zn
... array([-23.23448367,   2.93185982,  10.81194723, -12.46326732,
         1.57312908,   7.23926576, -14.65645799,          nan,
                nan,          nan,          nan,          nan,
         9.85956253,   3.96269863, -10.38325124,  -4.29739755,
        -8.2591703 ,  21.7930423 ,   0.21709129,   4.07815217])
>>> # visualize where the frequency values are missing in tensor Z
>>> refreq
... array([7.00000000e+07, 6.63157895e+07, 6.26315791e+07, 5.89473686e+07,
       5.52631581e+07, 5.15789476e+07, 4.78947372e+07, 4.42105267e+07*,
       4.05263162e+07*, 3.68421057e+07*, 3.31578953e+07*, 2.94736848e+07*,
       2.57894743e+07, 2.21052638e+07, 1.84210534e+07, 1.47368429e+07,
       1.10526324e+07, 7.36842195e+06, 3.68421147e+06, 1.00000000e+00])
>>> refreq[np.isnan(zn)]  # missing values at indices [7:12] (marked *) in refreq
... array([44210526.68421052, 40526316.21052632, 36842105.73684211,
       33157895.2631579 , 29473684.78947368])
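In NumPy terms, the gap filling shown above amounts to placing z on the reference grid through a membership mask. A minimal sketch, assuming every value of compfreq appears exactly in refreq and in the same order (as the parameter description requires); `fit_to_reference` is a hypothetical name, not watex's implementation:

```python
import numpy as np


def fit_to_reference(refreq, compfreq, z, fill_value=np.nan):
    """Hypothetical sketch: place z on the refreq grid and fill the gaps."""
    refreq = np.asarray(refreq, dtype=float)
    compfreq = np.asarray(compfreq, dtype=float)
    z = np.asarray(z)
    # True where the reference frequency was actually recorded at the site.
    present = np.isin(refreq, compfreq)
    if present.sum() != z.size:
        raise ValueError("z needs one value per frequency in compfreq")
    # Float (or complex) output so that NaN can be stored.
    out = np.full(refreq.shape, fill_value, dtype=np.result_type(z, float))
    out[present] = z  # assumes compfreq follows the order of refreq
    return out


refreq = np.linspace(7e7, 1e0, 20)
freq_ = np.hstack((refreq[:7], refreq[12:]))
zn = fit_to_reference(refreq, freq_, np.arange(15, dtype=float))
print(np.flatnonzero(np.isnan(zn)).tolist())  # [7, 8, 9, 10, 11]
```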

watex.get2dtensor(z_or_edis_obj_list, /, tensor='z', component='xy', kind='modulus', return_freqs=False, **kws)

Build a two-dimensional array from a collection of impedance tensors Z.

Output a 2D resistivity, phase-error or tensor matrix from a collection of EDI objects.

The matrix shape is the number of frequencies by the number of sites. The function checks whether data are available at all frequencies; missing values are filled with NaN. Note that each element of z has shape (nfreq, 2, 2), with components arranged as:

xx ( 0, 0) ------- xy ( 0, 1)
yx ( 1, 0) ------- yy ( 1, 1)

Parameters:

z_or_edis_obj_list (list of watex.edi.Edi or watex.externals.z.Z) – A collection of EDI- or Impedances tensors objects.
tensor (str, default='z') – Tensor name. Can be [ resistivity|phase|z|frequency]
component (str, default='xy' (TE mode)) – EM mode. Can be [‘xx’, ‘xy’, ‘yx’, ‘yy’]
out (str) – Kind of data to output. Be sure to provide the component to retrieve the attribute from the collection object. Except for the error and frequency attributes, a missing component raises an error; for instance, use resxy for the xy component. Default is resxy.
kind (str, default='modulus') – Focuses on the tensor output. Note that the tensor is a complex ndarray of shape (nfreq, 2, 2). If set to 'modulus', the modulus of the complex tensor is output. If 'real' or 'imag', only the corresponding part is returned.
return_freqs (bool, default=False) – If True, also return the full frequency range.
kws (dict) – Additional keyword arguments passed to EM.getfullfrequency.

Returns:

mat2d – Matrix of shape (number of frequencies, number of EDI objects), where the number of EDI objects corresponds to the number of stations/sites.

Return type:

arraylike2d

Examples

>>> from watex.datasets import load_huayuan
>>> from watex.methods import get2dtensor
>>> box= load_huayuan ( key ='raw', clear_cache = True, samples =7)
>>> data = box.data
>>> phase_yx = get2dtensor ( data, tensor ='phase', component ='yx')
>>> phase_yx.shape
(56, 7)
>>> phase_yx [0, :]
array([        nan,         nan,         nan,         nan, 18.73244951,
       35.00516522, 59.91093054])
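The reshaping can be sketched with plain NumPy: take the chosen component from each (nfreq, 2, 2) tensor, reduce it according to kind, and stack one column per site. The sketch assumes all sites already share one frequency axis, whereas get2dtensor aligns sites to the full frequency range and fills gaps with NaN; `stack_component` and `COMPONENTS` are hypothetical names.

```python
import numpy as np

# Hypothetical map: position of each component in a (nfreq, 2, 2) tensor.
COMPONENTS = {"xx": (0, 0), "xy": (0, 1), "yx": (1, 0), "yy": (1, 1)}


def stack_component(z_list, component="xy", kind="modulus"):
    """Hypothetical sketch: (nfreq, nsites) matrix from per-site tensors."""
    i, j = COMPONENTS[component]
    columns = [np.asarray(z)[:, i, j] for z in z_list]
    mat = np.stack(columns, axis=1)  # shape (nfreq, nsites)
    if kind == "modulus":
        return np.abs(mat)
    if kind == "real":
        return mat.real
    if kind == "imag":
        return mat.imag
    return mat  # any other kind: keep the complex values


rng = np.random.default_rng(0)
z_sites = [rng.normal(size=(56, 2, 2)) + 1j * rng.normal(size=(56, 2, 2))
           for _ in range(7)]
mat = stack_component(z_sites, component="yx")
print(mat.shape)  # (56, 7)
```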

watex.magnitude(cz)

Compute the magnitude of the selected conductive zone.

The magnitude is the absolute difference between the minimum \(\min \rho_a\) and maximum \(\max \rho_a\) apparent resistivity values of the selected anomaly:

\[\text{magnitude} = \left| \min\rho_a - \max\rho_a \right|\]

Parameters:

cz (array-like) – Array of apparent resistivity values composing the conductive zone.

Returns:

Absolute value of the anomaly magnitude in \(\Omega.m\).
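The formula translates directly into NumPy. A minimal sketch; `anomaly_magnitude` is a hypothetical name used for illustration, and the input values are made up:

```python
import numpy as np


def anomaly_magnitude(cz):
    """Hypothetical sketch of |min(rho_a) - max(rho_a)| in ohm.m."""
    cz = np.asarray(cz, dtype=float)
    return float(np.abs(cz.min() - cz.max()))


print(anomaly_magnitude([120.0, 45.0, 38.5, 60.0, 150.0]))  # 111.5
```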

watex.make_erp(*, n_stations=42, max_rho=1000.0, min_rho=1.0, step=20.0, reflong='110:29:09.00', reflat='26:03:05.00', utm_zone='29N', order='+', full_coordinates=True, raise_warning=False, as_frame=False, seed=None, is_utm=False, epsg=None, **coord_kws)

Generate Electrical Resistivity Profiling (ERP) data from stations and coordinates points.

To generate samples from a specific area, it is better to provide both the latitude and longitude of a single station in that area as arguments to the parameters reflat and reflong, respectively. Also specify utm_zone for converting the lat/lon coordinates into UTM if necessary. If the conversion is not needed, set full_coordinates to False.

Parameters:

n_stations (int, default=42) – Number of measurement stations.
max_rho (float, default=1e3) – maximum resistivity value on the survey area in \(\Omega.m\)
min_rho (float, default=1e0) – minimum resistivity value on the survey area in \(\Omega.m\)
reflong (float or string or list of [start, stop], default='110:29:09.00') – Reference longitude in degree decimal or in DD:MM:SS for the first station considered as the origin of the landmark.
reflat (float or string or list of [start, stop], default='26:03:05.00') – Reference latitude in degree decimal or in DD:MM:SS for the reference site considered as the landmark origin. If value is given in a list, it can contain the start point and the stop point.
step (float or str, default=20) – Offset, i.e. the separation distance between sites, in meters. If the value is given as a string, it is treated as meters unless it is expressed in km. Only meters and kilometers are accepted.
order (str, default='+') – Direction of the projection line. By default, the projected line is in ascending order, i.e. from SW to NE with angle r set to 45 degrees. Use '-' for descending order. Any other value is treated as ascending order.
utm_zone (string (##N or ##S), default='29N') – UTM zone given as the zone number and the North or South hemisphere, e.g. 10S or 03N. Must be given if utm2deg is set to True.
full_coordinates (bool, default=True) – Convert latitude and longitude to approximate UTM values. Easting and northing are computed using reference ellipsoid 23 (WGS84). If False, easting and northing are not computed and are set to null.
raise_warning (bool, default=False) – Raise warnings if GDAL is not set up or about the accuracy status of the coordinates.
as_frame (bool, default=False) – If True, output the data as a pandas dataframe; otherwise, as a Boxspace object.
seed (int, optional) – Seed for reproducibility. If a value is passed, the same data are reproduced at the sample points.
is_utm (bool, default=False) – Type of coordinates passed to the reflat and reflong parameters for generating longitude-latitude coordinates. If is_utm is explicitly set to True, the reflong and reflat values are treated as UTM coordinates and converted to longitude-latitude. However, if is_utm is False while the reflat and reflong values exceed 90 and 180 degrees respectively, an error is raised.

New in version 0.2.1.
epsg (int, str, Optional) – EPSG number defining the projection. See http://spatialreference.org/ref/ for more info. Overrides utm_zone if both are provided.
coord_kws (dict,) – Additional keywords passed to makeCoords().

Return type:

pd.DataFrame or Boxspace

Examples

>>> from watex.datasets.gdata import make_erp
>>> erp_data = make_erp (n_stations =50 , step =30  , as_frame =True)
>>> erp_data.head(3)
   station  longitude  latitude        easting    northing  resistivity
0        0 -13.488511  0.000997  668210.580864  110.183287   225.265306
1       30 -13.488511  0.000997  668210.581109  110.183482   327.204082
2       60 -13.488510  0.000997  668210.581355  110.183676   204.877551
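Reference coordinates such as reflong='110:29:09.00' are written as DD:MM:SS. Their decimal-degree value is DD + MM/60 + SS/3600, with the sign applied to the whole value. A minimal sketch of that conversion; `dms_to_decimal` is a hypothetical helper, not a documented watex function:

```python
def dms_to_decimal(value: str) -> float:
    """Hypothetical helper: convert 'DD:MM:SS.ss' to decimal degrees."""
    text = value.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (float(p) for p in text.lstrip("+-").split(":"))
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


print(round(dms_to_decimal("110:29:09.00"), 6))  # 110.485833
print(round(dms_to_decimal("26:03:05.00"), 6))   # 26.051389
```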

watex.make_naive_pipe(X, y=None, *, num_features=None, cat_features=None, label_encoding='LabelEncoder', scaler='StandardScaler', missing_values=nan, impute_strategy='median', sparse_output=True, for_pca=False, transform=False)

Make a pipeline to transform data at once.

A naive pipeline is useful to quickly preprocess the data for fast predictions.

Works with a pandas dataframe. If no features are set, the numerical and categorical features are retrieved automatically.

Parameters:

X (pandas dataframe of shape (n_samples, n_features)) – The input samples. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported, use sparse csc_matrix for maximum efficiency.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – Target relative to X for classification or regression; None for unsupervised learning.
num_features (list or str, optional) – Numerical features put on the list. If num_features are given whereas cat_features are None, cat_features are figured out automatically.
cat_features (list of str, optional) – Categorical features put on the list. If cat_features are given whereas num_features are None, num_features are figured out automatically.
label_encoding (callable or str, default='sklearn.preprocessing.LabelEncoder') – Kind of encoding used to encode the labels. This assumes y is supplied.
scaler (callable or str, default='sklearn.preprocessing.StandardScaler') – Kind of scaling used to scale the numerical data. Note that for categorical data, 'sklearn.preprocessing.OneHotEncoder' is used under the hood instead.
missing_values (int, float, str, np.nan, None or pandas.NA, default=np.nan) – The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values can be set to either np.nan or pd.NA.
impute_strategy (str, default='median') –
The imputation strategy.
- If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.
- If “median”, then replace missing values using the median along each column. Can only be used with numeric data.
- If “most_frequent”, then replace missing using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
- If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.
sparse_output (bool, default=True) – Used when label y is given, to binarize labels in a one-vs-all fashion. If True, the array returned by transform is in sparse CSR format.
for_pca (bool, default=False) – Transform data for principal component analysis (PCA). If set to True, watex.exlib.sklearn.OrdinalEncoder is used instead of watex.exlib.sklearn.OneHotEncoder.
transform (bool, default=False) – Transform data in place rather than returning the naive pipeline.

Returns:

full_pipeline (watex.exlib.sklearn.FeatureUnion) –
- Full pipeline composed of numerical and categorical pipes
(X_transformed &| y_transformed) ({array-like, sparse matrix} of shape (n_samples, n_features)) –
- Transformed data.

Examples

>>> from watex.utils.mlutils import make_naive_pipe
>>> from watex.datasets import load_hlogs

(1) Make a naive pipeline with RobustScaler instead of the default StandardScaler:

>>> from watex.exlib.sklearn import RobustScaler
>>> X_, y_ = load_hlogs(as_frame=True)  # get all the data
>>> pipe = make_naive_pipe(X_, scaler=RobustScaler)

(2) Transform X in place with numerical and categorical features, using StandardScaler (default). A CSR matrix is returned:

>>> make_naive_pipe(X_, transform =True )
... <181x40 sparse matrix of type '<class 'numpy.float64'>'
    with 2172 stored elements in Compressed Sparse Row format>
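For comparison, a similar naive preprocessing step can be assembled directly with scikit-learn. This is a sketch under assumptions, not watex's internal pipeline: it uses ColumnTransformer (make_naive_pipe documents a FeatureUnion), median imputation plus StandardScaler for numerical columns, and most-frequent imputation plus OneHotEncoder for categorical columns; `naive_pipe_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def naive_pipe_sketch(X: pd.DataFrame) -> ColumnTransformer:
    """Hypothetical sketch of a naive numerical + categorical preprocessor."""
    num_features = X.select_dtypes(include="number").columns.tolist()
    cat_features = [c for c in X.columns if c not in num_features]
    num_pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    cat_pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", num_pipe, num_features),
        ("cat", cat_pipe, cat_features),
    ])


X = pd.DataFrame({
    "depth": [1.0, 2.0, np.nan, 4.0],
    "rock": ["granite", "schist", "granite", np.nan],
})
Xt = naive_pipe_sketch(X).fit_transform(X)
print(Xt.shape)  # (4, 3): 1 scaled numeric column + 2 one-hot columns
```

ColumnTransformer returns a sparse or dense result depending on the density of the stacked output, so `Xt` may be either type.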

watex.make_ves(*, samples=31, min_rho=10.0, max_rho=1000.0, max_depth=100.0, order='-', as_frame=False, seed=None, iorder=3, xy=None, is_utm=False, add_xy=False, utm_zone=None, epsg=None)

Generate Vertical Electrical Sounding (VES) data from pseudo-depth measurements.

For larger pseudo-depth measurements, increase the number of samples accordingly. By default, samples=31 measurements are assumed to be collected at depth.

Parameters:

samples (int, default=31) – Number of measurement depths AB/2, in meters.
max_rho (float, default=1e3) – Maximum resistivity value expected at depth on the survey area in \(\Omega.m\)
min_rho (float, default=1e1) – Minimum resistivity value expected at depth on the survey area in \(\Omega.m\)
order (str, default='-') – Direction of the projection line. '+' gives ascending order, i.e. from SW to NE with angle r set to 45 degrees; '-' (default) gives descending order. Any other value is treated as ascending order.
max_depth (float, default=100) – Maximum depth AB/2, in meters, expected to be reached by the measurements.
as_frame (bool, default=False) – If True, output the data as a pandas dataframe; otherwise, as a Boxspace object.
seed (int, optional) – Seed for reproducibility. If a value is passed, the same data are reproduced at the sample points.
iorder (int, default=3) – Inflexion order; a positive integer greater than 0. If None, it is computed from the number of extrema (local + global). It should be kept as low as possible to make the fitted VES curve more realistic.
xy (tuple, optional) – Coordinates (easting, northing) or (lon, lat) of the VES points. If no coordinates are given, they are randomly generated as (lon, lat) and stored in the attribute xy. To return the auto-generated xy coordinates when as_frame=True, set add_xy to True.

New in version 0.2.1.
is_utm (bool, default=False) – In principle, xy is expected in longitude-latitude coordinates. However, if the coordinates are given in UTM (easting-northing), set is_utm to True and specify utm_zone to convert the xy values into valid longitude and latitude coordinates.
add_xy (bool, default=False) – Add xy coordinates to the VES dataframe.
utm_zone (str, Optional) – To generate coordinates xy from a specific zone, utm_zone can be specified, otherwise 29N is used instead.
epsg (int, str, Optional) – EPSG number defining the projection. See http://spatialreference.org/ref/ for more info. Overrides utm_zone if both are provided.

Return type:

pd.DataFrame or Boxspace

Notes

When a Boxspace object is returned, each column of the VES data can be retrieved as an attribute. See the examples below.

Examples

>>> from watex.datasets.gdata import make_ves
>>> b = make_ves(samples=50, order='+')  # 50 measurements in depth
>>> b.resistivity[:7]
array([429.873 , 434.255 , 438.5707, 442.8203, 447.0042, 451.1228,
       457.5775])
>>> b.frame.head(3)
    AB   MN  resistivity
0  1.0  0.6   429.872999
1  2.0  0.6   434.255018
2  3.0  0.6   438.570675
>>> ves_data = make_ves(samples=50, min_rho=10, max_rho=1e5,
...                     as_frame=True, add_xy=True,
...                     xy=(3143965.855, 336704.455),
...                     is_utm=True, utm_zone='49N', epsg=None)
>>> ves_data.head(2)
    AB   MN   resistivity   longitude   latitude
0  1.0  0.6  51544.426685  107.901553 -61.802165
1  2.0  0.6  51420.739513  107.901553 -61.802165

watex.naive_imputer(X, y=None, strategy='mean', mode=None, drop_features=False, missing_values=nan, fill_value=None, verbose='deprecated', add_indicator=False, copy=True, keep_empty_features=False, **fit_params)

Impute missing values in the data.

When the data contain categorical features, passing 'bi-impute' to the mode parameter imputes both the numerical and categorical features, rather than raising an error when strategy is not set to 'most_frequent'.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.
y (None) – Not used, present here for API consistency by convention.
strategy (str, default='mean') –
The imputation strategy.
- If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.
- If “median”, then replace missing values using the median along each column. Can only be used with numeric data.
- If “most_frequent”, then replace missing using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
- If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.
mode (str, {'bi-impute'}, default=None) – If mode is set to 'bi-impute', both the numerical and categorical features are imputed and a single imputed dataframe is returned.
drop_features (bool or list, default=False) – Drop a list of features from the dataframe before imputation. If True and no list of features is supplied, the categorical features are dropped.
missing_values (int, float, str, np.nan, None or pandas.NA, default=np.nan) – The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values can be set to either np.nan or pd.NA.
fill_value (str or numerical value, default=None) – When strategy == “constant”, fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and “missing_value” for strings or object data types.
keep_empty_features (bool, default=False) –
If True, features that consist exclusively of missing values when fit is called are returned in results when transform is called. The imputed value is always 0 except when strategy=”constant” in which case fill_value will be used instead.

New in version 0.2.0.
verbose (int, default=0) – Controls the verbosity of the imputer.
copy (bool, default=True) –
If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if copy=False:
- If X is not an array of floating values;
- If X is encoded as a CSR matrix;
- If add_indicator=True.
add_indicator (bool, default=False) – If True, a MissingIndicator transform will stack onto output of the imputer’s transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won’t appear on the missing indicator even if there are missing values at transform/test time.
fit_params (dict) – Keyword arguments passed to the scikit-learn fitting method. More details at https://scikit-learn.org/stable/

Returns:

Xi – Imputed data.

Return type:

Dataframe, array-like, sparse matrix of shape (n_samples, n_features)

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from watex.utils.mlutils import naive_imputer
>>> X= np.random.randn ( 7, 4 )
>>> X[3, :] =np.nan  ; X[:, 3][-4:]=np.nan
>>> naive_imputer(X)
array([[ 1.34783528,  0.53276798, -1.57704281,  0.43455785],
       [ 0.36843174, -0.27132106, -0.38509441, -0.29371997],
       [-1.68974996,  0.15268509, -2.54446498,  0.18939122],
       [ 0.06013775,  0.36687602, -0.21973368,  0.11007637],
       [-0.27129147,  1.18103398,  1.78985393,  0.11007637],
       [ 1.09223954,  0.12924661,  0.52473794,  0.11007637],
       [-0.48663864,  0.47684353,  0.87360825,  0.11007637]])
>>> frame = pd.DataFrame(X, columns=['a', 'b', 'c', 'd'])
>>> # change columns 'b' and 'c' to categorical (string) values
>>> frame['b'] = ['pineapple', '', 'cabbage', 'watermelon', 'onion',
...               'cabbage', 'onion']
>>> frame['c'] = ['lion', '', 'cat', 'cat', 'dog', '', 'mouse']
>>> naive_imputer(frame, mode='bi-impute')
            b      c         a         d
0   pineapple   lion  1.347835  0.434558
1     cabbage    cat  0.368432 -0.293720
2     cabbage    cat -1.689750  0.189391
3  watermelon    cat  0.060138  0.110076
4       onion    dog -0.271291  0.110076
5     cabbage    cat  1.092240  0.110076
6       onion  mouse -0.486639  0.110076

watex.naive_scaler(X, y=None, *, kind=<class 'sklearn.preprocessing._data.StandardScaler'>, copy=True, with_mean=True, with_std=True, feature_range=(0, 1), clip=False, norm='l2', **fit_params)[source]#

Quick data scaling using the StandardScaler, MinMaxScaler or Normalizer strategies implemented in scikit-learn.

The function returns a scaled dataframe if a dataframe is passed, and an ndarray otherwise. For other scaling methods, call scikit-learn directly.
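The dispatch described above can be sketched as a lookup from the kind string to a scikit-learn scaler class, re-wrapping the result as a dataframe when one was passed in. This is a minimal sketch, not the watex code; the names _SCALERS and scale_sketch are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Hypothetical lookup table from a `kind` string to a scikit-learn scaler.
_SCALERS = {
    "standardscaler": StandardScaler,
    "minmaxscaler": MinMaxScaler,
    "normalizer": Normalizer,
}


def scale_sketch(X, kind="StandardScaler", **scaler_kws):
    """Scale X with the scaler named by `kind` (case-insensitive)."""
    scaler = _SCALERS[kind.lower()](**scaler_kws)
    Xs = scaler.fit_transform(X)
    if isinstance(X, pd.DataFrame):
        # Keep the column labels and index of the input frame.
        return pd.DataFrame(Xs, columns=X.columns, index=X.index)
    return Xs
```

Scaler-specific options such as feature_range or norm are simply forwarded through scaler_kws in this sketch.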

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data used to compute the mean and standard deviation used for later scaling along the features axis.
y (None) – Ignored.
kind (str, default='StandardScaler') – Kind of data scaling. Can also be ‘MinMaxScaler’ or ‘Normalizer’.
copy (bool, default=True) – If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
with_mean (bool, default=True) – If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
with_std (bool, default=True) – If True, scale the data to unit variance (or equivalently, unit standard deviation).
feature_range (tuple (min, max), default=(0, 1)) – Desired range of transformed data.
norm ({'l1', 'l2', 'max'}, default='l2') – The norm to use to normalize each non zero sample. If norm=’max’ is used, values will be rescaled by the maximum of the absolute values.
clip (bool, default=False) – Set to True to clip transformed values of held-out data to provided feature range.
fit_params (dict) – Keyword arguments passed to the scikit-learn fitting method. More details at https://scikit-learn.org/stable/

Returns:

X_sc – Transformed array.

Return type:

{ndarray, sparse matrix} or dataframe of shape (n_samples, n_features)

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from watex.utils.mlutils import naive_scaler
>>> X = np.random.randn(7, 3)
>>> X_std = naive_scaler(X)
>>> X_std
array([[ 0.17439644,  1.55683005,  0.24115109],
       [-0.59738672,  1.3166854 ,  1.23748004],
       [-1.6815365 , -1.19775838,  0.71381357],
       [-0.1518278 , -0.32063059, -0.47483155],
       [-0.41335886,  0.13880519,  0.69258621],
       [ 1.45221902, -1.03852015, -0.40157981],
       [ 1.21749443, -0.45541153, -2.00861955]])
>>> # use a dataframe
>>> Xdf = pd.DataFrame(X, columns=['a', 'b', 'c'])
>>> naive_scaler(Xdf, kind='Normalizer')  # returns a dataframe
          a         b         c
0  0.252789  0.967481 -0.008858
1 -0.265161  0.908862  0.321961
2 -0.899863 -0.416231  0.130380
3  0.178203  0.039443 -0.983203
4 -0.418487  0.800306  0.429394
5  0.933933 -0.309016 -0.179661
6  0.795234 -0.051054 -0.604150

watex.ohmicArea(data=None, search=45.0, sum=False, objective='ohmS', **kws)[source]#

Compute the ohmic-area from the Vertical Electrical Sounding data collected in the exploration area.

Parameters:

* data: Dataframe pandas - contains the depth measurements AB from the current electrodes, the potential electrodes MN and the collected apparent resistivities.
* search: float - The depth in meters from which one expects to find a fracture zone outside of pollution. Indeed, the search parameter is used to speculate about the expected groundwater in the fractured rocks under the average level of water inrush in a specific area. For instance, in the Bagoue region, the average depth of water inrush is around 45 m, so search can be specified via the average water inrush value.
* objective: str - Type of output. By default, the function outputs the value of the pseudo-area in \(\Omega .m^2\). However, for plotting purposes, setting the argument to view gives alternative outputs: the recomputed and projected X and Y, as well as the X and Y values of the expected fractured zone, where X is the AB dipole spacing when imaging to depth and Y is the computed apparent resistivity.
* kws: dict - Additional keyword arguments for VES data operations. See watex.utils.exmath.vesDataOperator() for further details.

Returns:

List of two tuples:

Tuple(ohmS, error, roots):
- ohmS is the computed pseudo-area expected to be a fractured zone
- error is the integration error
- roots are the integration boundaries of the expected fractured
  zone, where the basement rock curve lies above the resistivity transform function. At these points, both curves are equal, i.e. their difference is null.
Tuple (XY, fit XY,XYohmSarea):
- XY is the ndarray (nvalues, 2) of the operated AB dipole
  spacing and resistivity rhoa values.
- fit XY is the fitting ndarray (nvalues, 2) used to redraw the
  dummy resistivity transform function.
- XYohmSarea is the ndarray (nvalues, 2) of the dipole spacing and
  resistivity values of the expected fracture zone.

Raises:

VESError: If search is greater than or equal to the maximum investigation depth in meters.

Notes

The ohmS value calculated from the pseudo-area is a fully data-driven parameter and is used to evaluate a pseudo-area of the fracture zone from the depth where the basement rock is supposed to start. Usually, when exploring deeper using the Vertical Electrical Sounding, we are looking for groundwater in the fractured rock that is outside the anthropic pollution (Biemi, 1992). Since the VES is an indirect method, we cannot ascertain whether the presumed fractured rock contains water. However, we assume that the fracture zone could exist and should contain groundwater. Mathematically, based on the VES1D model proposed by Koefoed, O. (1976), we consider a function \(\rho_T(l)\), a set of reducing resistivity transform functions to lower the boundary plane at half the current electrode spacing \((l)\). From the sounding curve \(\rho_T(l)\), an imaginary basement rock curve \(b_r (l)\) with a slope of 45° to the horizontal \(h(l)\) is created. A pseudo-area \(S(l)\) is defined by extending from \(h(l)\) the \(b_r (l)\) curve when the sounding curve \(\rho_T(l)\) is below \(b_r(l)\); otherwise \(S(l)\) is equal to null. The computed area is called the ohmic-area \(ohmS\), expressed in \(\Omega .m^2\), and constitutes the expected fractured zone. Thus \(ohmS \neq 0\) confirms the existence of the fracture zone, while \(ohmS = 0\) raises doubts. The equation to determine the parameter is given as:

\[
\begin{aligned}
ohmS &= \int_{l_i}^{l_{i+1}} S(l)\,dl \quad \text{s.t.} \\
S(l) &= \begin{cases} b_r(l) - \rho_T(l) & \text{if } b_r(l) > \rho_T(l) \\ 0 & \text{if } b_r(l) \leq \rho_T(l) \end{cases} \\
b_r(l) &= l + h(l) \quad ; \quad h(l) = \beta \\
\rho_T(l) &= l^2 \int_{0}^{\infty} T_i(\lambda)\, h_1(\lambda l)\, \lambda \, d\lambda
\end{aligned}
\]

where \(l_i\) and \(l_{i+1}\) solve the equation \(S(l) = 0\); \(l\) is half the current electrode spacing \(AB/2\); \(h_1\) denotes the first-order Bessel function of the first kind; \(\beta\) is the y-axis intercept of \(b_r(l)\), which is also the constant value of the horizontal \(h(l)\); \(T_i(\lambda)\) is the resistivity transform function; \(\lambda\) denotes the integration variable; n denotes the number of layers; and \(\rho_i\) and \(h_i\) are the resistivity and thickness of the \(i\)-th layer, respectively. More explanations of the formula can be found in the paper of Kouadio et al. (2022).
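The integral above can be approximated numerically once \(\rho_T(l)\) has been sampled on a grid of \(l\) values. The sketch below only illustrates the definition of \(S(l)\) and \(ohmS\); it is not the watex implementation (which also fits the sounding curve and locates the roots), and the function name ohmic_area_sketch is hypothetical.

```python
import numpy as np


def ohmic_area_sketch(l, rho_t, beta=0.0):
    """Hypothetical sketch: integrate S(l) = max(b_r(l) - rho_T(l), 0)
    with b_r(l) = l + beta, using the trapezoidal rule."""
    l = np.asarray(l, dtype=float)
    rho_t = np.asarray(rho_t, dtype=float)
    b_r = l + beta                       # 45° basement-rock line
    s = np.clip(b_r - rho_t, 0.0, None)  # S(l) is null where b_r <= rho_T
    # Trapezoidal rule written out to avoid depending on np.trapz/np.trapezoid.
    return float(np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(l)))
```

For a constant \(\rho_T(l) = 4\) and \(\beta = 0\) on \(l \in [0, 10]\), \(S(l)\) is the triangle between \(l = 4\) and \(l = 10\), so the area is \(6^2 / 2 = 18\).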


Examples

>>> from watex.utils.exmath import ohmicArea
>>> from watex.utils.coreutils import vesSelector
>>> data = vesSelector(f='data/ves/ves_gbalo.xlsx')
>>> (ohmS, err, roots), *_ = ohmicArea(data=data, search=45, sum=True)
>>> ohmS, err, roots
(13.46012197818152, array([5.8131967e-12]), array([45.        , 98.07307307]))
>>> # the pseudo-area is computed between the spacing points AB = [45, 98]
>>> _, (XY, XYfit, XYohms_area) = ohmicArea(
...         AB=data.AB, rhoa=data.resistivity, search=45,
...         objective='plot')
>>> XY.shape, XYfit.shape, XYohms_area.shape
((26, 2), (1000, 2), (8, 2))

watex.plotAnomaly(erp, cz=None, station=None, fig_size=(10, 4), fig_dpi=300, savefig=None, show_fig_title=True, style='seaborn', fig_title_kws=Ellipsis, czkws=Ellipsis, legkws=Ellipsis, how='py', **kws)[source]#

Plot the whole Electrical Resistivity Profiling line and the selected conductive zone.

The conductive zone can be supplied manually as a subset of the erp, or by specifying the station expected for the drilling location, for instance S07 for the seventh station. Furthermore, for automatic detection, set the station argument to "auto". However, it is recommended to provide either cz or station to have full control. The conductive zone is overlaid on the whole Electrical Resistivity Profiling survey. Users can customize the cz plot by passing additional Matplotlib pyplot keyword arguments through czkws.

Parameters:

erp: array_like 1d

The Electrical Resistivity Profiling survey line. The line is an array of resistivity values. Note that if a dataframe is passed, the frame must match the DC resistivity data (ERP), otherwise an error occurs. At a minimum, the frame columns must include the resistivity and the stations.

cz: array_like 1d

The selected conductive zone. If None, only the erp is displayed. Note that cz is a subset of the erp array.

station: str, optional

The station location given as a string (e.g. station="S10") or as a station number (indexing; e.g. station=10). If the value is set to "auto", the station is found automatically and cz is fetched as well.

fig_size: tuple, default=(10, 4)

Figure size as a tuple. Refer to the Matplotlib figure documentation.

fig_dpi: int , default=300,

Figure resolution in “dots per inch”. Refer to Matplotlib figure.

savefig: str, optional,

save the figure. Refer to Matplotlib figure.

show_fig_title: bool, default =True

display the title of the figure.

fig_title_kws: dict,

Keyword arguments of the figure suptitle. Refer to Matplotlib Figure.suptitle.

style: str, default='seaborn'

The style for customizing the visualization. For instance, to get the first seven available styles in pyplot, one can run the script below:

plt.style.available[:7]

Further details can be found in web resources such as GeeksforGeeks.

how: str, default=’py’

By default (how='py'), stations are named following Python indexing, i.e. counting starts from station 00 (S00). Any other value starts the station naming from 1.

czkws: dict,

Additional Matplotlib pyplot keyword arguments to customize the cz plot.

legkws: dict,

Additional keywords Matplotlib legend arguments.

kws: dict,

Additional Matplotlib pyplot keyword arguments to customize the erp plot.
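The effect of the how parameter on station naming can be sketched as follows. The two-digit zero padding (S00, S07, S10) is inferred from the station names used on this page and is an assumption; station_labels is a hypothetical helper, not part of watex.

```python
def station_labels(n, how="py"):
    """Hypothetical sketch: name n stations, counting from 0 when
    how='py' (Python indexing) and from 1 otherwise."""
    start = 0 if how == "py" else 1
    # Two-digit zero padding assumed from names such as S00 and S07.
    return [f"S{i:02d}" for i in range(start, start + n)]
```

With how='py', the first station is S00; with any other value it becomes S01.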

See also

watex.erpSmartDetector: Detect the conductive zone by applying constraints. Set view=True to visualize the constraints.


Examples

>>> import numpy as np
>>> from watex.utils import plotAnomaly, defineConductiveZone
>>> test_array = np.abs (np.random.randn (10)) *1e2
>>> selected_cz ,*_ = defineConductiveZone(test_array, 7)
>>> plotAnomaly(test_array, selected_cz )
>>> plotAnomaly(test_array, selected_cz , s= 5)
>>> plotAnomaly(test_array, s= 's02')
>>> plotAnomaly(test_array)

watex.plotOhmicArea(data=None, search=45.0, pre_computed=False, xy=None, xyf=None, xyarea=None, colors=None, fbtw=False, **plot_kws)[source]#

Plot the ohmic-area of Vertical Electrical Sounding data.

Parameters:

data (pandas.DataFrame) – Contains the depth measurements AB from the current electrodes, the potential electrodes MN and the collected apparent resistivities.
search (float, default=45.) – The depth in meters from which one expects to find a fracture zone outside of pollution. Indeed, the search parameter is used to speculate about the expected groundwater in the fractured rocks under the average level of water inrush in a specific area. For instance, in the Bagoue region, the average depth of water inrush is around 45 m, so search can be specified via the average water inrush value.
pre_computed (bool, default=False) – If True, the ohmic-area parameters are considered already computed and must be passed to xy, xyf and xyarea, otherwise an error is raised. If False, they are computed from data.
xy (array-like of shape (n_AB, 2)) – Array-like of the sanitized depth measurements AB from the n_AB current electrodes. See vesDataOperator().
xyf (array-like of shape (n_fit_samples, 2)) – Array-like of the fitted samples, i.e. the points used for fitting the sounding resistivity values from the surface to the total depth. The fitted rhoa shows a smooth curve. The default number of points is 1000.
xyarea (array-like of shape (n_area, 2)) – Array-like of the resistivity positions of the depth measurements AB where the fractured zone is found.
fbtw (bool, default=False) – If True, fills the computed fractured zone using the parameters computed from xyf and xyarea.
kws (dict) – Additional keyword arguments from Vertical Electrical Sounding data operations. See watex.utils.exmath.vesDataOperator() for further details.

Notes

The first and second columns of xy, xyf and xyarea are the positions AB/2 and their corresponding resistivity values.

Examples

>>> from watex.datasets import load_semien
>>> from watex.utils.exmath import plotOhmicArea
>>> ves_data = load_semien ()
>>> plotOhmicArea (ves_data)

watex.plot_confidence_in(z_or_edis_obj_list, /, tensor='res', view='1d', drop_outliers=True, distance=None, c_line=False, view_ci=True, figsize=(6, 2), fontsize=4.0, dpi=300.0, top_label='Stations', rotate_xlabel=90.0, fbtw=True, savefig=None, **plot_kws)[source]#

Plot data confidence from tensor errors.

The default tensor for evaluating the data confidence is the resistivity at TE mode (‘xy’).

Checking confidence in the data before starting the actual processing is worthwhile. In areas with complex terrain, high topography and interference noise, signals are weak or missing, especially in AMT surveys. The most common technique to handle this is to eliminate the bad frequencies and interpolate the remaining ones. However, the rules for eliminating frequencies differ from one author to another. Here, the data confidence is used to indicate which frequencies to eliminate (and at which stations/sites) and which ones are still recoverable using the tensor recovery strategy.

The plot implements three levels of confidence:

High confidence: \(conf. \geq 0.95\), i.e. values greater than or equal to 95%.
Soft confidence: \(0.5 \leq conf. < 0.95\). The data in this confidence range can be beneficial for tensor recovery to restore the weak and missing signals.
Bad confidence: \(conf. < 0.5\). Data in this interval must be deleted.
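Assuming the confidence values have already been computed as fractions in [0, 1], the three levels above map to a simple vectorized classification. This sketch only encodes the thresholds; how watex derives the confidence from the tensor errors is not shown, and confidence_level is a hypothetical name.

```python
import numpy as np


def confidence_level(conf):
    """Hypothetical sketch: label each confidence value as 'high'
    (>= 0.95), 'soft' (0.5 <= conf < 0.95) or 'bad' (< 0.5)."""
    conf = np.asarray(conf, dtype=float)
    return np.where(conf >= 0.95, "high", np.where(conf >= 0.5, "soft", "bad"))
```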

Parameters:

z_or_edis_obj_list (list of watex.edi.Edi or watex.externals.z.Z) – A collection of EDI- or Impedances tensors objects.
tensor (str, default='res') – Tensor name. Can be [ ‘resistivity’|’phase’|’z’|’frequency’]
view (str, default='1d') – Type of plot. Can be [‘1D’|’2D’]
drop_outliers (bool, default=True) – Suppress the outliers in the data if True.
distance (float, optional) – Distance between stations/sites
fontsize (float, default=4.) – Label font size.
figsize (Tuple, default=(6, 2)) – Figure size.
c_line (bool, default=False) – Display the confidence line in the two-dimensional view.
dpi (int, default=300) – Image resolution in dot-per-inch
rotate_xlabel (float, default=90.) – Angle to rotate the stations/sites labels
top_label (str, default='Stations') – Label used to name the sites, e.g. the survey name.
view_ci (bool,default=True,) – Show the marker of confidence interval.
fbtw (bool, default=True,) – Fill between confidence interval.
plot_kws (dict) – Additional keyword arguments passed to plot().

See also

watex.methods.Processing.zrestore: For more details about the function for tensor recovering technique.

Examples

>>> from watex.utils.exmath import plot_confidence_in
>>> from watex.datasets import fetch_data
>>> emobj = fetch_data('huayuan', samples=25, clear_cache=True,
...                    key='raw').emo
>>> plot_confidence_in(emobj.ediObjs_,
...                    distance=20,
...                    view='2d',
...                    figsize=(6, 2)
...                    )
>>> plot_confidence_in(emobj.ediObjs_, distance=20,
...                    view='1d', figsize=(6, 3), fontsize=5,
...                    )

watex.plot_sfi(cz, p=None, s=None, dipolelength=None, fig_size=(10, 4), style='classic', **plotkws)[source]#

Plot sfi parameter components.

Parameters:

cz (array-like 1d,) – Selected conductive zone
p (array-like 1d,) – Station positions of the conductive zone.
dipolelength (float, optional) – Dipole length. If p is not given, it is set automatically from the dipole length to match the cz size. The default value is 10.
fig_size (tuple, default=(10, 4)) – Matplotlib (MPL) figure size; should be a tuple value of integers

See also

watex.utils.exmath.sfi: for more details about the sfi parameter computation.

Examples

>>> import numpy as np
>>> from watex.property import P
>>> from watex.utils.exmath import plot_sfi
>>> rang = np.random.RandomState(42)
>>> condzone = np.abs(rang.randn(7)) * 1e2
>>> plotkws = dict(rlabel='Selected conductive zone (cz)',
...                color=f'{P().frcolortags.get("fr3")}',
...                )
>>> plot_sfi(condzone, **plotkws)

watex.power(p)[source]#

Compute the power of the selected conductive zone. The anomaly power is closely related to the width of the conductive zone.

The power parameter implicitly defines the width of the conductive zone and is evaluated from the difference between the start abscissa \(X_{LB}\) and the end abscissa \(X_{UB}\) of the selected anomaly:

\[power=|X_{LB} - X_{UB} |\]
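A direct translation of the formula, assuming the station positions p are ordered so that the first and last entries are \(X_{LB}\) and \(X_{UB}\). This is a sketch with the hypothetical name anomaly_power, not the watex source.

```python
import numpy as np


def anomaly_power(p):
    """Hypothetical sketch of power = |X_LB - X_UB|, taking the first and
    last station positions of the conductive zone as X_LB and X_UB."""
    p = np.asarray(p, dtype=float)
    return float(abs(p[0] - p[-1]))
```

For stations at 10, 20, 30 and 40 m, the power is 30 m.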

Parameters:: p – array-like. Station position of conductive zone.
Returns:: Absolute value of the width of conductive zone in meters.

watex.qc(z_or_edis_obj_list, /, tol=0.5, *, interpolate_freq=False, return_freq=False, tensor='res', return_data=False, to_log10=False, return_qco=False)[source]#

Perform quality control on a collection of Z or EDI objects.

Analyse the data in the EDI collection and return the quality control value, which indicates the percentage of the data that can be considered representative.

Parameters:

z_or_edis_obj_list (list of watex.edi.Edi or watex.externals.z.Z) – A collection of EDI- or Impedances tensors objects.
tol (float, default=.5) – The tolerance parameter. The value indicates the rate from which the data can be considered meaningful. Preferably, it should be greater than 0 and less than 1. The default .5 means 50 %. Analysis becomes softer with higher tol values and more severe otherwise.
interpolate_freq (bool) – Interpolate the valid frequencies after removing the frequencies whose data fall under the ``1-tol``% goodness threshold.
return_freq (bool, default=False) – returns the interpolated frequency.
return_data (bool, default= False,) – returns the valid data from up to 1-tol% goodness.
tensor (str, default='res') – Tensor name. Can be [ resistivity|phase|z|frequency]. Impedance is used for data quality assessment.
to_log10 (bool, default=False) – Convert the frequency values to log10.
return_qco (bool, default=False) –
Returns the quality control object that wraps all useful information after control. The following attributes can be fetched:
- rate_: the rate of the quality of the data
- component_: The component selected for the analysis. By default, either xy or yx is used.
- mode_: The EM mode. Either the [‘TE’|’TM’] modes
- freqs_: The valid frequency in the data selected according to the tol parameters. Note that if interpolate_freq is True, it is used instead.
- invalid_freqs_: Useless frequency dropped in the data during control
- data_: Valid tensor data either in TE or TM mode.
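The role of tol can be illustrated with a hypothetical sketch: a frequency is kept when the fraction of finite values across sites reaches ``1 - tol``, and the rate is taken here as the fraction of kept frequencies. Both rules are assumptions chosen to match the parameter descriptions above, not the actual watex quality-control algorithm; qc_sketch is a hypothetical name.

```python
import numpy as np


def qc_sketch(data, freqs, tol=0.5):
    """Hypothetical QC: keep the frequencies whose fraction of finite
    values across sites is at least ``1 - tol``.

    data has shape (n_freqs, n_sites); missing values are np.nan.
    Returns (rate, valid_freqs).
    """
    data = np.asarray(data, dtype=float)
    goodness = np.isfinite(data).mean(axis=1)  # per-frequency completeness
    valid = goodness >= 1 - tol
    rate = float(valid.mean())                 # assumed definition of the rate
    return rate, np.asarray(freqs)[valid]
```

A higher tol lowers the ``1 - tol`` threshold and keeps more frequencies, which is why the analysis becomes softer.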

Returns:

- The quality control value, and the interpolated frequency if return_freq is set to True; otherwise only the quality control ratio.
- The quality control object if return_qco is set to True.

Return type:

Tuple (float ) or (float, array-like, shape (N, )) or QCo

Examples

>>> import watex as wx
>>> data = wx.fetch_data('huayuan', samples=20, return_data=True,
...                      key='raw')
>>> r, = wx.qc(data)
>>> r
0.75
>>> r, = wx.qc(data, tol=.2)
>>> r
0.75
>>> r, = wx.qc(data, tol=.1)

watex.read_data(f, sanitize=Ellipsis, reset_index=Ellipsis, comments='#', delimiter=None, columns=None, npz_objkey=None, verbose=Ellipsis, **read_kws)[source]#

Assert and read specific files and URLs allowed by the package.

Readable files are systematically converted to a dataframe.

Parameters:

f (str, Path-like object) – File path or Pathlib object. Must contain a valid file name and should be a readable file or url
sanitize (bool, default=False,) –
Push a minimum sanitization of the data such as:
- replace non-alphabetic characters in column names with the pattern ‘_’
- cast data values to numeric if applicable
- drop full NaN columns and rows in the data
reset_index (bool, default=False,) –
Reset index if full NaN columns are dropped after sanitization.

New in version 0.2.5: Apply minimum data sanitization after reading data.
comments (str or sequence of str or None, default='#') – The characters or list of characters used to indicate the start of a comment. None implies no comments. For backwards compatibility, byte strings will be decoded as ‘latin1’.
delimiter (str, optional) – The character used to separate the values. For backwards compatibility, byte strings will be decoded as ‘latin1’. The default is whitespace.
npz_objkey (str, optional) –
Dataset key to identify an array among multiple arrays stored in ‘.npz’ format. If no key was set during ‘npz’ storage, arr_0 should be used.

New in version 0.2.7: Capable of reading text and NumPy formats (‘.npy’ and ‘.npz’). Note that when data is stored in the compressed “.npz” format, the ‘.npz’ object key should be provided as the argument of parameter npz_objkey. If None, only the first array is read, i.e. npz_objkey='arr_0'.
verbose (bool, default=0) – Output messages for user guidance.
read_kws (dict) – Additional keyword arguments passed to the pandas file-reading function.
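The ‘arr_0’ default for npz_objkey follows NumPy's own naming convention: arrays passed positionally to np.savez are stored under arr_0, arr_1, and so on. The snippet below shows that convention with plain NumPy and an in-memory buffer; it does not call read_data itself.

```python
import io

import numpy as np

buf = io.BytesIO()
# Arrays passed positionally are stored under the keys arr_0, arr_1, ...
np.savez(buf, np.arange(3), np.ones(2))
buf.seek(0)

with np.load(buf) as npz:
    keys = sorted(npz.files)
    first = npz["arr_0"]  # the array read when no explicit key is given
```

Arrays saved with keyword names (e.g. np.savez(buf, ves=arr)) must instead be read back with that name, which is what npz_objkey is for.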

Returns:

f – A dataframe with head contents by default.

Return type:

pandas.DataFrame

See also

np.loadtxt: load text file.
np.load: Load uncompressed or compressed numpy .npy and .npz formats.
watex.utils.baseutils.save_or_load: Save or load numpy arrays.

watex.selectfeatures(df, features=None, include=None, exclude=None, coerce=False, **kwd)[source]#

Select features and return new dataframe.

Parameters:

df – A dataframe for feature selection.
features – List of features to select. The features must exist in the dataframe, otherwise an error occurs.
include – The type of data to retrieve in the dataframe df. Can be number.
exclude – The type of data to exclude from the dataframe df. Can be number, i.e. only non-numeric data will be kept in the returned data.
coerce – Return the whole dataframe with numeric columns transformed. Be aware that no selection is done and no error is raised. Default is False.
kwd – Additional keyword arguments for the pandas DataFrame.astype function.

Ref:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html
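The include, exclude and features options correspond closely to plain pandas operations. The snippet below is an assumption-labelled illustration with pandas alone (DataFrame.select_dtypes and column indexing); it does not claim to reproduce selectfeatures line by line, and the sample columns are made up.

```python
import pandas as pd

df = pd.DataFrame({"depth": [1.0, 2.0], "rock": ["granite", "schist"], "k": [3, 4]})

numeric = df.select_dtypes(include="number")      # akin to include='number'
non_numeric = df.select_dtypes(exclude="number")  # akin to exclude='number'
subset = df[["depth", "rock"]]                    # explicit features; KeyError if missing
```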

watex.sfi(cz, p=None, s=None, dipolelength=None, view=False, raw=False, return_components=False, **plotkws)[source]#

Compute the pseudo-fracturing index known as sfi.

The sfi parameter does not indicate the rock fracturing degree underground, but it is used to speculate about the apparent resistivity dispersion ratio around the cumulative sum of the resistivity values of the selected anomaly. It uses an approach similar to the IF parameter proposed by Dieng et al. (2004). Furthermore, its threshold is set to \(\sqrt{2}\) for a symmetrical anomaly characterized by a perfect distribution of resistivity in a homogeneous medium. The formula is given by:

\[sfi=\sqrt{(P_a^{*}/P_a )^2+(M_a^{*}/M_a )^2}\]

where \(P_a\) and \(M_a\) are the anomaly power and magnitude, respectively, and \(P_a^{*}\) and \(M_a^{*}\) are the projected power and magnitude of the lower point of the selected anomaly.
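Given the four quantities, the formula itself is a one-liner. The sketch below (hypothetical name sfi_formula) assumes \(P_a^{*}\), \(P_a\), \(M_a^{*}\) and \(M_a\) are already known; computing the projected values from a fitted curve, as watex does, is not shown. When the projected values equal the actual ones, both ratios are 1 and the index equals the \(\sqrt{2}\) threshold quoted above.

```python
import math


def sfi_formula(pa_star, pa, ma_star, ma):
    """Hypothetical sketch: sfi = sqrt((Pa*/Pa)^2 + (Ma*/Ma)^2)."""
    return math.sqrt((pa_star / pa) ** 2 + (ma_star / ma) ** 2)
```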

Parameters:

cz (array-like,) – Selected conductive zone
p (array-like,) – Station positions of the conductive zone.
dipolelength (float, optional) – Dipole length. If p is not given, it is set automatically from the dipole length to match the cz size. The default value is 10.
view (bool, default=False,) – Visualize the fitting curve. Default is False.
raw (bool, default=False) – Overlay the fitting curve on the raw curve from cz.
return_components (bool, default=False) – If True, returns the different components used to compute sfi, especially for external visualization.
plotkws (dict) – Matplotlib plot keyword arguments.

Returns:

sfi – value computed for pseudo-fracturing index

Return type:

float

Examples

>>> import numpy as np
>>> from watex.property import P
>>> from watex.utils.exmath import sfi
>>> rang = np.random.RandomState (42)
>>> condzone = np.abs(rang.randn (7))
>>> # no visualization and default value `s` with global minimal rho
>>> pfi = sfi(condzone)
>>> pfi
3.35110143
>>> # visualize fitting curve
>>> plotkws = dict(rlabel='Conductive zone (cz)',
...                label='fitting model',
...                color=f'{P().frcolortags.get("fr3")}',
...                )
>>> sfi(condzone, view=True, s=5, figsize=(7, 7),
...     **plotkws)
(array([ 0., 10., 20., 30.]), 1)

References

See NumPy polyfit.
See Stack Overflow, the answer of AkaRem edited by Tobu and Migilson.
See NumPy errstate and how to implement the context manager.

watex.shape(cz, s=Ellipsis, p=Ellipsis)[source]#

Compute the shape of anomaly.

The shape parameter is mostly used in the basement medium to identify the best conductive zone for the drilling location. According to Sombo et al. (2011; 2012), various shapes of anomalies can be described, such as:

“V”, “U”, “W”, “M”, “K”, “C”, and “H”

The shape is computed by feeding the algorithm with the Electrical Resistivity Profiling resistivity values and specifying the station \((S_{VES})\). Usually, \(S_{VES}\) is the station with a very low resistivity value, expected to be the drilling location.

Parameters:

cz – array-like - Conductive zone resistivity values
s – int, str - Station position index or name.
p – Array-like - Should be the position of the conductive zone.

Note

If s is given, p must be provided; if p is missing, an error is raised.

Returns:

str - the shape of anomaly.

Example:

>>> import numpy as np
>>> rang = np.random.RandomState(42)
>>> from watex.utils.exmath import shape
>>> test_array1 = np.arange(10)
>>> shape(test_array1)
'C'
>>> test_array2 = rang.randn(7)
>>> shape(test_array2)
'H'
>>> # the shape does not change whatever the resistivity values
>>> test_array3 = np.power(10, test_array2, dtype=np.float32)
>>> shape(test_array3)
'H'

References

Sombo, P. A., Williams, F., Loukou, K. N., & Kouassi, E. G. (2011). Contribution de la Prospection Électrique à L’identification et à la Caractérisation des Aquifères de Socle du Département de Sikensi (Sud de la Côte d’Ivoire). European Journal of Scientific Research, 64(2), 206–219.
Sombo, P. A. (2012). Application des methodes de resistivites electriques dans la determination et la caracterisation des aquiferes de socle en Cote d’Ivoire. Cas des departements de Sikensi et de Tiassale (Sud de la Cote d’Ivoire). Universite Felix Houphouet Boigny.

watex.show_versions()[source]#

Print useful debugging information.

New in version 0.1.3.

watex.smart_label_classifier(arr, /, values=None, labels=None, order='soft', func=None, raise_warn=True)[source]#

Map a numeric array into class labels using a mapping function or given fixed values.

New classes created from the fixed values can be renamed if labels are supplied.

Parameters:

arr (Arraylike 1d,) – array-like whose items are expected to be categorized.
values (float, list of float,) – The threshold item values from which the default categorization must be fixed.
labels (int |str| or List of [str, int],) – The label values corresponding to the fixed values. Note that the number of labels should equal the number of fixed values plus one; otherwise, a ValueError is raised if order is set to strict.
order (str, ['soft'|'strict'], default='soft',) – If order is strict, the values passed must be items of arr; otherwise a warning is raised.
func (callable, optional) – Function to map the given array. If given, values do not need to be supplied.
raise_warn (bool, default=True) – Raise a warning message if order=soft and the fixed values are not found in arr. Also raise warnings if the labels argument does not match the number of classes derived from the fixed values.
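The fixed-value categorization shown in the examples (class 0 for v <= 1, class 1 for 1 < v <= 3.2, class 2 above) can be reproduced with NumPy's digitize using right-closed bins. This is a sketch of the idea with the hypothetical name threshold_classify, not the watex implementation, and it ignores the order and raise_warn checks.

```python
import numpy as np


def threshold_classify(arr, values, labels=None):
    """Hypothetical sketch: bin arr with the fixed threshold values and
    optionally rename the resulting classes with labels."""
    arr = np.asarray(arr, dtype=float)
    values = np.sort(np.atleast_1d(np.asarray(values, dtype=float)))
    # right=True: v <= values[0] -> 0, values[0] < v <= values[1] -> 1, ...
    classes = np.digitize(arr, values, right=True)
    if labels is None:
        return classes
    # One label per class, i.e. len(values) + 1 labels are expected.
    return np.asarray(labels, dtype=object)[classes]
```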

Returns:

arr – categorized array with the same length as the raw

Return type:

array-like 1d

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import smart_label_classifier
>>> sc = np.arange (0, 7, .5 )
>>> smart_label_classifier (sc, values = [1, 3.2 ])
array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2], dtype=int64)
>>> # rename labels <=1 : 'l1', ]1; 3.2]: 'l2' and >3.2 :'l3'
>>> smart_label_classifier (sc, values = [1, 3.2 ], labels =['l1', 'l2', 'l3'])
array(['l1', 'l1', 'l1', 'l2', 'l2', 'l2', 'l2', 'l3', 'l3', 'l3', 'l3',
       'l3', 'l3', 'l3'], dtype=object)
>>> def f(v):
...     if v <= 1: return 'l1'
...     elif 1 < v <= 3.2: return "l2"
...     else: return "l3"
...
>>> smart_label_classifier (sc, func= f )
array(['l1', 'l1', 'l1', 'l2', 'l2', 'l2', 'l2', 'l3', 'l3', 'l3', 'l3',
       'l3', 'l3', 'l3'], dtype=object)
>>> smart_label_classifier (sc, values = 1.)
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int64)
>>> smart_label_classifier (sc, values = 1., labels='l1')
array(['l1', 'l1', 'l1', 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=object)
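The threshold-based categorization shown in these examples can be reproduced with plain NumPy. The sketch below is not watex's implementation: threshold_classify is a hypothetical helper that assumes each class is closed on the right (]values[i-1], values[i]]), which matches the outputs above. Unlike watex, it raises on a label-count mismatch instead of warning.

```python
import numpy as np

def threshold_classify(arr, values, labels=None):
    """Hypothetical helper: bin `arr` with the thresholds in `values`.

    Class 0 holds items <= values[0], class i holds items in
    ]values[i-1], values[i]] and the last class holds items > values[-1].
    """
    arr = np.asarray(arr, dtype=float)
    thresholds = np.sort(np.atleast_1d(np.asarray(values, dtype=float)))
    # right=True closes each interval on the right: bins[i-1] < x <= bins[i]
    classes = np.digitize(arr, thresholds, right=True)
    if labels is None:
        return classes
    labels = list(labels)
    if len(labels) != len(thresholds) + 1:
        # watex warns here in 'soft' mode (see raise_warn); this sketch is stricter.
        raise ValueError(
            f"expected {len(thresholds) + 1} labels, got {len(labels)}"
        )
    # Fancy indexing maps each class index to its label.
    return np.asarray(labels, dtype=object)[classes]

sc = np.arange(0, 7, 0.5)
print(threshold_classify(sc, [1, 3.2]))
print(threshold_classify(sc, [1, 3.2], labels=["l1", "l2", "l3"]))
```

np.digitize does the interval lookup in one vectorized call, so no Python loop over the items is needed.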

watex.to_numeric_dtypes(arr, *, columns=None, return_feature_types=Ellipsis, missing_values=nan, pop_cat_features=Ellipsis, sanitize_columns=Ellipsis, regex=None, fill_pattern='_', drop_nan_columns=True, how='all', reset_index=Ellipsis, drop_index=True, verbose=Ellipsis)

Convert an array to a DataFrame and coerce its columns to appropriate dtypes.

The function also provides tools to manipulate the transformed data, such as:

pop_cat_features to remove the categorical attributes,
sanitize_columns to clean the column names by removing undesirable characters,
drop_nan_columns to drop all columns and/or rows that contain only NaN values, …

Parameters:

arr (ndarray or DataFrame, shape (m_samples, n_features)) – Array or DataFrame to convert, to sanitize, or whose feature categories (numerical or categorical) are to be auto-detected.
columns (list of str, optional) – Column names used to create a DataFrame when an array is given. Their number must match the number of array columns (shape[1]).
return_feature_types (bool, default=False) – Return the lists of numerical and categorical features.
missing_values (float, default=NaN) – Value used to replace missing entries or empty strings, if any exist in the DataFrame.
pop_cat_features (bool, default=False) – Remove the categorical features from the DataFrame.
sanitize_columns (bool, default=False) –
Remove undesirable characters from the column names using the regex parameter (or its default pattern).

New in version 0.1.9.
regex (re object) –
Regular expression object used to clean the column names. The default is:
>>> import re
>>> re.compile(r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)
New in version 0.1.9.
fill_pattern (str, default='_') – String used to replace the non-alphabetic characters in each column name.
drop_nan_columns (bool, default=True) –
Remove all columns filled entirely with NaN values.
how (str, default='all') – Also drop NaN rows, i.e. rows composed entirely of NaN or null values.
reset_index (bool, default=False) –
Reset the index of the DataFrame.
drop_index (bool, default=True) –
Drop the previous index after resetting.
verbose (bool, default=False) – Output a message listing the categorical features dropped from the DataFrame, if any.

Returns:

df or (df, nf, cf) – Also return nf (numerical features) and cf (categorical features) if return_feature_types is set to True.

Return type:

DataFrame with values cast to numeric types

Examples

>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils.funcutils import to_numeric_dtypes
>>> X, y = load_bagoue (as_frame =True )
>>> X0 =X[['shape', 'power', 'magnitude']]
>>> X0.dtypes
shape        object
power        object
magnitude    object
dtype: object
>>> df = to_numeric_dtypes(X0)
>>> df.dtypes
shape         object
power        float64
magnitude    float64
dtype: object
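The core coercion idea (try to parse each column as numbers, keep it as categorical otherwise, then drop empty columns and split the feature names) can be sketched with pandas. coerce_numeric is a hypothetical helper, not watex's code; it assumes how is forwarded to pandas DataFrame.dropna for rows, and it ignores column sanitizing and missing-value replacement. The frame below is toy data.

```python
import numpy as np
import pandas as pd

def coerce_numeric(frame, drop_nan_columns=True, how="all"):
    """Hypothetical helper: a simplified take on the coercion step.

    Returns the coerced frame plus the numerical (nf) and categorical (cf)
    feature names.
    """
    df = frame.copy()
    for col in df.columns:
        try:
            # Succeeds only if every value in the column parses as a number.
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass  # leave non-numeric columns (e.g. 'shape') untouched
    if drop_nan_columns:
        df = df.dropna(axis=1, how="all")  # columns made only of NaN
    df = df.dropna(axis=0, how=how)  # rows, following `how`
    nf = df.select_dtypes(include="number").columns.tolist()
    cf = df.select_dtypes(exclude="number").columns.tolist()
    return df, nf, cf

X0 = pd.DataFrame({
    "shape": ["W", "V", "K"],
    "power": ["10", "20.5", "30"],
    "magnitude": ["1.0", "2.5", "3.0"],
    "empty": [np.nan, np.nan, np.nan],
})
df, nf, cf = coerce_numeric(X0)
print(df.dtypes)
```

The try/except around pd.to_numeric is used rather than errors='ignore', which recent pandas versions deprecate.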

watex.type_(erp)

Compute the type of anomaly.

The type parameter is defined in the African Hydraulic Study Committee report (CIEH, 2001) and was later adopted by other authors (Adam et al., 2020; Michel et al., 2013; Nikiema, 2012). The type helps to differentiate two or more anomalies that have the same shape. For instance, two anomalies with the same shape W differ by the order of priority of their types. The type depends on the lateral resistivity distribution of the subsurface (reflected by the trend of the apparent resistivity curve) along the whole Electrical Resistivity Profiling survey line. Four types of anomalies are distinguished:

“EC”, “CB2P”, “NC” and “CP”.

For more details, refer to the references below.

Parameters:

erp (array-like) – Apparent resistivity values collected along an Electrical Resistivity Profiling line.

Returns:

str – The type of anomaly.

Example:

>>> import numpy as np
>>> from watex.utils.exmath import type_
>>> rang = np.random.RandomState(42)
>>> test_array2 = rang.randn (7)
>>> type_(np.abs(test_array2))
'EC'
>>> long_array = np.abs(rang.randn(71))
>>> type_(long_array)
'PC'

References

Adam, B. M., Abubakar, A. H., Dalibi, J. H., Khalil Mustapha, M., & Abubakar, A. H. (2020). Assessment of Gaseous Emissions and Socio-Economic Impacts From Diesel Generators used in GSM BTS in Kano Metropolis. African Journal of Earth and Environmental Sciences, 2(1), 517–523. https://doi.org/10.11113/ajees.v3.n1.104
CIEH. (2001). L’utilisation des méthodes géophysiques pour la recherche d’eaux dans les aquifères discontinus. Série Hydrogéologie, 169.
Michel, K. A., Drissa, C., Blaise, K. Y., & Jean, B. (2013). Application de méthodes géophysiques à l’étude de la productivité des forages d’eau en milieu cristallin : cas de la région de Toumodi (Centre de la Côte d’Ivoire). International Journal of Innovation and Applied Studies, 2(3), 324–334.
Nikiema, D. G. C. (2012). Essai d’optimisation de l’implantation géophysique des forages en zone de socle : Cas de la province de Séno, Nord Est du Burkina Faso (IRD). (I. / I. Ile-de-France, Ed.). IST / IRD Ile-de-France, Ouagadougou, Burkina Faso, West-africa. Retrieved from http://documentation.2ie-edu.org/cdi2ie/opac_css/doc_num.php?explnum_id=148

watex.vesSelector(data=None, *, rhoa=None, AB=None, MN=None, index_rhoa=None, xy_coords=None, is_utm=False, utm_zone=None, epsg=None, **kws)

Assert the validity of Vertical Electrical Sounding data and return a sanitized DataFrame.

param rhoa:

array-like - Apparent resistivities collected during the sounding.

param AB:

array-like - Distance between the current electrodes. Note that, by convention, AB stands for AB/2, i.e. half the current-electrode spacing, which controls the investigation depth.

param MN:

array-like - Distance between the potential electrodes at each investigation depth. By convention, the values are half-spacings, i.e. MN/2.

param data:

Path-like object or sounding DataFrame. If given, the other parameters can be left as None.

param index_rhoa:

int - The index used to retrieve the resistivity data of a specific sounding point. The sounding data are sometimes composed of several soundings collected in the same survey area, along different Electrical Resistivity Profiling lines. For instance:

AB/2    MN/2    SE1    SE2    SE3    …    SEn

Where SE denotes the electrical sounding data values and n is the number of selected sounding points. SE1, SE2 and SE3 are three points selected for Vertical Electrical Sounding, i.e. three sounding points carried out either on the same Electrical Resistivity Profiling line or elsewhere. Each sounding column holds resistivity data identified by a specific number. These numbers are usually chosen arbitrarily and do not refer to the expected best fracture zone selected after the preliminary interpretation. After transformation by vesSelector, the data headers should hold the resistivity. For instance, referring to the table above, the data should become:

AB    MN    resistivity    resistivity    resistivity    …

Therefore, index_rhoa is used to select a specific set of resistivity values, i.e. the sounding number of the Vertical Electrical Sounding used to locate the drilling or used for computation. For example, index_rhoa=1 should yield:

AB/2    MN/2    SE2    -->    AB    MN    resistivity

If index_rhoa is None and there is more than one sounding curve, the first curve is selected by default, i.e. index_rhoa=0.

param xy_coords:

tuple (float, float) - Coordinates of the sounding point. Must be (‘longitude’, ‘latitude’) or (‘easting’, ‘northing’). If xy_coords is given as (‘easting’, ‘northing’), set is_utm=True to trigger the conversion to (‘longitude’, ‘latitude’). If is_utm is False, a warning is raised when the values exceed 180 and 90 degrees for longitude and latitude respectively. Note that if the coordinates exist in the DataFrame, they take priority.

New in version 0.2.1.

param is_utm:

bool, default=False - Convert the (‘easting’, ‘northing’) coordinates from xy_coords to (‘longitude’, ‘latitude’).

param utm_zone:

str, default=’49R’ - Required when xy_coords is passed as (‘easting’, ‘northing’) for conversion.

param epsg:

int or str, optional - EPSG number defining the projection. See http://spatialreference.org/ref/ for more info. Overrides utm_zone if both are provided.

param kws:

dict - Additional keyword arguments for reading the data with pandas.

return:

DataFrame - Sanitized Vertical Electrical Sounding DataFrame with AB, MN and resistivity as the column headers.

Example:
>>> from watex.utils.coreutils import vesSelector
>>> df = vesSelector (data='data/ves/ves_gbalo.csv')
>>> df.head(3)
   AB   MN  resistivity
0   1  0.4          943
1   2  0.4         1179
2   3  0.4         1103
>>> df = vesSelector('data/ves/ves_gbalo.csv', index_rhoa=3)
>>> df.head(3)
   AB   MN  resistivity
0   1  0.4          457
1   2  0.4          582
2   3  0.4          558
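The index_rhoa selection described above amounts to keeping the two spacing columns plus one sounding column and renaming the headers. The sketch below uses pandas; select_sounding is a hypothetical helper, not part of watex, and it assumes the raw frame lists AB/2 and MN/2 first, followed by one column per sounding curve (SE1 … SEn). The resistivity values are toy numbers.

```python
import pandas as pd

def select_sounding(frame, index_rhoa=None):
    """Hypothetical helper mimicking the index_rhoa selection.

    Assumes columns 0 and 1 hold AB/2 and MN/2 and every following
    column holds one sounding curve (SE1, SE2, ..., SEn).
    """
    n_curves = frame.shape[1] - 2
    if n_curves < 1:
        raise ValueError("expected at least one sounding column")
    # Default to the first sounding curve, as documented for index_rhoa=None.
    index_rhoa = 0 if index_rhoa is None else int(index_rhoa)
    if not 0 <= index_rhoa < n_curves:
        raise IndexError(f"index_rhoa must be in [0, {n_curves - 1}]")
    # Keep AB/2, MN/2 and the selected curve, then rename the headers.
    out = frame.iloc[:, [0, 1, 2 + index_rhoa]].copy()
    out.columns = ["AB", "MN", "resistivity"]
    return out

raw = pd.DataFrame({
    "AB/2": [1, 2, 3],
    "MN/2": [0.4, 0.4, 0.4],
    "SE1": [100, 120, 140],
    "SE2": [200, 220, 240],
})
print(select_sounding(raw, index_rhoa=1))
```

Positional selection with iloc keeps the sketch independent of how the sounding columns are named in the raw file.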

