watex.methods package#

The methods sub-package comprises DC-resistivity, EM, and hydrogeological methods for computing prediction parameters, as well as for exporting filtered tensors for 1D/2D modeling.

class watex.methods.AqGroup(kname=None, aqname=None, method='naive', keep_label_0=False, **kws)[source]#

Bases: HData

A group of aquifer is mostly tied to area information gathered once data from multiple boreholes have been collected.

However, when predicting ‘k’ from a dataset with missing k-values using the Mixture Learning Strategy (MXS), the problem is addressed by creating a Naive Group of Aquifer (NGA) to compensate for the missing k-values. This helps to avoid introducing much bias, since the group of aquifer is mostly tied to the permeability coefficient ‘k’. To do this, unsupervised learning is used to predict the NGA labels, and the NGA labels are then used to fill the missing k-values. The best strategy is to measure the importance of each aquifer group against the true k-values at each depth and to find the most representative group. Once the most representative group is found for each true label ‘k’, the group of aquifer is renamed after its naive similarity with the true k-label. For instance, if the true k-value is label 1 and label 1 is most represented in the group of aquifer ‘IV’, this group is replaced throughout the column with ‘k1’ + ‘IV’, i.e. ‘k14’. This new label is used to fill the true label ‘y_true’, which becomes the MXS target (it includes the NGA labels). Note that true labels with a valid k-value remain unchanged. The same process is repeated for labels 2, 3 and so on. The selection of an MXS label from the NGA strongly depends on its preponderance (importance rate) in the whole dataset.

The example below demonstrates how to compute the group representativity in a dataset.

Parameters:

kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. The integer must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
aqname (str, optional) – Name of the aquifer group column. aqname is used to retrieve the aquifer group array arr_aq from a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument. Note that the log data need not contain a group of aquifer; it is required only if the label similarity needs to be calculated.
g (dict) – Dictionary of the occurrences between the true labels and the group of aquifer, expressed as a function of occurrence and representativity.

Example

>>> from watex.methods.hydro import AqGroup
>>> hg = AqGroup(kname='k', aqname='aquifer_group').fit(hdata)
>>> hg.findGroups()
 _Group(Label=[' 0 ',
                   Preponderance( rate = ' 100.0  %',
                                [('Groups', {'II': 1.0}),
                                 ('Representativity', ( 'II', 1.0)),
                                 ('Similarity', 'II')])],
             )
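
The renaming step described for the Naive Group of Aquifer can be sketched in plain Python. This is a minimal illustration and not the watex implementation: the helper name make_mxs_target, the ROMAN lookup table and the toy data are assumptions, and missing k-values are represented by None.

```python
from collections import Counter

# Hypothetical lookup turning a group name such as 'IV' into the digit 4.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def make_mxs_target(y_true, nga):
    """Fill missing k-labels (None) using the NGA labels (illustrative only)."""
    # For every true label, find the aquifer group it occurs with most often.
    best_group = {}
    for label in sorted({k for k in y_true if k is not None}):
        counts = Counter(g for k, g in zip(y_true, nga) if k == label)
        best_group[label] = counts.most_common(1)[0][0]

    # Rename each representative group after its true label, e.g. 'IV' -> 'k14'.
    rename = {g: f"k{label}{ROMAN[g]}" for label, g in best_group.items()}

    # Valid k-labels stay unchanged; missing ones take the (renamed) NGA label.
    return [k if k is not None else rename.get(g, g) for k, g in zip(y_true, nga)]


y_true = [1, 1, None, 2, None, None]         # made-up labels, None = missing k
nga = ["IV", "IV", "IV", "II", "II", "III"]  # made-up NGA labels
y_mxs = make_mxs_target(y_true, nga)
```

With these toy inputs the missing entries become 'k14', 'k22' and 'III'. In this sketch, a group that is not the most representative one for any true label simply keeps its NGA name.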

findGroups(method='naive', default_arr=None, **g_kws)[source]#

Find the existing groups between the permeability coefficient k and the group of aquifer.

It computes the occurrences between the true labels and the group of aquifer as a function of occurrence and representativity.

Parameters:

keep_label_0 (bool, default=False) – Whether to keep the label 0 in the prediction. A predicted label 0 refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded and treated as a missing k-value rather than as a concrete label. To avoid any confusion, ‘0’ is altered to ‘1’: all class labels are shifted by +1, which excludes the label ‘0’. To force the label 0 to be included, set keep_label_0 to True.
method (str ['naive', 'strict'], default='naive') –
The strategy used to compute the representativity of a label in the predicted array ‘y_pred’:
- naive computes the importance of the label from the number of its
  occurrences for this specific label in the array ‘y_true’. It does not take the occurrences of the other existing labels into account. This is useful for unbalanced class labels in y_true.
- strict computes the importance of the label from the number of
  occurrences in the whole valid y_true, i.e. over the total number of occurrences of all the labels that exist in the whole ‘arr_aq’. This gives suitable results if the labels in y_pred are not unbalanced.

Returns:

g – Use attribute .groups to find the group values.

Return type:

_Group: _Group class object
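
The difference between the naive and strict strategies can be illustrated with a short counting sketch. It is an interpretation of the description above, not the library code: the function name representativity, the use of None for missing k-values and the toy arrays are all assumptions.

```python
from collections import Counter


def representativity(y_true, groups, label, method="naive"):
    """Rate of each aquifer group for one true label (illustrative only)."""
    # Keep only the samples that have a valid (non-missing) k-label.
    valid = [(k, g) for k, g in zip(y_true, groups) if k is not None]
    in_label = Counter(g for k, g in valid if k == label)
    if method == "naive":
        # Normalise by the occurrences of this label only.
        total = sum(in_label.values())
    elif method == "strict":
        # Normalise by the occurrences of all valid labels.
        total = len(valid)
    else:
        raise ValueError("method must be 'naive' or 'strict'")
    return {g: n / total for g, n in in_label.items()}


# Made-up data: None marks a missing k-value.
y_true = [1, 1, 1, 1, 2, 2, 2, 2, None]
groups = ["II", "II", "II", "IV", "II", "III", "III", "III", "IV"]

rep_naive = representativity(y_true, groups, label=1, method="naive")
rep_strict = representativity(y_true, groups, label=1, method="strict")
```

Here group 'II' accounts for 75 % of label 1 under the naive rule but only 37.5 % under the strict rule, because the strict rule divides by the eight valid samples rather than by the four samples of label 1.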

class watex.methods.AqSection(aqname=None, kname=None, zname=None, **kws)[source]#

Bases: HData

Aquifer section class

Get the section of each aquifer from a dataframe.

The unique ‘upper’ and ‘lower’ section bounds delimit the range of the whole data that is considered valid. The aquifer section is computed in order to shrink the data of the whole boreholes. In most cases, only the data inside the section are considered valid and used as the predictor Xr. Data outside the aquifer section can be discarded or compressed to the top of Xr.

Parameters:

aqname (str, optional) – Name of the aquifer group column. aqname is used to retrieve the aquifer group array arr_aq from a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument. Note that the log data need not contain a group of aquifer; it is required only if the label similarity needs to be calculated.
kname (str, int) – Name of the permeability coefficient column. kname is used to retrieve the permeability coefficient ‘k’ from a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. The integer must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
zname (str, int) – Name of the depth column. zname is used to retrieve the depth column from a dataframe. If an integer is passed, it is taken as the index of the depth column in the dataframe. The integer must not exceed the dataframe size along axis 1. zname commonly needs to be supplied when a dataframe is passed as a function argument.

findSection(z=None, depth_unit='m')[source]#

Find the valid aquifer section (upper and lower bounds).

Parameters: z (array-like 1d, pandas.Series) – Array of depths or a pandas Series that contains the depth values. Arrays of two or more dimensions are not allowed. When z is given as a dataframe and zname is not supplied, an error is raised, since zname is used to fetch z from the dataframe and overwrite it.
Returns: self.section_ – Valid upper and lower section bounds, in SI units (m) if the depth values are given in meters.
Return type: list of float
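
As a rough illustration of what an upper and lower section means, the sketch below takes the shallowest and deepest depths at which an aquifer group is recorded. This is only an assumption about the idea, not the actual findSection logic; the name find_section and the sample log are hypothetical.

```python
def find_section(z, aq):
    """Return [upper, lower] depths where an aquifer group is defined.

    z  : sequence of depths (m)
    aq : sequence of aquifer group names, None where no aquifer is logged
    """
    depths = [d for d, a in zip(z, aq) if a is not None]
    if not depths:
        raise ValueError("no aquifer found in the log")
    return [min(depths), max(depths)]


# Made-up borehole log sampled every 10 m.
z = [0.0, 10.0, 20.0, 30.0, 40.0]
aq = [None, "I", "I", "II", None]
section = find_section(z, aq)
```

Rows shallower than 10 m or deeper than 30 m would fall outside the section in this toy log and could then be discarded or compressed, as described for the class.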

class watex.methods.DCProfiling(stations=None, dipole=10.0, auto=False, keep_params=False, read_sheets=False, **kws)[source]#

Bases: ElectricalMethods

A collection of DC-resistivity profiling classes.

It reads the data and computes the electrical parameters. Each survey line composes a specific object that gathers all the attributes of ResistivityProfiling for easy use. For instance, the expected drilling location and its resistivity value for two survey lines (line1 and line2) can be fetched as:

>>> <object>.line1.sves_ ; <object>.line1.sves_resistivity_
>>> <object>.line2.sves_ ; <object>.line2.sves_resistivity_

Parameters:

stations (list or str (path-like object)) –
List of the names of the stations where the drilling is expected to be located. It is strongly linked to the names used to specify the center position of each dipole when the survey data are collected. Each survey can have its own way of numbering the positions; however, if the station is given, there should be one per survey line (presumed to be the suitable point for drilling). It is commonly called the sves, meaning that the DC-sounding will be carried out at this point. Be sure to provide the correct station to compute the electrical parameters.

It is recommended to provide the position of the station expected to hold the drilling. However, if stations is None, the automatic computation of the electrical features is triggered. The user can also provide the list of stations by hand. In that case, the stations should be numbered from 1, not 0. For instance:
- in a survey line of 20 positions, station 13 is considered
  the best point to locate the drilling, so the name of the station should be ‘S13’. In another survey line (line2), the second point of the survey is considered the suitable one to locate the drilling. Considering the two survey lines, the list of stations should be [‘S13’, ‘S2’].
- stations can also be arranged in a single file to be parsed, which
  refers to the string argument.
dipole (float) – The dipole length used during the survey. If the dipole value is set as a keyword argument, the station names are overwritten and are henceforth derived from the value of the dipole. For instance, for a dipole of 10 m, the first station is S00, the second S10, the third S20 and so on. However, it is recommended to name the stations using counting numbers rather than the dipole positions.
auto (bool) – Automatically detect the best conductive zone. If True, the station position is the station with the lowest resistivity value in the Electrical Resistivity Profiling.
keep_params (bool, default=False) – If True, keeps only the predicted parameters in the summary table; otherwise, also returns useful details of the line, such as the geographical coordinates where the DC predicted parameters are computed.
read_sheets (bool) – Read the data in sheets. It assumes that the data of each survey line are arranged in a single Excel worksheet. Note that if read_sheets is set to True and the file is not in Excel format, a TypeError is raised.
fit_params (dict) – Additional Electrical Resistivity Profiling keyword arguments.
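
The dipole-based station naming and the auto selection of the most conductive station can be sketched as follows. Both helpers (station_names, auto_station), the padding width and the resistivity values are assumptions made for illustration; the library may format names differently (the examples below show three-digit names such as 'S036').

```python
def station_names(n_stations, dipole=10, width=2):
    """Name stations after their position along the line: S00, S10, S20, ..."""
    return [f"S{int(i * dipole):0{width}d}" for i in range(n_stations)]


def auto_station(names, resistivities):
    """Pick the station with the lowest apparent resistivity."""
    idx = min(range(len(resistivities)), key=resistivities.__getitem__)
    return names[idx], resistivities[idx]


names = station_names(3, dipole=10)   # made-up line of three stations
rhoa = [120, 80, 95]                  # made-up resistivities in ohm.m
best_station, best_rhoa = auto_station(names, rhoa)
```

With a 10 m dipole the three stations are named S00, S10 and S20, and the automatic choice falls on S10 because it carries the lowest resistivity.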

Examples

-> Get the DC-resistivity profiling from the individual Resistivity objects

>>> from watex.methods import ResistivityProfiling
>>> from watex.methods import DCProfiling
>>> robj1= ResistivityProfiling(auto=True) # auto detection
>>> robj1.utm_zone = '50N'
>>> robj1.fit('data/erp/testsafedata.xlsx')
>>> robj1.sves_
... 'S036'
>>> robj2= ResistivityProfiling(auto=True, utm_zone='40S')
>>> robj2.fit('data/erp/l11_gbalo.xlsx')
>>> robj2.sves_
... 'S006'
>>> # read both objects
>>> dcobjs = DCProfiling()
>>> dcobjs.fit([robj1, robj2])
>>> dcobjs.sves_
... array(['S036', 'S006'], dtype=object)
>>> dcobjs.line1.sves_ # => robj1.sves_
>>> dcobjs.line2.sves_ # => robj2.sves_

-> Read from a collection of Excel data

>>> datapath = r'data/erp'
>>> dcobjs.read_sheets=True
>>> dcobjs.fit(datapath)
>>> dcobjs.nlines_  # getting the number of survey lines
... 9
>>> dcobjs.sves_ # stations of the best conductive zone
... array(['S017', 'S006', 'S000', 'S036', 'S036', 'S036', 'S036', 'S036',
       'S001'], dtype='<U33')
>>> dcobjs.sves_resistivities_ # the lowest resistivities (most conductive stations)
... array([  80,   50, 1101,  500,  500,  500,  500,  500,   93], dtype=int64)
>>> dcobjs.powers_
... array([ 50,  60,  30,  60,  60, 180, 180, 180,  40])

-> Read the data and all sheets, assuming all the data are arranged in sheets

>>> dcobjs.read_sheets=True
>>> dcobjs.fit(datapath)
>>> dcobjs.nlines_ # here it assumes all the data are in single worksheets.
... 4
>>> dcobjs.line4.conductive_zone_ # conductive zone of the line 4
... array([1460, 1450,  950,  500, 1300, 1630, 1400], dtype=int64)
>>> dcobjs.sfis_
... array([1.05085691, 0.07639077, 0.03592814, 0.07639077, 0.07639077,
       0.07639077, 0.07639077, 0.07639077, 1.08655919])
>>> dcobjs.line3.sfi_ # => robj1.sfi_
... array([0.03592814]) # for line 3

fit(*data, **fit_params)[source]#

Read and fit the collection of data.

Parameters:

data (list of path-like objects, or ResistivityProfiling objects) – Data containing the collection of DC-resistivity values of multiple survey areas.
fit_params (str) – Additional keyword arguments from watex.utils.coreutils.parseStations(). It refers to the station_delimiter parameter, used when the attribute stations is given as a path-like object. If the stations are laid out on the same line, it is convenient to provide the delimiter to parse them.

Return type:

object instantiated from ResistivityProfiling.

Notes

The stations should be numbered from 1, not 0, and should match the number of survey lines. Each survey line is expected to hold one drilling position.

property inspect#: Check whether the object is fitted.

summary(return_table=True)[source]#

Aggregate the DC-Profiling parameters to compose a parameter table.

Parameters:

return_table (bool, default=True) – If True, returns the table of DC parameters at all sites; otherwise, returns the DCProfiling instantiated object.

Returns:

The table if return_table is True, and the DCProfiling instantiated object otherwise.

class watex.methods.DCSounding(search=45.0, rho0=None, h0=1.0, read_sheets=False, strategy='HMCMC', vesorder=None, typeofop='mean', objective='coverall', keep_params=False, **kws)[source]#

Bases: ElectricalMethods

Direct-Current Electrical Sounding

A collection of Vertical Electrical Sounding (VES) classes that computes the predictor parameters accordingly.

The VES is carried out to speculate about the existence of a fracture zone and the layer thicknesses. It commonly supplements Electrical Resistivity Profiling, after the best conductive zone has been selected, when the survey is one-dimensional. Data from each DC-sounding site can be retrieved using:
>>> <object>.site<number>.<attr>_
where <attr>_ is an attribute of VerticalSounding. For instance, to fetch the position and the resistivity in depth of the fractured zone for the first site, use:
>>> <object>.site1.fractured_zone_
>>> <object>.site1.fractured_zone_resistivity_

Parameters:

search: float , list of float

The depth or collection of depths, in meters, from which one expects to find a fracture zone away from pollution. The search parameter is used to speculate about the expected groundwater in the fractured rocks below the average level of water inrush in a specific area. For instance, in the Bagoue region the average depth of water inrush is around 45 m, so search can be set to the average water inrush value.

rho0: float

Value of the starting resistivity model. If None, rho0 is half the minimum value of the apparent resistivity collected. The unit is Ω.m, not log10(Ω.m).

h0: float

Thickness of the first layer, in meters. If None, it is the minimum possible thickness, 1 m.

strategy: str

Type of inversion scheme. The default is Hybrid Monte Carlo (HMC), known as HMCMC. Another scheme is the Bayesian neural network approach (BNN).

vesorder: int

The index used to retrieve the resistivity data of a specific sounding point. Sometimes the sounding data are composed of different sounding values collected in the same survey area along different Electrical Resistivity Profiling lines. For instance:

AB/2	MN/2	SE1	SE2	SE3	…	SEn

where SE are the electrical sounding data values and n is the number of sounding points selected. SE1, SE2 and SE3 are three points selected for Vertical Electrical Sounding, i.e. three sounding points carried out either on the same Electrical Resistivity Profiling line or somewhere else. These sounding data are resistivity data identified by specific numbers. The numbers are commonly chosen at random and do not refer to the expected best fracture zone selected after the prior interpretation. After transformation via the function vesSelector(), the header of the data should hold the resistivity. For instance, referring to the table above, the data should be:

AB	MN	resistivity	resistivity	resistivity	…

Therefore, vesorder is used to select specific resistivity values, i.e. the sounding number of the Vertical Electrical Sounding where the drilling is expected to be located, or the one to use for computation. For example, vesorder=1 gives:

AB/2	MN/2	SE2	->	AB	MN	resistivity

If vesorder is None and there is more than one sounding curve, the first sounding curve is selected by default, i.e. rhoaIndex equals 0.

typeofop: str

Type of operation to apply to the resistivity values rhoa of the duplicated spacing points AB. The default operation is mean. Sometimes the measurement at a given AB is collected twice after slightly modifying the distance between the potential electrodes (MN). In that case, two or more resistivity values correspond to the same distance AB (AB remains unchanged while MN is changed). The operation then consists in either averaging (mean) the resistivity values, taking their median, or applying leaveOneOut (i.e. keeping one resistivity value among those collected at the same spacing AB). Note that for leaveOneOut, the selected resistivity value is chosen at random.

objective: str

Type of operation to output. By default, the function outputs the value of the pseudo-area in ohm.m^2. However, for plotting purposes, setting the argument to view outputs instead the recomputed and projected X and Y, as well as the X and Y values of the expected fractured zone, where X is the AB dipole spacing when imaging to depth and Y is the computed apparent resistivity.

keep_params: bool, default=False

If True, keeps only the predicted parameters in the summary table; otherwise, also returns useful details of the site, such as the depth AB/2 where the DC predicted area parameter is computed.

kws: dict

Additional keyword arguments from Vertical Electrical Sounding data operations. See watex.utils.exmath.vesDataOperator() for further details.
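
To make the effect of typeofop concrete, the sketch below merges resistivity values measured at the same AB spacing. It only mirrors the behaviour described for the parameter; the function reduce_duplicates, its seed argument and the readings are hypothetical and are not part of watex.

```python
import random
from statistics import mean, median


def reduce_duplicates(ab, rhoa, typeofop="mean", seed=None):
    """Merge resistivities sharing the same AB spacing (illustrative only)."""
    # Group the resistivity readings by their AB spacing.
    grouped = {}
    for spacing, value in zip(ab, rhoa):
        grouped.setdefault(spacing, []).append(value)

    rng = random.Random(seed)
    operations = {
        "mean": mean,
        "median": median,
        "leaveoneout": rng.choice,  # keep one reading picked at random
    }
    try:
        op = operations[typeofop.lower()]
    except KeyError:
        raise ValueError(f"unknown operation {typeofop!r}") from None

    spacings = sorted(grouped)
    return spacings, [op(grouped[s]) for s in spacings]


# Made-up sounding: AB = 2 m was measured twice with different MN.
ab = [1, 2, 2, 3]
rhoa = [100, 80, 120, 60]
ab_mean, rho_mean = reduce_duplicates(ab, rhoa, "mean")
ab_med, rho_med = reduce_duplicates(ab, rhoa, "median")
```

Both mean and median give 100 ohm.m at AB = 2 m in this toy case, while leaveOneOut would return either 80 or 120 depending on the random draw.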


fit(*data, **fit_params)[source]#

Fit the DC electrical sounding.

Fit the Vertical Electrical Sounding curves, compute the ohmic-area, and set all the features for demarcating the fractured zone from the selected anomaly.

Parameters:

data (list of path-like object, or DataFrames) – The string argument is a path-like object. It must be a valid file which holds the data collected in the field. It should be composed of the spacing values AB and the apparent resistivity values rhoa. By convention, AB is half-space data, i.e. AB/2. So, if data is given, the parameters AB and rhoa should be kept to None. If AB and rhoa are to be supplied, the user must set data to None for API purposes; otherwise an error is raised. The recommended way is to use the vesSelector tool, watex.utils.vesSelector(), to build the Vertical Electrical Sounding data before feeding it to the algorithm.
fit_params (dict) – Additional keyword arguments, specific to the readable files. Refer to watex.property.Config.parsers. Use key() to get all the readable formats.

Returns:

object

Return type:

A collection of Vertical Electrical Sounding objects

property inspect#: Check whether the object is fitted.

summary(return_table=True)[source]#

Aggregate the DC-Sounding parameters to compose a parameter table.

Parameters:

return_table (bool, default=True) – If True, returns the table of DC parameters at all sites; otherwise, returns the DCSounding instantiated object.

Returns:

The table if return_table is True, and the DCSounding instantiated object otherwise.

class watex.methods.EM(survey_name=None, verbose=0)[source]#

Bases: IsEdi

Create an EM object as a collection of EDI files.

Collect EDI files and create an EM object. It sets the properties from audio-magnetotelluric data. The two components XY and YX are set and calculated. It can also read MT data; however, the full handling of transfer functions such as Tipper and Spectra is not completed. Use other MT software for long-period data.

Parameters: survey_name (str) – Name of the location where the data were collected. If survey_name is None, it can be checked in the EDI files.

ediObjs_#

Array of the collection of EDI files successfully read.

Type: array-like of shape (N,)

data_#

Array of all EDI files fed to the EM module, whether successfully read or not.

Type: array-like of shape (N,)

edinames_#

Array of all EDI names successfully read.

Type: array-like of shape (N,)

edifiles_#

Array of all EDI files, if given.

Type: array of shape (N,)

freqs_#

Array of the frequency range from the EDIs.

Type: array-like of shape (N,)

refreq_#

Reference frequency for data correction. Note that the reference frequency is the highest frequency with clean data.

Type: float

Properties#

longitude#

Longitude coordinate values collected from EDIs

Type: array-like, shape (N,)

latitude#

Latitude coordinate values collected from EDIs

Type: array-like, shape (N,)

elevation#

Elevation coordinates collected from EDIs

Type: array-like, shape (N,)

property elevation#

exportedis(ediObj, new_Z, savepath=None, **kws)[source]#

Export new EDI files from the former object with the given new impedance tensors.

The export is assumed to be a new output EDI resulting from the application of multiple corrections.

Parameters:

ediObj (str or watex.edi.Edi) – Full path to an EDI file, or an EDI object from pycsamt or MTpy.
new_Z (ndarray (nfreq, 2, 2)) – Array of impedance tensors Z. The tensor Z is a complex 3D array composed of the number of frequencies nfreq and four components (xx, xy, yx, and yy) in 2x2 matrices.

Return type:

ediObj from pycsamt.core.edi.Edi

fit(data)[source]#

Assert and make an EM object from a collection of EDIs.

Parameters: data (str, list, or pycsamt.core.edi.Edi object) – Full path to EDI files or collection of EDI objects
Returns: self
Return type: EM object from a collection of EDIs

Examples

>>> from watex.methods.em import EM
>>> emObjs = EM().fit (r'data/edis')
>>> emObjs.ediObjs_
...

getfullfrequency(to_log10=False)[source]#

Get the frequency with clean data.

The full (or plain) frequency is the frequency array with no data missing during the data collection. Note that when using Natural Source Audio-Magnetotellurics, some data are missing because the signal is weak or absent in certain frequency bands, especially in the attenuation band.

Parameters: to_log10 (bool, default=False) – Export the frequency in base 10 logarithm.
Returns: f – Frequency with clean data, outside the attenuation band if the survey is carried out with Natural Source Audio-Magnetotellurics.
Return type: array-like 1d of shape (N,)

See also

watex.utils.exmath.get_full_frequency: Get the complete frequency with no missing signals.

Example

>>> import watex as wx
>>> edi_sample = wx.fetch_data ('edis', return_data=True, samples = 12 )
>>> wx.EM().fit(edi_sample).getfullfrequency(to_log10 =True )
array([4.76937733, 4.71707639, 4.66477553, 4.61247466, 4.56017382,
       4.50787287, 4.45557204, 4.40327104, 4.35097021, 4.29866928,
       4.24636832, 4.19406761, 4.14176668, 4.08946565, 4.03716465,
       ...
       2.67734228, 2.62504479, 2.57274385, 2.52044423, 2.46814047,
       2.41584107, 2.36353677, 2.31124512, 2.25892448, 2.20663701,
       2.15433266, 2.10202186, 2.04972182, 1.99743007])

getreferencefrequency(to_log10=False)[source]#

Get the reference frequency from a collection of EDI objects.

The highest frequency with clean data is selected as the reference frequency.

Parameters:

data (list of pycsamt.core.edi.Edi or mtpy.core.edi.Edi objects) – Collection of EDI objects from pycsamt
to_log10 (bool) – Output the base 10 logarithm of the reference frequency, the frequency being in Hz.

Returns:

rf – The reference frequency, i.e. the highest frequency with clean data, in Hz

Return type:

float

Examples

>>> from watex.methods.em import EM
>>> edipath ='data/3edis'
>>> ref = EM().fit(edipath).getreferencefrequency(to_log10=True)
>>> ref
... 4.845098040014257 # log10 of the frequency in Hz
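
The rule "highest frequency with clean data" can be sketched without any EDI file. The snippet below assumes that clean means no NaN at any station for that frequency; this criterion, the function name reference_frequency and the small matrix are assumptions, not the watex implementation.

```python
import math


def reference_frequency(freqs, data2d, to_log10=False):
    """Highest frequency whose row holds no NaN (illustrative only).

    freqs  : frequencies in Hz, one per row of data2d
    data2d : rows of per-station values, NaN where data are missing
    """
    # Scan the rows from the highest to the lowest frequency.
    rows = sorted(zip(freqs, data2d), key=lambda pair: pair[0], reverse=True)
    for freq, row in rows:
        if not any(math.isnan(value) for value in row):
            return math.log10(freq) if to_log10 else freq
    raise ValueError("no frequency with complete data")


nan = math.nan
freqs = [1000.0, 100.0, 10.0]                    # made-up frequencies (Hz)
data2d = [[nan, 5.0], [3.0, 4.0], [1.0, 2.0]]    # made-up values, 2 stations
ref_hz = reference_frequency(freqs, data2d)
ref_log = reference_frequency(freqs, data2d, to_log10=True)
```

The 1000 Hz row is skipped because one station has no data there, so the reference frequency of this toy dataset is 100 Hz, or 2 in base 10 logarithm.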

References

http://www.zonge.com/legacy/PDF_DatPro/Astatic.pdf

property inspect#: Check whether the object is fitted.

is_valid(obj)[source]#

Assert that the given argument is an EDI object from the EDI module, or from the pycsamt or MTpy packages. A TypeError is raised otherwise.

Parameters: obj (str, pycsamt.core.edi.Edi or mtpy.core.edi.Edi) – Full path to an EDI file, or a pycsamt or MTpy object.
Returns: obj – Identical object after asserting.
Return type: str, pycsamt.core.edi.Edi or mtpy.core.edi.Edi

property latitude#

property longitude#

make2d(out='resxy', *, kind='complex', **kws)[source]#

Output a 2D matrix of resistivity, phase, error or tensor values from a collection of EDI objects.

The matrix has a shape of number of frequencies by number of sites. The function checks whether the data of all frequencies are available; missing values are filled with NaN.

Parameters:

data (Path-like object or list of pycsamt.core.edi objects) – Collection of EDI objects from pycsamt or full path to EDI files.
out (str) – Kind of data to output. Be sure to provide the component when retrieving an attribute from the collection object: except for the error and frequency attributes, a missing component raises an error. For instance, use resxy for the xy component. Default is resxy.
kind (bool or str) – Focuses on the tensor output. Note that the tensor is a complex ndarray of shape (nfreq, 2, 2). If set to modulus, the modulus of the complex tensor is output. If real or imag, only that part is returned. Default is complex.
kws (dict) – Additional keyword arguments from getfullfrequency().

Returns:

mat2d – The matrix of the number of frequencies by the number of collected EDIs, which corresponds to the number of stations/sites.

Return type:

np.ndarray(nfreq, nstations)

Examples

>>> from watex.methods.em import EM
>>> edipath ='data/edis'
>>> emObjs= EM().fit(edipath)
>>> phyx = emObjs.make2d ('phaseyx')
>>> phyx
... array([[ 26.42546593,  32.71066454,  30.9222746 ],
       [ 44.25990541,  40.77911136,  41.0339148 ],
       ...
       [ 37.66594686,  33.03375863,  35.75420802],
       [         nan,          nan,  44.04498791]])
>>> phyx.shape
... (55, 3)
>>> # get the real part of the yx component of tensor z
>>> zyy_r = emObjs.make2d ('zyx', kind ='real')
>>> zyy_r
... array([[ 4165.6   ,  8665.64  ,  5285.47  ],
       [ 7072.81  , 11663.1   ,  6900.33  ],
       ...
       [   90.7099,   119.505 ,   122.343 ],
       [       nan,        nan,    88.0624]])
>>> # get the resistivity error of component 'xy'
>>> resxy_err = emObjs.make2d ('resxy_err')
>>> resxy_err
... array([[0.01329037, 0.02942557, 0.0176034 ],
       [0.0335909 , 0.05238863, 0.03111475],
       ...
       [3.33359942, 4.14684926, 4.38562271],
       [       nan,        nan, 4.35605603]])
>>> phyx.shape ,zyy_r.shape, resxy_err.shape
... ((55, 3), (55, 3), (55, 3))
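
The way a frequency-by-station matrix is padded with NaN, and how the kind option reduces a complex tensor value, can be pictured with the standalone sketch below. The functions to_matrix and tensor_view, the dictionary layout of the station data and all numbers are hypothetical; the real method reads these values from EDI objects.

```python
import math


def to_matrix(full_freqs, stations):
    """Build an (nfreq, nstations) table, NaN where a station has no data.

    stations: one {frequency: value} mapping per site (assumed layout).
    """
    return [[site.get(f, math.nan) for site in stations] for f in full_freqs]


def tensor_view(z, kind="complex"):
    """Reduce one complex tensor value according to kind."""
    if kind == "modulus":
        return abs(z)
    if kind == "real":
        return z.real
    if kind == "imag":
        return z.imag
    if kind == "complex":
        return z
    raise ValueError(f"unknown kind {kind!r}")


full_freqs = [1000.0, 100.0, 10.0]        # made-up full frequency (Hz)
stations = [                              # made-up phase values per site
    {1000.0: 26.4, 100.0: 44.3},
    {1000.0: 32.7, 100.0: 40.8},
    {1000.0: 30.9, 100.0: 41.0, 10.0: 44.0},
]
mat2d = to_matrix(full_freqs, stations)
shape = (len(mat2d), len(mat2d[0]))
```

Only the third site recorded the 10 Hz frequency, so the last row holds NaN for the first two sites, much like the last row of the outputs shown above.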

rewrite(*, by='name', prefix=None, dataid=None, savepath=None, how='py', correct_ll=True, make_coords=False, reflong=None, reflat=None, step='1km', edi_prefix=None, export=True, **kws)[source]#

Rewrite EDIs, correcting station coordinates and dipole length.

It can rename the dataid, customize sites and correct the latitude and longitude positions.

Parameters:

dataid (list) – List of ids used to rename the existing EDI dataid from Head.dataid. If given, it should match the length of the collection of ediObjs. A ValueError is raised if the number of ids provided does not match the number of EDI objects.
by (str) – Rename according to the inner module id. Can be name, id, or number. Default is name. If survey_name is given, the whole survey name is overwritten. Conversely, the argument ix outputs the formatted station numbers without the survey name.
prefix (str) – Prefix of the site number. It could be the abbreviation of the survey area.
correct_ll (bool) – Write the scaled positions (longitude and latitude). Default is True.
make_coords (bool) – Useful to hide the real coordinates of the sites by generating ‘fake’ coordinates for a specific purpose. When set to True, be sure to provide the reflong and reflat values, otherwise an error is raised.
reflong (float or string) – Reference longitude, in decimal degrees or in DD:MM:SS, of the site considered as the origin of the landmark.
reflat (float or string) – Reference latitude, in decimal degrees or in DD:MM:SS, of the reference site considered as the origin of the landmark.
step (float or str) – Offset, i.e. the distance of separation between the sites, in meters. If the value is given as a string, it is considered a value in meters unless it is suffixed with km. Only meters and kilometers are accepted. The default separation between sites is 1km.
savepath (str) – Full path to the save directory. If not given, EDIs are output to a newly created directory.
how (str) – The way to index the stations. Default is Python indexing, i.e. counting starts at 0. Any other value starts counting the sites from 1.
export (bool) – Export the new EDI files.
kws (dict) – Additional keyword arguments from ~Edi.write_edifile and watex.utils.coreutils.makeCoords().

Returns:

EM – Returns self for easy method chaining.

Return type:

EM instance

Examples

>>> from watex.methods.em import EM
>>> edipath = r'data/edis'
>>> savepath =  r'/Users/Daniel/Desktop/ediout'
>>> emObjs = EM().fit(edipath)
>>> emObjs.rewrite(by='id', edi_prefix ='b1',
...                savepath =savepath)
>>> #
>>> # second example to write 7 samples of edi from
>>> # Edi objects inner datasets
>>> #
>>> import watex as wx
>>> edi_sample = wx.fetch_data ('edis', key ='edi',
...                             samples =7, return_data =True )
>>> emobj = wx.EM ().fit(edi_sample)
>>> emobj.rewrite(by='station', prefix='PS')
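
The step argument accepts either a number or a string such as '1km'. A possible way to normalise it to meters is sketched below; the helper step_to_meters is hypothetical, and rejecting units other than m and km is an assumption drawn from the statement that only meters and kilometers are accepted.

```python
import re


def step_to_meters(step="1km"):
    """Convert a site separation such as 500, '500m' or '1km' to meters."""
    if isinstance(step, (int, float)):
        return float(step)
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*", str(step))
    if match is None:
        raise ValueError(f"cannot parse step {step!r}")
    value, unit = float(match.group(1)), match.group(2).lower()
    if unit == "km":
        return value * 1000.0
    if unit in ("", "m"):  # a bare number is read as meters
        return value
    raise ValueError("only meters (m) and kilometers (km) are accepted")


default_step = step_to_meters()          # the default '1km'
short_step = step_to_meters("500m")
```

A bare string such as '250' is read as 250 m, which matches the rule that any string without the km suffix is treated as a value in meters.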

property stnames#

class watex.methods.ERP(erp_fn=None, dipole_length=None, auto=False, posMinMax=None, **kwargs)[source]#

Bases: object

Electrical resistivity profiling class. It defines anomalies and computes their features. It can select multiple anomalies on an ERP and give their feature values.

Parameters:

erp_fn (*) – Path to the electrical resistivity profile
dipole_length (*) – Measurement electrodes. Distance between two electrodes in meters.
auto (*) – Trigger the automatic computation. If auto is set to True, the posMinMax argument is not needed; otherwise posMinMax must be given.
posMinMax (*) – Selected anomaly boundaries. The boundaries match the start point, i.e. the beginning of the anomaly position, and the end point, i.e. the end of the anomaly position. If provided, auto is turned off (set to False) even if it was set to True.

Notes

Providing posMinMax is strongly recommended for accurate computation of the geo-electrical features. If not given, the best anomaly is selected automatically and may not match what you expect.

The class also holds other information:

Attributes	Type	Description
lat	float	station latitude
lon	float	station longitude
elev	float	station elevation in m or ft
east	float	station easting coordinate (m)
north	float	station northing coordinate (m)
azim	float	station azimuth in meter (m)
utm_zone	str	UTM location zone
resistivity	dict	resistivity value at each station (ohm.m)
name	str	survey location name
turn_on	bool	turn on/off the display of the computation parameters.
best_point	float/int	position of the selected anomaly
best_rhoa	float	selected anomaly app.resistivity
display_autoinfos	bool	display the three best anomaly points selected automatically.

To get the geo-electrical-features, create an erp object by calling:

>>> from watex.methods.erp import ERP
>>> anomaly_obj =ERP(erp_fn = '~/location_filename')

The following ERP properties are then available:

properties	Type	Description
select_best_point_	float	Best anomaly position points
select_best_value_	float	Best anomaly app.resistivity value.
best_points	float	Best positions points selected automatically.
best_sfi	float	Best anomaly standard fracturation index value.
best_anr	float	Best anomaly ratio.
best_power	float	Best anomaly power in meters (m).
best_magnitude	float	Best anomaly magnitude in ohm.m
best_shape	str	Best anomaly shape. Can be `V`, `W`, `K`, `H`, `C`, `M`.
best_type	str	Best anomaly type. Can be: `EC` for extensive conductive; `NC` for narrow conductive; `CP` for conductive plane; `CB2P` for contact between two planes.

Examples

>>> from watex.methods.erp import ERP
>>> anom_obj = ERP(erp_fn='data/l10_gbalo.xlsx', auto=False,
...                posMinMax=(90, 130), turn_off=True)
>>> anom_obj.name
l10_gbalo
>>> anom_obj.select_best_point_
110
>>> anom_obj.select_best_value_
132
>>> anom_obj.best_magnitude
5
>>> anom_obj.best_power
40
>>> anom_obj.best_sfi
1.9394488747363936
>>> anom_obj.best_anr
0.5076113145430543
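To make the power and magnitude features more concrete, here is a minimal sketch. It assumes that power is the width of the anomaly (the example above gives 40 for boundaries (90, 130)) and that magnitude is the spread of apparent resistivity inside the boundaries. Both definitions are inferred from the units and the example, not taken from the ERP source; the function names and the profile values are made up for illustration.

```python
def anomaly_power(pos_min_max):
    """Width of the anomaly in meters (assumed definition)."""
    pos_min, pos_max = pos_min_max
    return abs(pos_max - pos_min)


def anomaly_magnitude(positions, rhoa, pos_min_max):
    """Apparent-resistivity spread (ohm.m) inside the boundaries (assumed)."""
    pos_min, pos_max = sorted(pos_min_max)
    inside = [r for p, r in zip(positions, rhoa) if pos_min <= p <= pos_max]
    if not inside:
        raise ValueError("no station falls inside the anomaly boundaries")
    return max(inside) - min(inside)


# Hypothetical profile: one station every 10 m, invented resistivities.
positions = list(range(80, 150, 10))        # 80, 90, ..., 140
rhoa = [150, 137, 134, 132, 135, 136, 160]

print(anomaly_power((90, 130)))                       # 40
print(anomaly_magnitude(positions, rhoa, (90, 130)))  # 5
```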

property best_anr#: Get the anomaly ratio (best_anr) of the selected best anomaly along the ERP line.

property best_east#: Get the easting coordinates of the selected anomaly.

property best_index#: Keep the index of the selected best anomaly.

property best_lat#: Get the latitude coordinates of the selected anomaly.

property best_lon#: Get the longitude coordinates of the selected anomaly.

property best_magnitude#: Get the magnitude of the selected anomaly (select_best_point_).

property best_north#: Get the northing coordinates of the selected anomaly.

property best_points#: Get the best points from the automatic computation.

property best_power#: Get the power of the selected anomaly (select_best_point_).

property best_rhoaRange#: Collect the range of resistivity values within the selected anomaly boundaries.

property best_sfi#: Get the standard fracturation index of select_best_point_.

property best_shape#: Find the shape of the selected anomaly.

property best_type#: Get the type of the selected best anomaly.

dataType = {'.csv': <function read_csv>, '.html': <function read_json>, '.json': <function read_json>, '.sql': <function read_sql>, '.xlsx': <function read_excel>}#

property dipoleLength#: Get the dipole length, i.e. the distance between two measurements.

erpLabels = ['pk', 'east', 'north', 'rhoa']#

property fn#: erp file type

property posi_max#: Get the right position of the select_best_point_ boundaries, using the non-arbitrary station positions derived from ERP.dipoleLength.

property posi_min#: Get the left position of the select_best_point_ boundaries, using the non-arbitrary station positions derived from ERP.dipoleLength.

property rhoa_max#: Get the top position of the select_best_point_ boundaries, using the magnitude obtained from ERP.best_magnitude.

property rhoa_min#: Get the bottom position of the select_best_point_ boundaries, using the magnitude obtained from ERP.best_magnitude.

sanitize_columns()[source]#: Get the columns of electrical resistivity profiling dataframe and set new names according to ERP.erpLabels .

property select_best_point_#: Select the best anomaly points.

property select_best_value_#: Select the apparent resistivity value of the best anomaly.

class watex.methods.ERPCollection(listOferpfn=None, listOfposMinMax=None, erpObjs=None, **kws)[source]#

Bases: object

Collection of ERP objects. The class collects all ERP survey lines. Each ERP is a singleton class object with its corresponding attributes. The goal is to build a container of geo-electrical features that can be passed straightforwardly to the watex.bases.features.GeoFeatures class.

Parameters:

listOferpfn (list, ndarray) – list of different erp files.
listOfposMinMax (list) –
Collection of the selected anomaly boundaries. If not provided, auto is triggered. It is recommended to provide your preferred anomaly boundaries for every ERP line, like:
```
listOfposMinMax=[(90, 130), (10, 70), ...]
```
where (90, 130) is the boundaries of the selected anomaly on the first erp line, (10, 70) is the boundaries of the second erp survey line, and so on.
erpObjs (list, ndarray) – Collection of objects from ERP. If the objects are already created, gather them in a list and pass them through the erpObjs argument.
Holds other optional information passed as keyword arguments.

Attributes	Type	Description
list_of_dipole_lengths	list	Collection of dipoleLength. The user can provide the distance between site measurements as performed on the investigation site. If given, the automatic dipoleLength computation is turned off.
fnames	array_like	Array of erp survey line names. If each survey name is the location name, it is kept.
id	array_like	Reference number of each erp object.
erps_data	nd.array	Array composed of geo-electrical parameters: ndarray(nerp, 8), where nerp is the number of erp objects collected.
erpdf	pd.DataFrame	A dataframe of the collected erp lines; the number of rows corresponds to the number of collected erp lines.

It is possible to get from each erp collection the singular array of the different parameters considered as properties:

>>> from watex.methods.erp import ERPCollection as ERPC
>>> erpcol = ERPC(listOferpfn='list|path|filename')
>>> erpcol.survey_ids
>>> erpcol.selectedPoints

List of the ERPCollection attribute properties:

Properties	Type	Description
selectedPoints	array_like	Collection of best anomaly position points.
survey_ids	array_like	Collection of all erp survey ids. Note that each id is prefixed with e.
sfis	array_like	Collection of best anomaly standard fracturation index values.
powers	array_like	Collection of best anomaly power.
magnitudes	array_like	Collection of best anomaly magnitude in ohm.m.
shapes	array_like	Collection of best anomaly shape. For more details please refer to ERP.
types	array_like	Collection of best anomaly type. Refer to ERP for more details.

Examples

>>> from watex.methods.erp import ERPCollection
>>> erpObjs = ERPCollection(listOferpfn='data/erp')
>>> erpObjs.erpdf
>>> erpObjs.survey_ids
['e2059734331848' 'e2059734099144' 'e2059734345608']

property easts#: Collect the utm_easting value from erp survey line.

erpColums = ['id', 'east', 'north', 'power', 'magnitude', 'shape', 'type', 'sfi']#

exportErp(extension_file=None, savepath=None, **kwargs)[source]#

Export erp data after computing different geo_electrical features.

Parameters:

extension_file (str) – Extension type to export the files. Can be xlsx or csv. The default extension_file is csv.
savepath (str) – Path like string to save the output file.

get_property_infos(attr_name, objslist=None)[source]#

From each ERP object, get the attribute information and store it in a data array.

Parameters:

attr_name – Name of the attribute whose property information is collected.
objslist (list) – list of collection objects.

Example:

>>> from watex.methods.erp import ERPCollection as ERPcol
>>> erpObjs = ERPcol(listOferpfn='data/erp',
...                  export_erpFeatures=True,
...                  filename='ykroS')

property magnitudes#: Get the magnitudes of the selected anomalies from erp.

property norths#: Collect the utm_northing value from erp survey line.

property powers#: Get the power of the selected anomalies from erp.

property selectedPoints#: Keep the best selected anomaly points in an array.

property sfis#: Collect the sfi of the selected anomaly points.

property shapes#: Get the shape of the selected anomaly.

property survey_ids#: Get the erp filenames

property types#: Collect selected anomalies types from erp.

class watex.methods.Hydrogeology(**kwd)[source]#

Bases: ABC

A branch of geology concerned with the occurrence, use, and functions of surface water and groundwater.

Hydrogeology is the study of groundwater – it is sometimes referred to as geohydrology or groundwater hydrology. Hydrogeology deals with how water gets into the ground (recharge), how it flows in the subsurface (through aquifers) and how groundwater interacts with the surrounding soil and rock (the geology).

Indeed, hydrogeologists apply this knowledge to many practical uses. They might:

- Design and construct water wells for drinking water supply, irrigation schemes and other purposes;
- Try to discover how much water is available to sustain water supplies so that these do not adversely affect the environment, for example by depleting natural baseflows to rivers and important wetland ecosystems;
- Investigate the quality of the water to ensure that it is fit for its intended use;
- Where the groundwater is polluted, design schemes to try and clean up this pollution;
- Design construction dewatering schemes and deal with groundwater problems associated with mining;
- Help to harness geothermal energy through groundwater-based heat pumps.

class watex.methods.Logging(zname=None, kname=None, verbose=0)[source]#

Bases: object

Logging class

Only deals with numerical values. If categorical values are found in the logging dataset, they are discarded.

Parameters:

zname (str, default='depth' or None) – The name of the depth column in data. If the main depth column is not named ‘depth’, another column name that holds the depth can be given, so that this column is set aside as the depth column for plotting purposes. If set to None, zname takes the name ‘depth’ and assumes that a ‘depth’ column exists in data.
kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ in a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.

Examples

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import Logging
>>> # get the logging data
>>> h = load_hlogs ()
>>> h.feature_names
Out[29]:
['hole_id',
 'depth_top',
 'depth_bottom',
 'strata_name',
 'rock_name',
 'layer_thickness',
 'resistivity',
 'gamma_gamma',
 'natural_gamma',
 'sp',
 'short_distance_gamma',
 'well_diameter']
>>> # we can fit to collect the valid logging data
>>> log= Logging(kname ='k', zname='depth_top' ).fit(h.frame[h.feature_names])
>>> log.feature_names_in_ # categorical features should be discarded.
Out[33]:
['depth_top',
 'depth_bottom',
 'layer_thickness',
 'resistivity',
 'gamma_gamma',
 'natural_gamma',
 'sp',
 'short_distance_gamma',
 'well_diameter']
>>> log.plot ()
Out[34]: Logging(zname= depth_top, kname= k, verbose= 0)
>>> # plot log including the target y
>>> log.plot (y = h.frame.k , posiy =0 )# first position
Logging(zname= depth_top, kname= k, verbose= 0)

fit(data, **fit_params)[source]#

Fit logging data and populate attributes

Parameters:

data (DataFrame of shape (n_samples, n_features)) – n_samples is the number of samples, expected to be collected at different depths, and n_features is the number of columns (features) to plot. Note that data must include the depth column. If it is not given, a relative depth is created according to the number of samples in data.
fit_params (dict) – Additional keyword arguments passed to to_numeric_dtypes().

Returns:

self – Object instantiated, returned for chaining methods.

Return type:

Logging

property inspect#: Inspect whether the object is fitted or not.

plot(normalize=False, impute_nan=True, log10=False, posiy=None, fill_value=None, **plot_kws)[source]#

Plot the logging data

Parameters:

normalize (bool, default=False) – Normalize all the data to the range (0, 1), except the depth.
impute_nan (bool, default=True) – Replace the NaN values in the dataframe. By default NaN values are replaced by the mean. However, if fill_value is provided, it is used instead to replace NaN in the data.
log10 (bool, default=False) – Convert values to log10. This can be useful when working with logarithmic data. However, not all data supports this operation, for instance negative values. In that case, use the column_to_skip argument to skip those columns when converting values to log10.
fill_value (str or numerical value, optional) – When strategy == “constant”, fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and “missing_value” for strings or object data types. If not given and impute_nan is True, the mean strategy is used instead.
posiy (int, optional) – The position at which to place the target plot y. By default the target plot, if given, is located at the last position, after the logging plots.
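The preprocessing implied by impute_nan and normalize can be illustrated on a single column. This is a plain-Python sketch of mean imputation followed by min-max scaling; the helper names are hypothetical and the actual plot() method may proceed differently.

```python
import math


def impute_nan(values, fill_value=None):
    """Replace NaN by ``fill_value``, or by the column mean when it is None."""
    if fill_value is None:
        valid = [v for v in values if not math.isnan(v)]
        if not valid:
            raise ValueError("cannot compute the mean of an all-NaN column")
        fill_value = sum(valid) / len(valid)
    return [fill_value if math.isnan(v) else v for v in values]


def normalize(values):
    """Min-max scale a column to the range (0, 1)."""
    low, high = min(values), max(values)
    if high == low:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Made-up resistivity column with one missing reading.
resistivity = [10.0, float("nan"), 30.0]
filled = impute_nan(resistivity)
print(filled)             # [10.0, 20.0, 30.0]
print(normalize(filled))  # [0.0, 0.5, 1.0]
```

The depth column would be left out of both steps, as stated for normalize.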

class watex.methods.MXS(kname=None, aqname=None, threshold=None, method='naive', trailer='*', keep_label_0=False, random_state=42, n_groups=3, sep=None, prefix=None, **kws)[source]#

Bases: HData

Mixture Learning Strategy (MXS)

Using machine learning to predict the k-parameter is an alternative way to reduce the cost of data collection. However, borehole data comes with many missing k-values, since the parameter is strongly tied to the aquifer and is obtained after the pumping test. In other words, k can only be collected if the layer in the well is an aquifer. Unfortunately, predicting a few samples of k in a large set of missing data remains an issue for classical supervised learning methods. We therefore propose an alternative approach, called the mixture learning strategy (MXS), to solve both issues. It entails first predicting a naïve group of aquifers (NGA), which is then combined with the real k-values to counterbalance the missing values and yield an optimal prediction score. The method first applies the K-Means and Hierarchical Agglomerative Clustering (HAC) algorithms, which predict the NGA labels needed for the MXS label merging.

Parameters:

kname (str, int) – Name of the permeability coefficient column. kname allows retrieving the permeability coefficient ‘k’ in a specific dataframe. If an integer is passed, it is taken as the index of the ‘k’ column in the dataframe. Note that the integer value must not exceed the dataframe size along axis 1. kname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument.
aqname (str, optional) – Name of the aquifer group column. aqname allows retrieving the aquifer group arr_aq value in a specific dataframe. aqname commonly needs to be supplied when a dataframe is passed as a positional or keyword argument. Note that it is not mandatory to have a group of aquifer in the log data. It is needed only if the label similarity needs to be calculated.
threshold (float, default=None) – The threshold from which a label in the ‘k’ array can be considered similar to one in the NGA labels ‘y_pred’. The default is None, which means no rule is applied and the label with the highest preponderance (occurrence) in the data compared to other labels is considered the most representative and similar. Setting the rule by fixing the threshold is recommended, especially for a huge dataset.
n_groups (int, default=3) – The number of aquifer groups to form, as well as the number of centroids to generate. If the number of aquifer groups in the area is known, it should be used instead. However, it is recommended to validate this number using the ‘elbow plot’, the ‘silhouette plot’, or the Hierarchical Agglomerative Clustering dendrogram. Refer to plot_elbow(), plotSilhouette(), or watex.view.plotDendrogram for plotting purposes.
keep_label_0 (bool, default=False) – The prediction already includes the label 0. However, including 0 in the predicted labels refers to ‘k=0’, i.e. a permeability coefficient equal to 0, which is not true in principle because all rocks have a permeability coefficient ‘k’. Here ‘k=0’ is considered an undefined permeability coefficient, so ‘0’ can be excluded since it can also be considered a missing ‘k’-value. If a predicted ‘0’ is in the target, it should mean a missing ‘k’-value rather than a concrete label. Therefore, to avoid any confusion, ‘0’ is altered to ‘1’: the value +1 is used to shift all class labels forward, thereby excluding the ‘0’ label. To force the inclusion of 0 in the labels, set keep_label_0 to True.

sep (str, default=’’) – Separator between the true labels ‘y_true’ and the predicted NGA labels. sep is used to rewrite the MXS labels. Mostly, an MXS label is a combination of the true label of the permeability coefficient ‘k’ and the NGA label, composing a new similarity label. For instance:
>>> true_labels = ['k1', 'k2', 'k3']; NGA_labels = ['II', 'I', 'UV']
>>> # gives
>>> MXS_labels = ['k1_II', 'k2_I', 'k3_UV']
where the separator sep is set to _. This happens especially when one of the labels (NGA or true_labels) is not a numeric datatype and a similarity is found between ‘k1’ and ‘II’, ‘k2’ and ‘I’, and so on.
prefix (str, default=’’) – prefix is used to rename the true_labels, i.e. the true valid k. For instance:
>>> k_valid =[1, 2, ..] -> k_new = [k1, k2, ...]
where ‘k’ is the prefix.
method (str [‘naive’, ‘strict’], default=’naive’) – The kind of strategy used to compute the representativity of a label in the predicted array ‘y_pred’. It can also be ‘strict’. Indeed:
- naive computes the importance of the label by the number of its occurrences for this specific label in the array ‘y_true’. It does not take into account the occurrences of other existing labels. This is useful for unbalanced class labels in y_true.
- strict computes the importance of the label by the number of occurrences in the whole valid y_true, i.e. over the total occurrences of all the labels that exist in the whole ‘arr_aq’. This can give suitable analysis results if the data is not unbalanced for each label in y_pred.
trailer (str, default=’*’) – The mixture strategy marker used to differentiate the existing class labels in ‘y_true’ from the predicted labels ‘y_pred’, especially when the same class labels are also present in the true labels with the same label-identifier name. This is useful to avoid any confusion between the labels in y_true and y_pred. Note that if trailer is set to None and both y_true and y_pred are numeric data, the labels in y_pred are systematically renamed to be distinct from the ones in ‘y_true’. For instance:
>>> true_labels = [1, 2, 3]; NGA_labels = [0, 1, 2]
>>> # with trailer, MXS labels should be
>>> MXS_labels = ['0', '1*', '2*', '3']  # 1 and 2 are in true_labels
>>> # with no trailer
>>> MXS_labels = [0, 4, 5, 3]  # 1 and 2 have been changed to [4, 5]
verbose (int, default is 0) – Control the level of verbosity. Higher values lead to more messages.
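The effect of trailer on colliding labels can be sketched as follows. The function below only reproduces the renaming of the NGA labels shown in the trailer example (appending the marker, or renumbering past the largest true label when trailer is None); the name `mark_nga_labels` and the renumbering rule are assumptions for illustration, not the MXS implementation.

```python
def mark_nga_labels(true_labels, nga_labels, trailer="*"):
    """Make NGA labels that collide with true labels distinguishable."""
    true_set = set(true_labels)
    if trailer is not None:
        # Append the marker to every NGA label that also exists in y_true.
        return [f"{lab}{trailer}" if lab in true_set else str(lab)
                for lab in nga_labels]
    # No trailer: give colliding numeric labels a fresh, unused number.
    used = true_set | set(nga_labels)
    next_free = max(true_set) + 1
    mapping = {}
    for lab in sorted(set(nga_labels)):
        if lab in true_set:
            while next_free in used:
                next_free += 1
            mapping[lab] = next_free
            used.add(next_free)
        else:
            mapping[lab] = lab
    return [mapping[lab] for lab in nga_labels]


print(mark_nga_labels([1, 2, 3], [0, 1, 2]))                # ['0', '1*', '2*']
print(mark_nga_labels([1, 2, 3], [0, 1, 2], trailer=None))  # [0, 4, 5]
```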

Examples

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import MXS
>>> hdata= load_hlogs (as_frame =True)
>>> # drop the 'remark' columns since there is no valid data
>>> hdata.drop (columns ='remark', inplace =True)
>>> mxs = MXS (kname ='k').fit(hdata)
>>> # predict the default NGA
>>> mxs.predictNGA() # default prediction with n_groups =3
>>> # make MXS labels using the default 'k' categorization
>>> ymxs=mxs.makeyMXS(categorize_k=True, default_func=True)
>>> mxs.yNGA_ [62:74]
Out[43]: array([1, 2, 2, 2, 3, 1, 2, 1, 2, 2, 1, 2])
>>> ymxs[62:74]
Out[44]: array([ 1, 22, 22, 22,  3,  1, 22,  1, 22, 22,  1, 22])
>>> # to get the label similarity, provide the column name of the
>>> # aquifer group and fit again:
>>> mxs = MXS (kname ='k', aqname ='aquifer_group').fit(hdata)
>>> sim = mxs.labelSimilarity()
>>> sim
Out[47]: [(0, 'II')] # group II and label 0 are very similar

aqname = 'aquifer_group'#

kname = 'k'#

labelSimilarity(func=None, categorize_k=False, default_func=False, **sm_kws)[source]#

Find label similarities

Parameters:

func (callable) – Function used to map the permeability coefficient column in the dataframe or series. If not given, the default function can be enabled instead through the default_func parameter.
string (bool,) – If set to “True”, the categorized map from ‘k’ is prefixed by “k”. However, if a string value is given, the prefix is changed according to this label.
default_func (bool,) –
If True, the default function for mapping k is used. Note that it may not fit your own data, so it is recommended to provide your own function for mapping ‘k’. The default ‘k’ mapping is as follows:
- k0 {0}: k = 0
- k1 {1}: 0 < k <= .01
- k2 {2}: .01 < k <= .07
- k3 {3}: k> .07
sm_kws (dict,) – Additional keyword arguments passed to find_similar_labels().
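The default ‘k’ mapping listed above translates directly into a small function. This is a sketch of those four intervals only; the name `categorize_k` and the choice to return None for a missing (NaN) value are assumptions, and a custom func with the same shape could be passed instead.

```python
import math


def categorize_k(k):
    """Map a permeability coefficient to the class 0, 1, 2 or 3."""
    if math.isnan(k):
        return None  # missing k-value: left unlabeled (assumption)
    if k == 0:
        return 0     # k0: k = 0
    if k <= 0.01:
        return 1     # k1: 0 < k <= .01
    if k <= 0.07:
        return 2     # k2: .01 < k <= .07
    return 3         # k3: k > .07


print([categorize_k(v) for v in [0, 0.005, 0.01, 0.05, 0.07, 0.2]])
# [0, 1, 1, 2, 2, 3]
```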

makeyMXS(y_pred=None, func=None, categorize_k=False, default_func=False, **mxs_kws)[source]#

Construct the MXS target $y*$

Parameters:

y_pred (Array-like 1d, pandas.Series) – Array holding the valid NGA labels. Note that the NGA labels are predicted labels, mostly obtained through unsupervised learning. See also predict_NGA_labels() for further details.
func (callable) – Function used to map the permeability coefficient column in the dataframe or series. If not given, the default function can be enabled instead through the default_func parameter.
string (bool,) – If set to “True”, the categorized map from ‘k’ is prefixed by “k”. However, if a string value is given, the prefix is changed according to this label.
default_func (bool,) –
If True, the default function for mapping k is used. Note that it may not fit your own data, so it is recommended to provide your own function for mapping ‘k’. The default ‘k’ mapping is as follows:
- k0 {0}: k = 0
- k1 {1}: 0 < k <= .01
- k2 {2}: .01 < k <= .07
- k3 {3}: k> .07
mxs_kws (dict,) – Additional keyword arguments passed to make_MXS_labels().

Returns:

MXS.mxs_labels_ – array like of MXS labels

Return type:

array-like 1d

Example

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import MXS
>>> hdata = load_hlogs ().frame
>>> # drop the 'remark' columns since there is no valid data
>>> hdata.drop (columns ='remark', inplace=True)
>>> mxs =MXS (kname ='k').fit(hdata) # specify the 'k'columns
>>> # we can predict the NGA labels and yMXS with single line
>>> # of code snippet using the default 'k' classification.
>>> ymxs = mxs.predictNGA().makeyMXS(categorize_k=True, default_func=True)
>>> mxs.yNGA_[:7]
... array([2, 2, 2, 2, 2, 2, 2])
>>> ymxs[:7]
Out[40]: array([22, 22, 22, 22, 22, 22, 22])
>>> mxs.mxs_group_classes_
Out[56]: {1: 1, 2: 22, 3: 3} # transform classes
>>> mxs.mxs_group_labels_
Out[57]: (2,)
>>> # Comment: only the label '2' is transformed to '22' since it is
>>> # the only one that has similarity with the true label 2.

predictNGA(n_components=2, return_label=False, **NGA_kws)[source]#

Predicts Naive Group of Aquifer from Hydro-Log data.

Parameters:

n_components (int, default=2) – Number of dimensions to preserve. If n_components is a float between 0. and 1., it indicates the ratio of variance to preserve. If None, 95% of the variance is preserved.
return_label (bool, default=False) – If True, return the predicted NGA labels; otherwise return the MXS instantiated object. If False, the NGA labels can be fetched using the attribute watex.hydro.MXS.yNGA_.
NGA_kws (dict,) – Keyword arguments passed to watex.utils.predict_NGA_labels().

Returns:

yNGA_ or self – Array-like 1d of naive group of aquifer labels, or the MXS instantiated object.

Return type:

array-like 1d or MXS

Example

>>> from watex.datasets import load_hlogs
>>> from watex.methods.hydro import MXS
>>> hdata = load_hlogs ().frame
>>> # drop the 'remark' columns since there is no valid data
>>> hdata.drop (columns ='remark', inplace=True)
>>> mxs =MXS (kname ='k').fit(hdata) # specify the 'k' column
>>> y_pred = mxs.predictNGA(return_label=True )
>>> y_pred [-12:]
Out[52]: array([1, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3])

sname = None#

verbose = 0#

zname = None#

class watex.methods.Processing(window_size=5, component='xy', mode='same', method='slinear', out='srho', c=2, **kws)[source]#

Bases: EM

Base processing of EM objects.

Fast processing of EMAP and AMT data. The tools are used for data sanitizing, noise removal and filtering.

Parameters:

data (Path-like object or list of pycsamt.core.edi.Edi objects) – Collection of EDI objects from pycsamt.
freqs (array-like, shape (N)) – Frequency array. It should be the complete set of frequencies used during the survey. It can be obtained with getfullfrequency. Not needed if ediObjs is provided.
window_size (int) – Length of the window. Must be greater than 1 and preferably an odd integer. Default is 5.
component (str) – Field tensor direction. Can be xx, xy, yx, or yy. If arr2d is provided, there is no need to give this argument. It becomes useful when a collection of EDI objects is provided. If not specified, the resistivity and phase values of component xy are fetched for correction by default. Change the component value to get the appropriate data for correction. Default is xy.
mode (str) – Mode of the border trimming. Should be ‘valid’ or ‘same’. ‘valid’ is used for regular trimming whereas ‘same’ appends the first and last resistivity values. Any argument other than ‘valid’ is treated as ‘same’. Default is same.
method (str, default slinear) – Interpolation technique to use. Can also be nearest or pad. Refer to the documentation of ~.interpolate2d.
out (str) – Value to export. Can be sfactor or tensor, for the correction factors and the impedance tensor respectively. Any other value exports the static-corrected resistivity.
c (int,) – A window-width expansion factor that must be input to the filter adaptation process to control the roll-off characteristics of the applied Hanning window. It is recommended to select c between 1 and 4. Default is 2.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from watex.methods.em import Processing
>>> edipath = 'data/edis'
>>> p = Processing().fit(edipath)
>>> p.window_size =2
>>> p.component ='yx'
>>> rc= p.tma()
>>> # get the resistivy value of the third frequency  at all stations
>>> p.res2d_[3, :]
... array([ 447.05423001, 1016.54352954, 1415.90992189,  536.54293994,
       1307.84456036,   65.44806698,   86.66817791,  241.76592273,
       ...
        248.29077039,  247.71452712,   17.03888414])
>>>  # get the resistivity value corrected at the third frequency
>>> rc [3, :]
... array([ 447.05423001,  763.92416768,  929.33837349,  881.49992091,
        404.93382163,  190.58264151,  160.71917654,  163.30034875,
        394.2727092 ,  679.71542811,  953.2796567 , 1212.42883944,
        ...
        164.58282866,   96.60082159,   17.03888414])
>>> plt.semilogy(np.arange(p.res2d_.shape[1]), p.res2d_[3, :], '--',
...              np.arange(p.res2d_.shape[1]), rc[3, :], 'ok--')

References

[1]

http://www.zonge.com/legacy/PDF_DatPro/Astatic.pdf

ama()[source]#

Use an adaptive-moving-average (AMA) filter to estimate average apparent resistivities at a single static-correction-reference frequency.

The AMA filter estimates static-corrected apparent resistivities at a single reference frequency by calculating a profile of average impedances along the length of the line. Sounding curves are then shifted so that they intersect the averaged profile.

Parameters:

data (path-like object or list of pycsamt.core.edi.Edi) – Collection of EDI objects from pycsamt.

Returns:

rc or z – EMAP apparent resistivity static shift corrected, or static correction tensor.

Return type:

np.ndarray, shape (N, M)

References

[1]

http://www.zonge.com/legacy/PDF_DatPro/Astatic.pdf

[2]

Torres-Verdin and Bostick, 1992, Principles of spatial surface electric field filtering in magnetotellurics: electromagnetic array profiling (EMAP), Geophysics, v57, p603-622. https://doi.org/10.1190/1.2400625

static controlFrequencyBuffer(freq, buffer=None)[source]#

Check the buffer and snap each bound to the nearest available frequency if its value is not in the frequency range.

Parameters:

freq – array-like of frequencies.
buffer – list of maximum and minimum frequency. It should contain only two values. If None, the max and min frequencies are selected.

Returns:

Buffer frequency range

Example:

>>> import numpy as np
>>> from watex.methods.em import Processing
>>> freq_ = np.linspace(7e7, 1e0, 20) # 20 frequencies as reference
>>> buffer = Processing.controlFrequencyBuffer(freq_, buffer =[5.70e7, 2e1])
>>> freq_
... array([7.00000000e+07, 6.63157895e+07, 6.26315791e+07, 5.89473686e+07,
       5.52631581e+07, 5.15789476e+07, 4.78947372e+07, 4.42105267e+07,
       4.05263162e+07, 3.68421057e+07, 3.31578953e+07, 2.94736848e+07,
       2.57894743e+07, 2.21052638e+07, 1.84210534e+07, 1.47368429e+07,
       1.10526324e+07, 7.36842195e+06, 3.68421147e+06, 1.00000000e+00])
>>> buffer
... array([5.52631581e+07, 1.00000000e+00])
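The snapping behaviour visible in the output above (5.70e7 becomes 5.52631581e+07, the closest available frequency) can be illustrated without watex. The helper `snap_buffer` below is a hypothetical stand-in written in plain Python; it is not the code of controlFrequencyBuffer.

```python
def snap_buffer(freq, buffer=None):
    """Return [f_max, f_min] snapped to the nearest values found in ``freq``."""
    if buffer is None:
        return [max(freq), min(freq)]
    if len(buffer) != 2:
        raise ValueError("buffer should contain exactly two values")

    def nearest(target):
        # Frequency with the smallest absolute distance to the target.
        return min(freq, key=lambda f: abs(f - target))

    return [nearest(max(buffer)), nearest(min(buffer))]


# Same layout as the example: 20 frequencies from 7e7 Hz down to 1 Hz.
freq = [7e7 + i * (1.0 - 7e7) / 19 for i in range(20)]
snapped = snap_buffer(freq, buffer=[5.70e7, 2e1])
print(snapped == [freq[4], freq[19]])  # True
```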

flma()[source]#

A fixed-length-moving-average filter to estimate average apparent resistivities at a single static-correction-reference frequency.

The FLMA filter estimates static-corrected apparent resistivities at a single reference frequency by calculating a profile of average impedances along the length of the line. Sounding curves are then shifted so that they intersect the averaged profile.
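As a rough illustration of the averaging step, the sketch below applies a fixed-length moving average to one row of apparent resistivities along a line. It is a plain-Python approximation that ignores impedances and the curve shifting; the function name is hypothetical, and the edge handling follows the ‘valid’/‘same’ description of the mode parameter of Processing.

```python
def moving_average(values, window_size=5, mode="same"):
    """Fixed-length moving average along a profile.

    mode='valid' keeps only full windows; any other mode pads the ends with
    the first and last values so the output has the input length.
    """
    if window_size < 2:
        raise ValueError("window_size must be greater than 1")
    values = list(values)
    if not values:
        return []
    if mode != "valid":
        left = (window_size - 1) // 2
        right = window_size - 1 - left
        values = [values[0]] * left + values + [values[-1]] * right
    n_out = len(values) - window_size + 1
    return [sum(values[i:i + window_size]) / window_size for i in range(n_out)]


# Made-up resistivities at five stations.
rho = [1.0, 2.0, 3.0, 4.0, 5.0]
print(moving_average(rho, window_size=3, mode="valid"))  # [2.0, 3.0, 4.0]
print(len(moving_average(rho, window_size=3)))           # 5
```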

Parameters:

data (path-like object or list of pycsamt.core.edi.Edi) – Collection of EDI objects from pycsamt.

Returns:

rc or z – EMAP apparent resistivity static shift corrected, or static correction impedance tensor.

Return type:

np.ndarray, shape (N, M)

References

[1]

http://www.zonge.com/legacy/PDF_DatPro/Astatic.pdf

static freqInterpolation(y, /, buffer=None, kind='freq')[source]#

Interpolate frequencies within the frequency buffer range.

Parameters:

y – array-like, shape (N, ). Can be a frequency array or periods. Note that the frequency is not in log10 Hz.
buffer – list of maximum and minimum frequency. It should contain only two values. If None, the max and min frequencies are used.
kind – str, type of the given data. Can be ‘period’ if the values are given as periods, or ‘frequency’ otherwise. Any other value is treated as frequency values.

Returns:

array_like, shape (N2, ): new interpolated frequency array of size N2.

Example:

>>> from watex.methods.em import Processing
>>> pobj = Processing().fit('data/edis')
>>> f = pobj.getfullfrequency()
>>> buffer = [5.86000e+04, 1.6300e+01]
>>> f
... array([7.00000e+04, 5.88000e+04, 4.95000e+04, 4.16000e+04, 3.50000e+04,
       2.94000e+04, 2.47000e+04, 2.08000e+04, 1.75000e+04, 1.47000e+04,
       ...
       2.75000e+01, 2.25000e+01, 1.87500e+01, 1.62500e+01, 1.37500e+01,
       1.12500e+01, 9.37500e+00, 8.12500e+00, 6.87500e+00, 5.62500e+00])
>>> new_f = Processing.freqInterpolation(f, buffer=buffer)
>>> new_f
... array([5.88000000e+04, 4.93928459e+04, 4.14907012e+04, 3.48527859e+04,
       2.92768416e+04, 2.45929681e+04, 2.06584471e+04, 1.73533927e+04,
       ...
       2.74153120e+01, 2.30292565e+01, 1.93449068e+01, 1.62500000e+01])
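In the output above, consecutive values of new_f keep a near-constant ratio (about 0.84), which is what a log-spaced grid between the two buffer bounds looks like. The sketch below builds such a grid in plain Python. It assumes log-spacing and takes the number of points n as a free, hypothetical parameter; how freqInterpolation actually chooses its points is not stated here.

```python
import math


def log_spaced(f_max, f_min, n):
    """Return n frequencies from f_max down to f_min, evenly spaced in log10."""
    if n < 2:
        raise ValueError("n must be at least 2")
    hi, lo = math.log10(f_max), math.log10(f_min)
    step = (lo - hi) / (n - 1)
    return [10 ** (hi + i * step) for i in range(n)]


# Bounds taken from the example output; n=10 is arbitrary.
grid = log_spaced(5.88e4, 1.625e1, 10)
ratios = [b / a for a, b in zip(grid, grid[1:])]
print(max(ratios) - min(ratios) < 1e-9)  # True: the ratio is constant
```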

getValidTensors(tol=0.5, **kws)[source]#

Select valid tensors from tolerance threshold and write EDI if applicable.

The function analyzes the data and keeps the good ones. The goodness of the data depends on the threshold rate. For instance, 50% means that an impedance tensor ‘z’ is considered valid if the quality control shows at least that score at each frequency of all stations.

Parameters:

data (Path-like object or list of pycsamt.core.edi.Edi) – Collection of EDI objects from pycsamt. The data parameter is passed to the fit() method.
tol (float,) – Tolerance parameter. The value indicates the rate from which the data can be considered valid. The selection is lenient when tol is close to 1 and strict otherwise: as tol decreases, the selection becomes more severe. Default is 0.5, i.e. 50%.
kws (dict,) – Additional keyword arguments for EDI file exporting.

Return type:

watex.externals.z.Z impedance tensor objects.

Examples

>>> from watex.methods.em import Processing
>>> pObj = Processing ().fit('data/edis')
>>> f= pObj.freqs_
>>> len(f)
... 55
>>> zObjs_hard = pObj.getValidTensors (tol= 0.3 ) # None doesn't export EDI-file
>>> len(zObjs_hard[0]._freq) # suppress 3 tensor data
... 52
>>> zObjs_soft = pObj.getValidTensors(tol=0.6, option='write')
>>> len(zObjs_soft[0]._freq)  # suppress only two
... 53

qc(tol=0.5, *, return_freq=False, return_ratio=False, to_log10=True)

Check the quality control of the collected EDIs.

Analyze the data in the EDI collection and return the quality control value, which indicates what percentage of the data is representative.

Parameters:

tol – float, the tolerance parameter. The value indicates the rate above which the data can be considered meaningful. It should preferably be greater than 0 and less than 1. Default is 0.5, i.e. 50%.
return_freq – bool, return the interpolated frequency if set to True. Default is False.
return_ratio –
bool, default=False. Return only the ratio of the representation of the data.

New in version 0.1.5.

to_log10 – bool, default=True. Convert the interpolated frequency to log10.

Returns:

Tuple (float, index) or (float, array-like of shape (N, )). Returns the quality control value together with the interpolated frequency if return_freq is True; otherwise returns the quality control value and the index of useless data.

Example:

>>> from watex.methods.em import Processing
>>> pobj = Processing().fit('data/edis')
>>> f = pobj.getfullfrequency ()
>>> # len(f)
>>> # ... 55 # 55 frequencies
>>> c, _ = pobj.qc(tol=.4)  # means 60% to consider the data as
>>> # representative
>>> c  # the representative rate in the whole EDI collection
>>> # ... 0.95 # the whole data at all stations is safe to 95%.
>>> # now check the interpolated frequency
>>> c, freq_new  = pobj.qc ( tol=.6 , return_freq =True)
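
The tolerance logic described above can be sketched on a plain array. This is only an illustration under stated assumptions, not the library's implementation: the function name `quality_control` is hypothetical, missing tensor values are assumed to be stored as NaN in an array of shape (n_freq, n_stations), a frequency is treated as useless when its share of missing stations exceeds tol, and the returned ratio is simply the overall fraction of valid values.

```python
import numpy as np


def quality_control(z, tol=0.5):
    """Return (ratio of valid data, indices of useless frequencies).

    Hypothetical helper; `z` has shape (n_freq, n_stations) with NaN
    marking missing values.
    """
    z = np.asarray(z)
    missing = np.isnan(z)

    # Share of stations with missing data at each frequency.
    missing_per_freq = missing.mean(axis=1)

    # Assumption: a frequency is useless when that share exceeds `tol`,
    # so a smaller `tol` gives a more severe selection.
    useless = np.flatnonzero(missing_per_freq > tol)

    # Assumption: the quality score is the overall fraction of valid values.
    ratio = 1.0 - missing.mean()
    return ratio, useless
```

With this convention, lowering tol flags more frequencies as useless, which matches the soft and hard selection described for getValidTensors.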

skew(method='swift', return_skewness=False, suppress_outliers=False)

The conventional asymmetry parameter based on the Z magnitude.

The EM signal is influenced by several factors, such as the dimensionality of the propagation medium and physical anomalies, which can distort the EM field both locally and regionally. The distortion of Z is determined by quantifying its asymmetry and its deviation from the conditions that define its dimensionality. The parameters used for this purpose are all rotationally invariant because the Z components involved in their definition are independent of the orientation system used. The conventional asymmetry parameter based on the Z magnitude is the skew defined by Swift (1967) as follows:

\[skew_{swift}= |\frac{Z_{xx} + Z_{yy}}{ Z_{xy} - Z_{yx}}|\]

When $skew_{swift}$ is close to 0, we assume a 1D or 2D model; when $skew_{swift}$ is greater than or equal to 0.2, we assume a 3D local anomaly (Bahr, 1991; Reddy et al., 1977). It is generally considered that an electrical structure with $skew < 0.4$ can be treated as a 2D medium.

Furthermore, Bahr (1988) proposed the phase sensitive skew which calculates the skew taking into account the distortions produced in Z over 2D structures by shallow conductive anomalies and is defined as follows:

\[ \begin{align}\begin{aligned}skew_{Bahr} & = & \frac{\sqrt{|[D_1, S_2] -[S_1, D_2]|}}{|D_2|} \quad \text{where}\\S_1 & = & Z_{xx} + Z_{yy} \quad ; \quad S_2 = Z_{xy} + Z_{yx}\\D_1 & = & Z_{xx} - Z_{yy} \quad ; \quad D_2 = Z_{xy} - Z_{yx}\end{aligned}\end{align} \]

Note that the phase differences between two complex numbers $C_1$ and $C_2$ and the corresponding amplitude products are abbreviated by the commutators:

\[ \begin{align}\begin{aligned}{[C_1, C_2]} & = & \text{Im}(C_2 C_1^*)\\ & = & \text{Re}(C_1)\,\text{Im}(C_2) - \text{Re}(C_2)\,\text{Im}(C_1)\end{aligned}\end{align} \]

Indeed, $skew_{Bahr}$ measures the deviation from the symmetry condition through the phase differences between each pair of tensor elements, considering that phases are less sensitive to surface distortions (i.e. galvanic distortion). The $skew_{Bahr}$ threshold is set at 0.3; higher values indicate 3D structures (Bahr, 1991).

Parameters:

data (str, path-like, or list of pycsamt.core.edi.Edi) – EDI data or EDI objects with the full impedance tensor Z.
method (str) – Kind of correction. Can be swift for the distortion removal proposed by Swift in 1967: values close to 0 indicate 1D or 2D structures, and 3D otherwise. Alternatively, bahr for the distortion removal proposed by Bahr in 1991: the threshold is set to 0.3, and above this value the structure is 3D.
return_skewness (str) – Type of skewness to return: 'skew' for the skew values or 'mu' for the rotational invariant values. Any other value returns both the skew and the rotational invariant.
suppress_outliers (bool, default=False,) –
Remove the outliers (if applicable in the data ) before normalizing.

New in version 0.1.6.

Returns:

skw, mu –

Array of skew at each frequency.
Rotational invariant mu at each frequency, which measures the phase differences in the impedance tensor.

Return type:

Tuple of ndarray, shape (N, M)
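
The two skew definitions can be written directly with NumPy. The sketch below is illustrative and is not the library's implementation: the function names are hypothetical, and the impedance tensor is assumed to be a complex array of shape (n_freq, 2, 2). The Bahr skew is evaluated as the square root of the commutator difference divided by |D_2|, which keeps the result dimensionless.

```python
import numpy as np


def commutator(c1, c2):
    """[C1, C2] = Im(C2 * conj(C1)) = Re(C1) Im(C2) - Re(C2) Im(C1)."""
    return np.imag(c2 * np.conj(c1))


def _components(z):
    # Split a (..., 2, 2) tensor into Zxx, Zxy, Zyx, Zyy.
    z = np.asarray(z, dtype=complex)
    return z[..., 0, 0], z[..., 0, 1], z[..., 1, 0], z[..., 1, 1]


def swift_skew(z):
    """Swift (1967) skew: |Zxx + Zyy| / |Zxy - Zyx|."""
    zxx, zxy, zyx, zyy = _components(z)
    return np.abs((zxx + zyy) / (zxy - zyx))


def bahr_skew(z):
    """Phase-sensitive skew: sqrt(|[D1, S2] - [S1, D2]|) / |D2|."""
    zxx, zxy, zyx, zyy = _components(z)
    s1, s2 = zxx + zyy, zxy + zyx
    d1, d2 = zxx - zyy, zxy - zyx
    diff = commutator(d1, s2) - commutator(s1, d2)
    return np.sqrt(np.abs(diff)) / np.abs(d2)
```

For an ideal 1D tensor (Zxx = Zyy = 0 and Zxy = -Zyx) both functions return 0. A purely real tensor has no phase differences between its elements, so the Bahr skew is 0 even when the Swift skew is not.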

Submodules#