This module gathers the core utilities that the package's classes and methods rely on to run. Some of its functions are shortcuts that let users perform single preparatory tasks before feeding data to the main algorithms.

watex.utils.coreutils.defineConductiveZone(erp, station=None, position=None, auto=False, index='py', **kws)[source]#

Define conductive zone as subset of the erp line.

The conductive zone is the specific zone expected to hold the drilling location station. If the drilling location is not provided, it defaults to the lowest resistivity values found in the ERP line.

Parameters:
  • erp (array_like) – array of apparent resistivity values.

  • station (str or int) – name of the station position.

  • position (float) – station position value.

  • auto (bool) – If True, the station position is the position of the lowest resistivity value in the Electrical Resistivity Profiling line.

  • indexing (str,) –

Returns:

  • conductive zone of resistivity values

  • conductive zone positioning

  • station position index in the conductive zone

  • station position index in the whole ERP line

Example:
>>> import numpy as np
>>>
>>> from watex.utils import plotAnomaly
>>> from watex.utils.coreutils import defineConductiveZone
>>> test_array = np.abs (np.random.randn (10)) *1e2
>>> selected_cz ,*_ = defineConductiveZone(test_array, 's20')
>>> plotAnomaly(test_array, selected_cz )
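
As a rough illustration of the default behaviour described above (station taken at the lowest resistivity value), the sketch below cuts a window around the minimum of a plain Python list and returns the same four items as the Returns section. It does not call watex: the helper name `conductive_zone_sketch`, the half-width of 3 stations and the 10 m spacing are assumptions made for this example, not the library's actual rules.

```python
def conductive_zone_sketch(erp, half_width=3, dipole=10.0):
    """Return (cz, positions, index_in_cz, index_in_erp) around the minimum.

    half_width and dipole are hypothetical defaults chosen for illustration.
    """
    if len(erp) == 0:
        raise ValueError("erp must contain at least one resistivity value")
    # Station index of the lowest resistivity value in the whole line.
    index_in_erp = min(range(len(erp)), key=erp.__getitem__)
    # Clip the window so it never runs past either end of the line.
    start = max(index_in_erp - half_width, 0)
    stop = min(index_in_erp + half_width + 1, len(erp))
    cz = erp[start:stop]
    positions = [i * dipole for i in range(start, stop)]
    return cz, positions, index_in_erp - start, index_in_erp


erp = [120.0, 95.0, 80.0, 60.0, 35.0, 50.0, 90.0, 110.0, 130.0, 140.0]
cz, positions, idx_cz, idx_erp = conductive_zone_sketch(erp)
print(cz, idx_cz, idx_erp)
```

The clipping step matters near the ends of a line: a minimum at the first station still yields a valid, shorter zone.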
watex.utils.coreutils.erpSelector(f, columns=Ellipsis, force=False, **kws)[source]#

Read and sanitize the data collected from the survey.

The data should be an array, a dataframe, a series, or a file in .csv or .xlsx format. Be sure to provide a header for each column in the worksheet. If a file is given, the header columns should be arranged as ['station','resistivity' ,'longitude', 'latitude']. Note that the coordinate columns (longitude and latitude) are not compulsory.

Parameters:
  • f (Path-like object, ndarray, Series or Dataframe) – If a path-like object is given, only the .csv and .xlsx file formats can be parsed. If an ndarray is given and its shape along axis 1 is greater than 4, the ndarray is shrunk to 4 columns.

  • columns (list) – list of the valuable columns. It can be used to assign specific names along axis 1 of the array. It should contain the prefix or the whole name of each item in ['station','resistivity' ,'longitude', 'latitude'].

  • force (bool, default=False) – If Vertical Electrical Sounding (VES) data is passed while ERP data is expected, setting force to True treats the VES data as ERP data and uses only the resistivity values of the VES data. This yields invalid results, especially when parameter computation is needed.

  • kws (dict) – Additional keyword arguments of the pandas pd.read_csv and pd.read_excel methods. Be sure to provide arguments that are valid for the format of f. For instance, passing sep=',' when the file to read is in .xlsx format raises an error, because the sep parameter is only accepted when parsing .csv files.

Return type:

DataFrame with valuable column(s).

Notes

The number of acceptable columns is 4. If the data has more than 4 columns, it is shrunk to match the expected columns. Furthermore, if the header is not specified in f, the default column arrangement is used, so the second column is taken as the resistivity column.

Examples

>>> import numpy as np
>>> from watex.utils.coreutils import erpSelector
>>> df = erpSelector ('data/erp/testsafedata.csv')
>>> df.shape
... (45, 4)
>>> list(df.columns)
... ['station','resistivity', 'longitude', 'latitude']
>>> df = erpSelector('data/erp/testunsafedata.xlsx')
>>> list(df.columns)
... ['easting', 'station', 'resistivity', 'northing']
>>> df = erpSelector(np.random.randn(7, 7))
>>> df.shape
... (7, 4)
>>> list(df.columns)
... ['station', 'resistivity', 'longitude', 'latitude']
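
The columns parameter accepts either the whole name or a prefix of each expected column. A minimal sketch of such prefix matching is given below. It is not the watex implementation: the helper `match_columns` and the minimum prefix length of 3 characters are hypothetical choices made for the example.

```python
EXPECTED = ["station", "resistivity", "longitude", "latitude"]


def match_columns(headers, min_prefix=3):
    """Map raw header names onto the expected ERP column names by prefix."""
    mapping = {}
    for raw in headers:
        key = raw.strip().lower()
        if len(key) < min_prefix:
            continue  # too short to be an unambiguous prefix
        for name in EXPECTED:
            # Accept a prefix ('sta') as well as a longer variant ('stations').
            if name.startswith(key) or key.startswith(name):
                mapping[raw] = name
                break
    return mapping


print(match_columns(["Sta", "resist", "LONG", "lat", "comment"]))
```

Headers that match nothing, such as "comment" here, are simply left out of the mapping, which mirrors the idea of keeping only the valuable columns.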
watex.utils.coreutils.fill_coordinates(data=None, lon=None, lat=None, east=None, north=None, epsg=None, utm_zone=None, datum='WGS84', verbose=0)[source]#

Check and recompute coordinate values based on geographic coordinate systems.

Compute the pairs (easting, northing) or (longitude, latitude) and store the newly calculated values in a dataframe.

Parameters:
  • data (dataframe) – Dataframe containing the lat, lon or east and north columns. Not all of them need to be provided. If both (‘lat’, ‘lon’) and (‘east’, ‘north’) are given, (‘easting’, ‘northing’) are overwritten.

  • lat (array-like float or string (DD:MM:SS.ms)) – Values composing the latitude of the points

  • lon (array-like float or string (DD:MM:SS.ms)) – Values composing the longitude of the points

  • east (array-like float) – Values composing the easting coordinate in meters

  • north (array-like float) – Values composing the northing coordinate in meters

  • datum (string) – well known datum ex. WGS84, NAD27, etc.

  • projection (string) – projected point in lat and lon in Datum latlon, as decimal degrees or ‘UTM’.

  • epsg (int) – EPSG number defining the projection (see http://spatialreference.org/ref/ for more info). Overrides utm_zone if both are provided.

  • utm_zone (string) – zone number and ‘S’ or ‘N’, e.g. ‘55S’. Defaults to the centre point of the provided points.

  • verbose (int, default=0) – Warn the user if utm_zone is not supplied when computing the latitude/longitude from easting/northing.

Returns:

  • data (Dataframe with the newly computed coordinate values)

  • utm_zone (zone number and ‘S’ or ‘N’)

Examples

>>> from watex.utils.coreutils import fill_coordinates
>>> from watex.utils import read_data
>>> data = read_data ('data/erp/l2_gbalo.xlsx')
>>> # rename columns 'x' and 'y' to 'easting' and 'northing'  inplace
>>> data.rename (columns ={"x":'easting', "y":'northing'} , inplace =True )
>>> # transform the data by computing latitude/longitude by specifying the utm zone
>>> data_include,_ = fill_coordinates (data , utm_zone ='49N' )
>>> data.head(2)
easting   northing   rho  longitude  latitude
0   790752  1092750.0  1101        113         9
10   790747  1092758.0  1147        113         9
>>> # reverse operation: recompute easting/northing from longitude/latitude
>>> datalalon = data_include[['pk', 'longitude', 'latitude']]
>>> data_east_north, _ = fill_coordinates (datalalon )
>>> data_east_north.head(2)
pk  longitude  latitude  easting  northing
0   0        113         9   719870    995452
1  10        113         9   719870    995452
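
The lat and lon parameters accept either floats or DD:MM:SS.ms strings. The conversion from such a string to decimal degrees is simple arithmetic, sketched below. The helper `dms_to_decimal` is a hypothetical name for this illustration; watex ships its own converters, which may behave differently on edge cases.

```python
def dms_to_decimal(value):
    """Convert 'DD:MM:SS.ms' (or a plain number) to decimal degrees."""
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip()
    # Keep the sign aside so minutes and seconds are added with the same sign.
    sign = -1.0 if text.startswith("-") else 1.0
    parts = text.lstrip("+-").split(":")
    if len(parts) != 3:
        raise ValueError(f"expected DD:MM:SS.ms, got {value!r}")
    degrees, minutes, seconds = (float(p) for p in parts)
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


print(dms_to_decimal("-26:30:00"))  # -26.5
```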
watex.utils.coreutils.is_erp_dataframe(data, dipolelength=None, force=False)[source]#

Check whether the dataframe contains the electrical resistivity profiling (ERP) index properties.

The DataFrame is reordered to fit the order of the index properties. Where a property is missing, the corresponding column is filled with 0. However, if the station property is not given, it is set using the dipolelength, whose default value is 10.

Parameters:
  • data (Dataframe object) – Dataframe object. Its columns should match the ERP property object, such as ['station','resistivity', 'longitude','latitude'] or ['station','resistivity', 'easting','northing'].

  • dipolelength (float) – Dipole length used along the whole survey line. If the station is not given among the data columns, the station locations are computed from the dipole length and used to fill the station column. The default value is 10 meters.

  • force (bool, default=False) – If Vertical Electrical Sounding (VES) data is passed while ERP data is expected, setting force to True treats the VES data as ERP data and uses only the resistivity values of the VES data. This yields invalid results, especially when parameter computation is needed.

Return type:

A new dataframe with the index properties.

Raises:
  • None of the columns matches the property indexes.

  • Duplicated values are found in the given data header.

Examples

>>> import pandas as pd
>>> from watex.utils.coreutils import is_erp_dataframe
>>> df = pd.read_csv ('data/erp/testunsafedata.csv')
>>> df.columns
... Index(['x', 'stations', 'resapprho', 'NORTH'], dtype='object')
>>> df = is_erp_dataframe (df)
>>> df.columns
... Index(['station', 'easting', 'northing', 'resistivity'], dtype='object')
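
The filling rules described above (zeros for missing properties, station positions rebuilt from the dipole length) can be sketched on plain dictionaries. This is only an illustration under stated assumptions: `complete_erp_columns` is a hypothetical helper, and starting the stations at position 0 is a guess, not something the documentation states.

```python
ERP_COLUMNS = ["station", "easting", "northing", "resistivity"]


def complete_erp_columns(columns, dipolelength=10.0):
    """Return a dict holding every ERP column in the expected order."""
    if "resistivity" not in columns:
        raise ValueError("resistivity values are required")
    n = len(columns["resistivity"])
    completed = {}
    for name in ERP_COLUMNS:
        if name in columns:
            completed[name] = list(columns[name])
        elif name == "station":
            # Station positions rebuilt from the dipole length (assumed to
            # start at 0): 0, 10, 20, ...
            completed[name] = [i * dipolelength for i in range(n)]
        else:
            # Missing coordinates are filled with zeros.
            completed[name] = [0.0] * n
    return completed


out = complete_erp_columns({"resistivity": [100.0, 80.0, 120.0]})
print(list(out), out["station"])
```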
watex.utils.coreutils.is_erp_series(data, dipolelength=None)[source]#

Validate whether the data series is ERP data.

The data should hold the resistivity values and be named with one of the following property index names: resistivity or rho. An error is raised if the name is not detected. If a dipolelength is given, the data should include the values at each station position.

Parameters:
  • data (pandas Series object) – Object of resistivity values

  • dipolelength (float) – Dipole length used along the whole survey line. If it is not given, the station locations are computed and filled using the default value of the dipole. The default value is 10 meters.

Returns:

  • A dataframe of the property indexes such as ['station', 'easting','northing', 'resistivity'].

Raises:

  • ResistivityError – if the resistivity column cannot be detected from the series name.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from watex.utils.coreutils import is_erp_series
>>> data = pd.Series (np.abs (np.random.rand (42)), name ='res')
>>> data = is_erp_series (data)
>>> data.columns
... Index(['station', 'easting', 'northing', 'resistivity'], dtype='object')
>>> data = pd.Series (np.abs (np.random.rand (42)), name ='NAN')
>>> data = is_erp_series (data)
... ResistivityError: Unable to detect the resistivity column: 'NAN'.
watex.utils.coreutils.makeCoords(reflong, reflat, nsites, *, r=45.0, utm_zone=None, step='1km', order='+', todms=False, is_utm=False, raise_warning=True, **kws)[source]#

Generate multiple stations coordinates (longitudes, latitudes) from a reference station/site.

One degree of latitude equals approximately 364,000 feet (69 miles), one minute equals 6,068 feet (1.15 miles), and one second equals 101 feet. At about 38° N, one degree of longitude equals 288,200 feet (54.6 miles), one minute equals 4,800 feet (0.91 mile), and one second equals 80 feet (1 foot ≈ 0.3048 meter).

Parameters:
  • reflong (float or string or list of [start, stop]) – Reference longitude in decimal degrees or in DD:MM:SS for the first site, considered as the origin of the landmark.

  • reflat (float or string or list of [start, stop]) – Reference latitude in decimal degrees or in DD:MM:SS for the reference site, considered as the landmark origin. If the value is given as a list, it can contain the start point and the stop point.

  • nsites (int or float) – Number of sites to generate coordinates for.

  • r (float or int) – Rotation angle in degrees. It sets the direction of the projection line. Default value is 45 degrees.

  • step (float or str) – Offset, i.e. the separation distance between consecutive sites, in meters. If the value is given as a string, it is read as meters unless it carries the km suffix. Only meters and kilometers are accepted.

  • order (str) – Direction of the projection line. By default the projected line is in ascending order, i.e. from SW to NE with angle r set to 45 degrees. Use - for descending order. Any other value is treated as ascending order.

  • is_utm (bool) – Consider the first two positional arguments as UTM coordinate values, i.e. assume that reflong and reflat are the UTM ‘easting’ and ‘northing’. If is_utm is False, any value greater than 180 degrees for longitude and 90 degrees for latitude raises an error. Default is False.

  • utm_zone (string (##N or ##S)) – UTM zone in the form of a number followed by the North or South hemisphere, e.g. 10S or 03N. Must be given if is_utm is set to True.

  • todms (bool) – Convert the decimal degree values into DD:MM:SS. Default is False.

  • raise_warning (bool, default=True) – Raise warnings if GDAL is not set or about the accuracy status of the coordinates.

  • kws (dict) – Additional keyword arguments of gistools.project_point_utm2ll().

Returns:

  • Tuple of generated projected coordinates (longitudes, latitudes), either in decimal degrees or in DD:MM:SS.

Notes

The distances vary. A degree, minute, or second of latitude remains fairly constant from the equator to the poles; however a degree, minute, or second of longitude can vary greatly as one approaches the poles and the meridians converge.

References

https://math.answers.com/Q/How_do_you_convert_degrees_to_meters

Examples

>>> from watex.utils.coreutils import makeCoords
>>> rlons, rlats = makeCoords('110:29:09.00', '26:03:05.00',
...                                     nsites = 7, todms=True)
>>> rlons
... array(['110:29:09.00', '110:29:35.77', '110:30:02.54', '110:30:29.30',
       '110:30:56.07', '110:31:22.84', '110:31:49.61'], dtype='<U12')
>>> rlats
... array(['26:03:05.00', '26:03:38.81', '26:04:12.62', '26:04:46.43',
       '26:05:20.23', '26:05:54.04', '26:06:27.85'], dtype='<U11')
>>> rlons, rlats = makeCoords ((116.7, 119.90) , (44.2 , 40.95),
...                                         nsites = 238, step =20. ,
...                                         order = '-', r= 125)
>>> rlons
... array(['119:54:00.00', '119:53:11.39', '119:52:22.78', '119:51:34.18',
       '119:50:45.57', '119:49:56.96', '119:49:08.35', '119:48:19.75',
       ...
       '116:46:03.04', '116:45:14.43', '116:44:25.82', '116:43:37.22',
       '116:42:48.61', '116:42:00.00'], dtype='<U12')
>>> rlats
... array(['40:57:00.00', '40:57:49.37', '40:58:38.73', '40:59:28.10',
       '41:00:17.47', '41:01:06.84', '41:01:56.20', '41:02:45.57',
       ...
   '44:07:53.16', '44:08:42.53', '44:09:31.90', '44:10:21.27',
   '44:11:10.63', '44:12:00.00'], dtype='<U11')
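
To make the step handling and the feet-per-degree figures quoted above concrete, here is a small sketch that parses a step such as '1km' and lays sites out along a straight line. It is not how makeCoords is implemented: `parse_step` and `line_coordinates` are hypothetical helpers, the angle r is assumed to be measured counterclockwise from east, and the fixed meters-per-degree constants only hold approximately at mid-latitudes.

```python
import math

FEET_TO_M = 0.3048
M_PER_DEG_LAT = 364_000 * FEET_TO_M   # about 110,947 m
M_PER_DEG_LON = 288_200 * FEET_TO_M   # about 87,843 m (mid-latitude figure)


def parse_step(step):
    """Return the step in meters from a number or a string such as '1km'."""
    if isinstance(step, (int, float)):
        return float(step)
    text = step.strip().lower()
    if text.endswith("km"):
        return float(text[:-2]) * 1000.0
    # Anything else is read as meters, with or without a trailing 'm'.
    return float(text.rstrip("m"))


def line_coordinates(reflong, reflat, nsites, step="1km", r=45.0):
    """Generate (longitudes, latitudes) along a straight line of sites.

    Assumption: r is measured counterclockwise from east, so r=45 runs
    from SW to NE.
    """
    step_m = parse_step(step)
    dlon = step_m * math.cos(math.radians(r)) / M_PER_DEG_LON
    dlat = step_m * math.sin(math.radians(r)) / M_PER_DEG_LAT
    lons = [reflong + i * dlon for i in range(nsites)]
    lats = [reflat + i * dlat for i in range(nsites)]
    return lons, lats


lons, lats = line_coordinates(110.0, 26.0, nsites=3)
print(lons, lats)
```

A real implementation would project through UTM rather than rely on constant conversion factors, since the length of a degree of longitude shrinks toward the poles.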
watex.utils.coreutils.parseDCArgs(fn, delimiter=None, arg='stations')[source]#

Parse DC stations and search arguments from file and output to array accordingly.

The fromS argument is the depth in meters from which one expects to find a fracture zone free of pollution. The fromS parameter is used to speculate about the expected groundwater in the fractured rocks below the average level of water inrush in a specific area. For more details refer to the watex.methods.electrical.VerticalSounding.fromS documentation.

Parameters:
  • fn – path-like object, full path to the DC station or fromS file. If the data is considered a station file, it must be composed of the station names. It is commonly used to specify the selected station of each DC-resistivity line where one expects to locate the drilling. Conversely, the fromS file should not include any letters; if present, they should be removed.

  • arg – str, attribute of the DC methods. Any value other than station is considered a fromS value and the file is parsed accordingly.

  • delimiter – str, delimiter separating the different stations or fromS values. For instance, use delimiter=' ' when all values are separated by spaces and arranged on the same line, like:

    >>> 'S02 S12 S12 S15 S28 S30' #  line of the file.
    

Returns:

array: array of station names.

Note:

If all station prefixes belong to the module station property object, i.e. watex.property.P.istation, the prefix is overwritten to keep only the S. For instance, ‘pk25’ -> ‘S25’.

Example:
>>> from watex.utils.coreutils import parseDCArgs
>>> sf='data/sfn.txt' # use delimiter if values are in the same line.
>>> sdata= parseDCArgs(sf)
>>> sdata
...
>>> # considered that the digits in the file correspond to the depths
>>> fdata= parseDCArgs(sf, arg='froms')
>>> fdata
...
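
The delimiter handling and the prefix rewriting from the note above can be sketched on a single line of text. This is an illustration only: `parse_station_line` is a hypothetical helper, its `prefixes` tuple is a stand-in for watex.property.P.istation (whose real content is not shown here), and it normalises each name independently, which is a simplification of the "all prefixes" rule.

```python
import re


def parse_station_line(line, delimiter=None, prefixes=("pk", "sta", "s")):
    """Split one line of station names and normalise each to 'S<number>'.

    prefixes is a hypothetical stand-in for watex.property.P.istation.
    """
    names = []
    for token in line.split(delimiter):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"([A-Za-z]*)(\d+)", token)
        if match and match.group(1).lower() in prefixes:
            names.append(f"S{match.group(2)}")
        else:
            names.append(token)  # unknown prefix: keep the name untouched
    return names


print(parse_station_line("S02 S12 S12 S15 S28 S30", delimiter=" "))
print(parse_station_line("pk25,sta7", delimiter=","))
```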
watex.utils.coreutils.plotAnomaly(erp, cz=None, station=None, fig_size=(10, 4), fig_dpi=300, savefig=None, show_fig_title=True, style='seaborn', fig_title_kws=Ellipsis, czkws=Ellipsis, legkws=Ellipsis, how='py', **kws)[source]#

Plot the whole Electrical Resistivity Profiling line and selected conductive zone.

The conductive zone can be supplied manually as a subset of the erp or by specifying the station expected for the drilling location, for instance S07 for the seventh station. Furthermore, for automatic detection, set the station argument to auto. However, it is recommended to provide the cz or the station to keep full control. The conductive zone is overlaid on the whole Electrical Resistivity Profiling survey. The user can customize the cz plot with additional Matplotlib pyplot keyword arguments through czkws.

Parameters:
erp: array_like 1d

The Electrical Resistivity Profiling survey line, given as an array of resistivity values. If a dataframe is passed, make sure it matches the DC resistivity data (ERP), otherwise an error occurs. At the least, the frame columns must include the resistivity and the stations.

cz: array_like 1d

The selected conductive zone. If None, only the erp is displayed. Note that cz is a subset of the erp array.

station: str, optional

The station location given as a string (e.g. station="S10") or as a station number (indexing; e.g. station=10). If the value is set to "auto", the station is found automatically and the cz is fetched as well.

fig_size: tuple, default=(10, 4)

Tuple value of the figure size. Refer to the web resources Matplotlib figure.

fig_dpi: int, default=300

Figure resolution in dots per inch. Refer to Matplotlib figure.

savefig: str, optional

Save the figure. Refer to Matplotlib figure.

show_fig_title: bool, default=True

Display the title of the figure.

fig_title_kws: dict

Keyword arguments of the figure suptitle. Refer to Matplotlib figsuptitle.

style: str, default='seaborn'

The style for customizing the visualization. For instance, to get the first seven available styles in pyplot, one can run:

plt.style.available[:7]

Further details can be found in the Matplotlib web resources or on GeeksforGeeks.

how: str, default=’py’

By default (how='py'), stations are named following Python indexing, i.e. counted from station 00 (S00). Any other value starts the station naming from 1.

czkws: dict

Additional Matplotlib pyplot keyword arguments to customize the cz plot.

legkws: dict

Additional Matplotlib legend keyword arguments.

kws: dict

Additional Matplotlib pyplot keyword arguments to customize the erp plot.

See also

watex.erpSmartDetector

Detect the conductive zone by applying constraints. Set view=True to visualize the constraints.

Examples

>>> import numpy as np
>>> from watex.utils import plotAnomaly, defineConductiveZone
>>> test_array = np.abs (np.random.randn (10)) *1e2
>>> selected_cz ,*_ = defineConductiveZone(test_array, 7)
>>> plotAnomaly(test_array, selected_cz )
>>> plotAnomaly(test_array, selected_cz , station= 5)
>>> plotAnomaly(test_array, station= 's02')
>>> plotAnomaly(test_array)
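
The effect of the how parameter on station naming can be shown with a few lines. The helper `station_names` below is hypothetical, and the two-digit zero padding is an assumption inferred from labels such as S00 and S07 used in this page.

```python
def station_names(n, how="py"):
    """Return n station labels, counted from S00 when how='py', else from S01."""
    first = 0 if how == "py" else 1
    return [f"S{i:02d}" for i in range(first, first + n)]


print(station_names(3))            # ['S00', 'S01', 'S02']
print(station_names(3, how="m"))   # ['S01', 'S02', 'S03']
```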
watex.utils.coreutils.read_data(f, **read_kws)[source]#

Check and read the specific files and URLs allowed by the package.

Readable files are systematically converted to a pandas dataframe.

Parameters:
  • f (str, Path-like object) – File path or Pathlib object. Must contain a valid file name and should be a readable file or URL.

  • read_kws (dict) – Additional keyword arguments passed to the pandas file readers.

Returns:

f – A dataframe with head contents by default.

Return type:

pandas.DataFrame

watex.utils.coreutils.vesSelector(data=None, *, rhoa=None, AB=None, MN=None, index_rhoa=None, **kws)[source]#

Assert the validity of Vertical Electrical Sounding data and return a sanitized dataframe.

Parameters:
  • rhoa (array-like) – Apparent resistivities collected during the sounding.

  • AB (array-like) – Investigation distance between the current electrodes. Note that AB is by convention equal to AB/2. It is taken as the half-space of the investigation depth.

  • MN (array-like) – Potential electrode distances at each investigation depth. Note that by convention the values are half-space and equal to MN/2.

  • data (Path-like object or DataFrame) – Path-like object or sounding dataframe. If given, the other parameters can keep their None values.

  • index_rhoa (int) – The index used to retrieve the resistivity data of a specific sounding point. Sometimes the sounding data are composed of different sounding values collected in the same survey area on different Electrical Resistivity Profiling lines. For instance:

    AB/2 | MN/2 | SE1 | SE2 | SE3 | SEn

    Here SE are the electrical sounding data values and n is the number of sounding points selected. SE1, SE2 and SE3 are three points selected for Vertical Electrical Sounding, i.e. 3 sounding points carried out either on the same Electrical Resistivity Profiling line or somewhere else. These sounding data are the resistivity data with specific numbers. The numbers are commonly chosen at random and do not refer to the expected best fracture zone selected after the prior interpretation. After transformation via the function vesSelector, the header of the data should hold the resistivity. For instance, referring to the table above, the data should be:

    AB | MN | resistivity | resistivity | resistivity

    Therefore, index_rhoa is used to select the specific resistivity values, i.e. the sounding number of the Vertical Electrical Sounding where the drilling operations are expected or that is needed for computation. For example, index_rhoa=1 yields:

    AB/2 | MN/2 | SE2   ->   AB | MN | resistivity

    If index_rhoa is None and there is more than one sounding curve, the first sounding curve is selected by default, i.e. index_rhoa equals 0.

  • kws (dict) – Additional keyword arguments for reading the data with pandas.

Returns:

  • Sanitized Vertical Electrical Sounding dataframe with AB, MN and resistivity as the column headers.

Example:
>>> from watex.utils.coreutils import vesSelector
>>> df = vesSelector (data='data/ves/ves_gbalo.csv')
>>> df.head(3)
...    AB   MN  resistivity
    0   1  0.4          943
    1   2  0.4         1179
    2   3  0.4         1103
>>> df = vesSelector ('data/ves/ves_gbalo.csv', index_rhoa=3 )
>>> df.head(3)
...    AB   MN  resistivity
    0   1  0.4          457
    1   2  0.4          582
    2   3  0.4          558
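
The role of index_rhoa can be illustrated without pandas: keep the AB/2 and MN/2 columns, pick one sounding column by its zero-based index, and rename the headers. The helper `select_sounding` is hypothetical and the numbers in the table are made up for the example; this is a sketch of the selection logic, not the vesSelector implementation.

```python
def select_sounding(header, rows, index_rhoa=None):
    """Keep AB/2, MN/2 and one sounding column, renamed AB, MN, resistivity."""
    n_soundings = len(header) - 2
    if n_soundings < 1:
        raise ValueError("expected AB/2, MN/2 and at least one sounding column")
    # Default to the first sounding curve when no index is given.
    index = 0 if index_rhoa is None else index_rhoa
    if not 0 <= index < n_soundings:
        raise IndexError(f"index_rhoa must be in [0, {n_soundings - 1}]")
    col = 2 + index  # sounding columns start after AB/2 and MN/2
    new_header = ["AB", "MN", "resistivity"]
    new_rows = [[row[0], row[1], row[col]] for row in rows]
    return new_header, new_rows


header = ["AB/2", "MN/2", "SE1", "SE2", "SE3"]
rows = [[1, 0.4, 900, 1100, 1200],   # made-up values
        [2, 0.4, 950, 1250, 1300]]
print(select_sounding(header, rows, index_rhoa=1))
```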
