<no title> — watex 0.2.7.dev22+gf01a77d.d20230708 documentation

watex.utils.funcutils.accept_types(*objtypes, format=False)[source]#

List the type format that can be accepted by a function.

Parameters:

objtypes – List of object types.
format – bool. If True, format the list of object names into a single string.

Returns:

list of object type names or str of object names.

Example:

>>> import numpy as np; import pandas as pd
>>> from watex.utils.funcutils import accept_types
>>> accept_types(pd.Series, pd.DataFrame, tuple, list, str)
['Series', 'DataFrame', 'tuple', 'list', 'str']
>>> accept_types(
...     pd.Series, pd.DataFrame, np.ndarray, format=True)
"'Series','DataFrame' and 'ndarray'"
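The two return forms (a list of names, or one formatted string) can be illustrated with a small sketch. The helper name format_type_names and the exact joining rule are assumptions made for illustration; this is not the watex source.

```python
def format_type_names(*objtypes, format=False):
    """Hypothetical stand-in for accept_types (not the watex source)."""
    names = [t.__name__ for t in objtypes]
    if not format:
        return names
    quoted = [f"'{name}'" for name in names]
    if len(quoted) < 2:
        return "".join(quoted)
    # Join all but the last name with commas, then append the last with "and".
    return ",".join(quoted[:-1]) + " and " + quoted[-1]


print(format_type_names(tuple, list, str))
print(format_type_names(tuple, list, str, format=True))
```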

watex.utils.funcutils.assert_doi(doi)[source]#

Assert the depth of investigation (DOI) and convert it to meters.

Parameters:

doi (str|float) – Depth of investigation in meters. If the value is given as a string followed by the kilometer suffix ‘km’, it is converted to meters.

Returns:

doi – Value in meters.

Return type:

float

watex.utils.funcutils.assert_ratio(v, /, bounds=None, exclude_value=None, in_percent=False, name='rate')[source]#

Assert rate value between a specific range.

Parameters:

v (float) – Ratio value to assert.
bounds (list (lower, upper)) – The range within which the value must fall.
exclude_value (float) – A value that v must not take. It is excluded from the bounds and an error is raised if v equals it. Any non-numeric value uses the lower bound in bounds as the excluded value.
in_percent (bool, default=False) –
Treat the value as a percentage and convert it to a fraction (for instance, 2 becomes 0.02).

Changed in version 0.2.3: as_percent parameter is changed to in_percent.
name (str, default='rate') – the name of the value for assertion.

Returns:

v – Asserted value.

Return type:

float

Examples

>>> from watex.utils.funcutils import assert_ratio
>>> assert_ratio('2')
2.0
>>> assert_ratio(2 , bounds =(2, 8))
2.0
>>> assert_ratio(2 , bounds =(4, 8))
ValueError:...
>>> assert_ratio(2 , bounds =(1, 8), exclude_value =2 )
ValueError: ...
>>> assert_ratio(2 , bounds =(1, 8), exclude_value ='use bounds' )
2.0
>>> assert_ratio(2 , bounds =(0, 1) , in_percent =True )
0.02
>>> assert_ratio(2 , bounds =(0, 1) )
ValueError:
>>> assert_ratio(2 , bounds =(0, 1), exclude_value ='use lower bound',
                     name ='tolerance', in_percent =True )
0.02
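The behaviour shown in these examples can be approximated by the following sketch. The name check_ratio, the order of the checks, and the handling of string values for exclude_value are assumptions; the actual watex implementation may differ.

```python
def check_ratio(v, bounds=None, exclude_value=None, in_percent=False, name="rate"):
    """Hypothetical sketch of a bounded-ratio check (not the watex source)."""
    v = float(v)
    if in_percent:
        v /= 100.0  # e.g. 2 -> 0.02
    if bounds is not None:
        lower, upper = bounds
        if not lower <= v <= upper:
            raise ValueError(f"{name} must be between {lower} and {upper}, got {v}")
    if exclude_value is not None:
        # Assumption: a string such as 'use bounds' means "exclude the lower bound".
        if isinstance(exclude_value, str):
            excluded = None if bounds is None else float(bounds[0])
        else:
            excluded = float(exclude_value)
        if excluded is not None and v == excluded:
            raise ValueError(f"{name} must not be equal to {excluded}")
    return v


print(check_ratio("2"))
print(check_ratio(2, bounds=(0, 1), in_percent=True))
```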

watex.utils.funcutils.check_dimensionality(obj, data, z, x)[source]#

Check dimensionality of data and fix it.

Parameters:

obj – Object, can be a class logged or else.
data – 2D grid data of ndarray (z, x) dimensions.
z – array-like, should be reduced along the row axis.
x – array-like, should be reduced along the column axis.

watex.utils.funcutils.cleaner(data, /, columns=None, inplace=False, labels=None, func=None, mode='clean', **kws)[source]#

Sanitize the data or its columns by dropping specified labels from rows or columns.

If data is not a pandas DataFrame, it is converted to one and the index is used to drop the labels.

Parameters:

data (pd.DataFrame or arraylike2D) – pandas DataFrame or two-dimensional NumPy array. If a 2D array is passed, it is first converted to a DataFrame and rows are dropped by index.
columns (single label or list-like) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
labels (single label or list-like) – Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.
func (F, callable) – Universal function used to clean the columns. It is applied only when mode is ‘clean’.
inplace (bool, default False) – If False, return a copy. Otherwise, do operation inplace and return None.
mode (str, default='clean') – Mode of operation to apply to the data. It could be [‘clean’|’drop’]. If ‘drop’, it behaves like pandas DataFrame.drop.

Returns:

Cleaned DataFrame, or DataFrame without the removed index or column labels; None if inplace=True; or an array if data is passed as an array.

Return type:

DataFrame, array2D or None

watex.utils.funcutils.concat_array_from_list(list_of_array, concat_axis=0)[source]#

Concat array from list and set the None value in the list as NaN.

Parameters:

list_of_array – List of array elements
concat_axis (int) – axis for concatenation 0 or 1

Returns:

Concatenated array with shape np.ndarray(len(list_of_array[0]), len(list_of_array))

Return type:

np.ndarray

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import concat_array_from_list
>>> np.random.seed(0)
>>> ass = np.random.randn(10)
>>> ass2 = np.linspace(0, 15, 10)
>>> concat_array_from_list([ass, ass2])

watex.utils.funcutils.convert_csvdata_from_fr_to_en(csv_fn, pf, destfile='pme.en.csv', savepath=None, delimiter=':')[source]#

Translate variable data from french csv data to english with parser file.

Parameters:

csv_fn – data collected in csv format.
pf – parser file.
destfile – str, Destination file, outputfile.
savepath – Path-Like object, save data to a path.

Example:

>>> # to execute this script, we need to import the two modules below
>>> import os
>>> import csv
>>> from watex.utils.funcutils import convert_csvdata_from_fr_to_en
>>> path_pme_data = r'C:/Users\Administrator\Desktop\__elodata'
>>> datalist = convert_csvdata_from_fr_to_en(
...     os.path.join(path_pme_data, '_enuv2.csv'),
...     os.path.join(path_pme_data, 'pme.parserf.md'),
...     destfile='pme.en.csv')

watex.utils.funcutils.convert_value_in(v, /, unit='m')[source]#

Convert value based on the reference unit.

Parameters:

v (str, float, int) – Value to convert.
unit (str, default='m') – Reference unit to convert the value to. Default is meters (‘m’). Could be ‘kg’ or any other unit.

Returns:

v – Value converted.

Return type:

float

Examples

>>> from watex.utils.funcutils import convert_value_in
>>> convert_value_in (20)
20.0
>>> convert_value_in ('20mm')
0.02
>>> convert_value_in ('20kg', unit='g')
20000.0
>>> convert_value_in ('20')
20.0
>>> convert_value_in ('20m', unit='g')
ValueError: Unknwon unit 'm'...
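A prefix-based conversion consistent with the examples above could look like the sketch below. The function name convert_to_unit, the PREFIXES table, and the parsing regex are assumptions for illustration only; the set of prefixes supported by watex may differ.

```python
import re

# Assumed SI prefixes; the set supported by watex may differ.
PREFIXES = {"": 1.0, "k": 1e3, "c": 1e-2, "m": 1e-3}


def convert_to_unit(v, unit="m"):
    """Hypothetical sketch of prefix-based unit conversion (not the watex source)."""
    if isinstance(v, (int, float)):
        return float(v)
    match = re.fullmatch(r"\s*([-+]?\d*\.?\d+)\s*([a-zA-Z]*)\s*", v)
    if match is None:
        raise ValueError(f"Cannot parse value {v!r}")
    value, suffix = float(match.group(1)), match.group(2)
    if not suffix:
        return value  # bare number: already expressed in the reference unit
    # The suffix must be the reference unit preceded by a known prefix.
    if not suffix.endswith(unit) or suffix[: -len(unit)] not in PREFIXES:
        raise ValueError(f"Unknown unit {suffix!r}")
    return value * PREFIXES[suffix[: -len(unit)]]


print(convert_to_unit("20kg", unit="g"))
```

Here '20m' with unit='g' is rejected because the suffix does not end with the reference unit, which mirrors the last example above.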

watex.utils.funcutils.count_func(path, verbose=0)[source]#

Count functions and methods using the ‘ast’ module.

Parameters:

path (str, Path-like object,) – Path to the python module file
verbose (int, default=0) – Any value other than 0 outputs the counting details.

Returns:

cobj or None – Returns the counter object from module ast, or nothing if verbose is False.
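Counting definitions with the standard ast module can be sketched as follows. This example works on source text rather than a file path, and the name count_definitions and the two counter keys are assumptions; it is not the watex implementation.

```python
import ast
from collections import Counter

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def count_definitions(source):
    """Count top-level/nested functions and class methods in Python source text."""
    tree = ast.parse(source)
    # First pass: remember which function nodes sit directly in a class body.
    method_ids = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            method_ids.update(id(c) for c in node.body if isinstance(c, FUNC_TYPES))
    # Second pass: classify every function definition.
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, FUNC_TYPES):
            counts["methods" if id(node) in method_ids else "functions"] += 1
    return counts


SAMPLE = """
def f():
    pass

class A:
    def m(self):
        pass
    def n(self):
        pass
"""
print(count_definitions(SAMPLE))
```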

watex.utils.funcutils.cparser_manager(cfile, savepath=None, todo='load', dpath=None, verbose=0, **pkws)[source]#

Save and output message according to the action.

Parameters:

cfile – name of the configuration file
savepath – Path-like object
dpath – default path
todo – Action to perform with config file. Can be load or dump
config – Type of configuration file. Can be [‘YAML’|’CSV’|’JSON’]
verbose – int, control the verbosity. Output messages

watex.utils.funcutils.cpath(savepath=None, dpath='_default_path_')[source]#

Check whether the path exists and create it if it does not.

Parameters:

savepath – Pathlike obj, str
dpath – str, default pathlike obj

watex.utils.funcutils.display_infos(infos, **kws)[source]#

Display the unique elements of a list or array of infos.

Parameters:

infos – Iterable object to display.
header – Change the header to other names.

Example:

>>> from watex.utils.funcutils import display_infos
>>> ipts = ['river water', 'fracture zone', 'granite', 'gravel',
...         'sedimentary rocks', 'massive sulphide', 'igneous rocks',
...         'gravel', 'sedimentary rocks']
>>> display_infos(infos=ipts, header='TestAutoRocks',
...               size=77, inline='~')

watex.utils.funcutils.drawn_anomaly_boundaries2(erp_data, appRes, index)[source]#

Draw the anomaly boundary and return the anomaly with its boundaries.

Parameters:

erp_data (array_like or list) – erp profile
appRes (float) – resistivity value of minimum pk anomaly
index (int) – index of minimum pk anomaly

Returns:

anomaly boundary

Return type:

list of array_like

watex.utils.funcutils.drawn_boundaries(erp_data, appRes, index)[source]#

Draw the anomaly boundary and return the anomaly with its boundaries.

Parameters:

erp_data (array_like or list) – erp profile
appRes (float) – resistivity value of minimum pk anomaly
index (int) – index of minimum pk anomaly

Returns:

anomaly boundary

Return type:

list of array_like

watex.utils.funcutils.ellipsis2false(*parameters, default_value=False)[source]#

Turn all parameter arguments to False if ellipsis.

Note that the output arguments must be in the same order as the positional arguments.

Parameters:

parameters – tuple, list of parameters.
default_value – Any, default value that replaces the ellipsis.

Returns:

tuple, the same list of parameters with each ellipsis replaced by default_value (False by default). For a single parameter, use a trailing comma to unpack the result.

Example:

>>> from watex.utils.funcutils import ellipsis2false
>>> var, = ellipsis2false (...)
>>> var
False
>>> data, sep , verbose = ellipsis2false ([2,3, 4], ',', ...)
>>> verbose
False
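The substitution itself is simple to sketch. The name replace_ellipsis is hypothetical and the code below only illustrates the idea; it is not the watex source.

```python
def replace_ellipsis(*parameters, default_value=False):
    """Hypothetical sketch: swap every Ellipsis argument for default_value."""
    return tuple(default_value if p is ... else p for p in parameters)


# A trailing comma unpacks the one-element tuple returned for a single parameter.
var, = replace_ellipsis(...)
data, sep, verbose = replace_ellipsis([2, 3, 4], ",", ...)
print(var, data, sep, verbose)
```

Returning a tuple even for one argument keeps the call site uniform, which is why the single-parameter case needs the trailing comma.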

watex.utils.funcutils.exist_features(df, features, error='raise')[source]#

Control whether the features exist or not

Parameters:

df – a dataframe for features selections
features – list of features to select. The features must be in the dataframe, otherwise an error occurs.
error – str, default is ‘raise’. Raise an error if the features don’t exist in the dataframe; ignore otherwise.

Returns:

bool, asserts whether the features exist

watex.utils.funcutils.fetch_json_data_from_url(url, todo='load')[source]#

Retrieve JSON data from a URL.

Parameters:

url – Uniform Resource Locator.
todo – Action to perform with JSON:

load: Load data from the JSON file

dump: Serialize data from the Python object and create a JSON file

watex.utils.funcutils.fillNaN(arr, method='ff')[source]#

Most efficient way to back/forward-fill NaN values in numpy array.

Parameters:

arr (ndarray) – Array containing NaN values to be filled
method (str) – Method for filling. Can be forward fill ff, backward fill bf, or both for the two methods. Default is ff.

Returns:

new array filled.

Notes

When a NaN value lies between two valid numbers, both ff and bf fill it well. However, when the array ends with multiple NaN values, ff is recommended; conversely, when the array starts with NaN values, bf is the suggested method. The both argument applies the two methods at the expense of computation cost.

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import fillNaN
>>> arr2d = np.random.randn(4, 3)
>>> # change some value into NaN
>>> arr2d[[0, 2, 3, 3], [0, 2, 1, 2]] = np.nan
>>> arr2d
array([[        nan, -0.74636104,  1.12731613],
       [ 0.48178017, -0.18593812, -0.67673698],
       [ 0.17143421, -2.15184895,         nan],
       [-0.6839212 ,         nan,         nan]])
>>> fillNaN(arr2d)
array([[        nan, -0.74636104,  1.12731613],
       [ 0.48178017, -0.18593812, -0.67673698],
       [ 0.17143421, -2.15184895, -2.15184895],
       [-0.6839212 , -0.6839212 , -0.6839212 ]])
>>> fillNaN(arr2d, 'bf')
array([[-0.74636104, -0.74636104,  1.12731613],
       [ 0.48178017, -0.18593812, -0.67673698],
       [ 0.17143421, -2.15184895,         nan],
       [-0.6839212 ,         nan,         nan]])
>>> fillNaN(arr2d, 'both')
array([[-0.74636104, -0.74636104,  1.12731613],
       [ 0.48178017, -0.18593812, -0.67673698],
       [ 0.17143421, -2.15184895, -2.15184895],
       [-0.6839212 , -0.6839212 , -0.6839212 ]])
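A common vectorized way to forward-fill along rows in NumPy relies on np.maximum.accumulate. The sketch below shows that technique under the assumption of a 2D float array; the names forward_fill and backward_fill are hypothetical and this is not necessarily how fillNaN is written.

```python
import numpy as np


def forward_fill(arr):
    """Forward-fill NaN along each row of a 2D array (illustrative sketch)."""
    arr = np.asarray(arr, dtype=float)
    mask = np.isnan(arr)
    # Each valid cell keeps its own column index; NaN cells start at 0.
    idx = np.where(~mask, np.arange(arr.shape[1]), 0)
    # Propagate the index of the last valid cell to the right.
    np.maximum.accumulate(idx, axis=1, out=idx)
    return arr[np.arange(arr.shape[0])[:, None], idx]


def backward_fill(arr):
    """Backward fill is a forward fill on the column-reversed array."""
    return forward_fill(np.asarray(arr, dtype=float)[:, ::-1])[:, ::-1]


demo = np.array([[np.nan, 1.0, np.nan],
                 [2.0, np.nan, np.nan]])
print(forward_fill(demo))
print(backward_fill(demo))
```

As in the examples above, a leading NaN survives the forward fill and a trailing NaN survives the backward fill, so applying both covers either case.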

References

Some of the functions below are adapted from authors on the pyquestions.com website. More efficient ways to perform this task exist, such as calling the Numba module to speed up the computation. However, at the time this script was written (August 17th, 2022), Numba worked with NumPy version 1.21, which is older than the version used for writing this package (1.22.3).

For further details, one can refer to the following link: https://pyquestions.com/most-efficient-way-to-forward-fill-nan-values-in-numpy-array

watex.utils.funcutils.find_by_regex(o, /, pattern, func=<function match>, **kws)[source]#

Find a pattern in an object, whether it is an “iterable” or not.

Here, “iterable” does not include string values.

Parameters:

o (str or iterable) – Text literal or an iterable object that may or may not contain the specific object to match.
pattern (str, default = ‘[_#&*@!_,;\s-]\s*’) – The base pattern to split the text into columns.
func (re callable, default=re.match) –
Regular expression search function. Can be [re.match, re.findall, re.search], or any other regular expression function.
- re.match(): searches for the regular expression pattern and
  returns the first occurrence. It checks for a match only at the beginning of the string: if a match is found there, it returns the match object; if the pattern only occurs later in the string, it returns None.
- re.search(): searches for the regular expression pattern
  and returns the first occurrence. Unlike re.match(), it scans the whole input string. It returns a match object when the pattern is found and None otherwise.
- re.findall(): searches for all occurrences that
  match a given pattern. In contrast, re.search() only returns the first occurrence. findall() scans the whole string and returns all non-overlapping matches of the pattern in a single step.
kws (dict) – Additional keyword arguments passed to re.match(), re.search() or re.findall().

Returns:

om – matched objects, collected in a list

Return type:

list

Example

>>> import re
>>> from watex.utils.funcutils import find_by_regex
>>> from watex.datasets import load_hlogs
>>> X0, _ = load_hlogs(as_frame=True)
>>> columns = X0.columns
>>> str_columns = ','.join(columns)
>>> find_by_regex(str_columns, pattern='depth', func=re.search)
['depth']
>>> find_by_regex(columns, pattern='depth', func=re.search)
['depth_top', 'depth_bottom']
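The difference between the three standard-library functions can be seen on a short string. The sample text below is made up for illustration and only uses the re module, not find_by_regex itself.

```python
import re

text = "hole_id,depth_top,depth_bottom"

# re.match only looks at the beginning of the string, so nothing is found.
at_start = re.match("depth", text)

# re.search scans the whole string and stops at the first occurrence.
first = re.search("depth", text)

# re.findall returns every non-overlapping match as a list of strings.
every = re.findall(r"depth_\w+", text)

print(at_start, first.group(), every)
```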

watex.utils.funcutils.find_close_position(refarr, arr)[source]#

Get the position in the reference array refarr closest to each item of arr.

Parameters:

arr – array-like 1d, array to be extended with fill value. It should be shorter than the refarr.
refarr – array-like, the reference array. It should have a greater length than the array arr.

Returns:

generator of index of the closest position in refarr.
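A nearest-position lookup of this kind can be sketched with np.argmin. The generator name closest_positions is an assumption and the code illustrates the idea only; it is not the watex implementation.

```python
import numpy as np


def closest_positions(refarr, arr):
    """Yield, for each item of arr, the index of the nearest value in refarr."""
    refarr = np.asarray(refarr, dtype=float)
    for item in np.asarray(arr, dtype=float):
        yield int(np.argmin(np.abs(refarr - item)))


positions = list(closest_positions([0.0, 10.0, 20.0, 30.0], [9.0, 31.0]))
print(positions)
```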

watex.utils.funcutils.find_feature_positions(anom_infos, anom_rank, pks_rhoa_index, dl)[source]#

Get the pk bound from ranking of computed best points

Parameters:

anom_infos – Dictionary of best anomaly points computed from drawn_anomaly_boundaries2() when pk_bounds is not given. See find_position_bounds()
anom_rank – Automatic ranking after selecting best points
pks_rhoa_index –
Tuple of the selected anomaly resistivity value and its index in the whole Electrical Resistivity Profiling line. For instance:

pks_rhoa_index= (80., 17)

where “80” is the value of the selected anomaly in ohm.m and “17” is the index of the selected point in the Electrical Resistivity Profiling array.
dl – Distance between two measurements (the dipole length). Provide dl if the default value is not right.

Returns:

Refer to .exmath.select_anomaly

watex.utils.funcutils.find_position_bounds(pk, rhoa, rhoa_range, dl=10.0)[source]#

Find station position boundary indexed in Electrical Resistivity Profiling line.

Useful to get the boundaries indexes pk_boun_indexes for Electrical Resistivity Profiling normalisation when computing anr or else.

Parameters:

pk (float) – Selected anomaly station value
rhoa (float) – Selected anomaly value in ohm.m
rhoa_range (array_like) – Selected anomaly values from pk_min to pk_max
dl – see find_position_from_sa() docstring.

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import find_position_bounds
>>> find_position_bounds(pk=110, rhoa=137,
...                      rhoa_range=np.array([175, 132, 137, 139, 170]))

watex.utils.funcutils.find_position_from_sa(an_res_range, pos=None, selectedPk=None)[source]#

Select the main pk from the two boundaries given by get_boundaries().

Parameters:

an_res_range – anomaly resistivity range on Electrical Resistivity Profiling line.
pos (list) – position of anomaly boundaries (inf and sup): anBounds = [90, 130], where 130 is the max boundary and 90 the min boundary
selectedPk – User can set their own position of the right anomaly. Be sure that the value provided is the right position. The position is not computed again provided that pos is not None.

Returns:

anomaly station position.

Return type:

str ‘pk{position}’

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import find_position_from_sa
>>> resan = np.array([168, 130, 93, 146, 145])
>>> pk = find_position_from_sa(
...    resan, pos=[90, 130], selectedPk='str20')
>>> pk

watex.utils.funcutils.fit_by_ll(ediObjs)[source]#

Fit EDI by location and reorganize EDI according to the site longitude and latitude coordinates.

EDI data are mostly read in alphabetical order, so reorganizing them according to location (longitude and latitude) is useful for computing the distance between sites with the right position at each site.

Parameters:

ediObjs (watex.edi.Edi_Collection) – list of EDI object, composed of a collection of watex.edi.Edi or pycsamt.core.edi.Edi or mtpy.core.edi objects

Returns:

array split into ediObjs and EDI file basenames

Return type:

tuple

Example:

>>> import numpy as np
>>> from watex.methods.em import EM
>>> from watex.utils.funcutils import fit_by_ll
>>> edipath ='data/edi_ss'
>>> cediObjs = EM (edipath)
>>> ediObjs = np.random.permutation(cediObjs.ediObjs)  # shuffle the collection of ediObjs
>>> ediObjs, ediObjbname = fit_by_ll(ediObjs)

watex.utils.funcutils.fmt_text(anFeatures=None, title=None, **kwargs)[source]#

Format text from anomaly features.

Parameters:

anFeatures (list or dict) – Anomaly features
title (list) – head lines

Example:

>>> from watex.utils.funcutils import fmt_text
>>> fmt_text(anFeatures =[1,130, 93,(146,145, 125)])

watex.utils.funcutils.format_notes(text, cover_str='~', inline=70, **kws)[source]#

Format note.

Parameters:

text – Text to be formatted.
cover_str – type of str to surround the text.
inline – Number of characters before wrapping to a new line.
margin_space – Must be <1 and expressed in %. The empty distance between the first index and the inline text

Example:

>>> from watex.utils import funcutils as func
>>> text = ('Automatic Option is set to ``True``.'
...         ' Composite estimator building is triggered.')
>>> func.format_notes(text=text,
...                   inline=70, margin_space=0.05)

watex.utils.funcutils.fr_en_parser(f, delimiter=':')[source]#

Parse the translated data file.

Parameters:

f – translation file to parse.
delimiter – str, delimiter.

Returns:

generator obj, composed of a list of french and english Input translation.

Example:

>>> import os
>>> from watex.utils.funcutils import fr_en_parser
>>> file_to_parse = 'pme.parserf.md'
>>> path_pme_data = r'C:/Users\Administrator\Desktop\__elodata'
>>> data = list(fr_en_parser(
...     os.path.join(path_pme_data, file_to_parse)))

watex.utils.funcutils.get_boundaries(df)[source]#

Define the upper and lower bounds of the anomaly from the define_position_bounds location.

Parameters:

df – pandas DataFrame containing the columns ‘pk’, ‘x’, ‘y’, ‘rho’, ‘dl’.

Returns:

autoOption: triggers the automatic option if nothing is specified in the Excel sheet.
ves_loc: Sounding curve location at pk
posMinMax: Anomaly boundaries composed of lower and upper bounds.

Specific names can be used to define lower and upper bounds:

lower: ‘lower’, ‘inf’, ‘min’, ‘1’ or ‘low’
upper: ‘upper’, ‘sup’, ‘maj’, ‘max’, ‘2’ or ‘up’

To define the sounding location, one can use:

ves: ‘ves’, ‘se’, ‘sond’, ‘vs’, ‘loc’, ‘0’ or ‘dl’

watex.utils.funcutils.get_confidence_ratio(ar, /, axis=0, invalid='NaN')[source]#

Get ratio of confidence in array by counting the number of invalid values.

Parameters:

ar (arraylike 1D or 2D) – array for checking the ratio of confidence
axis (int, default=0) – Compute the ratio of confidence along the rows by default.
invalid (int, float, default='NaN') – The value to consider as invalid in the data. Several values can be listed if applicable. The default is NaN.

Returns:

ratio – The ratio of confidence array alongside the axis.

Return type:

arraylike 1D

Examples

>>> import numpy as np
>>> np.random.seed (0)
>>> test = np.random.randint (1, 20 , 10 ).reshape (5, 2 )
>>> test
array([[13, 16],
       [ 1,  4],
       [ 4,  8],
       [10, 19],
       [ 5,  7]])
>>> from watex.utils.funcutils import get_confidence_ratio
>>> get_confidence_ratio (test)
array([1., 1.])
>>> get_confidence_ratio (test, invalid= ( 13, 19) )
array([0.8, 0.8])
>>> get_confidence_ratio (test, invalid= ( 13, 19, 4) )
array([0.6, 0.6])
>>> get_confidence_ratio (test, invalid= ( 13, 19, 4), axis =1 )
array([0.5, 0.5, 0.5, 0.5, 1. ])
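The ratios in these examples are consistent with counting, along the chosen axis, the share of entries that are not invalid. A minimal sketch under that assumption follows; the name confidence_ratio is hypothetical and NaN is always treated as invalid here, which may differ from the watex behaviour.

```python
import numpy as np


def confidence_ratio(ar, axis=0, invalid=()):
    """Sketch: fraction of entries along `axis` that are neither NaN nor in `invalid`."""
    ar = np.asarray(ar, dtype=float)
    invalid = np.atleast_1d(np.asarray(invalid, dtype=float))
    bad = np.isnan(ar) | np.isin(ar, invalid)
    # The mean of a boolean mask is the share of invalid entries.
    return 1.0 - bad.mean(axis=axis)


test = np.array([[13, 16], [1, 4], [4, 8], [10, 19], [5, 7]])
print(confidence_ratio(test, invalid=(13, 19)))
print(confidence_ratio(test, invalid=(13, 19, 4), axis=1))
```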

watex.utils.funcutils.get_config_fname_from_varname(data, config_fname=None, config='.yml')[source]#

Use the variable name given to data as the config file name.

Parameters:

data – Given data to retrieve the variable name
config_fname – Configuration variable filename. If None, use the name of the given variable data
config – Type of file for configuration. Can be json, yml or csv file. Default is yml.

Returns:

str, the configuration data.

watex.utils.funcutils.get_params(obj)[source]#

Get object parameters.

Object can be callable or instances

Parameters:

obj – object , can be callable or instance

Returns:

dict of parameters values

Examples:

>>> from sklearn.svm import SVC
>>> from watex.utils.funcutils import get_params
>>> sigmoid= SVC (
    **{
        'C': 512.0,
        'coef0': 0,
        'degree': 1,
        'gamma': 0.001953125,
        'kernel': 'sigmoid',
        'tol': 1.0
        }
    )
>>> pvalues = get_params(sigmoid)
>>> pvalues
{'decision_function_shape': 'ovr',
 'break_ties': False,
 'kernel': 'sigmoid',
 'degree': 1,
 'gamma': 0.001953125,
 'coef0': 0,
 'tol': 1.0,
 'C': 512.0,
 'nu': 0.0,
 'epsilon': 0.0,
 'shrinking': True,
 'probability': False,
 'cache_size': 200,
 'class_weight': None,
 'verbose': False,
 'max_iter': -1,
 'random_state': None}

watex.utils.funcutils.get_xy_coordinates(d, /, as_frame=False, drop_xy=False, raise_exception=True, verbose=0)[source]#

Check whether the coordinate values exist in the data

Parameters:

d (Dataframe) – Frame that is expected to contain the longitude/latitude or easting/northing coordinates. Note if all types of coordinates are included in the data frame, the longitude/latitude takes the priority.
as_frame (bool, default= False,) – Returns the coordinates values if included in the data as a frame rather than computing the middle points of the line
drop_xy (bool, default=False,) – Drop the coordinates in the data and return the data transformed inplace
raise_exception (bool, default=True) – Raise an error if data is not a dataframe. If set to False, the exception is converted to a warning instead. To mute the warning, set raise_exception to ‘mute’.
verbose (int, default=0) – Send message whether coordinates are detected.

Returns:

xy, d, xynames –

xy: tuple of float (longitude, latitude) or (easting, northing), or a frame of coordinates if as_frame is set to True.

d: Dataframe transformed (coordinates removed) or not.

xynames: str, the names of the coordinates detected.

Return type:

Tuple

Examples

>>> import watex as wx
>>> from watex.utils.funcutils import get_xy_coordinates
>>> testdata = wx.make_erp ( n_stations =7, seed =42 ).frame
>>> xy, d, xynames = get_xy_coordinates ( testdata,  )
>>> xy , xynames
((110.48627946874444, 26.051952363176344), ('longitude', 'latitude'))
>>> xy, d, xynames = get_xy_coordinates ( testdata, as_frame =True  )
>>> xy.head(2)
    longitude   latitude        easting      northing
0  110.485833  26.051389  448565.380621  2.881476e+06
1  110.485982  26.051577  448580.339199  2.881497e+06
>>> # remove longitude and  lat in data
>>> testdata = testdata.drop (columns =['longitude', 'latitude'])
>>> xy, d, xynames = get_xy_coordinates ( testdata, as_frame =True  )
>>> xy.head(2)
         easting      northing
0  448565.380621  2.881476e+06
1  448580.339199  2.881497e+06
>>> # note testdata should be transformed inplace when drop_xy is set to True
>>> xy, d, xynames = get_xy_coordinates ( testdata, drop_xy =True)
>>> xy, xynames
((448610.25612032827, 2881538.4380570543), ('easting', 'northing'))
>>> d.head(2)
   station  resistivity
0      0.0          1.0
1     20.0        167.5
>>> testdata.head(2) # coordinates have been dropped
   station  resistivity
0      0.0          1.0
1     20.0        167.5
>>> xy, d, xynames = get_xy_coordinates ( testdata, drop_xy =True)
>>> xy, xynames
(None, ())
>>> d.head(2)
   station  resistivity
0      0.0          1.0
1     20.0        167.5

watex.utils.funcutils.hex_to_rgb(c, /)[source]#

Convert a hexadecimal color to RGB.
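For reference, a hexadecimal color of the form '#RRGGBB' maps to RGB by reading each pair of hex digits as an integer. The sketch below shows this under the assumption of a six-digit input and a tuple output; the return format of the watex function is not documented here and may differ.

```python
def hex_to_rgb_sketch(c):
    """Convert '#RRGGBB' (or 'RRGGBB') to an (R, G, B) tuple of ints in 0-255."""
    c = c.lstrip("#")
    if len(c) != 6:
        raise ValueError(f"Expected a six-digit hexadecimal color, got {c!r}")
    # Each channel is two hex digits.
    return tuple(int(c[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb_sketch("#FF8000"))
```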

watex.utils.funcutils.interpol_scipy(x_value, y_value, x_new, kind='linear', plot=False, fill='extrapolate')[source]#

Interpolate data.

Parameters:

x_value – Original abscissa values of the data array.
y_value – Original ordinate values of the data array (slope).
x_new – New abscissa values at which to interpolate the data.
kind – Kind of interpolation, e.g. “linear” or “cubic”.
fill – Kind of extrapolation. If None, spi uses constrained interpolation; can be “extrapolate” to set fill_value.
plot – Set to True to display a viewer graph.

Returns:

y_new – new interpolated values.

Return type:

np.ndarray

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import interpol_scipy
>>> fill = "extrapolate"
>>> x = np.linspace(0, 15, 10)
>>> y = np.random.randn(10)
>>> x_ = np.linspace(0, 20, 15)
>>> ss = interpol_scipy(x_value=x, y_value=y, x_new=x_, kind="linear")
>>> ss

watex.utils.funcutils.interpolate_grid(arr, /, method='cubic', fill_value='auto', view=False)[source]#

Interpolate data containing missing values.

Parameters:

arr (ArrayLike2D) – Two dimensional array for interpolation
method (str, default='cubic') – kind of interpolation. It could be [‘nearest’|’linear’|’cubic’].
fill_value (float, str, default='auto') – Fill the interpolated grid at the edges or surrounding NaN with a fill value. The auto fill uses the forward and backward fill strategy.
view (bool, default=False) – Quickly visualize the interpolated grid.

Returns:

arri – Interpolated 2D grid.

Return type:

ArrayLike2D

See also

spi.griddata: Scipy interpolate Grid data
fillNaN: Fill missing data strategy.

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import interpolate_grid
>>> x = [28, np.nan, 50, 60] ; y = [np.nan, 1000, 2000, 3000]
>>> xy = np.vstack ((x, y)).T
>>> xyi = interpolate_grid (xy, view=True )
>>> xyi
array([[  28.        ,   22.78880936,   50.        ,   60.        ],
       [1000.        , 1000.        , 2000.        , 3000.        ]])

watex.utils.funcutils.is_depth_in(X, name, columns=None, error='ignore')[source]#

Assert whether depth exists in the data from column attributes.

If name is an integer value, it is assumed to be the index in the columns of the dataframe; if it does not exist, a warning is shown to the user.

Parameters:

X – dataframe, dataframe containing the data for plotting
columns – list, new labels to replace the columns in the dataframe. If given, it should fit the number of columns of X.
name – str, int, depth name in the dataframe or index to retrieve the name of the depth in the dataframe
error – str, default=’ignore’. Raise or ignore when depth is not found in the dataframe. When error is set to ignore, a pseudo-depth is created using the length of the dataframe; otherwise a ValueError is raised.

Returns:

X, depth – Dataframe without the depth columns, and the depth values.

watex.utils.funcutils.is_in_if(o, /, items, error='raise', return_diff=False, return_intersect=False)[source]#

Raise error if item is not found in the iterable object ‘o’

Parameters:

o – unhashable type, iterable object to check. It is assumed to be an iterable in which ‘items’ is presumed to be found.
items – str or list, items to assert whether they are in o or not.
error – str, default=’raise’, raise or ignore the error when no item is found in o.
return_diff – bool, return the items that are not included in ‘items’. If return_diff is True, error is systematically set to ignore.
return_intersect – bool, default=False, return items as the intersection between o and items.

Raises:

ValueError – if items are not in o.

Returns:

list, s: the objects found in o, or the difference, i.e. the objects that are not in items, provided that error is set to ignore. Note that if no object is found and error is ignore, it returns None; otherwise a ValueError is raised.

Example:

>>> from watex.datasets import load_hlogs
>>> from watex.utils.funcutils import is_in_if
>>> X0, _= load_hlogs (as_frame =True )
>>> is_in_if(X0, items=['depth_top', 'top'])
ValueError: Item 'top' is missing in the object
>>> is_in_if(X0, ['depth_top', 'top'], error='ignore')
['depth_top']
>>> is_in_if(X0, ['depth_top', 'top'], error='ignore',
...          return_diff=True)
['sp',
 'well_diameter',
 'layer_thickness',
 'natural_gamma',
 'short_distance_gamma',
 'strata_name',
 'gamma_gamma',
 'depth_bottom',
 'rock_name',
 'resistivity',
 'hole_id']
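The membership logic illustrated by these examples can be sketched on a plain list of names. The function name check_items is hypothetical, return_intersect is left out, and the order-preserving output is a choice of this sketch rather than documented watex behaviour.

```python
def check_items(o, items, error="raise", return_diff=False):
    """Hypothetical sketch of the membership check (not the watex source)."""
    items = [items] if isinstance(items, str) else list(items)
    available = list(o)  # for a DataFrame this yields the column names
    if return_diff:
        # Everything in `o` that was not requested; errors are ignored here.
        return [name for name in available if name not in items] or None
    missing = [item for item in items if item not in available]
    if missing and error == "raise":
        raise ValueError(f"Item {missing[0]!r} is missing in the object")
    return [item for item in items if item in available] or None


columns = ["hole_id", "depth_top", "depth_bottom", "resistivity"]
print(check_items(columns, ["depth_top", "top"], error="ignore"))
print(check_items(columns, ["depth_top", "top"], return_diff=True))
```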

watex.utils.funcutils.is_installing(module, upgrade=True, action=True, DEVNULL=False, verbose=0, **subpkws)[source]#

Install or uninstall a module/package using the subprocess under the hood.

Parameters:

module (str) – The module or library name to install using pip, the Python package installer.
upgrade (bool) – Install the latest version of the package. Default is True.
DEVNULL (bool) – Suppress the messages sent to standard output in the console.
action (str, bool) – Action to perform: ‘install’ or ‘uninstall’ a package. Default is True, which means ‘install’.
verbose (int, Optional) – Control the verbosity i.e output a message. High level means more messages. default is 0.
subpkws (dict,) – additional subprocess keywords arguments

Returns:

success – whether the package is successfully installed or not.

Return type:

bool

Example

>>> from watex import is_installing
>>> is_installing(
    'tqdm', action ='install', DEVNULL=True, verbose =1)
>>> is_installing(
    'tqdm', action ='uninstall', verbose =1)

watex.utils.funcutils.is_iterable(y, /, exclude_string=False, transform=False, parse_string=False)[source]#

Asserts iterable object and returns ‘True’ or ‘False’

Function can also transform a non-iterable object to an iterable if transform is set to True.

Parameters:

y – any, object to be asserted
exclude_string – bool, does not consider string as an iterable object if y is passed as a string object.
transform – bool, transform y to an iterable object. By default, it puts y in a list object.
parse_string – bool, parse the string and convert it into an iterable (a list of strings) if y is a string object containing the word separator characters ‘[#&.*@!_,;\s-]’. Refer to the function str2columns() documentation.

Returns:

bool, or iterable object if transform is set to True.

Note

Parameter parse_string expects transform to be True, otherwise a ValueError is raised. Note that is_iterable() is not dedicated to string parsing. It parses strings using the default behaviour of str2columns(). Use the latter for string parsing instead.

Examples:

>>> from watex.funcutils.is_iterable
>>> is_iterable ('iterable', exclude_string= True )
Out[28]: False
>>> is_iterable ('iterable', exclude_string= True , transform =True)
Out[29]: ['iterable']
>>> is_iterable ('iterable', transform =True)
Out[30]: 'iterable'
>>> is_iterable ('iterable', transform =True, parse_string=True)
Out[31]: ['iterable']
>>> is_iterable ('iterable', transform =True, exclude_string =True,
                 parse_string=True)
Out[32]: ['iterable']
>>> is_iterable ('parse iterable object', parse_string=True,
                 transform =True)
Out[40]: ['parse', 'iterable', 'object']
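
The interplay of exclude_string and transform shown above can be sketched as follows. This is a minimal stand-in written for illustration; `is_iterable_sketch` is a hypothetical name, it is not the watex implementation, and it leaves out parse_string entirely.

```python
from collections.abc import Iterable


def is_iterable_sketch(y, exclude_string=False, transform=False):
    """Hypothetical stand-in mirroring the documented behaviour."""
    iterable = isinstance(y, Iterable)
    if exclude_string and isinstance(y, str):
        iterable = False  # treat the string as a scalar
    if transform:
        # Wrap non-iterables in a list; leave iterables untouched.
        return y if iterable else [y]
    return iterable


print(is_iterable_sketch("iterable", exclude_string=True))
print(is_iterable_sketch("iterable", exclude_string=True, transform=True))
print(is_iterable_sketch("iterable", transform=True))
```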

watex.utils.funcutils.ismissing(refarr, arr, fill_value=nan, return_index=False)[source]#

Get the missing values in an array-like and fill it to match the length of the reference array.

The function is especially useful for frequency interpolation in the ‘attenuation band’ when using the audio-frequency magnetotelluric method.

Parameters:

arr – array-like, array to be extended with the fill value. It should be shorter than refarr; otherwise the same array arr is returned.
refarr – array-like, the reference array. It should be longer than arr.
fill_value – float, value used to fill arr to match the length of refarr.
return_index – bool or str, if True, return the indices of the missing elements in the reference array instead of their values. Default is False. Any other value returns the mask of the existing elements in the reference array.

Returns:

The filled array and the missing values, or their indices in the reference array.

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import ismissing
>>> refreq = np.linspace(7e7, 1e0, 20) # 20 frequencies as reference
>>> # remove the value between index 7 to 12 and stack again
>>> freq = np.hstack ((refreq.copy()[:7], refreq.copy()[12:] ))
>>> f, m  = ismissing (refreq, freq)
>>> f # array filled with NaN at the missing positions
...array([7.00000000e+07, 6.63157895e+07, 6.26315791e+07, 5.89473686e+07,
       5.52631581e+07, 5.15789476e+07, 4.78947372e+07,            nan,
                  nan,            nan,            nan,            nan,
       2.57894743e+07, 2.21052638e+07, 1.84210534e+07, 1.47368429e+07,
       1.10526324e+07, 7.36842195e+06, 3.68421147e+06, 1.00000000e+00])
>>> m # missing values
... array([44210526.68421052, 40526316.21052632, 36842105.73684211,
       33157895.2631579 , 29473684.78947368])
>>> _, m_ix  = ismissing (refreq, freq, return_index =True)
>>> m_ix
... array([ 7,  8,  9, 10, 11], dtype=int64)
>>> # assert the missing values from reference values
>>> refreq[m_ix ] # is equal to m
... array([44210526.68421052, 40526316.21052632, 36842105.73684211,
       33157895.2631579 , 29473684.78947368])
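
The idea behind the example can be sketched without NumPy: walk the reference values, keep those present in the shorter array, and put the fill value elsewhere. `fill_missing` is a hypothetical helper, not the watex function, and it assumes exact equality between values of the two arrays.

```python
import math


def fill_missing(refarr, arr, fill_value=math.nan):
    """Return (filled, missing_index) for `arr` aligned on `refarr`."""
    present = set(arr)
    filled = [v if v in present else fill_value for v in refarr]
    # Positions in the reference array that have no counterpart in `arr`.
    missing_index = [i for i, v in enumerate(refarr) if v not in present]
    return filled, missing_index


ref = [float(i) for i in range(20)]
freq = ref[:7] + ref[12:]  # drop the values at index 7 to 11
filled, missing_index = fill_missing(ref, freq)
print(missing_index)
```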

watex.utils.funcutils.key_checker(keys, /, valid_keys, regex=None, pattern=None, deep_search=Ellipsis)[source]#

Check whether a given key exists in valid_keys and return a list if many keys are found.

Parameters:

keys (str, list of str) – Key value to find in the valid_keys
valid_keys (list) – List of valid keys by default.

regex (re object,) –

Regular expression object. The default is:

>>> import re
>>> re.compile (r'[_#&*@!_,;\s-]\s*', flags=re.IGNORECASE)

pattern (str, default = ‘[_#&*@!_,;\s-]\s*’) – The base pattern used to split the text into columns.
deep_search (bool, default=False) –
If deep_search is True, the key finder is not sensitive to lower/upper case or to whether numeric data is included.

New in version 0.2.5.

Returns:

keys – List of keys that exists in the valid_keys.

Return type:

str, list

Examples

>>> from watex.utils.funcutils import key_checker
>>> key_checker('h502', valid_keys= ['h502', 'h253','h2601'])
Out[68]: 'h502'
>>> key_checker('h502+h2601', valid_keys= ['h502', 'h253','h2601'])
Out[69]: ['h502', 'h2601']
>>> key_checker('h502 h2601', valid_keys= ['h502', 'h253','h2601'])
Out[70]: ['h502', 'h2601']
>>> key_checker(['h502',  'h2601'], valid_keys= ['h502', 'h253','h2601'])
Out[73]: ['h502', 'h2601']
>>> key_checker(['h502',  'h2602'], valid_keys= ['h502', 'h253','h2601'])
UserWarning: key 'h2602' is missing in ['h502', 'h2602']
Out[82]: 'h502'
>>> key_checker(['502',  'H2601'], valid_keys= ['h502', 'h253','h2601'],
                deep_search=True )
Out[57]: ['h502', 'h2601']

watex.utils.funcutils.key_search(keys, /, default_keys, parse_keys=True, regex=None, pattern=None, deep=Ellipsis, raise_exception=Ellipsis)[source]#

Find key in a list of default keys and select the best match.

Parameters:

keys (str or list) – The string or list of keys. When multiple keys are passed as a single string, separate them with spaces.
default_keys (str or list) – The likely keys to search in. Can be a literal text. When a literal text is passed, it is better to provide the regex in order to skip some characters and parse the text properly.
parse_keys (bool, default=True) –
Parse the literal string using the default pattern and regex.

New in version 0.2.7.
regex (re object,) –
Regular expression object. Regex is important to specify the kind of data to parse. The default is:
```
>>> import re
>>> re.compile (r'[_#&*@!_,;\s-]\s*', flags=re.IGNORECASE)
```
pattern (str, default = ‘[_#&*@!_,;\s-]\s*’) – The base pattern used to split the text into columns. The pattern matters especially when some characters are part of a word rather than separators. For example, for a data column named ‘DH_Azimuth’, if a pattern is not explicitly provided, the default pattern parses it as two separate words, which is far from the expected result.
deep (bool, default=False) – If True, the search is not case-sensitive.
raise_exception (bool, default=False) – Raise an error when a key is not found.

Returns:

list of valid keys, or None if not found (default)

Return type:

list

Examples

>>> from watex.utils.funcutils import key_search
>>> key_search('h502-hh2601', default_keys= ['h502', 'h253','HH2601'])
Out[44]: ['h502']
>>> key_search('h502-hh2601', default_keys= ['h502', 'h253','HH2601'],
               deep=True)
Out[46]: ['h502', 'HH2601']
>>> key_search('253', default_keys= ("I m here to find key among h502, "
                                     "h253 and HH2601"))
Out[53]: ['h253']
>>> key_search ('east', default_keys= ['DH_East', 'DH_North']  , deep =True,)
Out[37]: ['East']
>>> key_search ('east', default_keys= ['DH_East', 'DH_North'],
            deep =True,parse_keys= False)
Out[39]: ['DH_East']
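
The splitting and matching steps described above can be sketched with the documented default regex. `search_keys` and `NO_UNDERSCORE` are hypothetical names introduced for illustration; the real key_search performs more work (literal-text parsing, best-match selection) that this sketch does not attempt.

```python
import re

# Default separator regex as documented above.
SEPARATORS = re.compile(r'[_#&*@!_,;\s-]\s*', flags=re.IGNORECASE)
# Hypothetical custom pattern that no longer treats '_' as a separator.
NO_UNDERSCORE = re.compile(r'[#&*@!,;\s-]\s*')


def search_keys(keys, default_keys, deep=False):
    """Return the default keys matching `keys`, or None if none match."""
    if isinstance(keys, str):
        keys = [k for k in SEPARATORS.split(keys) if k]
    found = []
    for key in keys:
        for candidate in default_keys:
            if deep:
                same = key.lower() == candidate.lower()  # ignore case
            else:
                same = key == candidate
            if same and candidate not in found:
                found.append(candidate)
    return found or None


print(search_keys('h502-hh2601', ['h502', 'h253', 'HH2601']))
print(search_keys('h502-hh2601', ['h502', 'h253', 'HH2601'], deep=True))
print(SEPARATORS.split('DH_Azimuth'), NO_UNDERSCORE.split('DH_Azimuth'))
```

The last line shows why an explicit pattern matters for names such as ‘DH_Azimuth’: the default separators split on the underscore, whereas the custom pattern keeps the name whole.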

watex.utils.funcutils.listing_items_format(lst, /, begintext='', endtext='', bullet='-', enum=True, lstyle=None, space=3, inline=False, verbose=True)[source]#

Format a list by enumerating its items successively, each on a new line.

Parameters:

lst – list, object for listing.
begintext – str, text to display at the beginning of the listing of the items in lst.
endtext – str, text to display at the end of the listing of the items in lst.
enum – bool, default=True. Count the number of items in lst and display it.
lstyle – str, default=None. Listing marker.
bullet – str, default=’-’. Symbol used to introduce each item if enum is set to False.
space – int, number of spaces to keep before each output item in lst.
inline – bool, default=False. Display all elements inline rather than inserting a carriage return after each item.
verbose – bool. Always True for print. If set to False, return the listing as a string literal.

Returns:

None, or a string literal if verbose is set to False.

Examples

>>> from watex.utils.funcutils import listing_items_format
>>> litems = ['hole_number', 'depth_top', 'depth_bottom', 'strata_name',
            'rock_name','thickness', 'resistivity', 'gamma_gamma',
            'natural_gamma', 'sp','short_distance_gamma', 'well_diameter']
>>> listing_items_format (litems , 'Features' ,
                           'have been successfully drop.' ,
                          lstyle ='.', space=3)
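
To make the roles of begintext, endtext, enum, bullet and space concrete, here is a minimal sketch. `format_listing` is a hypothetical helper; the exact layout produced by listing_items_format is not shown in this page, so the output format below is an assumption.

```python
def format_listing(items, begintext="", endtext="", bullet="-",
                   enum=True, space=3):
    """Return the items as an enumerated or bulleted multi-line string."""
    lines = [begintext] if begintext else []
    for number, item in enumerate(items, start=1):
        marker = f"{number}." if enum else bullet
        lines.append(" " * space + f"{marker} {item}")
    if endtext:
        lines.append(endtext)
    return "\n".join(lines)


print(format_listing(["depth_top", "thickness"], "Features",
                     "have been successfully dropped."))
```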

watex.utils.funcutils.load_serialized_data(filename, verbose=0)[source]#

Load data from dumped file.

Parameters:

filename – str or path-like object Name of dumped data file.

Returns:

Data reloaded from dumped file.

Example:

>>> from watex.utils.funcutils import load_serialized_data
>>> data = load_serialized_data(
...    filename = '_memory_/__mymemoryfile.2021-10-29_14-49-35.647295__.pkl',
...    verbose =3)
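
The `.pkl` extension in the example suggests a pickle-style dump. Assuming that format (the serializer actually used by watex is not stated here), a dump-and-reload round trip with the standard library looks like this; `dump_then_load` is a hypothetical name.

```python
import os
import pickle
import tempfile


def dump_then_load(obj):
    """Serialize `obj` to a temporary .pkl file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "memory.pkl")
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)
        with open(path, "rb") as fh:
            return pickle.load(fh)


restored = dump_then_load({"TRES": [10, 66, 70], "LN": ["granite"]})
print(restored)
```

Only load pickle files from trusted sources, since unpickling can execute arbitrary code.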

watex.utils.funcutils.make_arr_consistent(refarr, arr, fill_value=nan, return_index=False, method='naive')[source]#

Make arr consistent with the reference array refarr. Fill the missing values with fill_value.

Note that it takes care of the position of the values in the array. It uses NumPy digitize to compute the bins. The caveat is that the bins must be monotonically decreasing or increasing.

If the values in arr are present in refarr, the values of arr are placed in the new consistent array in decreasing or increasing order.

Parameters:

arr (array-like 1d) – Array to be extended with the fill value. It should be shorter than refarr.
refarr (array-like) – The reference array. It should be longer than arr.
fill_value (float) – Value used to fill arr to match the length of refarr.
return_index (bool or str, default=False) – If True, return the indices of the positions of the elements in refarr. If ‘mask’, return the mask of the existing elements in the reference array.
method (str, default="naive") –
The method used to find the right position of the items of arr based on the reference array.
- naive considers that the length of arr must fit the number of items that should be visible in the consistent array. This method erases the remaining bin values beyond the length of arr.
- strict does the same but, rather than considering the length, it considers the maximum value in arr. It assumes that arr is sorted in ascending order. This method is useful for plotting specific stations, since the station locations are sorted in ascending order.

Returns:

index – indices of the positions of the arr items in refarr.
mask – boolean mask of the positions of the arr items in refarr.
t – new consistent array with the same length as refarr.

Return type:

non_zero_index , mask or t

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import make_arr_consistent
>>> refarr = np.arange (12)
>>> arr = np.arange (7, 10)
>>> make_arr_consistent (refarr, arr )
Out[84]: array([nan, nan, nan, nan, nan, nan, nan,  7.,  8.,  9., nan, nan])
>>> make_arr_consistent (refarr, arr , return_index =True )
Out[104]: array([7, 8, 9], dtype=int64)
>>> make_arr_consistent (refarr, arr , return_index ="mask" )
Out[105]:
array([False, False, False, False, False, False, False,  True,  True,
        True, False, False])
>>> a = np.arange ( 12 ); b = np.linspace (7, 10 , 7)
>>> make_arr_consistent (a, b )
Out[112]: array([nan, nan, nan, nan, nan, nan, nan,  7.,  8.,  9., 10., 11.])
>>> make_arr_consistent (a, b ,method='strict')
Out[114]: array([nan, nan, nan, nan, nan, nan, nan,  7.,  8.,  9., 10., nan])
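
The binning idea can be sketched with the standard library's `bisect`, which plays the role of NumPy digitize for an increasing reference. `align_to_reference` is a hypothetical helper that only approximates the documented ‘strict’ behaviour; it is not the watex implementation and assumes refarr is sorted in ascending order.

```python
import bisect
import math


def align_to_reference(refarr, arr, fill_value=math.nan):
    """Keep the reference values whose bin is hit by a value of `arr`."""
    # bisect_right(...) - 1 gives the index of the bin containing v.
    hits = {bisect.bisect_right(refarr, v) - 1 for v in arr}
    return [ref if i in hits else fill_value
            for i, ref in enumerate(refarr)]


a = list(range(12))
b = [7 + 0.5 * k for k in range(7)]  # 7.0, 7.5, ..., 10.0
aligned = align_to_reference(a, b)
print(aligned)
```

With these inputs the bins 7 to 10 are kept and every other position receives the fill value, which matches the ‘strict’ output shown above.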

watex.utils.funcutils.make_ids(arr, prefix=None, how='py', skip=False)[source]#

Generate auto Id according to the number of given sites.

Parameters:

arr – Iterable object used to generate the site IDs. For instance, it can be an array-like or a list of EDI objects that compose a collection of watex.edi.Edi objects.
prefix (str) – String value to add as a prefix of each ID. The prefix can be the site name.
how – Mode used to index the stations. Default is ‘Python indexing’, i.e. counting starts at 0. Any other mode starts counting at 1.
skip (bool) – Skip the strict formatting, i.e. the formatting according to the number of collected files.

Returns:

Formatted ID numbers

Return type:

list

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import make_ids
>>> values = ['edi1', 'edi2', 'edi3']
>>> make_ids (values, 'ix')
... ['ix0', 'ix1', 'ix2']
>>> data = np.random.randn(20)
>>> make_ids (data, prefix ='line', how=None)
... ['line01','line02','line03', ... , 'line20']
>>> make_ids (data, prefix ='line', how=None, skip =True)
... ['line1','line2','line3',..., 'line20']
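
The numbering scheme in these examples can be reproduced with zero-padded counters. `make_ids_sketch` is a hypothetical stand-in; in particular, deriving the padding width from the number of items is an assumption inferred from the outputs above.

```python
def make_ids_sketch(items, prefix="", how="py", skip=False):
    """Generate one ID per item, zero-padded unless `skip` is True."""
    count = len(items)
    start = 0 if how == "py" else 1  # Python indexing starts at 0
    width = 1 if skip else len(str(count))
    return [f"{prefix}{i:0{width}d}" for i in range(start, start + count)]


print(make_ids_sketch(["edi1", "edi2", "edi3"], "ix"))
print(make_ids_sketch(range(20), prefix="line", how=None)[:3])
```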

watex.utils.funcutils.make_introspection(Obj, subObj)[source]#

Make introspection by using the attributes of the created instance to populate the newly created class.

Parameters:

Obj – callable, new object that fully inherits the attributes of subObj.
subObj – callable, the created instance.

watex.utils.funcutils.make_obj_consistent_if(item=Ellipsis, default=Ellipsis, size=None, from_index=True)[source]#

Combine default values to item to create default consistent iterable objects.

This is valid if the size of item does not fit the number of expected iterable objects.

Parameters:

item (Any) – Object from which to construct the default values.
default (Any) – Value to hold in case the item does not match the expected size.
size (int, Optional) – Number of items to return.
from_index (bool, default=True) – Make the item size match the exact size of the given items.

Returns:

item

Return type:

Iterable object that contain default values.

Examples

>>> from watex.utils.funcutils import make_obj_consistent_if
>>> from watex.exlib import SVC, LogisticRegression, XGBClassifier
>>> classifiers = ["SVC", "LogisticRegression", "XGBClassifier"]
>>> classifier_names = ['SVC', 'LR']
>>> make_obj_consistent_if (classifiers, default = classifier_names )
['SVC', 'LogisticRegression', 'XGBClassifier']
>>> make_obj_consistent_if (classifier_names, from_index =False  )
['SVC', 'LR']
>>> make_obj_consistent_if ( classifier_names,
                                 default= classifiers, size =3 ,
                                 from_index =False  )
['SVC', 'LR', 'SVC']

watex.utils.funcutils.map_specific_columns(X, ufunc, columns_to_skip=None, pattern=None, inplace=False, **kws)[source]#

Apply a function to specific columns in the dataframe.

It is possible to skip the columns on which the operation should not be performed.

Parameters:

X (dataframe,) – pandas dataframe with valid columns
ufunc (callable,) – Universal function that can be applied to the dataframe.
columns_to_skip (list or str ,) – List of columns to skip. If given as a string separated by the default pattern items, it is converted to a list. Make sure the column names exist in the dataframe; otherwise an error is raised.
pattern (str, default = '[#&*@!,;\s]\s*') –
The base pattern used to split the text in columns_to_skip into columns. For instance, the following string could be split into:
```
'depth_top, thickness, sp, gamma_gamma' ->
['depth_top', 'thickness', 'sp', 'gamma_gamma']
```
Refer to str2columns() for further details.
inplace (bool, default=False) – If True, modify the dataframe in place and return None; otherwise return a new dataframe.
kws (dict,) – Keyword arguments passed to the pandas.DataFrame.apply function.

Returns:

X – Dataframe with values computed using the given ufunc, except for the skipped columns, or None if inplace is True.

Return type:

Dataframe or None

Examples

>>> import numpy as np
>>> from watex.datasets import load_hlogs
>>> from watex.utils.funcutils import map_specific_columns
>>> X0, _= load_hlogs (as_frame =True )
>>> # let us visualize the first 3 values of the `sp` and `resistivity` keys
>>> X0['sp'][:3] , X0['resistivity'][:3]
... (0   -1.580000
     1   -1.580000
     2   -1.922632
     Name: sp, dtype: float64,
     0    15.919130
     1    16.000000
     2    24.422316
     Name: resistivity, dtype: float64)
>>> column2skip = ['hole_id','depth_top', 'depth_bottom',
                  'strata_name', 'rock_name', 'well_diameter', 'sp']
>>> map_specific_columns (X0, ufunc = np.log10,
                          columns_to_skip = column2skip, inplace =True)
>>> # now let us visualize the same key values
>>> X0['sp'][:3] , X0['resistivity'][:3]
... (0   -1.580000
     1   -1.580000
     2   -1.922632
     Name: sp, dtype: float64,
     0    1.201919
     1    1.204120
     2    1.387787
     Name: resistivity, dtype: float64)
>>> # it is obvious that the `resistivity` values are now log10
>>> # while `sp` still remains the same
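
The skip logic itself is independent of pandas. The sketch below applies a function to every column of a plain dictionary of lists except the skipped ones; `map_columns` is a hypothetical helper used only to illustrate the idea, and it always returns a new table rather than modifying it in place.

```python
import math


def map_columns(table, func, columns_to_skip=()):
    """Apply `func` to each value of every column not listed in skip."""
    skip = set(columns_to_skip)
    unknown = skip - set(table)
    if unknown:
        raise ValueError(f"columns not in the table: {sorted(unknown)}")
    return {
        name: list(values) if name in skip else [func(v) for v in values]
        for name, values in table.items()
    }


logs = {"sp": [-1.58, -1.58], "resistivity": [100.0, 1000.0]}
mapped = map_columns(logs, math.log10, columns_to_skip=["sp"])
print(mapped)
```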

watex.utils.funcutils.minimum_parser_to_write_edi(edilines, parser='=')[source]#

This function validates EDI file lines for writing, i.e. strings containing an equals sign. We assume that a dictionary in the list is used for the define-measurement E and H fields.

Parameters:

edilines (list) – list of items to parse
parser (str) – the equals sign is used to parse the EDI file. It can be changed; default is =

watex.utils.funcutils.move_cfile(cfile, savepath=None, **ckws)[source]#

Move file to its savepath and output message.

If path does not exist, should create one to save data.

Parameters:

cfile – name of the configuration file
savepath – Path-like object
dpath – default path

Returns:

configuration file
out message
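
A minimal sketch of the move-and-create-path behaviour described above, using only the standard library. `move_config` is a hypothetical helper and does not reproduce the message output or the default path handling of move_cfile.

```python
import os
import shutil
import tempfile


def move_config(cfile, savepath):
    """Move `cfile` into `savepath`, creating the directory if needed."""
    os.makedirs(savepath, exist_ok=True)
    destination = os.path.join(savepath, os.path.basename(cfile))
    return shutil.move(cfile, destination)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "config.yml")
    with open(source, "w") as fh:
        fh.write("verbose: 0\n")
    moved = move_config(source, os.path.join(tmp, "saved", "configs"))
    moved_exists = os.path.isfile(moved)
    source_gone = not os.path.exists(source)
    print(os.path.basename(moved), moved_exists, source_gone)
```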

watex.utils.funcutils.normalizer(arr, /, method='naive')[source]#

Normalize values to be between 0 and 1.

This normalizer handles NaN values. It translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

Note that when the transformation is set to method='MinMax', the transformation is given by:

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_normed = X_std * (max - min) + min

where min, max = feature_range.

This transformation is often used as an alternative to zero mean, unit variance scaling.

Parameters:

arr (Arraylike,) – Array to normalize, can contain NaN values.
method (str,) – Method of normalization. It can be set to use the ‘scikit-learn’ MinMaxScaler; any other value uses the naive normalization.

Returns:

arr_norm

Return type:

Normalized array.

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import normalizer
>>> np.random.seed (42)
>>> arr = np.random.randn (3, 2 )
>>> arr
array([[ 0.49671415, -0.1382643 ],
       [ 0.64768854,  1.52302986],
       [-0.23415337, -0.23413696]])
>>> normalizer (arr )
array([[4.15931313e-01, 5.45697636e-02],
       [5.01849720e-01, 1.00000000e+00],
       [0.00000000e+00, 9.34323403e-06]])
>>> normalizer (arr , method ='min-max')  # normalize data along axis=0
array([[0.82879654, 0.05456093],
       [1.        , 1.        ],
       [0.        , 0.        ]])
>>> arr [0, 1] = np.nan; arr [1, 0] = np.nan
>>> normalizer (arr )
array([[4.15931313e-01,            nan],
       [           nan, 1.00000000e+00],
       [0.00000000e+00, 9.34323403e-06]])
>>> normalizer (arr , method ='min-max')
array([[ 1., nan],
       [nan,  1.],
       [ 0.,  0.]])

watex.utils.funcutils.numstr2dms(sdigit, /, sanitize=True, func=None, args=(), regex=None, pattern=None, return_values=Ellipsis, **kws)[source]#

Convert a numerical digit string to DD:MM:SS.

Note that the digit strings for minutes and seconds must each be composed of two digits, i.e. the function accepts at least six digits; otherwise an error occurs. For instance, a value in [0-9] must be prefixed by 0 beforehand. Here is an example for designating 1 degree, 1 minute and 1 second:

sdigit= 1'1'1" --> 01'01'01 or 010101

where 010101 is the right argument rather than 111.

Parameters:

sdigit (str,) – Digit string composed of unique values.
func (Callable,) – Function used to parse the digits. The function must return a string value. Any other value is converted to str.
args (tuple) – Positional arguments of the function func.
regex (re object,) –
Regular expression object. Regex is important to specify the kind of data to parse. The default is:
```
>>> import re
>>> re.compile (r"""[_#&@!+,;:"'\s-]\s*""", flags=re.IGNORECASE)
```
pattern (str, default = ‘[_#&@!+,;:”’\s-]\s*’) – Specific pattern for sanitizing sdigit, for instance to remove undesirable non-numeric characters.
sanitize (bool, default=True) – Remove undesirable characters using the default argument of the pattern parameter.
return_values (bool, default=False,) – Return the DD:MM:SS as a tuple of (DD, MM, SS).

Returns:

sdigit/tuple – DD:MM:SS or tuple of ( DD, MM, SS)

Return type:

str, tuple

Examples

>>> from watex.utils.funcutils import numstr2dms
>>> numstr2dms ("1134132.08")
Out[17]: '113:41:32.08'
>>> numstr2dms ("13'41'32.08")
Out[18]: '13:41:32.08'
>>> numstr2dms ("11:34:13:2.08", return_values=True)
Out[19]: (113.0, 41.0, 32.08)
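
The sanitize-then-slice logic can be sketched as follows: separators are stripped with the documented regex, the last two integer digits are read as seconds, the two before them as minutes, and the rest as degrees. `to_dms` is a hypothetical helper, not the watex implementation.

```python
import re

# Separator characters taken from the documented default regex.
SANITIZE = re.compile(r"""[_#&@!+,;:"'\s-]\s*""")


def to_dms(sdigit):
    """Convert a digit string such as '1134132.08' to 'DD:MM:SS'."""
    digits = SANITIZE.sub("", sdigit)
    whole, dot, fraction = digits.partition(".")
    if len(whole) < 6 or not whole.isdigit():
        raise ValueError("expected at least six digits: DDMMSS")
    degrees, minutes, seconds = whole[:-4], whole[-4:-2], whole[-2:]
    return f"{degrees}:{minutes}:{seconds}{dot}{fraction}"


print(to_dms("1134132.08"))
print(to_dms("13'41'32.08"))
```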

watex.utils.funcutils.parse_attrs(attr, /, regex=None)[source]#

Parse attributes using the regular expression.

Remove all non-alphanumeric strings and some operator indicators, and fetch the attribute names.

Parameters:

attr (str) – Text literal containing the attribute names.

regex (re object) –

Regular expression object. The default is:

>>> import re
>>> re.compile (r'per|mod|times|add|sub|[_#&*@!_,;\s-]\s*',
                    flags=re.IGNORECASE)

Returns:

attr

Return type:

List of attributes

Example

>>> from watex.utils.funcutils import parse_attrs
>>> parse_attrs('lwi_sub_ohmSmulmagnitude')
... ['lwi', 'ohmS', 'magnitude']
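
Splitting with the documented default regex can be sketched directly with `re.split`. `split_attrs` is a hypothetical helper; the input string below is chosen so that only the operator words listed in the documented regex appear, and dropping empty fragments is an assumption about the post-processing.

```python
import re

# Default regex as documented above.
ATTR_REGEX = re.compile(r'per|mod|times|add|sub|[_#&*@!_,;\s-]\s*',
                        flags=re.IGNORECASE)


def split_attrs(text, regex=ATTR_REGEX):
    """Split `text` on operator words and separators, dropping empties."""
    return [part for part in regex.split(text) if part]


print(split_attrs("lwi_per_ohmS_times_magnitude"))
```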

watex.utils.funcutils.parse_csv(csv_fn=None, data=None, todo='reader', fieldnames=None, savepath=None, header=False, verbose=0, **csvkws)[source]#

Parse comma separated file or collect data from CSV.

Parameters:

csv_fn – CSV filename, or output CSV name if data is given and todo is set to writer|dictwriter. Otherwise the CSV output filename should be the c.data or the given variable name.
data – Sequence, data in a Python object to write.
todo – Action to perform with CSV: - reader|DictReader: Load data from the CSV file - writer|DictWriter: Write data from the Python object and create a CSV file
savepath – If default, should save the csv_fn. If the path does not exist, should save to the <’_savecsv_’> default path.
fieldnames – A sequence of keys that identify the order in which the values in the dictionary passed to the writerow() method are written to the csv_fn file.
verbose – int, control the verbosity. Output messages
csvkws – additional keyword arguments for the csv class

https://stackoverflow.com/questions/10373247/how-do-i-write-a-python-dictionary-to-a-csv-file: …

Example:

>>> import os
>>> import watex.utils.funcutils as FU
>>> PATH = 'data/model'
>>> k_ =['model', 'iter', 'mesh', 'data']
>>> try :
    INVERS_KWS = {
        s +'_fn':os.path.join(PATH, file)
        for file in os.listdir(PATH)
                  for s in k_ if file.lower().find(s)>=0
                  }
except OSError:
    INVERS_KWS=dict()
>>> TRES=[10, 66,  70, 100, 1000, 3000]# 7000]
>>> LNS =['river water','fracture zone', 'MWG', 'LWG',
      'granite', 'igneous rocks', 'basement rocks']
>>> geo_kws ={'oc2d': INVERS_KWS,
              'TRES':TRES, 'LN':LNS}
>>> # write data and save to  'csvtest.csv' file
>>> # here the `data` is a sequence of dictionary geo_kws
>>> FU.parse_csv(csv_fn = 'csvtest.csv',data = [geo_kws],
                 fieldnames = geo_kws.keys(),todo= 'dictwriter',
                 savepath = 'data/saveCSV')
>>> # collect csv data from the 'csvtest.csv' file
>>> FU.parse_csv(csv_fn ='data/saveCSV/csvtest.csv',
                 todo='dictreader',fieldnames = geo_kws.keys()
                 )
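
The dictwriter and dictreader actions correspond to the standard library's `csv.DictWriter` and `csv.DictReader`. The sketch below shows the round trip in memory; `write_then_read` is a hypothetical helper, and whether parse_csv delegates to these classes in exactly this way is an assumption.

```python
import csv
import io


def write_then_read(rows, fieldnames):
    """Write dict rows as CSV text, then parse that text back."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()  # the field names fix the column order
    writer.writerows(rows)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


rows = [{"TRES": 10, "LN": "river water"},
        {"TRES": 66, "LN": "fracture zone"}]
records = write_then_read(rows, fieldnames=["TRES", "LN"])
print(records)
```

Note that every value comes back as a string, so numeric fields need to be converted after reading.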

watex.utils.funcutils.parse_json(json_fn=None, data=None, todo='load', savepath=None, verbose=0, **jsonkws)[source]#

Parse a JavaScript Object Notation (JSON) file and collect data from a JSON config file.

Parameters:

json_fn – JSON filename, URL, or output JSON name if data is given and todo is set to dump. Otherwise the JSON output filename should be the data or the given variable name.
data – Data in a Python object to serialize.
todo – Action to perform with JSON: - load: Load data from the JSON file - dump: Serialize data from the Python object and create a JSON file
savepath – If default, should save the json_fn. If the path does not exist, should save to the <’_savejson_’> default path.
verbose – int, control the verbosity. Output messages

Example:

>>> import os
>>> PATH = 'data/model'
>>> k_ =['model', 'iter', 'mesh', 'data']
>>> try :
    INVERS_KWS = {
        s +'_fn':os.path.join(PATH, file)
        for file in os.listdir(PATH)
                  for s in k_ if file.lower().find(s)>=0
                  }
except OSError:
    INVERS_KWS=dict()
>>> TRES=[10, 66,  70, 100, 1000, 3000]# 7000]
>>> LNS =['river water','fracture zone', 'MWG', 'LWG',
      'granite', 'igneous rocks', 'basement rocks']
>>> import watex.utils.funcutils as FU
>>> geo_kws ={'oc2d': INVERS_KWS,
              'TRES':TRES, 'LN':LNS}
>>> # serialize json data and save to  'jsontest.json' file
>>> FU.parse_json(json_fn = 'jsontest.json',
                  data=geo_kws, todo='dump', indent=3,
                  savepath ='data/saveJSON', sort_keys=True)
>>> # Load data from 'jsontest.json' file.
>>> FU.parse_json(json_fn='data/saveJSON/jsontest.json', todo ='load')
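
The `indent` and `sort_keys` keywords passed in the example are options of the standard library's `json` serializer. Assuming parse_json forwards them unchanged (not confirmed here), their effect can be checked in memory as follows; the dictionary is a shortened, illustrative version of `geo_kws`.

```python
import json

geo_kws = {"TRES": [10, 66, 70],
           "LN": ["river water", "fracture zone"]}

# Serialize with the same options as in the example above.
text = json.dumps(geo_kws, indent=3, sort_keys=True)
restored = json.loads(text)

print(text)
print(restored == geo_kws)
```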

watex.utils.funcutils.parse_md_data(pf, delimiter=':')[source]#

watex.utils.funcutils.parse_yaml(yml_fn=None, data=None, todo='load', savepath=None, verbose=0, **ymlkws)[source]#

Parse a YAML file and collect data from a YAML config file.

Parameters:

yml_fn – YAML filename, or the output YAML name if data is given and todo is set to dump. Otherwise the YAML output filename should be the data or the given variable name.
data – Data in a Python object to serialize.
todo – Action to perform with YAML: - load: Load data from the YAML file - dump: Serialize data from the Python object and create a YAML file
savepath – If default, should save the yml_fn to the default path; otherwise should store it in the given path. If the path does not exist, should fall back to the default path.
verbose – int, control the verbosity. Output messages

watex.utils.funcutils.pretty_printer(clfs, clf_score=None, scoring=None, **kws)[source]#

Format and pretty print messages after a grid search using multiple estimators.

Display, for each estimator, its name, its best parameters with the highest score, and the mean scores.

Parameters:

clfs (Callables) – classifiers or estimators
clf_score (array-like) – For a single classifier, useful to provide the cross-validation score.
scoring (str) – Scoring used for grid search.

watex.utils.funcutils.print_cmsg(cfile, todo='load', config='YAML')[source]#

Output configuration message.

Parameters:

cfile – name of the configuration file
todo – Action to perform with config file. Can be load or dump
config – Type of configuration file. Can be [YAML|CSV|JSON]

watex.utils.funcutils.random_sampling(d, /, samples=None, replace=False, random_state=None, shuffle=True)[source]#

Sampling data.

Parameters:

d ({array-like, sparse matrix} of shape (n_samples, n_features)) – Data for sampling, where n_samples is the number of samples and n_features is the number of features.
samples (int,optional) – Ratio or number of items from axis to return. Default = 1 if samples is None.
replace (bool, default=False) – Allow or disallow sampling of the same row more than once.
random_state (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) – If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
split_Xy –

Returns:

d ({array-like, sparse matrix} of shape (n_samples, n_features))
Sampled data based on the given samples.

Examples

>>> from watex.utils.funcutils import random_sampling
>>> from watex.datasets import load_hlogs
>>> data= load_hlogs().frame
>>> random_sampling( data, samples = 7 ).shape
(7, 27)

watex.utils.funcutils.random_selector(arr, /, value, seed=None, shuffle=False)[source]#

Randomly select a number of values from an array.

Parameters:

arr (ArrayLike) – Array of values
value (float, arraylike) – If a float value is passed, it indicates the number of values to select from arr. If an array is passed, its values should be contained in the given arr. However, if a string containing % is given, it is used as the ratio of the number of values to randomly select.
seed (int, Optional) – Allow retrieving the identical value randomly selected in the given array.
shuffle (bool, default=False) – If True, shuffle the selected values.

Returns:

arr

Return type:

Array containing the selected values

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import random_selector
>>> dat= np.arange (42 )
>>> random_selector (dat , 7, seed = 42 )
array([0, 1, 2, 3, 4, 5, 6])
>>> random_selector ( dat, ( 23, 13 , 7))
array([ 7, 13, 23])
>>> random_selector ( dat , "7%", seed =42 )
array([0, 1])
>>> random_selector ( dat , "70%", seed =42 , shuffle =True )
array([ 0,  5, 20, 25, 13,  7, 22, 10, 12, 27, 23, 21, 16,  3,  1, 17,  8,
        6,  4,  2, 19, 11, 18, 24, 14, 15,  9, 28, 26])
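
How a count or a percentage string translates into a number of selected items can be sketched with the standard library's `random` module. `select` is a hypothetical helper; unlike the outputs above, it always draws at random, so only the number of selected items is comparable.

```python
import random


def select(values, value, seed=None):
    """Pick `value` items, or `value` percent of them if it ends in %."""
    values = list(values)
    if isinstance(value, str) and value.endswith("%"):
        count = int(len(values) * float(value[:-1]) / 100)
    else:
        count = int(value)
    # A dedicated generator makes the draw reproducible for a given seed.
    return random.Random(seed).sample(values, count)


data = range(42)
print(len(select(data, "7%", seed=42)), len(select(data, "70%", seed=42)))
```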

watex.utils.funcutils.random_state_validator(seed)[source]#

Turn seed into a np.random.RandomState instance.

Parameters:

seed (None, int or instance of RandomState) – If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.

Returns:

The random state object based on seed parameter.

Return type:

numpy.random.RandomState

watex.utils.funcutils.read_from_excelsheets(erp_file=None)[source]#

Read all Excel sheets and build a list of the dataframes of all sheets.

Parameters:

erp_file – Excel workbook containing ERP profile data.

Returns:

A list composed of the name of erp_file at index 0 and the dataframes.

watex.utils.funcutils.read_main(csv_fn, pf, delimiter=':', destfile='pme.en.csv')[source]#

watex.utils.funcutils.read_worksheets(*data)[source]#

Read sheets and returns a list of DataFrames and sheet names.

Parameters:

data (list of str) – A collection of Excel sheet files. Only .xlsx files are read; any other file raises an error.

epositorieswatexdataerpsheetsgbalo.xlsx’

>>> data, snames =  read_worksheets (sheet_file )
>>> snames
['l11', 'l10', 'l02']
>>> data, snames =  read_worksheets (os.path.dirname (sheet_file))
>>> snames
['l11', 'l10', 'l02', 'l12', 'l13']

watex.utils.funcutils.remove_outliers(ar, method='IQR', threshold=3.0, fill_value=None, axis=1)[source]#

Efficient strategy to remove outliers in the data.

An outlier is a data point of a given sample, observation, or distribution that lies outside the overall pattern. A commonly used rule considers a data point an outlier if it lies more than 1.5 IQR below the first quartile or above the third.

Two approaches are used to remove the outliers.

Inter Quartile Range (IQR) IQR is the most commonly used and most trusted approach in the research field. Said differently, low outliers lie below Q1 - 1.5 IQR, and high outliers lie above Q3 + 1.5 IQR. One needs to calculate the median and the quartiles Q1 and Q3, from which the IQR follows. For n sorted values:

\[ \begin{align}\begin{aligned}Q1 = \frac{1}{4}(n + 1)\text{ th term}\\Q3 = \frac{3}{4}(n + 1)\text{ th term}\\IQR = Q3 - Q1\end{aligned}\end{align} \]

The outlier base values are defined above and below the normal range of the dataset, namely the upper and lower bounds (a 1.5*IQR value is considered):

\[ \begin{align}\begin{aligned}upper = Q3 + 1.5*IQR\\lower = Q1 - 1.5*IQR\end{aligned}\end{align} \]

In the above formula, the 0.5 scale-up of \(IQR (new\_IQR = IQR + 0.5*IQR)\) is taken to consider all the data within 2.7 standard deviations in the Gaussian distribution.
Z-score Also called the standard score. This value helps to understand how far a data point is from the mean. After setting up a threshold value, one can use the z-score values of the data points to define the outliers.

\[Zscore = (\text{data\_point} - \text{mean}) / \text{std. deviation}\]

To define an outlier, a threshold value is chosen, which is generally 3.0, as 99.7% of the data points lie within +/- 3 standard deviations (using the Gaussian distribution approach).

Parameters:

ar (Arraylike, pd.dataframe) –
Arraylike containing outliers to remove.

New in version 0.2.7: Accepts dataframe and can remove outliers using the z_score.
method (str, default='IQR') – The selected approach to remove the outliers. It can be [‘IQR’|’Z-score’]. See above for the outlier explanations. Note that when selecting "z-score", the threshold value greatly influences which data are considered as outliers.
threshold (float, default=3) – Threshold value, useful for "z-score": data above this value are considered outliers.
fill_value (float, optional) – Value to replace the outliers. If not given, outliers are removed from the array.
axis (int, default=1) – Axis along which to remove values. This is useful when a two-dimensional array is supplied. By default, outliers are deleted from the rows.

Returns:

arr – New array with the outliers removed.

Return type:

Array_like

Examples

>>> import numpy as np
>>> np.random.seed (42 )
>>> from watex.utils.funcutils import remove_outliers
>>> data = np.random.randn (7, 3 )
>>> data_r = remove_outliers ( data )
>>> data.shape , data_r.shape
(7, 3) (5, 3)
>>> remove_outliers ( data, fill_value =np.nan )
array([[ 0.49671415, -0.1382643 ,  0.64768854],
       [ 1.52302986, -0.23415337, -0.23413696],
       [ 1.57921282,  0.76743473, -0.46947439],
       [ 0.54256004, -0.46341769, -0.46572975],
       [ 0.24196227,         nan,         nan],
       [-0.56228753, -1.01283112,  0.31424733],
       [-0.90802408,         nan,  1.46564877]])
>>> # for one dimensional
>>> remove_outliers ( data[:, 0] , fill_value =np.nan )
array([ 0.49671415,  1.52302986,  1.57921282,  0.54256004,  0.24196227,
       -0.56228753,         nan])
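
The two rules described above can be sketched with the standard library alone. This is a minimal illustration, not the implementation of remove_outliers: the helper names iqr_bounds and zscores are hypothetical, and the quartile interpolation method and the use of the population standard deviation are assumptions.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles computed with the "inclusive" interpolation.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def zscores(values):
    """Return (value - mean) / std for each value (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


sample = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 100.0]

# IQR rule: keep what lies inside the fences, flag the rest.
lower, upper = iqr_bounds(sample)
iqr_outliers = [v for v in sample if v < lower or v > upper]

# Z-score rule: flag values whose absolute z-score exceeds a threshold.
z_outliers = [v for v, z in zip(sample, zscores(sample)) if abs(z) > 2.0]

print(lower, upper, iqr_outliers, z_outliers)
```

The threshold is lowered to 2.0 here because, with only seven points and the population standard deviation, no z-score can exceed sqrt(6) (about 2.45), so the usual 3.0 would flag nothing. This is one reason the threshold matters so much on small arrays.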

watex.utils.funcutils.rename_files(src_files, /, dst_files, basename=None, extension=None, how='py', prefix=True, keep_copy=True, trailer='_', sortby=None, **kws)[source]#

Rename files in directory.

Parameters:

src_files (str, Path-like object) – Source files to rename.
dst_files (str or Path-like object) – Destination of the renamed files.
extension (str, optional) – If a path is given in src_files, specifying the extension collects only the files with that extension.
basename (str, optional) – If dst_files is passed as a Path-like object, a name is needed for the renamed files. The number is incremented using the counting defined by the parameter how (Python index counting when how='py').
how (str, default='py') – The way to increment files when dst_files is given as a Path-like object. For instance, for basename='E_survey' and prefix=True, the first file is E_survey_00 if how='py'; otherwise it is E_survey_01.
prefix (bool, default=True) – Position the name before the number incrementation. If False and a name is given, the number is positioned before the name. For instance, if prefix=False and basename='E_survey', the files are named 00_E_survey and 01_E_survey.
keep_copy (bool, default=True) – Keep a copy of the source files.
trailer (str, default='_') – Item used to separate the basename from the counter.
sortby (Regex or Callable) – Key to sort the collection of items when src_files is passed as a path-like object. This is useful to keep the order of the original files, especially when filenames include a specific character. Furthermore, [int|float|'num'|’digit’] sorts the files according to the number included in the filename, if any.
kws (dict) – Keyword arguments passed to os.rename.
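
The interplay of basename, how, prefix and trailer can be sketched as follows. The helper numbered_names is hypothetical and only reproduces the names quoted in the parameter descriptions; the two-digit counter width is an assumption inferred from those names, and no file is touched.

```python
def numbered_names(basename, count, how="py", prefix=True, trailer="_"):
    """Build `count` names by joining `basename` and a two-digit counter."""
    # how='py' starts counting at 0; any other value starts at 1.
    start = 0 if how == "py" else 1
    names = []
    for number in range(start, start + count):
        counter = f"{number:02d}"
        # prefix=True puts the name first, prefix=False puts the counter first.
        parts = (basename, counter) if prefix else (counter, basename)
        names.append(trailer.join(parts))
    return names


print(numbered_names("E_survey", 2))                # ['E_survey_00', 'E_survey_01']
print(numbered_names("E_survey", 2, how="1"))       # ['E_survey_01', 'E_survey_02']
print(numbered_names("E_survey", 2, prefix=False))  # ['00_E_survey', '01_E_survey']
```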

watex.utils.funcutils.repeat_item_insertion(text, /, pos, item='', fill_value='')[source]#

Insert a character in the text according to its position.

Parameters:

text (str) – Text.
pos (int) – Position where the item must be inserted.
item (str) – Item to insert at each position.
fill_value (str) – Does nothing special; fills the last position.

Returns:

text – New construct object.

Return type:

str,

Examples

>>> from watex.utils.funcutils import repeat_item_insertion
>>> repeat_item_insertion ( '0125356.45', pos=2, item=':' )
'01:25:35:6.45'
>>> repeat_item_insertion ( 'Function inserts car in text.', pos=10, item='TK' )
'Function iTKnserts carTK in text.'

watex.utils.funcutils.replace_data(X, y=None, n_times=1, axis=0, reset_index=Ellipsis)[source]#

Replace items in data \(n\) times

Parameters:

X (Arraylike 1D or pd.DataFrame) – Data to replace. Note that sparse matrices are not allowed. Use random_sampling() instead.
y (Arraylike 1d) – Preferably one-dimensional data.
n_times (int) – Number of times all items should be replaced in data.
reset_index (bool, default=False) – If True and X is a dataframe, the index is reset and dropped.

Returns:

X or (X, y) – Tuple is returned if y is passed.

Return type:

Tuple of data replaced

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import replace_data
>>> X, y = np.random.randn ( 7, 2 ), np.arange(7)
>>> X.shape, y.shape
((7, 2), (7,))
>>> X_new, y_new = replace_data (X, y, n_times =10 )
>>> X_new.shape , y_new.shape
((70, 2), (70,))

watex.utils.funcutils.repr_callable_obj(obj, skip=None)[source]#

Represent callable objects.

Format class, function and instance objects.

Parameters:

obj – class, function or instance object to format.
skip – str, name of an attribute that does not end with ‘_’ and that needs to be skipped.

Raises:

TypeError - If object is not a callable or an instance.

Examples:

>>> from watex.utils.funcutils import repr_callable_obj
>>> from watex.methods.electrical import  ResistivityProfiling
>>> repr_callable_obj(ResistivityProfiling)
... 'ResistivityProfiling(station= None, dipole= 10.0,
        auto_station= False, kws= None)'
>>> robj= ResistivityProfiling (AB=200, MN=20, station ='S07')
>>> repr_callable_obj(robj)
... 'ResistivityProfiling(AB= 200, MN= 20, arrangememt= schlumberger, ... ,
    dipole= 10.0, station= S07, auto= False)'
>>> repr_callable_obj(robj.fit)
... 'fit(data= None, kws= None)'

watex.utils.funcutils.reshape(arr, axis=None)[source]#

Detect the array shape and reshape it accordingly, back to the given axis.

Parameters:

arr – array-like with a number of dimensions equal to 1 or 2.
axis – axis along which to reshape the array. If ‘axis’ is None and the number of dimensions is greater than 1, the array is reshaped back to a one-dimensional array.

Returns:

New reshaped array

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import reshape
>>> array = np.random.randn(50 )
>>> array.shape
... (50,)
>>> ar1 = reshape(array, 1)
>>> ar1.shape
... (1, 50)
>>> ar2 =reshape(ar1 , 0)
>>> ar2.shape
... (50, 1)
>>> ar3 = reshape(ar2, axis = None)
>>> ar3.shape # goes back to the original array
... (50,)

watex.utils.funcutils.return_ctask(todo=None)[source]#

Get the appropriate task to perform if the user mistypes the todo action.

Parameters:

todo –
Action to perform:
- load: Load data from the config [YAML|CSV|JSON] file.
- dump: Serialize data from the Python object and create a config [YAML|CSV|JSON] file.

watex.utils.funcutils.round_dipole_length(value, round_value=5.0)[source]#

Small function to round the dipole length in steps of 5, which keeps the values realistic and the computation simple.

Parameters:

value (float) – value of dipole length.

Returns:

value of dipole length rounded in steps of 5.

Return type:

float
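
Rounding a length to a step of 5 can be sketched as below. The helper round_to_step is hypothetical, and the choice of rounding to the nearest multiple (with halves rounded up) is an assumption; the actual function may round differently.

```python
import math


def round_to_step(value, step=5.0):
    """Round `value` to the nearest multiple of `step` (halves round up)."""
    return math.floor(value / step + 0.5) * step


print(round_to_step(12.0))  # 10.0
print(round_to_step(13.0))  # 15.0
print(round_to_step(52.3))  # 50.0
```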

watex.utils.funcutils.sPath(name_of_path)[source]#

Save-path function. Create a path named name_of_path if it does not exist.

Parameters:

name_of_path – str, Path-like object. If the path does not exist, name_of_path is created.

watex.utils.funcutils.sanitize_fdataset(_df)[source]#

Sanitize the feature dataset.

Recognize the columns provided by the user and reset them according to the feature labels defined in featureLabels.

watex.utils.funcutils.sanitize_frame_cols(d, /, func=None, regex=None, pattern=None, fill_pattern=None, inplace=False)[source]#

Remove undesirable characters and return the new columns.

Use regular expression for columns sanitizing

Parameters:

d (list, columns) – Columns to sanitize. It might contain a list of items to polish. If a dataframe or series is given, the dataframe columns or the series name, respectively, is polished and the same dataframe or series is returned.
func (F, callable) – Universal function used to clean the columns

regex (re object,) –

Regular expression object. The default is:

>>> import re
>>> re.compile (r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)

pattern (str, default = ‘[_#&.)(*@!_,;\s-]\s*’) – The base pattern used to sanitize the text in each column name.
fill_pattern (str, default='') – Pattern to replace the non-alphabetic characters in each item of columns.
inplace (bool, default=False) – Transform the dataframe or series in place.

Returns:

Returns the series or dataframe if one is given; otherwise returns the sanitized columns.

Return type:

columns | pd.Series | dataframe.

Examples

>>> from watex.utils.funcutils import sanitize_frame_cols
>>> from watex.utils.coreutils import read_data
>>> h502= read_data ('data/boreholes/H502.xlsx')
>>> h502 = sanitize_frame_cols (h502, fill_pattern ='_' )
>>> h502.columns[:3]
... Index(['depth_top', 'depth_bottom', 'strata_name'], dtype='object')
>>> f = lambda r : r.replace ('_', "'s ")
>>> h502_f= sanitize_frame_cols( h502, func =f )
>>> h502_f.columns [:3]
... Index(["depth's top", "depth's bottom", "strata's name"], dtype='object')
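
The effect of the default regular expression together with fill_pattern can be sketched on plain strings. The helper sanitize_names and the sample column names are hypothetical; only the regular expression is taken from the documentation above, and the real function may apply further steps.

```python
import re

# Default regular expression quoted in the parameter description.
DEFAULT_REGEX = re.compile(r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)


def sanitize_names(names, regex=DEFAULT_REGEX, fill_pattern=''):
    """Replace every match of `regex` in each name with `fill_pattern`."""
    return [regex.sub(fill_pattern, str(name).strip()) for name in names]


raw = ['depth top', 'depth-bottom', 'strata.name', 'hole, number']
cleaned = sanitize_names(raw, fill_pattern='_')
print(cleaned)  # ['depth_top', 'depth_bottom', 'strata_name', 'hole_number']
```

The trailing `\s*` in the pattern swallows the spaces that follow a separator, which is why 'hole, number' yields a single underscore.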

watex.utils.funcutils.sanitize_unicode_string(str_)[source]#

Remove all spaces and all French accented characters.

Example:

>>> from watex.utils.funcutils import sanitize_unicode_string
>>> sentence = ('Nos clients sont extrêmement satisfaits '
    'de la qualité du service fourni. En outre Nos clients '
        'rachètent frequemment nos "services".')
>>> sanitize_unicode_string  (sentence)
... 'nosclientssontextrmementsatisfaitsdelaqualitduservice'
    'fournienoutrenosclientsrachtentfrequemmentnosservices'
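
The output above is consistent with lowercasing the text and keeping only the unaccented letters. A minimal sketch under that assumption follows; the helper keep_ascii_letters is hypothetical, and it also drops digits, which the real function may or may not do.

```python
import re


def keep_ascii_letters(text):
    """Lowercase `text` and drop every character outside a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


# "\u00ea" is the precomposed character e-circumflex.
cleaned = keep_ascii_letters("Nos clients sont extr\u00eamement satisfaits")
print(cleaned)  # nosclientssontextrmementsatisfaits
```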

watex.utils.funcutils.savejob(job, savefile, *, protocol=None, append_versions=True, append_date=True, fix_imports=True, buffer_callback=None, **job_kws)[source]#

Quickly save your job using ‘joblib’ or the persistent Python pickle module.

Parameters:

job (Any) – Anything to save, preferably models in a dict.
savefile (str, or path-like object) – Name of the file to store the model. The file argument must have a write() method that accepts a single bytes argument. It can thus be a file object opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.
append_versions (bool, default=True) – Append the version of the Joblib or Python pickle module, followed by the scikit-learn, numpy and pandas versions. This is useful to know which versions were used when loading a file after the system or modules have been upgraded. It can avoid trouble when data have been stored for a long time and the user has forgotten the date and versions at the time the file was saved.
append_date (bool, default=True,) –
Append the date of the day to the filename.

New in version 0.2.3.
protocol (int, optional) –
The optional protocol argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3, 4 and 5. The default protocol is 4. It was introduced in Python 3.4, and is incompatible with previous versions.

Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.
fix_imports (bool, default=True,) – If fix_imports is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.
buffer_callback (callable, optional) –
If buffer_callback is None (the default), buffer views are serialized into file as part of the pickle stream.

If buffer_callback is not None, then it can be called any number of times with a buffer view. If the callback returns a false value (such as None), the given buffer is out-of-band; otherwise the buffer is serialized in-band, i.e. inside the pickle stream.

It is an error if buffer_callback is not None and protocol is None or smaller than 5.
job_kws (dict) – Additional keyword arguments passed to joblib.dump().

Returns:

savefile – returns the filename

Return type:

str,
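
The idea of storing version information next to the job can be sketched with the pickle module alone. This is not the implementation of savejob: the keys of payload are hypothetical, and an in-memory io.BytesIO buffer stands in for the file so that nothing is written to disk.

```python
import io
import pickle
import sys

# Hypothetical payload: the job plus the versions used to write it.
payload = {
    "job": {"alpha": 0.1, "kernel": "rbf"},
    "python_version": sys.version.split()[0],
    "pickle_protocol": pickle.DEFAULT_PROTOCOL,
}

# Any object with a write() method accepting bytes works as the target.
buffer = io.BytesIO()
pickle.dump(payload, buffer, protocol=pickle.DEFAULT_PROTOCOL, fix_imports=True)

# Reading back: rewind the buffer and load the pickle stream.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored["job"], restored["pickle_protocol"])
```

Keeping the versions inside the saved object means they travel with the data, whatever the file is later renamed to.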

watex.utils.funcutils.savepath_(nameOfPath)[source]#

Shortcut to create a folder.

Parameters:

nameOfPath (str) – Path name to save file.

Returns:

New folder created. If nameOfPath already exists, returns None.

Return type:

str

watex.utils.funcutils.serialize_data(data, filename=None, force=True, savepath=None, verbose=0)[source]#

Store data into a binary file.

Parameters:

data – Object. Object to store into a binary file.
filename – str. Name of the file to serialize. If ‘None’, it is created automatically.
savepath – str, PathLike object. Directory to save the file. If it does not exist, it is created automatically.
force – bool. If True, remove the old file if it exists; otherwise create a new incremented file.
verbose – int, get more messages.

Returns:

dumped or serialized filename.

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import serialize_data
>>> data = np.arange(15)
>>> file = serialize_data(data, filename=None,  force=True,
...                          savepath =None, verbose =3)
>>> file

watex.utils.funcutils.show_stats(nedic, nedir, fmtl='~', lenl=77, obj='EDI')[source]#

Estimate the number of files successfully read over the number of unread files.

Parameters:

nedic – number of input or collected files.
nedir – number of files read successfully.
fmtl – str to format the stats line.
lenl – length of the line delineation.

watex.utils.funcutils.shrunkformat(text, chunksize=7, insert_at=None, sep=None)[source]#

Format text and add an ellipsis when the number of items is greater than the size limit.

Parameters:

text – str - a text to shrink and format. Can also be an iterable object.
chunksize – int, the size limit to keep in the formatted text. Default is 7.
insert_at – str, the place to insert the ellipsis. If None, shrink the text and put the ellipsis between the beginning and the end of the text. Can be the beginning or the end.
sep – str, if the text is delimited by a given character, the sep parameter is useful as the starting point for word counting. Default is None, which means words are counted from the spaces.

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import shrunkformat
>>> text=" I'm a long text and I will be shrunked and replace by ellipsis."
>>> shrunkformat (text)
... 'Im a long ... and replace by ellipsis.'
>>> shrunkformat (text, insert_at ='end')
... 'Im a long ... '
>>> arr = np.arange(30)
>>> shrunkformat (arr, chunksize=10 )
... '0 1 2 3 4  ...  25 26 27 28 29'
>>> shrunkformat (arr, insert_at ='begin')
... ' ...  26 27 28 29'

watex.utils.funcutils.smart_format(iter_obj, choice='and')[source]#

Smart format iterable object.

Parameters:

iter_obj – iterable obj
choice – the conjunction placed before the last item; can be ‘and’ or ‘or’.

Example:

>>> from watex.utils.funcutils import smart_format
>>> smart_format(['model', 'iter', 'mesh', 'data'])
... 'model','iter','mesh' and 'data'
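
The formatting shown above can be reproduced with a few lines of plain Python. The helper join_quoted is a hypothetical sketch, not the library code; it only mirrors the output of the example.

```python
def join_quoted(items, choice="and"):
    """Quote each item, join with commas, put `choice` before the last one."""
    quoted = [f"'{item}'" for item in items]
    if len(quoted) <= 1:
        return "".join(quoted)
    return ",".join(quoted[:-1]) + f" {choice} " + quoted[-1]


print(join_quoted(['model', 'iter', 'mesh', 'data']))
# 'model','iter','mesh' and 'data'
print(join_quoted(['csv', 'json'], choice='or'))
# 'csv' or 'json'
```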

watex.utils.funcutils.smart_label_classifier(arr, /, values=None, labels=None, order='soft', func=None, raise_warn=True)[source]#

Smartly map a numeric array into class labels using a map function or given fixed values.

New classes created from the fixed values can be renamed if labels are supplied.

Parameters:

arr (Arraylike 1d) – array-like whose items are expected to be categorized.
values (float, list of float) – The threshold values from which the default categorization is fixed.
labels (int | str | list of [str, int]) – The label values that correspond to the fixed values. Note that the number of labels must equal the number of fixed values plus one; otherwise a ValueError is raised if order is set to strict.
order (str, ['soft'|'strict'], default='soft') – If order is strict, the argument passed to values must be contained as items in arr; in soft mode, a warning is raised otherwise.
func (callable, optional) – Function to map the given array. If given, values does not need to be supplied.
raise_warn (bool, default=True) – Raise a warning message if order=soft and the fixed values are not found in arr. Also raise a warning if the labels argument does not match the number of classes from the fixed values.

Returns:

arr – categorized array with the same length as the raw

Return type:

array-like 1d

Examples

>>> import numpy as np
>>> from watex.utils.funcutils import smart_label_classifier
>>> sc = np.arange (0, 7, .5 )
>>> smart_label_classifier (sc, values = [1, 3.2 ])
array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2], dtype=int64)
>>> # rename labels <=1 : 'l1', ]1; 3.2]: 'l2' and >3.2 :'l3'
>>> smart_label_classifier (sc, values = [1, 3.2 ], labels =['l1', 'l2', 'l3'])
array(['l1', 'l1', 'l1', 'l2', 'l2', 'l2', 'l2', 'l3', 'l3', 'l3', 'l3',
       'l3', 'l3', 'l3'], dtype=object)
>>> def f (v):
        if v <=1: return 'l1'
        elif 1< v<=3.2: return "l2"
        else : return "l3"
>>> smart_label_classifier (sc, func= f )
array(['l1', 'l1', 'l1', 'l2', 'l2', 'l2', 'l2', 'l3', 'l3', 'l3', 'l3',
       'l3', 'l3', 'l3'], dtype=object)
>>> smart_label_classifier (sc, values = 1.)
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int64)
>>> smart_label_classifier (sc, values = 1., labels='l1')
array(['l1', 'l1', 'l1', 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=object)
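
The threshold-based binning of the first two examples can be sketched with the standard bisect module. The helper classify is hypothetical; it assumes right-closed intervals (<= 1, ]1; 3.2], > 3.2), as stated in the comment of the example, and does not cover the func, order or raise_warn options.

```python
from bisect import bisect_left


def classify(items, thresholds, labels=None):
    """Return the class of each item given sorted thresholds."""
    thresholds = sorted(thresholds)
    # bisect_left gives 0 for v <= t0, 1 for t0 < v <= t1, and so on.
    codes = [bisect_left(thresholds, v) for v in items]
    if labels is None:
        return codes
    return [labels[code] for code in codes]


sc = [i * 0.5 for i in range(14)]  # 0.0, 0.5, ..., 6.5
codes = classify(sc, [1, 3.2])
named = classify(sc, [1, 3.2], labels=['l1', 'l2', 'l3'])
print(codes)
print(named)
```

With n thresholds this produces n + 1 classes, which is why the number of labels is expected to be the number of fixed values plus one.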

watex.utils.funcutils.smart_strobj_recognition(name, container, stripitems='_', deep=False)[source]#

Find the most likely matching word in the whole container and return its value.

Parameters:

name – str - Value to search for. It may not match the exact word in the container.
container – list, tuple, dict - container of the string words.
stripitems – str - items to strip in order to sanitize the elements of the container. If different items are provided, they can be separated by ':', ',' and ';'. These separators cannot be used as a component of the name. For instance:
name= 'dipole_'; stripitems='_' -> means remove the '_' under the ``dipole_``
name= '+dipole__'; stripitems ='+;__'-> means remove the '+' and '__' under the value `name`.
deep – bool - Kind of search. Go deeper by looping over each item to find the initials that can fit the name. Note that, if given, the first occurrence is considered the best name…

Returns:

Likelihood object from container or Nonetype if none object is detected.

Example:

>>> from watex.utils.funcutils import smart_strobj_recognition
>>> from watex.methods import ResistivityProfiling
>>> robj = ResistivityProfiling(AB= 200, MN= 20,)
>>> smart_strobj_recognition ('dip', robj.__dict__)
... None
>>> smart_strobj_recognition ('dipole_', robj.__dict__)
... dipole
>>> smart_strobj_recognition ('dip', robj.__dict__,deep=True )
... dipole
>>> smart_strobj_recognition (
    '+_dipole___', robj.__dict__,deep=True , stripitems ='+;_')
... 'dipole'

watex.utils.funcutils.split_list(lst, /, val, fill_value=None)[source]#

Split a list into groups of val consecutive elements.

Parameters:

lst (list) – List composed of item elements.
val (int) – Number of items in each group.

Return type:

list of groups of sliced items

Examples

>>> from watex.utils.funcutils import split_list
>>> lst = [1, 2, 3, 4, 5, 6, 7, 8]
>>> val = 3
>>> print(split_list(lst, val))
[[1, 2, 3], [4, 5, 6], [7, 8]]
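
The grouping shown above corresponds to slicing the list in steps of val. A minimal sketch follows; the helper chunk is hypothetical and ignores the fill_value parameter, whose behaviour is not described here.

```python
def chunk(lst, size):
    """Split `lst` into consecutive groups of at most `size` items."""
    return [lst[i:i + size] for i in range(0, len(lst), size)]


groups = chunk([1, 2, 3, 4, 5, 6, 7, 8], 3)
print(groups)  # [[1, 2, 3], [4, 5, 6], [7, 8]]
```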

watex.utils.funcutils.station_id(id_, is_index='index', how=None, **kws)[source]#

Get the index id from the station name given as input. Index starts at 0.

Parameters:

id_ – str or list of the names of the stations or indexes.
is_index – bool, consider the given station as an index, so all the letters are removed and the digits are kept as the index of each station.
how – Mode to index the station. Default is ‘Python indexing’, i.e. the counting starts at 0. Any other mode starts the counting at 1. Note that if is_index is True and the param how is set to its default value py, the station index is decreased by 1.
kws – additional keyword arguments passed to make_ids().

Returns:

station index. If a list is given as id_, a tuple is returned.

Example:

>>> from watex.utils.funcutils import station_id
>>> dat1 = ['S13', 's02', 's85', 'pk20', 'posix1256']
>>> station_id (dat1)
... (13, 2, 85, 20, 1256)
>>> station_id (dat1, how='py')
... (12, 1, 84, 19, 1255)
>>> station_id (dat1, is_index= None, prefix ='site')
... ('site1', 'site2', 'site3', 'site4', 'site5')
>>> dat2 = 1
>>> station_id (dat2) # return index like it is
... 1
>>> station_id (dat2, how='py') # considering the index starts from 0
... 0
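
The digit extraction of the first two examples can be sketched with a regular expression. The helper name_to_index is hypothetical; it only covers names that contain digits and does not reproduce the is_index=None branch that generates new ids.

```python
import re


def name_to_index(name, how=None):
    """Keep the digits of `name`; subtract 1 when how='py' (0-based)."""
    number = int(re.sub(r"\D", "", str(name)))
    return number - 1 if how == "py" else number


names = ['S13', 's02', 's85', 'pk20', 'posix1256']
one_based = tuple(name_to_index(n) for n in names)
zero_based = tuple(name_to_index(n, how='py') for n in names)
print(one_based)   # (13, 2, 85, 20, 1256)
print(zero_based)  # (12, 1, 84, 19, 1255)
```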

watex.utils.funcutils.stn_check_split_type(data_lines)[source]#

Read data_lines and check for the presence of a split type (‘,’, ‘ ‘, or any other mark). The threshold is assumed to be a third of the total data length.

Parameters:

data_lines – list of data to parse.

Returns:

The split type

Return type:

str

Example:

>>> from watex.utils  import funcutils as func
>>> path = 'data/K6.stn'
>>> with open (path, 'r', encoding='utf8') as f :
...                     data= f.readlines()
>>> print(func.stn_check_split_type(data_lines=data))

watex.utils.funcutils.storeOrwritehdf5(d, /, key=None, mode='a', kind=None, path_or_buf=None, encoding='utf8', csv_sep=',', index=Ellipsis, columns=None, sanitize_columns=Ellipsis, func=None, args=(), applyto=None, **func_kwds)[source]#

Store data to hdf5 or write data to csv file.

Note that by default, the data is neither stored nor written; the function returns the data if it is a frame, or transforms the Path-like object into a dataframe.

Parameters:

d (Dataframe, shape (m_samples, n_features)) – data to store or write or sanitize.
key (str) – Identifier for the group in the store.
mode ({'a', 'w', 'r+'}, default 'a') –
Mode to open file:
- ’w’: write, a new file is created (an existing file with the
  same name would be deleted).
- ’a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
- ’r+’: similar to ‘a’, but the file must already exist.
kind (str, {'store', 'write', None} , default=None) –
Type of task to perform:
- ’store’: Store data to hdf5
- ’write’: export data to csv file.
- None: construct a dataframe if array is passed or sanitize it.
path_or_buf (str or pandas.HDFStore, or str, path object, file-like object, or None, default=None) – File path or HDFStore object. String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If kind is ‘write’ and path_or_buf is None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline='', disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
encoding (str, default='utf8') – A string representing the encoding to use in the output file. Encoding is not supported if path_or_buf is a non-binary file object.
csv_sep (str, default=',') – String of length 1. Field delimiter for the output file.
index (bool, default=False) – Write data to csv with index or not.
columns (list of str, optional) – Useful to create a dataframe when an array is passed. Make sure it fits the number of array columns (shape[1]).
sanitize_columns (bool, default=False) –
Remove undesirable characters in the data columns using the default argument of the regex parameter, and set the fill pattern to underscore ‘_’. The default regex implementation is:
```
>>> import re
>>> re.compile (r'[_#&.)(*@!,;\s-]\s*', flags=re.IGNORECASE)
```
func (callable, optional) – A custom sanitizing function applied to each column of the dataframe. If provided, the expected columns must be listed in the applyto parameter.
args (tuple, optional) – Positional arguments of the sanitizing columns
applyto (str or list of str, Optional) – The list of columns to apply the function func. To apply the function to all columns, use the * instead.
func_kwds (dict,) – Keywords arguments of the sanitizing function func.

Returns:

None or d – returns None if kind is set to write or store; otherwise returns the dataframe.

Return type:

None or dataframe.

Examples

>>> from watex.utils.funcutils import storeOrwritehdf5
>>> from watex.datasets import load_bagoue
>>> data = load_bagoue().frame
>>> data.geol[:5]
0    VOLCANO-SEDIM. SCHISTS
1                  GRANITES
2                  GRANITES
3                  GRANITES
4          GEOSYN. GRANITES
Name: geol, dtype: object
>>> data = storeOrwritehdf5 ( data, sanitize_columns = True)
>>> data[['type', 'geol', 'shape']] # put all to lowercase
  type                    geol shape
0   cp  volcano-sedim. schists     w
1   ec                granites     v
2   ec                granites     v
>>> # compute using func
>>> def test_func ( a, times  , to_percent=False ):
        return ( a * times / 100)   if to_percent else ( a *times )
>>> data.sfi[:5]
0    0.388909
1    1.340127
2    0.446594
3    0.763676
4    0.068501
Name: sfi, dtype: float64
>>> d = storeOrwritehdf5 ( data,  func = test_func, args =(7,), applyto='sfi')
>>> d.sfi[:5]
0    2.722360
1    9.380889
2    3.126156
3    5.345733
4    0.479507
Name: sfi, dtype: float64
>>> storeOrwritehdf5 ( data,  func = test_func, args =(7,),
                      applyto='sfi', to_percent=True).sfi[:5]
0    0.027224
1    0.093809
2    0.031262
3    0.053457
4    0.004795
Name: sfi, dtype: float64
>>> # write data to hdf5 and outputs to current directory
>>> storeOrwritehdf5 ( d, key='test0', path_or_buf= 'test_data.h5',
                      kind ='store')
>>> # export data to csv
>>> storeOrwritehdf5 ( d, key='test0', path_or_buf= 'test_data',
                      kind ='export')

watex.utils.funcutils.str2columns(text, /, regex=None, pattern=None)[source]#

Split text from the non-alphanumeric markers using regular expression.

Remove all string non-alphanumeric and some operator indicators, and fetch attributes names.

Parameters:

text (str) – text literal containing the column names to retrieve.

regex (re object,) –

Regular expression object. The default is:

>>> import re
>>> re.compile (r'[#&*@!_,;\s-]\s*', flags=re.IGNORECASE)

pattern (str, default = ‘[#&*@!_,;\s-]\s*’) – The base pattern to split the text into columns.

Returns:

attr

Return type:

List of attributes

Examples

>>> from watex.utils.funcutils import str2columns
>>> text = ('this.is the text to split. It is an: example of; splitting str - to text.')
>>> str2columns (text )
... ['this',
     'is',
     'the',
     'text',
     'to',
     'split',
     'It',
     'is',
     'an:',
     'example',
     'of',
     'splitting',
     'str',
     'to',
     'text']

watex.utils.funcutils.strip_item(item_to_clean, item=None, multi_space=12)[source]#

Strip an item from around string values. If the item to clean is None or “’’”, the function returns None.

Parameters:

item_to_clean – List from which to strip the item.
item – item to clean; it may change according to the use. The default is ‘’.
multi_space – degree of repetition that may be found around the item. The default is 12.

Returns:

item_to_clean , cleaned item

Return type:

list or ndarray

Example:

>>> import numpy as np
>>> from watex.utils.funcutils import strip_item
>>> data = np.array(['      ss_data','    pati   '])
>>> new_data = strip_item (item_to_clean=data)
>>> print(data)
>>> print(new_data)

watex.utils.funcutils.to_hdf5(d, /, fn, objname=None, close=True, **hdf5_kws)[source]#

Store frame data in Hierarchical Data Format 5 (HDF5).

Note that if d is a dataframe, make sure that the dependency ‘pytables’ is already installed; otherwise an error is raised.

Parameters:

d (ndarray,) – data to store in HDF5 format
fn (str,) – File path to HDF5 file.
objname (str,) – name of the data to store
close (bool, default=True) – When data is given as an array, data can still be added if close is set to False; otherwise, users need to open the file again in read mode ‘r’ before continuing to add data.
hdf5_kws (dict of pandas.pd.HDFStore) –
Additional keyword arguments passed to pd.HDFStore. They could be:
- mode : {‘a’, ‘w’, ‘r’, ‘r+’}, default ‘a’
  - 'r': Read-only; no data can be modified.
  - 'w': Write; a new file is created (an existing file with the same name would be deleted).
  - 'a': Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
  - 'r+': It is similar to 'a', but the file must already exist.
- complevel : int, 0-9, default None
  Specifies a compression level for data. A value of 0 or None disables compression.
- complib : {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’
  Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}.
  Specifying a compression library which is not available issues a ValueError.
- fletcher32 : bool, default False
  If applying compression use the fletcher32 checksum.

Returns:

store

Return type:

Dict-like IO interface for storing pandas objects.

Examples

>>> import os
>>> from watex.utils.funcutils import sanitize_frame_cols, to_hdf5
>>> from watex.utils import read_data
>>> data = read_data('data/boreholes/H502.xlsx')
>>> sanitize_frame_cols (data, fill_pattern='_', inplace =True )
>>> store_path = os.path.join('watex/datasets/data', 'h') # 'h' is the name of the data
>>> store = to_hdf5 (data, fn =store_path , objname ='h502' )
>>> store
...
>>> # fetch the data
>>> h502 = store ['h502']
>>> h502.columns[:3]
... Index(['hole_number', 'depth_top', 'depth_bottom'], dtype='object')

watex.utils.funcutils.to_numeric_dtypes(arr, *, columns=None, return_feature_types=Ellipsis, missing_values=nan, pop_cat_features=Ellipsis, sanitize_columns=Ellipsis, regex=None, fill_pattern=None, drop_nan_columns=True, how='all', reset_index=Ellipsis, drop_index=True, verbose=Ellipsis)[source]#

Convert array to dataframe and coerce arguments to appropriate dtypes.

Function includes additional tools to manipulate the transformed data such as:

pop_cat_features to remove the categorical attributes,
sanitize_columns to clean the columns of the dataframe by removing the undesirable characters,
drop_nan_columns to drop all the columns and/or rows that contain only NaN, …

Parameters:

arr (Ndarray or Dataframe, shape (m_samples, n_features)) – Array or dataframe to create, to sanitize or to auto-detect feature categories (numerical or categorical).
columns (list of str, optional) – Useful to create a dataframe when an array is given. Make sure it fits the number of array columns (shape[1]).
return_feature_types (bool, default=False) – Return the list of numerical and categorical features.
missing_values (float, default='NaN') – Value used to replace the missing or empty strings, if any, in the dataframe.
pop_cat_features (bool, default=False) – Remove the categorical features from the DataFrame.
sanitize_columns (bool, default=False) –
Remove undesirable characters in the data columns using the default argument of the regex parameter.

New in version 0.1.9.
regex (re object,) –
Regular expression object used to polish the data columns.
The default is:
>>> import re
>>> re.compile (r'[_#&.)(*@!_,;\s-]\s*', flags=re.IGNORECASE)
New in version 0.1.9.
fill_pattern (str, default='') – Pattern to replace the non-alphabetic character in each item of columns.
drop_nan_columns (bool, default=True) –
Remove all columns filled by NaN values.
how (str, default='all') – Also drop the NaN rows, i.e. the rows composed entirely of NaN or null values.
reset_index (bool, default=False) –
Reset the index of the dataframe.
drop_index (bool, default=True) –
Drop the index in the dataframe after resetting.
verbose (bool, default=False) – Output a message listing the categorical items dropped from the dataframe, if any.

Returns:

df or (df, nf, cf) – also returns nf and cf if return_feature_types is set to ``True``.

Return type:

Dataframe of values casted to numeric types

Examples

>>> from watex.datasets.dload import load_bagoue
>>> from watex.utils.funcutils import to_numeric_dtypes
>>> X, y = load_bagoue (as_frame =True )
>>> X0 =X[['shape', 'power', 'magnitude']]
>>> X0.dtypes
... shape        object
    power        object
    magnitude    object
    dtype: object
>>> df = to_numeric_dtypes(X0)
>>> df.dtypes
... shape         object
    power        float64
    magnitude    float64
    dtype: object

watex.utils.funcutils.twinning(*d, on=None, parse_on=False, mode='strict', coerce=False, force=False, decimals=7, raise_warn=True)[source]#

Find identical objects in all data and concatenate them using the merge intersection (cross) strategy.

Parameters:

d (List of DataFrames) – List of pandas DataFrames
on (str, label or list) –
Column or index level names to join on. These must be found in all DataFrames. If on is None and not merging on indexes, then a concatenation along the columns axis is performed on all DataFrames. Note that on works with parse_on if its argument is a list of column names passed as a single literal string. For instance:
```
on ='longitude latitude' --[parse_on=True]-> ['longitude' , 'latitude']
```
parse_on (bool, default=False) – Parse the on argument if given as a string and return an iterable object.
mode (str, default='strict') – Mode applied to the data. Can be [‘soft’|’strict’]. In strict mode, all the data passed must be DataFrames, otherwise an error is raised. In soft mode, non-DataFrame objects are ignored. Note that any other value falls back to strict mode.
coerce (bool, default=False) – Truncate all DataFrames to match the size of the shortest one before performing the merge.
force (bool, default=False,) – Force on items to be present in all DataFrames. This is possible only if on items are found in at least one DataFrame. If they are missing in all data, an error occurs.
decimals (int, default=7) –
Number of decimals used for comparison between numeric labels in on columns items. If set, it rounds values of on items in all data before performing the merge.
raise_warn (bool, default=True) – Warn user that data is concatenated along column axis if on is None.

Returns:

data – A DataFrame of the merged objects.

Return type:

DataFrames

Examples

>>> import watex as wx
>>> from watex.utils.funcutils import twinning
>>> data = wx.make_erp (seed =42 , n_stations =12, as_frame =True )
>>> table1 = wx.DCProfiling ().fit(data).summary()
>>> table1
       dipole   longitude  latitude  ...  shape  type       sfi
line1      10  110.486111  26.05174  ...      C    EC  1.141844
>>> data_no_xy = wx.make_ves ( seed=0 , as_frame =True)
>>> data_no_xy.head(2)
    AB   MN  resistivity
0  1.0  0.4   448.860148
1  2.0  0.4   449.060335
>>> data_xy = wx.make_ves ( seed =0 , as_frame =True , add_xy =True )
>>> data_xy.head(2)
    AB   MN  resistivity   longitude  latitude
0  1.0  0.4   448.860148  109.332931  28.41193
1  2.0  0.4   449.060335  109.332931  28.41193
>>> table = wx.methods.VerticalSounding (
    xycoords = (110.486111,   26.05174)).fit(data_no_xy).summary()
>>> table.table_
         AB    MN   arrangememt  ... nareas   longitude  latitude
area                             ...
None  200.0  20.0  schlumberger  ...      1  110.486111  26.05174
>>> twinning (table1, table.table_,  )
       dipole   longitude  latitude  ...  nareas   longitude  latitude
line1    10.0  110.486111  26.05174  ...     NaN         NaN       NaN
None      NaN         NaN       NaN  ...     1.0  110.486111  26.05174
>>> twinning (table1, table.table_, on =['longitude', 'latitude'] )
Empty DataFrame
>>> # comments: the DataFrame is empty because decimals is too large,
>>> # so the longitude and latitude values are considered different
>>> twinning (table1, table.table_, on =['longitude', 'latitude'], decimals =5 )
    dipole  longitude  latitude  ...  max_depth  ohmic_area  nareas
0      10  110.48611  26.05174  ...      109.0  690.063003       1
>>> # Now twinning is able to match the rows whose coordinates are identical after rounding.
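
Why a smaller decimals value turns an empty result into a match can be sketched in plain Python, without pandas. Everything below is illustrative: inner_join and rounded_key are hypothetical helpers, not part of watex, and the unrounded longitude 110.4861111 is an assumed value chosen to differ from 110.486111 only beyond the sixth decimal.

```python
def rounded_key(record, on, decimals):
    """Build the join key of a record, rounding each `on` value."""
    return tuple(round(record[name], decimals) for name in on)


def inner_join(left, right, on, decimals):
    """Hypothetical helper: keep pairs of records whose rounded keys agree.

    This mimics an intersection (inner) merge on the `on` labels; it is
    a sketch of the idea, not the watex or pandas implementation.
    """
    index = {}
    for rec in right:
        index.setdefault(rounded_key(rec, on, decimals), []).append(rec)
    merged = []
    for rec in left:
        for match in index.get(rounded_key(rec, on, decimals), []):
            merged.append({**match, **rec})
    return merged


# Assumed values: the two longitudes differ at the seventh decimal.
profile = [{"longitude": 110.4861111, "latitude": 26.05174, "dipole": 10}]
sounding = [{"longitude": 110.486111, "latitude": 26.05174, "nareas": 1}]
on = ["longitude", "latitude"]

strict = inner_join(profile, sounding, on, decimals=7)  # keys differ
loose = inner_join(profile, sounding, on, decimals=5)   # keys agree
print(len(strict), len(loose))
```

With seven decimals the keys are compared at a precision where the two longitudes still differ, so nothing is merged; with five decimals both round to the same value and the records are joined.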

watex.utils.funcutils.url_checker(url, install=False, raises='ignore')[source]#

Check whether the URL is reachable or not.

The function uses the requests library. If it is not installed, set the install parameter to True to install it in a subprocess.

Parameters:

url (str,) – Link to the URL to check for reachability.
install (bool,) – Install the ‘requests’ module if it is not installed yet.
raises (str) – Raise an error when the URL is not reachable rather than returning 0. If raises is ignore and the module ‘requests’ is not installed, the django URL validator is used instead. However, the latter only asserts that the URL is well formed; it does not validate its reachability.

Return type:

``True`` (1) if reachable and ``False`` (0) otherwise.

Example

>>> from watex.utils.funcutils import url_checker
>>> url_checker ("http://www.example.com")
...  0 # not reachable
>>> url_checker ("https://watex.readthedocs.io/en/latest/api/watex.html")
... 1
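
For readers without requests installed, a reachability check can be approximated with the standard library. The sketch below swaps in urllib.request for requests, so it is not what url_checker does internally; is_reachable and its timeout argument are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Hypothetical helper: True if `url` answers with a 2xx/3xx status.

    Uses urllib.request instead of the requests library, so behaviour
    may differ from url_checker (redirect handling, headers, etc.).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers DNS failures, refused connections and HTTP
        # 4xx/5xx responses (HTTPError is a subclass); ValueError is
        # raised for a malformed URL such as one without a scheme.
        return False


# A malformed URL is rejected without any network access.
print(is_reachable("not-a-url"))
```

Unlike the validator fallback described above, this sketch actually opens a connection, so a well-formed URL pointing to an unreachable host also yields False.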

watex.utils.funcutils.wrap_infos(phrase, value='', underline='-', unit='', site_number='', **kws)[source]#: Display info from anomaly details.

watex.utils.funcutils.zip_extractor(zip_file, samples='*', ftype=None, savepath=None, pwd=None)[source]#

Extract ZIP archive objects.

Can extract all objects, or only a sample of them when the number of objects is passed to the parameter samples.

New in version 0.1.5.

Parameters:

zip_file (str) – Full path to the ZIP archive file.
samples (int, str, default ='*') – Number of data to retrieve from the archive. This is useful when the archive contains many data. * means extract all.
savepath (str, optional) – Path to store the decompressed files.
ftype (str,) – Extension of the specific files to decompress. If the archive contains many different data formats, specifying the file type retrieves only the files of that type from the whole archive.
pwd (int, optional) – Password to pass if the zip file is encrypted.

Returns:

objnames – List of decompressed objects.

Return type:

list

Examples

>>> from watex.utils.funcutils import zip_extractor
>>> zip_extractor ('watex/datasets/data/edis/e.E.zip')
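
The interplay of samples, ftype and savepath can be sketched with the standard zipfile module. This is a minimal illustration under stated assumptions, not the watex implementation: extract_members is a hypothetical helper, samples is assumed to keep the first n matching members, and ftype is assumed to be matched case-insensitively against the end of each member name.

```python
import zipfile


def extract_members(zip_file, samples="*", ftype=None, savepath=None, pwd=None):
    """Hypothetical helper mirroring the parameters described above.

    Assumptions: `samples` keeps the first n matching members, and
    `pwd` must be given as bytes, as the zipfile module requires.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
        if ftype is not None:
            names = [n for n in names if n.lower().endswith(ftype.lower())]
        if samples != "*":
            names = names[: int(samples)]
        # savepath=None extracts into the current working directory.
        archive.extractall(path=savepath, members=names, pwd=pwd)
    return names
```

For instance, extract_members(path, samples=2, ftype='.edi', savepath='out') would decompress at most two members ending in .edi into the out directory and return their names.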