- exception watex.utils.validator.DataConversionWarning[source]#
Bases: UserWarning
Warning used to notify implicit data conversions happening in the code. This warning occurs when some input data needs to be converted or interpreted in a way that may not match the user’s expectations. For example, this warning may occur when the user:
passes an integer array to a function which expects float input and will convert the input;
requests a non-copying operation, but a copy is required to meet the implementation’s data-type expectations;
passes an input whose shape can be interpreted ambiguously.
Changed in version 0.18: Moved from sklearn.utils.validation.
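Because the warning derives from UserWarning, the standard warnings filters apply to it. The sketch below uses a local stand-in class so that it runs without watex installed; to_float is a hypothetical helper, not part of the library.

```python
import warnings


class DataConversionWarning(UserWarning):
    """Local stand-in for watex.utils.validator.DataConversionWarning."""


def to_float(values):
    """Hypothetical helper: convert integers to floats and warn about it."""
    if any(isinstance(v, int) for v in values):
        warnings.warn(
            "Integer input was converted to float.", DataConversionWarning
        )
    return [float(v) for v in values]


# Record the warning instead of letting it print to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = to_float([1, 2, 3])

# Escalate the warning to an error, for example in a test suite.
with warnings.catch_warnings():
    warnings.simplefilter("error", DataConversionWarning)
    try:
        to_float([1, 2, 3])
        escalated = False
    except DataConversionWarning:
        escalated = True
```

With the real class, the same filters should work by importing DataConversionWarning from watex.utils.validator instead of defining the stand-in.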
- exception watex.utils.validator.PositiveSpectrumWarning[source]#
Bases: UserWarning
Warning raised when the eigenvalues of a PSD matrix have issues.
This warning is typically raised by _check_psd_eigenvalues when the eigenvalues of a positive semidefinite (PSD) matrix such as a gram matrix (kernel) present significant negative eigenvalues, or bad conditioning, i.e. very small non-zero eigenvalues compared to the largest eigenvalue.
New in version 0.22.
- watex.utils.validator.array_to_frame(X, *, to_frame=False, columns=None, raise_exception=False, raise_warning=True, input_name='', force=False)[source]#
Complement to is_frame(), dedicated to validating the reconversion of X and y back to a frame.
- Parameters:
X (Array-like) – Array to convert to frame.
columns (str or list of str) – Series name or columns names for pandas.Series and DataFrame.
to_frame (bool, default=False) – If True, reconvert the array to a frame using the columns; otherwise no action is performed and the same array is returned.
input_name (str, default="") – The data name used to construct the error message.
raise_warning (bool, default=True) – If True then raise a warning if conversion is required. If 'ignore', warnings are silenced.
raise_exception (bool, default=False) – If True then raise an exception if array is not symmetric.
force (bool, default=False) – Force conversion of the array to a frame if columns is not supplied. Column names are then built by combining input_name with the range of X.shape[1].
- Returns:
X – The converted array.
Example
>>> from watex.datasets import fetch_data
>>> from watex.utils.validator import array_to_frame
>>> data = fetch_data('hlogs').frame
>>> array_to_frame(data.k.values, to_frame=True, columns=None, input_name='y', raise_warning="silence")
... array([nan, nan, nan, ..., nan, nan, nan]) # mute
- watex.utils.validator.assert_all_finite(X, *, allow_nan=False, estimator_name=None, input_name='')[source]#
Throw a ValueError if X contains NaN or infinity.
- Parameters:
X ({ndarray, sparse matrix}) – The input data.
allow_nan (bool, default=False) – If True, do not throw error when X contains NaN.
estimator_name (str, default=None) – The estimator name, used to construct the error message.
input_name (str, default="") – The data name used to construct the error message. In particular if input_name is “X” and the data has NaN values and allow_nan is False, the error message will link to the imputer documentation.
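The rule described above can be sketched in pure Python. This is an illustration of the documented behavior under the assumption that infinity is always rejected and NaN is rejected unless allow_nan=True; assert_all_finite_sketch is a hypothetical name, not the library's implementation.

```python
import math


def assert_all_finite_sketch(values, *, allow_nan=False, input_name=""):
    """Raise ValueError if `values` holds infinity, or NaN unless allowed."""
    for v in values:
        if math.isinf(v) or (math.isnan(v) and not allow_nan):
            label = input_name or "Input"
            raise ValueError(f"{label} contains NaN or infinity.")


assert_all_finite_sketch([1.0, 2.5])  # passes silently
assert_all_finite_sketch([1.0, float("nan")], allow_nan=True)  # NaN tolerated

try:
    # Infinity is rejected even when NaN is allowed.
    assert_all_finite_sketch([1.0, float("inf")], allow_nan=True, input_name="X")
    message = None
except ValueError as err:
    message = str(err)
```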
- watex.utils.validator.check_X_y(X, y, accept_sparse=False, *, accept_large_sparse=True, dtype='numeric', order=None, copy=False, force_all_finite=True, ensure_2d=True, allow_nd=False, multi_output=False, ensure_min_samples=1, ensure_min_features=1, y_numeric=False, estimator=None, to_frame=False)[source]#
Input validation for standard estimators. Checks X and y for consistent length, enforces X to be 2D and y 1D. By default, X is checked to be non-empty and containing only finite values. Standard input checks are also applied to y, such as checking that y does not have np.nan or np.inf targets. For multi-label y, set multi_output=True to allow 2D and sparse y. If the dtype of X is object, attempt converting to float, raising on failure.
- Parameters:
X ({ndarray, list, sparse matrix}) – Input data.
y ({ndarray, list, sparse matrix}) – Labels.
accept_sparse (str, bool or list/tuple of str, default=False) – String[s] representing allowed sparse matrix formats, such as ‘csc’, ‘csr’, etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.
accept_large_sparse (bool, default=True) – If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.
New in version 0.20.
dtype ('numeric', type, list of type or None, default='numeric') – Data type of result. If None, the dtype of the input is preserved. If “numeric”, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
order ({'F', 'C'}, default=None) – Whether an array will be forced to be fortran or c-style.
copy (bool, default=False) – Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.
force_all_finite (bool or 'allow-nan', default=True) – Whether to raise an error on np.inf, np.nan, pd.NA in X. This parameter does not influence whether y can have np.inf, np.nan, pd.NA values. The possibilities are:
True: Force all values of X to be finite.
False: accepts np.inf, np.nan, pd.NA in X.
‘allow-nan’: accepts only np.nan or pd.NA values in X. Values cannot be infinite.
New in version 0.20: force_all_finite accepts the string 'allow-nan'.
Changed in version 0.23: Accepts pd.NA and converts it into np.nan.
ensure_2d (bool, default=True) – Whether to raise a value error if X is not 2D.
allow_nd (bool, default=False) – Whether to allow X.ndim > 2.
multi_output (bool, default=False) – Whether to allow 2D y (array or sparse matrix). If false, y will be validated as a vector. y cannot have np.nan or np.inf values if multi_output=True.
ensure_min_samples (int, default=1) – Make sure that X has a minimum number of samples in its first axis (rows for a 2D array).
ensure_min_features (int, default=1) – Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when X has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
y_numeric (bool, default=False) – Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms.
estimator (str or estimator instance, default=None) – If passed, include the name of the estimator in warning messages.
- Returns:
X_converted (object) – The converted and validated X.
y_converted (object) – The converted and validated y.
- watex.utils.validator.check_array(array, *, accept_large_sparse=True, dtype='numeric', accept_sparse=False, order=None, copy=False, force_all_finite=True, ensure_2d=True, allow_nd=False, ensure_min_samples=1, ensure_min_features=1, estimator=None, input_name='', to_frame=True)[source]#
Input validation on an array, list, or similar. By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.
- Parameters:
array (object) – Input object to check / convert.
accept_sparse (str, bool or list/tuple of str, default=False) – String[s] representing allowed sparse matrix formats, such as ‘csc’, ‘csr’, etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.
accept_large_sparse (bool, default=True) – If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.
dtype ('numeric', type, list of type or None, default='numeric') – Data type of result. If None, the dtype of the input is preserved. If “numeric”, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
order ({'F', 'C'} or None, default=None) – Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.
copy (bool, default=False) – Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.
force_all_finite (bool or 'allow-nan', default=True) – Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:
True: Force all values of array to be finite.
False: accepts np.inf, np.nan, pd.NA in array.
‘allow-nan’: accepts only np.nan and pd.NA values in array. Values cannot be infinite.
force_all_finite accepts the string 'allow-nan'. Accepts pd.NA and converts it into np.nan.
ensure_2d (bool, default=True) – Whether to raise a value error if array is not 2D.
ensure_min_samples (int, default=1) – Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.
ensure_min_features (int, default=1) – Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.
estimator (str or estimator instance, default=None) – If passed, include the name of the estimator in warning messages.
input_name (str, default="") – The data name used to construct the error message. In particular if input_name is “X” and the data has NaN values and allow_nan is False, the error message will link to the imputer documentation.
to_frame (bool, default=False) – Reconvert array back to pd.Series or pd.DataFrame if the original array is pd.Series or pd.DataFrame.
- Returns:
array_converted – The converted and validated array.
- Return type:
object
- watex.utils.validator.check_consistent_length(*arrays)[source]#
Check that all arrays have consistent first dimensions. Checks whether all objects in arrays have the same shape or length.
- Parameters:
*arrays (list or tuple of input objects) – Objects that will be checked for consistent length.
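A minimal sketch of the length check, assuming each argument supports len(). The name check_consistent_length_sketch and the error text are illustrative and not taken from the library.

```python
def check_consistent_length_sketch(*arrays):
    """Raise ValueError unless every argument has the same length."""
    lengths = [len(a) for a in arrays]
    if len(set(lengths)) > 1:
        raise ValueError(
            f"Found input variables with inconsistent numbers of samples: {lengths}"
        )


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y = [0, 1, 1]
check_consistent_length_sketch(X, y)  # 3 rows and 3 labels: no error

try:
    check_consistent_length_sketch(X, y[:2])  # 3 rows but only 2 labels
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```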
- watex.utils.validator.check_memory(memory)[source]#
Check that memory is joblib.Memory-like. joblib.Memory-like means that memory can be converted into a joblib.Memory instance (typically a str denoting the location) or has the same interface (has a cache method).
- Parameters:
memory – If string, the location where to create the joblib.Memory interface. If None, no caching is done and the Memory object is completely transparent.
- Returns:
memory – A correct joblib.Memory object.
- Return type:
object with the joblib.Memory interface
- Raises:
ValueError – If memory is not joblib.Memory-like.
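The "joblib.Memory-like" test amounts to duck typing. The sketch below only illustrates that acceptance rule, assuming None, a str location, or any object with a cache method qualifies; is_memory_like and DictCache are hypothetical names, and no joblib.Memory object is created.

```python
def is_memory_like(memory):
    """Return True if `memory` could be accepted as joblib.Memory-like."""
    # None disables caching, a str is read as a cache location, and any
    # other object must expose a callable `cache` attribute.
    return (
        memory is None
        or isinstance(memory, str)
        or callable(getattr(memory, "cache", None))
    )


class DictCache:
    """Hypothetical object exposing the same `cache` interface."""

    def cache(self, func):
        return func


checks = [is_memory_like(m) for m in (None, "/tmp/cache", DictCache(), 42)]
```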
- watex.utils.validator.check_random_state(seed)[source]#
Turn seed into a np.random.RandomState instance.
- Parameters:
seed – If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
- Returns:
The random state object based on seed parameter.
- Return type:
numpy.random.RandomState
- watex.utils.validator.check_scalar(x, name, target_type, *, min_val=None, max_val=None, include_boundaries='both')[source]#
Validate scalar parameters type and value.
- Parameters:
x (object) – The scalar parameter to validate.
name (str) – The name of the parameter to be printed in error messages.
target_type (type or tuple) – Acceptable data types for the parameter.
min_val (float or int, default=None) – The minimum valid value the parameter can take. If None (default) it is implied that the parameter does not have a lower bound.
max_val (float or int, default=None) – The maximum valid value the parameter can take. If None (default) it is implied that the parameter does not have an upper bound.
include_boundaries ({"left", "right", "both", "neither"}, default="both") – Whether the interval defined by min_val and max_val should include the boundaries. Possible choices are:
“left”: only min_val is included in the valid interval. It is equivalent to the interval [ min_val, max_val ).
“right”: only max_val is included in the valid interval. It is equivalent to the interval ( min_val, max_val ].
“both”: min_val and max_val are included in the valid interval. It is equivalent to the interval [ min_val, max_val ].
“neither”: neither min_val nor max_val are included in the valid interval. It is equivalent to the interval ( min_val, max_val ).
- Returns:
x – The validated number.
- Return type:
numbers.Number
- Raises:
TypeError – If the parameter’s type does not match the desired type.
ValueError – If the parameter’s value violates the given bounds. If min_val, max_val and include_boundaries are inconsistent.
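The four include_boundaries choices map onto open and closed interval ends. The following sketch shows only that mapping; in_interval is a hypothetical helper that returns a bool instead of raising, and it is not the library's implementation.

```python
import operator


def in_interval(x, min_val=None, max_val=None, include_boundaries="both"):
    """Return True if x lies in the interval selected by include_boundaries."""
    if include_boundaries not in ("left", "right", "both", "neither"):
        raise ValueError(f"Unknown include_boundaries: {include_boundaries!r}")
    # A closed end compares with >= or <=, an open end with > or <.
    lower_ok = operator.ge if include_boundaries in ("left", "both") else operator.gt
    upper_ok = operator.le if include_boundaries in ("right", "both") else operator.lt
    if min_val is not None and not lower_ok(x, min_val):
        return False
    if max_val is not None and not upper_ok(x, max_val):
        return False
    return True


# Test both end points of the interval [0, 1] under each mode.
results = {
    mode: (in_interval(0, 0, 1, mode), in_interval(1, 0, 1, mode))
    for mode in ("left", "right", "both", "neither")
}
```

Each entry of results holds the outcome for the lower and the upper end point, so "left" accepts only the lower one and "neither" accepts none.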
- watex.utils.validator.check_symmetric(array, *, tol=1e-10, raise_warning=True, raise_exception=False)[source]#
Make sure that array is 2D, square and symmetric. If the array is not symmetric, then a symmetrized version is returned. Optionally, a warning or exception is raised if the matrix is not symmetric.
- Parameters:
array – Input object to check / convert. Must be two-dimensional and square, otherwise a ValueError will be raised.
tol (float, default=1e-10) – Absolute tolerance for equivalence of arrays. Default = 1E-10.
raise_warning (bool, default=True) – If True then raise a warning if conversion is required.
raise_exception (bool, default=False) – If True then raise an exception if array is not symmetric.
- Returns:
array_sym – Symmetrized version of the input array, i.e. the average of array and array.transpose(). If sparse, then duplicate entries are first summed and zeros are eliminated.
- Return type:
{ndarray, sparse matrix}
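The symmetrized version described above is the average of the array and its transpose. A dependency-free sketch on nested lists follows; symmetrize is a hypothetical helper, and the library itself operates on ndarrays and sparse matrices.

```python
def symmetrize(matrix, tol=1e-10):
    """Return (is_symmetric, 0.5 * (A + A^T)) for a square list of lists."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be two-dimensional and square")
    # Symmetric means every mirrored pair differs by at most tol.
    is_symmetric = all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )
    averaged = [
        [0.5 * (matrix[i][j] + matrix[j][i]) for j in range(n)]
        for i in range(n)
    ]
    return is_symmetric, averaged


ok, sym = symmetrize([[1.0, 2.0], [4.0, 3.0]])
```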
- watex.utils.validator.check_y(y, multi_output=False, y_numeric=False, input_name='y', estimator=None, to_frame=False, allow_nan=False)[source]#
- Parameters:
multi_output (bool, default=False) – Whether to allow 2D y (array or sparse matrix). If false, y will be validated as a vector. y cannot have np.nan or np.inf values if multi_output=True.
y_numeric (bool, default=False) – Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms.
input_name (str, default="y") – The data name used to construct the error message. In particular if input_name is “y”.
estimator (str or estimator instance, default=None) – If passed, include the name of the estimator in warning messages.
allow_nan (bool, default=False) – If True, do not throw error when y contains NaN.
to_frame (bool, default=False) – Reconvert the array to its initial type if it was given as pd.Series or pd.DataFrame.
- Returns:
y (array-like,)
y_converted (object) – The converted and validated y.
- watex.utils.validator.get_estimator_name(estimator, /)[source]#
Get the estimator name, whether it is an instantiated object or not.
- Parameters:
estimator (callable or instantiated object) – Callable or instance object that has a fit method.
- Returns:
str, name of the estimator.
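One way to obtain the same name from either a class or an instance is sketched below. This assumes the returned name is the class name; estimator_name_sketch and DummyRegressor are hypothetical and only illustrate the idea.

```python
def estimator_name_sketch(estimator):
    """Return the class name for either a class or an instance."""
    if isinstance(estimator, type):
        return estimator.__name__
    return type(estimator).__name__


class DummyRegressor:
    """Hypothetical estimator used only for this illustration."""

    def fit(self, X, y=None):
        return self


name_from_class = estimator_name_sketch(DummyRegressor)
name_from_instance = estimator_name_sketch(DummyRegressor())
```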
- watex.utils.validator.has_fit_parameter(estimator, parameter)[source]#
Check whether the estimator’s fit method supports the given parameter.
- Parameters:
estimator (object) – An estimator to inspect.
parameter (str) – The searched parameter.
- Returns:
is_parameter – Whether the parameter was found to be a named parameter of the estimator’s fit method.
- Return type:
bool
Examples
>>> from sklearn.svm import SVC
>>> from sklearn.utils.validation import has_fit_parameter
>>> has_fit_parameter(SVC(), "sample_weight")
True
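The same check can be reproduced without scikit-learn by inspecting the signature of fit. The sketch assumes the check is a plain lookup among the named parameters of fit; has_fit_parameter_sketch and WeightedMean are hypothetical names.

```python
import inspect


def has_fit_parameter_sketch(estimator, parameter):
    """Return True if `parameter` is a named parameter of estimator.fit."""
    return parameter in inspect.signature(estimator.fit).parameters


class WeightedMean:
    """Hypothetical estimator whose fit accepts sample_weight."""

    def fit(self, X, y=None, sample_weight=None):
        return self


supports_weight = has_fit_parameter_sketch(WeightedMean(), "sample_weight")
supports_groups = has_fit_parameter_sketch(WeightedMean(), "groups")
```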
- watex.utils.validator.is_frame(arr, /)[source]#
Return a bool indicating whether the array is a frame (pd.Series or pd.DataFrame).
Isolated part of array_to_frame() dedicated to X and y frame reconversion validation.
- watex.utils.validator.is_valid_dc_data(d, /, method='erp', exception=<class 'TypeError'>, extra='')[source]#
Detect the kind of DC data passed and raise error if data is not the appropriate DC data expected.
Data must be Vertical Electrical Sounding (VES) or Electrical Resistivity Profiling (ERP).
- Parameters:
d (pd.DataFrame) – DC-resistivity data. Must be ERP or VES data.
method (str, default='erp') – Kind of DC-resistivity method.
exception (BaseException, [‘VESError’ | ‘ERPError’], default=TypeError) – Kind of error to raise.
extra (str) – Extra message to improve the error.
- Returns:
d – DC-resistivity frame.
- Return type:
pd.DataFrame
- watex.utils.validator.set_array_back(X, *, to_frame=False, columns=None, input_name='X')[source]#
Set array back to frame, reconvert the Numpy array to pandas series or dataframe.
- Parameters:
X (Array-like) – Array to convert to frame.
columns (str or list of str) – Series name or columns names for pandas.Series and DataFrame.
to_frame (bool, default=False) – If True, reconvert the array to a frame using the columns; otherwise no action is performed and the same array is returned.
input_name (str, default="X") – The data name used to construct the error message.
force (bool, default=False) – Force the creation of columns by combining input_name with the columns range if columns is not supplied.
- Returns:
X, columns – columns if X is a DataFrame and name if it is a Series. Otherwise returns None.
- Return type:
Array-like
- watex.utils.validator.to_dtype_str(arr, /, return_values=False)[source]#
Convert numeric or object dtype to string dtype.
This avoids a particular TypeError raised when an array is filled with np.nan and at the same time contains string values. Converting the array to dtype str rather than keeping it as ‘object’ bypasses this error.
- Parameters:
arr (array-like) – Array with any NumPy or pandas dtype.
return_values (bool, default=False) – Return the array values in string dtype. This might be useful when a series with dtype equal to object or numeric is passed.
- Returns:
array-like – Array-like with dtype str. Note that if a DataFrame or Series is passed, the object dtype will change only if return_values is set to True; otherwise the same object is returned.