watex.datasets.load_mxs

watex.datasets.load_mxs(*, return_X_y=False, as_frame=False, key=None, tag=None, samples=None, tnames=None, data_names=None, split_X_y=False, seed=None, shuffle=False, test_ratio=0.2, **kws)

Load the dataset after implementing the mixture learning strategy (MXS).

The dataset is composed of 11 boreholes merged with multiple targets and can be used for a classification problem.

Parameters

return_X_y (bool, default=False) – If True, returns (data, target) instead of a Boxspace object. See below for more information about the data and target objects.
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.
split_X_y (bool, default=False) – If True, the data is split into the training set (X, y) and the testing set (Xt, yt) according to the test_ratio value.
tnames (str, optional) – The name of the target to retrieve. If None, all target columns are collected and compose a multioutput y. For a single-target classification or regression problem, it is recommended to give the name of the target needed for the learning task.
tag, data_names (None) – Both parameters do nothing. They exist only for API compatibility, so that the same data can be fetched using the function watex.data.fetch_data, which already takes tag and data_names as parameters.
samples (int, optional) – Ratio or number of items to fetch from the data. If samples is None, 0.5 is used.
key (str, default='data') –
Kind of MXS data to fetch. Can be:
- "sparse": the training set X in compressed sparse row (CSR) matrix format.
- "scale": X scaled using the standardization strategy.
- "num": numerical data only, excluding the 'strata' feature.
- "test": test data X and y.
- "train": training data X and y with preprocessing already performed.
- "raw": the original dataset X and y with no preprocessing.
- "data": the default when key is not supplied. Returns a Boxspace object.
When key is not supplied, "data" is used and the returned Boxspace object also holds:
- target_map: the mapping of MXS labels in the target y.
- nga_labels: the y predicted for the Naive Group of Aquifer.
drop_observations (bool, default=False) – If True, drop the remark column in the logging data.
seed (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) – If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
shuffle (bool, default=False) – If True, the borehole data is shuffled before sampling.
test_ratio (float, default=0.2) – The ratio used to split the data into the training (X, y) and testing (Xt, yt) sets. The default of 0.2 holds out 20% of the data for testing.
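
To make test_ratio concrete, the sketch below computes how many rows would land in each set. It assumes the rounding convention of scikit-learn's train_test_split (the test set size is rounded up); whether load_mxs rounds the same way is not stated here. split_sizes is a hypothetical helper and not part of watex.

```python
import math


def split_sizes(n_samples, test_ratio=0.2):
    """Return (n_train, n_test) for a given test_ratio.

    Hypothetical helper: assumes the test set size is rounded up,
    which may differ from what load_mxs does internally.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    n_test = math.ceil(n_samples * test_ratio)
    n_train = n_samples - n_test
    return n_train, n_test


# 1038 is only an illustrative row count, not a documented dataset size.
n_train, n_test = split_sizes(1038, test_ratio=0.2)
print(n_train, n_test)  # 830 208
```

Under this assumption, a larger test_ratio moves rows from the training set to the test set one for one, so the two sizes always sum to the number of samples.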

Returns

data (Boxspace) – Dictionary-like object, with the following attributes.

data: {ndarray, DataFrame}
The data matrix. If as_frame=True, data will be a pandas DataFrame.

target: {ndarray, Series}
The classification target. If as_frame=True, target will be a pandas Series.

feature_names: list
The names of the dataset columns.

target_names: list
The names of target classes.

target_map: dict
The mapping of MXS labels in the target y.

nga_labels: array-like, 1D
The y predicted for the Naive Group of Aquifer.

frame: DataFrame
Only present when as_frame=True. DataFrame with data and target.

DESCR: str
The full description of the dataset.

filename: str
The path to the location of the data.
data, target (tuple if return_X_y is True) – A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column one feature. The second is an ndarray of shape (n_samples,) containing the target values.
X, Xt, y, yt (tuple if split_X_y is True) – A tuple of four arrays. X and Xt are the 2D training and test data, whereas y and yt are the training and test labels. The number of samples in each set depends on test_ratio.

Examples

>>> from watex.datasets.dload import load_mxs
>>> load_mxs(return_X_y=True, key='sparse', samples='*')
(<1038x21 sparse matrix of type '<class 'numpy.float64'>'
        with 8298 stored elements in Compressed Sparse Row format>,
 array([1, 1, 1, ..., 5, 5, 5], dtype=int64))
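
A further sketch covers the split_X_y path described in the Parameters and Returns sections. The wrapper name load_mxs_split is hypothetical; it only forwards keyword arguments documented above and assumes the call returns (X, Xt, y, yt) in that order. The import is deferred so that defining the wrapper does not itself require watex.

```python
def load_mxs_split(test_ratio=0.2, seed=None, shuffle=False, **kws):
    """Fetch the MXS data already split into training and test sets.

    Hypothetical convenience wrapper: it forwards the documented
    keyword arguments and assumes the call returns (X, Xt, y, yt)
    as listed in the Returns section.
    """
    # Deferred import: watex is only needed when the wrapper is called.
    from watex.datasets import load_mxs

    return load_mxs(
        split_X_y=True,
        test_ratio=test_ratio,
        seed=seed,
        shuffle=shuffle,
        **kws,
    )


# Usage (requires watex to be installed):
# X, Xt, y, yt = load_mxs_split(test_ratio=0.3, seed=42, shuffle=True)
```

Passing an explicit seed together with shuffle=True is a common way to keep a shuffled split reproducible across runs.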