watex.datasets.load_mxs
- watex.datasets.load_mxs(*, return_X_y=False, as_frame=False, key=None, tag=None, samples=None, tnames=None, data_names=None, split_X_y=False, seed=None, shuffle=False, test_ratio=0.2, **kws)
Load the dataset obtained after applying the mixture learning strategy (MXS).
The dataset merges data from 11 boreholes and carries multiple targets, so it can be used for classification problems.
- Parameters
return_X_y (bool, default=False) – If True, returns (data, target) instead of a Boxspace object. See below for more information about the data and target objects.
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.
split_X_y (bool, default=False) – If True, the data is split into a training set (X, y) and a testing set (Xt, yt) according to the test_ratio value.
tnames (str, optional) – The name of the target to retrieve. If None, all target columns are collected and compose a multioutput y. For a single classification or regression problem, it is recommended to give the name of the target needed for the learning task.
tag, data_names (None) – tag and data_names do nothing. They exist only for API consistency, so that the same data can be fetched using the function watex.data.fetch_data, which already takes tag and data_names as parameters.
samples (int, optional) – Ratio or number of items from axis to fetch in the data. Default = .5 if samples is None.
key (str, default='data') –
Kind of MXS data to fetch. Can be:
"sparse": a compressed sparse row (CSR) matrix format of the train set X.
"scale": returns a scaled X using the standardization strategy.
"num": numerical data only, excluding the 'strata' feature.
"test": test data X and y.
"train": train data X and y with preprocessing already performed.
"raw": the original dataset X and y with no preprocessing.
"data": default when key is not supplied. Returns a Boxspace object, where:
target_map: is the mapping of MXS labels in the target y.
nga_labels: is the y predicted for the Naive Group of Aquifer.
drop_observations (bool, default=False) – Drop the remark column in the logging data if set to True.
seed (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) – If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
shuffle (bool, default=False) – If True, the borehole data is shuffled before sampling.
test_ratio (float, default=0.2) – The ratio used to split the data into training (X, y) and testing (Xt, yt) sets. The default of 0.2 corresponds to 20% of (X, y).
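The interaction between shuffle, seed and test_ratio can be illustrated with a minimal sketch. This is not watex's implementation: split_by_ratio is a hypothetical helper written with the standard library only, and it assumes that test_ratio is the fraction of rows held out for testing.

```python
import random


def split_by_ratio(rows, labels, test_ratio=0.2, shuffle=False, seed=None):
    """Hypothetical helper (not part of watex): split rows and labels
    into training (X, y) and testing (Xt, yt) parts."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")

    indices = list(range(len(rows)))
    if shuffle:
        # A seeded generator makes the shuffle reproducible.
        random.Random(seed).shuffle(indices)

    # Assumption: test_ratio is the share of samples kept for testing.
    n_test = int(round(len(indices) * test_ratio))
    n_train = len(indices) - n_test
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    X = [rows[i] for i in train_idx]
    Xt = [rows[i] for i in test_idx]
    y = [labels[i] for i in train_idx]
    yt = [labels[i] for i in test_idx]
    return X, Xt, y, yt


# Toy data, made up for the illustration.
rows = [[float(i), float(i) * 2.0] for i in range(10)]
labels = [i % 3 for i in range(10)]

X, Xt, y, yt = split_by_ratio(rows, labels, test_ratio=0.2, shuffle=True, seed=42)
print(len(X), len(Xt))  # 8 2
```

With 10 rows and test_ratio=0.2, 2 rows go to the testing set and 8 stay in the training set. Reusing the same seed gives the same split.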
- Returns
data (Boxspace) – Dictionary-like object, with the following attributes.
- data: {ndarray, dataframe}
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- target: {ndarray, Series}
The classification target. If as_frame=True, target will be a pandas Series.
- feature_names: list
The names of the dataset columns.
- target_names: list
The names of target classes.
- target_map: dict
The mapping of MXS labels in the target y.
- nga_labels: array-like, 1D
The y predicted for the Naive Group of Aquifer.
- frame: DataFrame
Only present when as_frame=True. DataFrame with data and target.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data.
data, target (tuple if return_X_y is True) – A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target samples.
X, Xt, y, yt (tuple if split_X_y is True) – A tuple of four arrays. X and Xt are 2D arrays of training and test data, whereas y and yt are the training and test labels. The number of samples in each is based on test_ratio.
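To make "dictionary-like object with attributes" concrete, here is a small sketch of such a container. MiniBox is a hypothetical stand-in, not watex's Boxspace class, and the values in it are toy data invented for the illustration. It only shows the access pattern the description suggests: the same field can be read by key or by attribute.

```python
class MiniBox(dict):
    """Hypothetical stand-in for a dictionary-like container with
    attribute access (this is NOT watex's Boxspace class)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict
        # methods such as keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Toy values, made up for the illustration.
box = MiniBox(
    data=[[0.1, 2.0], [0.3, 1.5]],
    target=[1, 5],
    feature_names=["feature_a", "feature_b"],
    target_names=["target_a"],
)

n_samples = len(box.data)
n_features = len(box.data[0])
print(n_samples, n_features)         # 2 2
print(box["target"] is box.target)   # True
```

If the returned object behaves this way, fields such as data, target and feature_names can be read with either syntax. Whether every listed attribute is present may depend on the options used, for example frame only with as_frame=True.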
Examples
>>> from watex.datasets.dload import load_mxs
>>> load_mxs(return_X_y=True, key='sparse', samples='*')
(<1038x21 sparse matrix of type '<class 'numpy.float64'>'
    with 8298 stored elements in Compressed Sparse Row format>,
 array([1, 1, 1, ..., 5, 5, 5], dtype=int64))
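The output above reports a 1038x21 matrix with 8298 stored elements in Compressed Sparse Row (CSR) format. As a rough guide to reading it, the sketch below builds the three CSR arrays for a tiny made-up matrix in plain Python and computes the density implied by the reported figures. The helper to_csr is hypothetical and is not how watex or SciPy build the matrix.

```python
def to_csr(dense):
    """Hypothetical helper: convert a list-of-lists matrix into the three
    CSR arrays (values, column indices, row pointers)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        # row_ptr[i + 1] marks where row i ends in `values`.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


# Toy matrix, made up for the illustration.
dense = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 3.0, 4.0],
]
values, col_indices, row_ptr = to_csr(dense)
print(values)       # [1.0, 2.0, 3.0, 4.0]
print(col_indices)  # [0, 2, 1, 2]
print(row_ptr)      # [0, 1, 2, 4]

# Density implied by the figures in the example output above.
n_rows, n_cols, n_stored = 1038, 21, 8298
density = n_stored / (n_rows * n_cols)
print(f"{density:.3f}")  # 0.381
```

About 38% of the 21798 cells are stored explicitly, so the CSR form keeps 8298 values instead of the full dense grid.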