watex.datasets.load_mxs

watex.datasets.load_mxs(*, return_X_y=False, as_frame=False, key=None, tag=None, samples=None, tnames=None, data_names=None, split_X_y=False, seed=None, shuffle=False, test_ratio=0.2, **kws)

Load the dataset after implementing the mixture learning strategy (MXS).

The dataset merges 11 boreholes and carries multiple targets, so it can be used for classification problems.

Parameters:
  • return_X_y (bool, default=False) – If True, returns (data, target) instead of a Boxspace object. See below for more information about the data and target objects.

  • as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.

  • split_X_y (bool, default=False) – If True, the data is split into a training set (X, y) and a testing set (Xt, yt). The test ratio defaults to 20% and is set by test_ratio.

  • tnames (str, optional) – The name of the target to retrieve. If None, all target columns are collected and form a multi-output y. For a single-target classification or regression problem, it is recommended to give the name of the target needed for the learning task.

  • tag, data_names (None) – tag and data_names do nothing. They exist only for API consistency, so that the same data can be fetched with watex.data.fetch_data, which already takes tag and data_names as parameters.

  • samples (int, optional) – Ratio or number of items from axis to fetch in the data. Defaults to 0.5 if samples is None.

  • key (str, default='data') –

    Kind of MXS data to fetch. Can also be:

    • “sparse”: a compressed sparse matrix of the train set X

    • “scale”: a scaled X, using the standardization strategy

    • “num”: numerical data only; the ‘strata’ feature is excluded

    • “test”: test data X and y

    • “train”: train data X and y with preprocessing already performed

    • “raw”: the original dataset X and y with no preprocessing

    • “data”: the default when key is not supplied; returns the Boxspace object

    The Boxspace object returned for “data” includes:

    • target_map: is the mapping of MXS labels in the target y.

    • nga_labels: is the y predicted for Naive Group of Aquifer.

  • drop_observations (bool, default=False) – Drop the remark column in the logging data if set to True.

  • seed (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) – If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

  • shuffle (bool, default=False) – If True, the borehole data is shuffled before sampling.

  • test_ratio (float, default=0.2) – The ratio used to split the data into training (X, y) and testing (Xt, yt) sets. The default of 0.2 holds out 20% of the data for testing.
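
As a rough illustration of how shuffle, seed and test_ratio can interact when split_X_y is True, here is a minimal sketch in plain Python. The helper split_rows is hypothetical and is not part of watex; the rounding rule and the choice to take the test rows from the end are assumptions made for the example, not a description of the library's implementation.

```python
import random


def split_rows(rows, test_ratio=0.2, shuffle=False, seed=None):
    """Hypothetical helper: split rows into (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    rows = list(rows)  # copy so the caller's sequence is left untouched
    if shuffle:
        # A fixed seed makes the shuffle, and therefore the split, repeatable.
        random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_ratio))  # assumed rounding rule
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


train, test = split_rows(range(10), test_ratio=0.2, shuffle=True, seed=42)
print(len(train), len(test))  # 8 2
```

With shuffle left at False the rows keep their original order, so the test set is simply the last 20% of the rows in this sketch.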

Returns:

  • data (Boxspace) – Dictionary-like object, with the following attributes.

    data: {ndarray, dataframe}

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

    target: {ndarray, Series}

    The classification target. If as_frame=True, target will be a pandas Series.

    feature_names: list

    The names of the dataset columns.

    target_names: list

    The names of target classes.

    target_map: dict

    The mapping of MXS labels in the target y.

    nga_labels: array-like, 1D

    The y predicted for the Naive Group of Aquifer.

    frame: DataFrame

    Only present when as_frame=True. DataFrame with data and target.

    DESCR: str

    The full description of the dataset.

    filename: str

    The path to the location of the data.

  • data, target (tuple if return_X_y is True) – A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column one feature. The second is an ndarray of shape (n_samples,) containing the target samples.

    New in version 0.1.2.

  • X, Xt, y, yt (tuple if split_X_y is True) – A tuple of two ndarrays (X, Xt). The first contains a 2D array of:

Examples

>>> from watex.datasets.dload import load_mxs
>>> load_mxs(return_X_y=True, key='sparse', samples='*')
(<1038x21 sparse matrix of type '<class 'numpy.float64'>'
        with 8298 stored elements in Compressed Sparse Row format>,
 array([1, 1, 1, ..., 5, 5, 5], dtype=int64))
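
The “scale” key is described as returning X scaled with the standardization strategy. The sketch below shows what standardization usually means (subtract the mean, divide by the standard deviation) on a single made-up column. The helper standardize is hypothetical, and whether watex uses the population standard deviation, as assumed here, is not stated on this page.

```python
import statistics


def standardize(column):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


depths = [10.0, 20.0, 30.0]  # made-up values, not taken from the MXS data
scaled = standardize(depths)
print(scaled)
```

After this transformation the column has zero mean and unit standard deviation, which keeps features measured in different units on a comparable scale.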