watex.datasets.load_nlogs

watex.datasets.load_nlogs(*, return_X_y=False, as_frame=False, key=None, split_X_y=False, test_ratio=0.3, tag=None, tnames=None, data_names=None, samples=None, seed=None, shuffle=False, **kws)

Load the Nanshang Engineering and hydrogeological drilling dataset.

The dataset contains multiple targets and can be used for classification or regression problems.

Parameters:
  • return_X_y (bool, default=False) – If True, returns (data, target) instead of a Boxspace object. See below for more information about the data and target objects.

  • as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.

  • split_X_y (bool, default=False) – If True, the data is split into a training set (X, y) and a testing set (Xt, yt) according to the test-size ratio.

  • test_ratio (float, default=0.3) – The ratio used to split the data into the training set (X, y) and the testing set (Xt, yt). The default of 0.3 reserves 30% of the samples for the testing set.

  • tnames (str or list of str, optional) – The name of the target to retrieve. If None, all target columns are collected and compose a multioutput y. For a single classification or regression problem, it is recommended to indicate the name of the target needed for the learning task.

  • tag, data_names (None) – tag and data_names do nothing. They exist only for API consistency, so that the same data can be fetched using watex.data.fetch_data, which already holds tag and data_names as parameters.

  • key (str, default='b0') – Kind of drilling data to fetch. Can also be 'ns'. The ns data refers mostly to engineering drilling, whereas the b0 data refers to purely hydrogeological drilling. In the former case, the 'ground_height_distance' attribute, used to control soil settlement, is the target; in the latter, the targets are the water inflow, the drawdown, and the static water level.

  • samples (int, optional) – Ratio or number of items to fetch from the data. All data is fetched if samples is None.

  • seed (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) – If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

  • shuffle (bool, default=False) – If True, the borehole data is shuffled before sampling.
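The interplay of samples, seed, and shuffle described above can be sketched in plain Python. This is only an illustration under stated assumptions, not watex's implementation: `take_samples` is a hypothetical helper, and the rule that a float in (0, 1] counts as a ratio while any other value counts as a number of items is an assumed reading of "ratio or number".

```python
import random


def take_samples(rows, samples=None, seed=None, shuffle=False):
    """Hypothetical helper (not part of watex): return a subset of `rows`."""
    rows = list(rows)
    if shuffle:
        # A seeded generator makes the shuffle reproducible across calls.
        random.Random(seed).shuffle(rows)
    if samples is None:
        return rows  # fetch everything
    if isinstance(samples, float) and 0 < samples <= 1:
        # Assumed convention: a float in (0, 1] is a ratio of the rows.
        n = int(samples * len(rows))
    else:
        # Otherwise the value is treated as a number of items.
        n = int(samples)
    return rows[:n]


boreholes = [f"b{i}" for i in range(10)]
first_three = take_samples(boreholes, samples=3)
half = take_samples(boreholes, samples=0.5, seed=42, shuffle=True)
half_again = take_samples(boreholes, samples=0.5, seed=42, shuffle=True)
```

Because shuffling happens before the cut, the same seed yields the same subset on every call, while shuffle=False simply keeps the leading rows.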

Returns:

  • data (Boxspace) – Dictionary-like object, with the following attributes.

    data: {ndarray, dataframe}

    The data matrix. If as_frame=True, data will be a pandas DataFrame.

    target: {ndarray, Series}

    The target values. If as_frame=True, target will be a pandas Series or DataFrame, depending on the number of target columns.

    feature_names: list

    The names of the dataset columns.

    target_names: list

    The names of the target columns.

    frame: DataFrame

    Only present when as_frame=True. DataFrame with data and target.

    New in version 0.1.1.

    DESCR: str

    The full description of the dataset.

    filename: str

    The path to the location of the data.

    New in version 0.1.2.

  • data, target (tuple if return_X_y is True) – A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target samples.

    New in version 0.1.2.

  • X, Xt, y, yt (tuple if split_X_y is True) – A tuple of four arrays. X and Xt are the 2D training and testing feature arrays, whose shapes are:

    \[ \begin{align}\begin{aligned}\text{shape}(X, y) = (1 - \text{test\_ratio}) \times (n_{samples}, n_{features}) \times 100\%\\\text{shape}(Xt, yt) = \text{test\_ratio} \times (n_{samples}, n_{features}) \times 100\%\end{aligned}\end{align} \]

    where each row represents one sample and each column represents a feature. The ratio applies to the number of samples; the number of features is unchanged. y and yt contain the corresponding target samples.
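As a rough illustration of how test_ratio determines the shapes of the split arrays, the sketch below computes the expected row counts. `expected_split_shapes` is a hypothetical helper, not a watex function, and rounding the test size up is an assumption borrowed from scikit-learn's train_test_split; the rounding watex actually applies is not stated here.

```python
import math


def expected_split_shapes(n_samples, n_features, test_ratio=0.3):
    """Hypothetical helper: expected (train, test) shapes of the feature arrays."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must lie strictly between 0 and 1")
    # Assumption: the test size is rounded up; the training set takes the rest.
    n_test = math.ceil(test_ratio * n_samples)
    n_train = n_samples - n_test
    # Only the number of rows changes; every feature column is kept.
    return (n_train, n_features), (n_test, n_features)


train_shape, test_shape = expected_split_shapes(200, 12, test_ratio=0.25)
print(train_shape, test_shape)
```

The target arrays y and yt would have the same number of rows as X and Xt respectively.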

Examples

Suppose we have no idea which columns compose the target. The best approach is then to run the function without passing any parameters and inspect the DESCR attribute to get the unit of each attribute:

>>> from watex.datasets.dload import load_nlogs
>>> b = load_nlogs()
>>> b.target_names
['static_water_level', 'drawdown', 'water_inflow', 'unit_water_inflow', 'water_inflow_in_m3_d']

>>> b.DESCR
... (...)
>>> # Let's say we are interested in the targets 'drawdown' and
>>> # 'static_water_level' and return `y`
>>> _, y = load_nlogs(as_frame=True,  # return X and y as frames
...                   tnames=['drawdown', 'static_water_level'],
...                   )
>>> list(y.columns)
['drawdown', 'static_water_level']
>>> y.head(2)
   drawdown  static_water_level
0     70.03                4.21
1      7.38                3.60