watex.datasets.load_hlogs
- watex.datasets.load_hlogs(*, return_X_y=False, as_frame=False, key=None, split_X_y=False, test_size=0.3, tag=None, tnames=None, data_names=None, **kws)
Load the hydro-logging dataset.
The dataset contains multiple targets and can be used for classification or regression problems.
- Parameters:
return_X_y (bool, default=False) – If True, returns (data, target) instead of a Boxspace object. See below for more information about the data and target objects. .. versionadded:: 0.1.2
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below. .. versionadded:: 0.1.3
split_X_y (bool, default=False) – If True, the data is split into a training set (X, y) and a testing set (Xt, yt) according to the test_size ratio.
test_size (float, default=0.3) – Fraction of the data held out for the testing set (Xt, yt); the remaining samples form the training set (X, y). The default of 0.3 holds out 30% of the samples.
tnames (str, optional) – The name of the target to retrieve. If None, all the target columns are collected and compose a multioutput y. For a single classification or regression problem, it is recommended to pass the name of the target needed for the learning task.
tag, data_names (None) – tag and data_names do nothing. They exist only for API consistency, so that the same data can be fetched with watex.data.fetch_data, which already takes tag and data_names as parameters.
key (str, default='h502') – Kind of logging data to fetch. Can also be the borehole "h2601" or "*". If key='*', the data of all boreholes is aggregated into a single frame. .. versionadded:: 0.1.5
drop_observations (bool, default=False) – Drop the remark column in the logging data if set to True. .. versionadded:: 0.1.5
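As a rough illustration of how test_size relates to the sizes of the training and testing sets, the sketch below computes the row counts for a given number of samples. It assumes the rounding convention of scikit-learn's train_test_split (the test count is rounded up and the rest is used for training). Whether load_hlogs splits exactly this way is an assumption, and split_sizes is a hypothetical helper that is not part of watex.

```python
import math


def split_sizes(n_samples, test_size=0.3):
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper, not part of watex. Assumes the test count is
    rounded up, as scikit-learn's train_test_split does.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    n_test = math.ceil(test_size * n_samples)  # rows held out for (Xt, yt)
    n_train = n_samples - n_test               # rows kept for (X, y)
    return n_train, n_test


# With the default test_size=0.3 and 10 samples
n_train, n_test = split_sizes(10)
print(n_train, n_test)  # 7 3
```

The number of feature columns is unaffected by the split; only the number of rows changes.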
- Returns:
data (Boxspace) – Dictionary-like object, with the following attributes.
- data: {ndarray, DataFrame}
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- target: {ndarray, Series}
The target values. If as_frame=True, target will be a pandas Series, or a pandas DataFrame when there are several target columns.
- feature_names: list
The names of the dataset columns.
- target_names: list
The names of the target columns.
- frame: DataFrame
Only present when as_frame=True. DataFrame with data and target. .. versionadded:: 0.1.1
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data. .. versionadded:: 0.1.2
data, target (tuple if return_X_y is True) – A tuple of two ndarrays. The first is a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target samples. .. versionadded:: 0.1.2
X, Xt, y, yt (tuple if split_X_y is True) – A tuple of four arrays: the training and testing data (X, Xt) and the matching targets (y, yt). The 2D data arrays have approximately the shapes:\[ \begin{align}\begin{aligned}\text{shape}(X) = ((1 - \text{test\_ratio}) \times n_{samples},\ n_{features})\\\text{shape}(Xt) = (\text{test\_ratio} \times n_{samples},\ n_{features})\end{aligned}\end{align} \]where each row represents one sample, each column represents a feature, and test_ratio is the value of test_size. The target arrays y and yt contain the target samples for the rows of X and Xt respectively.
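To make "dictionary-like object with attributes" concrete, here is a minimal sketch of a container whose entries can be read either by key or by attribute. Box is a hypothetical stand-in written for this illustration; it is not watex's Boxspace implementation, and the values stored in it are placeholders rather than real dataset contents.

```python
class Box(dict):
    """Hypothetical dict subclass with attribute access (not watex's Boxspace)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dictionary entries.
        self[name] = value


# Placeholder values only; a real call returns the actual dataset.
b = Box(feature_names=["f1", "f2"], target_names=["t1"], DESCR="placeholder description")

print(b.target_names)     # ['t1']
print(b["target_names"])  # ['t1']
print(sorted(b.keys()))   # ['DESCR', 'feature_names', 'target_names']
```

With an object of this kind, b.target_names and b["target_names"] refer to the same entry, which is the access pattern used in the examples on this page.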
Examples
Suppose we have no idea which columns compose the target. The best approach is then to run the function without passing any parameters:
>>> from watex.datasets.dload import load_hlogs
>>> b = load_hlogs()
>>> b.target_names
['aquifer_group', 'pumping_level', 'aquifer_thickness', 'hole_depth', 'pumping_depth', 'section_aperture', 'k', 'kp', 'r', 'rp', 'remark']
>>> # Say we are interested in the targets 'pumping_level' and
>>> # 'aquifer_thickness' and want `y` returned
>>> _, y = load_hlogs(as_frame=True,  # return X and y as frames
...                   tnames=['pumping_level', 'aquifer_thickness'])
>>> list(y.columns)
['pumping_level', 'aquifer_thickness']
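The selection rule behind tnames (None keeps every target column, a name or list of names keeps only those) can be sketched without loading any data. select_target_names is a hypothetical helper written for this illustration and is not part of watex; in particular, raising ValueError for an unknown name is an assumption, since the documentation does not say how load_hlogs handles that case.

```python
def select_target_names(all_targets, tnames=None):
    """Hypothetical helper: resolve tnames against the available target names."""
    if tnames is None:
        # No selection requested: keep every target (multioutput y).
        return list(all_targets)
    if isinstance(tnames, str):
        # Accept a single name as well as a list of names.
        tnames = [tnames]
    missing = [t for t in tnames if t not in all_targets]
    if missing:
        # Assumed behavior; the real function may handle this differently.
        raise ValueError(f"Unknown target name(s): {missing}")
    return list(tnames)


# Target names as listed in the example above.
targets = ['aquifer_group', 'pumping_level', 'aquifer_thickness', 'hole_depth',
           'pumping_depth', 'section_aperture', 'k', 'kp', 'r', 'rp', 'remark']

print(select_target_names(targets, ['pumping_level', 'aquifer_thickness']))
# ['pumping_level', 'aquifer_thickness']
print(len(select_target_names(targets)))  # 11
print(select_target_names(targets, 'k'))  # ['k']
```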