watex.datasets.load_nlogs
- watex.datasets.load_nlogs(*, return_X_y=False, as_frame=False, key=None, years=None, split_X_y=False, test_ratio=0.3, tag=None, tnames=None, data_names=None, samples=None, seed=None, shuffle=False, **kws)
Load the Nanshang Engineering and hydrogeological drilling dataset.
The dataset contains multiple targets and can be used for classification or regression problems.
- Parameters:
return_X_y (bool, default=False) – If True, returns (data, target) instead of a Boxspace object. See below for more information about the data and target objects.
as_frame (bool, default=False) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric). The target is a pandas DataFrame or Series depending on the number of target columns. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described below.
split_X_y (bool, default=False) – If True, the data is split into a training set (X, y) and a testing set (Xt, yt) according to the test size ratio.
test_ratio (float, default=0.3) – The ratio used to split the data into the training set (X, y) and the testing set (Xt, yt). The default of 0.3 corresponds to 30% of the data.
tnames (str or list of str, optional) – The name of the target to retrieve. If None, all target columns are collected and compose a multioutput y. For a single classification or regression problem, it is recommended to indicate the name of the target needed for the learning task. When collecting land subsidence data with key="ls", tnames and years can be used interchangeably.
tag, data_names (None) – tag and data_names do nothing. They exist only for API purposes, so that the same data can be fetched with watex.data.fetch_data, which already holds tag and data_names as parameters.
key (str, default='b0') – Kind of drilling data to fetch. Can also be one of ["ns", "ls"]. The ns data refer mostly to engineering drilling, whereas b0 refers to pure hydrogeological drillings. In the former case, the 'ground_height_distance' attribute used to control soil settlement is the target, while in the latter the targets are the water inflow, the drawdown and the static water level. The "ls" key collects the time-series land subsidence data from 2015-2018. It should be used in combination with the years parameter to collect the data of specific years. The default land subsidence data is 2022.
years (str, default="2022") – The year(s) of land subsidence data. Note that land subsidence data are collected from 2015 to 2022. For instance, to select two years of subsidence data, separate the years with a space, as in years="2015 2022". The star * argument selects the data of all years.
New in version 0.2.7: Years of Nanshan land subsidence data collected are added. Use key="ls" and years to retrieve the subsidence data of each year.
samples (int, optional) – Ratio or number of items from axis to fetch in the data. All data are fetched if samples is None.
seed (int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional) – If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
shuffle (bool, default=False) – If True, the borehole data are shuffled before sampling.
drop_display_rate (bool, default=True) – The display rate is used for image visualization, to increase the image pixels. If True, it is dropped from the returned data; set it to False to keep it.
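The years convention described above (space-separated years, with * selecting every year) can be illustrated with a small standalone sketch. The helper select_years and the list of available years are hypothetical and are not part of watex; how the library parses the string internally is not shown in this page.

```python
def select_years(years, available):
    """Hypothetical helper: resolve a `years` string against year columns.

    Not part of watex; it only mirrors the convention documented above.
    """
    # "*" selects every available year.
    if years.strip() == "*":
        return list(available)
    # Otherwise the years are separated by whitespace, e.g. "2015 2022".
    requested = years.split()
    unknown = [y for y in requested if y not in available]
    if unknown:
        raise ValueError(f"Unknown year(s): {unknown}")
    return requested


# Illustrative list only; the real set of year columns comes from the dataset.
available = ["2015", "2018", "2022"]

print(select_years("2015 2022", available))  # ['2015', '2022']
print(select_years("*", available))          # ['2015', '2018', '2022']
```

Raising on an unknown year is a design choice of this sketch; the library may handle such input differently.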
- Returns:
data (Boxspace) – Dictionary-like object, with the following attributes.
- data: {ndarray, dataframe}
The data matrix. If as_frame=True, data will be a pandas DataFrame.
- target: {ndarray, Series}
The classification or regression target. If as_frame=True, target will be a pandas Series or DataFrame, depending on the number of target columns.
- feature_names: list
The names of the dataset columns.
- target_names: list
The names of target classes.
- frame: DataFrame
Only present when as_frame=True. DataFrame with data and target. New in version 0.1.1.
- DESCR: str
The full description of the dataset.
- filename: str
The path to the location of the data. New in version 0.1.2.
data, target (tuple if return_X_y is True) – A tuple of two ndarrays. The first contains a 2D array of shape (n_samples, n_features), with each row representing one sample and each column representing a feature. The second is an ndarray of shape (n_samples,) containing the target samples. New in version 0.1.2.
X, Xt, y, yt (tuple if split_X_y is True) – The training and testing sets. X and Xt are 2D arrays whose shapes follow from the test ratio:
\[ \begin{aligned} \text{shape}(X) &\approx \big((1 - \text{test\_ratio}) \cdot n_{samples},\; n_{features}\big) \\ \text{shape}(Xt) &\approx \big(\text{test\_ratio} \cdot n_{samples},\; n_{features}\big) \end{aligned} \]
where each row represents one sample and each column represents a feature. y and yt are the corresponding arrays of target samples.
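As a quick check of the shape relations above, the following sketch computes the expected training and testing shapes for a given test_ratio. The helper split_shapes is hypothetical, and rounding the test size up is an assumption borrowed from common train/test split implementations; the rounding watex actually applies is not stated in this page.

```python
import math


def split_shapes(n_samples, n_features, test_ratio=0.3):
    """Hypothetical helper: expected (train, test) shapes for a split.

    Assumes the test size is rounded up; watex may round differently.
    """
    n_test = math.ceil(test_ratio * n_samples)
    n_train = n_samples - n_test
    # Only the number of rows changes; the feature count is preserved.
    return (n_train, n_features), (n_test, n_features)


train_shape, test_shape = split_shapes(200, 8, test_ratio=0.25)
print(train_shape)  # (150, 8)
print(test_shape)   # (50, 8)
```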
Examples
Suppose we have no idea which columns compose the target. The best approach is then to run the function without passing any parameters and inspect the target_names attribute, then the DESCR attribute to get the unit of each attribute:
>>> from watex.datasets.dload import load_nlogs
>>> b = load_nlogs()
>>> b.target_names
['static_water_level', 'drawdown', 'water_inflow', 'unit_water_inflow', 'water_inflow_in_m3_d']
>>> b.DESCR
... (...)
>>> # Say we are interested in the targets 'drawdown' and
>>> # 'static_water_level' and want to return `y`
>>> _, y = load_nlogs(as_frame=True,  # return X and y as frames
...                   tnames=['drawdown', 'static_water_level'])
>>> list(y.columns)
['drawdown', 'static_water_level']
>>> y.head(2)
   drawdown  static_water_level
0     70.03                4.21
1      7.38                3.60
>>> # Say we want the subsidence data of 2015 and 2018 with the
>>> # display resolution rate. Because the display rate is dropped by
>>> # default, we must set drop_display_rate to False to keep it in the data.
>>> n = load_nlogs(key='ls', samples=3, years="2015 2018 disp",
...                drop_display_rate=False)
>>> n.frame
        easting      northing   longitude  ...      2015       2018  disp_rate
0  2.531191e+06  1.973515e+07  113.291328  ... -0.494959 -27.531837  -7.352538
1  2.531536e+06  1.973519e+07  113.291847  ... -1.104473 -21.852705  -7.999145
2  2.531479e+06  1.973520e+07  113.291847  ... -1.139404 -22.022655  -7.894940