watex.datasets.fetch_data
- watex.datasets.fetch_data(tag, **kws)
Fetch a dataset from a tag.
A tag is the name of the area where the data were collected, optionally followed by a suffix naming a data-processing stage.
- Parameters
tag (str, ['bagoue', 'tankesse', 'semien', 'iris', 'boundiali', 'gbalo']) –
Name of the area whose data to fetch. For instance, setting the tag to bagoue loads the bagoue datasets. If the area name is followed by a suffix, the latter specifies the stage of data processing. For example, bagoue original and bagoue prepared retrieve the original data and the data transformed by the default transformers, respectively. The available suffixes are:
- ['original'] => original (raw) data; returns a dict of contextual details. Combine it with the get method to obtain the dataframe:
>>> fetch_data('bagoue original').get('data=df')
- ['stratified'] => stratified data
- ['mid' | 'semi' | 'preprocess' | 'fit'] => cleaned data with experimental attribute combinations
- ['pipe'] => default pipeline created during data preparation
- ['analyses' | 'pca' | 'reduce dimension'] => data in which only the text attributes are encoded, using the ordinal encoder, plus attribute combinations
- ['test'] => stratified test set
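The tag convention described above (an area name, optionally followed by a processing-stage suffix) can be illustrated with a minimal sketch. The helper `split_tag` and the `AREAS` constant are hypothetical names introduced here for illustration; they are not part of watex, and this page does not say how fetch_data parses tags internally.

```python
# Hypothetical illustration only: this is NOT watex's implementation.
# AREAS lists the area names documented for the `tag` parameter.
AREAS = ("bagoue", "tankesse", "semien", "iris", "boundiali", "gbalo")


def split_tag(tag):
    """Split a tag such as 'bagoue original' into (area, stage).

    `stage` is None when the tag carries no suffix.
    """
    # Split on the first run of whitespace so that multi-word suffixes
    # such as 'reduce dimension' stay in one piece.
    parts = tag.strip().lower().split(maxsplit=1)
    if not parts or parts[0] not in AREAS:
        raise ValueError(f"unknown area in tag: {tag!r}")
    area = parts[0]
    stage = parts[1].strip() if len(parts) > 1 else None
    return area, stage


print(split_tag("bagoue original"))  # ('bagoue', 'original')
print(split_tag("bagoue"))           # ('bagoue', None)
```

Lowercasing the tag is an assumption of this sketch; whether fetch_data itself is case-insensitive is not stated here.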
- Returns
dict, X, y –
If the tag is followed by a suffix (in the case of the 'bagoue' area), it returns:
- data: original data
- X, y: stratified training set and training target
- X0, y0: data cleaned after dropping useless features and, if True, combining numerical attributes
- X_prepared, y_prepared: data prepared by applying all transformations via the transformer (pipeline)
- XT, yT: stratified test set and test labels
- _X: stratified training set for data analysis; it contains no sparse matrix, and the text (categorical) attributes are converted using the ordinal encoder
- _pipeline: the default pipeline
- Return type
frame of Boxspace object
Examples
>>> from watex.datasets import fetch_data
>>> b = fetch_data('bagoue')  # no suffix returns a 'Boxspace' object
>>> b.tnames
... array(['flow'], dtype='<U4')
>>> b.feature_names
... ['num', 'name', 'east', 'north', 'power', 'magnitude', 'shape', 'type', 'sfi', 'ohmS', 'lwi', 'geol']
>>> X, y = fetch_data('bagoue prepared')
>>> X  # is transformed, ready for prediction
>>> X[0]
... <1x18 sparse matrix of type '<class 'numpy.float64'>'
    with 8 stored elements in Compressed Sparse Row format>
>>> y
... array([2, 1, 2, 2, 1, 0, ... , 3, 2, 3, 3, 2], dtype=int64)