Remote Loader

Fetch data online from a Zenodo record or a repository.

class watex.datasets.rload.Loader(zenodo_record=None, content_url=None, repo_url=None, tgz_file=None, blobcontent_url=None, zip_or_rar_file=None, csv_file=None, verbose=0)

Bases: object

Load data from an online source.

Parameters:
  • *zenodo_record* (str) – A Zenodo digital object identifier (DOI) or the file path to a Zenodo record.

  • *content_url* (str) – URL of the repository's raw user content. If you use GitHub and the data is located in the default branch, for example a master branch, it can be ‘https://raw.githubusercontent.com/WEgeophysics/watex/master/’.

  • *repo_url* (str) – URL of the repository that hosts the project.

  • *tgz_file* (str) – Data can be saved in TGZ format. If that is the case, provide the TGZ file so that the data can be fetched from it when all other attempts to fetch the file have failed.

  • *verbose* (int) – Level of verbosity. Higher values produce more messages.

  • *blobcontent_url* (str) – Root URL of the blob content (blob/master), a convenient way to retrieve raw data from GitHub.

  • *csv_file* (str) – Path to the main CSV file to retrieve from the record.
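The relation between a GitHub blob URL and its raw-content counterpart can be illustrated with a short sketch. This is not how Loader is implemented; the helper name `blob_to_raw_url` is hypothetical, and the sketch only relies on GitHub's public URL layout (`github.com/<owner>/<repo>/blob/<branch>/<path>` versus `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`).

```python
from urllib.parse import urlsplit


def blob_to_raw_url(blob_url: str) -> str:
    """Map a GitHub blob URL to its raw.githubusercontent.com equivalent.

    Hypothetical helper for illustration; not part of watex.
    """
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url!r}")
    # Expected path layout: /<owner>/<repo>/blob/<branch>/<path...>
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 4 or segments[2] != "blob":
        raise ValueError(f"not a blob URL: {blob_url!r}")
    owner, repo = segments[0], segments[1]
    rest = "/".join(segments[3:])  # branch followed by the file path
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"


raw_url = blob_to_raw_url(
    "https://github.com/WEgeophysics/watex/blob/master/data/geodata/main.bagciv.data.csv"
)
print(raw_url)
```

Applied to a blob root such as 'https://github.com/WEgeophysics/watex/blob/master/', the same mapping yields the raw-content root 'https://raw.githubusercontent.com/WEgeophysics/watex/master/'.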

property f

fit(f=None)

Retrieve the Bagoue dataset from the GitHub repository or the Zenodo record.

Fetching the data for the first time outside of this repository takes a while. Because a cloned repository ships with the example datasets in their appropriate directories, fetching over the internet is rarely needed unless both the dataset and the tar file have been deleted from their local directory.

Parameters:

f (str) – Path-like reference to the main file containing the data.

Return type:

self (the Loader instance)

Notes

Retrieving a dataset like the Bagoue dataset from the GitHub repository or the Zenodo record can take a while the first time it is fetched outside of the repository. Because a cloned repository ships with the example datasets in their appropriate directories, fetching the data from the internet is only useful when the dataset (with or without the tar file) has been deleted from the local directory.
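The local-first behaviour described in these notes can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the actual `fit` implementation: the function name `locate_or_fetch` and its parameters are hypothetical, and the order of attempts (local file, local tarball, network) is inferred from the log messages in the example.

```python
import tarfile
import urllib.request
from pathlib import Path


def locate_or_fetch(local_csv, tgz_path, member, remote_url):
    """Return a local path to the data, trying the cheapest source first.

    Hypothetical helper for illustration; not part of watex.
    """
    local_csv = Path(local_csv)
    # 1. A cloned repository already ships the file: nothing to fetch.
    if local_csv.is_file():
        return local_csv
    local_csv.parent.mkdir(parents=True, exist_ok=True)

    # 2. Try to decompress the member from a local tarball, if one exists.
    tgz_path = Path(tgz_path)
    if tgz_path.is_file():
        try:
            with tarfile.open(tgz_path, "r:gz") as tar:
                src = tar.extractfile(member)  # KeyError if member is absent
                if src is not None:
                    local_csv.write_bytes(src.read())
                    return local_csv
        except (tarfile.TarError, KeyError, OSError):
            pass  # decompression failed: fall through to the network

    # 3. Last resort: download the file from the remote URL.
    with urllib.request.urlopen(remote_url) as resp:
        local_csv.write_bytes(resp.read())
    return local_csv
```

The network is touched only in step 3, which matches the remark that fetching from the internet is needed only when the local copies are gone.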

Example

>>> from watex.datasets.rload import Loader
>>> loadObj = Loader(
...     zenodo_record='10.5281/zenodo.5571534',
...     content_url='https://raw.githubusercontent.com/WEgeophysics/watex/master/',
...     repo_url='https://github.com/WEgeophysics/watex',
...     tgz_file='https://raw.githubusercontent.com/WEgeophysics/watex/master/data/__tar.tgz/fmain.bagciv.data.tar.gz',
...     blobcontent_url='https://github.com/WEgeophysics/watex/blob/master/',
...     zip_or_rar_file='BagoueCIV__dataset__main.rar',
...     csv_file='/__tar.tgz_files__/___fmain.bagciv.data.csv',
...     verbose=10,
... )
>>> loadObj.fit('data/geodata/main.bagciv.data.csv')
... ### -> Wait while decompressing 'fmain.bagciv.data.tar.gz' file ...
... --- -> Fail to decompress 'fmain.bagciv.data.tar.gz' file
... --- -> 'main.bagciv.data.csv' not found in the  local machine
... ### -> Wait while fetching data from 'https://raw.githubusercontent.com/WEgeophysics/watex/master/'...
... +++ -> Load data from 'https://raw.githubusercontent.com/WEgeophysics/watex/master/' successfully done!
dataset: 100%|##################################| 1/1 [00:04<00:00,  4.95s/B]
Out[23]: <watex.datasets.load.Loader at 0x2210bedf880>

unZipFileFetchedFromZenodo(f=None, zip_or_rar_file=None, csv_file=None)

Unzip or unrar the archived file and move it into the local directory, which is created if it does not exist.

Parameters:
  • f (str) – Path-like object; the main file containing the data.

  • zip_or_rar_file (str) – Path-like object pointing to the *.zip or *.rar file.

  • csv_file (str) – Path to the main CSV file to retrieve from the record.

Returns:

Path-like object pointing to the unzipped file.

Return type:

str
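The "move into a directory that is created if missing" step can be sketched as follows. The helper name `move_into` is hypothetical and the code is only an illustration of the idea with the standard library, not the method's actual implementation.

```python
import shutil
from pathlib import Path


def move_into(filename, directory):
    """Move filename into directory, creating the directory if needed.

    Hypothetical helper for illustration; not part of watex.
    """
    directory = Path(directory)
    # exist_ok=True makes the call a no-op when the directory already exists.
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / Path(filename).name
    shutil.move(str(filename), str(target))
    return target
```

Returning the target path lets the caller hand the final location straight to whatever reads the CSV next.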

property update_zenodo_record

watex.datasets.rload.fetchSingleRARData(zip_file, member_to_extract, zipdir, verbose)

Download process for a RAR archive file.

watex.datasets.rload.fetchSingleZIPData(zip_file, zipdir, **zip_kws)

Find only the archived zip file and save it to the current directory.

Parameters:
  • zip_file (str or Path-like obj) – Name of archived zip file

  • zipdir (str or Path-like obj) – Directory where zip_file is located.

Examples

>>> from watex.datasets.rload import fetchSingleZIPData
>>> fetchSingleZIPData(zip_file=zip_file, zipdir=zipdir,
...     file_to_extract='__tar.tgz_files__/___fmain.bagciv.data.csv',
...     savepath=save_zip_file, rename_outfile='main.bagciv.data.csv')

watex.datasets.rload.loadBagoueDataset()

Load the Bagoue dataset.

Example

>>> from watex.datasets.rload import loadBagoueDataset
>>> loadBagoueDataset()
... dataset:   0%|                                          | 0/1 [00:00<?, ?B/s]
... ### -> Wait while decompressing 'fmain.bagciv.data.tar.gz' file ...
... --- -> Fail to decompress 'fmain.bagciv.data.tar.gz' file
... --- -> 'main.bagciv.data.csv' not found in the  local machine
... ### -> Wait while fetching data from 'https://raw.githubusercontent.com/WEgeophysics/watex/master/'...
... +++ -> Load data from 'https://raw.githubusercontent.com/WEgeophysics/watex/master/' successfully done!
... dataset: 100%|##################################| 1/1 [00:03<00:00,  3.38s/B]

watex.datasets.rload.move_file(filename, directory)

watex.datasets.rload.retrieveZIPmember(zipObj, *, file_to_extract='__tar.tgz_files__/___fmain.bagciv.data.csv', savepath=None, rename_outfile='main.bagciv.data.csv')

Retrieve a member from the zip archive and collapse the extracted directory by saving the member into a new directory.

Parameters:
  • zipObj (zip object) – Reference zip object.

  • file_to_extract (str or Path-like obj) – File to extract from the zip archive. It should be one of the names in the archive's name list.

  • savepath (str or Path-like obj) – Destination path after fetching the single file from the zip archive.

  • rename_outfile (str or Path-like obj) – New name for file_to_extract, if renaming is considered necessary.

Returns:

The path of the retrieved file. If the file is renamed, the path uses the new name.
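
Extracting a single member while collapsing the archive's folder structure can be sketched with the standard `zipfile` module. This is an illustration under assumptions, not the body of retrieveZIPmember: the function name `extract_member_flat` and the parameter `new_name` are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def extract_member_flat(zip_obj, member, savepath, new_name=None):
    """Copy one archive member into savepath without its parent folders.

    Hypothetical helper for illustration; not part of watex.
    """
    if member not in zip_obj.namelist():
        raise KeyError(f"{member!r} is not in the archive")
    savepath = Path(savepath)
    savepath.mkdir(parents=True, exist_ok=True)
    # Keep only the base name (or the requested new name), dropping folders.
    target = savepath / (new_name or Path(member).name)
    # Stream the member's bytes directly, so the archive's directory tree
    # is never recreated on disk.
    with zip_obj.open(member) as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Streaming through `ZipFile.open` instead of calling `ZipFile.extract` avoids having to delete the intermediate directories afterwards.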