watex.utils.get_compressed_vector

watex.utils.get_compressed_vector(d, /, sname, stratum=None, strategy='average', as_frame=False, random_state=None)

Compress the base stratum data into a single vector that keeps every feature of the target data d.

Parameters
  • d (pandas DataFrame) – Valid data containing the strata. If a DataFrame is passed, ‘sname’ is needed to fetch the strata values.

  • sname (str) – Name of the column in the DataFrame that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of a valid layer/rock in the array/Series of strata.

  • stratum (str, optional) – Name of the base stratum. It must be one of the items in the strata data. If stratum is passed, auto-detection of the base stratum is not triggered and the same stratum is returned; however, its rate and occurrence can still be given if return_rate or return_counts is set to True.

  • strategy (str, default='average') – Strategy used to select or compute a single series from the numerical data: ‘average’ (or ‘mean’), or ‘naive’. With ‘naive’, a single series is picked at random from the base stratum data.

  • as_frame (bool, default=False) – Return the compressed vector as a DataFrame rather than a Series.

  • random_state (int, optional) – Seed used to randomly select the compressed vector when ‘naive’ is passed as strategy.

Returns

ms – Compressed vector, as a pandas Series composed of all features. Note that the vector here is not a mathematical vector made up of numerical values only. It is a series that results from averaging the numerical features of the base stratum and including the corresponding categorical values. ms can therefore contain categorical values, and it has the same number of features as the original frame d.

Return type

pandas Series or DataFrame
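
To make the returned object concrete, the averaging described under Returns can be sketched with plain pandas. This is only an illustration, not the watex implementation: `compress_stratum` and the toy frame are hypothetical, the stratum is passed explicitly instead of being auto-detected, and taking the most frequent value for categorical columns is an assumption, since the text above does not say how those values are chosen.

```python
import pandas as pd


def compress_stratum(d, sname, stratum):
    """Hypothetical helper: collapse all rows of one stratum into one Series."""
    base = d[d[sname] == stratum]
    if base.empty:
        raise ValueError(f"{stratum!r} not found in column {sname!r}")
    values = {}
    for col in d.columns:  # keep the original column order
        if pd.api.types.is_numeric_dtype(base[col]):
            # Numerical feature: average over the rows of the stratum.
            values[col] = base[col].mean()
        else:
            # Assumption: keep the most frequent categorical value.
            modes = base[col].mode()
            values[col] = modes.iloc[0] if not modes.empty else None
    return pd.Series(values, name=stratum)


# Toy data, made up for this sketch.
toy = pd.DataFrame({
    "hole_number": ["H1", "H1", "H1", "H1"],
    "strata_name": ["siltstone", "coal", "siltstone", "siltstone"],
    "depth_top": [10.0, 20.0, 30.0, 50.0],
    "resistivity": [40.0, 5.0, 42.0, 44.0],
})

ms = compress_stratum(toy, sname="strata_name", stratum="siltstone")
ms_frame = ms.to_frame().T  # one-row DataFrame, comparable to as_frame=True
```

The result mixes categorical and numerical entries and has one entry per column of the input frame, which matches the description of ms above.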

Example

>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_compressed_vector
>>> data = load_hlogs().frame # get only the frame
>>> get_compressed_vector(data, sname='strata_name')[:4]
... hole_number           H502
    strata_name      siltstone
    aquifer_group           II
    pumping_level       ZFSAII
    dtype: object
>>> get_compressed_vector(data, sname='strata_name', as_frame=True)
...   hole_number strata_name aquifer_group  ...        r     rp remark
    0        H502   siltstone            II  ...  41.7075  59.23    NaN
    [1 rows x 23 columns]
>>> get_compressed_vector(data, sname='strata_name', strategy='naive')
... hole_number          H502
    depth_top          379.15
    depth_bottom        379.7
    strata_name     siltstone
    Name: 39, dtype: object
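
The ‘naive’ call above returns one existing row of the base stratum (note the `Name: 39` row label in the output). A minimal sketch of that idea, assuming a seeded random pick, is given below. It is not the watex implementation: `pick_naive` and the toy frame are hypothetical, and the stratum is passed explicitly rather than auto-detected.

```python
import pandas as pd


def pick_naive(d, sname, stratum, random_state=None):
    """Hypothetical helper: pick one row of the given stratum at random."""
    base = d[d[sname] == stratum]
    # sample() returns a one-row DataFrame; iloc[0] turns it into a Series
    # whose name is the index label of the row that was picked.
    return base.sample(n=1, random_state=random_state).iloc[0]


# Toy data, made up for this sketch.
logs = pd.DataFrame({
    "hole_number": ["H1", "H1", "H1", "H1"],
    "strata_name": ["siltstone", "coal", "siltstone", "siltstone"],
    "depth_top": [10.0, 20.0, 30.0, 50.0],
})

first = pick_naive(logs, "strata_name", "siltstone", random_state=42)
second = pick_naive(logs, "strata_name", "siltstone", random_state=42)
```

Passing the same `random_state` twice selects the same row, which is the reproducibility the parameter is meant to provide.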