watex.utils.get_compressed_vector
- watex.utils.get_compressed_vector(d, /, sname, stratum=None, strategy='average', as_frame=False, random_state=None)
Compresses the base stratum data into a single vector that contains every feature name of the targeted data d.
- Parameters
d (pandas DataFrame) – Valid data containing the strata. If a dataframe is passed, ‘sname’ is needed to fetch the strata values.
sname (str) – Name of the column in the dataframe that contains the strata values. Do not confuse ‘sname’ with ‘stratum’, which is the name of a valid layer/rock in the array/Series of strata.
stratum (str, optional) – Name of the base stratum. It must be an item of the strata data. Note that if stratum is passed, the auto-detection of the base stratum is not triggered. It returns the same stratum; however, it can give the rate and occurrence of this stratum if return_rate or return_counts is set to True.
strategy (str, default='average' or 'mean') – Strategy used to select or compute the numerical data into a single series. It can also be ‘naive’; in that case, a single series is randomly picked from the base stratum data.
as_frame (bool, default=False) – Returns the compressed vector as a dataframe rather than keeping it as a series.
random_state (int, optional) – State used for randomly selecting a compressed vector when ‘naive’ is passed as strategy.
- Returns
ms – A compressed vector, returned as a pandas series composed of all features. Note that the vector here does not refer to a mathematical vector composed of numerical values only. A compressed vector here is a series that results from averaging the numerical features of the base stratum and including its corresponding categorical values. ms can therefore contain categorical values, and it has the same number of features, with the same names, as the original frame d.
- Return type
pandas series/dataframe
Example
>>> from watex.datasets import load_hlogs
>>> from watex.utils.hydroutils import get_compressed_vector
>>> data = load_hlogs().frame # get only the frame
>>> get_compressed_vector(data, sname='strata_name')[:4]
... hole_number           H502
    strata_name      siltstone
    aquifer_group           II
    pumping_level       ZFSAII
    dtype: object
>>> get_compressed_vector(data, sname='strata_name', as_frame=True)
...   hole_number strata_name aquifer_group  ...        r     rp  remark
    0        H502   siltstone            II  ...  41.7075  59.23     NaN
    [1 rows x 23 columns]
>>> get_compressed_vector(data, sname='strata_name', strategy='naive')
... hole_number          H502
    depth_top          379.15
    depth_bottom        379.7
    strata_name     siltstone
    Name: 39, dtype: object
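To make the idea of a "compressed vector" concrete, the sketch below reproduces the behaviour described above with plain pandas on a small made-up frame. It is not the watex implementation. The helper name `compress_stratum`, the toy data, the choice of the most frequent stratum as the base stratum, and the use of the most frequent value for categorical columns are all assumptions made for illustration.

```python
import pandas as pd


def compress_stratum(df, sname, stratum=None, strategy="average", random_state=None):
    """Hypothetical helper: collapse the rows of one stratum into one Series."""
    # Assumption: when no stratum is given, the most frequent one is the base.
    if stratum is None:
        stratum = df[sname].value_counts().idxmax()
    rows = df[df[sname] == stratum]

    if strategy == "naive":
        # Pick a single row of the base stratum at random.
        return rows.sample(n=1, random_state=random_state).iloc[0]

    # Average the numerical features of the base stratum.
    numeric = rows.select_dtypes(include="number").mean()
    # Assumption: each categorical column keeps its most frequent value.
    categorical = rows.select_dtypes(exclude="number").apply(
        lambda col: col.mode().iloc[0]
    )
    # Put the features back in the original column order.
    return pd.concat([categorical, numeric]).reindex(df.columns)


# Toy data, invented for this sketch.
logs = pd.DataFrame({
    "hole": ["H1", "H1", "H1", "H2"],
    "strata_name": ["siltstone", "coal", "siltstone", "siltstone"],
    "depth_top": [10.0, 20.0, 30.0, 50.0],
    "resistivity": [40.0, 15.0, 44.0, 42.0],
})

vec = compress_stratum(logs, sname="strata_name")
naive = compress_stratum(logs, sname="strata_name", strategy="naive", random_state=0)
print(vec)
print(naive)
```

With the averaging strategy, the result mixes averaged numerical values and categorical labels in one series, which is why it is not a vector in the mathematical sense. With the naive strategy, the result is an existing row, so its name is the index label of the row that was drawn.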