watex.base.Missing
- class watex.base.Missing(in_percent=False, sample=None, kind=None, drop_columns=None, **kws)
Deal with missing values in data.
Most algorithms will not work with missing data. Notable exceptions are the recent boosting libraries such as XGBoost (watex.documentation.xgboost.__doc__), CatBoost and LightGBM. As with many things in machine learning, there are no hard answers for how to treat missing data. Also, missing data could represent different situations. There are four common ways to handle missing data:
* Remove any row with missing data
* Remove any column with missing data
* Impute missing values
* Create an indicator column to flag that data was missing
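The four strategies above can be sketched with plain pandas. This is only an illustration on a hypothetical toy DataFrame (the column names and the choice of median imputation are assumptions), not a description of what `Missing` does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "depth": [10.0, np.nan, 30.0, 40.0],
    "resistivity": [1.5, 2.5, np.nan, 4.5],
    "site": ["a", "b", "c", "d"],
})

# 1. Remove any row that contains a missing value.
rows_dropped = df.dropna(axis=0)

# 2. Remove any column that contains a missing value.
cols_dropped = df.dropna(axis=1)

# 3. Impute: fill numeric gaps with the column median (one choice among many).
numeric_cols = ["depth", "resistivity"]
imputed = df.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(df[numeric_cols].median())

# 4. Add an indicator column recording where a value was originally missing.
flagged = imputed.copy()
flagged["depth_was_missing"] = df["depth"].isna()
```

Dropping rows or columns is the simplest option but discards information; imputation combined with an indicator column keeps every row while still recording where the gaps were.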
- Parameters:
in_percent (bool) – Give the statistics of missing data as percentages if set to True.
sample (int, Optional) – Number of rows to visualize, or the limit on the number of samples needed to see the patterns. This is useful when the data is composed of many rows; shrinking the data to keep only some samples for visualization is recommended. None plots all the samples (or examples) in the data.
kind (str, Optional) –
Type of visualization. Can be dendrogramm, mbar, bar or corr, for the dendrogram, msno bar, plt and msno correlation visualizations respectively:
- bar plot counts the nonmissing data using pandas.
- mbar uses the msno package to count the number of nonmissing data.
- dendrogram shows the clustering of where the data is missing. Leaves that are at the same level predict one another's presence (empty or filled). The vertical arms indicate how different the clusters are; short arms mean that the branches are similar.
- corr creates a heat map showing whether there are correlations in where the data is missing. In this case, it does look like the locations where data are missing are correlated.
- None is the default visualization. It is useful for viewing contiguous areas of missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.
Any other value will raise an error.
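To make the quantities behind these plots concrete, here is a minimal pandas-only sketch of what the bar and corr views summarize, and of limiting the number of rows as the sample parameter suggests. The DataFrame is hypothetical, and the computations are assumptions about what the plots display rather than the code `Missing` or msno actually run.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" and "b" are missing in the same rows, "c" is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "c": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# What a bar view summarizes: the number of non-missing values per column.
non_missing = df.notna().sum()

# The same information as a percentage of missing values per column.
missing_pct = df.isna().mean() * 100

# What a correlation heat map summarizes: how strongly the presence or
# absence of one column tracks another. Columns that are fully present
# (or fully missing) have no variance and are left out.
nullity = df.isna()
varying = nullity.loc[:, nullity.nunique() > 1]
nullity_corr = varying.astype(int).corr()

# Limiting the rows before plotting, in the spirit of the sample parameter.
subset = df.head(4)
```

A nullity correlation near 1 between two columns means they tend to be missing together, which hints that the missingness is not random.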
Examples
>>> from watex.base import Missing
>>> data = 'data/geodata/main.bagciv.data.csv'
>>> ms = Missing().fit(data)
>>> ms.plot_.fig_size = (12, 4)
>>> ms.plot()