watex.view.ExPlot.plotmissing

ExPlot.plotmissing(*, kind=None, sample=None, **kwd)

Visualize patterns in the missing data.

Parameters
  • data (pandas.DataFrame of shape (M, N)) – DataFrame containing M samples and N features.

  • kind (str, Optional) –

    Kind of visualization. Can be dendrogram, mbar or bar for the dendrogram, msno bar and plt visualizations respectively, as well as corr or mpatterns:

    • bar counts the non-missing data using pandas.

    • mbar uses the msno package to count the non-missing data.

    • dendrogram shows the clustering of where the data is missing. Leaves that are at the same level predict one another's presence (empty or filled). The vertical arms indicate how different the clusters are; short arms mean that the branches are similar.

    • corr creates a heat map showing whether there are correlations in where the data is missing. In this case, it does look like the locations where data is missing are correlated.

    • mpatterns is the default visualization. It is useful for viewing contiguous areas of missing data, which would indicate that the missing data is not random. The matrix function includes a sparkline along the right side. Patterns here would also indicate non-random missing data. It is recommended to limit the number of samples to be able to see the patterns.

    Any other value will raise an error.

  • sample (int, Optional) – Number of rows to visualize. This is useful when the data is composed of many rows. Shrinking the data to keep only some samples for visualization is recommended. None plots all the samples (or examples) in the data.

  • kwd (dict) – Additional keyword arguments for the msno.matrix plot.
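The quantities behind some of these views can be sketched with pandas alone. The block below is a minimal illustration, not the watex or msno implementation: the toy DataFrame, the variable names and the `limit_rows` helper are all assumptions made for this example. It computes the per-column non-missing counts that a bar-style view displays, the nullity correlation that a corr-style heat map displays, and a row subset comparable to what `sample` is described as doing. Whether plotmissing selects rows randomly or from the top is not stated here, so random selection is an assumption.

```python
import numpy as np
import pandas as pd

# Toy frame (made up for illustration): "a" and "b" are missing together,
# "c" is missing exactly where "a" is present, and "d" is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [np.nan, 2.0, np.nan, 4.0],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# What a bar-style view displays: number of non-missing values per column.
nonmissing_counts = df.notna().sum()

# What a nullity-correlation heat map displays: correlation between the
# missing/not-missing indicators of each pair of columns. Columns that are
# never (or always) missing have no variance, so they are dropped first.
null_mask = df.isna()
varying = null_mask.loc[:, null_mask.nunique() > 1]
nullity_corr = varying.astype(int).corr()


def limit_rows(frame, sample=None, seed=0):
    """Hypothetical helper: keep at most `sample` rows, or all rows if None."""
    if sample is None or sample >= len(frame):
        return frame
    # Random subset, restored to the original row order so that contiguous
    # runs of missing values stay contiguous in a matrix-style view.
    return frame.sample(n=sample, random_state=seed).sort_index()


subset = limit_rows(df, sample=3)
print(nonmissing_counts.to_dict())
print(nullity_corr.round(2))
```

A nullity correlation close to 1 means two columns tend to be missing in the same rows, and a value close to -1 means one tends to be present where the other is missing.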

Returns

self – Returns self for easy method chaining.

Return type

ExPlot instance
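Returning self is what allows several calls to be written as one expression. The sketch below illustrates the pattern with a hypothetical stand-in class (`ChainablePlot` is not part of watex, and it draws nothing); it only assumes the keyword-only signature and the mpatterns default described above.

```python
class ChainablePlot:
    """Hypothetical stand-in used only to illustrate chaining; not watex's ExPlot."""

    def __init__(self):
        self.data = None
        self.calls = []  # records which views were requested

    def fit(self, data):
        self.data = data
        return self  # returning self enables chaining

    def plotmissing(self, *, kind=None, sample=None):
        # A real implementation would draw a figure here;
        # this sketch only records the request.
        self.calls.append((kind or "mpatterns", sample))
        return self


# Each call returns the same object, so the calls can be chained.
p = (
    ChainablePlot()
    .fit([[1, None], [None, 2]])
    .plotmissing(kind="bar")
    .plotmissing(sample=1)
)
print(p.calls)
```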

Example

>>> import pandas as pd
>>> from watex.view import ExPlot
>>> data = pd.read_csv('data/geodata/main.bagciv.data.csv')
>>> p = ExPlot().fit(data)
>>> p.fig_size = (12, 4)
>>> p.plotmissing(kind='corr')