watex.utils.correlatedfeatures

watex.utils.correlatedfeatures(df, corr='pearson', threshold=0.95, fmt=False)

Find the correlated features/columns in the dataframe.

Highly correlated columns add little information and can distort feature importances and the interpretation of regression coefficients. When correlated pairs are found, a good choice is to remove from the feature data either the columns listed under level_0 or those listed under level_1.

Parameters:
  • df (pandas.DataFrame of shape (M, N)) – Dataframe containing M samples and N features.

  • corr (str, ['pearson'|'spearman'|'covariance']) – Correlation method to use. Note that ‘pearson’ and ‘covariance’ do not support string values; if the data contains such values, set corr to ‘spearman’. Default is ‘pearson’.

  • threshold (float, default is 0.95) – Value from which two features are considered correlated. Should not be greater than 1.

  • fmt (bool, default is False) – Whether to format the values of the correlated dataframe.

Returns:

df – Dataframe with columns equal to [level_0, level_1, pearson]

Return type:

pandas.DataFrame
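The shape of the returned frame can be illustrated with a pandas-only sketch. This is not the watex implementation: the helper name correlated_pairs_sketch is hypothetical, it assumes that absolute correlation values are compared against the threshold, and it only covers methods that pandas.DataFrame.corr accepts (so not ‘covariance’).

```python
import numpy as np
import pandas as pd


def correlated_pairs_sketch(df, corr="pearson", threshold=0.95):
    """Hypothetical helper (not watex code): list column pairs whose
    absolute correlation reaches the threshold."""
    # Restrict to numeric columns and take absolute correlations (assumption).
    matrix = df.select_dtypes("number").corr(method=corr).abs()
    # Keep the strict upper triangle: each pair once, diagonal excluded.
    mask = np.triu(np.ones(matrix.shape, dtype=bool), k=1)
    pairs = matrix.where(mask).stack().reset_index()
    pairs.columns = ["level_0", "level_1", corr]
    # Masked entries are NaN and never pass the comparison.
    return pairs[pairs[corr] >= threshold].reset_index(drop=True)


data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * a
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to a and b
})
pairs = correlated_pairs_sketch(data, threshold=0.95)
print(pairs)
```

Here only the pair (a, b) passes the threshold, so the result has a single row with level_0 = a and level_1 = b.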

Examples

>>> from watex.utils import correlatedfeatures
>>> # `data` is a pandas.DataFrame of features loaded beforehand
>>> df_corr = correlatedfeatures(data, corr='spearman',
...                              fmt=False, threshold=.95)
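As a follow-up, one way to act on the result is to drop every column listed under level_1, as suggested in the description. The sketch below assumes a result frame with the documented columns; the frame is built by hand as a stand-in so that the snippet runs without watex, and the toy column names are illustrative only.

```python
import pandas as pd

# Toy feature table; "b" carries the same information as "a".
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})

# Hand-built stand-in for the frame described under "Returns".
df_corr = pd.DataFrame(
    {"level_0": ["a"], "level_1": ["b"], "pearson": [1.0]}
)

# Drop one member of each correlated pair (here, the level_1 side).
to_drop = df_corr["level_1"].unique()
reduced = data.drop(columns=to_drop)
print(list(reduced.columns))
```

Dropping the level_0 side instead works the same way; which side to keep is a modelling decision rather than something the correlation values dictate.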