watex.utils.correlatedfeatures
- watex.utils.correlatedfeatures(df, corr='pearson', threshold=0.95, fmt=False)
Find the correlated features/columns in the dataframe.
Highly correlated columns add little information and can distort feature importances and the interpretation of regression coefficients. When correlated columns are found, a good choice is to remove either the level_0 or the level_1 column of each pair from the feature data.
- Parameters
df (pandas.DataFrame of shape (M, N)) – Dataframe containing M samples and N features.
corr (str, ['pearson'|'spearman'|'covariance']) – Correlation method to use. Note that 'pearson' and 'covariance' do not support string values; if the data contains such values, set corr to 'spearman'. Default is 'pearson'.
threshold (float, default is 0.95) – Value from which two features are considered correlated. Should not be greater than 1.
fmt (bool, default is False) – Whether to format the correlated dataframe values.
- Returns
df – Dataframe with columns equal to [level_0, level_1, pearson]
- Return type
pandas.DataFrame
Examples
>>> from watex.utils import correlatedfeatures
>>> df_corr = correlatedfeatures(data, corr='spearman', fmt=False, threshold=.95)
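The idea described above can be sketched with plain pandas. This is not the watex implementation. It is a minimal illustration that assumes correlation is compared by absolute value and that each pair should be reported once. The helper name `correlated_pairs` and the sample frame `data` are hypothetical and used for illustration only.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, corr="pearson", threshold=0.95):
    """Hypothetical helper: list column pairs whose |correlation| >= threshold."""
    # Pairwise correlation of the numeric columns only.
    matrix = df.select_dtypes(include="number").corr(method=corr).abs()
    # Keep the strict upper triangle so each pair appears once and the
    # diagonal (self-correlation) is excluded.
    mask = np.triu(np.ones(matrix.shape, dtype=bool), k=1)
    pairs = matrix.where(mask).stack().reset_index()
    pairs.columns = ["level_0", "level_1", corr]
    # Masked-out (NaN) entries, if any remain, fail this comparison.
    return pairs[pairs[corr] >= threshold].reset_index(drop=True)


# Illustrative data: "b" is almost exactly 2 * "a", "c" is unrelated.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 6.0, 8.2, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})

pairs = correlated_pairs(data, threshold=0.95)
# Drop one side of each correlated pair (here the level_1 columns).
reduced = data.drop(columns=pairs["level_1"].unique())
```

Dropping the level_1 side is an arbitrary choice in this sketch; removing the level_0 columns instead would serve the same purpose.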