watex.view.plotDendrogram

watex.view.plotDendrogram(df, columns=None, labels=None, metric='euclidean', method='complete', kind=None, return_r=False, verbose=False, **kwd)

Visualizes the linkage matrix as a dendrogram.

Note that categorical features, if present in the dataframe, are automatically discarded.

Parameters
  • df (dataframe or NDArray of (n_samples, n_features)) – dataframe or ndarray. If an array is given, the column names must be specified and must match the array's second dimension (shape[1]).

  • columns (list) – list of labels used to name each column of an array of shape (n_samples, n_features). If a dataframe is given, the columns do not need to be specified.

  • kind (str, ['squareform'|'condense'|'design'], default is {'design'}) – the approach used to compute the linkage matrix. A condensed distance matrix is a flat array containing the upper triangle of the distance matrix; this is the form that pdist returns. Alternatively, a collection of \(m\) observation vectors in \(n\) dimensions may be passed as an \(m\) by \(n\) array. All elements of the condensed distance matrix must be finite, i.e., no NaNs or infs. The squareform distance matrix can also be used, but it yields different distance values than expected. The design approach uses the complete input example matrix, also called the ‘design matrix’, to produce a correct linkage matrix, similar to squareform and condense.

  • metric (str or callable, default is {'euclidean'}) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.

  • method (str, optional, default is {'complete'}) – The linkage algorithm to use. See the Linkage Methods section of watex.utils.exmath.linkage_matrix() for full descriptions.

  • labels (ndarray, optional) – By default, labels is None so the index of the original observation is used to label the leaf nodes. Otherwise, this is an \(n\)-sized sequence, with n == Z.shape[0] + 1. The labels[i] value is the text to put under the \(i\) th leaf node only if it corresponds to an original observation and not a non-singleton cluster.

  • return_r (bool, default=False) – if True, returns the r dictionary; otherwise returns nothing.

  • verbose (int, bool, default=False) – If True, prints the names of the categorical features that were dropped.

  • kwd (dict) – additional keyword arguments passed to scipy.cluster.hierarchy.dendrogram()
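The three input forms described under kind can be illustrated directly with SciPy. The following is a minimal sketch that does not call watex; how plotDendrogram maps kind onto these inputs internally is not shown here, and the names X, condensed, square and the random data are illustrative assumptions. It shows that the design matrix and the condensed matrix lead to the same linkage, while feeding the square matrix to linkage as if it were observations gives larger merge distances.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Illustrative design matrix: 5 samples, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.random((5, 3))

# Condensed form: flat upper triangle of the distance matrix (what pdist returns).
condensed = pdist(X, metric="euclidean")      # length 5 * 4 / 2 = 10
# Square form: symmetric 5 x 5 matrix with zeros on the diagonal.
square = squareform(condensed)

# Design matrix and condensed matrix give the same linkage matrix.
Z_design = linkage(X, method="complete", metric="euclidean")
Z_condensed = linkage(condensed, method="complete")

# A 2-D square matrix is interpreted as observation vectors, so distances are
# recomputed between its rows. SciPy warns about this, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    Z_square = linkage(square, method="complete", metric="euclidean")

print(np.allclose(Z_design, Z_condensed))   # same linkage
print(Z_design[0, 2], Z_square[0, 2])       # first merge heights differ
```

Column 2 of a linkage matrix holds the merge distance, so comparing the first rows is a quick way to see that the square-form input changes the distances.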

Returns

r – A dictionary of data structures computed to render the dendrogram, returned only when return_r is True. It has the following keys:

'color_list'

A list of color names. The k’th element represents the color of the k’th link.

'icoord' and 'dcoord'

Each of them is a list of lists. Let icoord = [I1, I2, ..., Ip] where Ik = [xk1, xk2, xk3, xk4] and dcoord = [D1, D2, ..., Dp] where Dk = [yk1, yk2, yk3, yk4], then the k’th link painted is (xk1, yk1) - (xk2, yk2) - (xk3, yk3) - (xk4, yk4).

'ivl'

A list of labels corresponding to the leaf nodes.

'leaves'

A list H such that, for each i, H[i] == j means cluster node j appears in position i in the left-to-right traversal of the leaves, where \(j < 2n-1\) and \(i < n\). If j is less than n, the i-th leaf node corresponds to an original observation. Otherwise, it corresponds to a non-singleton cluster.

'leaves_color_list'

A list of color names. The k’th element represents the color of the k’th leaf.

Return type

dict
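To make the structure of the returned dictionary concrete, here is a hedged sketch that builds the same kind of dictionary with scipy.cluster.hierarchy.dendrogram(), the function that kwd is forwarded to. It does not call watex. The random data and the label names s0 to s5 are assumptions for illustration, and no_plot=True is used so that nothing is drawn.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Illustrative data: 6 samples, 2 features.
rng = np.random.default_rng(42)
X = rng.random((6, 2))
n = X.shape[0]

Z = linkage(X, method="complete", metric="euclidean")

# One label per original observation, so len(labels) == Z.shape[0] + 1.
labels = [f"s{i}" for i in range(n)]

# no_plot=True computes the rendering data without drawing anything.
r = dendrogram(Z, labels=labels, no_plot=True)

print(sorted(r))       # available keys
print(r["leaves"])     # left-to-right order of the original observations
print(r["ivl"])        # the labels in that same order

# Each link k is the polyline (xk1, yk1) - (xk2, yk2) - (xk3, yk3) - (xk4, yk4).
for xs, ys in zip(r["icoord"], r["dcoord"]):
    print(list(zip(xs, ys)))
```

With n observations there are n - 1 links, one per merge recorded in the linkage matrix.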

Examples

>>> from watex.datasets import load_iris
>>> from watex.view import plotDendrogram
>>> data = load_iris()
>>> X = data.data[:, :2]
>>> plotDendrogram(X, columns=['X1', 'X2'])