watex.utils.linkage_matrix#
- watex.utils.linkage_matrix(df, columns=None, kind='design', metric='euclidean', method='complete', as_frame=False, optimal_ordering=False)[source]#
Compute the distance matrix from the hierachical clustering algorithm
- Parameters
df (dataframe or NDArray of (n_samples, n_features)) – dataframe of Ndarray. If array is given , must specify the column names to much the array shape 1
columns (list) – list of labels to name each columns of arrays of (n_samples, n_features) If dataframe is given, don’t need to specify the columns.
kind (str, ['squareform'|'condense'|'design'], default is {'design'}) – kind of approach to summing up the linkage matrix. Indeed, a condensed distance matrix is a flat array containing the upper triangular of the distance matrix. This is the form that
pdistreturns. Alternatively, a collection of \(m\) observation vectors in \(n\) dimensions may be passed as an \(m\) by \(n\) array. All elements of the condensed distance matrix must be finite, i.e., no NaNs or infs. Alternatively, we could used thesquareformdistance matrix to yield different distance values than expected. thedesignapproach uses the complete inpout example matrix also called ‘design matrix’ to lead correct linkage matrix similar to squareform and condense`.metric (str or callable, default is {'euclidean'}) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by
sklearn.metrics.pairwise.pairwise_distances(). IfXis the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.method (str, optional, default is {'complete'}) – The linkage algorithm to use. See the
Linkage Methodssection below for full descriptions.optimal_ordering (bool, optional) – If True, the linkage matrix will be reordered so that the distance between successive leaves is minimal. This results in a more intuitive tree structure when the data are visualized. defaults to False, because this algorithm can be slow, particularly on large datasets. See also
scipy.cluster.hierarchy.linkage().
- Returns
row_clusters – consist of several rows where each rw represents one merge. The first and second columns denotes the most dissimilar members of each cluster and the third columns reports the distance between those members
- Return type
linkage matrix