watex.transformers.KMeansFeaturizer
- class watex.transformers.KMeansFeaturizer(n_clusters=7, target_scale=5.0, random_state=None, n_components=None)
Transforms numeric data into k-means cluster memberships.
This transformer runs k-means on the input data and converts each data point into the ID of the closest cluster. If a target variable is present, it is scaled and included as input to k-means in order to derive clusters that obey the classification boundary as well as group similar points together.
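The hinting idea described above can be sketched with scikit-learn alone. This is an illustrative sketch, not the watex source: the helper name `fit_hinted_kmeans` is hypothetical, and the two-stage fit (cluster on features plus scaled target, then refine the centroids in feature space only) is an assumption about one common way to implement target-hinted k-means.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons


def fit_hinted_kmeans(X, y=None, n_clusters=7, target_scale=5.0, random_state=None):
    """Hypothetical helper: k-means with an optional scaled-target hint."""
    X = np.asarray(X, dtype=float)
    if y is None:
        # No target available: plain k-means on the features.
        return KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit(X)

    # Append the scaled target as an extra column so that points from
    # different classes are pushed apart in the clustering space.
    y_col = np.asarray(y, dtype=float).reshape(-1, 1) * target_scale
    pretrain = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit(np.hstack([X, y_col]))

    # Drop the target coordinate and run one more iteration on the
    # features only, so the model can later assign points without y.
    centers = pretrain.cluster_centers_[:, :X.shape[1]]
    return KMeans(n_clusters=n_clusters, init=centers, n_init=1,
                  max_iter=1, random_state=random_state).fit(X)


X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
model = fit_hinted_kmeans(X, y, n_clusters=10, target_scale=10.0, random_state=0)
cluster_ids = model.predict(X)  # ID of the closest cluster for each point
```

In this sketch, setting `target_scale=0` makes the extra column constant, so the target no longer influences the clusters.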
- Parameters:
n_clusters (int, default=7) – Number of clusters to fit with k-means.
target_scale (float, default=5.0) – Scaling factor applied to the target variable before it is included in the input data to k-means. A value of 0 removes the target's influence on the clusters.
n_components (int, optional) – Number of components to keep when reducing the predictors. PCA is used to reduce the dimensionality to the most important components.
random_state (int, optional) – Random state used for shuffling the data.
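To illustrate what reducing the predictors with PCA before clustering looks like, here is a minimal scikit-learn sketch. It is an assumption about the general technique, not a description of how watex applies `n_components` internally; the synthetic dataset and the pipeline layout are chosen only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

# Synthetic data with 8 features (illustrative only).
X, _ = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Project onto the 2 leading principal components, then cluster.
pipeline = make_pipeline(
    PCA(n_components=2, random_state=0),
    KMeans(n_clusters=7, n_init=10, random_state=0),
)
cluster_ids = pipeline.fit_predict(X)

# The centroids live in the reduced 2-dimensional space.
centers = pipeline[-1].cluster_centers_
```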
- km_model
- Type:
KMeans featurization model used to transform the data
Examples
>>> # (1) Use a common dataset
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_moons
>>> from watex.transformers import KMeansFeaturizer
>>> from watex.utils.plotutils import plot_voronoi
>>> from watex.datasets import load_mxs
>>> X, y = make_moons(n_samples=5000, noise=0.2)
>>> kmf_hint = KMeansFeaturizer(n_clusters=50, target_scale=10).fit(X, y)
>>> kmf_no_hint = KMeansFeaturizer(n_clusters=50, target_scale=0).fit(X, y)
>>> fig, ax = plt.subplots(2, 1, figsize=(7, 7))
>>> plot_voronoi(X, y, cluster_centers=kmf_hint.cluster_centers_,
...              fig_title='KMeans with hint', ax=ax[0])
>>> plot_voronoi(X, y, cluster_centers=kmf_no_hint.cluster_centers_,
...              fig_title='KMeans No hint', ax=ax[1])
<AxesSubplot:title={'center':'KMeans No hint'}>
>>> # (2) Use a concrete data set
>>> X, y = load_mxs(return_X_y=True, key='numeric')
>>> # get the most important principal components
>>> from watex.analysis import nPCA
>>> Xpca = nPCA(X, n_components=2)  # the Voronoi plot expects two-dimensional data
>>> kmf_hint = KMeansFeaturizer(n_clusters=7, target_scale=10).fit(Xpca, y)
>>> kmf_no_hint = KMeansFeaturizer(n_clusters=7, target_scale=0).fit(Xpca, y)
>>> fig, ax = plt.subplots(2, 1, figsize=(7, 7))
>>> plot_voronoi(Xpca, y, cluster_centers=kmf_hint.cluster_centers_,
...              fig_title='KMeans with hint', ax=ax[0])
>>> plot_voronoi(Xpca, y, cluster_centers=kmf_no_hint.cluster_centers_,
...              fig_title='KMeans No hint', ax=ax[1])