watex.transformers.KMeansFeaturizer

class watex.transformers.KMeansFeaturizer(n_clusters=7, target_scale=5.0, random_state=None, n_components=None)

Transforms numeric data into k-means cluster memberships.

This transformer runs k-means on the input data and converts each data point into the ID of the closest cluster. If a target variable is present, it is scaled and included as input to k-means in order to derive clusters that obey the classification boundary as well as group similar points together.
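The idea described above can be sketched with scikit-learn alone. This is a minimal illustration under stated assumptions, not the watex implementation: it assumes the target is multiplied by `target_scale`, appended to the features as an extra column, and clustered, and that the resulting centroids (with the target coordinate dropped) then seed a single k-means iteration on the features only. The variable names and the two-stage refinement are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Toy two-class data; sizes and seeds are arbitrary choices for the sketch.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
n_clusters, target_scale = 10, 10.0

# Stage 1: append the scaled target as an extra column and cluster.
# A larger target_scale pushes points of different classes further apart.
Xy = np.hstack([X, target_scale * y[:, np.newaxis]])
km_hint = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(Xy)

# Stage 2: drop the target coordinate from the centroids and run one
# k-means iteration on the features alone, so that new points can be
# assigned to clusters without knowing their target.
centers = km_hint.cluster_centers_[:, :-1]
km = KMeans(n_clusters=n_clusters, init=centers, n_init=1, max_iter=1).fit(X)

# Each sample is now represented by the ID of its closest cluster.
cluster_ids = km.predict(X)
```

The resulting `cluster_ids` array holds one integer per sample and could, for instance, be one-hot encoded before being passed to a downstream classifier.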

Parameters:
  • n_clusters (int, default=7) – Number of initial clusters

  • target_scale (float, default=5.0) – Scaling factor applied to the target variable before it is included in the input data to k-means. The examples below use target_scale=0 to cluster without a target hint.

  • n_components (int, optional) – Number of components to which the predictors are reduced. PCA is used to reduce the dimensionality down to the most important components.

  • random_state (int, optional) – Random state used for shuffling the data.
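As a rough picture of what the n_components option describes, the sketch below reduces the predictors with PCA before clustering them. It uses scikit-learn directly and is only an assumption about the general approach; how watex combines the reduction with the target hint internally is not stated here. The synthetic dataset and all variable names are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic data with 8 predictors (arbitrary choice for the sketch).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Keep only the leading principal components of the predictors.
n_components = 2
X_reduced = PCA(n_components=n_components, random_state=0).fit_transform(X)

# Cluster in the reduced space; 7 matches the documented default n_clusters.
km = KMeans(n_clusters=7, n_init=10, random_state=0).fit(X_reduced)
labels = km.labels_
```

Clustering in the reduced space keeps the centroids low-dimensional, which is also what a two-dimensional Voronoi plot requires.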

km_model

KMeans featurization model used to transform the data.

Examples

>>> # (1) Use a common dataset
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_moons
>>> from watex.transformers import KMeansFeaturizer
>>> from watex.utils.plotutils import plot_voronoi
>>> from watex.datasets import load_mxs
>>> X, y = make_moons(n_samples=5000, noise=0.2)
>>> kmf_hint = KMeansFeaturizer(n_clusters=50, target_scale=10).fit(X,y)
>>> kmf_no_hint = KMeansFeaturizer(n_clusters=50, target_scale=0).fit(X, y)
>>> fig, ax = plt.subplots(2, 1, figsize=(7, 7))
>>> plot_voronoi(X, y, cluster_centers=kmf_hint.cluster_centers_,
...              fig_title='KMeans with hint', ax=ax[0])
>>> plot_voronoi(X, y, cluster_centers=kmf_no_hint.cluster_centers_,
...              fig_title='KMeans No hint', ax=ax[1])
<AxesSubplot:title={'center':'KMeans No hint'}>
>>> # (2)  Use a concrete data set
>>> X, y = load_mxs(return_X_y=True, key='numeric')
>>> # keep the two leading principal components
>>> from watex.analysis import nPCA
>>> Xpca = nPCA(X, n_components=2)  # Voronoi plot expects two-dimensional data
>>> kmf_hint = KMeansFeaturizer(n_clusters=7, target_scale=10).fit(Xpca,y)
>>> kmf_no_hint = KMeansFeaturizer(n_clusters=7, target_scale=0).fit(Xpca, y)
>>> fig, ax = plt.subplots(2, 1, figsize=(7, 7))
>>> plot_voronoi(Xpca, y, cluster_centers=kmf_hint.cluster_centers_,
...              fig_title='KMeans with hint', ax=ax[0])
>>> plot_voronoi(Xpca, y, cluster_centers=kmf_no_hint.cluster_centers_,
...              fig_title='KMeans No hint', ax=ax[1])