Group Observations Using K-Means Clustering

Preliminaries

# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import pandas as pd

Create Data

# Make simulated feature matrix: 50 observations, 2 features, 3 blobs
X, _ = make_blobs(n_samples=50,
                  n_features=2,
                  centers=3,
                  random_state=1)

# Create DataFrame
df = pd.DataFrame(X, columns=['feature_1','feature_2'])

Train Clusterer

# Make k-means clusterer with 3 clusters (n_init=10 matches the fitted output shown below)
clusterer = KMeans(n_clusters=3, n_init=10, random_state=1)

# Fit clusterer
clusterer.fit(X)
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=1, tol=0.0001, verbose=0)
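
Inspect Fitted Clusterer

Once fitted, the clusterer exposes what it learned as attributes. The sketch below is an illustration rather than part of the original recipe. It rebuilds the same simulated data so that it runs on its own, and it sets n_init=10 explicitly on the assumption that this matches the fitted output above.

```python
# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Recreate the simulated feature matrix and fit the clusterer
X, _ = make_blobs(n_samples=50, n_features=2, centers=3, random_state=1)
clusterer = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Coordinates of the cluster centers: one row per cluster, one column per feature
centers = clusterer.cluster_centers_

# Cluster index assigned to each training observation
labels = clusterer.labels_

# Sum of squared distances from each observation to its closest center
inertia = clusterer.inertia_

# View the shape of the centers, the first few labels, and the inertia
print(centers.shape)
print(labels[:5])
print(inertia)
```

The integer labels are arbitrary identifiers. They carry no ordering, and a different random_state may number the same clusters differently.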

Create Feature Based On Predicted Cluster

# Predict values
df['group'] = clusterer.predict(X)

# First few observations
df.head(5)
   feature_1  feature_2  group
0  -9.877554  -3.336145      0
1  -7.287210  -8.353986      2
2  -6.943061  -7.023744      2
3  -7.440167  -8.791959      2
4  -6.641388  -8.075888      2
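
Summarize Observations By Group

With the cluster stored as a column, the usual pandas grouping tools apply to it. The following is a minimal sketch, not part of the original recipe. It rebuilds the DataFrame so that it is self-contained, and the names group_sizes and group_means are chosen here for illustration.

```python
# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import pandas as pd

# Recreate the simulated data and the predicted group column
X, _ = make_blobs(n_samples=50, n_features=2, centers=3, random_state=1)
df = pd.DataFrame(X, columns=['feature_1', 'feature_2'])
df['group'] = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Count the observations in each group
group_sizes = df['group'].value_counts().sort_index()

# Average each feature within each group
group_means = df.groupby('group')[['feature_1', 'feature_2']].mean()

# View both summaries
print(group_sizes)
print(group_means)
```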