# Mini-Batch k-Means Clustering

Mini-batch k-means works similarly to the k-means algorithm discussed in the last recipe. Without going into too much detail, the difference is that in mini-batch k-means the most computationally costly step is conducted on only a random sample of observations rather than on all observations. This approach can significantly reduce the time the algorithm needs to converge (i.e., fit the data), with only a small cost in quality.
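For readers who want slightly more detail, the idea can be sketched in a few lines of NumPy. This is a simplified illustration of the mini-batch update (assign each sampled observation to its nearest center, then nudge that center toward it), not scikit-learn's actual implementation. The function name `mini_batch_kmeans_sketch`, its parameters, and the synthetic data `X_demo` are all hypothetical and exist only for this example.

```python
import numpy as np

def mini_batch_kmeans_sketch(X, n_clusters, batch_size=50, n_iter=100, seed=0):
    """Simplified mini-batch k-means (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen observations
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    # Number of observations each center has absorbed so far
    counts = np.zeros(n_clusters)

    for _ in range(n_iter):
        # Draw a random mini-batch instead of using every observation
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Assignment step: find the nearest center for each sampled observation
        distances = np.linalg.norm(batch[:, None, :] - centers[None, :, :], axis=2)
        nearest = distances.argmin(axis=1)
        # Update step: move each center toward its assigned observations,
        # using a per-center step size that shrinks as its count grows
        for x, c in zip(batch, nearest):
            counts[c] += 1
            centers[c] += (x - centers[c]) / counts[c]

    return centers

# Hypothetical synthetic data: three well-separated 2-D blobs
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(loc, 0.3, size=(100, 2)) for loc in (-3, 0, 3)])

centers = mini_batch_kmeans_sketch(X_demo, n_clusters=3)
print(centers.shape)
```

Each iteration only computes distances for `batch_size` observations, which is where the time savings over standard k-means come from.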

## Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import MiniBatchKMeans

## Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data

## Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

## Conduct Mini-Batch k-Means Clustering

`MiniBatchKMeans` works similarly to `KMeans`, with one significant difference: the `batch_size` parameter. `batch_size` controls the number of randomly selected observations in each batch. The larger the batch, the more computationally costly the training process.
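To get a feel for the effect of `batch_size`, one option is to fit the model with a few different values and compare the resulting inertia (the within-cluster sum of squared distances). The snippet below is a minimal sketch that reloads the data so it runs on its own. The particular batch sizes and the explicit `n_init=3` are arbitrary choices for illustration, not recommendations.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import MiniBatchKMeans

# Load and standardize the iris features
X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# Fit one model per batch size (values chosen only for illustration)
batch_sizes = [10, 50, 100]
inertias = []
for size in batch_sizes:
    clustering = MiniBatchKMeans(
        n_clusters=3, random_state=0, batch_size=size, n_init=3
    )
    clustering.fit(X_std)
    # inertia_ is the sum of squared distances to the closest center
    inertias.append(clustering.inertia_)

# Report the fit quality for each batch size
for size, inertia in zip(batch_sizes, inertias):
    print(f"batch_size={size}: inertia={inertia:.2f}")
```

On a dataset as small as iris the differences are likely to be minor; the trade-off matters more as the number of observations grows.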

    # Create mini-batch k-means object
    clustering = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=100)

    # Train model
    model = clustering.fit(X_std)
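Once the model is trained, the cluster assignments and centers can be read from its attributes, and `predict` assigns new observations to the nearest center. The following is a self-contained sketch of that workflow. The new observation's measurements are made up for illustration.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import MiniBatchKMeans

# Load data and standardize features
X = datasets.load_iris().data
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Create and train the mini-batch k-means model
clustering = MiniBatchKMeans(n_clusters=3, random_state=0, batch_size=100)
model = clustering.fit(X_std)

# Cluster membership of each training observation
print(model.labels_[:10])

# Coordinates of the cluster centers (in standardized feature space)
print(model.cluster_centers_)

# Assign a new observation to a cluster; it must be scaled the same way
# as the training data (these measurements are hypothetical)
new_observation = [[5.0, 3.4, 1.5, 0.2]]
predicted_cluster = model.predict(scaler.transform(new_observation))
print(predicted_cluster)
```

Note that the cluster labels are arbitrary integers and do not correspond to the iris species labels in `iris.target`.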