# Load libraries
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
Load Iris Flower Dataset
# Load data
iris = datasets.load_iris()
X = iris.data
# Standardize features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
Conduct DBSCAN Clustering
DBSCAN has three main parameters to set:
eps: The maximum distance between two observations for one to be considered a neighbor of the other.
min_samples: The minimum number of observations within eps distance of an observation for it to be considered a core observation.
metric: The distance metric used by eps, for example minkowski or euclidean. (Note that if Minkowski distance is used, the parameter p can be used to set the power of the Minkowski metric.)
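As a rough illustration of the three parameters above, the sketch below passes them explicitly. The values eps=0.5, min_samples=5, and metric="euclidean" are scikit-learn's documented defaults, written out only for clarity; the Manhattan variant (p=1) and the variable names are assumptions added for illustration and are not tuned for this dataset.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Load and standardize the iris features, as in the steps above
X_std = StandardScaler().fit_transform(datasets.load_iris().data)

# Spell out the three main parameters (these match scikit-learn's defaults)
dbscan_explicit = DBSCAN(eps=0.5, min_samples=5, metric="euclidean")

# Illustrative variant: Minkowski distance with p=1 (Manhattan distance)
dbscan_manhattan = DBSCAN(eps=0.5, min_samples=5, metric="minkowski", p=1)

# fit_predict fits the model and returns one cluster label per observation
labels_explicit = dbscan_explicit.fit_predict(X_std)
labels_manhattan = dbscan_manhattan.fit_predict(X_std)
```

Because distances change with the metric, the same eps can produce different clusterings under each variant, so eps is usually chosen together with the metric.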
If we look at the clusters in our training data, we can see that two clusters have been identified, labeled 0 and 1, while outlier observations are labeled -1.
# Create DBSCAN object
clt = DBSCAN(n_jobs=-1)

# Train model
model = clt.fit(X_std)
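One way to inspect the cluster membership is to read the fitted model's labels_ attribute and count the observations per label. This is a minimal sketch that repeats the setup so it runs on its own; the names labels, unique_labels, counts, n_clusters, and n_noise are introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Rebuild the standardized features and fit DBSCAN with default settings
X_std = StandardScaler().fit_transform(datasets.load_iris().data)
model = DBSCAN(n_jobs=-1).fit(X_std)

# labels_ holds one integer per observation; -1 marks noise (outliers)
labels = model.labels_

# Count how many observations received each label
unique_labels, counts = np.unique(labels, return_counts=True)
for label, count in zip(unique_labels, counts):
    print(f"label {label}: {count} observations")

# Number of clusters, excluding the noise label
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

The fitted model also exposes core_sample_indices_, which lists the indices of the observations DBSCAN treated as core observations.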