Loading scikit-learn's Boston Housing Dataset


# Load libraries
from sklearn import datasets
import matplotlib.pyplot as plt 

Load Boston Housing Dataset

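The Boston housing dataset is a well-known dataset from the 1970s. It contains 506 observations of housing prices in the Boston area, each described by 13 features. The target, the median value of owner-occupied homes, is stored separately. The dataset is often used in regression examples.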

# Load Boston housing dataset (load_boston requires scikit-learn < 1.2)
boston = datasets.load_boston()

# Create feature matrix
X = boston.data

# Create target vector
y = boston.target
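`load_boston` no longer exists in scikit-learn 1.2 and later, so the call above fails on current releases. Below is a minimal sketch of one workaround. It assumes you already have a local copy of the original raw data file, which starts with 22 lines of description and then splits each observation across two lines: 11 feature values on the first line, and the remaining 2 feature values followed by the target on the second. The function name `parse_boston_records` is hypothetical. Downloading the file is left out, and the two-line sample mirrors the first observation.

```python
def parse_boston_records(lines, n_header=22):
    """Rebuild a feature matrix and target vector from the raw two-line layout.

    Assumes: the first `n_header` lines are description, and every record
    spans two lines (11 features, then 2 features plus the target).
    """
    # Drop the description and any blank lines, then split each line into tokens
    rows = [line.split() for line in lines[n_header:] if line.strip()]
    X, y = [], []
    # Pair each first half-record with the line that follows it
    for first, second in zip(rows[0::2], rows[1::2]):
        values = [float(v) for v in first + second]
        X.append(values[:13])  # the 13 feature values
        y.append(values[13])   # the target (median home value)
    return X, y

# Hypothetical input: 22 placeholder header lines plus one two-line record
sample = ["description"] * 22 + [
    " 0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30",
    " 396.90   4.98  24.00",
]
X, y = parse_boston_records(sample)
print(len(X[0]), y)
```

Both results are plain Python lists. Wrap them in `numpy.array` if the rest of your code expects arrays like `boston.data`.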

# View the first observation's feature values
X[0]

array([  6.32000000e-03,   1.80000000e+01,   2.31000000e+00,
         0.00000000e+00,   5.38000000e-01,   6.57500000e+00,
         6.52000000e+01,   4.09000000e+00,   1.00000000e+00,
         2.96000000e+02,   1.53000000e+01,   3.96900000e+02,
         4.98000000e+00])

As you can see, the features are not standardized: their values span several orders of magnitude. This is easier to see if we display the values as plain decimals instead of scientific notation:

# Display each feature value of the first observation as floats
['{:f}'.format(x) for x in X[0]]
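To put a number on that difference in scale, the small pure-Python sketch below copies the first observation's 13 feature values and compares the largest value with the smallest non-zero one. Zeros are skipped because the fourth feature is a 0/1 indicator, which would make the ratio undefined.

```python
# The first observation's 13 feature values
first_obs = [0.00632, 18.0, 2.31, 0.0, 0.538, 6.575,
             65.2, 4.09, 1.0, 296.0, 15.3, 396.9, 4.98]

largest = max(first_obs)
# Ignore zeros so the ratio is defined
smallest_nonzero = min(v for v in first_obs if v != 0)
ratio = largest / smallest_nonzero
print(f"{largest} / {smallest_nonzero} = {ratio:,.0f}")
```

Within this single observation, the largest value is roughly 62,800 times the smallest non-zero one.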

Therefore, it is often beneficial, and for scale-sensitive methods such as distance-based or regularized models often necessary, to standardize the features.
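Standardizing usually means rescaling each feature column to mean 0 and standard deviation 1 via z = (x − μ) / σ, where μ and σ are computed per column. The NumPy sketch below applies this to a small made-up matrix, not the Boston data. The function name `standardize` is hypothetical. In scikit-learn, the same transformation is provided by `sklearn.preprocessing.StandardScaler`, which, like this sketch, uses the population standard deviation (`ddof=0`).

```python
import numpy as np

def standardize(X):
    """Return a copy of X with each column rescaled to mean 0 and std 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)   # per-feature mean
    std = X.std(axis=0)     # per-feature population standard deviation (ddof=0)
    std[std == 0] = 1.0     # constant columns become 0 instead of dividing by zero
    return (X - mean) / std

# Made-up matrix whose two columns differ in scale by four orders of magnitude
X_toy = np.array([[0.01, 300.0],
                  [0.02, 250.0],
                  [0.03, 200.0]])
X_std = standardize(X_toy)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

After the transformation, both columns are on the same footing. In a real workflow, compute μ and σ on the training data only and reuse them for any test data.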