# Calibrate Predicted Probabilities

Class probabilities are a common and useful part of machine learning models. In scikit-learn, most learning algorithms let us see the predicted probabilities of class membership using `predict_proba`. This can be extremely useful if, for instance, we want to predict a certain class only when the model's predicted probability for that class is over 90%. However, some models, including naive Bayes classifiers, output probabilities that are poorly calibrated, meaning they do not match the frequencies observed in the real world. That is, `predict_proba` might predict that an observation has a 0.70 chance of being a certain class when the true chance is 0.10 or 0.99. In naive Bayes specifically, while the ranking of predicted probabilities for the different target classes is valid, the raw predicted probabilities tend to take on extreme values close to 0 and 1.
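
As a rough illustration of the 90% rule described above, the sketch below fits an uncalibrated Gaussian naive Bayes model on the iris data and flags the observations whose most likely class clears the threshold. The names `model`, `threshold`, and `confident` are illustrative choices, not part of scikit-learn.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Load the iris data and fit an uncalibrated Gaussian naive Bayes model
iris = datasets.load_iris()
X, y = iris.data, iris.target
model = GaussianNB().fit(X, y)

# Predicted class-membership probabilities, one row per observation
probabilities = model.predict_proba(X)

# Illustrative decision rule: accept a prediction only when the most
# likely class has a predicted probability above 0.90
threshold = 0.90
top_probability = probabilities.max(axis=1)
confident = top_probability > threshold

print("Observations above threshold:", int(confident.sum()), "of", len(X))
```

A rule like this is only as trustworthy as the probabilities behind it, which is why calibration matters when the raw values are pushed toward 0 and 1.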

To obtain meaningful predicted probabilities, we need to conduct what is called calibration. In scikit-learn we can use the `CalibratedClassifierCV` class to create well-calibrated predicted probabilities using k-fold cross-validation. In `CalibratedClassifierCV`, the training set of each fold is used to train the model and the test set is used to calibrate the predicted probabilities. The returned predicted probabilities are the average across the k folds.
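
The averaging step can be checked by hand. The sketch below assumes the default ensemble behaviour of `CalibratedClassifierCV`, in which one calibrated classifier per fold is stored in the fitted `calibrated_classifiers_` attribute. The names `calibrated`, `per_fold`, and `manual_average` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Fit a sigmoid-calibrated naive Bayes model with 2-fold cross-validation
calibrated = CalibratedClassifierCV(GaussianNB(), cv=2, method='sigmoid')
calibrated.fit(X, y)

# Each fold contributes one fitted model plus its calibrator
per_fold = [c.predict_proba(X) for c in calibrated.calibrated_classifiers_]

# Average the per-fold calibrated probabilities manually
manual_average = np.mean(per_fold, axis=0)

# Compare against what predict_proba returns
print(np.allclose(manual_average, calibrated.predict_proba(X)))
```

With `cv=2` there are two such classifiers, so each returned probability is the mean of two calibrated estimates.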

## Preliminaries

    # Load libraries
    from sklearn import datasets
    from sklearn.naive_bayes import GaussianNB
    from sklearn.calibration import CalibratedClassifierCV

## Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

## Create Naive Bayes Classifier

    # Create Gaussian Naive Bayes object
    clf = GaussianNB()

## Create Calibrator

    # Create calibrated cross-validation with sigmoid calibration
    clf_sigmoid = CalibratedClassifierCV(clf, cv=2, method='sigmoid')
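
The `method` argument also accepts `'isotonic'`, which fits a non-parametric, monotonic mapping instead of a sigmoid. The sketch below shows only the change in argument. The name `clf_isotonic` is illustrative, and isotonic calibration is generally considered prone to overfitting on datasets as small as iris, so treat this as a demonstration of the option rather than a recommendation.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same calibrator as before, but with isotonic regression as the method
clf_isotonic = CalibratedClassifierCV(GaussianNB(), cv=2, method='isotonic')
clf_isotonic.fit(X, y)

# Calibrated probabilities for a single observation
isotonic_probabilities = clf_isotonic.predict_proba([[2.6, 2.6, 2.6, 0.4]])
print(isotonic_probabilities)
```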

## Create Classifier With Calibrated Probabilities

    # Calibrate probabilities
    clf_sigmoid.fit(X, y)

CalibratedClassifierCV(base_estimator=GaussianNB(priors=None), cv=2, method='sigmoid')

## Create Previously Unseen Observation

    # Create new observation
    new_observation = [[2.6, 2.6, 2.6, 0.4]]

## View Calibrated Probabilities

    # View calibrated probabilities
    clf_sigmoid.predict_proba(new_observation)

array([[ 0.31859969, 0.63663466, 0.04476565]])
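
To see what calibration changes, it can help to put the raw and calibrated probabilities for the same observation side by side. The sketch below repeats the steps above in one self-contained block. The names `raw_probabilities` and `calibrated_probabilities` are illustrative, and the exact numbers may vary with the scikit-learn version.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same previously unseen observation as above
new_observation = [[2.6, 2.6, 2.6, 0.4]]

# Uncalibrated naive Bayes probabilities
raw_probabilities = GaussianNB().fit(X, y).predict_proba(new_observation)

# Sigmoid-calibrated probabilities from 2-fold cross-validation
clf_sigmoid = CalibratedClassifierCV(GaussianNB(), cv=2, method='sigmoid')
clf_sigmoid.fit(X, y)
calibrated_probabilities = clf_sigmoid.predict_proba(new_observation)

print("Raw:       ", raw_probabilities)
print("Calibrated:", calibrated_probabilities)
```

If the raw values sit very close to 0 and 1 while the calibrated values are more moderate, that matches the naive Bayes behaviour described at the top of this page.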