# Handling Imbalanced Classes In Logistic Regression

Like many other learning algorithms in scikit-learn, `LogisticRegression` comes with a built-in method of handling imbalanced classes. If we have highly imbalanced classes and have not addressed the imbalance during preprocessing, we can use the `class_weight` parameter to weight the classes so that each class contributes comparably to the loss during training. Specifically, the `balanced` argument automatically weights classes inversely proportional to their frequency:

$$w_j = \frac{n}{kn_{j}}$$

where \(w_j\) is the weight for class \(j\), \(n\) is the total number of observations, \(n_j\) is the number of observations in class \(j\), and \(k\) is the total number of classes.
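To make the formula concrete, here is a minimal sketch in plain Python that applies it directly. The helper name `balanced_class_weights` and the label counts (10 observations of class 0, 100 of class 1) are assumptions chosen for illustration; this is not scikit-learn's internal code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n / (k * n_j)} for a sequence of class labels."""
    counts = Counter(labels)   # n_j for each class j
    n = len(labels)            # total number of observations
    k = len(counts)            # number of distinct classes
    return {cls: n / (k * n_j) for cls, n_j in counts.items()}


# Hypothetical labels: 10 observations of class 0 and 100 of class 1
labels = [0] * 10 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)  # {0: 5.5, 1: 0.55}
```

The rare class receives a weight ten times larger than the common class, so both classes carry the same total weight (10 × 5.5 = 100 × 0.55 = 55).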

## Preliminaries

    # Load libraries
    from sklearn.linear_model import LogisticRegression
    from sklearn import datasets
    from sklearn.preprocessing import StandardScaler
    import numpy as np

## Load Iris Flower Dataset

    # Load data
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

## Make Classes Imbalanced

    # Make class highly imbalanced by removing first 40 observations
    X = X[40:,:]
    y = y[40:]

    # Create target vector indicating if class 0, otherwise 1
    y = np.where((y == 0), 0, 1)
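As a quick sanity check on the step above, the following sketch repeats the slicing and counts how many observations remain in each class. It relies on the Iris targets being ordered with 50 observations per species, which is how `load_iris` returns them; the variable name `counts` is introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

# Reload the data and repeat the imbalancing step
iris = datasets.load_iris()
y = iris.target[40:]
y = np.where((y == 0), 0, 1)

# Count observations per class: index 0 is class 0, index 1 is class 1
counts = np.bincount(y)
print(counts)  # [ 10 100]
```

Only 10 of the remaining 110 observations belong to class 0, which is the imbalance the weighted model has to compensate for.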

## Standardize Features

    # Standardize features
    scaler = StandardScaler()
    X_std = scaler.fit_transform(X)

## Train A Logistic Regression With Weighted Classes

    # Create logistic regression classifier object
    clf = LogisticRegression(random_state=0, class_weight='balanced')

    # Train model
    model = clf.fit(X_std, y)
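One way to see what `class_weight='balanced'` does is to compute the weights with scikit-learn's `compute_class_weight` utility, pass them as an explicit dictionary, and compare the fitted coefficients. The sketch below rebuilds the data so it runs on its own; the names `explicit`, `balanced_model`, and `explicit_model` are introduced here for illustration, and the comparison assumes both fits use the same default solver settings.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Rebuild the imbalanced, standardized data used above
iris = datasets.load_iris()
X = iris.data[40:, :]
y = np.where(iris.target[40:] == 0, 0, 1)
X_std = StandardScaler().fit_transform(X)

# Compute the 'balanced' weights, n / (k * n_j), for each class
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
explicit = {int(c): float(w) for c, w in zip(classes, weights)}
print(explicit)

# Fit once with the 'balanced' shortcut and once with the explicit weights
balanced_model = LogisticRegression(random_state=0, class_weight="balanced").fit(X_std, y)
explicit_model = LogisticRegression(random_state=0, class_weight=explicit).fit(X_std, y)

# Check whether the two models learned the same coefficients
same = np.allclose(balanced_model.coef_, explicit_model.coef_)
print(same)
```

Passing a dictionary is also the route to take when the `balanced` heuristic is not what you want, for example when misclassifying the rare class should cost more than its frequency alone implies.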