# Selecting The Best Number Of Components For LDA

In scikit-learn, LDA is implemented with `LinearDiscriminantAnalysis`, which includes a parameter, `n_components`, indicating the number of features we want returned. To figure out what argument value to use with `n_components` (i.e., how many features to keep), we can take advantage of the fact that `explained_variance_ratio_` tells us the variance explained by each output feature and is a sorted array.

Specifically, we can run `LinearDiscriminantAnalysis` with `n_components` set to `None` to return the ratio of variance explained by every component, then calculate how many components are required to get above some threshold of variance explained (often 0.95 or 0.99).

## Preliminaries

```python
# Load libraries
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
```

## Load Iris Data

```python
# Load the Iris flower dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
```

## Run Linear Discriminant Analysis

```python
# Create an LDA that keeps all components, then fit it
# (fit returns the fitted estimator itself, not transformed data)
lda = LinearDiscriminantAnalysis(n_components=None)
lda.fit(X, y)
```
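Note that LDA can produce at most `min(n_classes - 1, n_features)` components, so with `n_components=None` the number of explained-variance ratios is bounded by the number of classes. A quick check on Iris (3 classes, 4 features):

```python
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
lda = LinearDiscriminantAnalysis(n_components=None).fit(iris.data, iris.target)

# Iris has 3 classes, so LDA yields at most 3 - 1 = 2 components
print(len(lda.explained_variance_ratio_))
```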

## Create List Of Explained Variances

```python
# Create array of explained variance ratios
lda_var_ratios = lda.explained_variance_ratio_
```

## Create Function Calculating Number Of Components Required To Pass Threshold

```python
# Create a function
def select_n_components(var_ratio, goal_var: float) -> int:
    # Set initial variance explained so far
    total_variance = 0.0

    # Set initial number of features
    n_components = 0

    # For the explained variance of each feature:
    for explained_variance in var_ratio:

        # Add the explained variance to the total
        total_variance += explained_variance

        # Add one to the number of components
        n_components += 1

        # If we reach our goal level of explained variance
        if total_variance >= goal_var:
            # End the loop
            break

    # Return the number of components
    return n_components
```
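To sanity-check the function's logic, we can feed it a small hand-computable list of ratios (these values are made up for illustration, not from the Iris model). The sketch below repeats the function so it runs on its own:

```python
def select_n_components(var_ratio, goal_var: float) -> int:
    # Accumulate explained variance until the goal threshold is reached
    total_variance = 0.0
    n_components = 0
    for explained_variance in var_ratio:
        total_variance += explained_variance
        n_components += 1
        if total_variance >= goal_var:
            break
    return n_components

# Hypothetical ratios: 0.6 + 0.3 = 0.9 < 0.95, but adding 0.08 exceeds 0.95,
# so three components are enough
print(select_n_components([0.6, 0.3, 0.08, 0.02], 0.95))  # 3
```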

## Run Function

```python
# Run function
select_n_components(lda_var_ratios, 0.95)
```

```
1
```
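The same threshold search can also be written without an explicit loop, using NumPy's cumulative sum. This is an alternative sketch, not part of the original recipe, and it assumes the goal variance is actually attainable from the given ratios:

```python
import numpy as np

def select_n_components_np(var_ratio, goal_var):
    # Cumulative explained variance for 1, 2, 3, ... components
    cum_var = np.cumsum(var_ratio)
    # Index of the first cumulative value meeting the goal, plus one
    # (note: argmax returns 0 if the goal is never reached)
    return int(np.argmax(cum_var >= goal_var)) + 1

print(select_n_components_np([0.6, 0.3, 0.08, 0.02], 0.95))  # 3
```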