Stemming Words

Authors: Chris Albon

Preliminaries

# Load library
from nltk.stem.porter import PorterStemmer

Create Text Data

# Create word tokens
tokenized_words = ['i', 'am', 'humbled', 'by', 'this', 'traditional', 'meeting']

Stem Words

Stemming reduces a word to its stem by identifying and removing affixes (e.g. the -ing suffix of a gerund) while keeping the root meaning of the word. The resulting stem is not always a dictionary word. NLTK's PorterStemmer implements the widely used Porter stemming algorithm.

# Create stemmer
porter = PorterStemmer()

# Apply stemmer
[porter.stem(word) for word in tokenized_words]
['i', 'am', 'humbl', 'by', 'thi', 'tradit', 'meet']
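
One common reason to stem is to collapse inflected forms of a word into a single key. The following is a minimal sketch of that idea, not part of the original recipe: the helper name group_by_stem and the sample word list are assumptions made for illustration, while PorterStemmer and its stem method are the same NLTK calls used above.

```python
# Load libraries
from collections import defaultdict
from nltk.stem.porter import PorterStemmer


def group_by_stem(words):
    """Group words under their Porter stem (hypothetical helper for illustration)."""
    porter = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        # Words that share a stem end up in the same list
        groups[porter.stem(word)].append(word)
    return dict(groups)


# Create inflected forms of the same root (illustrative data)
related_words = ['meeting', 'meetings', 'meets', 'meet']

# Group the words by stem
stem_groups = group_by_stem(related_words)
print(stem_groups)
```

If the stemmer behaves as in the example above, all four forms should land under one key, which is what makes stems useful as features for counting or matching words.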