Text Mining #4 Stemming and Lemmatization

grace21110 2023. 9. 6. 08:00

Stemming and lemmatization both reduce a word that has been changed grammatically or semantically back to its base form.

Stemming applies general or simplified rules when converting a word to its base form, so it tends to produce root words that are partly cut off and misspelled. Lemmatization, by contrast, takes grammatical elements such as part of speech and the meaning of the word into account, so it finds root words with correct spelling.

Let's compare stemming and lemmatization. Stemming takes the simpler route: it applies general rules to find the base form of verbs inflected for the progressive, third-person singular, or past tense, and of adjectives in comparative or superlative form.
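To make the difference concrete, here is a minimal sketch using only the standard library. It is not the Porter algorithm or WordNet: `SUFFIXES`, `LEMMAS`, `toy_stem`, and `toy_lemmatize` are hypothetical names invented for this illustration, and the rule list and lookup table are hand-written assumptions.

```python
# Toy illustration only: NOT the Porter algorithm and NOT WordNet.

# Hypothetical suffix rules, checked in order.
SUFFIXES = ("ing", "ed", "es", "s")

# Hand-written stand-in for a real lemma dictionary.
LEMMAS = {
    "working": "work", "works": "work", "worked": "work",
    "amusing": "amuse", "amuses": "amuse", "amused": "amuse",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the table; fall back to the word itself."""
    return LEMMAS.get(word, word)


for w in ["working", "works", "worked", "amusing", "amuses", "amused"]:
    print(f"{w:8} stem={toy_stem(w):5} lemma={toy_lemmatize(w)}")
```

The rule-based function maps all three forms of "amuse" to "amus", which is not a correctly spelled word, while the dictionary lookup returns "amuse". Real stemmers use far more elaborate rules, but the trade-off is the same: rules are simple and fast, while a dictionary gives correctly spelled results.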

# Stemming Example code:

from nltk.stem import PorterStemmer

# Initialize the Porter Stemmer
stemmer = PorterStemmer()

# Example words for stemming
words = ["running", "flies", "happily", "generously"]

# Stem the words
stemmed_words = [stemmer.stem(word) for word in words]

# Print the stemmed words
print(stemmed_words)

Output:

['run', 'fli', 'happili', 'gener']

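Lemmatization needs the part of speech of each word as input in order to extract the base form more accurately. The example below passes pos='v' (verb) for every word, so only the verb forms are reduced and the adverbs are returned unchanged.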

# Lemmatization Example code:

import nltk
from nltk.stem import WordNetLemmatizer

# Download the WordNet data on first use
# (assumption: some NLTK versions may also need the "omw-1.4" resource)
nltk.download("wordnet")

# Initialize the WordNet Lemmatizer
lemmatizer = WordNetLemmatizer()

# Example words for lemmatization
words = ["running", "flies", "happily", "generously"]

# Lemmatize the words, treating every word as a verb (pos='v')
lemmatized_words = [lemmatizer.lemmatize(word, pos='v') for word in words]

# Print the lemmatized words
print(lemmatized_words)

Output:

['run', 'fly', 'happily', 'generously']
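Why the part of speech matters can be sketched with a small lookup keyed by both the word and its tag. This is a toy table written by hand for illustration, not WordNet data. `TOY_LEMMAS` and `lemma_for` are hypothetical names, and the default tag 'n' is chosen here only to mirror the default of `WordNetLemmatizer.lemmatize`.

```python
# Toy POS-aware lemma lookup: a hand-written table, NOT WordNet data.
from typing import Dict, Tuple

# Keys are (surface form, POS tag), using the same 'n'/'v'/'a' tag style
# as the pos argument in the example above.
TOY_LEMMAS: Dict[Tuple[str, str], str] = {
    ("meeting", "n"): "meeting",  # "a meeting" stays a noun
    ("meeting", "v"): "meet",     # "they are meeting" reduces to the verb
    ("saw", "n"): "saw",          # the tool
    ("saw", "v"): "see",          # past tense of "see"
    ("better", "a"): "good",      # comparative adjective
}


def lemma_for(word: str, pos: str = "n") -> str:
    """Return the lemma for (word, pos); fall back to the word itself."""
    return TOY_LEMMAS.get((word.lower(), pos), word)


print(lemma_for("meeting", "n"))  # noun reading
print(lemma_for("meeting", "v"))  # verb reading
print(lemma_for("saw", "v"))
print(lemma_for("happily", "v"))  # no verb entry, so returned unchanged
```

The same surface form maps to different base forms depending on the tag, which is why a real pipeline usually runs a part-of-speech tagger first and passes each word's tag to the lemmatizer instead of a single fixed value.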