Text Mining #3 Removing Stop Word

Data Science Explorer

Machine Learning

grace21110 2023. 9. 5. 09:27

Stop words are very common words that carry little meaning on their own, so they are often removed from text data during the preprocessing phase. Typical stop words include articles (e.g., "a," "an," "the"), prepositions (e.g., "in," "on," "at"), and conjunctions (e.g., "and," "but," "or").
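To see the idea without any library, here is a minimal pure-Python sketch. It assumes a tiny, hypothetical stop-word set made only of the example words listed above and a naive regex tokenizer; NLTK's real English list is much longer, and its tokenizer handles punctuation differently.

```python
import re

# Hypothetical stop-word set built only from the example words above;
# real lists (e.g. NLTK's) contain many more entries.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "and", "but", "or"}

def remove_stop_words(text):
    # Naive tokenizer (an assumption): runs of letters, digits, and apostrophes.
    # Punctuation such as "." is simply dropped here.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    # Compare in lowercase so "The" and "the" are treated the same.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("The cat sat on the mat and the dog slept in the sun."))
# ['cat', 'sat', 'mat', 'dog', 'slept', 'sun']
```

The filtering itself is just a set-membership test per token, which is why the stop words are stored in a `set` rather than a list.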

Example code:

1) Install NLTK

```
pip install nltk
```

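2) Import NLTK and Download the Punkt Tokenizer Models and the Stop-Word Corpus (if not already downloaded)

```python
import nltk

nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('punkt_tab')  # recent NLTK releases load Punkt data from 'punkt_tab' instead
nltk.download('stopwords')  # stop-word lists used by stopwords.words()
```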

3) Code to Remove Stop Words

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Sample text
text = "This is an example sentence with some stop words."

# Tokenize the text into words
words = word_tokenize(text)

# Remove stop words (NLTK's list is lowercase, so compare each token in lowercase)
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]

# Print the filtered words
print(filtered_words)
```

4) Output

```
['example', 'sentence', 'stop', 'words', '.']
```
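The comparison uses `word.lower()` because NLTK's stop-word list is all lowercase; without it, a capitalized "This" at the start of the sentence would not match "this" and would be kept. The sketch below makes the difference visible without downloading anything. As an assumption for self-containment, it hard-codes the tokens `word_tokenize` produces for the sample sentence and a small lowercase subset of NLTK's English stop words. The last step, dropping punctuation tokens with `str.isalpha()`, is an optional extra and not part of the code above.

```python
# Tokens that word_tokenize returns for the sample sentence
tokens = ['This', 'is', 'an', 'example', 'sentence', 'with', 'some', 'stop', 'words', '.']

# Small lowercase subset of NLTK's English stop words (hard-coded so no download is needed)
stop_words = {"this", "is", "an", "with", "some"}

# Exact match: "This" is not in the lowercase set, so it survives
case_sensitive = [w for w in tokens if w not in stop_words]
# Lowercased match: "This" -> "this" is found and removed
case_insensitive = [w for w in tokens if w.lower() not in stop_words]
# Optional extra step: keep only purely alphabetic tokens (drops ".")
alpha_only = [w for w in case_insensitive if w.isalpha()]

print(case_sensitive)    # ['This', 'example', 'sentence', 'stop', 'words', '.']
print(case_insensitive)  # ['example', 'sentence', 'stop', 'words', '.']
print(alpha_only)        # ['example', 'sentence', 'stop', 'words']
```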

Other posts in the 'Machine Learning' category

Machine Learning : Standard Deviation	2023.11.13
Mean Median Mode	2023.11.12
Text Mining #4 Stemming and Lemmatization	2023.09.06
Text Mining #2 Text Normalization	2023.09.05
Text Mining #1 Theory	2023.09.04
