Data Science Explorer
Text Mining #3 Removing Stop Words
Stop words are very common words that carry little meaning on their own, so they are usually filtered out of text data during the preprocessing phase. Typical stop words include articles (e.g., "a," "an," "the"), prepositions (e.g., "in," "on," "at"), and conjunctions (e.g., "and," "but," "or").
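Before bringing in a library, the idea can be sketched in plain Python. This is only an illustration: the `SMALL_STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up for this sketch, and the set is far smaller than a real stop word list.

```python
# Hypothetical, hand-written stop word set (an assumption for illustration only)
SMALL_STOP_WORDS = {"a", "an", "the", "in", "on", "at", "and", "but", "or"}

def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]

# Split on whitespace as a very rough tokenizer
tokens = "The cat sat on a mat and slept".split()
print(remove_stop_words(tokens, SMALL_STOP_WORDS))
```

Comparing the lowercase form means that "The" at the start of a sentence is treated the same as "the".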
Example code:
1) Install nltk
pip install nltk
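2) Import NLTK and Download the Punkt Tokenizer Models and the Stop Words Corpus (if not already downloaded)
import nltk
nltk.download('punkt')
# The stop word list is a separate resource and must be downloaded as well
nltk.download('stopwords')
# Newer NLTK releases may ask for 'punkt_tab' instead of 'punkt';
# if a LookupError appears, download the resource it names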
3) Code to Remove Stop Words
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Sample text
text = "This is an example sentence with some stop words."
# Tokenize the text into words
words = word_tokenize(text)
# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]
# Print the filtered words
print(filtered_words)
4) Output
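['example', 'sentence', 'stop', 'words', '.']
Note that "This" is removed as well, because the comparison uses word.lower() and "this" is in the English stop word list.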
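The filtered list still contains the "." token, because punctuation is not part of the stop word list. One possible way to drop it, sketched here under some assumptions: the token list is written out by hand to match the example sentence, the small `stop_words` set is a stand-in for `set(stopwords.words('english'))`, and `remove_stop_words_and_punctuation` is a hypothetical helper name.

```python
# Tokens for the example sentence, written out by hand
words = ["This", "is", "an", "example", "sentence",
         "with", "some", "stop", "words", "."]

# Stand-in for set(stopwords.words('english')); only the entries needed here
stop_words = {"this", "is", "an", "with", "some"}

def remove_stop_words_and_punctuation(tokens, stop_words):
    """Keep purely alphabetic tokens that are not stop words."""
    return [
        token for token in tokens
        if token.isalpha() and token.lower() not in stop_words
    ]

filtered_words = remove_stop_words_and_punctuation(words, stop_words)
print(filtered_words)  # ['example', 'sentence', 'stop', 'words']
```

Keep in mind that `str.isalpha()` also discards tokens containing digits or apostrophes (for example "3rd" or "n't"), so this filter may be too strict for some datasets.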