Data Science Explorer
Category: machine learning (11 posts)

To measure whether the model is good or not, we can use a method called Train/Test. It measures the accuracy of the model, and it is called train/test because you separate the data set into two parts: a training set and a testing set. Training the model means creating the model, and testing the model means measuring its accuracy. A common split is 80% for training and 20% for testing. Example Our data set illus..
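The 80/20 split can be sketched with NumPy. The data set below is made up for illustration, and the split simply takes the first 80 of 100 points:

```python
import numpy

numpy.random.seed(2)  # make the random data reproducible

# Hypothetical data set: 100 x/y pairs
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x

# 80% for training, 20% for testing
train_x, test_x = x[:80], x[80:]
train_y, test_y = y[:80], y[80:]

print(len(train_x), len(test_x))  # 80 20
```

In practice you would shuffle the data first so the test set is not biased by the original ordering.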

Polynomial Regression It uses the relationship between the variables x and y to find the best way to draw a line through the data points. import numpy import matplotlib.pyplot as plt x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22] y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100] mymodel = numpy.poly1d(numpy.polyfit(x, y, 3)) myline = numpy.linspace(1, 22, 100) plt.scatter(x, y) plt...
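A runnable version of the excerpt's fit (plotting omitted), with an r-squared check computed by hand as a rough measure of how well the curve fits; the r-squared part is an addition not shown in the excerpt:

```python
import numpy

x = numpy.array([1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22])
y = numpy.array([100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100])

# Fit a degree-3 polynomial through the data points
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

# Evenly spaced x values for drawing the fitted curve
myline = numpy.linspace(1, 22, 100)

# r-squared: 1 minus (residual sum of squares / total sum of squares)
predicted = mymodel(x)
ss_res = numpy.sum((y - predicted) ** 2)
ss_tot = numpy.sum((y - numpy.mean(y)) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 2))
```

An r-squared close to 1 means the polynomial explains most of the variation in y.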
I was working on data cleaning and ran into a problem today. I had to duplicate each row and put the copies together, and I did not know how to deal with it. It took me a few hours of thinking and looking up ways to figure it out. Luckily, I found a way out, so I am going to show you how I did it! Step #1: You have to import the data. import pandas as pd ss = pd.read_csv('/content/총물량데이터.csv') print(ss) Ste..
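One way to duplicate every row of a DataFrame is `index.repeat` plus `loc`. Since the original CSV isn't shown, this sketch uses a toy DataFrame in its place:

```python
import pandas as pd

# Toy stand-in for the real CSV data
ss = pd.DataFrame({"item": ["a", "b", "c"], "qty": [1, 2, 3]})

# Repeat each row's index label twice, then select those rows in order
doubled = ss.loc[ss.index.repeat(2)].reset_index(drop=True)
print(doubled)
```

Each original row appears twice, back to back, and `reset_index(drop=True)` gives the result a clean 0..n index.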

Linear Regression It uses the relationship between the data points to draw a straight line through all of them. Example import matplotlib.pyplot as plt from scipy import stats x = [5,7,8,7,2,17,2,9,4,11,12,9,6] y = [99,86,87,88,111,86,103,87,94,78,77,85,86] slope, intercept, r, p, std_err = stats.linregress(x, y) def myfunc(x): return slope * x + intercept mymodel = list(map(myfunc, x)) plt.scatter..
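The excerpt's code, completed so it runs on its own (plotting omitted). The `r` returned by `linregress` is the correlation coefficient, which here comes out negative because y tends to fall as x rises:

```python
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Fit a straight line y = slope * x + intercept
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predicted y value for each observed x
mymodel = list(map(myfunc, x))
print(round(r, 2))  # correlation coefficient
```

A correlation coefficient near -1 or 1 means a line fits the data well; near 0 means it doesn't.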

Percentiles Percentiles are used in statistics to give you a number that describes the value that a given percent of the values are lower than. Example Use the NumPy percentile() method to find the percentiles. import numpy ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31] x = numpy.percentile(ages, 75) print(x) What is the 75th percentile? The answer is 43, meaning that 75 perc..
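The excerpt's example, completed so it runs as-is:

```python
import numpy

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

# 75th percentile: 75% of the ages fall below this value
x = numpy.percentile(ages, 75)
print(x)  # 43.0
```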

Standard Deviation It is a number that describes how spread out the values are. A low standard deviation means that most of the numbers are close to the mean (average). A high standard deviation means that the values are spread out over a wider range. You can use std() to get the standard deviation. Example import numpy speed = [11, 20, 582, 12] x = numpy.std(speed) print(x) 245.83162428784462 Va..
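The excerpt's example as runnable code; the variance line is a companion addition, since the variance is simply the standard deviation squared:

```python
import numpy

speed = [11, 20, 582, 12]

x = numpy.std(speed)  # standard deviation
v = numpy.var(speed)  # variance (standard deviation squared)
print(x)
print(v)
```

The single outlier (582) is what drives the standard deviation so high here.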

ModeResult(mode=11, count=1) In machine learning, there are three values: Mean, the average value; Median, the midpoint value; and Mode, the most common value. Mean It is the average value: divide the sum by the number of values. Example (99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77 import numpy speed = [99,86,87,88,111,86,103,87,94,78,77,85,86] x = numpy.mean(speed) print(x) 89.7692307692..
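All three values can also be computed with Python's built-in statistics module, as an alternative to the NumPy/SciPy calls used in the post:

```python
import statistics

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

print(statistics.mean(speed))    # average value
print(statistics.median(speed))  # midpoint of the sorted values: 87
print(statistics.mode(speed))    # most common value: 86 (appears 3 times)
```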

Stemming and Lemmatization both reduce words grammatically or semantically to their base forms. Stemming tends to extract root words, sometimes misspelled ones, by applying general or simplified rules when converting words to their base forms. Lemmatization, however, finds correctly spelled root words by considering grammatical elements such as..
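A toy illustration of the difference, not NLTK: a crude suffix-stripping "stemmer" next to a tiny hand-made lemma dictionary (the dictionary entries are hypothetical examples):

```python
def crude_stem(word):
    # Chop common suffixes blindly, which can produce misspelled stems
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma dictionary; a real lemmatizer consults a full lexicon
LEMMAS = {"studies": "study", "better": "good", "ran": "run"}

def lemmatize(word):
    return LEMMAS.get(word, word)

print(crude_stem("studies"))  # 'stud' -- a misspelled root
print(lemmatize("studies"))   # 'study' -- a correctly spelled root
```

Real stemmers (e.g. NLTK's PorterStemmer) and lemmatizers (e.g. WordNetLemmatizer) are far more sophisticated, but the contrast is the same: stemming chops, lemmatization looks up.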

Stop words are common words that are filtered out or removed from text data during the preprocessing phase. Common stop words include articles (e.g., "a," "an," "the"), prepositions (e.g., "in," "on," "at"), and conjunctions (e.g., "and," "but," "or"). Example code: 1) Install NLTK: pip install nltk 2) Import NLTK and download the Punkt tokenizer models (if not already downloaded): import nltk nltk.do..
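The filtering step itself can be shown without NLTK, using a small illustrative stopword set (NLTK's real English list, `nltk.corpus.stopwords.words('english')`, is much longer):

```python
# Hypothetical mini stopword list, just for illustration
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "and", "but", "or"}

text = "The cat sat on the mat and looked at a bird"
tokens = text.lower().split()

# Keep only the tokens that are not stop words
filtered = [t for t in tokens if t not in STOP_WORDS]
print(filtered)  # ['cat', 'sat', 'mat', 'looked', 'bird']
```

Lowercasing first matters: otherwise "The" would slip past a lowercase-only stopword set.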

From the previous lesson, we have an idea of what Text Mining is. In this post, we are going to jump into Text Normalization. Text Normalization is categorized into five steps in total: Cleansing, Tokenization, Filtering/Removing stop words/Correcting spelling, Stemming, and Lemmatization. Today, we are going to dive into Text Tokenization. There are two types of tokenization, which ar..
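The two types are presumably sentence tokenization and word tokenization; here is a rough stdlib-only sketch of both (NLTK's `sent_tokenize` and `word_tokenize` handle many more edge cases):

```python
import re

text = "Text mining is fun. Tokenization splits text into pieces."

# Sentence tokenization: split after sentence-ending punctuation (naive)
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word tokenization: pull out runs of word characters (naive)
words = re.findall(r"\w+", sentences[0])

print(sentences)  # two sentences
print(words)      # ['Text', 'mining', 'is', 'fun']
```

The naive split fails on abbreviations like "e.g.", which is exactly why trained tokenizers such as NLTK's Punkt models exist.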