Data Science Explorer
Machine Learning: Train/Test
To measure whether a model is good enough, we can use a method called Train/Test.
- Train/Test
Train/Test is a method for measuring the accuracy of a model. It is called Train/Test because you split the data set into two sets: a training set and a testing set.
Training the model means creating the model; testing the model means checking its accuracy.
A common split is 80% for training and 20% for testing.
Example
Our data set illustrates 100 customers in a shop, and their shopping habits.
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x
plt.scatter(x, y)
plt.show()
Result
The x axis represents the number of minutes before making a purchase.
The y axis represents the amount of money spent on the purchase.
- Split Into Train/Test
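The training set should be a random selection of 80 percent of the original data.
The testing set should be the remaining 20 percent.
Because x and y were generated randomly, taking the first 80 values serves as a random selection here.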
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
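The slicing above works because the data was generated in random order. For a data set whose rows come in some meaningful order (for example, sorted by date), one option is to shuffle the indices before splitting. A minimal sketch, assuming NumPy's `default_rng` generator and a hypothetical seed of 0 (neither appears in the original post):

```python
import numpy

# Same synthetic data as in the post.
numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

# Assumption: a separate seeded generator used only for shuffling.
rng = numpy.random.default_rng(0)
indices = rng.permutation(len(x))  # the row indices 0..99 in random order

# Integer arithmetic keeps the 80 % cut-off exact.
split = len(x) * 80 // 100
train_idx, test_idx = indices[:split], indices[split:]

# Index x and y with the same indices so each pair stays together.
train_x, train_y = x[train_idx], y[train_idx]
test_x, test_y = x[test_idx], y[test_idx]

print(len(train_x), len(test_x))  # 80 20
```

Indexing both arrays with the same shuffled indices keeps every (x, y) pair intact, which plain independent shuffling of x and y would not.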
Display the training set
plt.scatter(train_x, train_y)
plt.show()
Result
Display the testing set
plt.scatter(test_x, test_y)
plt.show()
Result
- R2
R2 (R-squared) measures how well the model's predictions match the data. In this case we would like to measure the relationship between the minutes a customer stays in the shop and how much money they spend.
The sklearn module has a function called r2_score() that computes this value.
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
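# Fit a degree-4 polynomial to the training set
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
# Score the model on the same data it was trained on
r2 = r2_score(train_y, mymodel(train_x))
print(r2)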
0.035400891945391755
- Bring in the testing set
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
# Fit on the training set only
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
# Score the model on the testing set, which it has not seen
r2 = r2_score(test_y, mymodel(test_x))
print(r2)
-3.0082394488149955
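To read the two scores above, it may help to see what R2 computes: 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below is a plain-Python illustration of that formula, not sklearn's own implementation; the function name `r2_manual` and the toy numbers are hypothetical.

```python
def r2_manual(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot (illustrative helper, not sklearn code)."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


perfect = r2_manual([1, 2, 3], [1, 2, 3])    # predictions match exactly
mean_only = r2_manual([1, 2, 3], [2, 2, 2])  # always predict the mean
worse = r2_manual([1, 2, 3], [3, 2, 1])      # worse than the mean

print(perfect, mean_only, worse)  # 1.0 0.0 -3.0
```

A score of 1 means a perfect fit and 0 means the model does no better than the mean. A score below 0, like the testing result above, means the predictions are worse than simply predicting the mean of the testing set.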
- Predict Values
Example
How much money will a buying customer spend, if she or he stays in the shop for 5 minutes?
print(mymodel(5))
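Calling the model with a number evaluates the fitted polynomial at that x value. A small sketch on hypothetical toy data (the points below follow y = x² + 1 exactly and are not from the post) shows that a poly1d model can be called with a single value or with an array of values:

```python
import numpy

# Hypothetical toy data that follows y = x**2 + 1 exactly.
xs = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = xs ** 2 + 1

# Fit a degree-2 polynomial, the same way mymodel is built in the post.
toy_model = numpy.poly1d(numpy.polyfit(xs, ys, 2))

single = toy_model(5)                          # evaluate at one point
several = toy_model(numpy.array([5.0, 6.0]))   # evaluate at several points

print(single)
print(several)
```

Here x = 5 lies just outside the fitted range of 0 to 4, and the result stays close to 5² + 1 = 26 only because the toy data is an exact polynomial. With noisy data, predictions far from the training x values are less reliable.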