Data Science Explorer

Machine Learning: Train/Test

grace21110 2023. 11. 18. 22:10

To measure whether a model is good enough, we can use a method called Train/Test.

 

  • Train/Test

Train/Test is a method for measuring the accuracy of a model. It is called Train/Test because you split the data set into two parts: a training set and a testing set.

 

Training the model means creating the model; testing the model means measuring its accuracy.

A common split is 80% of the data for training and 20% for testing.

 

Example 

Our data set represents 100 customers in a shop and their shopping habits.

import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)

x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

plt.scatter(x, y)
plt.show()

Result 

The x axis represents the number of minutes before making a purchase.

The y axis represents the amount of money spent on the purchase.

 

  • Split Into Train/Test

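The training set should be a random selection of 80 percent of the original data.

The testing set should be the remaining 20 percent. Because x and y were generated randomly, taking the first 80 values already gives a random selection.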

train_x = x[:80]
train_y = y[:80]

test_x = x[80:]
test_y = y[80:]
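
The slices above take the first 80 values, which works here only because the data was generated in random order. For data that arrives sorted or grouped, a shuffled split is usually safer. A minimal sketch, assuming NumPy's default_rng and a hypothetical helper named shuffled_split (neither appears in the original example):

```python
import numpy

def shuffled_split(x, y, train_fraction=0.8, seed=2):
    """Hypothetical helper: shuffle the indices, then cut at train_fraction."""
    rng = numpy.random.default_rng(seed)
    order = rng.permutation(len(x))                # random ordering of 0..len(x)-1
    cut = int(round(len(x) * train_fraction))      # 80 of 100 rows with the defaults
    train_idx, test_idx = order[:cut], order[cut:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

train_x, train_y, test_x, test_y = shuffled_split(x, y)
print(len(train_x), len(test_x))
```

A split produced this way contains different rows than the slice-based split, so any scores computed from it will not match the numbers reported in this post.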

 

Display the training set 

plt.scatter(train_x, train_y)

plt.show()

 

Result 

Display the testing set

plt.scatter(test_x, test_y)
plt.show()

Result

  • R2

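In this case we would like to measure the relationship between the minutes a customer stays in the shop and how much money they spend.

R2 (R-squared) measures how well the model fits the data: 1 means a perfect fit, and 0 means the model does no better than always predicting the average of y. The sklearn module has a function called r2_score() that will help us find this relationship.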

import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)

x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

train_x = x[:80]
train_y = y[:80]

test_x = x[80:]
test_y = y[80:]

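# Fit a degree-4 polynomial regression to the training set
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))

# Score the model on the same data it was trained on
r2 = r2_score(train_y, mymodel(train_x))

print(r2)

Result

0.035400891945391755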
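
To make the score easier to interpret, R2 can be written out by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is an illustration only, not sklearn's source code; manual_r2 is a hypothetical name and the toy arrays are made up for the example.

```python
import numpy

def manual_r2(y_true, y_pred):
    """Hypothetical helper: R2 = 1 - SS_res / SS_tot."""
    y_true = numpy.asarray(y_true, dtype=float)
    y_pred = numpy.asarray(y_pred, dtype=float)
    ss_res = numpy.sum((y_true - y_pred) ** 2)           # error left over after the model
    ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)    # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = numpy.array([1.0, 2.0, 3.0, 4.0])   # toy values, not from the shop data

perfect = manual_r2(y_true, y_true)                          # predictions match exactly
baseline = manual_r2(y_true, numpy.full(4, y_true.mean()))   # always predict the mean
worse = manual_r2(y_true, y_true[::-1])                      # predictions in reverse order

print(perfect, baseline, worse)
```

The three cases give 1.0, 0.0 and -3.0. The last one shows that a model whose errors are larger than those of the plain average gets a score below 0.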

 

  • Bring in the testing set

import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)

x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

train_x = x[:80]
train_y = y[:80]

test_x = x[80:]
test_y = y[80:]

mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))

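# Score the model on data it has not seen during training
r2 = r2_score(test_y, mymodel(test_x))

print(r2)

Result

-3.0082394488149955

A negative score means the model predicts the testing set worse than simply using the average of test_y, so this particular model does not generalize to new data.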

 

  • Predict Values 

Example 

How much money will a customer spend if they stay in the shop for 5 minutes?

print(mymodel(5))
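
A polynomial model is only fitted to the x values it saw during training, and x here is drawn from a normal distribution centred on 1, so 5 minutes may lie outside that range. The sketch below checks this before trusting a prediction. The in_training_range helper is a hypothetical name, and the check itself is an added precaution rather than part of the original example.

```python
import numpy

numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

train_x, train_y = x[:80], y[:80]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))

def in_training_range(value, seen_x):
    """Hypothetical helper: True if value lies between the smallest and largest training x."""
    return bool(seen_x.min() <= value <= seen_x.max())

minutes = 5
if in_training_range(minutes, train_x):
    print("Prediction:", mymodel(minutes))
else:
    # Outside the fitted range the polynomial is extrapolating, so treat the number with caution
    print("Extrapolated prediction:", mymodel(minutes))
```

High-degree polynomials can swing sharply outside the range they were fitted on, so a prediction flagged as extrapolated is best read as a rough indication at most.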

 
