Machine Learning: Train/Test
To measure whether a model is good enough, we can use a method called Train/Test.
- Train/Test
Train/Test is a method for measuring the accuracy of a model. It is called Train/Test because you split the data set into two sets: a training set and a testing set.
Training the model means creating the model. Testing the model means measuring its accuracy.
A common split is 80% for training and 20% for testing.
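scikit-learn, which this page already uses for r2_score(), also provides train_test_split() in sklearn.model_selection, which shuffles the rows and splits them in one call. A minimal sketch of an 80/20 split with it, assuming the same randomly generated data as the example below; random_state=2 is an arbitrary choice made here only so the split is reproducible.

```python
import numpy
from sklearn.model_selection import train_test_split

numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

# test_size=0.2 keeps 20 % of the rows for testing; rows are shuffled before splitting,
# and each x value stays paired with its y value
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=2)

print(len(train_x), len(test_x))  # 80 20
```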
Example
Our data set describes 100 customers in a shop and their shopping habits.
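import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)  # fixed seed, so the random data is the same on every run
x = numpy.random.normal(1, 1, 100)  # 100 values with mean 1 and standard deviation 1
y = numpy.random.normal(50, 40, 100) / x
plt.scatter(x, y)
plt.show()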
Result: scatter plot of all 100 customers.
The x axis represents the number of minutes before making a purchase.
The y axis represents the amount of money spent on the purchase.
- Split Into Train/Test
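The training set should be a random selection of 80 percent of the original data. Because x and y are generated randomly, taking the first 80 values is already a random selection.
The testing set should be the remaining 20 percent.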
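Slicing works here only because the data is already in random order. If the rows were stored in some order, for example sorted by minutes, the first 80 rows would not be a random sample. A minimal sketch that shuffles the row indices before splitting, using numpy.random.default_rng(); the helper name random_split and its parameters are hypothetical.

```python
import numpy

def random_split(x, y, train_fraction=0.8, seed=2):
    """Hypothetical helper: shuffle the row indices, then split them into train/test."""
    rng = numpy.random.default_rng(seed)
    indices = rng.permutation(len(x))        # 0 .. len(x)-1 in random order
    cut = round(len(x) * train_fraction)     # 80 for 100 rows
    train_idx, test_idx = indices[:cut], indices[cut:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Data sorted by x: here x[:80] would contain only the 80 smallest values
x = numpy.sort(numpy.random.default_rng(0).normal(1, 1, 100))
y = 2 * x
train_x, train_y, test_x, test_y = random_split(x, y)
print(len(train_x), len(test_x))  # 80 20
```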
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
Display the training set
plt.scatter(train_x, train_y)
plt.show()
Result: scatter plot of the training set.
Display the testing set
plt.scatter(test_x, test_y)
plt.show()
Result: scatter plot of the testing set.
- R2
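R2, also called R-squared, measures how well a model's predictions match the actual values. A value of 1 means a perfect fit; a value near 0, or below 0, means a poor fit. The sklearn module has a function called r2_score() that computes it.
In this case we would like to measure the relationship between the minutes a customer stays in the shop and how much money they spend. First, fit the model on the training set and check how well it fits that same data.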
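r2_score() computes the coefficient of determination, R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between the actual and predicted values, and SS_tot is the sum of squared differences between the actual values and their mean. A minimal hand-written sketch of that formula; the name r_squared is hypothetical, and it assumes the actual values are not all equal (otherwise SS_tot is 0).

```python
def r_squared(actual, predicted):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # squared error of the model
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # squared error of always guessing the mean
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))          # 1.0: perfect predictions
print(r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than the mean
print(r_squared([1, 2, 3, 4], [4, 3, 2, 1]))          # -3.0: worse than the mean
```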
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
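mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))  # 4th-degree polynomial fitted to the training data
r2 = r2_score(train_y, mymodel(train_x))
print(r2)
0.035400891945391755
The result is close to 0, which means the model explains very little of the variation in the training data.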
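numpy.polyfit(x, y, deg) returns the least-squares polynomial coefficients, highest power first, and numpy.poly1d wraps them in a callable object, which is why mymodel(train_x) returns predictions. A minimal sketch on made-up, noise-free data where the right answer is known (y = 2x² + 1), so the fitted coefficients can be checked.

```python
import numpy

x = numpy.array([0, 1, 2, 3, 4, 5], dtype=float)
y = 2 * x ** 2 + 1                       # known polynomial, no noise

coefficients = numpy.polyfit(x, y, 2)    # highest power first: approximately [2, 0, 1]
model = numpy.poly1d(coefficients)       # callable: model(v) evaluates the polynomial at v

print(numpy.round(coefficients, 6))
print(model(6))                          # approximately 2 * 36 + 1 = 73
```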
- Bring in the testing set
Now check whether the model also fits the testing data, which it has not seen during training.
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(test_y, mymodel(test_x))
print(r2)
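-3.0082394488149955
A negative R2 means the model predicts the testing data worse than simply guessing the average of test_y, so this model does not fit the data well.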
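The drop from about 0.035 on the training set to about -3 on the testing set is the kind of gap the Train/Test method is meant to expose. A hedged sketch for exploring it further: refit the polynomial with several degrees and compare training and testing R2 side by side. The range of degrees 1 to 6 and the results dictionary are arbitrary choices made for this illustration, and the printed values are not reproduced here.

```python
import numpy
from sklearn.metrics import r2_score

numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x

train_x, train_y = x[:80], y[:80]
test_x, test_y = x[80:], y[80:]

# For each degree, compare the fit on the training data with the fit on unseen data
results = {}
for degree in range(1, 7):
    model = numpy.poly1d(numpy.polyfit(train_x, train_y, degree))
    train_r2 = r2_score(train_y, model(train_x))
    test_r2 = r2_score(test_y, model(test_x))
    results[degree] = (train_r2, test_r2)
    print(degree, round(train_r2, 3), round(test_r2, 3))
```

Training R2 can only stay the same or grow as the degree increases, because a higher-degree polynomial can always reproduce a lower-degree one; testing R2 has no such guarantee, which is why it is the number to watch.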
- Predict Values
Example
How much money will a customer spend if they stay in the shop for 5 minutes?
print(mymodel(5))
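Before trusting mymodel(5), it can help to check whether 5 lies inside the range of x values the model was trained on: a polynomial evaluated outside that range (extrapolation) can swing to extreme values. A minimal sketch, rebuilding the same training data and model as above; the helper name within_training_range is hypothetical.

```python
import numpy

numpy.random.seed(2)
x = numpy.random.normal(1, 1, 100)
y = numpy.random.normal(50, 40, 100) / x
train_x, train_y = x[:80], y[:80]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))

def within_training_range(value, train_x):
    """Hypothetical helper: True if value lies between the smallest and largest training x."""
    return train_x.min() <= value <= train_x.max()

minutes = 5
print("inside training range:", within_training_range(minutes, train_x))
print("prediction:", mymodel(minutes))
```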