This document outlines linear regression, which is a machine learning technique for predicting real-valued outputs based on numerical input variables. It assumes a linear relationship between the inputs and outputs. Linear regression finds the linear equation that best fits the training data by minimizing a sum of squared errors function. The parameters of the linear equation can be estimated analytically through differentiation and solving for when the partial derivatives are equal to zero.
Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Week 1: Linear Regression
Outline
Plotting real-valued predictions
Linear regression
Error function
Linear Regression
• Predict real values (as opposed to discrete classes)
• Simple machine learning prediction task
• Assumes a linear correlation between the data and the target values
Scatter Plots
[Scatter plot of the training data: x (horizontal axis, 10–45) against y (vertical axis, 10–40)]
Linear Regression
• Find the line that approximates the data as closely as possible:
  ŷ = a + b·x
  where b is the slope and a is the y-intercept
• a and b should be chosen such that they minimize the difference between the predicted values and the values in the training data
Error Functions
• There are a number of ways to define an error function:
  Sum of absolute errors = ∑_{i∈D} |yᵢ − (a + bxᵢ)|
  Sum of squared errors = ∑_{i∈D} (yᵢ − (a + bxᵢ))²
  where yᵢ is the true value
• Squared error is the most commonly used
• Task: find the parameters a and b that minimize the squared error over the training data
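These two error functions can be sketched directly in Python; the data points and the candidate line (a = 1, b = 2) below are made-up illustration values:

```python
# Error of a candidate line y_hat = a + b*x over a data set.

def sum_absolute_errors(xs, ys, a, b):
    """Sum over the data of |y_i - (a + b*x_i)|."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))

def sum_squared_errors(xs, ys, a, b):
    """Sum over the data of (y_i - (a + b*x_i))^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.9, 5.1, 7.0, 9.2]   # roughly y = 1 + 2x, with a little noise
sae = sum_absolute_errors(xs, ys, 1.0, 2.0)
sse = sum_squared_errors(xs, ys, 1.0, 2.0)
```

Note how squaring weights large residuals more heavily than the absolute-error variant, which is one reason the squared form is preferred.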
Error Functions
Normalized error functions:
• Mean squared error = (1/|D|) ∑_{i∈D} (yᵢ − (a + bxᵢ))²
• Relative squared error = ∑_{i∈D} (yᵢ − (a + bxᵢ))² / ∑_{i∈D} (yᵢ − ȳ)²
  where ȳ = (1/|D|) ∑_{i∈D} yᵢ
• Root relative squared error = √( ∑_{i∈D} (yᵢ − (a + bxᵢ))² / ∑_{i∈D} (yᵢ − ȳ)² )
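The normalized variants can be sketched in plain Python (no particular library assumed):

```python
import math

def mse(xs, ys, a, b):
    """Mean squared error: SSE divided by the number of data points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def relative_squared_error(xs, ys, a, b):
    """SSE of the line, divided by the SSE of always predicting the mean of y."""
    y_bar = sum(ys) / len(ys)
    sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sse_mean = sum((y - y_bar) ** 2 for y in ys)
    return sse_line / sse_mean

def root_relative_squared_error(xs, ys, a, b):
    """Square root of the relative squared error."""
    return math.sqrt(relative_squared_error(xs, ys, a, b))
```

A relative squared error below 1 means the line predicts better than the baseline of always guessing ȳ.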
Minimizing Error Functions
There are roughly two ways:
• Try different parameter instantiations and see which ones lead to the lowest error (search)
• Solve mathematically (closed form)
Most parameter estimation problems in machine learning can only be solved by searching
For linear regression, we can solve it mathematically
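The search option can be illustrated with a naive grid search over candidate (a, b) pairs; this is only a sketch, and the grid bounds and step size are made-up illustration values:

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def grid_search(xs, ys, a_values, b_values):
    """Try every (a, b) pair on the grid; keep the one with the lowest SSE."""
    return min(((a, b) for a in a_values for b in b_values),
               key=lambda ab: sse(xs, ys, ab[0], ab[1]))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]                # exactly y = 1 + 2x
grid = [i / 10 for i in range(-50, 51)]  # candidates -5.0 .. 5.0, step 0.1
a, b = grid_search(xs, ys, grid, grid)
```

Even this tiny example evaluates over 10,000 parameter pairs, which hints at why a closed-form solution is preferable when one exists.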
Minimizing SSE
SSE = ∑_{i∈D} (yᵢ − (a + bxᵢ))²
• Take the partial derivatives with respect to a and b
• Set each partial derivative equal to zero and solve for a and b respectively
• The resulting values for a and b minimize the error and can be used to predict unseen data instances
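Written out, these two steps give the so-called normal equations (a standard derivation, included here for completeness):

```latex
\frac{\partial\,\mathrm{SSE}}{\partial a}
  = -2 \sum_{i \in D} \bigl(y_i - (a + b x_i)\bigr) = 0
\qquad
\frac{\partial\,\mathrm{SSE}}{\partial b}
  = -2 \sum_{i \in D} x_i \bigl(y_i - (a + b x_i)\bigr) = 0
```

Solving the first equation for a gives a = ȳ − b·x̄; substituting this into the second and solving for b yields the closed-form expression on the next slide.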
Applying Linear Regression
For a given training set we first compute b:

b = ( |D| ∑_{i∈D} xᵢyᵢ − ∑_{i∈D} xᵢ ∑_{i∈D} yᵢ ) / ( |D| ∑_{i∈D} xᵢ² − (∑_{i∈D} xᵢ)² )

and then a, using the value computed for b:

a = ȳ − b·x̄

For any new instance x (i.e. an instance that was not in the training set), the predicted value is a + bx
Extendible to multiple input variables (multivariate linear regression)
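A minimal sketch of the closed-form fit, assuming plain Python lists for the training data; the example data is made up:

```python
def fit_linear_regression(xs, ys):
    """Closed-form least-squares estimates of intercept a and slope b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)   # a = y_bar - b * x_bar
    return a, b

def predict(a, b, x):
    """Predicted value for a new instance x."""
    return a + b * x

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]       # exactly y = 1 + 2x
a, b = fit_linear_regression(xs, ys)
```

On this noise-free data the fit recovers the generating line exactly; with noisy data it returns the least-squares line instead.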
Linear Regression
• Used to predict real-valued outputs, given numerical input variables
• Parameters can be estimated analytically (i.e. by applying some mathematics), which won't be the case for most parameter estimation algorithms we'll see later on
• Extendible to non-linear functions, e.g. log-linear regression
Correlation
• So far we have used linear regression to predict target values (prediction)
• Linear regression can also be used to determine how closely two variables are correlated (description)
• The smaller the error, the stronger the correlation between the variables
• Correlation does mean that there is some (interesting) relation between the variables, but not necessarily a causal one
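For simple linear regression this link between error and correlation can be made precise: the squared Pearson correlation r² equals 1 minus the relative squared error of the least-squares line. A small numerical check (helper names and data are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

def rse_of_best_fit(xs, ys):
    """Fit the least-squares line, then return its relative squared error."""
    n = len(xs)
    b = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
        (n * sum(x * x for x in xs) - sum(xs) ** 2)
    a = sum(ys) / n - b * sum(xs) / n
    y_bar = sum(ys) / n
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return sse / tss

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.8, 5.3, 6.9, 9.4, 10.6]   # noisy version of y = 1 + 2x
r = pearson_r(xs, ys)
rse = rse_of_best_fit(xs, ys)      # r**2 equals 1 - rse for this fit
```

So a small relative squared error of the fitted line is exactly a strong (positive or negative) linear correlation.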