This document outlines linear regression, which is a machine learning technique for predicting real-valued outputs based on numerical input variables. It assumes a linear relationship between the inputs and outputs. Linear regression finds the linear equation that best fits the training data by minimizing a sum of squared errors function. The parameters of the linear equation can be estimated analytically through differentiation and solving for when the partial derivatives are equal to zero.
Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Week 1: Linear Regression
Outline
Plotting real-valued predictions
Linear regression
Error function
Linear Regression
• Predict real values (as opposed to discrete classes)
• Simple machine learning prediction task
• Assumes a linear correlation between the data and the target values
Scatter Plots
[Scatter plot of the training data: x (horizontal axis, 10–45) against y (vertical axis, 10–40)]
Linear Regression
• Find the line that approximates the data as closely as possible:
  ŷ = a + b·x
  where b is the slope and a is the y-intercept
• a and b should be chosen such that they minimize the difference between the predicted values and the values in the training data
Error Functions
• There are a number of ways to define an error function:
  Sum of absolute errors = ∑_{i∈D} |yᵢ − (a + bxᵢ)|
  Sum of squared errors = ∑_{i∈D} (yᵢ − (a + bxᵢ))²
  where yᵢ is the true value
• Squared error is the most commonly used
• Task: find the parameters a and b that minimize the squared error over the training data
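These two error functions can be sketched directly in Python; the data points and the candidate line (a = 1, b = 2) below are made-up illustration values:

```python
# Error of a candidate line y_hat = a + b*x over a data set.

def sum_absolute_errors(xs, ys, a, b):
    """Sum over the data of |y_i - (a + b*x_i)|."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))

def sum_squared_errors(xs, ys, a, b):
    """Sum over the data of (y_i - (a + b*x_i))^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.9, 5.1, 7.0, 9.2]   # roughly y = 1 + 2x, with a little noise
sae = sum_absolute_errors(xs, ys, 1.0, 2.0)
sse = sum_squared_errors(xs, ys, 1.0, 2.0)
```

Note how squaring weights large residuals more heavily than the absolute-error variant, which is one reason the squared form is preferred.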
Error Functions
Normalized error functions:
• Mean squared error = (1/|D|) ∑_{i∈D} (yᵢ − (a + bxᵢ))²
• Relative squared error = ∑_{i∈D} (yᵢ − (a + bxᵢ))² / ∑_{i∈D} (yᵢ − ȳ)²
  where ȳ = (1/|D|) ∑_{i∈D} yᵢ
• Root relative squared error = √( ∑_{i∈D} (yᵢ − (a + bxᵢ))² / ∑_{i∈D} (yᵢ − ȳ)² )
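The normalized variants can be sketched in plain Python (no particular library assumed):

```python
import math

def mse(xs, ys, a, b):
    """Mean squared error: SSE divided by the number of data points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def relative_squared_error(xs, ys, a, b):
    """SSE of the line, divided by the SSE of always predicting the mean of y."""
    y_bar = sum(ys) / len(ys)
    sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sse_mean = sum((y - y_bar) ** 2 for y in ys)
    return sse_line / sse_mean

def root_relative_squared_error(xs, ys, a, b):
    """Square root of the relative squared error."""
    return math.sqrt(relative_squared_error(xs, ys, a, b))
```

A relative squared error below 1 means the line predicts better than the baseline of always guessing ȳ.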
Minimizing Error Functions
There are roughly two ways:
• Try different parameter instantiations and see which ones lead to the lowest error (search)
• Solve mathematically (closed form)
Most parameter estimation problems in machine learning can only be solved by searching
For linear regression, we can solve it mathematically
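The search option can be illustrated with a naive grid search over candidate (a, b) pairs; this is only a sketch, and the grid bounds and step size are made-up illustration values:

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def grid_search(xs, ys, a_values, b_values):
    """Try every (a, b) pair on the grid; keep the one with the lowest SSE."""
    return min(((a, b) for a in a_values for b in b_values),
               key=lambda ab: sse(xs, ys, ab[0], ab[1]))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]                # exactly y = 1 + 2x
grid = [i / 10 for i in range(-50, 51)]  # candidates -5.0 .. 5.0, step 0.1
a, b = grid_search(xs, ys, grid, grid)
```

Even this tiny example evaluates over 10,000 parameter pairs, which hints at why a closed-form solution is preferable when one exists.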
Minimizing SSE
SSE = ∑_{i∈D} (yᵢ − (a + bxᵢ))²
• Take the partial derivatives with respect to a and b
• Set each partial derivative equal to zero and solve for a and b respectively
• The resulting values for a and b minimize the error and can be used to predict unseen data instances
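Written out, these two steps give the so-called normal equations (a standard derivation, included here for completeness):

```latex
\frac{\partial\,\mathrm{SSE}}{\partial a}
  = -2 \sum_{i \in D} \bigl(y_i - (a + b x_i)\bigr) = 0
\qquad
\frac{\partial\,\mathrm{SSE}}{\partial b}
  = -2 \sum_{i \in D} x_i \bigl(y_i - (a + b x_i)\bigr) = 0
```

Solving the first equation for a gives a = ȳ − b·x̄; substituting this into the second and solving for b yields the closed-form expression on the next slide.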
Applying Linear Regression
For a given training set we first compute b:

b = ( |D| ∑_{i∈D} xᵢyᵢ − ∑_{i∈D} xᵢ ∑_{i∈D} yᵢ ) / ( |D| ∑_{i∈D} xᵢ² − (∑_{i∈D} xᵢ)² )

and then a, using the value computed for b:

a = ȳ − b·x̄

For any new instance x (i.e. an instance that was not in the training set), the predicted value is a + bx
Extendible to multiple input variables (multivariate linear regression)
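A minimal sketch of the closed-form fit, assuming plain Python lists for the training data; the example data is made up:

```python
def fit_linear_regression(xs, ys):
    """Closed-form least-squares estimates of intercept a and slope b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)   # a = y_bar - b * x_bar
    return a, b

def predict(a, b, x):
    """Predicted value for a new instance x."""
    return a + b * x

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]       # exactly y = 1 + 2x
a, b = fit_linear_regression(xs, ys)
```

On this noise-free data the fit recovers the generating line exactly; with noisy data it returns the least-squares line instead.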
Linear Regression
• Used to predict real-valued outputs, given numerical input variables
• Parameters can be estimated analytically (i.e. by applying some mathematics), which won't be the case for most parameter estimation algorithms we'll see later on
• Extendible to non-linear functions, e.g. log-linear regression
Correlation
• So far we have used linear regression to predict target values (prediction)
• Linear regression can also be used to determine how closely two variables are correlated (description)
• The smaller the error, the stronger the correlation between the variables
• Correlation does mean that there is some (interesting) relation between the variables, but not necessarily a causal one
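For simple linear regression this link between error and correlation can be made precise: the squared Pearson correlation r² equals 1 minus the relative squared error of the least-squares line. A small numerical check (helper names and data are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

def rse_of_best_fit(xs, ys):
    """Fit the least-squares line, then return its relative squared error."""
    n = len(xs)
    b = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
        (n * sum(x * x for x in xs) - sum(xs) ** 2)
    a = sum(ys) / n - b * sum(xs) / n
    y_bar = sum(ys) / n
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return sse / tss

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.8, 5.3, 6.9, 9.4, 10.6]   # noisy version of y = 1 + 2x
r = pearson_r(xs, ys)
rse = rse_of_best_fit(xs, ys)      # r**2 equals 1 - rse for this fit
```

So a small relative squared error of the fitted line is exactly a strong (positive or negative) linear correlation.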