Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Week 1: Linear Regression
Outline
Plotting real-valued predictions
Linear regression
Error function
Linear Regression
Predict real values (as opposed to discrete classes)
Simple machine learning prediction task
Assumes a linear relationship between the data and the target values
Scatter Plots
[Figure: scatter plot of y against x; x-axis ticks from 10 to 45, y-axis ticks from 10 to 40]
Linear Regression
Find the line that approximates the data as
closely as possible
$\hat{y} = a + b \cdot x$
where b is the slope, and a is the y-intercept
a and b should be chosen such that they
minimize the difference between the predicted
values and the values in the training data
Error Functions
There are a number of ways to define an error
function
Sum of absolute errors = $\sum_{i \in D} |y_i - (a + b x_i)|$
Sum of squared errors = $\sum_{i \in D} (y_i - (a + b x_i))^2$
where $y_i$ is the true value
Squared error is most commonly used
Task: Find the parameters a and b that
minimize the squared error over the training
data
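The two error functions can be sketched in a few lines of Python. This is only an illustration: the function names and the toy training set `D` are assumptions made for the example, not part of the lecture.

```python
def predict(a, b, x):
    """Prediction of the line with intercept a and slope b at input x."""
    return a + b * x


def sum_absolute_errors(a, b, data):
    """Sum of |y_i - (a + b*x_i)| over all (x_i, y_i) pairs in data."""
    return sum(abs(y - predict(a, b, x)) for x, y in data)


def sum_squared_errors(a, b, data):
    """Sum of (y_i - (a + b*x_i))^2 over all (x_i, y_i) pairs in data."""
    return sum((y - predict(a, b, x)) ** 2 for x, y in data)


# Hypothetical toy training set of (x, y) pairs, invented for illustration.
D = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]

# Errors of the candidate line y = 0 + 2x on D.
sae = sum_absolute_errors(0.0, 2.0, D)
sse = sum_squared_errors(0.0, 2.0, D)
print(sae, sse)
```

Squaring penalizes large deviations more heavily than small ones, which is one reason the two functions can prefer different lines on the same data.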
Error Functions
Normalized error functions:
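Mean squared error = $\frac{1}{|D|} \sum_{i \in D} (y_i - (a + b x_i))^2$
Relative squared error = $\frac{\sum_{i \in D} (y_i - (a + b x_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}$
where $\bar{y} = \frac{1}{|D|} \sum_{i \in D} y_i$
Root relative squared error = $\sqrt{\frac{\sum_{i \in D} (y_i - (a + b x_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}}$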
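A minimal Python sketch of the normalized error functions follows. The function names and the toy data are assumptions for illustration; the root relative squared error is taken to be the square root of the relative squared error, as its name suggests.

```python
import math


def squared_errors(a, b, data):
    """Sum of squared differences between true and predicted values."""
    return sum((y - (a + b * x)) ** 2 for x, y in data)


def mean_squared_error(a, b, data):
    """Sum of squared errors divided by the number of instances |D|."""
    return squared_errors(a, b, data) / len(data)


def relative_squared_error(a, b, data):
    """Squared error relative to that of always predicting the mean of y."""
    y_bar = sum(y for _, y in data) / len(data)
    baseline = sum((y - y_bar) ** 2 for _, y in data)
    # Note: baseline is zero when all y values are identical.
    return squared_errors(a, b, data) / baseline


def root_relative_squared_error(a, b, data):
    """Square root of the relative squared error."""
    return math.sqrt(relative_squared_error(a, b, data))


# Hypothetical toy training set of (x, y) pairs, invented for illustration.
D = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]

mse = mean_squared_error(0.0, 2.0, D)
rse = relative_squared_error(0.0, 2.0, D)
rrse = root_relative_squared_error(0.0, 2.0, D)
print(mse, rse, rrse)
```

A relative squared error below 1 means the line predicts better than the constant baseline $\bar{y}$; a value above 1 means it predicts worse.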
Minimizing Error Functions
There are roughly two ways:
• Try different parameter instantiations and see which
ones lead to the lowest error (search)
• Solve mathematically (closed form)
Most parameter estimation problems in machine
learning can only be solved by searching
For linear regression, we can solve it
mathematically
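The search option can be illustrated with a brute-force grid search over candidate values of a and b. This is a sketch under stated assumptions: the grid, the helper names, and the toy data are invented for the example, and practical search methods are usually far more efficient than trying every combination.

```python
def sse(a, b, data):
    """Sum of squared errors of the line a + b*x on data."""
    return sum((y - (a + b * x)) ** 2 for x, y in data)


def grid_search(data, a_values, b_values):
    """Try every (a, b) combination and keep the one with the lowest error."""
    best = None
    for a in a_values:
        for b in b_values:
            err = sse(a, b, data)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best


# Hypothetical toy data generated from y = 1 + 2x (no noise).
D = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Candidate parameter values from -5.0 to 5.0 in steps of 0.5.
grid = [i * 0.5 for i in range(-10, 11)]

best_err, best_a, best_b = grid_search(D, grid, grid)
print(best_err, best_a, best_b)
```

The search only finds the exact optimum here because the true parameters happen to lie on the grid; in general the result is limited by the grid's range and resolution.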
Minimizing SSE
SSE = $\sum_{i \in D} (y_i - (a + b x_i))^2$
Take the partial derivatives with respect to a
and b
Set each partial derivative equal to zero and
solve for a and b respectively
The resulting values for a and b minimize the
squared error and can be used to predict unseen
data instances
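The steps above can be written out as follows. This is the standard least-squares derivation, sketched here for reference; $\bar{x}$ denotes the mean of the $x_i$ over $D$, defined analogously to $\bar{y}$.

```latex
\begin{align*}
\frac{\partial\,\mathrm{SSE}}{\partial a}
  &= -2 \sum_{i \in D} \bigl(y_i - (a + b x_i)\bigr) = 0
  &&\Rightarrow\quad a = \bar{y} - b\,\bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial b}
  &= -2 \sum_{i \in D} x_i \bigl(y_i - (a + b x_i)\bigr) = 0
  &&\Rightarrow\quad \sum_{i \in D} x_i y_i - a \sum_{i \in D} x_i - b \sum_{i \in D} x_i^2 = 0
\end{align*}
Substituting $a = \bar{y} - b\,\bar{x}$ into the second equation,
using $\sum_{i \in D} x_i = |D|\,\bar{x}$ and $\sum_{i \in D} y_i = |D|\,\bar{y}$,
and multiplying by $|D|$ gives
\[
b = \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}
         {|D| \sum_{i \in D} x_i^2 - \bigl(\sum_{i \in D} x_i\bigr)^2}
\]
```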
Applying Linear Regression
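For a given training set we first compute b:
$b = \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}{|D| \sum_{i \in D} x_i^2 - (\sum_{i \in D} x_i)^2}$
and then a, using the value computed for b:
$a = \bar{y} - b \bar{x}$
where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$ over $D$
For any new instance x (i.e. an instance that
was not in the training set), the predicted value
is: $a + b x$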
Extendible to multi-valued functions
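The closed-form computation of b and a can be sketched directly in Python. The function name `fit_line` and the toy training set are assumptions for illustration; the arithmetic follows the formulas on this slide.

```python
def fit_line(data):
    """Return (a, b) minimizing the sum of squared errors on data.

    data is a list of (x, y) pairs; b is computed first, then a.
    """
    n = len(data)
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    sum_xy = sum(x * y for x, y in data)
    sum_xx = sum(x * x for x, _ in data)

    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        # All x values are identical, so no unique slope exists.
        raise ValueError("slope is undefined when all x values are equal")

    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)  # a = mean(y) - b * mean(x)
    return a, b


# Hypothetical toy training set of (x, y) pairs, invented for illustration.
D = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]

a, b = fit_line(D)

# Predicted value for a new instance x = 5 that is not in the training set.
prediction = a + b * 5.0
print(a, b, prediction)
```

On this toy set the sums are $\sum x_i = 10$, $\sum y_i = 20$, $\sum x_i y_i = 59.5$ and $\sum x_i^2 = 30$, so b = 38/20 = 1.9 and a = 5 - 1.9 · 2.5 = 0.25.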
Linear Regression
Used to predict real-number values, given
numerical input variables
Parameters can be estimated analytically (i.e.
by applying some mathematics), which won’t be
the case for most parameter estimation
algorithms we’ll see later on
Extendible to non-linear functions, e.g.
log-linear regression
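One way to read the log-linear extension is to fit a line to ln(y) instead of y, so that the model is ln(y) = a + b·x. That reading is an assumption here (the slides do not spell it out), and the helper names and toy data below are invented for illustration. It requires all y values to be positive.

```python
import math


def fit_line(data):
    """Least-squares intercept a and slope b for a list of (x, y) pairs."""
    n = len(data)
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    sum_xy = sum(x * y for x, y in data)
    sum_xx = sum(x * x for x, _ in data)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b


def fit_log_linear(data):
    """Fit ln(y) = a + b*x by ordinary linear regression on (x, ln y)."""
    return fit_line([(x, math.log(y)) for x, y in data])


def predict_log_linear(a, b, x):
    """Map the linear prediction for ln(y) back to the original scale."""
    return math.exp(a + b * x)


# Hypothetical data generated from y = exp(0.5 + 0.3x), without noise.
D = [(float(x), math.exp(0.5 + 0.3 * x)) for x in range(5)]

a, b = fit_log_linear(D)
pred = predict_log_linear(a, b, 5.0)
print(a, b, pred)
```

Note that this minimizes the squared error of ln(y), not of y itself, so the fitted curve can differ from one obtained by minimizing the error on the original scale.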
Correlation
So far we have used linear regression to predict
target values (prediction)
Linear regression can also be used to determine
how closely two variables are correlated
(description)
The smaller the error of the fitted line, the stronger the
correlation between the variables
Correlation does mean that there is some
(interesting) relation between the variables (not
necessarily causal)
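The link between error and correlation can be made concrete with the Pearson correlation coefficient r, which the slides do not name; it is brought in here as an assumption about what "correlation" refers to. For the least-squares line, the relative squared error equals 1 - r², so a smaller normalized error corresponds to a stronger correlation. The sketch below checks this on invented toy data.

```python
import math


def centered_sums(data):
    """Return (sum of dx*dy, sum of dx^2, sum of dy^2, mean x, mean y)."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x, _ in data)
    s_yy = sum((y - y_bar) ** 2 for _, y in data)
    return s_xy, s_xx, s_yy, x_bar, y_bar


def pearson_r(data):
    """Pearson correlation coefficient between x and y."""
    s_xy, s_xx, s_yy, _, _ = centered_sums(data)
    return s_xy / math.sqrt(s_xx * s_yy)


def fit_line(data):
    """Least-squares line; this slope form is algebraically equivalent
    to the |D|-based formula for b."""
    s_xy, s_xx, _, x_bar, y_bar = centered_sums(data)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def relative_squared_error(a, b, data):
    """Squared error of the line relative to predicting the mean of y."""
    _, _, s_yy, _, _ = centered_sums(data)
    return sum((y - (a + b * x)) ** 2 for x, y in data) / s_yy


# Hypothetical toy data set of (x, y) pairs, invented for illustration.
D = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]

r = pearson_r(D)
a, b = fit_line(D)
rse = relative_squared_error(a, b, D)
print(r, rse, 1 - r ** 2)
```

The unnormalized sum of squared errors depends on the scale of y, so it is the relative error, not the raw one, that is comparable across data sets.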
Recap
Linear regression
Error functions
Analytical parameter estimation
