Linear regression (Machine Learning; Mon Apr 21, 2008)
Motivation
Motivation: given a newly observed predictor value, what prediction should we make for the target?
Motivation. Problem: we want a general way of obtaining a distribution p(x, t) fitted to observed data. If we don't try to interpret the distribution, then any distribution with non-zero density at the data points will do. We will use the theory from last week to construct generic approaches to learning distributions from data. In this lecture: linear (normal/Gaussian) models.
Linear Gaussian models. In a linear Gaussian model, we model p(x, t) through a conditional Gaussian distribution whose x-dependent mean is linear in a set of weights w:

p(t | x, w, β) = N(t | y(x, w), β⁻¹),

where the mean y(x, w) is linear in w and β is the noise precision.
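To make this concrete, here is a minimal numpy sketch (not from the original slides) of sampling from such a model; the weight values and the noise precision are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear Gaussian model: t | x ~ N(w0 + w1*x, 1/beta).
w = np.array([-0.3, 0.5])   # weights (illustrative values)
beta = 25.0                 # noise precision, beta = 1/sigma^2

def conditional_mean(x, w):
    """Mean of t given x: linear in the weights w."""
    return w[0] + w[1] * x

# Sample N observations from the model.
N = 20
x = rng.uniform(-1.0, 1.0, size=N)
t = conditional_mean(x, w) + rng.normal(0.0, 1.0 / np.sqrt(beta), size=N)
```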
Example
General linear in input: y(x, w) = w_0 + w_1 x_1 + … + w_D x_D, or, adding a pseudo-input x_0 = 1, simply y(x, w) = wᵀx.
Non-linear in input (but still linear in the weights). But remember that we know neither the “true” underlying function nor the noise around it...
General linear model: y(x, w) = Σ_j w_j φ_j(x) = wᵀφ(x), where the φ_j are basis functions, sometimes called “features”.
Examples of basis functions: polynomials φ_j(x) = xʲ, Gaussians φ_j(x) = exp(−(x − μ_j)²/(2s²)), and sigmoids φ_j(x) = σ((x − μ_j)/s).
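A sketch of the three basis-function families in numpy; the centres and scales below are arbitrary choices for illustration:

```python
import numpy as np

def polynomial_basis(x, M):
    """phi_j(x) = x**j for j = 0, ..., M-1."""
    return np.vander(x, M, increasing=True)

def gaussian_basis(x, centres, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * s ** 2))

def sigmoid_basis(x, centres, s):
    """phi_j(x) = 1 / (1 + exp(-(x - mu_j) / s))."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centres[None, :]) / s))

# Each function maps N inputs to an N x M design matrix Phi.
x = np.linspace(-1.0, 1.0, 5)
Phi = gaussian_basis(x, centres=np.linspace(-1, 1, 4), s=0.5)
```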
Estimating parameters. Observed data: inputs x_1, …, x_N with targets t = (t_1, …, t_N). Log likelihood:

ln p(t | w, β) = (N/2) ln β − (N/2) ln 2π − β E(w),  with E(w) = ½ Σ_n (t_n − wᵀφ(x_n))².

Maximizing with respect to w means minimizing E, the (sum-of-squares) error function.
Estimating parameters. Setting the gradient with respect to w to zero and solving gives the normal equations

w_ML = (ΦᵀΦ)⁻¹ Φᵀ t,

where Φ is the N × M design matrix with entries Φ_nj = φ_j(x_n). Similarly, 1/β_ML = (1/N) Σ_n (t_n − w_MLᵀφ(x_n))².

Notice: this is not just pure mathematics but an actual algorithm for estimating (learning) the parameters!
Estimating parameters. The slides demonstrated this algorithm implemented in C (with GSL and CBLAS) and in Octave/Matlab.
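Those listings exist only as images in the slides and are not reproduced here; as a stand-in, a minimal numpy sketch of the same normal-equations computation (the data and basis below are invented for the example):

```python
import numpy as np

def fit_ml(Phi, t):
    """Maximum likelihood fit: w_ml = (Phi^T Phi)^{-1} Phi^T t.

    Uses a least-squares solver rather than an explicit matrix
    inverse for numerical stability.
    """
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    residuals = t - Phi @ w_ml
    beta_ml = 1.0 / np.mean(residuals ** 2)  # ML noise precision
    return w_ml, beta_ml

# Illustrative data and a cubic polynomial basis.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=30)
t = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=30)
Phi = np.vander(x, 4, increasing=True)
w_ml, beta_ml = fit_ml(Phi, t)
```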
Geometrical interpretation. Geometrically, y = Φw_ML is the orthogonal projection of t onto the subspace spanned by the feature columns of Φ:
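This can be checked numerically; a small sketch with an arbitrary full-rank design matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(10, 3))  # arbitrary full-rank design matrix
t = rng.normal(size=10)

w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y = Phi @ w_ml

# Orthogonal projection of t onto the column space of Phi.
P = Phi @ np.linalg.inv(Phi.T @ Phi) @ Phi.T
assert np.allclose(y, P @ t)              # y is the projection of t
assert np.allclose(Phi.T @ (t - y), 0.0)  # residual is orthogonal to the features
```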
Bayesian linear regression. For the Bayesian approach we need a prior over the parameters w and β = 1/σ². The conjugate prior for a Gaussian is Gaussian:

p(w) = N(w | m_0, S_0),  giving the posterior  p(w | t) = N(w | m_N, S_N),

where m_N = S_N(S_0⁻¹m_0 + βΦᵀt) and S_N⁻¹ = S_0⁻¹ + βΦᵀΦ are functions of the observed values. The proof is not exactly like before, but similar, and uses the linearity results for Gaussians from section 2.3.3.
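For the common special case of a zero-mean isotropic prior p(w) = N(0, α⁻¹I), where α is an assumed fixed hyperparameter (the general m_0, S_0 case is analogous), the posterior can be computed directly; a sketch:

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for the prior N(w | 0, alpha^{-1} I):

    S_N^{-1} = alpha * I + beta * Phi^T Phi
    m_N      = beta * S_N Phi^T t
    """
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Illustrative usage with a cubic polynomial basis.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=25)
t = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=25)
Phi = np.vander(x, 4, increasing=True)
m_N, S_N = posterior(Phi, t, alpha=2.0, beta=25.0)
```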
Example
Bayesian linear regression. The predictive distribution for future observations is also Gaussian (again a result from 2.3.3):

p(t | x, t) = N(t | m_Nᵀφ(x), σ_N²(x)),  where σ_N²(x) = 1/β + φ(x)ᵀS_Nφ(x).

Both the mean and the variance of this distribution depend on the new input x!
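A sketch of the predictive computation, continuing with the same zero-mean isotropic prior as above (the setup is repeated so the example runs on its own; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=25)
t = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=25)
Phi = np.vander(x, 4, increasing=True)

alpha, beta = 2.0, 25.0
S_N = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predictive(phi_new, m_N, S_N, beta):
    """p(t | x, data) = N(t | m_N^T phi(x), 1/beta + phi(x)^T S_N phi(x))."""
    mean = m_N @ phi_new
    var = 1.0 / beta + phi_new @ S_N @ phi_new
    return mean, var

phi_new = np.array([1.0, 0.5, 0.25, 0.125])  # phi(x) at x = 0.5
mean, var = predictive(phi_new, m_N, S_N, beta)
# Both the mean and the variance vary with the query point.
```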
Example
Over-fitting. Problem: over-fitting is always a problem when we fit generic models to data. With nested models, the ML parameters will never prefer a simple model over a more complex model...
Maximum likelihood problems
Bayesian model selection. We can take a more Bayesian approach and select a model based on posterior model probabilities:

p(M_i | D) ∝ p(D | M_i) p(M_i).

The normalizing factor p(D) is the same for all models. The prior p(M_i) captures our preferences among the models; the likelihood p(D | M_i) captures the data's preferences among the models.
The marginal likelihood. The likelihood of a model is the integral over all of the model's parameters:

p(D | M_i) = ∫ p(D | w, M_i) p(w | M_i) dw,

which is also the normalizing factor for the parameter posterior:

p(w | D, M_i) = p(D | w, M_i) p(w | M_i) / p(D | M_i).
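For the linear Gaussian model with a zero-mean prior w ~ N(0, α⁻¹I), this integral has a closed form: the targets are jointly Gaussian, t ~ N(0, β⁻¹I + α⁻¹ΦΦᵀ). A numpy sketch of the resulting log marginal likelihood (α, β treated as known):

```python
import numpy as np

def log_marginal_likelihood(Phi, t, alpha, beta):
    """log p(t | alpha, beta) with the weights integrated out.

    For the prior w ~ N(0, alpha^{-1} I) the marginal of t is Gaussian
    with zero mean and covariance C = beta^{-1} I + alpha^{-1} Phi Phi^T.
    """
    N = len(t)
    C = np.eye(N) / beta + (Phi @ Phi.T) / alpha
    _, logdet = np.linalg.slogdet(C)
    quad = t @ np.linalg.solve(C, t)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)
```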
Implicit over-fitting penalty. Assume both the prior and the posterior are roughly box-shaped: the prior p(w) over a width Δw_prior, and the posterior, proportional to p(D | w)p(w), over a width Δw_posterior. The integral is then approximately “width” times “height”:

p(D) = ∫ p(D | w) p(w) dw ≈ p(D | w_MAP) · (Δw_posterior / Δw_prior).
Implicit over-fitting penalty. Taking logs, ln p(D) ≈ ln p(D | w_MAP) + ln(Δw_posterior / Δw_prior). The second term becomes increasingly negative as the posterior becomes “pointy” compared to the prior, and with M parameters the penalty is roughly M · ln(Δw_posterior / Δw_prior), increasing with the number of parameters M. Close fitting to the data is implicitly penalized, and the marginal likelihood is a trade-off between maximizing the first (data-fit) term and minimizing this penalty.
On average we prefer the true model. This doesn't mean we always prefer the simplest model! One can show that

∫ p(D | M_1) ln [ p(D | M_1) / p(D | M_2) ] dD ≥ 0,

with zero only when p(D | M_1) = p(D | M_2). The log ratio is negative when we prefer the second model and positive when we prefer the first, so when data is generated from M_1 we will not, on average, prefer the second model; i.e. on average the right model is the preferred model.
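The trade-off can be demonstrated numerically: comparing polynomial models of increasing degree by their evidence, the evidence typically peaks near the degree that generated the data and then declines. A sketch (the data-generating model and the hyperparameters α, β are invented, and the evidence computation from the previous sketch is repeated so this runs on its own):

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """log p(t | alpha, beta) for the prior w ~ N(0, alpha^{-1} I) (as above)."""
    N = len(t)
    C = np.eye(N) / beta + (Phi @ Phi.T) / alpha
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + t @ np.linalg.solve(C, t))

rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=40)
t = 0.5 - 0.3 * x + 0.8 * x**2 + rng.normal(0.0, 0.1, size=40)  # quadratic truth

# Compare polynomial models of increasing degree by their evidence.
for degree in range(0, 7):
    Phi = np.vander(x, degree + 1, increasing=True)
    print(degree, log_evidence(Phi, t, alpha=1.0, beta=100.0))
# The evidence typically peaks near degree 2: richer models fit no better
# but pay the implicit complexity penalty.
```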
Summary:
- Linear (Gaussian) models: a conditional Gaussian whose mean is linear in the weights over fixed basis functions.
- Maximum likelihood: minimizing the sum-of-squares error via the normal equations.
- Bayesian linear regression: a conjugate Gaussian prior gives a Gaussian posterior and a Gaussian predictive distribution.
- Maximum likelihood over-fits nested models; Bayesian model selection uses the marginal likelihood instead.
- The marginal likelihood implicitly penalizes complexity and on average prefers the true model.
