Support Vector Machines for Regression
July 15, 2015
Overview
1. Linear Regression
2. Non-linear Regression and Kernels
Linear Regression Model
The linear regression model:
$$f(x) = x^T \beta + \beta_0$$
To estimate $\beta$, we consider minimization of
$$H(\beta, \beta_0) = \sum_{i=1}^{N} V(y_i - f(x_i)) + \frac{\lambda}{2}\|\beta\|^2$$
with a loss function $V$ and a regularization term $\frac{\lambda}{2}\|\beta\|^2$.
• How can SVM be applied to solve the linear regression problem?
Linear Regression Model (Cont)
The basic idea:
Given a training data set $(x_1, y_1), \dots, (x_N, y_N)$.
Target: find a function $f(x)$ that deviates from the targets $y_i$ by at most $\varepsilon$ for all the training data and, at the same time, is as flat (simple) as possible.
In other words, we do not care about errors as long as they are less than $\varepsilon$, but we will not accept any deviation larger than this.
Linear Regression Model (Cont)
• We want to find an "$\varepsilon$-tube" that contains all the samples.
• Intuitively, a tube with a small width tends to over-fit the training data. We should find the $f(x)$ whose $\varepsilon$-tube is as wide as possible (more generalization capability, less prediction error in the future).
• For a fixed $\varepsilon$, a bigger tube corresponds to a smaller $\|\beta\|$ (a flatter function).
• Optimization problem:
$$\min \ \frac{1}{2}\|\beta\|^2 \quad \text{s.t.} \quad y_i - f(x_i) \le \varepsilon, \quad f(x_i) - y_i \le \varepsilon$$
Linear Regression Model (Cont)
For a fixed $\varepsilon$, this problem is not always feasible, so we also want to allow some errors. Using slack variables $\xi_i, \xi_i^*$, the new optimization problem is:
$$\min \ \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}(\xi_i + \xi_i^*)$$
$$\text{s.t.} \quad y_i - f(x_i) \le \varepsilon + \xi_i^*, \quad f(x_i) - y_i \le \varepsilon + \xi_i, \quad \xi_i, \xi_i^* \ge 0$$
Linear Regression Model (Cont)
Let $\lambda = 1/C$. Use an "$\varepsilon$-insensitive" error measure, ignoring errors of size less than $\varepsilon$:
$$V_\varepsilon(r) = \begin{cases} 0 & \text{if } |r| < \varepsilon \\ |r| - \varepsilon & \text{otherwise.} \end{cases}$$
We then have the minimization of
$$H(\beta, \beta_0) = \sum_{i=1}^{N} V_\varepsilon(y_i - f(x_i)) + \frac{\lambda}{2}\|\beta\|^2$$
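As a quick check of the $\varepsilon$-insensitive measure, here is a minimal numpy sketch; the function name and the default $\varepsilon = 0.1$ are illustrative choices, not from the slides:

```python
import numpy as np

def eps_insensitive(r, eps=0.1):
    """V_eps(r): zero inside the eps-tube, |r| - eps outside (vectorized)."""
    return np.maximum(np.abs(r) - eps, 0.0)

# residuals inside the tube cost nothing; outside, the cost grows linearly
losses = eps_insensitive(np.array([0.05, -0.05, 0.3, -0.5]))
```

Unlike the quadratic loss, small residuals are ignored entirely, which is what makes sparse support-vector solutions possible.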
Linear Regression Model (Cont)
The Lagrange (primal) function:
$$L_P = \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}(\xi_i^* + \xi_i) - \sum_{i=1}^{N}\alpha_i^*(\varepsilon + \xi_i^* - y_i + x_i^T\beta + \beta_0) - \sum_{i=1}^{N}\alpha_i(\varepsilon + \xi_i + y_i - x_i^T\beta - \beta_0) - \sum_{i=1}^{N}(\eta_i^*\xi_i^* + \eta_i\xi_i)$$
which we minimize w.r.t. $\beta, \beta_0, \xi_i, \xi_i^*$. Setting the respective derivatives to 0, we get
$$0 = \sum_{i=1}^{N}(\alpha_i^* - \alpha_i), \qquad \beta = \sum_{i=1}^{N}(\alpha_i^* - \alpha_i)x_i, \qquad \alpha_i^{(*)} = C - \eta_i^{(*)} \quad \forall i$$
Linear Regression Model (Cont)
Substituting into the primal function, we obtain the dual optimization problem:
$$\max_{\alpha_i, \alpha_i^*} \; -\varepsilon\sum_{i=1}^{N}(\alpha_i^* + \alpha_i) + \sum_{i=1}^{N} y_i(\alpha_i^* - \alpha_i) - \frac{1}{2}\sum_{i,i'=1}^{N}(\alpha_i^* - \alpha_i)(\alpha_{i'}^* - \alpha_{i'})\langle x_i, x_{i'}\rangle$$
$$\text{s.t.} \quad 0 \le \alpha_i, \alpha_i^* \le C \ (= 1/\lambda), \quad \sum_{i=1}^{N}(\alpha_i^* - \alpha_i) = 0, \quad \alpha_i\alpha_i^* = 0$$
The solution function has the form
$$\hat\beta = \sum_{i=1}^{N}(\hat\alpha_i^* - \hat\alpha_i)x_i, \qquad \hat f(x) = \sum_{i=1}^{N}(\hat\alpha_i^* - \hat\alpha_i)\langle x, x_i\rangle + \beta_0$$
Linear Regression Model (Cont)
From the KKT conditions, we have
$$\hat\alpha_i^*(\varepsilon + \hat\xi_i^* - y_i + \hat f(x_i)) = 0$$
$$\hat\alpha_i(\varepsilon + \hat\xi_i + y_i - \hat f(x_i)) = 0$$
$$(C - \hat\alpha_i^*)\hat\xi_i^* = 0$$
$$(C - \hat\alpha_i)\hat\xi_i = 0$$
→ For all data points inside the $\varepsilon$-tube, $\hat\alpha_i = \hat\alpha_i^* = 0$. Only data points on or outside the tube may have $(\hat\alpha_i^* - \hat\alpha_i) \ne 0$.
→ We do not need all $x_i$ to describe $\beta$. The associated data points are called the support vectors.
Linear Regression Model (Cont)
Parameter $\varepsilon$ controls the width of the $\varepsilon$-insensitive tube. Its value affects the number of support vectors used to construct the regression function: the bigger $\varepsilon$, the fewer support vectors are selected, and the "flatter" the estimate.
It is associated with the choice of the loss function ($\varepsilon$-insensitive loss, quadratic loss, Huber loss, etc.).
Parameter $C$ ($= 1/\lambda$) determines the trade-off between the model complexity (flatness) and the degree to which deviations larger than $\varepsilon$ are tolerated.
It can be interpreted as a traditional regularization parameter, estimated for example by cross-validation.
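One rough way to see the roles of $\varepsilon$ and $C$ numerically is to minimize the penalized form $\frac{1}{2}\|\beta\|^2 + C\cdot\text{mean}\,V_\varepsilon(y_i - f(x_i))$ directly by subgradient descent. This is an illustrative stand-in for a proper QP solver; the function name, step-size schedule, hyperparameters, and toy data are all assumptions:

```python
import numpy as np

def fit_linear_svr(X, y, C=100.0, eps=0.1, lr=0.005, n_iter=20000):
    """Subgradient descent on 0.5*||beta||^2 + C * mean(V_eps(y - f(x)))."""
    N, d = X.shape
    beta, beta0 = np.zeros(d), 0.0
    for t in range(n_iter):
        r = y - (X @ beta + beta0)                 # residuals
        # subgradient of the eps-insensitive loss w.r.t. the prediction
        g = np.where(r > eps, -1.0, np.where(r < -eps, 1.0, 0.0))
        step = lr / np.sqrt(t + 1.0)               # diminishing step size
        beta = beta - step * (beta + C * (X.T @ g) / N)
        beta0 = beta0 - step * C * g.mean()
    return beta, beta0

# toy data: a noiseless line y = 2x on [0, 1]
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 2.0 * X.ravel()
beta, beta0 = fit_linear_svr(X, y, eps=0.1)
resid = np.abs(y - (X @ beta + beta0))
# points on or outside the tube are the candidate support vectors
n_sv = int(np.sum(resid >= 0.1 - 1e-3))
```

On this noiseless line the fit should settle inside the tube, leaving only a handful of points on its boundary; widening $\varepsilon$ shrinks that set further, matching the slide's claim.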
Non-linear Regression and Kernels
When the data is non-linear, use a map $\varphi$ to transform the data into a higher-dimensional feature space, where linear regression becomes possible.
Non-linear Regression and Kernels (Cont)
Suppose we approximate the regression function in terms of a set of basis functions $\{h_m(x)\}$, $m = 1, 2, \dots, M$:
$$f(x) = \sum_{m=1}^{M}\beta_m h_m(x) + \beta_0$$
To estimate $\beta$ and $\beta_0$, minimize
$$H(\beta, \beta_0) = \sum_{i=1}^{N} V(y_i - f(x_i)) + \frac{\lambda}{2}\sum_{m=1}^{M}\beta_m^2$$
for some general error measure $V(r)$. The solution has the form
$$\hat f(x) = \sum_{i=1}^{N}\hat\alpha_i K(x, x_i) \quad \text{with} \quad K(x, x') = \sum_{m=1}^{M} h_m(x)h_m(x')$$
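The identity $K(x, x') = \sum_m h_m(x)h_m(x')$ can be checked numerically for a concrete basis. A sketch for 1-D inputs with the (hypothetical, not from the slides) basis $h(x) = (1, \sqrt{2}\,x, x^2)$, whose inner product reproduces the quadratic kernel $(1 + x x')^2$:

```python
import numpy as np

def h(x):
    """Basis expansion whose inner product equals the degree-2 polynomial kernel."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

def K(x, xp):
    """Degree-2 polynomial kernel for scalar inputs."""
    return (1.0 + x * xp) ** 2

# h(x) . h(x') = 1 + 2*x*x' + (x*x')^2 = (1 + x*x')^2
for x, xp in [(0.5, -1.2), (2.0, 3.0), (0.0, 7.0)]:
    assert abs(h(x) @ h(xp) - K(x, xp)) < 1e-12
```

Working with $K$ directly avoids ever forming the expansion $h$, which is the point of the kernel trick.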
Non-linear Regression and Kernels (Cont)
Let us work out the case $V(r) = r^2$. Let $H$ be the $N \times M$ basis matrix with $im$-th element $h_m(x_i)$. For simplicity, assume $\beta_0 = 0$. Estimate $\beta$ by minimizing
$$H(\beta) = (y - H\beta)^T(y - H\beta) + \lambda\|\beta\|^2$$
Setting the first derivative to zero, we have the solution $\hat y = H\hat\beta$ with $\hat\beta$ determined by
$$-2H^T(y - H\hat\beta) + 2\lambda\hat\beta = 0$$
$$-H^T(y - H\hat\beta) + \lambda\hat\beta = 0$$
$$-HH^T(y - H\hat\beta) + \lambda H\hat\beta = 0 \quad \text{(premultiply by } H\text{)}$$
$$(HH^T + \lambda I)H\hat\beta = HH^T y$$
$$H\hat\beta = (HH^T + \lambda I)^{-1}HH^T y$$
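The closed-form identity $H\hat\beta = (HH^T + \lambda I)^{-1}HH^T y$ can be verified against the direct ridge solution $\hat\beta = (H^TH + \lambda I)^{-1}H^T y$ on random data. A small numpy check (the dimensions and $\lambda$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, lam = 8, 5, 0.3
H = rng.standard_normal((N, M))   # basis matrix; im-th entry plays h_m(x_i)
y = rng.standard_normal(N)

# direct ridge solution in the M-dimensional feature space
beta_hat = np.linalg.solve(H.T @ H + lam * np.eye(M), H.T @ y)

# fitted values via the N x N system from the slide
y_hat_kernel = np.linalg.solve(H @ H.T + lam * np.eye(N), H @ H.T @ y)

assert np.allclose(H @ beta_hat, y_hat_kernel)
```

The two routes agree, but the second involves only the $N \times N$ matrix $HH^T$, which is what the next slide exploits.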
Non-linear Regression and Kernels (Cont)
We have the estimated function:
$$\begin{aligned}
\hat f(x) &= h(x)^T\hat\beta \\
&= h(x)^T H^T (HH^T)^{-1} H\hat\beta \\
&= h(x)^T H^T (HH^T)^{-1} (HH^T + \lambda I)^{-1} HH^T y \\
&= h(x)^T H^T [(HH^T + \lambda I)(HH^T)]^{-1} HH^T y \\
&= h(x)^T H^T [(HH^T)(HH^T) + \lambda(HH^T)]^{-1} HH^T y \\
&= h(x)^T H^T [(HH^T)(HH^T + \lambda I)]^{-1} HH^T y \\
&= h(x)^T H^T (HH^T + \lambda I)^{-1}(HH^T)^{-1} HH^T y \\
&= h(x)^T H^T (HH^T + \lambda I)^{-1} y \\
&= [K(x, x_1)\ K(x, x_2)\ \dots\ K(x, x_N)]\,\hat\alpha \\
&= \sum_{i=1}^{N}\hat\alpha_i K(x, x_i)
\end{aligned}$$
where $\hat\alpha = (HH^T + \lambda I)^{-1} y$.
• The $N \times N$ matrix $HH^T$ consists of inner products between pairs of observations $i, i'$: $\{HH^T\}_{i,i'} = K(x_i, x_{i'})$.
→ We need not specify or evaluate the large set of functions $h_1(x), h_2(x), \dots, h_M(x)$. Only the inner-product kernel $K(x_i, x_{i'})$ need be evaluated, at the $N$ training points and at the points $x$ where predictions are made.
• Some popular choices of $K$ are:
$d$th-degree polynomial: $K(x, x') = (1 + \langle x, x'\rangle)^d$
Radial basis: $K(x, x') = \exp(-\gamma\|x - x'\|^2)$
Neural network: $K(x, x') = \tanh(\kappa_1\langle x, x'\rangle + \kappa_2)$
• This property depends on the choice of the squared norm $\|\beta\|^2$.
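The three kernels above are straightforward to write down. A numpy sketch, where the default parameters $d$, $\gamma$, $\kappa_1$, $\kappa_2$ are arbitrary illustrations:

```python
import numpy as np

def poly_kernel(x, xp, d=3):
    """d-th degree polynomial kernel (1 + <x, x'>)^d."""
    return (1.0 + x @ xp) ** d

def rbf_kernel(x, xp, gamma=0.5):
    """Radial basis kernel exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x - xp) ** 2))

def nn_kernel(x, xp, k1=1.0, k2=0.0):
    """'Neural network' (sigmoid) kernel tanh(k1 * <x, x'> + k2)."""
    return np.tanh(k1 * (x @ xp) + k2)

x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
# all three are symmetric in their arguments
assert poly_kernel(x, xp) == poly_kernel(xp, x)
assert np.isclose(rbf_kernel(x, xp), rbf_kernel(xp, x))
assert rbf_kernel(x, x) == 1.0   # zero distance gives the maximum value
```

Note that, unlike the polynomial and radial-basis kernels, the tanh kernel is not positive semi-definite for all choices of $\kappa_1$, $\kappa_2$.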