Seminar 2
Kernels and Support Vector Machines
Edgar Marca
Supervisor: DSc. André M. S. Barreto
Petrópolis, Rio de Janeiro, Brazil
September 2nd, 2015
Kernels
Why Kernelize?
At first sight, introducing k(x, x′) has not improved our situation. Instead of calculating ⟨Φ(xi), Φ(xj)⟩ for i, j = 1, . . . , n we have to calculate k(xi, xj), which has exactly the same values. However, there are two potential reasons why the kernelized setup can be advantageous:
▶ Speed: We might find an expression for k(xi, xj) that is faster to calculate than forming Φ(xi), Φ(xj) and then ⟨Φ(xi), Φ(xj)⟩.
▶ Flexibility: We can construct functions k(x, x′) that we know correspond to inner products after some feature mapping Φ, even when we don't know how to compute Φ itself.
How to use the Kernel Trick
To evaluate a decision function f(x) on an example x, one typically employs the kernel trick as follows:

f(x) = ⟨w, Φ(x)⟩
     = ⟨∑_{i=1}^{N} αiΦ(xi), Φ(x)⟩
     = ∑_{i=1}^{N} αi⟨Φ(xi), Φ(x)⟩
     = ∑_{i=1}^{N} αik(xi, x)
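A minimal numerical sketch of this identity, assuming NumPy. The kernel k(x, y) = ⟨x, y⟩² and its explicit feature map Φ(x) = (x1², x2², √2·x1x2) on R², as well as the points and coefficients, are illustrative choices, not part of the slides:

```python
import numpy as np

# Homogeneous polynomial kernel on R^2 and its explicit feature map.
def k(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x_train = [np.array([1.0, 2.0]), np.array([3.0, -1.0])]
alpha = [0.5, -0.25]            # illustrative expansion coefficients
x = np.array([2.0, 1.0])

# f(x) = <w, Phi(x)> with w = sum_i alpha_i Phi(x_i) ...
w = sum(a * phi(xi) for a, xi in zip(alpha, x_train))
f_feature = np.dot(w, phi(x))

# ... equals sum_i alpha_i k(x_i, x), computed without ever forming Phi.
f_kernel = sum(a * k(xi, x) for a, xi in zip(alpha, x_train))

assert np.isclose(f_feature, f_kernel)
```

Both routes give the same value; the kernelized one never builds the feature vectors.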
How to prove that a function is a kernel?

Some Definitions
Definition 1.1 (Positive Definite Kernel)
Let X be a nonempty set. A function k : X × X → C is called positive definite if and only if

∑_{i=1}^{n} ∑_{j=1}^{n} ci c̄j k(xi, xj) ≥ 0 (1)

for all n ∈ N, {x1, . . . , xn} ⊆ X and {c1, . . . , cn} ⊆ C.

Unfortunately, there is no uniform use of the preceding definition in the literature. Indeed, some authors call positive definite functions positive semi-definite, and strictly positive definite functions are sometimes called positive definite.

Note:
For fixed x1, x2, . . . , xn ∈ X, the n × n matrix K := [k(xi, xj)]_{1≤i,j≤n} is often called the Gram matrix.
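Definition 1.1, restricted to finitely many points, says every Gram matrix must be positive semi-definite. A numerical spot-check (not a proof) of this property, assuming NumPy and an illustrative Gaussian kernel on random points:

```python
import numpy as np

def gram_matrix(k, xs):
    """Gram matrix K[i, j] = k(x_i, x_j) for the points xs."""
    return np.array([[k(xi, xj) for xj in xs] for xi in xs])

def is_positive_semidefinite(K, tol=1e-10):
    # A symmetric matrix is PSD iff all its eigenvalues are >= 0,
    # which is Definition 1.1 restricted to these points (real case).
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2.0)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))          # 5 random points in R^3
assert is_positive_semidefinite(gram_matrix(rbf, xs))
```

Passing this check for one set of points does not prove positive definiteness; a single failing Gram matrix, however, disproves it.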
Mercer Condition
Theorem 1.2
Let X = [a, b] be a compact interval and let k : [a, b] × [a, b] → C be continuous. Then k is positive definite if and only if

∫ₐᵇ ∫ₐᵇ c(x) c̄(y) k(x, y) dx dy ≥ 0 (2)

for each continuous function c : X → C.
Theorem 1.3 (Symmetric, positive definite functions are kernels)
A function k : X × X → R is a kernel if and only if it is symmetric and positive definite.
Theorem 1.4
Let k1, k2, . . . be arbitrary positive definite kernels on X × X, where X is a nonempty set.
▶ The set of positive definite kernels is a closed convex cone, that is,
1. if α1, α2 ≥ 0, then α1k1 + α2k2 is positive definite;
2. if k(x, x′) := lim_{n→∞} kn(x, x′) exists for all x, x′, then k is positive definite.
▶ The product k1 · k2 is a positive definite kernel.
▶ Assume that, for i = 1, 2, ki is a positive definite kernel on Xi × Xi, where Xi is a nonempty set. Then the tensor product k1 ⊗ k2 and the direct sum k1 ⊕ k2 are positive definite kernels on (X1 × X2) × (X1 × X2).
▶ Suppose that Y is a nonempty set and let f : Y → X be an arbitrary function; then k(x, y) := k1(f(x), f(y)) is a positive definite kernel on Y × Y.
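The closure properties above can be spot-checked numerically on random points, again assuming NumPy; the two base kernels (linear and Gaussian) and the coefficients are illustrative. Note that the pointwise product of kernels corresponds to the elementwise (Schur) product of Gram matrices:

```python
import numpy as np

k1 = lambda x, y: np.dot(x, y)                   # linear kernel
k2 = lambda x, y: np.exp(-np.sum((x - y) ** 2))  # Gaussian kernel

def gram(k, xs):
    return np.array([[k(a, b) for b in xs] for a in xs])

def min_eig(K):
    return np.linalg.eigvalsh(K).min()

rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 2))
G1, G2 = gram(k1, xs), gram(k2, xs)

tol = -1e-10
# Conic combination alpha1*k1 + alpha2*k2 stays positive semi-definite.
assert min_eig(2.0 * G1 + 0.5 * G2) >= tol
# Product k1*k2: elementwise product of Gram matrices (Schur product).
assert min_eig(G1 * G2) >= tol
```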
Kernel Families

Translation Invariant Kernels
Definition 1.5
A translation invariant kernel is given by

K(x, y) = k(x − y) (3)

where k is an even function on Rⁿ, i.e., k(−x) = k(x) for all x in Rⁿ.
Definition 1.6
A function f : (0, ∞) → R is completely monotonic if it is C∞ and, for all r > 0 and k ≥ 0,

(−1)ᵏ f⁽ᵏ⁾(r) ≥ 0 (4)

Here f⁽ᵏ⁾ denotes the k-th derivative of f.

Theorem 1.7
Let X ⊂ Rⁿ, f : (0, ∞) → R and K : X × X → R be defined by K(x, y) = f(∥x − y∥²). If f is completely monotonic then K is positive definite.
Corollary 1.8
Let c ≠ 0. Then the following kernels, defined on a compact domain X ⊂ Rⁿ, are Mercer kernels.
▶ Gaussian Kernel, also known as Radial Basis Function (RBF) or Squared Exponential (SE) Kernel:

k(x, y) = exp(−∥x − y∥² / (2σ²)) (5)

▶ Inverse Multiquadratic Kernel:

k(x, y) = (c² + ∥x − y∥²)^(−α), α > 0 (6)
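Both kernels of Corollary 1.8 are one-liners in NumPy; the test values below are hand-checked on a pair of points with ∥x − y∥ = 5 (the parameter choices are illustrative):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Equation (5): exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def inverse_multiquadratic_kernel(x, y, c=1.0, alpha=1.0):
    # Equation (6): (c^2 + ||x - y||^2)^(-alpha), alpha > 0
    return (c ** 2 + np.sum((x - y) ** 2)) ** (-alpha)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])   # ||x - y|| = 5
assert np.isclose(gaussian_kernel(x, y, sigma=5.0), np.exp(-0.5))
assert np.isclose(inverse_multiquadratic_kernel(x, y, c=1.0, alpha=1.0),
                  1.0 / 26.0)
```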
Polynomial Kernels

k(x, x′) = (α⟨x, x′⟩ + c)ᵈ, α > 0, c ≥ 0, d ∈ N (7)
Non Mercer Kernels
Example 1.9
Let k : X × X → R be defined as

k(x, x′) = 1 if ∥x − x′∥ ≤ 1, and 0 otherwise (8)

Suppose that k is a Mercer kernel, and set x1 = 1, x2 = 2 and x3 = 3; then the matrix Kij = k(xi, xj) for 1 ≤ i, j ≤ 3 is

K = ( 1 1 0
      1 1 1
      0 1 1 ) (9)

The eigenvalues of K are 1 + √2 > 0, 1 > 0 and 1 − √2 < 0. This is a contradiction: if k were a Mercer kernel, every Gram matrix would be positive semi-definite and all eigenvalues of K would be nonnegative. We conclude that k is not a Mercer kernel.
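The negative eigenvalue in Example 1.9 is easy to confirm numerically (assuming NumPy); its spectrum contains a negative value, so k cannot be a Mercer kernel:

```python
import numpy as np

# Gram matrix of Example 1.9 at the points x = 1, 2, 3.
K = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

eigenvalues = np.linalg.eigvalsh(K)    # sorted in ascending order
assert np.isclose(eigenvalues[0], 1.0 - np.sqrt(2))   # negative
assert np.isclose(eigenvalues[1], 1.0)
assert np.isclose(eigenvalues[2], 1.0 + np.sqrt(2))
```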
References for Kernels
[3] C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer Science+Business Media, LLC, 1984.
[9] Felipe Cucker and Ding Xuan Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007.
[47] Ingo Steinwart and Andreas Christmann. Support Vector Machines. Springer, 2008.
Support Vector Machines

Figure: Linear Support Vector Machine, showing the separating hyperplane ⟨w, x⟩ + b = 0, the margin hyperplanes ⟨w, x⟩ + b = 1 and ⟨w, x⟩ + b = −1, and the margin between them.
Primal Problem
Theorem 3.1
The optimization program for the maximum margin classifier is

min_{w,b} (1/2)∥w∥²
s.t. yi(⟨w, xi⟩ + b) ≥ 1, ∀i, 1 ≤ i ≤ m (10)
Theorem 3.2
Let F be the function defined as:

F : Rᵐ → R₊, w ↦ F(w) = (1/2)∥w∥²

Then the following statements hold:
1. F is infinitely differentiable.
2. The gradient of F is ∇F(w) = w.
3. The Hessian of F is ∇²F(w) = I_{m×m}.
4. Since the Hessian ∇²F(w) is positive definite, F is strictly convex.
Theorem 3.3 (The dual problem)
The dual optimization program of (10) is:

max_α ∑_{i=1}^{m} αi − (1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩
s.t. αi ≥ 0 ∧ ∑_{i=1}^{m} αiyi = 0, ∀i, 1 ≤ i ≤ m (11)

where α = (α1, α2, . . . , αm) and the solution of this dual problem will be denoted by α∗ = (α∗1, α∗2, . . . , α∗m).
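A toy sketch of solving the dual on two trivially separable points, assuming NumPy. This uses naive projected gradient ascent with a simple projection heuristic (clip to αi ≥ 0, then re-impose ∑ αiyi = 0), not a real QP solver such as SMO, and the data are illustrative:

```python
import numpy as np

# Two separable points with labels +1 and -1.
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0])

Z = y[:, None] * X          # rows z_i = y_i x_i
A = Z @ Z.T                 # A_ij = y_i y_j <x_i, x_j>

alpha = np.zeros(2)
lr = 0.01
for _ in range(2000):
    grad = np.ones(2) - A @ alpha        # gradient of the dual objective
    alpha = np.clip(alpha + lr * grad, 0.0, None)
    # crude projection back onto the hyperplane sum_i alpha_i y_i = 0
    shift = (alpha @ y) / (y @ y)
    alpha = np.clip(alpha - shift * y, 0.0, None)

w = (alpha * y) @ X          # primal solution via stationarity w = sum alpha_i y_i x_i
```

For this symmetric toy problem the optimum is α∗ = (1/16, 1/16) and w = (1/4, 1/4), which puts both points exactly on the margin hyperplanes.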
Proof.
The Lagrangian of the function F is

L(w, b, α) = (1/2)∥w∥² − ∑_{i=1}^{m} αi[yi(⟨w, xi⟩ + b) − 1] (12)

Because the KKT conditions hold (F is continuous and differentiable, and the constraints are also continuous and differentiable), we can add the complementary conditions.

Stationarity:

∇w L = w − ∑_{i=1}^{m} αiyixi = 0 ⇒ w = ∑_{i=1}^{m} αiyixi (13)

∇b L = −∑_{i=1}^{m} αiyi = 0 ⇒ ∑_{i=1}^{m} αiyi = 0 (14)
Primal feasibility:

yi(⟨w, xi⟩ + b) ≥ 1, ∀i ∈ [1, m] (15)

Dual feasibility:

αi ≥ 0, ∀i ∈ [1, m] (16)

Complementary slackness:

αi[yi(⟨w, xi⟩ + b) − 1] = 0 ⇒ αi = 0 ∨ yi(⟨w, xi⟩ + b) = 1, ∀i ∈ [1, m] (17)

Substituting (13) and (14) into (12):

L(w, b, α) = (1/2)∥∑_{i=1}^{m} αiyixi∥² − ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩ − b ∑_{i=1}^{m} αiyi + ∑_{i=1}^{m} αi (18)

Since ∥∑ᵢ αiyixi∥² = ∑ᵢ∑ⱼ αiαjyiyj⟨xi, xj⟩, the first two terms combine to −(1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩, and the third term vanishes by (14). Then

L(w, b, α) = ∑_{i=1}^{m} αi − (1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩ (19)
Theorem 3.4
Let G be the function defined as:

G : Rᵐ → R, α ↦ G(α) = 1ᵗα − (1/2)αᵗAα

where α = (α1, α2, . . . , αm), 1 is the all-ones vector in Rᵐ, and A = [yiyj⟨xi, xj⟩]_{1≤i,j≤m} ∈ R^{m×m}. Then the following statements hold:
1. The matrix A is symmetric and positive semi-definite.
2. The function G is differentiable and ∂G(α)/∂α = 1 − Aα.
3. The function G is twice differentiable and ∂²G(α)/∂α² = −A.
4. The function G is a concave function.
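The concavity of G follows because A factors as A = ZZᵗ with rows zi = yixi, so A is positive semi-definite and the Hessian −A is negative semi-definite. A numerical check on random illustrative data, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))                 # 8 random points in R^3
y = rng.choice([-1.0, 1.0], size=8)         # random labels

Z = y[:, None] * X          # rows z_i = y_i x_i
A = Z @ Z.T                 # A_ij = y_i y_j <x_i, x_j>

# A = Z Z^T is positive semi-definite ...
assert np.all(np.linalg.eigvalsh(A) >= -1e-10)
# ... so the Hessian -A of G is negative semi-definite: G is concave.
assert np.all(np.linalg.eigvalsh(-A) <= 1e-10)
```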
Linear Support Vector Machines
We call Linear Support Vector Machine the decision function defined by

f(x) = sign(⟨w, x⟩ + b) = sign(∑_{i=1}^{m} α∗i yi⟨xi, x⟩ + b) (20)

where
▶ m is the number of training points.
▶ α∗i are the Lagrange multipliers of the dual problem (11).
Non Linear Support Vector Machines
We call Non Linear Support Vector Machine the decision function defined by

f(x) = sign(⟨w, Φ(x)⟩ + b) = sign(∑_{i=1}^{m} α∗i yi⟨Φ(xi), Φ(x)⟩ + b) (21)

where
▶ m is the number of training points.
▶ α∗i are the Lagrange multipliers of the dual problem (11).
Applying the Kernel Trick
Using the kernel trick we can replace ⟨Φ(xi), Φ(x)⟩ by a kernel k(xi, x):

f(x) = sign(∑_{i=1}^{m} α∗i yi k(xi, x) + b) (22)

where
▶ m is the number of training points.
▶ α∗i are the Lagrange multipliers of the dual problem (11).
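Equation (22) translates directly into code. Here is a sketch with a Gaussian kernel and illustrative (not optimized) support vectors and multipliers, assuming NumPy:

```python
import numpy as np

def decision_function(x, support_x, support_y, alpha_star, b, k):
    # Equation (22): sign( sum_i alpha*_i y_i k(x_i, x) + b )
    s = sum(a * yi * k(xi, x)
            for a, yi, xi in zip(alpha_star, support_y, support_x))
    return np.sign(s + b)

k = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2.0)   # Gaussian kernel
support_x = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
support_y = [1.0, -1.0]
alpha_star = [1.0, 1.0]       # illustrative multipliers

# A query near the positive support vector is classified as +1.
label = decision_function(np.array([0.9, 1.1]),
                          support_x, support_y, alpha_star, 0.0, k)
assert label == 1.0
```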
References for Support Vector Machines
[31] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar.
Foundations of Machine Learning. The MIT Press, 2012.
Reboulia: features, anatomy, morphology etc.
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 

Kernels and Support Vector Machines

  • 1. Seminar 2 Kernels and Support Vector Machines Edgar Marca Supervisor: DSc. André M.S. Barreto Petrópolis, Rio de Janeiro - Brazil September 2nd, 2015 1 / 28
  • 3. Kernels
    Why Kernelize?
    At first sight, introducing k(x, x′) has not improved our situation. Instead of calculating ⟨Φ(xi), Φ(xj)⟩ for i, j = 1, . . . , n we have to calculate k(xi, xj), which has exactly the same values. However, there are two potential reasons why the kernelized setup can be advantageous:
    ▶ Speed: We might find an expression for k(xi, xj) that is faster to calculate than forming Φ(xi) and then ⟨Φ(xi), Φ(xj)⟩.
    ▶ Flexibility: We can construct functions k(x, x′) for which we know that they correspond to inner products after some feature mapping Φ, but we don’t know how to compute Φ.
    3 / 28
  • 4. Kernels
    How to Use the Kernel Trick
    To evaluate a decision function f(x) on an example x, one typically employs the kernel trick as follows:
    f(x) = ⟨w, Φ(x)⟩
         = ⟨∑_{i=1}^{N} αi Φ(xi), Φ(x)⟩
         = ∑_{i=1}^{N} αi ⟨Φ(xi), Φ(x)⟩
         = ∑_{i=1}^{N} αi k(xi, x)
    4 / 28
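The evaluation above can be sketched in a few lines of NumPy. The helper name `decision_function`, the sample points, and the α values here are illustrative, not from the slides; with the linear kernel the kernelized form must agree with the primal form ⟨w, x⟩.

```python
import numpy as np

def decision_function(x, points, alphas, kernel):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x) without ever forming Phi(x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alphas, points))

# With the linear kernel k(x, x') = <x, x'>, the kernelized evaluation
# must equal the primal form <w, x>, where w = sum_i alpha_i * x_i.
linear = lambda u, v: float(np.dot(u, v))

points = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
alphas = [0.5, -1.0]
x = np.array([2.0, 1.0])

f_kernel = decision_function(x, points, alphas, linear)
w = sum(a * xi for a, xi in zip(alphas, points))
f_primal = float(np.dot(w, x))
print(f_kernel, f_primal)  # -1.0 -1.0
```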
  • 5. How to prove that a function is a kernel?
  • 6. Kernels
    Some Definitions
    Definition 1.1 (Positive Definite Kernel)
    Let X be a nonempty set. A function k : X × X → C is called positive definite if and only if
    ∑_{i=1}^{n} ∑_{j=1}^{n} ci cj k(xi, xj) ≥ 0 (1)
    for all n ∈ N, {x1, . . . , xn} ⊆ X and {c1, . . . , cn}.
    Unfortunately, there is no common use of the preceding definition in the literature. Indeed, some authors call positive definite functions positive semi-definite, and strictly positive definite functions are sometimes called positive definite.
    Note: For fixed x1, x2, . . . , xn ∈ X, the n × n matrix K := [k(xi, xj)]_{1≤i,j≤n} is often called the Gram matrix.
    6 / 28
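For real-valued kernels and real coefficients, condition (1) on a fixed point set is equivalent to the Gram matrix having no negative eigenvalues, which is easy to check numerically. A small sketch (the helper names are ours, not from the slides):

```python
import numpy as np

def gram_matrix(kernel, xs):
    """K[i, j] = k(x_i, x_j) for the fixed points xs (the Gram matrix)."""
    return np.array([[kernel(xi, xj) for xj in xs] for xi in xs])

def looks_positive_definite(kernel, xs, tol=1e-10):
    """Finite-sample check of (1): for a real symmetric Gram matrix K,
    sum_ij c_i c_j K[i, j] >= 0 for all real c iff no eigenvalue is negative."""
    K = gram_matrix(kernel, xs)
    return bool(np.min(np.linalg.eigvalsh(K)) >= -tol)

gaussian = lambda x, y: np.exp(-(x - y) ** 2)   # a known positive definite kernel on R
neg_sqdist = lambda x, y: -(x - y) ** 2         # not positive definite

xs = [0.0, 0.7, 1.5, 3.0]
print(looks_positive_definite(gaussian, xs))    # True
print(looks_positive_definite(neg_sqdist, xs))  # False
```

Passing this check on one point set is only evidence, not a proof; failing it on any point set does prove the function is not positive definite.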
  • 7. Kernels
    Mercer Condition
    Theorem 1.2
    Let X = [a, b] be a compact interval and let k : [a, b] × [a, b] → C be continuous. Then k is positive definite if and only if
    ∫_a^b ∫_a^b c(x)c(y)k(x, y) dx dy ≥ 0 (2)
    for each continuous function c : X → C.
    7 / 28
  • 8. Kernels
    Theorem 1.3 (Symmetric, positive definite functions are kernels)
    A function k : X × X → R is a kernel if and only if it is symmetric and positive definite.
    8 / 28
  • 9. Kernels
    Theorem 1.4
    Let k1, k2, . . . be arbitrary positive definite kernels on X × X, where X is a nonempty set.
    ▶ The set of positive definite kernels is a closed convex cone, that is,
      1. If α1, α2 ≥ 0, then α1k1 + α2k2 is positive definite.
      2. If k(x, x′) := lim_{n→∞} kn(x, x′) exists for all x, x′, then k is positive definite.
    ▶ The product k1 · k2 is a positive definite kernel.
    ▶ Assume that for i = 1, 2, ki is a positive definite kernel on Xi × Xi, where Xi is a nonempty set. Then the tensor product k1 ⊗ k2 and the direct sum k1 ⊕ k2 are positive definite kernels on (X1 × X2) × (X1 × X2).
    ▶ Suppose that Y is a nonempty set and let f : Y → X be an arbitrary function; then k(x, y) = k1(f(x), f(y)) is a positive definite kernel on Y × Y.
    9 / 28
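These closure properties can be sanity-checked numerically: a conic combination of Gram matrices, and the Hadamard (elementwise) product of Gram matrices corresponding to the product kernel k1 · k2, should both stay positive semi-definite. A sketch on random sample points (not a proof, just a spot check):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 3))  # six sample points in R^3

# Gram matrices of two known positive definite kernels.
K1 = xs @ xs.T                                           # linear kernel
sq = np.sum((xs[:, None, :] - xs[None, :, :]) ** 2, -1)
K2 = np.exp(-sq / 2.0)                                   # Gaussian kernel

min_eig = lambda K: float(np.min(np.linalg.eigvalsh(K)))

cone_ok = min_eig(2.0 * K1 + 3.0 * K2) >= -1e-10   # alpha1*k1 + alpha2*k2
prod_ok = min_eig(K1 * K2) >= -1e-10               # Gram matrix of k1 * k2
print(cone_ok, prod_ok)  # True True
```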
  • 11. Kernels
    Kernel Families: Translation Invariant Kernels
    Definition 1.5
    A translation invariant kernel is given by
    K(x, y) = k(x − y) (3)
    where k is an even function on Rn, i.e., k(−x) = k(x) for all x in Rn.
    11 / 28
  • 12. Kernels
    Kernel Families: Translation Invariant Kernels
    Definition 1.6
    A function f : (0, ∞) → R is completely monotonic if it is C∞ and, for all r > 0 and k ≥ 0,
    (−1)^k f^(k)(r) ≥ 0 (4)
    Here f^(k) denotes the k-th derivative of f.
    Theorem 1.7
    Let X ⊂ Rn, f : (0, ∞) → R, and let K : X × X → R be defined by K(x, y) = f(∥x − y∥²). If f is completely monotonic, then K is positive definite.
    12 / 28
  • 13. Kernels
    Kernel Families: Translation Invariant Kernels
    Corollary 1.8
    Let c ̸= 0. Then the following kernels, defined on a compact domain X ⊂ Rn, are Mercer kernels.
    ▶ Gaussian Kernel, also called Radial Basis Function (RBF) or Squared Exponential (SE) Kernel:
    k(x, y) = exp(−∥x − y∥² / (2σ²)) (5)
    ▶ Inverse Multiquadratic Kernel:
    k(x, y) = (c² + ∥x − y∥²)^{−α}, α > 0 (6)
    13 / 28
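Both kernels from Corollary 1.8 are straightforward to code, and their Gram matrices on a random sample should have no negative eigenvalues. A sketch (the parameter values are arbitrary choices of ours):

```python
import numpy as np

def gaussian(x, y, sigma=1.0):
    """Gaussian / RBF kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def inverse_multiquadratic(x, y, c=1.0, alpha=0.5):
    """Inverse multiquadratic kernel: (c^2 + ||x - y||^2)^(-alpha)."""
    return float((c ** 2 + np.sum((x - y) ** 2)) ** (-alpha))

rng = np.random.default_rng(1)
xs = rng.normal(size=(5, 2))
for kernel in (gaussian, inverse_multiquadratic):
    K = np.array([[kernel(a, b) for b in xs] for a in xs])
    # A Mercer kernel's Gram matrix has no negative eigenvalues.
    print(kernel.__name__, np.min(np.linalg.eigvalsh(K)) >= -1e-10)
```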
  • 14. Kernels
    Kernel Families: Polynomial Kernels
    k(x, x′) = (α⟨x, x′⟩ + c)^d, α > 0, c ≥ 0, d ∈ N (7)
    14 / 28
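For the homogeneous case (α = 1, c = 0, d = 2) on R², the feature map is explicit: Φ(x) = (x₁², x₂², √2·x₁x₂) satisfies ⟨Φ(x), Φ(y)⟩ = ⟨x, y⟩². A quick numerical check of this identity (the helper names are ours):

```python
import numpy as np

def poly2(x, y):
    """Homogeneous polynomial kernel of degree 2 (alpha = 1, c = 0)."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map on R^2 with <phi(x), phi(y)> = <x, y>^2."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
# <x, y> = 3 - 2 = 1, so both sides evaluate to 1 (up to rounding).
print(poly2(x, y), float(np.dot(phi(x), phi(y))))
```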
  • 15. Kernels
    Kernel Families: Non-Mercer Kernels
    Example 1.9
    Let k : X × X → R be defined as
    k(x, x′) = 1 if ∥x − x′∥ ≤ 1, and 0 otherwise. (8)
    Suppose that k is a Mercer kernel and set x1 = 1, x2 = 2 and x3 = 3. Then the matrix Kij = k(xi, xj) for 1 ≤ i, j ≤ 3 is
    K = [[1, 1, 0], [1, 1, 1], [0, 1, 1]] (9)
    The eigenvalues of K are 1, 1 + √2 > 0 and 1 − √2 < 0. This is a contradiction, because all the eigenvalues of a Mercer kernel's Gram matrix must be nonnegative; we conclude that k is not a Mercer kernel.
    15 / 28
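The eigenvalue claim in Example 1.9 is easy to verify numerically:

```python
import numpy as np

# Gram matrix of Example 1.9 at the points x1 = 1, x2 = 2, x3 = 3.
K = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

eigvals = np.linalg.eigvalsh(K)  # ascending order
print(eigvals)  # approximately [1 - sqrt(2), 1, 1 + sqrt(2)]

# One eigenvalue is negative, so K is not positive semi-definite
# and k cannot be a Mercer kernel.
assert eigvals[0] < 0
```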
  • 16. Kernels
    References for Kernels
    [3] C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Springer, 1984.
    [9] Felipe Cucker and Ding Xuan Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, 2007.
    [47] Ingo Steinwart and Andreas Christmann. Support Vector Machines. Springer, 2008.
    16 / 28
  • 18. Applications: SVM
    Support Vector Machines
    Figure: Linear Support Vector Machine, showing the margin between the hyperplanes ⟨w, x⟩ + b = 1 and ⟨w, x⟩ + b = −1 around the decision boundary ⟨w, x⟩ + b = 0.
    18 / 28
  • 19. Applications: SVM
    Primal Problem
    Theorem 3.1
    The optimization program for the maximum margin classifier is
    min_{w,b} (1/2)∥w∥²
    s.t. yi(⟨w, xi⟩ + b) ≥ 1, ∀i, 1 ≤ i ≤ m (10)
    19 / 28
  • 20. Applications: SVM
    Theorem 3.2
    Let F be the function defined as
    F : Rm → R+, w ↦ F(w) = (1/2)∥w∥²
    Then the following statements hold:
    1. F is infinitely differentiable.
    2. The gradient of F is ∇F(w) = w.
    3. The Hessian of F is ∇²F(w) = I_{m×m}.
    4. F is strictly convex, since its Hessian ∇²F(w) is positive definite.
    20 / 28
  • 21. Applications: SVM
    Theorem 3.3 (The dual problem)
    The dual optimization program of (10) is
    max_α ∑_{i=1}^{m} αi − (1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩
    s.t. αi ≥ 0 ∧ ∑_{i=1}^{m} αiyi = 0, ∀i, 1 ≤ i ≤ m (11)
    where α = (α1, α2, . . . , αm) and the solution of this dual problem will be denoted by α* = (α*1, α*2, . . . , α*m).
    21 / 28
  • 22. Applications: SVM
    Proof.
    The Lagrangian of the function F is
    L(w, b, α) = (1/2)∥w∥² − ∑_{i=1}^{m} αi[yi(⟨w, xi⟩ + b) − 1] (12)
    Because the KKT conditions hold (F is continuous and differentiable, and the constraints are also continuous and differentiable), we can add the complementary conditions.
    Stationarity:
    ∇_w L = w − ∑_{i=1}^{m} αiyixi = 0 ⇒ w = ∑_{i=1}^{m} αiyixi (13)
    ∇_b L = −∑_{i=1}^{m} αiyi = 0 ⇒ ∑_{i=1}^{m} αiyi = 0 (14)
    22 / 28
  • 23. Applications: SVM
    Primal feasibility:
    yi(⟨w, xi⟩ + b) ≥ 1, ∀i ∈ [1, m] (15)
    Dual feasibility:
    αi ≥ 0, ∀i ∈ [1, m] (16)
    Complementary slackness:
    αi[yi(⟨w, xi⟩ + b) − 1] = 0 ⇒ αi = 0 ∨ yi(⟨w, xi⟩ + b) = 1, ∀i ∈ [1, m] (17)
    Substituting (13) and (14) into the Lagrangian:
    L(w, b, α) = (1/2)∥∑_{i=1}^{m} αiyixi∥² − ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩ − b ∑_{i=1}^{m} αiyi + ∑_{i=1}^{m} αi (18)
    The first two terms combine to −(1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩, and the third term vanishes by (14), so
    L(w, b, α) = ∑_{i=1}^{m} αi − (1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} αiαjyiyj⟨xi, xj⟩ (19)
    23 / 28
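The reduction from (18) to (19) can be checked numerically: for any α with ∑ αiyi = 0 and w chosen by (13), the Lagrangian equals the dual objective for any b. A sketch with random data (the projection trick used to enforce ∑ αiyi = 0 is ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
X = rng.normal(size=(m, 2))                 # training points x_i
y = rng.choice([-1.0, 1.0], size=m)         # labels y_i

# Any alpha with sum_i alpha_i y_i = 0: project a random vector
# onto the hyperplane {a : <a, y> = 0}.
alpha = rng.uniform(0.0, 1.0, size=m)
alpha -= y * (alpha @ y) / (y @ y)

w = (alpha * y) @ X                         # stationarity condition (13)
b = 0.3                                     # arbitrary: b drops out by (14)

lagrangian = 0.5 * w @ w - np.sum(alpha * (y * (X @ w + b) - 1.0))
G = (y[:, None] * y[None, :]) * (X @ X.T)   # G[i, j] = y_i y_j <x_i, x_j>
dual = np.sum(alpha) - 0.5 * alpha @ G @ alpha
print(np.isclose(lagrangian, dual))  # True
```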
  • 24. Applications: SVM
    Theorem 3.4
    Let G be the function defined as
    G : Rm → R, α ↦ G(α) = 1ᵀα − (1/2)αᵀAα
    where α = (α1, α2, . . . , αm), 1 is the all-ones vector in Rm, and A = [yiyj⟨xi, xj⟩]_{1≤i,j≤m} ∈ Rm×m. Then the following statements hold:
    1. A is symmetric.
    2. G is differentiable and ∂G(α)/∂α = 1 − Aα.
    3. G is twice differentiable and ∂²G(α)/∂α² = −A.
    4. G is a concave function.
    24 / 28
  • 25. Applications: SVM
    Linear Support Vector Machines
    We call Support Vector Machine the decision function defined by
    f(x) = sign(⟨w, x⟩ + b) = sign(∑_{i=1}^{m} α*i yi⟨xi, x⟩ + b) (20)
    where
    ▶ m is the number of training points.
    ▶ α*i are the Lagrange multipliers of the dual problem (11).
    25 / 28
  • 26. Applications: Non Linear SVM
    Non Linear Support Vector Machines
    We call Non Linear Support Vector Machine the decision function defined by
    f(x) = sign(⟨w, Φ(x)⟩ + b) = sign(∑_{i=1}^{m} α*i yi⟨Φ(xi), Φ(x)⟩ + b) (21)
    where
    ▶ m is the number of training points.
    ▶ α*i are the Lagrange multipliers of the dual problem (11).
    26 / 28
  • 27. Applications: Non Linear SVM
    Applying the Kernel Trick
    Using the kernel trick we can replace ⟨Φ(xi), Φ(x)⟩ by a kernel k(xi, x):
    f(x) = sign(∑_{i=1}^{m} α*i yi k(xi, x) + b) (22)
    where
    ▶ m is the number of training points.
    ▶ α*i are the Lagrange multipliers of the dual problem (11).
    27 / 28
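Putting (22) together as code — a minimal sketch, assuming the α*, b, and support vectors come from an already-solved dual problem; the toy values below are made up for illustration, not the result of an optimization:

```python
import numpy as np

def svm_predict(x, support_vectors, labels, alphas, b, kernel):
    """Kernelized SVM decision (22): sign(sum_i alpha_i y_i k(x_i, x) + b)."""
    s = sum(a * yi * kernel(xi, x)
            for a, yi, xi in zip(alphas, labels, support_vectors))
    return int(np.sign(s + b))

rbf = lambda u, v: float(np.exp(-np.sum((u - v) ** 2)))

# Toy values: one positive and one negative support vector.
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ys = [+1, -1]
alphas = [1.0, 1.0]
b = 0.0

print(svm_predict(np.array([0.1, 0.0]), svs, ys, alphas, b, rbf))  # 1
print(svm_predict(np.array([1.9, 2.1]), svs, ys, alphas, b, rbf))  # -1
```

Points near the positive support vector get label +1, points near the negative one get -1, because the RBF kernel weights nearby support vectors most heavily.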
  • 28. Applications Non Linear SVM References for Support Vector Machines [31] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. The MIT Press, 2012. 28 / 28