Support Vector Machines in MapReduce
Presented by
Asghar Dehghani, Alpine Data Labs
Sara Asher, Alpine Data Labs
Overview
§  Theory of basic SVM (binary classification, linear)
§  Generalized SVM (multi-classification)
§  MapReducing SVM
§  Handling kernels (nonlinear SVM) in MapReduce
§  Demo
Background on SVM
§  Given a bunch of points…
Background on SVM
§  How do we classify a new point?
Background on SVM
§  Split the space using a hyper-plane
Background on SVM
§  Which plane do you use?
Background on SVM
§  Margin: Distance from closest points to the hyper-plane
§  Idea: Among the set of hyper-planes, choose the one that
maximizes the margin
[Figure: separating hyper-plane with margin ρ; the closest points, which determine the margin, are the support vectors (SVs)]
Background on SVM
•  Hyper-plane represented by: wᵀx + b = 0
•  Points in one class satisfy wᵀx + b > ρ; points in the other satisfy wᵀx + b < −ρ.
•  We want to choose the w and b that will maximize the margin ρ.
•  Using some algebra and some rescaling, we can show that for the support vectors:
   margin = 1 / ‖w‖
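As a quick numeric illustration (not from the deck): with a hypothetical w = (3, 4) and b = −5, the margin of the rescaled formulation is 1/‖w‖ = 1/5, and a point is classified by the sign of wᵀx + b. A minimal numpy sketch:

```python
import numpy as np

# Hypothetical hyper-plane parameters (illustration only).
w = np.array([3.0, 4.0])
b = -5.0

# Margin of the rescaled SVM formulation: 1 / ||w||.
margin = 1.0 / np.linalg.norm(w)   # 1 / 5 = 0.2

# A point is classified by the sign of the decision function.
x = np.array([2.0, 1.0])
label = np.sign(w @ x + b)         # sign(3*2 + 4*1 - 5) = +1

print(margin, label)
```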
Background on SVM (cont.)
§  Thus the goal is to solve the following optimization problem:

Argmax_{w,b} ( ρ = 1/‖w‖ )  =  Argmin_{w,b} ‖w‖
Subject to yᵢ(wᵀxᵢ + b) ≥ 1, i = 1..n
(where yᵢ = 1 or −1, depending on the class of xᵢ)
Background on SVM (cont.)
§  To avoid square roots, we can apply the following transformation:

Argmin_{w,b} ‖w‖   subject to yᵢ(wᵀxᵢ + b) ≥ 1, i = 1..n
⇓
Argmin_{w,b} ½‖w‖²   subject to yᵢ(wᵀxᵢ + b) ≥ 1, i = 1..n

§  Thus, the problem is a quadratic function minimization subject to linear constraints (a well-studied problem).
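Because this is a standard quadratic program, it can be sanity-checked on toy data with an off-the-shelf QP solver. A minimal sketch (not part of the deck) using cvxpy; the data X, y are made up for illustration:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (hypothetical, for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin SVM: minimize (1/2)||w||^2 s.t. y_i (w^T x_i + b) >= 1.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```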
Background on SVM (cont.)
§  What happens if the data is not linearly separable? (i.e., there is no hyper-plane that will split the data exactly)
Background on SVM (cont.)
Argmin_{w,b} ½‖w‖²   subject to yᵢ(wᵀxᵢ + b) ≥ 1, i = 1..n

•  Slack variables ξᵢ are added to the constraints.
•  ξᵢ is the distance from xᵢ to its class boundary.
Background on SVM (cont.)
Argmin_{w,b} ½‖w‖²   subject to yᵢ(wᵀxᵢ + b) ≥ 1, i = 1..n
⇓ (add slack)
Argmin_{w,b} ( ½‖w‖² + C ∑ᵢ₌₁ⁿ ξᵢ )   subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, i = 1..n

•  Slack variables ξᵢ are added to the constraints.
•  ξᵢ is the distance from xᵢ to its class boundary.
•  C is the regularization parameter, which controls the bias-variance trade-off (the significance of outliers).

(Cortes, C., and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20(3), 273–297.)
Background on SVM (cont.)
(Cortes & Vapnik 1995)

Argmin_{w,b} ( ½‖w‖² + C ∑ᵢ₌₁ⁿ ξᵢ )   subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, i = 1..n

Question: how do we get rid of the constraints?
Background on SVM (cont.)
Argmin_{w,b} ( ½‖w‖² + C ∑ᵢ₌₁ⁿ ξᵢ )   subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, i = 1..n

Answer: Fenchel duality and the Representer Theorem!

Argmin_{w,b} ( (λ/2)‖w‖² + ∑ᵢ₌₁ⁿ max(0, 1 − yᵢ(wᵀxᵢ + b)) )

(the summand is the hinge loss)

We’ve removed the constraint! SVM minimizes the “L2 Regularized Hinge”
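A minimal numpy sketch (illustrative, not the deck’s code) of this L2-regularized hinge objective and one of its subgradients, which is exactly what the subgradient solver introduced later needs to evaluate; w, b, X, y, lam are placeholders:

```python
import numpy as np

def l2_hinge_objective(w, b, X, y, lam):
    """L2-regularized hinge: (lam/2)||w||^2 + sum_i max(0, 1 - y_i (w^T x_i + b))."""
    margins = y * (X @ w + b)
    return 0.5 * lam * (w @ w) + np.maximum(0.0, 1.0 - margins).sum()

def l2_hinge_subgradient(w, b, X, y, lam):
    """A subgradient of the objective with respect to (w, b)."""
    margins = y * (X @ w + b)
    active = margins < 1.0                      # points violating the margin
    grad_w = lam * w - X[active].T @ y[active]  # each active point adds -y_i x_i
    grad_b = -y[active].sum()
    return grad_w, grad_b
```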
Background on SVM (cont.)
§  What happens in multi-class situations?
There are different ways to handle multi-classification:
•  One vs. all
•  One vs. one
•  Cost-sensitive Hinge (Crammer and Singer 2001)
Background on SVM (cont.)
Cost-sensitive formulation of hinge loss (Crammer and Singer 2001):

Argmin_{w,b} ( (λ/2)‖w‖² + ∑ᵢ₌₁ⁿ max(0, 1 + f_r(xᵢ) − f_t(xᵢ)) )   (multi-class hinge)

Where

f_r(xᵢ) = Argmax over r ∈ Y, r ≠ t of (w_r xᵢ + b_r)   (the best-scoring wrong class)
f_t(xᵢ) = w_t xᵢ + b_t   (the score of the true class t)

This loss function is called “cost-sensitive hinge.” And the prediction function is:

f(x) = Argmax over i ∈ Y of (wᵢ x + bᵢ)

(Crammer, K. & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.)
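A small numpy sketch (an illustration, not the deck’s implementation) of the cost-sensitive hinge for a single sample and the Argmax prediction rule, with one weight vector per class stacked as rows of W and intercepts in b:

```python
import numpy as np

def multiclass_hinge(W, b, x, t):
    """Crammer-Singer style loss for one sample x with true class t:
    max(0, 1 + f_r(x) - f_t(x)), where r is the best-scoring wrong class."""
    scores = W @ x + b                   # one score per class
    f_t = scores[t]
    f_r = np.delete(scores, t).max()     # best score among the wrong classes
    return max(0.0, 1.0 + f_r - f_t)

def predict(W, b, x):
    """Prediction: the class with the highest score."""
    return int(np.argmax(W @ x + b))
```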
SVM: Implementation
We now have the function that we need to optimize. But how do we parallelize this for the MapReduce framework?
Parallelized Stochastic Gradient Descent
by Martin Zinkevich, Markus Weimer, Alexander J. Smola, Lihong Li
NIPS 2010
Parallelized Stochastic Gradient Descent - Theory
§  Conditions:
•  SVM loss function has bounded gradient
•  The solver is stochastic
§  Result:
•  You can break the original sample into randomly distributed subsamples and solve on each subsample.
•  A convex combination (e.g., the average) of the sub-solutions approximates the solution for the original sample.
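A minimal single-process sketch of this scheme (assumptions: an abstract sgd_solve routine standing in for one reducer, and in-memory shards standing in for the mapper’s random distribution):

```python
import numpy as np

def parallel_sgd(X, y, num_shards, sgd_solve):
    """Zinkevich et al. (2010) scheme: random shards, independent SGD, average.
    sgd_solve(X_shard, y_shard) -> weight vector; stands in for one reducer."""
    rng = np.random.default_rng(0)
    perm = rng.permutation(len(y))            # mapper: random distribution
    shards = np.array_split(perm, num_shards)
    # reducers: run the stochastic solver independently on each shard
    ws = [sgd_solve(X[idx], y[idx]) for idx in shards]
    # final model: convex combination (here, a uniform average) of sub-solutions
    return np.mean(ws, axis=0)
```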
Optimization
§  Conditions:
•  SVM loss function has bounded gradient
•  The solver is stochastic
§  Loss: Cost-sensitive hinge
•  Crammer, K. & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
§  Solver: Pegasos
•  Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML, 807-814.
§  Use the mapper to randomly distribute the samples, and the reducer to iterate on each sub-sample (a Pegasos sketch follows below).
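A sketch of the Pegasos update each reducer would run on its sub-sample (minimal binary form, with the bias term omitted as in the basic Pegasos formulation; hyper-parameters are illustrative):

```python
import numpy as np

def pegasos(X, y, lam, num_iters, seed=0):
    """Pegasos (Shalev-Shwartz et al. 2007) for the binary hinge loss.
    A minimal sketch of what one reducer would run on its sub-sample."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, num_iters + 1):
        i = rng.integers(n)               # stochastic: pick one random example
        eta = 1.0 / (lam * t)             # step-size schedule 1/(lambda * t)
        if y[i] * (X[i] @ w) < 1.0:       # margin violated: hinge term is active
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1.0 - eta * lam) * w
    return w
```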
SVM: Non-separable data
But what about data that no hyper-plane can split, even approximately?
Idea: Transform the pattern space to a higher-dimensional space, called feature space, in which the data is linearly separable.
SVM: Kernels
§  Two questions:
•  What kind of function is a kernel?
•  What kernel is appropriate for a specific problem?
§  The answers:
•  Mercer’s Theorem: every positive semi-definite symmetric function is a kernel
•  It depends on the problem.

(See http://www.ism.ac.jp/~fukumizu/H20_kernel/Kernel_7_theory.pdf)
SVM: Kernels
§  Examples of popular kernel functions:
•  Gaussian kernel: K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / (2σ²))
•  Laplacian kernel: K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖ / θ)
•  Polynomial kernel: K(xᵢ, xⱼ) = (a xᵢᵀxⱼ + b)^d
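The same three kernels in a few lines of numpy (parameter names sigma, theta, a, b, d mirror the formulas above):

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def laplacian_kernel(xi, xj, theta):
    return np.exp(-np.linalg.norm(xi - xj) / theta)

def polynomial_kernel(xi, xj, a, b, d):
    return (a * (xi @ xj) + b) ** d
```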
SVM: Kernels
§  The kernel (dual) feature space is defined by the inner products between each pair xᵢ and xⱼ
§  The kernel matrix is N × N, where N is the number of samples
§  As your sample size goes up, the kernel matrix gets huge!
§  Worse, the dual formulation does not map naturally onto MapReduce.

⇒ Dual space is not feasible at scale
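A back-of-the-envelope calculation (not from the deck) makes the scaling concrete: at 8 bytes per float64 entry, the kernel matrix costs 8N² bytes.

```python
# Kernel matrix storage at 8 bytes (float64) per entry.
for n in [10_000, 1_000_000, 10_000_000]:
    gib = 8 * n**2 / 2**30
    print(f"N = {n:>10,}: {gib:,.0f} GiB")
# N =     10,000: 1 GiB
# N =  1,000,000: 7,451 GiB
# N = 10,000,000: 745,058 GiB
```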
SVM: Implementation
§  Question: How can we get a non-linear SVM without paying the price of duality?
§  Claim: For certain kernel functions, we can find a feature map z such that

Argmin_{w,b} ( (λ/2)‖w‖² + ∑ᵢ₌₁ⁿ max(0, 1 − yᵢ(wᵀz(xᵢ) + b)) )

(again the hinge loss, now applied to z(xᵢ))
SVM: Implementation
Random Features for Large-Scale Kernel Machines
by Ali Rahimi and Ben Recht, NIPS 2007
→ Can approximate shift-invariant kernels

Random Feature Maps for Dot Product Kernels
by Purushottam Kar and Harish Karnick, AISTATS 2012
→ Can approximate dot-product kernels
Approximating shift-invariant kernel
(Rahimi & Recht, Random Features for Large-Scale Kernel Machines)

Given a positive definite shift-invariant kernel K(x, y) = f(x − y), we can create a randomized feature map Z : Rᵈ → Rᴰ such that Z(x)ᵀZ(y) ≈ K(x − y).

1. Compute the Fourier transform p of the kernel k:  p(ω) = (1/2π) ∫ e^(−jωᵀδ) k(δ) dδ
2. Draw D i.i.d. samples ω₁, …, ω_D ∈ Rᵈ from p.
3. Draw D i.i.d. samples b₁, …, b_D ∈ R from the uniform distribution on [0, 2π].
4. Z : x → √(2/D) [cos(ω₁ᵀx + b₁), …, cos(ω_Dᵀx + b_D)]ᵀ
"	
  
	
  
Approximating dot-product kernel
(Kar & Karnick, Random Feature Maps for Dot Product Kernels)

Given a positive definite dot-product kernel K(x, y) = f(⟨x, y⟩), we can create a randomized feature map Z : Rᵈ → Rᴰ such that ⟨Z(x), Z(y)⟩ ≈ K(x, y).

1. Obtain the Maclaurin expansion f(x) = ∑ₙ₌₀^∞ aₙxⁿ by setting aₙ = f⁽ⁿ⁾(0) / n!
2. Fix a value p > 1. For i = 1 to D:
   •  Choose a non-negative integer N with P[N = n] = 1/pⁿ⁺¹
   •  Choose N vectors ω₁, …, ω_N ∈ {−1, 1}ᵈ, selecting each coordinate using fair coin tosses.
   •  Let the feature Zᵢ : x → √(a_N p^(N+1)) ∏ⱼ₌₁ᴺ ωⱼᵀx
3. Z : x → (1/√D) (Z₁(x), …, Z_D(x))
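A sketch of this recipe in numpy (illustrative, with p fixed at 2 so that P[N = n] = 2^−(n+1) is a proper distribution; the exponential kernel f(t) = eᵗ with aₙ = 1/n! serves as the example):

```python
import numpy as np
from math import factorial

def dot_product_feature_map(X, D, a_fn, seed=0):
    """Kar-Karnick style random feature map for a dot-product kernel
    K(x, y) = f(<x, y>) with Maclaurin coefficients a_fn(n). Uses p = 2."""
    p = 2.0
    rng = np.random.default_rng(seed)
    n_samples, d = X.shape
    Z = np.empty((n_samples, D))
    for i in range(D):
        N = int(rng.geometric(0.5)) - 1               # P[N = n] = 2^-(n+1)
        omega = rng.choice([-1.0, 1.0], size=(N, d))  # N Rademacher vectors
        # Z_i(x) = sqrt(a_N * p^(N+1)) * prod_j omega_j^T x
        coef = np.sqrt(a_fn(N) * p ** (N + 1))
        Z[:, i] = coef * np.prod(omega @ X.T, axis=0) # empty product = 1
    return Z / np.sqrt(D)

# Example: exponential kernel f(t) = e^t, whose coefficients are a_n = 1/n!.
X = np.random.default_rng(1).normal(size=(2, 4)) * 0.5
Z = dot_product_feature_map(X, D=20000, a_fn=lambda n: 1.0 / factorial(n))
print(Z[0] @ Z[1], np.exp(X[0] @ X[1]))  # the two should be close
```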
SVM: Implementation Summary
Using these approximations, we can now treat this as a linear SVM
problem.
(1)  Job 1 – compute stats for each feature and class (mean, variance, class cardinality, etc.)
(2)  Job 2 – transform the sample by the approximate kernel map and compute stats for the new feature space.
(3)  Job 3 – randomly distribute the new samples and train the model in the reducer (a pipeline sketch follows below).
We can use MapReduce to solve non-linear multi-classification SVM!
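A schematic of the three jobs in plain Python (a simulation of what each Hadoop job computes, not the product’s code; rff_map, pegasos, and parallel_sgd refer to the sketches above):

```python
import numpy as np

def job1_stats(X, y):
    """Job 1: per-feature and per-class statistics."""
    return {"mean": X.mean(axis=0), "var": X.var(axis=0),
            "class_counts": dict(zip(*np.unique(y, return_counts=True)))}

def job2_transform(X, stats, D, sigma):
    """Job 2: standardize, apply the approximate kernel map, re-compute stats."""
    X_std = (X - stats["mean"]) / np.sqrt(stats["var"] + 1e-12)
    Z = rff_map(X_std, D=D, sigma=sigma)        # from the earlier sketch
    return Z, {"mean": Z.mean(axis=0), "var": Z.var(axis=0)}

def job3_train(Z, y, lam, num_shards, iters):
    """Job 3: random shards in the mapper, Pegasos in each reducer, average."""
    solve = lambda Xs, ys: pegasos(Xs, ys, lam=lam, num_iters=iters)
    return parallel_sgd(Z, y, num_shards=num_shards, sgd_solve=solve)
```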
SVM: Implementation examples
§  SVM used by large entertainment company for customer
segmentation
•  Web logs containing browsing information mined for customer attributes
like gender and age
•  Raw Omniture logs stored in Hadoop
•  Models built on ~10 billion rows and 1 million features
•  Models used to improve inventory value of company’s web properties
for publishers
Questions?
