Consensus Optimization and Machine Learning
Stephen Boyd and Steven Diamond
EE & CS Departments
Stanford University
H2O World, 11/10/2015
Outline
Convex optimization
Model fitting via convex optimization
Consensus optimization and model fitting
Convex optimization
Convex optimization problem
convex optimization problem:
$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& Ax = b
\end{array}
$$
variable $x \in \mathbf{R}^n$
equality constraints are linear
$f_0, \ldots, f_m$ are convex: for $\theta \in [0, 1]$,
$$f_i(\theta x + (1 - \theta) y) \le \theta f_i(x) + (1 - \theta) f_i(y)$$
i.e., $f_i$ have nonnegative (upward) curvature
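The convexity inequality can be probed numerically. The sketch below (the helper name `violates_convexity` and its parameters are hypothetical) samples random $x$, $y$, $\theta$ and looks for a point where the inequality fails. Finding one proves a function is not convex; finding none is only evidence, never a proof.

```python
import numpy as np

def violates_convexity(f, dim, trials=2000, seed=0, tol=1e-9):
    """Search random (x, y, theta) for a violation of
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).

    Returns True if a violation is found (so f is NOT convex).
    Returning False is only evidence of convexity, never a proof.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.standard_normal(dim)
        y = rng.standard_normal(dim)
        theta = rng.random()                      # theta in [0, 1)
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:                       # small slack for rounding error
            return True
    return False

print(violates_convexity(lambda z: np.sum(z ** 2), 3))    # convex quadratic
print(violates_convexity(lambda z: -np.sum(z ** 2), 3))   # concave, so a violation turns up
```

A random search like this is a cheap sanity check on a hand-written objective before handing it to a convex solver; a modeling language such as CVXPY instead verifies convexity by construction.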
Why convex optimization?
we can solve convex optimization problems effectively
there are lots of applications
Application areas
machine learning, statistics
finance
supply chain, revenue management, advertising
control
signal and image processing, vision
networking
circuit design
and many others . . .
Convex optimization solvers
medium scale (1000s–10000s of variables and constraints)
interior-point methods on a single machine
large scale (100k–1B variables and constraints)
custom (often problem-specific) methods, e.g., SGD
lots of ongoing research
growing list of open-source solvers
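SGD is named above as an example of a custom large-scale method. A minimal sketch, assuming a least-squares objective, a fixed step size, and a fixed number of epochs (all our choices, not from the slides): each step uses the gradient of a single row's term, and the result is compared with NumPy's direct least-squares solve.

```python
import numpy as np

def sgd_least_squares(A, b, lr=0.01, epochs=20, seed=0):
    """Plain SGD on (1/2) * sum_i (a_i^T x - b_i)^2, one example per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):          # reshuffle rows every epoch
            residual = A[i] @ x - b[i]
            x -= lr * residual * A[i]         # gradient of (1/2)(a_i^T x - b_i)^2
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))
b = A @ rng.standard_normal(5) + 0.1 * rng.standard_normal(1000)
x_sgd = sgd_least_squares(A, b)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # direct solve, for comparison
```

With a constant step size SGD only reaches a small neighborhood of the optimum; practical large-scale methods typically add step-size decay or averaging.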
Convex optimization modeling languages
(new) high-level language support for convex optimization
describe the problem in a high-level language
problem is compiled to a standard form and solved
implementations:
YALMIP, CVX (Matlab)
CVXPY (Python)
Convex.jl (Julia)
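CVXPY
(Diamond & Boyd, 2013)
$$
\begin{array}{ll}
\text{minimize} & \|Ax - b\|_2^2 + \gamma \|x\|_1 \\
\text{subject to} & \|x\|_\infty \le 1
\end{array}
$$
import cvxpy as cp
import numpy as np

# problem data (random, for illustration only)
m, n, gamma = 30, 10, 0.1
A, b = np.random.randn(m, n), np.random.randn(m)

x = cp.Variable(n)
cost = cp.sum_squares(A @ x - b) + gamma * cp.norm(x, 1)
prob = cp.Problem(cp.Minimize(cost), [cp.norm(x, "inf") <= 1])
opt_val = prob.solve()
solution = x.value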
Example: Image in-painting
guess pixel values in obscured/corrupted parts of image
total variation in-painting: choose pixel values $x_{ij} \in \mathbf{R}^3$ to minimize the total variation
$$
\mathrm{TV}(x) = \sum_{i,j} \left\| \begin{bmatrix} x_{i+1,j} - x_{ij} \\ x_{i,j+1} - x_{ij} \end{bmatrix} \right\|_2
$$
a convex problem
Example
512 × 512 color image (n ≈ 800000 variables)
[Figure: original and corrupted images]
[Figure: original and recovered images]
Example
80% of pixels removed
[Figure: original and corrupted images]
[Figure: original and recovered images]
Model fitting via convex optimization
Predictor
given data $(x_i, y_i)$, $i = 1, \ldots, m$
$x$ is the feature vector, $y$ is the outcome or label
find a predictor $\psi$ so that $y \approx \hat y = \psi(x)$ for data $(x, y)$ that you haven’t seen
$\psi$ is a regression model for $y \in \mathbf{R}$
$\psi$ is a classifier for $y \in \{-1, 1\}$
Loss minimization predictor
predictor parametrized by $\theta \in \mathbf{R}^n$
loss function $L(x_i, y_i, \theta)$ gives the misfit for data point $(x_i, y_i)$
for given $\theta$, the predictor is
$$\psi(x) = \operatorname*{argmin}_y L(x, y, \theta)$$
how do we choose the parameter $\theta$?
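The definition $\psi(x) = \operatorname{argmin}_y L(x, y, \theta)$ can be evaluated by brute force over candidate outcomes. In this sketch (the helper names and the grid of candidates are hypothetical), minimizing the logistic loss over $y \in \{-1, 1\}$ recovers $\operatorname{sign}(\theta^T x)$, and minimizing the squared loss over a fine grid recovers $\theta^T x$ up to the grid spacing.

```python
import numpy as np

def logistic_loss(x, y, theta):
    """L(x, y, theta) = log(1 + exp(-y theta^T x))."""
    return np.log1p(np.exp(-y * (theta @ x)))

def squared_loss(x, y, theta):
    """L(x, y, theta) = (theta^T x - y)^2."""
    return (theta @ x - y) ** 2

def psi(x, theta, loss, candidates):
    """Predictor psi(x) = argmin over candidate outcomes y of L(x, y, theta)."""
    candidates = np.asarray(candidates, dtype=float)
    losses = np.array([loss(x, y, theta) for y in candidates])
    return candidates[np.argmin(losses)]

rng = np.random.default_rng(0)
theta, x = rng.standard_normal(4), rng.standard_normal(4)
label = psi(x, theta, logistic_loss, [-1.0, 1.0])                  # classifier
value = psi(x, theta, squared_loss, np.linspace(-20, 20, 40001))   # regression on a grid
```

In practice the argmin is written in closed form, as in the "Examples" table that follows; the brute-force version just makes the definition concrete.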
Model fitting via regularized loss minimization
choose $\theta$ by minimizing the regularized loss
$$\frac{1}{m} \sum_{i=1}^m L(x_i, y_i, \theta) + \lambda r(\theta)$$
regularization $r(\theta)$ penalizes model complexity, enforces constraints, or represents a prior
$\lambda > 0$ scales the regularization
for many useful cases, this is a convex problem
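Examples
$$
\begin{array}{llll}
\text{predictor} & L(x, y, \theta) & \psi(x) & r(\theta) \\ \hline
\text{least-squares} & (\theta^T x - y)^2 & \theta^T x & 0 \\
\text{ridge regression} & (\theta^T x - y)^2 & \theta^T x & \|\theta\|_2^2 \\
\text{lasso} & (\theta^T x - y)^2 & \theta^T x & \|\theta\|_1 \\
\text{logistic classifier} & \log(1 + \exp(-y\theta^T x)) & \operatorname{sign}(\theta^T x) & 0 \\
\text{SVM} & (1 - y\theta^T x)_+ & \operatorname{sign}(\theta^T x) & \|\theta\|_2^2
\end{array}
$$
can mix and match, e.g., $r(\theta) = \|\theta\|_1$ sparsifies
all lead to convex fitting problems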
Robust (Huber) regression
loss $L(x, y, \theta) = \phi^{\mathrm{hub}}(\theta^T x - y)$
$\phi^{\mathrm{hub}}$ is the Huber function (with threshold $M > 0$):
$$
\phi^{\mathrm{hub}}(u) = \begin{cases} u^2 & |u| \le M \\ 2M|u| - M^2 & |u| > M \end{cases}
$$
same as least-squares for small residuals, but allows (some) large residuals, and so is robust to outliers
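A direct NumPy transcription of $\phi^{\mathrm{hub}}$, as a sketch (the function name is ours). The absolute value in the second branch matters: it makes the function symmetric and continuous at $\pm M$, and it grows only linearly for large residuals. CVXPY's `cp.huber(x, M)` atom uses the same definition.

```python
import numpy as np

def huber(u, M=1.0):
    """phi_hub(u) = u^2 if |u| <= M, else 2M|u| - M^2 (elementwise)."""
    u = np.asarray(u, dtype=float)
    a = np.abs(u)
    return np.where(a <= M, u ** 2, 2 * M * a - M ** 2)

u = np.linspace(-4, 4, 9)
print(huber(u, M=1.0))   # quadratic on [-1, 1], linear with slope 2M in magnitude outside
```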
Example
$m = 450$ measurements, $n = 300$ regressors
choose $\theta^{\mathrm{true}}$; $x_i \sim \mathcal{N}(0, I)$
set $y_i = (\theta^{\mathrm{true}})^T x_i + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, 1)$
with probability $p$, replace $y_i$ with $-y_i$
data has fraction $p$ of (non-obvious) wrong measurements
distributions of ‘good’ and ‘bad’ $y_i$ are the same
try to recover $\theta^{\mathrm{true}} \in \mathbf{R}^n$ from measurements $y \in \mathbf{R}^m$
‘prescient’ version: we know which measurements are wrong
Example
50 problem instances, p varying from 0 to 0.15
Quantile regression
quantile regression: use tilted $\ell_1$ loss
$$L(x, y, \theta) = \tau (r)_+ + (1 - \tau)(r)_-$$
with $r = \theta^T x - y$, $\tau \in (0, 1)$
$\tau = 0.5$: equal penalty for over- and under-estimating
$\tau = 0.1$: 9× more penalty for under-estimating
$\tau = 0.9$: 9× more penalty for over-estimating
at the optimum, the $\tau$-quantile of the residuals is zero
Example
time series $x_t$, $t = 0, 1, 2, \ldots$
auto-regressive predictor:
$$\hat x_{t+1} = \theta^T (1, x_t, \ldots, x_{t-M})$$
$M = 10$ is the memory of the predictor
use quantile regression for $\tau = 0.1, 0.5, 0.9$
at each time $t$, this gives three one-step-ahead predictions: $\hat x^{0.1}_{t+1}$, $\hat x^{0.5}_{t+1}$, $\hat x^{0.9}_{t+1}$
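Fitting the auto-regressive predictor is ordinary model fitting once the lagged regressors are stacked into a matrix. A NumPy sketch (the helper `ar_design` and the random-walk series are hypothetical): each row is $(1, x_t, \ldots, x_{t-M})$ and the target is $x_{t+1}$, so $\theta \in \mathbf{R}^{M+2}$.

```python
import numpy as np

def ar_design(x, M=10):
    """Regressors (1, x_t, x_{t-1}, ..., x_{t-M}) and targets x_{t+1}.

    One row per t = M, ..., T-2, where T = len(x), so every row has a full
    history and a next value. Returns (A, b) with A of shape (T-M-1, M+2).
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    A = np.array([np.concatenate(([1.0], x[t - M:t + 1][::-1]))
                  for t in range(M, T - 1)])
    b = x[M + 1:T]
    return A, b

x = np.cumsum(np.random.default_rng(0).standard_normal(450))  # hypothetical series
A, b = ar_design(x, M=10)
```

The pair `(A, b)` can then be passed to any of the fitting problems above, for example a quantile regression with $\tau = 0.1, 0.5, 0.9$, to produce the three one-step-ahead predictions.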
Example
[Figure: time series $x_t$]
[Figure: $x_t$ and predictions $\hat x^{0.1}_{t+1}$, $\hat x^{0.5}_{t+1}$, $\hat x^{0.9}_{t+1}$ (training set, $t = 0, \ldots, 399$)]
[Figure: $x_t$ and predictions $\hat x^{0.1}_{t+1}$, $\hat x^{0.5}_{t+1}$, $\hat x^{0.9}_{t+1}$ (test set, $t = 400, \ldots, 449$)]
[Figure: residual distributions for $\tau = 0.9$, $0.5$, and $0.1$ (training set)]
[Figure: residual distributions for $\tau = 0.9$, $0.5$, and $0.1$ (test set)]
Consensus optimization and model fitting
Consensus optimization
want to solve a problem with $N$ objective terms
$$\text{minimize} \quad \sum_{i=1}^N f_i(x)$$
e.g., $f_i$ is the loss function for the $i$th block of training data
consensus form:
$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^N f_i(x_i) \\
\text{subject to} & x_i - z = 0, \quad i = 1, \ldots, N
\end{array}
$$
$x_i$ are local variables
$z$ is the global variable
$x_i - z = 0$ are consistency or consensus constraints
Consensus optimization via ADMM
with $\bar x^k = (1/N) \sum_{i=1}^N x_i^k$ (average over local variables) and penalty parameter $\rho > 0$
$$
\begin{aligned}
x_i^{k+1} &:= \operatorname*{argmin}_{x_i} \left( f_i(x_i) + (\rho/2) \|x_i - \bar x^k + u_i^k\|_2^2 \right) \\
u_i^{k+1} &:= u_i^k + (x_i^{k+1} - \bar x^{k+1})
\end{aligned}
$$
converges to a global minimum under very general conditions
$u_i^k$ is a running sum of inconsistencies (PI control)
minimizations are carried out independently and in parallel
coordination is via averaging of the local variables $x_i$
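A NumPy sketch of these updates, assuming least-squares terms $f_i(x) = \tfrac12\|A_i x - b_i\|_2^2$ so that each local minimization has the closed form $(A_i^T A_i + \rho I)\,x_i = A_i^T b_i + \rho(\bar x^k - u_i^k)$. The helper name, the synthetic data, and the choice $\rho = 50$ (roughly the curvature of each block) are assumptions. The averaged iterate is compared with a direct least-squares solve on all of the data.

```python
import numpy as np

def consensus_admm_ls(blocks, rho=1.0, iters=200):
    """Consensus ADMM for minimize sum_i (1/2)||A_i x - b_i||^2.

    blocks: list of (A_i, b_i). Returns the average xbar and local iterates.
    """
    n = blocks[0][0].shape[1]
    N = len(blocks)
    xs = np.zeros((N, n))
    us = np.zeros((N, n))
    xbar = np.zeros(n)
    for _ in range(iters):
        for i, (A, b) in enumerate(blocks):     # independent: could run in parallel
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(n),
                                    A.T @ b + rho * (xbar - us[i]))
        xbar = xs.mean(axis=0)                  # coordination by averaging
        us += xs - xbar                         # running sum of inconsistencies
    return xbar, xs

rng = np.random.default_rng(0)
n, N, rows = 5, 10, 50
A_full = rng.standard_normal((N * rows, n))
b_full = A_full @ rng.standard_normal(n) + 0.1 * rng.standard_normal(N * rows)
blocks = [(A_full[i * rows:(i + 1) * rows], b_full[i * rows:(i + 1) * rows]) for i in range(N)]

xbar, xs = consensus_admm_ls(blocks, rho=50.0, iters=300)
x_global = np.linalg.lstsq(A_full, b_full, rcond=None)[0]   # all data in one place
```

Each block only ever touches its own `(A_i, b_i)`; the only quantities exchanged are the local iterates and their average.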
Consensus model fitting
variable is $\theta$, the parameter in the predictor
$f_i(\theta_i)$ is loss + (share of) regularizer for the $i$th data block
$\theta_i^{k+1}$ minimizes local loss + additional quadratic term
local parameters converge to consensus, the same as if the whole data set were handled together
privacy preserving: agents don’t reveal data to each other
Example
SVM:
hinge loss $l(u) = (1 - u)_+$
sum-of-squares regularization $r(\theta) = \|\theta\|_2^2$
baby problem with $n = 2$, $m = 400$ to illustrate
examples split into $N = 20$ groups, in the worst possible way: each group contains only positive or only negative examples
[Figures: iterations 1, 5, and 40 (horizontal axis −3 to 3, vertical axis −10 to 10)]
CVXPY implementation
(Steven Diamond)
$m = 10^5$ samples, $n = 10^3$ (dense) features
hinge (SVM) loss with $\ell_1$ regularization
data split into 100 chunks
100 processes on 32 cores
26 sec per ADMM iteration
100 iterations for the objective to converge
10 iterations (5 minutes) to get a good model
H2O implementation
(Tomas Nykodym)
click-through data derived from a Kaggle data set
20000 features, 20M examples
logistic loss, elastic net regularization
examples divided into 100 chunks (of different sizes)
run on 100 H2O instances
5 iterations to get a good global model
H2O implementation
[Figures: ROC curves at iterations 1, 2, 3, 5, and 10]
Summary
ADMM consensus
can do machine learning across distributed data sources
the data never moves
get same model as if you had collected all data in one place
Resources
many researchers have worked on the topics covered
Convex Optimization
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
EE364a (course slides, videos, code, homework, ...)
software: CVX, CVXPY, Convex.jl
all available online
Consensus optimization and model fitting 48

Más contenido relacionado

La actualidad más candente

Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Frank Nielsen
 

La actualidad más candente (20)

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
 
Patch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective DivergencesPatch Matching with Polynomial Exponential Families and Projective Divergences
Patch Matching with Polynomial Exponential Families and Projective Divergences
 
Additive model and boosting tree
Additive model and boosting treeAdditive model and boosting tree
Additive model and boosting tree
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantification
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer Lattice
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts model
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
 
QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...
QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...
QMC: Operator Splitting Workshop, A New (More Intuitive?) Interpretation of I...
 
Lecture note4coordinatedescent
Lecture note4coordinatedescentLecture note4coordinatedescent
Lecture note4coordinatedescent
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Brief Introduction About Topological Interference Management (TIM)
Brief Introduction About Topological Interference Management (TIM)Brief Introduction About Topological Interference Management (TIM)
Brief Introduction About Topological Interference Management (TIM)
 
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
QMC: Operator Splitting Workshop, A Splitting Method for Nonsmooth Nonconvex ...
 
SPDE presentation 2012
SPDE presentation 2012SPDE presentation 2012
SPDE presentation 2012
 

Destacado

Destacado (9)

H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
 
Optimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive AnalyticsOptimization: A Framework for Predictive Analytics
Optimization: A Framework for Predictive Analytics
 
2013 05 ny
2013 05 ny2013 05 ny
2013 05 ny
 
Growth pl
Growth plGrowth pl
Growth pl
 
Metzgar Jason Mobile Presentation
Metzgar Jason Mobile PresentationMetzgar Jason Mobile Presentation
Metzgar Jason Mobile Presentation
 
Cosplay
CosplayCosplay
Cosplay
 
Online Display Advertising Optimization with H2O at ShareThis
Online Display Advertising Optimization with H2O at ShareThisOnline Display Advertising Optimization with H2O at ShareThis
Online Display Advertising Optimization with H2O at ShareThis
 
A Predictive Model Factory Picks Up Steam
A Predictive Model Factory Picks Up SteamA Predictive Model Factory Picks Up Steam
A Predictive Model Factory Picks Up Steam
 
Sparkling Water Applications Meetup 07.21.15
Sparkling Water Applications Meetup 07.21.15Sparkling Water Applications Meetup 07.21.15
Sparkling Water Applications Meetup 07.21.15
 

Similar a H2O World - Consensus Optimization and Machine Learning - Stephen Boyd

4optmizationtechniques-150308051251-conversion-gate01.pdf
4optmizationtechniques-150308051251-conversion-gate01.pdf4optmizationtechniques-150308051251-conversion-gate01.pdf
4optmizationtechniques-150308051251-conversion-gate01.pdf
BechanYadav4
 

Similar a H2O World - Consensus Optimization and Machine Learning - Stephen Boyd (20)

Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
 
Optimization tutorial
Optimization tutorialOptimization tutorial
Optimization tutorial
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithms
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
 
Presentation.pdf
Presentation.pdfPresentation.pdf
Presentation.pdf
 
ECE 2103_L6 Boolean Algebra Canonical Forms [Autosaved].pptx
ECE 2103_L6 Boolean Algebra Canonical Forms [Autosaved].pptxECE 2103_L6 Boolean Algebra Canonical Forms [Autosaved].pptx
ECE 2103_L6 Boolean Algebra Canonical Forms [Autosaved].pptx
 
Input analysis
Input analysisInput analysis
Input analysis
 
Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration Automatic Bayesian method for Numerical Integration
Automatic Bayesian method for Numerical Integration
 
Lecture: Monte Carlo Methods
Lecture: Monte Carlo MethodsLecture: Monte Carlo Methods
Lecture: Monte Carlo Methods
 
Hybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networksHybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networks
 
MLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic trackMLHEP Lectures - day 2, basic track
MLHEP Lectures - day 2, basic track
 
DAA - UNIT 4 - Engineering.pptx
DAA - UNIT 4 - Engineering.pptxDAA - UNIT 4 - Engineering.pptx
DAA - UNIT 4 - Engineering.pptx
 
Wcsmo_presentation.pdf
Wcsmo_presentation.pdfWcsmo_presentation.pdf
Wcsmo_presentation.pdf
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimization
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
4optmizationtechniques-150308051251-conversion-gate01.pdf
4optmizationtechniques-150308051251-conversion-gate01.pdf4optmizationtechniques-150308051251-conversion-gate01.pdf
4optmizationtechniques-150308051251-conversion-gate01.pdf
 
Optmization techniques
Optmization techniquesOptmization techniques
Optmization techniques
 

Más de Sri Ambati

Más de Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Último

CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Último (20)

Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

H2O World - Consensus Optimization and Machine Learning - Stephen Boyd

  • 1. Consensus Optimization and Machine Learning Stephen Boyd and Steven Diamond EE & CS Departments Stanford University H2O World, 11/10/2015 1
  • 2. Outline Convex optimization Model fitting via convex optimization Consensus optimization and model fitting 2
  • 3. Outline Convex optimization Model fitting via convex optimization Consensus optimization and model fitting Convex optimization 3
  • 4. Convex optimization problem convex optimization problem: minimize f0(x) subject to fi (x) ≤ 0, i = 1, . . . , m Ax = b variable x ∈ Rn equality constraints are linear f0, . . . , fm are convex: for θ ∈ [0, 1], fi (θx + (1 − θ)y) ≤ θfi (x) + (1 − θ)fi (y) i.e., fi have nonnegative (upward) curvature Convex optimization 4
  • 6. Why convex optimization? we can solve convex optimization problems effectively Convex optimization 5
  • 7. Why convex optimization? we can solve convex optimization problems effectively there are lots of applications Convex optimization 5
  • 8. Application areas machine learning, statistics finance supply chain, revenue management, advertising control signal and image processing, vision networking circuit design and many others . . . Convex optimization 6
  • 9. Convex optimization solvers: at medium scale (1000s–10000s of variables and constraints), interior-point methods on a single machine; at large scale (100k–1B variables and constraints), custom (often problem-specific) methods, e.g., SGD. Lots of ongoing research, and a growing list of open-source solvers.
  • 10. Convex optimization modeling languages: (new) high-level language support for convex optimization. Describe the problem in a high-level language; the problem is compiled to standard form and solved. Implementations: YALMIP, CVX (Matlab); CVXPY (Python); Convex.jl (Julia).
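  • 11. CVXPY (Diamond & Boyd, 2013). The problem minimize $\|Ax - b\|_2^2 + \gamma \|x\|_1$ subject to $\|x\|_\infty \le 1$ is written as:
```python
import cvxpy as cp
import numpy as np

# hypothetical problem data (A, b, n, gamma are not specified on the slide)
m, n, gamma = 30, 10, 0.1
A, b = np.random.randn(m, n), np.random.randn(m)
x = cp.Variable(n)
cost = cp.sum_squares(A @ x - b) + gamma * cp.norm(x, 1)
prob = cp.Problem(cp.Minimize(cost), [cp.norm(x, "inf") <= 1])
opt_val = prob.solve()   # optimal value
solution = x.value       # optimal x
```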
  • 12. Example: image in-painting. Guess pixel values in obscured/corrupted parts of an image. Total variation in-painting: choose pixel values $x_{ij} \in \mathbf{R}^3$ to minimize the total variation $\mathrm{TV}(x) = \sum_{i,j} \left\| \begin{bmatrix} x_{i+1,j} - x_{ij} \\ x_{i,j+1} - x_{ij} \end{bmatrix} \right\|_2$, a convex problem.
  • 13. Example: 512 × 512 color image ($n \approx 800000$ variables). [Figure: original and corrupted images]
  • 15. Example: 80% of pixels removed. [Figure: original and corrupted images]
  • 16. Example: 80% of pixels removed. [Figure: original and recovered images]
  • 17. Outline: Convex optimization; Model fitting via convex optimization (this section); Consensus optimization and model fitting
  • 18. Predictor: given data $(x_i, y_i)$, $i = 1, \dots, m$, where $x$ is a feature vector and $y$ is the outcome or label, find a predictor $\psi$ so that $y \approx \hat{y} = \psi(x)$ for data $(x, y)$ that you haven't seen. $\psi$ is a regression model for $y \in \mathbf{R}$ and a classifier for $y \in \{-1, 1\}$.
  • 19. Loss minimization predictor: the predictor is parametrized by $\theta \in \mathbf{R}^n$; the loss function $L(x_i, y_i, \theta)$ gives the misfit for data point $(x_i, y_i)$; for a given $\theta$, the predictor is $\psi(x) = \operatorname{argmin}_y L(x, y, \theta)$. How do we choose the parameter $\theta$?
  • 21. Model fitting via regularized loss minimization: choose $\theta$ by minimizing the regularized loss $\frac{1}{m}\sum_{i=1}^m L(x_i, y_i, \theta) + \lambda r(\theta)$. The regularization $r(\theta)$ penalizes model complexity, enforces constraints, or represents a prior; $\lambda > 0$ scales the regularization. For many useful cases, this is a convex problem.
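  • 22. Examples:

    | predictor | $L(x, y, \theta)$ | $\psi(x)$ | $r(\theta)$ |
    |---|---|---|---|
    | least-squares | $(\theta^T x - y)^2$ | $\theta^T x$ | $0$ |
    | ridge regression | $(\theta^T x - y)^2$ | $\theta^T x$ | $\lVert\theta\rVert_2^2$ |
    | lasso | $(\theta^T x - y)^2$ | $\theta^T x$ | $\lVert\theta\rVert_1$ |
    | logistic classifier | $\log(1 + \exp(-y\theta^T x))$ | $\operatorname{sign}(\theta^T x)$ | $0$ |
    | SVM | $(1 - y\theta^T x)_+$ | $\operatorname{sign}(\theta^T x)$ | $\lVert\theta\rVert_2^2$ |

    You can mix and match, e.g., $r(\theta) = \|\theta\|_1$ sparsifies. All lead to convex fitting problems.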
  • 23. Robust (Huber) regression: loss $L(x, y, \theta) = \phi^{\mathrm{hub}}(\theta^T x - y)$, where $\phi^{\mathrm{hub}}$ is the Huber function with threshold $M > 0$: $\phi^{\mathrm{hub}}(u) = u^2$ for $|u| \le M$, and $2M|u| - M^2$ for $|u| > M$. It is the same as least-squares for small residuals but allows (some) large residuals, and so is robust to outliers.
  • 24. Example: $m = 450$ measurements, $n = 300$ regressors. Choose $\theta^{\mathrm{true}}$; $x_i \sim \mathcal{N}(0, I)$; set $y_i = (\theta^{\mathrm{true}})^T x_i + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, 1)$. With probability $p$, replace $y_i$ with $-y_i$, so the data has a fraction $p$ of (non-obvious) wrong measurements; the distributions of 'good' and 'bad' $y_i$ are the same. Try to recover $\theta^{\mathrm{true}} \in \mathbf{R}^n$ from measurements $y \in \mathbf{R}^m$. 'Prescient' version: we know which measurements are wrong.
  • 25. Example: 50 problem instances, $p$ varying from 0 to 0.15. [Figure]
  • 26. Example [Figure]
  • 27. Quantile regression: use the tilted $\ell_1$ loss $L(x, y, \theta) = \tau (r)_+ + (1 - \tau)(r)_-$ with $r = \theta^T x - y$, $\tau \in (0, 1)$. $\tau = 0.5$: equal penalty for over- and under-estimating; $\tau = 0.1$: 9× more penalty for under-estimating; $\tau = 0.9$: 9× more penalty for over-estimating. The $\tau$-quantile of the residuals is zero.
  • 28. Example: time series $x_t$, $t = 0, 1, 2, \dots$. Auto-regressive predictor $\hat{x}_{t+1} = \theta^T (1, x_t, \dots, x_{t-M})$, where $M = 10$ is the memory of the predictor. Use quantile regression for $\tau = 0.1, 0.5, 0.9$; at each time $t$ this gives three one-step-ahead predictions $\hat{x}^{0.1}_{t+1}, \hat{x}^{0.5}_{t+1}, \hat{x}^{0.9}_{t+1}$.
  • 29. Example: time series $x_t$ [Figure]
  • 30. Example: $x_t$ and predictions $\hat{x}^{0.1}_{t+1}, \hat{x}^{0.5}_{t+1}, \hat{x}^{0.9}_{t+1}$ (training set, $t = 0, \dots, 399$) [Figure]
  • 31. Example: $x_t$ and predictions $\hat{x}^{0.1}_{t+1}, \hat{x}^{0.5}_{t+1}, \hat{x}^{0.9}_{t+1}$ (test set, $t = 400, \dots, 449$) [Figure]
  • 32. Example: residual distributions for $\tau = 0.9, 0.5, 0.1$ (training set) [Figure]
  • 33. Example: residual distributions for $\tau = 0.9, 0.5, 0.1$ (test set) [Figure]
  • 34. Outline: Convex optimization; Model fitting via convex optimization; Consensus optimization and model fitting (this section)
  • 35. Consensus optimization: we want to solve a problem with $N$ objective terms, minimize $\sum_{i=1}^N f_i(x)$, where, e.g., $f_i$ is the loss function for the $i$th block of training data. Consensus form: minimize $\sum_{i=1}^N f_i(x_i)$ subject to $x_i - z = 0$, $i = 1, \dots, N$. The $x_i$ are local variables, $z$ is the global variable, and $x_i - z = 0$ are the consistency or consensus constraints.
  • 36. Consensus optimization via ADMM: with $\bar{x}^k = (1/N)\sum_{i=1}^N x_i^k$ (the average over the local variables), $x_i^{k+1} := \operatorname{argmin}_{x_i} \left( f_i(x_i) + (\rho/2)\|x_i - \bar{x}^k + u_i^k\|_2^2 \right)$ and $u_i^{k+1} := u_i^k + (x_i^{k+1} - \bar{x}^{k+1})$. This gets the global minimum under very general conditions; $u^k$ is a running sum of inconsistencies (PI control); the minimizations are carried out independently and in parallel; coordination is via averaging of the local variables $x_i$.
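A minimal NumPy sketch of these updates, assuming each $f_i(x) = \|A_i x - b_i\|_2^2$ is a least-squares loss on one data block, so the $x_i$-update has a closed form (set the gradient $2A_i^T(A_i x_i - b_i) + \rho(x_i - \bar{x}^k + u_i^k)$ to zero). The function name, block sizes, and $\rho = 40$ are hypothetical; $\rho$ affects the speed of convergence, not the limit. The consensus result should match least squares on the stacked data.

```python
import numpy as np

def consensus_admm_lsq(blocks, rho=1.0, iters=500):
    """Consensus ADMM for minimize sum_i ||A_i x - b_i||_2^2 over blocks [(A_1, b_1), ...]."""
    N, n = len(blocks), blocks[0][0].shape[1]
    x = np.zeros((N, n))        # local variables x_i, one row each
    u = np.zeros((N, n))        # scaled dual variables u_i
    xbar = np.zeros(n)          # average of the local variables
    # x_i-update solves (2 A_i^T A_i + rho I) x_i = 2 A_i^T b_i + rho (xbar - u_i)
    lhs = [2 * A.T @ A + rho * np.eye(n) for A, _ in blocks]
    rhs = [2 * A.T @ b for A, b in blocks]
    for _ in range(iters):
        for i in range(N):      # independent local minimizations (parallelizable)
            x[i] = np.linalg.solve(lhs[i], rhs[i] + rho * (xbar - u[i]))
        xbar = x.mean(axis=0)   # coordination: average the local variables
        u += x - xbar           # running sum of inconsistencies
    return xbar

rng = np.random.default_rng(0)
blocks = [(rng.standard_normal((20, 5)), rng.standard_normal(20)) for _ in range(10)]
theta = consensus_admm_lsq(blocks, rho=40.0)

# compare with least squares on all the data stacked together
A_all = np.vstack([A for A, _ in blocks])
b_all = np.concatenate([b for _, b in blocks])
theta_ls = np.linalg.lstsq(A_all, b_all, rcond=None)[0]
print(np.max(np.abs(theta - theta_ls)))
```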
  • 37. Consensus model fitting: the variable is $\theta$, the parameter in the predictor; $f_i(\theta_i)$ is the loss plus (a share of) the regularizer for the $i$th data block; $\theta_i^{k+1}$ minimizes the local loss plus an additional quadratic term. The local parameters converge to consensus, the same as if the whole data set were handled together. Privacy preserving: agents don't reveal their data to each other.
  • 38. Example: SVM with hinge loss $l(u) = (1 - u)_+$ and sum-of-squares regularization $r(\theta) = \|\theta\|_2^2$. A baby problem with $n = 2$, $m = 400$ to illustrate; the examples are split into $N = 20$ groups in the worst possible way: each group contains only positive or only negative examples.
  • 39. Iteration 1 [Figure]
  • 40. Iteration 5 [Figure]
  • 41. Iteration 40 [Figure]
  • 42. CVXPY implementation (Steven Diamond): $N = 10^5$ samples, $n = 10^3$ (dense) features; hinge (SVM) loss with $\ell_1$ regularization; data split into 100 chunks; 100 processes on 32 cores; 26 s per ADMM iteration; 100 iterations for the objective to converge; 10 iterations (5 minutes) to get a good model.
  • 44. H2O implementation (Tomas Nykodym): click-through data derived from a Kaggle data set; 20000 features, 20M examples; logistic loss, elastic net regularization; examples divided into 100 chunks (of different sizes); run on 100 H2O instances; 5 iterations to get a good global model.
  • 45. H2O implementation: ROC, iteration 1 [Figure]
  • 46. H2O implementation: ROC, iteration 2 [Figure]
  • 47. H2O implementation: ROC, iteration 3 [Figure]
  • 48. H2O implementation: ROC, iteration 5 [Figure]
  • 49. H2O implementation: ROC, iteration 10 [Figure]
  • 50. Summary: ADMM consensus can do machine learning across distributed data sources. The data never moves, and you get the same model as if you had collected all the data in one place.
  • 51. Resources: many researchers have worked on the topics covered. Convex Optimization; Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers; EE364a (course slides, videos, code, homework, ...); software: CVX, CVXPY, Convex.jl. All available online.