This webinar, hosted by SigOpt co-founder and CEO Scott Clark, explains how advanced features can help you achieve your modeling goals. These features include metric definition and multimetric optimization, conditional parameters, and multitask optimization for long training cycles.
4. Iterative, automated optimization
[Diagram: SigOpt's optimization loop. Behind your firewall, training and testing data feed an AI, ML, DL, or simulation model; model evaluation or backtest produces an objective metric. The metric is reported over the REST API to SigOpt, which returns new configurations of parameters or hyperparameters. Your data and models stay private, and the loop integrates with any modeling stack.]
● EXPERIMENT INSIGHTS: Track, organize, analyze and reproduce any model
● ENTERPRISE PLATFORM: Built to fit any stack and scale with your needs
● OPTIMIZATION ENGINE: Explore and exploit with a variety of techniques
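In code, the suggest-evaluate-observe loop the diagram describes looks roughly like the following sketch, using SigOpt's Python client. The parameter names, bounds, budget, and the train_and_evaluate() helper are illustrative placeholders, not values from the webinar.

```python
# Minimal sketch of the iterative optimization loop. The API token and the
# train_and_evaluate() helper (your model, behind your firewall) are placeholders.
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

experiment = conn.experiments().create(
    name="Example tuning loop",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
        dict(name="batch_size", type="int", bounds=dict(min=16, max=256)),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    observation_budget=60,
)

for _ in range(experiment.observation_budget):
    # SigOpt suggests a new configuration of hyperparameters...
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # ...you evaluate it privately and report back only the objective metric.
    accuracy = train_and_evaluate(suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="accuracy", value=accuracy)],
    )
```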
5. Current SigOpt algorithmic trading customers represent $300B+ in assets under management. Current SigOpt enterprise customers across six industries represent $500B+ in market capitalization.
14. Benefits
● Learn fast, fail fast: Give yourself the best chance at finding good use cases while avoiding false negatives.
● Connect outputs to outcomes: Define, select and iterate on your metrics with end-to-end evaluation.
● Find the global maximum: Early non-optimized decisions in the process limit your ability to maximize performance.
● Boost productivity: Automate modeling tasks so modelers spend more time applying their expertise.
18. How it works: Multimetric optimization (with thresholds)
● Define two metrics instead of one
● Optimize against both metrics automatically and simultaneously
● Set thresholds on each individual metric to reflect business or modeling needs (see the sketch after this list)
● Compare a Pareto frontier of best model configurations that balance these two metrics
● Relevant docs
● Relevant blog post
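As a concrete illustration, here is a minimal sketch of a two-metric experiment with thresholds via SigOpt's Python client. The metric names, bounds, threshold values, and the train_and_measure() helper are illustrative assumptions, not values from the webinar.

```python
# Hypothetical sketch: a two-metric SigOpt experiment with thresholds.
# Metric names, parameter bounds, and threshold values are assumptions.
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

experiment = conn.experiments().create(
    name="Accuracy vs. inference time",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
        dict(name="num_filters", type="int", bounds=dict(min=16, max=256)),
    ],
    # Two metrics, optimized simultaneously; thresholds encode business needs
    # (e.g. accuracy must stay above 0.90, inference must stay under 50 ms).
    metrics=[
        dict(name="accuracy", objective="maximize", threshold=0.90),
        dict(name="inference_time_ms", objective="minimize", threshold=50.0),
    ],
    observation_budget=100,
)

suggestion = conn.experiments(experiment.id).suggestions().create()
accuracy, latency = train_and_measure(suggestion.assignments)  # placeholder
conn.experiments(experiment.id).observations().create(
    suggestion=suggestion.id,
    values=[
        dict(name="accuracy", value=accuracy),
        dict(name="inference_time_ms", value=latency),
    ],
)
```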
19. Potential applications of multimetric optimization
● Balance Competing Objectives: https://sigopt.com/blog/intro-to-multicriteria-optimization/
● Define and Select Metrics: https://sigopt.com/blog/multimetric-updates-in-the-experiment-insights-dashboard/
● Connect Metrics to Outcomes: https://sigopt.com/blog/metric-thresholds-a-new-feature-to-supercharge-multimetric-optimization/
20. Use Case: Balancing Speed & Accuracy in Deep Learning
Multimetric Use Case 1
● Category: Time Series
● Task: Sequence Classification
● Model: CNN
● Data: Diatom Images
● Analysis: Accuracy-Time Tradeoff
● Result: Similar accuracy at 33% of the inference time
Multimetric Use Case 2
● Category: NLP
● Task: Sentiment Analysis
● Model: CNN
● Data: Rotten Tomatoes
● Analysis: Accuracy-Time Tradeoff
● Result: ~2% accuracy tradeoff for 50% of the training time
Learn more: https://devblogs.nvidia.com/sigopt-deep-learning-hyperparameter-optimization/
21. Experiment Design for Sequence Classification
Data
● Diatom Images
● Source: UCR Time Series Classification
Model
● Convolutional Neural Network
● Source: Wang et al. (paper, code)
● TensorFlow via Keras
Metrics
● Inference Time
● Accuracy
HPO Methods (Implemented via SigOpt)
● Random Search
● Bayesian Optimization
Note: Experiment code available here
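Under stated assumptions (a toy 1-D CNN rather than the Wang et al. architecture, and hypothetical assignment keys num_filters, kernel_size, and learning_rate), a sketch of how the two metrics could be measured:

```python
# Toy sketch of measuring both metrics (accuracy and inference time) for a
# 1-D CNN sequence classifier in TensorFlow/Keras. This is a stand-in
# architecture, not the Wang et al. network used in the webinar.
import time
from tensorflow import keras

def evaluate(assignments, x_train, y_train, x_test, y_test, num_classes):
    model = keras.Sequential([
        keras.layers.Conv1D(
            filters=assignments["num_filters"],
            kernel_size=assignments["kernel_size"],
            activation="relu",
            input_shape=x_train.shape[1:],  # expects (timesteps, channels)
        ),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(assignments["learning_rate"]),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, epochs=10, verbose=0)

    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    start = time.perf_counter()
    model.predict(x_test, verbose=0)  # time a full pass over the test set
    inference_time_ms = 1000.0 * (time.perf_counter() - start)
    return accuracy, inference_time_ms
```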
23. Result: Bayesian optimization outperforms random search
● Both methods were executed via the SigOpt API
● Bayesian optimization required 90% fewer training runs than random search, which was given 10x the observations
● Bayesian optimization found 85.7% of the combined Pareto frontier of optimal model configurations, almost 6x as many choices as random search
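For readers unfamiliar with the metric: a configuration is Pareto-optimal when no other configuration matches or beats it on both metrics at once. A minimal sketch of extracting the frontier from (accuracy, inference time) pairs:

```python
# Minimal sketch: extract the Pareto frontier from (accuracy, inference_time)
# pairs, where accuracy is maximized and inference time is minimized.
def pareto_frontier(points):
    def dominates(q, p):
        # q dominates p if q is at least as good on both metrics
        # and strictly better on at least one.
        return (q[0] >= p[0] and q[1] <= p[1]) and (q[0] > p[0] or q[1] < p[1])

    return [p for p in points if not any(dominates(q, p) for q in points)]

observations = [(0.91, 120.0), (0.89, 40.0), (0.90, 80.0), (0.88, 90.0)]
# (0.88, 90.0) is dominated by (0.90, 80.0) and drops out of the frontier.
print(pareto_frontier(observations))  # [(0.91, 120.0), (0.89, 40.0), (0.90, 80.0)]
```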
26. How it works: Conditional parameters
Take into account the conditionality of certain parameter types in the optimization process.
● Establish conditionality between various parameters
● Use this conditionality to improve the Bayesian optimization process
● Boost results from the hyperparameter optimization process
● Example: Architecture parameters for deep learning models
● Example: Parameter types for SGD variants (see the sketch after this list)
● Relevant docs
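A minimal sketch of how such conditionality could be expressed with SigOpt's Python client, picking up the SGD-variants example. The optimizer choices, parameter names, and bounds are illustrative assumptions.

```python
# Hypothetical sketch: conditional parameters for SGD variants via SigOpt.
# The optimizer values and parameter bounds are illustrative assumptions.
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

experiment = conn.experiments().create(
    name="Conditional optimizer tuning",
    # The conditional: which optimizer variant is in play.
    conditionals=[dict(name="optimizer", values=["sgd", "adam"])],
    parameters=[
        # Always active.
        dict(name="learning_rate", type="double",
             bounds=dict(min=1e-5, max=1e-1)),
        # Only meaningful when optimizer == "sgd".
        dict(name="momentum", type="double",
             bounds=dict(min=0.0, max=0.99),
             conditions=dict(optimizer=["sgd"])),
        # Only meaningful when optimizer == "adam".
        dict(name="beta_1", type="double",
             bounds=dict(min=0.8, max=0.999),
             conditions=dict(optimizer=["adam"])),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    observation_budget=80,
)

# Each suggestion's assignments include the conditional's value, so the
# training code can branch on assignments["optimizer"].
```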
27. Use Case: Effective and Efficient NLP Optimization
Use Case
● Category: NLP
● Task: Question Answering
● Model: MemN2N
● Data: bAbI
● Analysis: Performance benchmark
● Result: 4.84% accuracy gain at 30% of the cost
Learn more: https://devblogs.nvidia.com/optimizing-end-to-end-memory-networks-using-sigopt-gpus/
28. Design: Question answering data and memory networks
Sources:
● Facebook AI Research (FAIR) bAbI dataset: https://research.fb.com/downloads/babi/
● Sukhbaatar et al.: https://arxiv.org/abs/1503.08895
31. Result: Highly cost-efficient accuracy gains
Comparison of random search versus Bayesian optimization with conditionals: SigOpt is 18.5x as efficient.
33. How it works: Multitask Optimization
[Figure: tasks ranging from partial to full training runs]
● Introduce a variety of cheap and expensive tasks in a hyperparameter optimization experiment
● Use cheaper tasks earlier in the tuning process (explore) to inform more expensive tasks later (exploit); see the sketch below
● In the process, reduce the full time required to tune an expensive model
● Relevant docs
Sources:
● Matthias Poloczek, Jialei Wang, Peter I. Frazier: https://arxiv.org/abs/1603.00389
● Aaron Klein, Frank Hutter, et al.: https://arxiv.org/abs/1605.07079
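A minimal sketch of a multitask experiment with SigOpt's Python client, assuming task costs are expressed as fractions of a full training run and that a suggestion carries its assigned task. Task names, costs, and the train_and_evaluate() helper are illustrative.

```python
# Hypothetical sketch: multitask optimization via SigOpt. Task names and
# relative costs (fractions of a full training run) are assumptions.
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

experiment = conn.experiments().create(
    name="Multitask CNN tuning",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    # Cheap tasks (e.g. fewer epochs or less data) inform expensive ones.
    tasks=[
        dict(name="cheapest", cost=0.1),
        dict(name="cheap", cost=0.3),
        dict(name="full", cost=1.0),
    ],
    observation_budget=220,
)

suggestion = conn.experiments(experiment.id).suggestions().create()
# SigOpt picks the task; the training code scales effort accordingly,
# e.g. by training for cost * max_epochs epochs.
epochs = int(suggestion.task.cost * 100)
accuracy = train_and_evaluate(suggestion.assignments, epochs=epochs)  # placeholder
conn.experiments(experiment.id).observations().create(
    suggestion=suggestion.id,
    values=[dict(name="accuracy", value=accuracy)],
)
```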
34. How it works: Combine multitask with parallelization
[Diagram: the same optimization loop as before, with the single model-evaluation step behind your firewall replaced by multiple parallel workers, each reporting its objective metric through the REST API and receiving new configurations from the Optimization Engine.]
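A minimal sketch of that worker pattern, assuming each worker independently requests suggestions and reports observations against a shared experiment, and that the experiment object exposes progress.observation_count; SigOpt coordinates the outstanding suggestions across workers.

```python
# Sketch: N parallel workers sharing one experiment. Each worker pulls its
# own suggestion and reports its own observation; the train_and_evaluate()
# helper is a placeholder for your tuning code.
from multiprocessing import Process
from sigopt import Connection

EXPERIMENT_ID = "YOUR_EXPERIMENT_ID"

def worker():
    conn = Connection(client_token="YOUR_API_TOKEN")
    while True:
        experiment = conn.experiments(EXPERIMENT_ID).fetch()
        if experiment.progress.observation_count >= experiment.observation_budget:
            break  # budget exhausted; stop this worker
        suggestion = conn.experiments(EXPERIMENT_ID).suggestions().create()
        accuracy = train_and_evaluate(suggestion.assignments)  # placeholder
        conn.experiments(EXPERIMENT_ID).observations().create(
            suggestion=suggestion.id,
            values=[dict(name="accuracy", value=accuracy)],
        )

if __name__ == "__main__":
    processes = [Process(target=worker) for _ in range(20)]  # e.g. 20 machines
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```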
35. Use Case: Image Classification on a Budget
Use Case
● Category: Computer Vision
● Task: Image Classification
● Model: CNN
● Data: Stanford Cars
● Analysis: Architecture Comparison
● Result: 2.4% accuracy gain for a much cheaper model
Learn more: https://mlconf.com/blog/insights-for-building-high-performing-image-classification-models/
41. Results: Optimizing and tuning the full network outperforms
● Fine-tuning the smaller network significantly outperforms feature extraction on a bigger network
● Multitask optimization drives significant performance gains: +3.92% (fine-tuning ResNet 18) versus +1.58% (feature extraction on ResNet 50)
42. Insight: Multitask efficiency at the hyperparameter level
Example: Learning rate accuracy and values by cost of task over time
[Figure panels: progression of observations over time; accuracy and value for each observation; parameter importance analysis]
43. Insight: Parallelization further accelerates wall-clock time
● 928 total compute hours to optimize ResNet 18
● 220 observations per experiment
● 20 p2.xlarge AWS EC2 instances
● 45 hours actual wall-clock time (928 hours ÷ 20 machines ≈ 46, in line with the measured 45)
44. Implication: Fine-tuning significantly outperforms
Cost Breakdown for Multitask Optimization

Cost efficiency               Feature Extractor ResNet 50    Fine-Tuning ResNet 18
Hours per training            4.08                           4.2
Observations                  220                            220
Number of Runs                1                              1
Total compute hours           898                            924
Cost per GPU-hour             $0.90                          $0.90
% Improvement                 1.58%                          3.92%
Total compute cost            $808                           $832
Cost ($) per % improvement    $509                           $212

Similar compute cost and similar wall-clock time; fine-tuning is significantly more efficient and effective.
45. Implication: Multiple benefits from multitask
Tuning ResNet-18

Cost efficiency           Multitask    Bayesian    Random
Hours per training        4.2          4.2         4.2
Observations              220          646         646
Number of Runs            1            1           20
Total compute hours       924          2,713       54,264
Cost per GPU-hour         $0.90        $0.90       $0.90
Total compute cost        $832         $2,442      $48,838

Time to optimize          Multitask    Bayesian    Random
Total compute hours       924          2,713       54,264
# of Machines             20           20          20
Wall-clock time (hrs)     46           136         2,713

Multitask achieves similar performance at 1.7% the cost of random search, and a 58x faster wall-clock time to optimize versus random search.
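The headline ratios follow directly from the table rows; a quick sketch of the arithmetic:

```python
# Reproduce the cost and wall-clock figures in the tables above.
HOURS_PER_TRAINING = 4.2
COST_PER_GPU_HOUR = 0.90
MACHINES = 20

methods = {
    "multitask": dict(observations=220, runs=1),
    "bayesian": dict(observations=646, runs=1),
    "random": dict(observations=646, runs=20),
}

for name, cfg in methods.items():
    hours = cfg["observations"] * cfg["runs"] * HOURS_PER_TRAINING
    cost = hours * COST_PER_GPU_HOUR
    wall_clock = hours / MACHINES
    print(f"{name}: {hours:,.0f} h, ${cost:,.0f}, {wall_clock:,.0f} h wall-clock")

# multitask: 924 h, $832, 46 h wall-clock
# bayesian:  2,713 h, $2,442, 136 h wall-clock
# random:    54,264 h, $48,838, 2,713 h wall-clock
# Cost ratio: 832 / 48,838 ≈ 1.7%; wall-clock ratio: 2,713 / 46 ≈ 58x.
```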
46. Techniques
1. Metric definition: multimetric optimization
Read the blog here.
2. Model search: conditional parameters
Read the blog here.
3. Long training cycles: multitask optimization
Read the blog here.
47. Try our solution
Sign up today at sigopt.com/try-it.

● The AI Summit San Francisco, September 25-26, 2019: https://sanfrancisco.theaisummit.com (register with code 1SFSPON25)
● TWIMLcon, October 1-2, 2019: https://twimlcon.com/ (register with code SIGOPT20)
● Download the eBook: https://twimlai.com/announcing-our-ai-platforms-series-and-ebooks/