This talk explains how to train deep learning and other expensive models with parallelism and multitask optimization to reduce wall-clock time. Tobias Andreasen, who supports a number of our systematic trading customers, presents the intuition behind Bayesian optimization for tuning models against a single metric or multiple (often competing) metrics. It often makes sense to track a second metric to avoid myopic training runs that overfit your data or otherwise fail to reflect real-world performance.
Tuning for Systematic Trading: Talk 2: Deep Learning
1. SigOpt. Confidential.
Talk #2
Optimize Training and Tuning for Deep
Learning
SigOpt Talk Series
Tuning for Systematic Trading
Tobias Andreasen — Machine Learning Engineer
Tuesday, April 21, 2020
2. SigOpt. Confidential.
Abstract
SigOpt provides an extensive set of advanced features that help you, the expert, save time while increasing model performance through experimentation. Today, we continue this talk series by discussing how to best utilize your infrastructure, reduce experiment time, and accelerate training for deep learning models.
3. SigOpt. Confidential.
Motivation
1. Overview of SigOpt
2. Recap of Bayesian optimization
3. How to continuously and efficiently utilize your
project’s allotted compute infrastructure
4. How to tune models with expensive training costs
6. SigOpt. Confidential.
Solution: Experiment, optimize and analyze at scale
Experiment Insights — Track, analyze and reproduce any model to improve the productivity of your modeling
Optimization Engine — Automate hyperparameter tuning to maximize the performance and impact of your models
Enterprise Platform — Standardize experimentation across any combination of library, infrastructure, model or task (On-Premise, Hybrid/Multi)
7. SigOpt. Confidential.
SigOpt Features
Experiment Insights:
● Reproducibility
● Intuitive web dashboards
● Cross-team permissions and collaboration
● Advanced experiment visualizations
● Usage insights
● Parameter importance analysis
Optimization Engine:
● Multimetric optimization
● Continuous, categorical, or integer parameters
● Constraints and failure regions
● Up to 10k observations, 100 parameters
● Multitask optimization and high parallelism
● Training monitor and automated early stopping
Enterprise Platform:
● Infrastructure agnostic
● REST API
● Parallel resource scheduler
● Black-box interface — tunes without accessing any data
● Libraries for Python, Java, R, and MATLAB
9. SigOpt. Confidential.
Black Box Optimization
[Diagram: behind your firewall, training and testing data feed your AI, ML, DL, or simulation model; a model evaluation or backtest produces an objective metric. SigOpt's REST API receives the metric and returns new configurations (parameters or hyperparameters), yielding better results. SigOpt components: Experiment Insights — track, organize, analyze and reproduce any model; Enterprise Platform — built to fit any stack and scale with your needs; Optimization Engine — explore and exploit with a variety of techniques.]
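The loop in this diagram can be made concrete. Below is a minimal sketch of the suggest/observe protocol, with plain random search standing in for SigOpt's optimizer and a toy quadratic standing in for the model backtest (all names here are illustrative, not SigOpt's API):

```python
import random

random.seed(0)  # deterministic for the example

class RandomSearchOptimizer:
    """Stand-in for a black-box optimizer: it sees only parameter
    bounds and reported metric values, never the training data."""
    def __init__(self, bounds):
        self.bounds = bounds          # {name: (low, high)}
        self.history = []             # (assignments, metric) pairs

    def suggest(self):
        # Propose a new configuration uniformly at random.
        return {name: random.uniform(lo, hi)
                for name, (lo, hi) in self.bounds.items()}

    def observe(self, assignments, metric):
        # Only the scalar objective metric crosses the firewall.
        self.history.append((assignments, metric))

    def best(self):
        return max(self.history, key=lambda pair: pair[1])

def evaluate_model(assignments):
    # Hypothetical model evaluation / backtest; here a toy objective
    # peaked at lr = 0.3, momentum = 0.7.
    x, y = assignments["lr"], assignments["momentum"]
    return -(x - 0.3) ** 2 - (y - 0.7) ** 2

optimizer = RandomSearchOptimizer({"lr": (0.0, 1.0), "momentum": (0.0, 1.0)})
for _ in range(50):
    suggestion = optimizer.suggest()
    optimizer.observe(suggestion, evaluate_model(suggestion))

best_assignments, best_metric = optimizer.best()
```

The point of the black-box contract is visible in `observe`: the optimizer only ever sees parameter assignments and a scalar metric, never the data or the model itself.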
10. SigOpt. Confidential.
A graphical depiction of the iterative process
Build a statistical model
Sequential Model Based Optimization (SMBO)
11. SigOpt. Confidential.
A graphical depiction of the iterative process
Build a statistical model
Choose the next point
to maximize the acquisition function
Sequential Model Based Optimization (SMBO)
12. SigOpt. Confidential.
A graphical depiction of the iterative process
Build a statistical model
Choose the next point
to maximize the acquisition function
Sequential Model Based Optimization (SMBO)
13. SigOpt. Confidential.
A graphical depiction of the iterative process
Build a statistical model
Choose the next point to maximize the acquisition function
Sequential Model Based Optimization (SMBO)
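The SMBO cycle pictured above — build a statistical model, choose the next point by maximizing an acquisition function, evaluate, rebuild — can be sketched in a few lines. As an assumption for illustration, a crude nearest-neighbor surrogate with a distance-based uncertainty bonus stands in for the Gaussian-process model that SigOpt's engine actually builds:

```python
import math

def smbo(objective, low, high, n_init=3, n_iter=12, kappa=2.0):
    """Toy sequential model-based optimization in 1D.

    Surrogate: predicted value = value of the nearest observed point,
    with uncertainty growing with distance to it (a crude stand-in
    for a Gaussian-process posterior).
    Acquisition: upper confidence bound (value + kappa * distance).
    """
    # Initial design: evenly spaced points across the domain.
    xs = [low + (high - low) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]

    grid = [low + (high - low) * i / 400 for i in range(401)]
    for _ in range(n_iter):
        def acquisition(x):
            d, y_near = min((abs(x - xo), yo) for xo, yo in zip(xs, ys))
            return y_near + kappa * d          # exploit + explore
        x_next = max(grid, key=acquisition)    # maximize the acquisition
        xs.append(x_next)
        ys.append(objective(x_next))           # evaluate the true function
    return max(zip(xs, ys), key=lambda p: p[1])

# Example: a multimodal objective on [0, 3].
x_best, y_best = smbo(lambda x: math.sin(3 * x) - 0.1 * x * x, 0.0, 3.0)
```

Even with this deliberately simple surrogate, the loop concentrates evaluations around the promising region while still probing unexplored gaps — the same explore/exploit trade-off the real engine manages with far better models.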
14. SigOpt Blog Posts: Intuition Behind Bayesian Optimization
Some Relevant Blog Posts
● Intuition Behind Covariance Kernels
● Approximation of Data
● Likelihood for Gaussian Processes
● Profile Likelihood vs. Kriging Variance
● Intuition behind Gaussian Processes
● Dealing with Troublesome Metrics
To find more blog posts, visit:
https://sigopt.com/blog/
15. SigOpt. Confidential.
How to continuously and efficiently
utilize your project’s allotted compute
infrastructure
16. SigOpt. Confidential.
Utilize compute through asynchronous parallel optimization
SigOpt natively handles parallel function evaluation with the primary goal of minimizing overall wall-clock time. Parallelism also provides:
• Faster time-to-results — minimized overall wall-clock time
• Full resource utilization — asynchronous parallel optimization
• Scaling with infrastructure — optimization across all available compute resources
This is essential for increasing research productivity: it lowers time-to-results and scales with available infrastructure.
Continuously and efficiently utilize infrastructure
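The asynchronous pattern is simple to realize: every worker runs its own suggest/evaluate/observe loop against a shared optimization service, so a fast worker never idles while a slow training run finishes. A sketch with threads and a random-search stand-in (all names hypothetical, not SigOpt's API):

```python
import random
import threading

class SharedOptimizer:
    """Thread-safe suggest/observe service shared by all workers
    (random search stands in for the Bayesian optimizer)."""
    def __init__(self, bounds):
        self.bounds = bounds
        self.history = []
        self.lock = threading.Lock()

    def suggest(self):
        with self.lock:
            return {k: random.uniform(lo, hi)
                    for k, (lo, hi) in self.bounds.items()}

    def observe(self, assignments, value):
        with self.lock:
            self.history.append((assignments, value))

def worker(optimizer, n_evals, train):
    # Each worker loops independently: ask, evaluate, report.
    # A fast worker never waits for a slow one (asynchronous parallelism).
    for _ in range(n_evals):
        s = optimizer.suggest()
        optimizer.observe(s, train(s))

def train(assignments):
    # Hypothetical training run; the toy metric peaks at lr = 0.1.
    lr = assignments["lr"]
    return -(lr - 0.1) ** 2

opt = SharedOptimizer({"lr": (0.0, 1.0)})
threads = [threading.Thread(target=worker, args=(opt, 5, train))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 4 workers x 5 evaluations = 20 observations collected concurrently
```

In a real deployment each worker would be a GPU machine calling the optimization service over the network, but the control flow is the same: the only shared state is the suggest/observe endpoint.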
17. SigOpt. Confidential.
[Diagram: the black-box optimization loop — training/testing data, model, evaluation or backtest, objective metric, and new configurations via the REST API — behind your firewall]
Continuously and efficiently utilize infrastructure
18. SigOpt. Confidential.
[Diagram: a single worker evaluates suggestions behind the firewall]
Continuously and efficiently utilize infrastructure
19. SigOpt. Confidential.
[Diagram: many workers — #1, #2, …, #100 — evaluate suggestions in parallel behind the firewall]
Continuously and efficiently utilize infrastructure
20. SigOpt. Confidential.
Parallel function evaluations: find the best set of suggestions
Parallel function evaluations are a way of efficiently maximizing a function while using all available compute resources [Ginsbourger et al., 2008; Garcia-Barcos et al., 2019]:
• Choosing points by jointly maximizing criteria over the entire set of open resources
• Asynchronously evaluating over a collection of points
• Fixing points which are currently being evaluated while sampling new ones
[Figures: 1D and 2D acquisition functions — jointly optimize multiple next points to sample]
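"Fixing" pending points is often done with the constant-liar heuristic from the Ginsbourger et al. line of work: each point already handed to a worker is treated as if it had returned some fixed value (here, the current best), so the acquisition function steers later suggestions away from it. A toy 1D sketch, reusing a nearest-neighbor surrogate as a stand-in for the real statistical model:

```python
def suggest_batch(observed, candidate_grid, batch_size, kappa=2.0):
    """Pick a batch of points to evaluate in parallel using the
    constant-liar heuristic: after each pick, pretend it returned
    the current best value, so subsequent picks spread out."""
    lie = max(y for _, y in observed)        # the "constant lie"
    fantasy = list(observed)                 # observed + fantasized points
    batch = []
    for _ in range(batch_size):
        def acquisition(x):
            # Nearest-neighbor surrogate + distance bonus (toy UCB).
            d, y_near = min((abs(x - xo), yo) for xo, yo in fantasy)
            return y_near + kappa * d
        x_next = max(candidate_grid, key=acquisition)
        batch.append(x_next)
        fantasy.append((x_next, lie))        # fix the pending point
    return batch

# Three observations; ask for three points to run in parallel.
observed = [(0.0, 0.2), (0.5, 0.9), (1.0, 0.1)]
grid = [i / 100 for i in range(101)]
batch = suggest_batch(observed, grid, batch_size=3)
# The three suggestions spread out rather than clustering on one spot.
```

Without the fantasized observations, every pick would land on the same acquisition maximum; with them, the batch covers distinct regions of the search space.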
21. SigOpt. Confidential.
Parallel optimization: different parallel bandwidth leads to a different search
[Figures: statistical model and next point(s) to evaluate for parallel bandwidth = 1 through 5; parallel bandwidth represents the number of available compute resources]
More exploration, more exploitation: faster wall-clock time
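In SigOpt's API of this period, parallel bandwidth was declared up front on the experiment itself. The payload below is a sketch from memory of that experiment-create call — field names should be verified against the current documentation before use:

```python
# Sketch of a SigOpt experiment-create payload (field names recalled
# from the public API docs of this era; verify before relying on them).
experiment_meta = dict(
    name="DNN tuning (4 parallel workers)",
    parameters=[
        dict(name="learning_rate", type="double",
             bounds=dict(min=1e-5, max=1e-1)),
        dict(name="batch_size", type="int",
             bounds=dict(min=32, max=512)),
    ],
    metrics=[dict(name="validation_accuracy", objective="maximize")],
    observation_budget=60,
    # One open suggestion per compute resource:
    parallel_bandwidth=4,
)
```

Declaring `parallel_bandwidth` up front lets the engine choose each batch of suggestions jointly, as in the figures above, instead of treating workers as independent sequential optimizers.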
22. Parallelism Use Case
Use Case: Fast CNN Tuning with AWS GPU Instances
● Category: NLP
● Task: Sentiment Analysis
● Model: CNN
● Data: Rotten Tomatoes Movie Reviews
● Analysis: Predicting Positive vs. Negative Sentiment
● Result: 400x speedup
Learn more: https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/
24. SigOpt. Confidential.
How to efficiently minimize the time to optimize any function
SigOpt's multitask feature is an efficient way for modelers to tune models with expensive training costs, with the benefit of:
• Faster time-to-market — the ability to bring expensive models into production faster
• Reduction in infrastructure cost — intelligently leverage infrastructure while reducing cost
Through novel research, SigOpt helps the user lower the overall time-to-market while reducing the overall compute budget.
Expensive Training Cost
25. SigOpt. Confidential.
[Diagram: the black-box optimization loop — training/testing data, model, evaluation or backtest, objective metric, and new configurations via the REST API — behind your firewall]
Expensive Training Cost
26. SigOpt. Confidential.
[Diagram: the optimization loop, continued]
Expensive Training Cost
27. SigOpt. Confidential.
[Diagram: the optimization loop, continued]
Expensive Training Cost
28. SigOpt. Confidential.
Using cheap or free information to speed learning
SigOpt allows the user to define lower-cost functions in order to quickly optimize expensive functions:
• Cheaper-cost functions can be flexible (fewer epochs, subsampled data, other custom features)
• Use cheaper tasks earlier in the tuning process to explore
• Inform more expensive tasks later by exploiting what we learn
• In the process, reduce the full time required to tune an expensive model
Expensive Training Cost
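One way to picture the cheap-to-expensive schedule in these bullets: spend the first half of an epoch budget on low-fidelity runs (a tenth of the epochs) to explore the parameter space, then spend the rest on full-cost runs near the best configuration found. The sketch below is purely illustrative — the training curve, budget split, and jitter are all invented for the example:

```python
import random

def multitask_tune(train, full_epochs=30, budget_epochs=240, seed=0):
    """Toy multitask schedule: early budget goes to cheap, low-fidelity
    runs (explore); the remaining budget goes to full-cost runs near
    the best configuration found so far (exploit)."""
    rng = random.Random(seed)
    spent, history = 0, []
    while spent + full_epochs <= budget_epochs:
        if spent < budget_epochs // 2:
            epochs = full_epochs // 10            # cheap task: explore
            lr = rng.uniform(0.0, 1.0)
        else:
            epochs = full_epochs                  # expensive task: exploit
            best_lr = max(history, key=lambda h: h[2])[0]
            lr = min(1.0, max(0.0, best_lr + rng.gauss(0, 0.05)))
        history.append((lr, epochs, train(lr, epochs)))
        spent += epochs
    return max(history, key=lambda h: h[2])

def train(lr, epochs):
    # Hypothetical training curve: more epochs -> closer to the
    # configuration's asymptotic accuracy (best at lr = 0.3).
    asymptote = 1.0 - (lr - 0.3) ** 2
    return asymptote * (1.0 - 0.5 ** (epochs / 3))

best_lr, best_epochs, best_acc = multitask_tune(train)
```

With this budget, the schedule fits 40 cheap runs plus 4 full runs into the cost of 8 full runs — the cheap runs are inaccurate, but they are accurate enough to tell the expensive runs where to look.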
29. SigOpt. Confidential.
Using cheap or free information to speed learning
We can use cheaper, less accurate data to point the actual optimization in the right direction at lower cost:
• Using a warm start through multi-task learning logic [Swersky et al., 2014]
• Combining good anytime performance with active learning [Klein et al., 2018]
• Accepting data from multiple sources without priors [Poloczek et al., 2017]
Expensive Training Cost
30. Use Case: Image Classification on a Budget
● Category: Computer Vision
● Task: Image Classification
● Model: CNN
● Data: Stanford Cars Dataset
● Analysis: Architecture Comparison
● Result: 2.4% accuracy gain with a much shallower model
Learn more: https://mlconf.com/blog/insights-for-building-high-performing-image-classification-models/