SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
SigOpt. Confidential.
Efficient BERT
Compress BERT with Multimetric Bayesian Optimization
SigOpt. Confidential.2
BERT is great!
Vaswani et al 2017, Devlin et al 2018
SigOpt. Confidential.
But, it is very large
3
Stanford CS244n
SigOpt. Confidential.
Can we understand the trade-offs
when compressing BERT?
SigOpt. Confidential.
Can we find a model architecture
that works for our specific needs?
SigOpt. Confidential.
Distilling BERT for Question
Answering
SigOpt. Confidential.
The Data: SQUAD 2.0
7
SQUAD 2.0
SigOpt. Confidential.
SQUAD 2.0’s Unanswerable Questions
8
SigOpt. Confidential.
How does Distillation work?
9
Teacher Model
Student Model
Data
Data
Soft
target
loss
Hard
target
loss
Trained Student Model
Hinton et al 2015, Intel’s overview
SigOpt. Confidential.
Distilling BERT for Question Answering
10
BERT
Pre-trained for language
modeling
Student Model
SQUAD 2.0
SQUAD 2.0
Soft
target
loss
Hard
target
loss
BERT
Fine-tuned for SQUAD 2.0
Trained Student Model
For more on distillation: Hinton et al 2015, DistilBERT
SigOpt. Confidential.
Defining the student model
11
Student Model
BookCorpus
and English
Wikipedia
DistilBERT
Pre-trained for language
understanding
Architecture
parameters
Pre-trained
model weights
DistilBERT, Toronto Book Corpus,
English Wikipedia, SigOpt
SigOpt. Confidential.
What is the Baseline?
12
BERT
Pre-trained for language
modeling
DistilBERT architecture
SQUAD 2.0
SQUAD 2.0
Soft
target
loss
Hard
target
loss
BERT
Fine-tuned for SQUAD 2.0
Trained DistilBERT
For more on distillation: Hinton et al 2015, DistilBERT
SigOpt. Confidential.
Baseline model performance
13
Baseline Exact
67.07%
Baseline
Parameters
66.3M
SigOpt. Confidential.
SigOpt: Enterprise HPO at Scale
ML, DL or Simulation
Model
Model Evaluation or
Backtest
Testing
Data
Training
Data
SigOpt. Confidential.
Multimetric Bayesian Optimization
Optimizing for two competing metrics
15
SigOpt’s Multimetric Optimization
SigOpt. Confidential.
What are our metrics?
16
Minimize
Model Size
Maximize Model Performance
Baseline Exact
67.07%
Baseline
Parameters
66.3M
SigOpt. Confidential.
Metric Threshold: Dealing with dataset characteristics
17
Minimize
Model Size
Maximize Model Performance
Baseline
Parameters
66.3M
Baseline Exact
67.07%Metric Threshold
SigOpt’s Metric Threshold
SigOpt. Confidential.
What are we tuning?
18
SGD Parameters, Batch Size,
Warm up, Weight Initialization
Number of Layers and
Attention Heads, Pruning,
Dropouts
Temperature and loss
function weights
9 Model training
parameters
6 Model architecture
parameters
3 Distillation parameters
SigOpt. Confidential.
The Optimization Cycle
19
Student Model
Architecture and
training
parameters
BERT
Fine-tuned for SQUAD 2.0
SQUAD 2.0 Trained Student Model
Distillation
Distillation
Parameters
validation accuracy
and model size
SigOpt. Confidential.
Orchestration
20
AWS EC2
User’s Workstation
Execute Program
Cluster
management
Optimization at
Scale
SigOpt. Confidential.
So, what were our results?
SigOpt. Confidential.
SigOpt found dozens of viable models
22
Baseline Exact
Baseline
Size
Metric Threshold
SigOpt. Confidential.
Choose the model architecture that meets your needs
23
Maximize
Performance
Minimize
Size
+3.45% on Performance
+0.09% on Size
-0.25% on Performance
-22.47% on Size
+3.19% on Performance
-1.69% on Size
SigOpt. Confidential.
Some architecture options
24
Maximize
Performance
Minimize
Size
+3.45% on Performance
+0.09% on Size
-0.25% on Performance
-22.47% on Size
+3.19% on Performance
-1.69% on Size
4 layers, 11 attention heads
No dropout, raised
temperature, soft target loss
weighted more
6 layers, 11 attention heads
no dropout, low
temperature, almost all soft
target loss
6 layers, 12 attention heads
no dropout, raised
temperature, soft target loss
weighted more
SigOpt. Confidential.
Let’s take a quick look at the
dashboard
SigOpt. Confidential.
Was the model able to answer
questions?
SigOpt. Confidential.
Model performance
27
SigOpt. Confidential.
What did misclassifications look like?
28
SigOpt. Confidential.
Let’s take a look at Warsaw
29
SQUAD 2.0
SigOpt. Confidential.
Why does it matter?
30
By using Multimetric Bayesian
Optimization, we’re able to easily
understand trade-offs made during
compression
By understanding these trade-offs,
we’re able to choose a model
architecture that best suits our needs
SigOpt. Confidential.
Check out our
YouTube channel:
Learn more about SigOpt
Read our research and product blog.
See more videos here.
Get free beta access to
Experiment Management
Join the beta
Click Here
Upcoming webinars:
● Introducing Experiment
Management - Thursday, July 9 at
10am PT
SigOpt. Confidential.
Thank you!
Here’s the repo to reproduce work

Más contenido relacionado

Similar a Efficient NLP by Distilling BERT and Multimetric Optimization

Line item dimension and high cardinality dimension
Line item dimension and high cardinality dimensionLine item dimension and high cardinality dimension
Line item dimension and high cardinality dimension
Praveen Kumar
 
Data Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksData Quality for Machine Learning Tasks
Data Quality for Machine Learning Tasks
Hima Patel
 
Cross-Project Build Co-change Prediction
Cross-Project Build Co-change PredictionCross-Project Build Co-change Prediction
Cross-Project Build Co-change Prediction
Shane McIntosh
 

Similar a Efficient NLP by Distilling BERT and Multimetric Optimization (20)

Line item dimension and high cardinality dimension
Line item dimension and high cardinality dimensionLine item dimension and high cardinality dimension
Line item dimension and high cardinality dimension
 
Scaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling DownScaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling Down
 
SigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt for Machine Learning and AI
SigOpt for Machine Learning and AI
 
How Will Knowledge Graphs Improve Clinical Reporting Workflows
How Will Knowledge Graphs Improve Clinical Reporting WorkflowsHow Will Knowledge Graphs Improve Clinical Reporting Workflows
How Will Knowledge Graphs Improve Clinical Reporting Workflows
 
CCDE Experience
CCDE ExperienceCCDE Experience
CCDE Experience
 
CAD Certification
CAD CertificationCAD Certification
CAD Certification
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
 
Data Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksData Quality for Machine Learning Tasks
Data Quality for Machine Learning Tasks
 
BIM Show Live 2015: Fit-out case study
BIM Show Live 2015: Fit-out case studyBIM Show Live 2015: Fit-out case study
BIM Show Live 2015: Fit-out case study
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise Webinar
 
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTXDecision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
 
Meetup Python Madrid 2018: ¿Segmentación semántica? ¿Pero de qué me estás hab...
Meetup Python Madrid 2018: ¿Segmentación semántica? ¿Pero de qué me estás hab...Meetup Python Madrid 2018: ¿Segmentación semántica? ¿Pero de qué me estás hab...
Meetup Python Madrid 2018: ¿Segmentación semántica? ¿Pero de qué me estás hab...
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use Case
 
Cross-Project Build Co-change Prediction
Cross-Project Build Co-change PredictionCross-Project Build Co-change Prediction
Cross-Project Build Co-change Prediction
 
Scaling & Transforming Stitch Fix's Visibility into What Folks will love
Scaling & Transforming Stitch Fix's Visibility into What Folks will loveScaling & Transforming Stitch Fix's Visibility into What Folks will love
Scaling & Transforming Stitch Fix's Visibility into What Folks will love
 
ML Production Pipelines: A Classification Model
ML Production Pipelines: A Classification ModelML Production Pipelines: A Classification Model
ML Production Pipelines: A Classification Model
 
Adopting BIM - An Architect's Perspective (07 May 2014)
Adopting BIM - An Architect's Perspective (07 May 2014)Adopting BIM - An Architect's Perspective (07 May 2014)
Adopting BIM - An Architect's Perspective (07 May 2014)
 
Democratization of NOSQL Document-Database over Relational Database Comparati...
Democratization of NOSQL Document-Database over Relational Database Comparati...Democratization of NOSQL Document-Database over Relational Database Comparati...
Democratization of NOSQL Document-Database over Relational Database Comparati...
 
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
 

Más de SigOpt

Más de SigOpt (20)

Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep Learning
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep Learning
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model Performance
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimization
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scale
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic trading
 
SigOpt at MLconf - Reducing Operational Barriers to Model Training
SigOpt at MLconf - Reducing Operational Barriers to Model TrainingSigOpt at MLconf - Reducing Operational Barriers to Model Training
SigOpt at MLconf - Reducing Operational Barriers to Model Training
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Tuning the Untunable - Insights on Deep Learning Optimization
Tuning the Untunable - Insights on Deep Learning OptimizationTuning the Untunable - Insights on Deep Learning Optimization
Tuning the Untunable - Insights on Deep Learning Optimization
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning Fundamentals
 
Tips and techniques for hyperparameter optimization
Tips and techniques for hyperparameter optimizationTips and techniques for hyperparameter optimization
Tips and techniques for hyperparameter optimization
 
MLconf 2017 Seattle Lunch Talk - Using Optimal Learning to tune Deep Learning...
MLconf 2017 Seattle Lunch Talk - Using Optimal Learning to tune Deep Learning...MLconf 2017 Seattle Lunch Talk - Using Optimal Learning to tune Deep Learning...
MLconf 2017 Seattle Lunch Talk - Using Optimal Learning to tune Deep Learning...
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Efficient NLP by Distilling BERT and Multimetric Optimization