This document summarizes work using Bayesian optimization to compress BERT models for question answering while balancing model size and performance. It describes distilling BERT into smaller student models using SQuAD 2.0 data. SigOpt was used to tune model architectures and training, finding models that exceeded baseline performance at comparable size, as well as models over 20% smaller at near-baseline performance. The best models found had 4-6 layers and maintained roughly 67% or better exact match on SQuAD 2.0.
How does Distillation work?
[Diagram: the same training data is fed to a trained teacher model and an untrained student model. The student is trained against two objectives: a soft-target loss matching the teacher's output distribution and a hard-target loss matching the ground-truth labels. The result is a trained student model.]
For more on distillation: Hinton et al. (2015); Intel's overview.
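Concretely, the student minimizes a weighted sum of the two losses in the diagram. A minimal PyTorch sketch, assuming a softmax temperature T and a mixing weight alpha (both names are illustrative, not taken from the slides):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target loss: KL divergence between temperature-softened student
    # and teacher distributions (Hinton et al. 2015). The T**2 factor keeps
    # gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard-target loss: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # alpha trades off imitating the teacher vs. fitting the labels.
    return alpha * soft + (1.0 - alpha) * hard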
Distilling BERT for Question Answering
[Diagram: the same setup specialized for QA. BERT, pre-trained for language modeling and then fine-tuned on SQuAD 2.0, serves as the teacher. The student model trains on SQuAD 2.0 with a soft-target loss against the teacher's outputs and a hard-target loss against the ground-truth answers, producing the trained student model.]
For more on distillation: Hinton et al. (2015); DistilBERT.
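For extractive QA the same loss applies twice, once each to the start- and end-position logits that BERT-style QA heads produce. A hedged sketch of one training step, reusing distillation_loss from above (batch field names follow Hugging Face's SQuAD conventions, which is an assumption here):

import torch

def qa_distillation_step(teacher, student, batch, T, alpha):
    # The fine-tuned teacher is frozen; only the student receives gradients.
    with torch.no_grad():
        t_out = teacher(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"])
    s_out = student(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"])
    # Distill the per-token start and end logits separately, then average.
    loss_start = distillation_loss(s_out.start_logits, t_out.start_logits,
                                   batch["start_positions"], T, alpha)
    loss_end = distillation_loss(s_out.end_logits, t_out.end_logits,
                                 batch["end_positions"], T, alpha)
    return (loss_start + loss_end) / 2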
Defining the student model
[Diagram: the student model is defined by its architecture parameters and initialized with pre-trained model weights from DistilBERT, which is pre-trained for language understanding on BookCorpus and English Wikipedia.]
Sources: DistilBERT, Toronto Book Corpus, English Wikipedia, SigOpt.
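A minimal sketch of building such a student with Hugging Face transformers, assuming the tunable architecture parameters include layer count, head count, and dropout rates (the exact set tuned in this work is not listed on this slide):

from transformers import DistilBertConfig, DistilBertForQuestionAnswering

def build_student(n_layers, n_heads, dropout, attention_dropout):
    # Architecture parameters proposed by the optimizer define the student.
    config = DistilBertConfig(
        n_layers=n_layers,
        n_heads=n_heads,
        dropout=dropout,
        attention_dropout=attention_dropout,
    )
    # Initialize from DistilBERT's pre-trained weights; modules absent from
    # or mismatched with the checkpoint are freshly initialized.
    return DistilBertForQuestionAnswering.from_pretrained(
        "distilbert-base-uncased", config=config
    )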
What is the Baseline?
[Diagram: the baseline is the same distillation pipeline with the unmodified DistilBERT architecture as the student. BERT fine-tuned on SQuAD 2.0 is the teacher; the student trains on SQuAD 2.0 with soft- and hard-target losses, producing a trained DistilBERT.]
For more on distillation: Hinton et al. (2015); DistilBERT.
What are our metrics?
Two competing objectives: minimize model size and maximize model performance.
Baseline exact match: 67.07%
Baseline parameter count: 66.3M
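Both metrics are cheap to score per trained student. A small sketch (assumed, not from the slides); exact match uses the standard SQuAD answer normalization:

import re
import string

def model_size(model):
    # The size metric: total parameter count (66.3M for baseline DistilBERT).
    return sum(p.numel() for p in model.parameters())

def normalize(text):
    # Standard SQuAD normalization: lowercase, drop punctuation and articles.
    text = "".join(ch for ch in text.lower() if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    # SQuAD 2.0 marks unanswerable questions with an empty answer list; the
    # official script then compares against the empty string.
    golds = gold_answers or [""]
    return max(int(normalize(prediction) == normalize(g)) for g in golds)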
Metric Threshold: Dealing with dataset characteristics
The same two objectives, now with SigOpt's Metric Threshold applied to performance: the baseline exact match of 67.07% becomes a threshold, steering the search toward smaller models (baseline parameter count: 66.3M) that still meet baseline accuracy.
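A hedged sketch of how such an experiment might be created with the SigOpt Python client (parameter names and the observation budget are illustrative, not the presenters' exact setup):

from sigopt import Connection

conn = Connection(client_token="SIGOPT_API_TOKEN")  # assumed credential
experiment = conn.experiments().create(
    name="DistilBERT compression for SQuAD 2.0",
    parameters=[
        dict(name="n_layers", type="int", bounds=dict(min=1, max=6)),
        dict(name="n_heads", type="int", bounds=dict(min=1, max=12)),
        # ... the remaining training, architecture, and distillation parameters
    ],
    metrics=[
        # The threshold keeps the search focused on models at or above the
        # baseline exact match of 67.07% while size is minimized.
        dict(name="exact", objective="maximize", threshold=67.07),
        dict(name="num_parameters", objective="minimize"),
    ],
    observation_budget=200,  # assumed budget
)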
What are we tuning?
● 9 model training parameters: SGD parameters, batch size, warm-up, weight initialization
● 6 model architecture parameters: number of layers and attention heads, pruning, dropouts
● 3 distillation parameters: temperature and loss function weights
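A hedged sketch of what this 18-parameter search space could look like in SigOpt's parameter format (names and bounds are illustrative; the slide gives only the categories and counts):

parameters = [
    # Model training parameters (9 on the slide), for example:
    dict(name="learning_rate", type="double", bounds=dict(min=1e-6, max=1e-3)),
    dict(name="batch_size", type="int", bounds=dict(min=8, max=64)),
    dict(name="warmup_steps", type="int", bounds=dict(min=0, max=1000)),
    # Model architecture parameters (6 on the slide), for example:
    dict(name="n_layers", type="int", bounds=dict(min=1, max=6)),
    dict(name="n_heads", type="int", bounds=dict(min=1, max=12)),
    dict(name="dropout", type="double", bounds=dict(min=0.0, max=0.5)),
    dict(name="attention_dropout", type="double", bounds=dict(min=0.0, max=0.5)),
    # Distillation parameters (3 on the slide):
    dict(name="temperature", type="double", bounds=dict(min=1.0, max=10.0)),
    dict(name="alpha_soft", type="double", bounds=dict(min=0.0, max=1.0)),
    dict(name="alpha_hard", type="double", bounds=dict(min=0.0, max=1.0)),
]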
The Optimization Cycle
[Diagram: the optimization cycle. SigOpt proposes the student model's architecture and training parameters along with the distillation parameters; the student is distilled on SQuAD 2.0 from BERT fine-tuned on SQuAD 2.0; the resulting validation accuracy and model size are reported back to the optimizer to close the loop.]
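This cycle maps directly onto SigOpt's suggestion/observation loop. A hedged sketch continuing the experiment created above; build_student and model_size are the illustrative helpers sketched earlier, while distill and evaluate stand in for a training loop built on qa_distillation_step and a SQuAD 2.0 dev-set scorer (all assumptions, not the presenters' code):

for _ in range(experiment.observation_budget):
    # 1. SigOpt suggests architecture, training, and distillation parameters.
    suggestion = conn.experiments(experiment.id).suggestions().create()
    params = suggestion.assignments

    # 2. Build the student and distill it from BERT fine-tuned on SQuAD 2.0.
    student = build_student(params["n_layers"], params["n_heads"],
                            params["dropout"], params["attention_dropout"])
    distill(teacher, student, squad_train, params)  # assumed training routine

    # 3. Report validation exact match and parameter count back to SigOpt.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[
            dict(name="exact", value=evaluate(student, squad_dev)),  # assumed
            dict(name="num_parameters", value=model_size(student)),
        ],
    )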
Choose the model architecture that meets your needs
[Pareto chart: maximize performance vs. minimize size, with three highlighted models relative to the baseline: +3.45% performance at +0.09% size; -0.25% performance at -22.47% size; +3.19% performance at -1.69% size.]
Some architecture options
[Chart: the same trade-off chart as the previous slide, with each highlighted model annotated by its architecture:]
● 4 layers, 11 attention heads; no dropout, raised temperature, soft target loss weighted more
● 6 layers, 11 attention heads; no dropout, low temperature, almost all soft target loss
● 6 layers, 12 attention heads; no dropout, raised temperature, soft target loss weighted more
Why does it matter?
By using Multimetric Bayesian Optimization, we're able to easily understand the trade-offs made during compression. By understanding these trade-offs, we're able to choose a model architecture that best suits our needs.
Learn more about SigOpt
● Read our research and product blog.
● See more videos on our YouTube channel.
● Get free beta access to Experiment Management by joining the beta.
● Upcoming webinar: Introducing Experiment Management, Thursday, July 9 at 10am PT.