SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
ML Monitoring is not APM
Cory A. Johannsen
Product Engineer, Verta Inc.
www.verta.ai
Agenda
▴ What is APM?
▴ What is ML monitoring?
▴ How ML monitoring and APM differ
▴ The unique needs of ML monitoring
▴ A very cool solution to model monitoring from Verta
About
https://www.verta.ai/product
- End-to-end MLOps platform for ML
model delivery, operations and
management
- Kubernetes-based, operations stack
for ML
- 23 years as a software engineer
- Embedded systems, enterprise
software, SaaS
- 6 years in APM working at scale
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
What is APM?
What is APM?
▴ Application performance Monitoring
▴ Metrics
○ Name
○ Value
○ Labels
○ Timestamp
▴ Visualization
▴ Alerting
What do I care about monitoring in APM?
▴ Health
▴ Availability
▴ Performance
▴ Stability
▴ Notification
APM in practice
▴ Production operations
▴ Diagnostics and debugging
▴ Critical incident response
What is Model Monitoring?
▴ Know when models are failing
▴ Quickly find the root cause
▴ Close the loop by fast recovery
10
Ensuring model results are
consistently of high quality
*We refer to all latency, throughput etc. as model service health
▴ w/o ground truth, model
fails challenging to detect
▴ Need to monitor complex
statistical summaries
▴ Distributions, anomalies,
missing values, quantiles
etc.
▴ Often model-specific
▴ Intelligent detection
and alerting to
pre-emptively identify
issues and trigger
remediations
▴ Execute re-trains,
fallback models, and
human intervention.
11
Know when a model fails Close the loop
▴ A model is one part of a
inference pipeline
▴ Need global view of the
pipeline jungle to see
where the root issue
may be
Quickly find the root cause
How APM and ML monitoring align
▴ Error rate, Throughput, Latency
○ You need to know my production systems are
operational
▴ Visualization
○ You need to see change over time
▴ Alerting
○ You need to know when
something has gone wrong
(and only when something
has gone wrong)
What do you care about in ML Monitoring?
▴ Distribution
○ Training versus test
○ Iteration over iteration
○ Live prediction
▴ Drift
○ Change in Distribution over
time
How APM and ML monitoring differ
▴ Error Rate, Throughput, Latency
○ Necessary, no longer sufficient
▴ Not all work is production work
○ ML monitoring happens from the beginning
of the pipeline
▴ APM can tell you what is wrong
○ ML monitoring is about understanding why
What makes ML monitoring unique
▴ Quantitative analysis of model performance
○ Information you can use
▴ Controlled comparison of distributions
○ Repeatable
○ Reliable
○ Consistent
▴ Alerting on meaningful deviation
○ Actionable
○ Timely
○ Accurate
Only you know the shape of your data
▴ Every model and pipeline is different and specialized
○ You built them, you understand them
▴ You know what metrics and distributions are valuable
○ This is your model, you know the data and processes that created it
▴ You know the expected distributions
○ You can determine whether the behavior is correct
Only you know how to measure change
▴ Compare to reference set
○ Training, test, golden data set
▴ Compare to a baseline
○ Calculate a baseline from your data or production systems
▴ Compare to other
○ Use a comparison that makes sense in your domain
Only you know when a change matters
▴ You know your model and tolerances
▴ You know when a deviation is significant (or not!)
▴ You know when these conditions need to change
Verta understand model monitoring
▴ Designed for your workflows
▴ Easy integration to capture your monitoring data
▴ Visualize and understand your metrics, distributions, and drift
▴ Get alerted when you should - not otherwise
Introducing a generalized
framework for Model Monitoring
Concepts
▴ Monitored Entity: A reference name (e.g. model or pipeline) that you want to
monitor
▴ Profiler: A function that computes statistics about your data
▴ Summary: A collection of statistics about your data (output of profiler)
○ Samples: instance of a summary, i.e., a statistic
○ Labels: key-values attached to summary samples. Used for rich filtering and
aggregation
▴ Alerter: Triggered periodically, it can talk with the Verta API to fetch information
about summaries and identify if they look wrong
How does it work?
1. Define monitored entity: the entity to be monitored (e.g., model, data, pipeline)
2. Define summaries to monitor for the entity
3. Run profilers (manually or automatically) to produce summary samples
4. View samples, define alerts
5. Get alerted (e.g. via Slack)
6. Close the loop!
How does it work?
Time-series DB for
statistical summaries
...
Ground truth
Data/Model
Pipelines
Model (Live)
Remediation
- Retrain
- Rollback
- Human loop
Model (Batch)
Prediction
Log
Summary
▴ Performance monitoring is no longer sufficient for the needs of modern ML systems
○ Model monitoring starts at the beginning of the pipeline and continues through production
○ Batch and live can be addressed in the same framework
▴ Knowing something is wrong is not enough, you need to know why
▴ Timely actionable alerting is mandatory
▴ Building these tools on-site is difficult, error-prone, and expensive
▴ Spark is a fantastic tool to enable model monitoring
Monitor Your Models with Verta
▴ Visit monitoring.verta.ai today and see it in action
▴ Join our community
▴ Get more out of your models
▴ Get more out of your alerts
Thank you.
Cory A. Johannsen
Product Engineer, Verta Inc.
www.verta.ai

Más contenido relacionado

La actualidad más candente

AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksDatabricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
NLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingNLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingDatabricks
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Detecting Anomalous Behavior with Surveillance​ Analytics​
Detecting Anomalous Behavior with Surveillance​ Analytics​Detecting Anomalous Behavior with Surveillance​ Analytics​
Detecting Anomalous Behavior with Surveillance​ Analytics​Databricks
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Rodney Joyce
 
Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleNoriaki Tatsumi
 
ML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationHunter Carlisle
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLDatabricks
 
Ml infra at an early stage
Ml infra at an early stageMl infra at an early stage
Ml infra at an early stageNick Handel
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentDatabricks
 
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Databricks
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilDatabricks
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 
Big Data at Speed
Big Data at SpeedBig Data at Speed
Big Data at Speedmarkgrover
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Nisha Talagala
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 

La actualidad más candente (20)

AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
NLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingNLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated Training
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Detecting Anomalous Behavior with Surveillance​ Analytics​
Detecting Anomalous Behavior with Surveillance​ Analytics​Detecting Anomalous Behavior with Surveillance​ Analytics​
Detecting Anomalous Behavior with Surveillance​ Analytics​
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
 
Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scale
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
ML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production Application
 
Advanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using MLAdvanced Model Comparison and Automated Deployment Using ML
Advanced Model Comparison and Automated Deployment Using ML
 
Ml infra at an early stage
Ml infra at an early stageMl infra at an early stage
Ml infra at an early stage
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Big Data at Speed
Big Data at SpeedBig Data at Speed
Big Data at Speed
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 

Similar a Why APM Is Not the Same As ML Monitoring

Monitoring Distributed Systems
Monitoring Distributed SystemsMonitoring Distributed Systems
Monitoring Distributed SystemsAleksandr Tavgen
 
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...Agile Testing Alliance
 
Pipeline analytics concept for posting
Pipeline analytics concept for postingPipeline analytics concept for posting
Pipeline analytics concept for postingMark Peco
 
Pipeline analytics concept for posting on linked in
Pipeline analytics concept for posting on linked inPipeline analytics concept for posting on linked in
Pipeline analytics concept for posting on linked inMark Peco
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowDatabricks
 
artificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdfartificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdftt4765690
 
Vgo Sim And Opt
Vgo Sim And OptVgo Sim And Opt
Vgo Sim And Optlksisemore
 
Delivering BAM & BPM With Run-Time Integration
Delivering BAM & BPM With Run-Time IntegrationDelivering BAM & BPM With Run-Time Integration
Delivering BAM & BPM With Run-Time IntegrationNathaniel Palmer
 
Sage - Clinical Laboratory Management System
Sage - Clinical Laboratory Management SystemSage - Clinical Laboratory Management System
Sage - Clinical Laboratory Management SystemGirish Kumar Ayyappath
 
Data drift and machine learning
Data drift and machine learningData drift and machine learning
Data drift and machine learningSmita Agrawal
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima Pratima Pandey
 
The Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas HaverThe Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas HaverQA or the Highway
 
Data drift and machine learning
Data drift and machine learningData drift and machine learning
Data drift and machine learningSmita Agrawal
 
LIMS_ASQ.pptx
LIMS_ASQ.pptxLIMS_ASQ.pptx
LIMS_ASQ.pptxArta Doci
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro sessionAvinash Patil
 
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingThe Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingPerfecto by Perforce
 

Similar a Why APM Is Not the Same As ML Monitoring (20)

Monitoring Distributed Systems
Monitoring Distributed SystemsMonitoring Distributed Systems
Monitoring Distributed Systems
 
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
#ATAGTR2021 Presentation : "Use of AI and ML in Performance Testing" by Adolf...
 
Pipeline analytics concept for posting
Pipeline analytics concept for postingPipeline analytics concept for posting
Pipeline analytics concept for posting
 
Pipeline analytics concept for posting on linked in
Pipeline analytics concept for posting on linked inPipeline analytics concept for posting on linked in
Pipeline analytics concept for posting on linked in
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
 
artificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdfartificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdf
 
Vgo Sim And Opt
Vgo Sim And OptVgo Sim And Opt
Vgo Sim And Opt
 
Data Science for Retail Broking
Data Science for Retail BrokingData Science for Retail Broking
Data Science for Retail Broking
 
Data Science for Retail Broking
Data Science for Retail BrokingData Science for Retail Broking
Data Science for Retail Broking
 
Delivering BAM & BPM With Run-Time Integration
Delivering BAM & BPM With Run-Time IntegrationDelivering BAM & BPM With Run-Time Integration
Delivering BAM & BPM With Run-Time Integration
 
Predictive analytics roadshow
Predictive analytics roadshowPredictive analytics roadshow
Predictive analytics roadshow
 
Sage - Clinical Laboratory Management System
Sage - Clinical Laboratory Management SystemSage - Clinical Laboratory Management System
Sage - Clinical Laboratory Management System
 
Data drift and machine learning
Data drift and machine learningData drift and machine learning
Data drift and machine learning
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
 
TRI and OPRA Overview
TRI and OPRA OverviewTRI and OPRA Overview
TRI and OPRA Overview
 
The Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas HaverThe Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas Haver
 
Data drift and machine learning
Data drift and machine learningData drift and machine learning
Data drift and machine learning
 
LIMS_ASQ.pptx
LIMS_ASQ.pptxLIMS_ASQ.pptx
LIMS_ASQ.pptx
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingThe Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesDatabricks
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 

Último

TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1bengalurutug
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe321k
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingMarketingTrips
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Empowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsEmpowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsGain Insights
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxjkmrshll88
 
Data Analytics Fundamentals: data analytics types.potx
Data Analytics Fundamentals: data analytics types.potxData Analytics Fundamentals: data analytics types.potx
Data Analytics Fundamentals: data analytics types.potxEmmanuel Dauda
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media PlatformsMahmoud Yasser
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxShammiRai3
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfJasonBoboKyaw
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...ferisulianta.com
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Neo4j
 

Último (20)

TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân Marketing
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Empowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsEmpowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded Analytics
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptx
 
Data Analytics Fundamentals: data analytics types.potx
Data Analytics Fundamentals: data analytics types.potxData Analytics Fundamentals: data analytics types.potx
Data Analytics Fundamentals: data analytics types.potx
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 
Target_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110millionTarget_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110million
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptx
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdf
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
 
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
 

Why APM Is Not the Same As ML Monitoring

  • 1. ML Monitoring is not APM Cory A. Johannsen Product Engineer, Verta Inc. www.verta.ai
  • 2. Agenda ▴ What is APM? ▴ What is ML monitoring? ▴ How ML monitoring and APM differ ▴ The unique needs of ML monitoring ▴ A very cool solution to model monitoring from Verta
  • 3. About https://www.verta.ai/product - End-to-end MLOps platform for ML model delivery, operations and management - Kubernetes-based, operations stack for ML - 23 years as a software engineer - Embedded systems, enterprise software, SaaS - 6 years in APM working at scale
  • 4. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
  • 6. What is APM? ▴ Application performance Monitoring ▴ Metrics ○ Name ○ Value ○ Labels ○ Timestamp ▴ Visualization ▴ Alerting
  • 7. What do I care about monitoring in APM? ▴ Health ▴ Availability ▴ Performance ▴ Stability ▴ Notification
  • 8. APM in practice ▴ Production operations ▴ Diagnostics and debugging ▴ Critical incident response
  • 9. What is Model Monitoring?
  • 10. ▴ Know when models are failing ▴ Quickly find the root cause ▴ Close the loop by fast recovery 10 Ensuring model results are consistently of high quality *We refer to all latency, throughput etc. as model service health
  • 11. ▴ w/o ground truth, model fails challenging to detect ▴ Need to monitor complex statistical summaries ▴ Distributions, anomalies, missing values, quantiles etc. ▴ Often model-specific ▴ Intelligent detection and alerting to pre-emptively identify issues and trigger remediations ▴ Execute re-trains, fallback models, and human intervention. 11 Know when a model fails Close the loop ▴ A model is one part of a inference pipeline ▴ Need global view of the pipeline jungle to see where the root issue may be Quickly find the root cause
  • 12. How APM and ML monitoring align ▴ Error rate, Throughput, Latency ○ You need to know my production systems are operational ▴ Visualization ○ You need to see change over time ▴ Alerting ○ You need to know when something has gone wrong (and only when something has gone wrong)
  • 13. What do you care about in ML Monitoring? ▴ Distribution ○ Training versus test ○ Iteration over iteration ○ Live prediction ▴ Drift ○ Change in Distribution over time
  • 14. How APM and ML monitoring differ ▴ Error Rate, Throughput, Latency ○ Necessary, no longer sufficient ▴ Not all work is production work ○ ML monitoring happens from the beginning of the pipeline ▴ APM can tell you what is wrong ○ ML monitoring is about understanding why
  • 15. What makes ML monitoring unique ▴ Quantitative analysis of model performance ○ Information you can use ▴ Controlled comparison of distributions ○ Repeatable ○ Reliable ○ Consistent ▴ Alerting on meaningful deviation ○ Actionable ○ Timely ○ Accurate
  • 16. Only you know the shape of your data ▴ Every model and pipeline is different and specialized ○ You built them, you understand them ▴ You know what metrics and distributions are valuable ○ This is your model, you know the data and processes that created it ▴ You know the expected distributions ○ You can determine whether the behavior is correct
  • 17. Only you know how to measure change ▴ Compare to reference set ○ Training, test, golden data set ▴ Compare to a baseline ○ Calculate a baseline from your data or production systems ▴ Compare to other ○ Use a comparison that makes sense in your domain
  • 18. Only you know when a change matters ▴ You know your model and tolerances ▴ You know when a deviation is significant (or not!) ▴ You know when these conditions need to change
  • 19. Verta understand model monitoring ▴ Designed for your workflows ▴ Easy integration to capture your monitoring data ▴ Visualize and understand your metrics, distributions, and drift ▴ Get alerted when you should - not otherwise
  • 20. Introducing a generalized framework for Model Monitoring
  • 21. Concepts ▴ Monitored Entity: A reference name (e.g. model or pipeline) that you want to monitor ▴ Profiler: A function that computes statistics about your data ▴ Summary: A collection of statistics about your data (output of profiler) ○ Samples: instance of a summary, i.e., a statistic ○ Labels: key-values attached to summary samples. Used for rich filtering and aggregation ▴ Alerter: Triggered periodically, it can talk with the Verta API to fetch information about summaries and identify if they look wrong
  • 22. How does it work? 1. Define monitored entity: the entity to be monitored (e.g., model, data, pipeline) 2. Define summaries to monitor for the entity 3. Run profilers (manually or automatically) to produce summary samples 4. View samples, define alerts 5. Get alerted (e.g. via Slack) 6. Close the loop!
  • 23. How does it work? Time-series DB for statistical summaries ... Ground truth Data/Model Pipelines Model (Live) Remediation - Retrain - Rollback - Human loop Model (Batch) Prediction Log
  • 24. Summary ▴ Performance monitoring is no longer sufficient for the needs of modern ML systems ○ Model monitoring starts at the beginning of the pipeline and continues through production ○ Batch and live can be addressed in the same framework ▴ Knowing something is wrong is not enough, you need to know why ▴ Timely actionable alerting is mandatory ▴ Building these tools on-site is difficult, error-prone, and expensive ▴ Spark is a fantastic tool to enable model monitoring
  • 25. Monitor Your Models with Verta ▴ Visit monitoring.verta.ai today and see it in action ▴ Join our community ▴ Get more out of your models ▴ Get more out of your alerts
  • 26. Thank you. Cory A. Johannsen Product Engineer, Verta Inc. www.verta.ai