www.verta.ai Confidential
Model Versioning Done Right:
A ModelDB 2.0 Walkthrough
Manasi Vartak, Ph.D.
CEO, Verta.ai
Agenda
• Welcome & Logistics (5 mins)
• Model Versioning Overview (20 mins)
• Hands-on tutorial (20 mins)
• Wrap-up (5 mins)
Welcome & Logistics
About
• MIT CSAIL Ph.D.
• Thesis work: Infrastructure for Model Management and Diagnosis
• Built the first open-source model management system: ModelDB
• Used by research labs and Fortune 500 companies
• Founded Verta.ai to build ML infrastructure
• Building the operations stack for production ML
• Model versioning, deployment & ops, monitoring
• Built for data science; able to run at large scale
Logistics: This webinar will be hands-on!
bit.ly/verta-webinar-1-instructions
Model Versioning Overview
Data Scientists & ML Engineers build 100s of models
Challenge: Model Development is empirical & ad-hoc
Model 1
Accuracy: 62%
Challenge: Model Development is empirical & ad-hoc
Model 3
Accuracy: 76%
val udf1: (Int => Int) = (delayed..)
df.withColumn("timesDelayed", udf1)
RandomForestClassifier
Challenge: Model Development is empirical & ad-hoc
Model 5
Accuracy: 68%
val udf1: (Int => Int) = (delayed..)
df.withColumn("timesDelayed", udf1)
RandomForestClassifier
credit-default-clean.csv
val lrGrid = new ParamGridBuilder()
  .addGrid(rf.maxDepth, Array(5, 10, 15))
  .addGrid(rf.numTrees, Array(50, 100))
Challenge: Model Development is empirical & ad-hoc
Model 50
Accuracy: 82%
val udf1: (Int => Int) = (delayed..)
df.withColumn("timesDelayed", udf1)
RandomForestClassifier
credit-default-clean.csv
val lrGrid = new ParamGridBuilder()
  .addGrid(rf.maxDepth, Array(5, 10, 15))
  .addGrid(rf.numTrees, Array(50, 100))
val labelIndexer1 = new LabelIndexer()
val labelIndexer2 = new LabelIndexer()
…
val udf1: (Int => Int) = (delayed..)
val udf2: (String, Int) = …
df.withColumn("timesDelayed", udf1)
  .withColumn("percentPaid", udf2)
  .withColumn("creditUsed", udf3)
val scaler = new StandardScaler()
  .setInputCol("features") …
SOTA in Model Versioning
Why is this a problem?
Scientific rigor
• “How do my colleagues or I reproduce this work / this model?”
Production Safety
• “What version of the model is deployed?”
• “This model is breaking; we need to roll back”
Speed of Delivery
• With the right versioning, you can use CI/CD tools
Insurance
• “If Jane leaves the team, how do I preserve the knowledge?”
Productivity
• “What experiments have I run before?”
Model Version Control
• Manage changes to models over time
    • Audit, change logs, immutability
• Uniquely identify a model and be able to return to it at any time
    • Reproducibility, governance
• Enable multiple data scientists to contribute and reuse
    • Collaboration, blame
The software industry was built on the back of code versioning
• First code versioning system (SCCS) in 1972, developed at Bell Labs and later shipped with Unix
• ~5 generations of code versioning systems since then
• Examples shown: SCCS, RCS, Perforce
So what should we version?
Software          | Source Code              | Binary                | Container
Machine Learning  | Code + Other Ingredients | Weights / Checkpoints | Container
Ingredients of a model
1. Code
2. Config (same code, different models)
3. Data (data changes underneath the model)
4. Environment
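The four ingredients can be combined into a single identifier for a model. The following is a minimal sketch, not ModelDB's actual scheme: `ModelVersion` and `version_id` are hypothetical names, the hyperparameter values are made up, and hashing a canonical JSON encoding with SHA-256 is an assumption chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record of the four ingredients of a model."""
    code: str         # e.g. a Git commit hash
    config: dict      # hyperparameters and other settings
    data: str         # e.g. a checksum of the training data
    environment: str  # e.g. a container image digest or lockfile hash

    def version_id(self) -> str:
        # Canonical JSON (sorted keys) so equal ingredients give equal ids.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


base = ModelVersion(
    code="ke87a5",
    config={"maxDepth": 10, "numTrees": 50},
    data="34ae59",
    environment="5le723",
)
# Same ingredients, config keys written in a different order.
same = ModelVersion("ke87a5", {"numTrees": 50, "maxDepth": 10}, "34ae59", "5le723")
# Same code, config, and environment, but the data changed underneath.
new_data = ModelVersion("ke87a5", {"maxDepth": 10, "numTrees": 50}, "g3e4rr", "5le723")

print(base.version_id() == same.version_id())      # True
print(base.version_id() == new_data.version_id())  # False
```

Changing any one ingredient changes the identifier, which is why code alone is not enough to pin down a model.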
Versioning vs. Metadata vs. Experiment Management
Reproducibility
• Uniquely identify a model
• Track every change
• Go back in time
Analytics
• Metrics, Performance
• Organizational data
• Searching, sorting
[Diagram labels: Applications, Experiment Management, Compliance, Metadata, Versioning, Collaboration]
run = run_experiment(code, data, config, environment)
run.metrics['accuracy'] = 0.7
run.tags = ['demo', 'nlp']
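The pseudo-code on this slide can be made runnable to show how the two kinds of information differ. This is a sketch under assumptions: the `Run` class and the body of `run_experiment` are hypothetical stand-ins, the identifiers and accuracy values are illustrative, and none of it is the ModelDB client API.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical run record: versioned inputs plus free-form metadata."""
    # Versioned inputs: enough to identify and rebuild the model.
    code: str
    data: str
    config: str
    environment: str
    # Metadata: used for analytics, searching, and sorting.
    metrics: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)


def run_experiment(code, data, config, environment):
    # Placeholder: a real implementation would train a model here.
    return Run(code, data, config, environment)


runs = []
for code, accuracy in [("ke87a5", 0.7), ("r77fef", 0.82)]:
    run = run_experiment(code, "34ae59", "1ee3pk", "5le723")
    run.metrics["accuracy"] = accuracy
    run.tags = ["demo", "nlp"]
    runs.append(run)

# Metadata answers "which run was best?" ...
best = max(runs, key=lambda r: r.metrics["accuracy"])
# ... and the versioned inputs answer "what exactly produced it?"
print(best.code, best.data, best.config, best.environment)
```

Metadata alone can rank the runs, but only the versioned inputs say how to get the winning model back.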
Model Versioning in ModelDB 2.0
ModelDB
• ModelDB 1.0: Metadata focused
    • What did it manage?
        • Hyperparameters
        • Metrics
        • Pipeline metadata
        • Tags
• ModelDB 2.0: Versioning focused
    • Git-like constructs for versioning
        • Commits, Forks, Merges
    • Metadata tracking as before
    • APIs for applications
    • Integration into MLOps
https://github.com/VertaAI/modeldb/
Why not Git?
• Git is fantastic for code
• Git doesn’t handle large files very well, and DBMSs can’t be stored in Git
• Model versioning needs to be accessible as a library, not only from the command line
• Git is distributed: each user must download the full repo history (1MM models)
• All commits & branches (even unsuccessful ones) need to be preserved
ModelDB 2.0: A “super repo”
[Diagram: a ModelDB Repo above four component repos: Config Repo, Env Repo, Data Repo, Code Repo; the final build also shows a company FeatureStore]
• Delegate operations to specialized versioning systems
• Where systems don’t exist, we have created light-weight systems
What does a ModelDB Version look like?
census-repo      5y64er  mdb://census-repo
data-train       34ae59  s3://census-data
code             ke87a5  git://github.com/census
config           1ee3pk  config://census-repo
env-train        5le723  env://census-repo

census-repo      5y64er  mdb://census-repo
data-train       34ae59  s3://census-data
code             ke87a5  git://github.com/census
config           1ee3pk  config://census-repo
env-train        5le723  env://census-repo
annotation-data  67er44  s3://census-annotations
annotation-code  8bh651  git://github.com/annotation-code
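A version listing like this is a mapping from component name to a pinned version and a location. The sketch below parses the first five rows into a small data structure. `Pin`, `parse_listing`, and `reference` are hypothetical names, and the whitespace-separated text format is an assumption taken from how the slide lays out the rows, not ModelDB's storage format.

```python
from typing import NamedTuple


class Pin(NamedTuple):
    """One row of the listing: a component pinned to a version."""
    name: str
    version: str
    location: str


LISTING = """\
census-repo 5y64er mdb://census-repo
data-train 34ae59 s3://census-data
code ke87a5 git://github.com/census
config 1ee3pk config://census-repo
env-train 5le723 env://census-repo
"""


def parse_listing(text: str) -> dict:
    """Parse whitespace-separated 'name version location' rows."""
    pins = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, version, location = line.split()
        pins[name] = Pin(name, version, location)
    return pins


def reference(pin: Pin) -> str:
    """Render a pin in location@version form."""
    return f"{pin.location}@{pin.version}"


pins = parse_listing(LISTING)
print(reference(pins["code"]))  # git://github.com/census@ke87a5
```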
How do you re-constitute a model?
> mdb checkout mdb://census-repo@5y64er
> git checkout git://github.com/census@ke87a5
> docker build env://census-repo@5le723
> docker run train.py --data s3://census-data@34ae59 \
      --config config://census-repo@1ee3pk
census-repo 5y64er mdb://census-repo
data-train 34ae59 s3://census-data
code ke87a5 git://github.com/census
config 1ee3pk config://census-repo
env-train 5le723 env://census-repo
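The order of the steps matters: the repo is resolved first, then the code, then the environment, and only then the training run with its data and config. The sketch below derives that plan from the pinned versions. `reconstitution_plan` and `ref` are hypothetical helpers, and the step names are descriptive labels rather than real CLI commands.

```python
# Pins copied from the listing above: component -> (version, location).
PINS = {
    "census-repo": ("5y64er", "mdb://census-repo"),
    "data-train": ("34ae59", "s3://census-data"),
    "code": ("ke87a5", "git://github.com/census"),
    "config": ("1ee3pk", "config://census-repo"),
    "env-train": ("5le723", "env://census-repo"),
}


def ref(pins, name):
    """Return the location@version reference for one component."""
    version, location = pins[name]
    return f"{location}@{version}"


def reconstitution_plan(pins):
    """List the steps in dependency order: repo, code, environment, run."""
    return [
        ("checkout repo", ref(pins, "census-repo")),
        ("checkout code", ref(pins, "code")),
        ("build environment", ref(pins, "env-train")),
        ("run training",
         f"--data {ref(pins, 'data-train')} --config {ref(pins, 'config')}"),
    ]


plan = reconstitution_plan(PINS)
for step, target in plan:
    print(f"{step}: {target}")
```

Because every step is driven by a pinned reference, re-running the plan later should resolve to the same inputs, provided the underlying systems keep those versions immutable.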
Advantages of a “super repo”
[Diagram: ModelDB Repo, Config Repo, Env Repo, Data Repo, and Code Repo, each labeled "Revert"]
Advantages of a super repo
[Diagram: ModelDB Repo, Config Repo, Env Repo, Data Repo, and Code Repo, each labeled "Merge"]
How do you fork and merge?
Base version
census-repo  5y64er  mdb://census-repo
data-train   34ae59  s3://census-data
code         ke87a5  git://github.com/census
config       1ee3pk  config://census-repo
env-train    5le723  env://census-repo

Fork (data changed)
census-repo  5y64er
data-train   g3e4rr
code         ke87a5
config       1ee3pk
env-train    5le723

Fork (code and config changed)
census-repo  5y64er
data-train   34ae59
code         r77fef
config       kj7h14
env-train    5le723

Merge
census-repo  5y64er
data-train   g3e4rr
code         r77fef
config       kj7h14
env-train    5le723
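The merged version takes the new data hash from one fork and the new code and config hashes from the other. One way to get that result is a three-way merge over the component pins, sketched below. `merge_pins` is a hypothetical helper and this is an assumption about how such a merge could work, not a description of ModelDB's merge algorithm.

```python
def merge_pins(base, ours, theirs):
    """Three-way merge of component -> version mappings.

    For each component, take whichever side changed it relative to base.
    If both sides changed it to different versions, report a conflict
    and leave that component out of the merged result.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (changed or not)
            value = o
        elif o == b:      # only "theirs" changed it
            value = t
        elif t == b:      # only "ours" changed it
            value = o
        else:             # both changed it, differently
            conflicts.append(name)
            continue
        if value is not None:
            merged[name] = value
    return merged, conflicts


base = {"data-train": "34ae59", "code": "ke87a5",
        "config": "1ee3pk", "env-train": "5le723"}
data_fork = {**base, "data-train": "g3e4rr"}
code_fork = {**base, "code": "r77fef", "config": "kj7h14"}

merged, conflicts = merge_pins(base, data_fork, code_fork)
print(merged)
print(conflicts)  # []
```

The two forks touch different components, so the merge is conflict-free. If both forks had changed `code` to different hashes, `merge_pins` would list `code` as a conflict instead of picking a side.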
Hands-on Tutorial
You will need…
bit.ly/verta-webinar-1-instructions
That’s a wrap!
Thanks!
• ModelDB 2.0 is constantly updated
• Use it (Star, fork, etc., let us know about use cases)
• Give us feedback, partner with us
• Contribute to it
• We love this stuff — hit us up if we can do a tech talk / help you get set up!
• If you are looking for a hosted version, drop us a line: modeldb@verta.ai
https://github.com/VertaAI/modeldb | Slack: http://bit.ly/modeldb-mlops | https://verta.ai
manasi@verta.ai | @DataCereal | modeldb@verta.ai

 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

Model versioning done right: A ModelDB 2.0 Walkthrough

  • 1. www.verta.ai Confidential Model Versioning Done Right: A ModelDB 2.0 Walkthrough. Manasi Vartak, Ph.D., CEO, Verta.ai
  • 2. Agenda • Welcome & Logistics (5 mins) • Model Versioning Overview (20 mins) • Hands-on tutorial (20 mins) • Wrap-up (5 mins)
  • 4. About • MIT CSAIL Ph.D. • Thesis work: Infrastructure for Model Management and Diagnosis • Built first open-source model management system: ModelDB • Used by research labs, Fortune-500s • Founded Verta.ai to build ML Infra • Building the operations stack for production ML • Model versioning, deployment & ops, monitoring • Built for data science; able to run at large scale
  • 5. Logistics: This webinar will be hands-on! bit.ly/verta-webinar-1-instructions
  • 7. Data Scientists & ML Engineers build 100s of models
  • 8. Challenge: Model Development is empirical & ad-hoc • Model 1, Accuracy: 62%
  • 9. Challenge: Model Development is empirical & ad-hoc • Model 3, Accuracy: 76% • val udf1: (Int => Int) = (delayed..); df.withColumn("timesDelayed", udf1) • RandomForestClassifier
  • 10. Challenge: Model Development is empirical & ad-hoc • Model 5, Accuracy: 68% • val udf1: (Int => Int) = (delayed..); df.withColumn("timesDelayed", udf1) • RandomForestClassifier • credit-default-clean.csv • val lrGrid = new ParamGridBuilder().addGrid(rf.maxDepth, Array(5, 10, 15)).addGrid(rf.numTrees, Array(50, 100))
  • 11. Challenge: Model Development is empirical & ad-hoc • Model 50, Accuracy: 82% • val udf1: (Int => Int) = (delayed..); df.withColumn("timesDelayed", udf1) • RandomForestClassifier • credit-default-clean.csv • val lrGrid = new ParamGridBuilder().addGrid(rf.maxDepth, Array(5, 10, 15)).addGrid(rf.numTrees, Array(50, 100)) • val labelIndexer1 = new LabelIndexer(); val labelIndexer2 = new LabelIndexer() … • val udf1: (Int => Int) = (delayed..); val udf2: (String, Int) = …; df.withColumn("timesDelayed", udf1).withColumn("percentPaid", udf2).withColumn("creditUsed", udf3) • val scaler = new StandardScaler().setInputCol("features") …
  • 12. SOTA in Model Versioning
  • 13. Why is this a problem? • Scientific rigor: “How do I or colleagues reproduce this work / this model?” • Production safety: “What version of the model is deployed?”, “This model is breaking, we need to roll back” • Speed of delivery: With the right versioning, you can use CI/CD tools • Insurance: “If Jane leaves the team, how do I preserve the knowledge?” • Productivity: “What experiments have I run before?”
  • 14. Model Version Control • Manage changes to models over time (audit, change logs, immutability) • Uniquely identify a model and return to it at any time (reproducibility, governance) • Enable multiple data scientists to contribute and reuse (collaboration, blame)
  • 15. The software industry was built on the back of code versioning • First code versioning system (SCCS) in 1972, later adopted on Unix • ~5 generations of code versioning systems until now • Examples: SCCS, RCS, Perforce
  • 16. So what should we version? • Software: Source Code, Binary, Container • Machine Learning: Code + Other Ingredients, Weights / Checkpoints, Container
  • 18. Ingredients of a model • 2. Config: same code, different models
  • 19. Ingredients of a model • 3. Data: data changes underneath the model
  • 20. Ingredients of a model • 4. Environment
  • 21. Versioning vs. Metadata vs. Experiment Management • Reproducibility: uniquely identify a model; track every change; go back in time • Analytics: metrics, performance; organizational data; searching, sorting • Diagram labels: Applications, Experiment Management, Compliance, Metadata, Versioning, Collaboration • run = run_experiment(code, data, config, environment); run.metrics['accuracy'] = 0.7; run.tags = ['demo', 'nlp']
  • 23. ModelDB • ModelDB 1.0: Metadata focused • What did it manage? Hyperparameters, metrics, pipeline metadata, tags • ModelDB 2.0: Versioning focused • Git-like constructs for versioning: commits, forks, merges • Metadata tracking as before • APIs for applications • Integration into MLOps • https://github.com/VertaAI/modeldb/
  • 24. Why not Git? • Git is fantastic for code • Git doesn’t handle large files very well; DBMSs can’t be stored on Git • Model versioning needs to be accessed as a library rather than from the command line • Git is distributed: each user must download the full repo history (1MM models) • All commits & branches (even unsuccessful ones) need to be preserved
  • 25. ModelDB 2.0: A “super repo” • ModelDB Repo
  • 26. ModelDB 2.0: A “super repo” • ModelDB Repo over Config Repo, Env Repo, Data Repo, Code Repo • Delegate operations to specialized versioning systems • Where systems don’t exist, we have created light-weight systems
  • 27. ModelDB 2.0: A “super repo” • ModelDB Repo over Config Repo, Env Repo, Data Repo, Code Repo • Delegate operations to specialized versioning systems • Where systems don’t exist, we have created light-weight systems • Company FeatureStore
  • 28. What does a ModelDB Version look like? • census-repo 5y64er mdb://census-repo; data-train 34ae59 s3://census-data; code ke87a5 git://github.com/census; config 1ee3pk config://census-repo; env-train 5le723 env://census-repo • With annotations added: annotation-data 67er44 s3://census-annotations; annotation-code 8bh651 git://github.com/annotation-code
  • 29. How do you re-constitute a model? • > mdb checkout mdb://census-repo@5y64er • > git checkout git://github.com/census@ke87a5 • > docker build env://census-repo@5le723 • > docker run train.py --data s3://census-data@34ae59 --config config://census-repo@1ee3pk • Version: census-repo 5y64er mdb://census-repo; data-train 34ae59 s3://census-data; code ke87a5 git://github.com/census; config 1ee3pk config://census-repo; env-train 5le723 env://census-repo
  • 30. Advantages of a “super repo” • Revert across ModelDB Repo, Config Repo, Env Repo, Data Repo, Code Repo
  • 31. Advantages of a “super repo” • Merge across ModelDB Repo, Config Repo, Env Repo, Data Repo, Code Repo
  • 32. How do you fork and merge? • Base: census-repo 5y64er mdb://census-repo; data-train 34ae59 s3://census-data; code ke87a5 git://github.com/census; config 1ee3pk config://census-repo; env-train 5le723 env://census-repo • Fork: census-repo 5y64er; data-train g3e4rr; code ke87a5; config 1ee3pk; env-train 5le723 • Fork: census-repo 5y64er; data-train 34ae59; code r77fef; config kj7h14; env-train 5le723 • Merge: census-repo 5y64er; data-train g3e4rr; code r77fef; config kj7h14; env-train 5le723
  • 34. You will need… bit.ly/verta-webinar-1-instructions
  • 36. Thanks! • ModelDB 2.0 is constantly updated • Use it (star, fork, etc.; let us know about use cases) • Give us feedback, partner with us • Contribute to it • We love this stuff; hit us up if we can do a tech talk / help you get set up! • If you are looking for a hosted version, drop us a line: modeldb@verta.ai
  • 37. Thanks! https://github.com/VertaAI/modeldb | Slack: http://bit.ly/modeldb-mlops | https://verta.ai | manasi@verta.ai | @DataCereal | modeldb@verta.ai
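
The fork-and-merge example on slide 32 can be sketched in a few lines of Python. This is only an illustration, assuming a "super repo" version is a mapping from component name to that component's version id, and that merging is a three-way merge against the common base. The names `ModelVersion`, `fork`, and `merge` are hypothetical and are not ModelDB's API; the ids are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical 'super repo' snapshot: component name -> version id in its own repo."""
    components: Dict[str, str]


def fork(base: ModelVersion, changes: Dict[str, str]) -> ModelVersion:
    """Return a new version that differs from `base` only in `changes`."""
    return ModelVersion({**base.components, **changes})


def merge(base: ModelVersion, left: ModelVersion, right: ModelVersion) -> ModelVersion:
    """Three-way merge per component; raise if both sides changed it differently."""
    merged: Dict[str, str] = {}
    names = set(base.components) | set(left.components) | set(right.components)
    for name in sorted(names):
        b: Optional[str] = base.components.get(name)
        l: Optional[str] = left.components.get(name)
        r: Optional[str] = right.components.get(name)
        if l == r:        # both sides agree (changed identically or not at all)
            chosen = l
        elif l == b:      # only the right side changed this component
            chosen = r
        elif r == b:      # only the left side changed this component
            chosen = l
        else:             # both sides changed it, to different ids
            raise ValueError(f"conflict on {name!r}: {l} vs {r}")
        if chosen is not None:
            merged[name] = chosen
    return ModelVersion(merged)


# Ids taken from slide 32.
base = ModelVersion({
    "data-train": "34ae59",
    "code": "ke87a5",
    "config": "1ee3pk",
    "env-train": "5le723",
})
fork_a = fork(base, {"data-train": "g3e4rr"})                 # new training data
fork_b = fork(base, {"code": "r77fef", "config": "kj7h14"})   # new code and config
merged = merge(base, fork_a, fork_b)
print(merged.components)
```

Because the two forks touch different components, the merge picks up the new data id from one fork and the new code and config ids from the other, matching the merged row on the slide. If both forks had changed the same component to different ids, this sketch raises an error rather than guessing; how a real system resolves such conflicts is outside what the slides state.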