SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Asset Management for
Machine Learning
Georg Hildebrand
DATA AI
$$$
???
The Big Picture:
ML
=
Knowledge Workers Productivity
x 50 ??
Peter Drucker source
Increase of
Performance
Acceleration through
automatisation The area of knowledge workers
… the story of a model that was sent
as email attachment ...
… the story of features that were not
available anymore ...
*D. Sculley et al., “Hidden technical debt in machine learning
systems,” in Advances in Neural Information Processing
Systems, 2015, pp. 2503–2511.
Pick a complexity for your ML product:
source
System of ML silos
Productivity …
Hidden depth of
machine learning
How to reach
50 x productivity
?
… make models and features human
and machine readable!
Store context
- How was the input data
chosen?
- Who trained the model? Is
there a project paper?
- How to run the model?
- Store log output of model
training
- Framework used to build the
model.
- Chain model and feature
preprocessing!
Version and
test
- Snapshot input data /features
- Version the model (eg. into S3
or a backend database)
- Link the container artifact.
- Provide input and output
tests
- Store performance metrics
- …
Put it into practice
-
There is no one stop shop solution :-/
Manage ML Deployment pipeline:
Example: SeldonIO, serving and integrating ML
Purpose: End to end deployment for serving ML and more, very advanced.
https://github.com/SeldonIO/
Trys use the
modularised way!
May require data to
move around.
Manage Models + Data and serve low latency:
Example: Vespa, storing model together with the data
Purpose: eg. Low latency serving of ML
https://github.com/vespa-engine/vespa
Container node
Query
Application
Package
Admin &
Config
Content node
Deploy
- Configuration
- Components
- ML models
Scatter-gather
Core
sharding
models models models
Moves models to
data!
Manage Project, Code, Data and more:
Purpose: Git for data scientists - manage your code and data together
Example DVC (Data Version Control):
https://github.com/iterative/dvc
Input and output are
tracked --> DAG !!!
$ dvc run python code/xml_to_tsv.py
data/Posts.xml data/Posts.tsv python
* source, why luigi and airflow might not be
enough...
Model tracking, snapshotting and reproducibility:
Purpose: Git for data scientists - turn any repository into a trackable task record with reusable
environments and metrics logging.
Example Datmo:
https://github.com/datmo/datmo
OS, but tight to paid
service!!!
PipelineAI:
https://github.com/PipelineAI/pipeline
Reproducible Model Pipelines + serving:
Purpose: Consistent, Immutable, Reproducible Model Runtimes
Example PipelineAI:
https://github.com/PipelineAI/pipeline
Models are stored in
docker containers!!!
Open Source solutions
for model / workflow
management
Open Source
Project
K8s 1st class
support
# Active
Maintainers
# Commits 1st Commit Comp
Seldon.IO Yes + Kubeflow 3 1131 2017-Feb #6
H2o driverless-ai Yes + Kubeflow 14 22967 2014-Mar #12
PipelineAI Yes. No Kubeflow 1 4846 2015-Jul #273
Polyaxon Yes. No Kubeflow 1 2732 2017-May #88
Vespa-engine No, but possilbe 14 18290 2016-Jun #5854
IBM/FfDL Yes. No Kubeflow 4 bulk-init hidden #77
RiseML Yes. No Kubeflow 3 95 2017-Oct #9
databricks/mlflow No. No Kubeflow 5 bulk-init hidden #58
datmo No, possible 4 bulk-init 2018-04 #213
Data / Feature
Management
Open Source
Project
Note # Active
Maintainers
# Commits 1st Commit Comp
Data Version Control
(DVC)
project and
data
management
2 1364 2017-Mar #readme
pachyderm Data Pipeline,
management
6 11652 2014-09 tbd
quiltdata/quilt Data and
package
management
8 1177 2017-01 tbd
Questions?
Georg Hildebrand, zalando SE
https://github.com/georghildebrand
https://www.linkedin.com/in/georghildebrand/
Example Components
ML Journey
Idea
Explore
Combine and
Extract Data
Analyse and
Preprocess
Model and
train
Test and Monitor
Share and
Improve
There is even more hidden depth:
● Models erode boundaries of modularisation:
context and preprocessing dependencies
● Data Dependencies Cost More than Code Dependencies (unstable data
dependencies, data quality monitoring etc.)
● Complex hardware requirements (GPUs on kubernetes??)
● Feedback Loops (model influences its feature data)
● ML-System Anti-Patterns (glue code, pipeline jungles …)
* Own experience
** D. Sculley et al., “Hidden technical debt in machine learning
systems,” in Advances in Neural Information Processing Systems,
2015, pp. 2503–2511.
*** D. Sculley et al., “Machine Learning: The High-Interest Credit Card
of Technical Debt,” p. 9.

Más contenido relacionado

La actualidad más candente

How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformDatabricks
 
Splunk Spark Integration
Splunk Spark IntegrationSplunk Spark Integration
Splunk Spark IntegrationGang Tao
 
Top 10 Enterprise Use Cases for NoSQL
Top 10 Enterprise Use Cases for NoSQLTop 10 Enterprise Use Cases for NoSQL
Top 10 Enterprise Use Cases for NoSQLDATAVERSITY
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERShuyi Chen
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityElasticsearch
 
Alluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudAlluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudJinwook Chung
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Rundeck: The missing tool
Rundeck: The missing toolRundeck: The missing tool
Rundeck: The missing toolArtur Martins
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Amazon Web Services
 
Spark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit
 
CNTUG x SDN Meetup #33 Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
CNTUG x SDN Meetup #33  Talk 1: 從 Cilium 認識 cgroup ebpf - RuianCNTUG x SDN Meetup #33  Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
CNTUG x SDN Meetup #33 Talk 1: 從 Cilium 認識 cgroup ebpf - RuianHanLing Shen
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesArun Gupta
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...Databricks
 
Flintrock: A Faster, Better spark-ec2 by Nicholas Chammas
Flintrock: A Faster, Better spark-ec2 by Nicholas ChammasFlintrock: A Faster, Better spark-ec2 by Nicholas Chammas
Flintrock: A Faster, Better spark-ec2 by Nicholas ChammasSpark Summit
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examplesPeter Lawrey
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerSachin Aggarwal
 
Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Anton Babenko
 

La actualidad más candente (20)

How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
 
solar pond
solar pondsolar pond
solar pond
 
Splunk Spark Integration
Splunk Spark IntegrationSplunk Spark Integration
Splunk Spark Integration
 
Top 10 Enterprise Use Cases for NoSQL
Top 10 Enterprise Use Cases for NoSQLTop 10 Enterprise Use Cases for NoSQL
Top 10 Enterprise Use Cases for NoSQL
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observability
 
Alluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudAlluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-Cloud
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Rundeck: The missing tool
Rundeck: The missing toolRundeck: The missing tool
Rundeck: The missing tool
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT332) - AWS re:Inv...
 
Spark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc BourlierSpark Summit EU talk by Luc Bourlier
Spark Summit EU talk by Luc Bourlier
 
CNTUG x SDN Meetup #33 Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
CNTUG x SDN Meetup #33  Talk 1: 從 Cilium 認識 cgroup ebpf - RuianCNTUG x SDN Meetup #33  Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
CNTUG x SDN Meetup #33 Talk 1: 從 Cilium 認識 cgroup ebpf - Ruian
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and Kubernetes
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
 
Flintrock: A Faster, Better spark-ec2 by Nicholas Chammas
Flintrock: A Faster, Better spark-ec2 by Nicholas ChammasFlintrock: A Faster, Better spark-ec2 by Nicholas Chammas
Flintrock: A Faster, Better spark-ec2 by Nicholas Chammas
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examples
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019
 

Similar a 2018 data engineering for ml asset management for features and models

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesPyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesJim Dowling
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsStavros Kontopoulos
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...Henry Saputra
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...Abhinav Joshi
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsGianmario Spacagna
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Akash Tandon
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 

Similar a 2018 data engineering for ml asset management for features and models (20)

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesPyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 

Último

VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 

Último (20)

VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 

2018 data engineering for ml asset management for features and models

  • 1. Asset Management for Machine Learning Georg Hildebrand
  • 3. The Big Picture: ML = Knowledge Workers Productivity x 50 ?? Peter Drucker source
  • 5. … the story of a model that was sent as email attachment ...
  • 6. … the story of features that were not available anymore ...
  • 7. *D. Sculley et al., “Hidden technical debt in machine learning systems,” in Advances in Neural Information Processing Systems, 2015, pp. 2503–2511. Pick a complexity for your ML product:
  • 8. source System of ML silos Productivity … Hidden depth of machine learning
  • 9. How to reach 50 x productivity ?
  • 10. … make models and features human and machine readable!
  • 11. Store context - How was the input data chosen? - Who trained the model? Is there a project paper? - How to run the model? - Store log output of model training - Framework used to build the model. - Chain model and feature preprocessing!
  • 12. Version and test - Snapshot input data /features - Version the model (eg. into S3 or a backend database) - Link the container artifact. - Provide input and output tests - Store performance metrics - …
  • 13. Put it into practice - There is no one stop shop solution :-/
  • 14. Manage ML Deployment pipeline: Example: SeldonIO, serving and integrating ML Purpose: End to end deployment for serving ML and more, very advanced. https://github.com/SeldonIO/ Trys use the modularised way! May require data to move around.
  • 15. Manage Models + Data and serve low latency: Example: Vespa, storing model together with the data Purpose: eg. Low latency serving of ML https://github.com/vespa-engine/vespa Container node Query Application Package Admin & Config Content node Deploy - Configuration - Components - ML models Scatter-gather Core sharding models models models Moves models to data!
  • 16. Manage Project, Code, Data and more: Purpose: Git for data scientists - manage your code and data together Example DVC (Data Version Control): https://github.com/iterative/dvc Input and output are tracked --> DAG !!! $ dvc run python code/xml_to_tsv.py data/Posts.xml data/Posts.tsv python * source, why luigi and airflow might not be enough...
  • 17. Model tracking, snapshotting and reproducibility: Purpose: Git for data scientists - turn any repository into a trackable task record with reusable environments and metrics logging. Example Datmo: https://github.com/datmo/datmo OS, but tight to paid service!!!
  • 18. PipelineAI: https://github.com/PipelineAI/pipeline Reproducible Model Pipelines + serving: Purpose: Consistent, Immutable, Reproducible Model Runtimes Example PipelineAI: https://github.com/PipelineAI/pipeline Models are stored in docker containers!!!
  • 19. Open Source solutions for model / workflow management Open Source Project K8s 1st class support # Active Maintainers # Commits 1st Commit Comp Seldon.IO Yes + Kubeflow 3 1131 2017-Feb #6 H2o driverless-ai Yes + Kubeflow 14 22967 2014-Mar #12 PipelineAI Yes. No Kubeflow 1 4846 2015-Jul #273 Polyaxon Yes. No Kubeflow 1 2732 2017-May #88 Vespa-engine No, but possilbe 14 18290 2016-Jun #5854 IBM/FfDL Yes. No Kubeflow 4 bulk-init hidden #77 RiseML Yes. No Kubeflow 3 95 2017-Oct #9 databricks/mlflow No. No Kubeflow 5 bulk-init hidden #58 datmo No, possible 4 bulk-init 2018-04 #213
  • 20. Data / Feature Management Open Source Project Note # Active Maintainers # Commits 1st Commit Comp Data Version Control (DVC) project and data management 2 1364 2017-Mar #readme pachyderm Data Pipeline, management 6 11652 2014-09 tbd quiltdata/quilt Data and package management 8 1177 2017-01 tbd
  • 21. Questions? Georg Hildebrand, zalando SE https://github.com/georghildebrand https://www.linkedin.com/in/georghildebrand/
  • 23. ML Journey Idea Explore Combine and Extract Data Analyse and Preprocess Model and train Test and Monitor Share and Improve
  • 24. There is even more hidden depth: ● Models erode boundaries of modularisation: context and preprocessing dependencies ● Data Dependencies Cost More than Code Dependencies (unstable data dependencies, data quality monitoring etc.) ● Complex hardware requirements (GPUs on kubernetes??) ● Feedback Loops (model influences its feature data) ● ML-System Anti-Patterns (glue code, pipeline jungles …) * Own experience ** D. Sculley et al., “Hidden technical debt in machine learning systems,” in Advances in Neural Information Processing Systems, 2015, pp. 2503–2511. *** D. Sculley et al., “Machine Learning: The High-Interest Credit Card of Technical Debt,” p. 9.