SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Asset Management for
Machine Learning
Georg Hildebrand
DATA AI
$$$
???
The Big Picture:
ML
=
Knowledge Workers Productivity
x 50 ??
Peter Drucker source
Increase of
Performance
Acceleration through
automatisation The area of knowledge workers
… the story of a model that was sent
as email attachment ...
… the story of features that were not
available anymore ...
*D. Sculley et al., “Hidden technical debt in machine learning
systems,” in Advances in Neural Information Processing
Systems, 2015, pp. 2503–2511.
Pick a complexity for your ML product:
source
System of ML silos
Productivity …
Hidden depth of
machine learning
How to reach
50 x productivity
?
… make models and features human
and machine readable!
Store context
- How was the input data
chosen?
- Who trained the model? Is
there a project paper?
- How to run the model?
- Store log output of model
training
- Framework used to build the
model.
- Chain model and feature
preprocessing!
Version and
test
- Snapshot input data /features
- Version the model (eg. into S3
or a backend database)
- Link the container artifact.
- Provide input and output
tests
- Store performance metrics
- …
Put it into practice
-
There is no one stop shop solution :-/
Manage ML Deployment pipeline:
Example: SeldonIO, serving and integrating ML
Purpose: End to end deployment for serving ML and more, very advanced.
https://github.com/SeldonIO/
Trys use the
modularised way!
May require data to
move around.
Manage Models + Data and serve low latency:
Example: Vespa, storing model together with the data
Purpose: eg. Low latency serving of ML
https://github.com/vespa-engine/vespa
Container node
Query
Application
Package
Admin &
Config
Content node
Deploy
- Configuration
- Components
- ML models
Scatter-gather
Core
sharding
models models models
Moves models to
data!
Manage Project, Code, Data and more:
Purpose: Git for data scientists - manage your code and data together
Example DVC (Data Version Control):
https://github.com/iterative/dvc
Input and output are
tracked --> DAG !!!
$ dvc run python code/xml_to_tsv.py
data/Posts.xml data/Posts.tsv python
* source, why luigi and airflow might not be
enough...
Model tracking, snapshotting and reproducibility:
Purpose: Git for data scientists - turn any repository into a trackable task record with reusable
environments and metrics logging.
Example Datmo:
https://github.com/datmo/datmo
OS, but tight to paid
service!!!
PipelineAI:
https://github.com/PipelineAI/pipeline
Reproducible Model Pipelines + serving:
Purpose: Consistent, Immutable, Reproducible Model Runtimes
Example PipelineAI:
https://github.com/PipelineAI/pipeline
Models are stored in
docker containers!!!
Open Source solutions
for model / workflow
management
Open Source
Project
K8s 1st class
support
# Active
Maintainers
# Commits 1st Commit Comp
Seldon.IO Yes + Kubeflow 3 1131 2017-Feb #6
H2o driverless-ai Yes + Kubeflow 14 22967 2014-Mar #12
PipelineAI Yes. No Kubeflow 1 4846 2015-Jul #273
Polyaxon Yes. No Kubeflow 1 2732 2017-May #88
Vespa-engine No, but possilbe 14 18290 2016-Jun #5854
IBM/FfDL Yes. No Kubeflow 4 bulk-init hidden #77
RiseML Yes. No Kubeflow 3 95 2017-Oct #9
databricks/mlflow No. No Kubeflow 5 bulk-init hidden #58
datmo No, possible 4 bulk-init 2018-04 #213
Data / Feature
Management
Open Source
Project
Note # Active
Maintainers
# Commits 1st Commit Comp
Data Version Control
(DVC)
project and
data
management
2 1364 2017-Mar #readme
pachyderm Data Pipeline,
management
6 11652 2014-09 tbd
quiltdata/quilt Data and
package
management
8 1177 2017-01 tbd
Questions?
Georg Hildebrand, zalando SE
https://github.com/georghildebrand
https://www.linkedin.com/in/georghildebrand/
Example Components
ML Journey
Idea
Explore
Combine and
Extract Data
Analyse and
Preprocess
Model and
train
Test and Monitor
Share and
Improve
There is even more hidden depth:
● Models erode boundaries of modularisation:
context and preprocessing dependencies
● Data Dependencies Cost More than Code Dependencies (unstable data
dependencies, data quality monitoring etc.)
● Complex hardware requirements (GPUs on kubernetes??)
● Feedback Loops (model influences its feature data)
● ML-System Anti-Patterns (glue code, pipeline jungles …)
* Own experience
** D. Sculley et al., “Hidden technical debt in machine learning
systems,” in Advances in Neural Information Processing Systems,
2015, pp. 2503–2511.
*** D. Sculley et al., “Machine Learning: The High-Interest Credit Card
of Technical Debt,” p. 9.

Más contenido relacionado

La actualidad más candente

Simulation with Python and MATLAB® in Capella
Simulation with Python and MATLAB® in CapellaSimulation with Python and MATLAB® in Capella
Simulation with Python and MATLAB® in CapellaObeo
 
Artificial Neural Networks 1
Artificial Neural Networks 1Artificial Neural Networks 1
Artificial Neural Networks 1swapnac12
 
Automation 2018
Automation  2018Automation  2018
Automation 2018vcdas
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AINing Jiang
 
Advanced vibrations
Advanced vibrationsAdvanced vibrations
Advanced vibrationsSpringer
 
Neural networks and fuzzy logic
Neural networks and fuzzy logicNeural networks and fuzzy logic
Neural networks and fuzzy logicSaurav Prasad
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 
Industrial Automation in India
Industrial Automation in IndiaIndustrial Automation in India
Industrial Automation in IndiaAnand Prithviraj
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Seonho Park
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 
Multi objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions resultMulti objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions resultPiyush Agarwal
 
TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'
TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'
TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'Seldon
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning健程 杨
 
Lecture1 introduction to machine learning
Lecture1 introduction to machine learningLecture1 introduction to machine learning
Lecture1 introduction to machine learningUmmeSalmaM1
 

La actualidad más candente (20)

Simulation with Python and MATLAB® in Capella
Simulation with Python and MATLAB® in CapellaSimulation with Python and MATLAB® in Capella
Simulation with Python and MATLAB® in Capella
 
Basics of Soft Computing
Basics of Soft  Computing Basics of Soft  Computing
Basics of Soft Computing
 
Artificial Neural Networks 1
Artificial Neural Networks 1Artificial Neural Networks 1
Artificial Neural Networks 1
 
Fuzzy expert system
Fuzzy expert systemFuzzy expert system
Fuzzy expert system
 
Automation 2018
Automation  2018Automation  2018
Automation 2018
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
 
Advanced vibrations
Advanced vibrationsAdvanced vibrations
Advanced vibrations
 
Neural networks and fuzzy logic
Neural networks and fuzzy logicNeural networks and fuzzy logic
Neural networks and fuzzy logic
 
Backpropagation algo
Backpropagation  algoBackpropagation  algo
Backpropagation algo
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
Industrial Automation in India
Industrial Automation in IndiaIndustrial Automation in India
Industrial Automation in India
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
Graphics standards
Graphics standardsGraphics standards
Graphics standards
 
Multi objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions resultMulti objective optimization and Benchmark functions result
Multi objective optimization and Benchmark functions result
 
Fuzzy logic control system
Fuzzy logic control systemFuzzy logic control system
Fuzzy logic control system
 
TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'
TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'
TensorFlow London 12: Oliver Gindele 'Recommender systems in Tensorflow'
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
Lecture1 introduction to machine learning
Lecture1 introduction to machine learningLecture1 introduction to machine learning
Lecture1 introduction to machine learning
 

Similar a 2018 data engineering for ml asset management for features and models

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesPyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesJim Dowling
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsStavros Kontopoulos
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...Henry Saputra
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...Abhinav Joshi
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsGianmario Spacagna
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Akash Tandon
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 

Similar a 2018 data engineering for ml asset management for features and models (20)

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML PipelinesPyData Meetup - Feature Store for Hopsworks and ML Pipelines
PyData Meetup - Feature Store for Hopsworks and ML Pipelines
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 

Último

litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdfAlexander Litvinenko
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)NareenAsad
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..MaherOthman7
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxRashidFaridChishti
 
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...Nitin Sonavane
 
AI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfAI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfmahaffeycheryld
 
Theory for How to calculation capacitor bank
Theory for How to calculation capacitor bankTheory for How to calculation capacitor bank
Theory for How to calculation capacitor banktawat puangthong
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdfKamal Acharya
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Prakhyath Rai
 
Piping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfPiping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfAshrafRagab14
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxKarpagam Institute of Teechnology
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxCHAIRMAN M
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.MdManikurRahman
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfJNTUA
 
Interfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfInterfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfragupathi90
 
Introduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoIntroduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoAbhimanyu Sangale
 
Microkernel in Operating System | Operating System
Microkernel in Operating System | Operating SystemMicrokernel in Operating System | Operating System
Microkernel in Operating System | Operating SystemSampad Kar
 
Intelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsIntelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsSheetal Jain
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxalijaker017
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdfKamal Acharya
 

Último (20)

litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdflitvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
litvinenko_Henry_Intrusion_Hong-Kong_2024.pdf
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)
 
Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..Maher Othman Interior Design Portfolio..
Maher Othman Interior Design Portfolio..
 
Lab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docxLab Manual Arduino UNO Microcontrollar.docx
Lab Manual Arduino UNO Microcontrollar.docx
 
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
Module-III Varried Flow.pptx GVF Definition, Water Surface Profile Dynamic Eq...
 
AI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdfAI in Healthcare Innovative use cases and applications.pdf
AI in Healthcare Innovative use cases and applications.pdf
 
Theory for How to calculation capacitor bank
Theory for How to calculation capacitor bankTheory for How to calculation capacitor bank
Theory for How to calculation capacitor bank
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
Piping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfPiping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdf
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptx
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 
Interfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdfInterfacing Analog to Digital Data Converters ee3404.pdf
Interfacing Analog to Digital Data Converters ee3404.pdf
 
Introduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoIntroduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of Arduino
 
Microkernel in Operating System | Operating System
Microkernel in Operating System | Operating SystemMicrokernel in Operating System | Operating System
Microkernel in Operating System | Operating System
 
Intelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsIntelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent Acts
 
Multivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptxMultivibrator and its types defination and usges.pptx
Multivibrator and its types defination and usges.pptx
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdf
 

2018 data engineering for ml asset management for features and models

  • 1. Asset Management for Machine Learning Georg Hildebrand
  • 3. The Big Picture: ML = Knowledge Workers Productivity x 50 ?? Peter Drucker source
  • 5. … the story of a model that was sent as email attachment ...
  • 6. … the story of features that were not available anymore ...
  • 7. *D. Sculley et al., “Hidden technical debt in machine learning systems,” in Advances in Neural Information Processing Systems, 2015, pp. 2503–2511. Pick a complexity for your ML product:
  • 8. source System of ML silos Productivity … Hidden depth of machine learning
  • 9. How to reach 50 x productivity ?
  • 10. … make models and features human and machine readable!
  • 11. Store context - How was the input data chosen? - Who trained the model? Is there a project paper? - How to run the model? - Store log output of model training - Framework used to build the model. - Chain model and feature preprocessing!
  • 12. Version and test - Snapshot input data /features - Version the model (eg. into S3 or a backend database) - Link the container artifact. - Provide input and output tests - Store performance metrics - …
  • 13. Put it into practice - There is no one stop shop solution :-/
  • 14. Manage ML Deployment pipeline: Example: SeldonIO, serving and integrating ML Purpose: End to end deployment for serving ML and more, very advanced. https://github.com/SeldonIO/ Trys use the modularised way! May require data to move around.
  • 15. Manage Models + Data and serve low latency: Example: Vespa, storing model together with the data Purpose: eg. Low latency serving of ML https://github.com/vespa-engine/vespa Container node Query Application Package Admin & Config Content node Deploy - Configuration - Components - ML models Scatter-gather Core sharding models models models Moves models to data!
  • 16. Manage Project, Code, Data and more: Purpose: Git for data scientists - manage your code and data together Example DVC (Data Version Control): https://github.com/iterative/dvc Input and output are tracked --> DAG !!! $ dvc run python code/xml_to_tsv.py data/Posts.xml data/Posts.tsv python * source, why luigi and airflow might not be enough...
  • 17. Model tracking, snapshotting and reproducibility: Purpose: Git for data scientists - turn any repository into a trackable task record with reusable environments and metrics logging. Example Datmo: https://github.com/datmo/datmo OS, but tight to paid service!!!
  • 18. PipelineAI: https://github.com/PipelineAI/pipeline Reproducible Model Pipelines + serving: Purpose: Consistent, Immutable, Reproducible Model Runtimes Example PipelineAI: https://github.com/PipelineAI/pipeline Models are stored in docker containers!!!
  • 19. Open Source solutions for model / workflow management Open Source Project K8s 1st class support # Active Maintainers # Commits 1st Commit Comp Seldon.IO Yes + Kubeflow 3 1131 2017-Feb #6 H2o driverless-ai Yes + Kubeflow 14 22967 2014-Mar #12 PipelineAI Yes. No Kubeflow 1 4846 2015-Jul #273 Polyaxon Yes. No Kubeflow 1 2732 2017-May #88 Vespa-engine No, but possilbe 14 18290 2016-Jun #5854 IBM/FfDL Yes. No Kubeflow 4 bulk-init hidden #77 RiseML Yes. No Kubeflow 3 95 2017-Oct #9 databricks/mlflow No. No Kubeflow 5 bulk-init hidden #58 datmo No, possible 4 bulk-init 2018-04 #213
  • 20. Data / Feature Management Open Source Project Note # Active Maintainers # Commits 1st Commit Comp Data Version Control (DVC) project and data management 2 1364 2017-Mar #readme pachyderm Data Pipeline, management 6 11652 2014-09 tbd quiltdata/quilt Data and package management 8 1177 2017-01 tbd
  • 21. Questions? Georg Hildebrand, zalando SE https://github.com/georghildebrand https://www.linkedin.com/in/georghildebrand/
  • 23. ML Journey Idea Explore Combine and Extract Data Analyse and Preprocess Model and train Test and Monitor Share and Improve
  • 24. There is even more hidden depth: ● Models erode boundaries of modularisation: context and preprocessing dependencies ● Data Dependencies Cost More than Code Dependencies (unstable data dependencies, data quality monitoring etc.) ● Complex hardware requirements (GPUs on kubernetes??) ● Feedback Loops (model influences its feature data) ● ML-System Anti-Patterns (glue code, pipeline jungles …) * Own experience ** D. Sculley et al., “Hidden technical debt in machine learning systems,” in Advances in Neural Information Processing Systems, 2015, pp. 2503–2511. *** D. Sculley et al., “Machine Learning: The High-Interest Credit Card of Technical Debt,” p. 9.