10/29/2016 Data Science Camp, Santa Clara
Managing and Versioning Machine Learning Models in Python
Simon Frid github.com/fridiculous
Session Overview
1. Motivation
   1. Image Recognition Use Case
   2. Ad Conversion Use Case
   3. Fraud Prediction Use Case
2. Strategies and Design Considerations
   1. Data Science Workflow
   2. What Can We Learn from Software Version Control
3. Python Tools
4. Solutions
   1. Estimators and Django-Estimators
   2. Demo
Disclaimer
Use Case 1:
Car Rental Marketplace
Identifying Cars/Inventory with Image Recognition
How do we Iterate?
✤ Clarify features: improve photo attributes, e.g. with edge detection.
✤ Human in the loop!
✤ Add computational power & GPUs
Use Case: Image Recognition
✤ Lots of models.
✤ Time to Develop. Time to Deploy.
✤ How do we reference these models? Which one do we choose for production?
Use Case 2:
Selling Student Loans
Predicting Conversion Rate on Ads
Frequent Training
✤ Yearly Seasonality
✤ Irregular Monthly Effects
✤ Current Activity of the User's Demographic Matters
✤ A/B Testing and Multi-Armed Bandits
Selling Student Loans
✤ Lots of Models
✤ Lots of Trained Versions
✤ Lots of Data "Slicing" Options
✤ How do we reference models at training time? How do we reference models for A/B testing?
Use Case 3:
Payment Gateway
Predicting Fraudulent Transactions
Fraud Patterns Change over Time
✤ A game of Cat and Mouse
Predicting Fraud
✤ Sudden Change in Signature Signal
✤ Forensic Analysis of Obsolete Models
✤ Time Relevance of the Features
✤ How do we …
“There are practical little things in housekeeping which no man really understands.”
Concept in Software Version Control: Definition and Technology Needed
✤ Repository
   Definition: The repository is where files' current and historical data are stored, often on a server. Sometimes also called a depot.
   Technology needed: Persistence & Serialization
✤ Versioning
   Definition: The process of assigning either unique version names or unique version numbers to unique states of computer software.
   Technology needed: Indexing & Hashing
✤ Commits, Tags and Labels
   Definition: A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number.
   Technology needed: Attributes & Tags
✤ Push, Pull and Checkout
   Definition: A checkout creates a working copy from a repository. A push sends a copy of one repository to another repository; a pull retrieves a copy of a target repository.
   Technology needed: An API to persist and retrieve
✤ Diff
   Definition: A diff represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.
   Technology needed: 😃
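For ML models, one plausible reading of "diff" is a comparison of the hyperparameters recorded for two versions of a model. A minimal sketch, assuming each version's settings are available as a flat dict (for example, what scikit-learn's `get_params()` returns for a simple estimator); `diff_params` is a hypothetical helper, not part of any library, and the parameter values are made up for illustration.

```python
from typing import Any, Dict, Tuple


def diff_params(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Compare two flat parameter dicts, in the spirit of a line-level diff.

    Returns the parameters that were added, removed, or changed between versions.
    """
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed: Dict[str, Tuple[Any, Any]] = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical hyperparameters for two versions of the same model.
v1 = {"n_estimators": 100, "max_depth": 5, "learning_rate": 0.1}
v2 = {"n_estimators": 300, "max_depth": 5, "subsample": 0.8}
delta = diff_params(v1, v2)
```

Nested parameters (e.g. an estimator object inside a pipeline) would need flattening first; comparing such objects with `!=` usually falls back to identity.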
Algorithm Options
✤ scikit-learn
✤ MILK
✤ Statsmodels
✤ pylearn2
✤ nolearn
✤ nuPIC
✤ Nilearn
✤ gensim
✤ NLTK
✤ spacy
✤ scikit-image
✤ autolearn
✤ TPOT
✤ crab
✤ XGBoost
✤ pydeap
✤ pgmpy
✤ caffe
✤ tensorflow
✤ keras
✤ gym
Persistence Layer Options
✤ s3 - e.g. s3://bucket/project/model.pkl
✤ Git LFS
✤ Elasticsearch and Document-based Stores
✤ Docker
✤ Pachyderm
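The `s3://bucket/project/model.pkl` example suggests a simple key layout of project, model name, and version. Below is a minimal sketch of that layout on the local filesystem, assuming pickled objects stored as `<root>/<project>/<name>/<version>.pkl`; `LocalModelStore` is a hypothetical class, and an S3-backed variant would swap the file reads and writes for object-store calls.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, List


class LocalModelStore:
    """Hypothetical repository: <root>/<project>/<name>/<version>.pkl"""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, project: str, name: str, version: str) -> Path:
        return self.root / project / name / f"{version}.pkl"

    def put(self, obj: Any, project: str, name: str, version: str) -> Path:
        """Serialize obj under its project/name/version key and return the path."""
        path = self._path(project, name, version)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(obj, fh)
        return path

    def get(self, project: str, name: str, version: str) -> Any:
        """Load a previously stored version (only unpickle trusted files)."""
        with self._path(project, name, version).open("rb") as fh:
            return pickle.load(fh)

    def versions(self, project: str, name: str) -> List[str]:
        """Stored versions in lexical order, so '1.10.0' sorts before '1.2.0'."""
        folder = self.root / project / name
        return sorted(p.stem for p in folder.glob("*.pkl"))


# Usage: round-trip a toy "model" through a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    store = LocalModelStore(tmp)
    store.put({"coef": [0.5, 1.5]}, "ads", "conversion", "1.0.0")
    store.put({"coef": [0.4, 1.7]}, "ads", "conversion", "1.1.0")
    restored = store.get("ads", "conversion", "1.0.0")
    available = store.versions("ads", "conversion")
```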
Serialization Options
✤ cPickle (py2) and pickle (py3)
✤ joblib (at the time bundled with scikit-learn as sklearn.externals.joblib)
✤ dill, cloudpickle and picklable-itertools
✤ PMML via jpmml-sklearn
✤ and what about transformer pipelines?
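The standard-library `pickle` stores module-level functions and classes by reference (their importable name), which is one reason libraries such as dill and cloudpickle exist: plain pickle cannot serialize a lambda used as a pipeline step. A minimal sketch of that failure mode, using a hypothetical `ToyPipeline` as a stand-in for a real transformer pipeline; it does not show dill's or cloudpickle's API, and it assumes the code runs as a script so that `__main__` lookups succeed.

```python
import pickle


def scale(values):
    """Module-level function: pickle stores it by reference (module + name)."""
    return [v / 10 for v in values]


class ToyPipeline:
    """Hypothetical stand-in for a transformer pipeline: steps applied in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def transform(self, values):
        for step in self.steps:
            values = step(values)
        return values


def can_pickle(obj) -> bool:
    """True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


named = ToyPipeline([scale])
anonymous = ToyPipeline([lambda values: [v * 2 for v in values]])

# A pipeline built from named functions survives a round trip.
restored = pickle.loads(pickle.dumps(named))
```

Because functions are pickled by name, a restored pipeline also depends on the same code being importable at load time, which is part of why versioning the code matters alongside versioning the serialized model.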
Indexing & Hashing
✤ Hashing the model
✤ Hashing the data
✤ Relational Database Table for Look Up
✤ Key Value Stores like Redis, Dynamo
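Hashing the model, hashing the data, and a relational lookup table combine naturally. Below is a minimal sketch using only the standard library (`hashlib`, `pickle`, and an in-memory `sqlite3` database), assuming a model's ID is the SHA-256 of its pickled bytes; the table layout and the `register`, `lookup`, and `models_trained_on` helpers are hypothetical.

```python
import hashlib
import pickle
import sqlite3


def content_hash(obj) -> str:
    """SHA-256 of the pickled bytes.

    Caveat: pickle output can differ across Python/library versions (and for sets
    of strings across processes), so this identifies the exact bytes stored rather
    than acting as a stable semantic fingerprint.
    """
    return hashlib.sha256(pickle.dumps(obj, protocol=4)).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE model_index (
        model_hash TEXT NOT NULL,
        data_hash  TEXT NOT NULL,
        blob       BLOB NOT NULL,
        PRIMARY KEY (model_hash, data_hash))"""
)


def register(conn, model, training_data) -> str:
    """Store the model once per (model, data) pair and return the model hash."""
    blob = pickle.dumps(model, protocol=4)
    model_hash = hashlib.sha256(blob).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO model_index VALUES (?, ?, ?)",
        (model_hash, content_hash(training_data), blob),
    )
    return model_hash


def lookup(conn, model_hash):
    """Return the deserialized model for a hash, or None if unknown."""
    row = conn.execute(
        "SELECT blob FROM model_index WHERE model_hash = ? LIMIT 1", (model_hash,)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])


def models_trained_on(conn, training_data):
    """All model hashes registered against this exact training data."""
    rows = conn.execute(
        "SELECT model_hash FROM model_index WHERE data_hash = ?",
        (content_hash(training_data),),
    )
    return [r[0] for r in rows]


# Usage with toy values: registering identical content twice yields one row.
data = [[1.0, 2.0], [3.0, 4.0]]
first = register(conn, {"coef": [0.5, 1.5]}, data)
second = register(conn, {"coef": [0.5, 1.5]}, data)
```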
Labels
✤ Semantic Versioning, Major.Minor.Patch
✤ Tags (django-taggit)
✤ Storing metadata: create_dates, relationships between models
✤ Notes and learnings (from humans in the loop)
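Semantic versioning, tags, and metadata can be sketched in a few lines. One possible convention for ML models, offered here as an assumption rather than a standard: bump MAJOR when the input schema or feature set changes incompatibly, MINOR for a new algorithm or hyperparameters, and PATCH for a retrain on fresh data. `ModelRecord` and `tagged` are hypothetical; parsing versions into integers avoids the lexical-sort trap where "1.10.0" precedes "1.2.0".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set, Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers (no pre-release/build suffixes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Increment one component and reset the lower ones, as SemVer prescribes."""
    major, minor, patch = parse_semver(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


@dataclass
class ModelRecord:
    """Hypothetical metadata record: version label, free-form tags, and notes."""
    name: str
    version: str
    tags: Set[str] = field(default_factory=set)
    notes: List[str] = field(default_factory=list)
    create_date: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def tagged(records: List[ModelRecord], tag: str) -> List[ModelRecord]:
    """Records carrying `tag`, ordered numerically by version."""
    hits = [r for r in records if tag in r.tags]
    return sorted(hits, key=lambda r: parse_semver(r.version))


# Hypothetical records for one model name.
records = [
    ModelRecord("conversion", "1.10.0", {"production"}),
    ModelRecord("conversion", "1.2.0", {"production", "ab-test-b"}),
    ModelRecord("conversion", "1.9.3", {"retired"}, ["replaced by a newer retrain"]),
]
```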
API… components…
✤ Custom, using an ORM/DAL such as Django or SQLAlchemy
✤ SaaS & PaaS - Turi, ScienceOps, PredictionIO, Azure ML
✤ Asynchronous Tasks - Airflow, Luigi, Celery
✤ Flows using Docker and Pachyderm
Estimators
✤ A standalone client that acts as an API for your ML repository
✤ Current focus: "to persist upon prediction"
✤ Uses SQLAlchemy and the local filesystem (for now)
✤ github.com/fridiculous/estimators
✤ pip install estimators
(pre-alpha development version)
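To illustrate the idea of persisting upon prediction, here is a plain-Python sketch: a wrapper that records which model version and which inputs produced each prediction, the kind of trail the fraud use case's forensic analysis of obsolete models would need. This is not the estimators package's API (see its repository for that); `PersistOnPredict`, `ThresholdModel`, and the audit-record fields are hypothetical, and a real system would write to a database rather than a Python list.

```python
import hashlib
import pickle
from datetime import datetime, timezone


def _sha256(obj) -> str:
    """Hash of the pickled bytes, used as an ID for models and inputs."""
    return hashlib.sha256(pickle.dumps(obj, protocol=4)).hexdigest()


class PersistOnPredict:
    """Hypothetical wrapper: delegate predict() and append an audit record."""

    def __init__(self, model, log):
        self.model = model
        # Computed once; assumes the wrapped model is not mutated afterwards.
        self.model_hash = _sha256(model)
        self.log = log

    def predict(self, X):
        y = self.model.predict(X)
        self.log.append({
            "model_hash": self.model_hash,
            "input_hash": _sha256(X),
            "output": y,
            "predicted_at": datetime.now(timezone.utc).isoformat(),
        })
        return y


class ThresholdModel:
    """Toy model: predicts 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]


audit_log = []
wrapped = PersistOnPredict(ThresholdModel(0.5), audit_log)
preds = wrapped.predict([[0.2], [0.9]])
```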
Django-Estimators
✤ A Django extension for ML models
✤ Current focus: "to persist each object"
✤ Uses Django and the local filesystem (for now)
✤ github.com/fridiculous/django-estimators
✤ pip install django-estimators
(pre-alpha development version)
Demo
Fin.

Managing and Versioning Machine Learning Models in Python

  • 1. 10/29/2016 Data Science Camp, Santa Clara Managing and Versioning Machine Learning Models in Python Simon Frid github.com/fridiculous
  • 2. Session Overview 1. Motivation 1. Image Recognition Use Case 2. Ad Conversion Use Case 3. Fraud Prediction Use Case 2. Strategies and Design Considerations 1. Data Science Workflow 2. What Can We Learn from Software Version Control 3. Python Tools 4. Solutions 1. Estimators and Django-Estimators 2. Demo
  • 3. Session Overview 1. Motivation 1. Image Recognition Use Case 2. Ad Conversion Use Case 3. Fraud Prediction Use Case 2. Strategies and Design Considerations 1. Data Science Workflow 2. What Can We Learn from Software Version Control 3. Python Tools 4. Solutions 1. Estimators and Django-Estimators 2. Demo
  • 4.
  • 6. Use Case 1: Car Rental Marketplace Identifying Cars/Inventory with Image Recognition
  • 7. How do we Iterate? ✤ Help clarify features. Improve photo attributes e.g. edge detection. ✤ Human in the loop! ✤ Add computational power & GPUs
  • 8. Use Case: Image Recognition ✤ Lots of models. ✤ Time to Develop. Time to Deploy. ✤ How do we reference these models? Which one do we choose for production?
  • 9. Use Case 2: Selling Student Loans Predicting Conversion Rate on Ads
  • 10. Frequent Training ✤ Yearly Seasonality ✤ Irregular Monthly Effects ✤ Current Activity of the User’s Demographics Matters ✤ A/B testing and Multi-Armed Bandits
  • 11. Selling Student Loans ✤ Lots of Models ✤ Lot of Trained Versions ✤ Lots of Data “Slicing” Options ✤ How do we Reference Models at training time? How do Reference Models for A/B testing?
  • 12. Use Case 3: Payment Gateway Predicting Fraudulent Transactions
  • 13. Fraud Patterns Change over Time ✤ A game of Cat and Mouse
  • 14. Predicting Fraud ✤ Sudden change in Signature Signal ✤ Forensic Analysis of Obsolete Models ✤ Time Relevance of the Features ✤ How do we …
  • 15.
  • 16. “There are practical little things in housekeeping which no man really understands.”
  • 17. Session Overview 1. Motivation 1. Image Recognition Use Case 2. Ad Conversion Use Case 3. Fraud Prediction Use Case 2. Strategies and Considerations 1. Data Science Workflow 2. What Can We Learn from Software Version Control 3. Python Tools 4. Solutions 1. Estimators and Django-Estimators 2. Demo
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 30. Concept in Software Version Control | Definition | Technology Needed
    ✤ Repository | The repository is where files' current and historical data are stored, often on a server. Sometimes also called a depot. | Persistence & Serialization
    ✤ Versioning | The process of assigning unique version names or unique version numbers to unique states of computer software. | Indexing & Hashing
    ✤ Commits, Tags and Labels | A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number. | Attributes & Tags
    ✤ Push, Pull and Checkout | A checkout creates a working copy from a repository. A push sends a copy of one repository to another repository; a pull retrieves a copy of a target repository. | API to persist and retrieve
    ✤ Diff | A diff represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems. | 😃
  • 32. Algorithm Options ✤ scikit-learn ✤ MILK ✤ Statsmodels ✤ pylearn2 ✤ nolearn ✤ nuPIC ✤ Nilearn ✤ gensim ✤ NLTK ✤ spacy ✤ scikit-image ✤ autolearn ✤ TPOT ✤ crab ✤ XGBoost ✤ pydeap ✤ pgmpy ✤ caffe ✤ tensorflow ✤ keras ✤ gym
  • 33. Persistence Layer Options ✤ S3, e.g. s3://bucket/project/model.pkl ✤ Git LFS ✤ Elasticsearch and Document-based Stores ✤ Docker ✤ Pachyderm
  • 34. Serialization Options ✤ cPickle (Python 2) and pickle (Python 3) ✤ joblib (sklearn.externals.joblib) ✤ dill, cloudpickle and picklable-itertools ✤ PMML via jpmml-sklearn ✤ and what about transformer pipelines?
  • 35. Indexing & Hashing ✤ Hashing the model ✤ Hashing the data ✤ Relational Database Table for Look Up ✤ Key Value Stores like Redis, Dynamo
  • 36. Labels ✤ Semantic Versioning, Major.Minor.Patch ✤ Tags (django-taggit) ✤ Storing Metadata: create_dates, relationships between models ✤ Notes and learnings (from humans in the loop)
  • 37. API… components… ✤ Custom, using an ORM/DAL like Django or SQLAlchemy ✤ SaaS & PaaS: Turi, ScienceOps, PredictionIO, Azure ML ✤ Asynchronous Tasks: Airflow, Luigi, Celery ✤ Flows using Docker and Pachyderm
  • 40. Estimators ✤ a standalone client as an API for your ML repo ✤ current focus: “to persist upon prediction” ✤ uses SQLAlchemy and the local filesystem (for now) ✤ github.com/fridiculous/estimators ✤ pip install estimators (pre-alpha development version)
  • 41. Django-Estimators ✤ a Django extension for ML models ✤ current focus: “to persist each object” ✤ uses Django and the local filesystem (for now) ✤ github.com/fridiculous/django-estimators ✤ pip install django-estimators (pre-alpha development version)
  • 42. Demo
  • 43. Fin.
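Slide 30 maps each version-control concept to the technology it needs. As a minimal sketch of how those pieces could fit together for models, the hypothetical `ModelRepository` class below (not part of any library named in the talk) keeps everything in memory: `commit` serializes with pickle and uses a SHA-256 prefix of the bytes as the version id, `tag` attaches a label, `checkout` returns a working copy, and `diff` compares the recorded hyperparameters, which is only one of several things a "model diff" could mean. Push and pull are left out because there is no remote, and a plain dict stands in for a fitted estimator.

```python
import hashlib
import pickle


class ModelRepository:
    """Hypothetical in-memory model repository (illustration only)."""

    def __init__(self):
        self._blobs = {}   # version id -> pickled model bytes (the "repository")
        self._params = {}  # version id -> hyperparameters recorded at commit time
        self._tags = {}    # tag name -> version id ("tags and labels")

    def _resolve(self, ref):
        # Accept either a tag name or a raw version id.
        return self._tags.get(ref, ref)

    def commit(self, model, params):
        """Serialize the model; a SHA-256 prefix of its bytes becomes the version id."""
        blob = pickle.dumps(model)
        version = hashlib.sha256(blob).hexdigest()[:12]
        self._blobs[version] = blob
        self._params[version] = dict(params)
        return version

    def tag(self, name, version):
        """Attach a human-friendly label to an existing version."""
        if version not in self._blobs:
            raise KeyError(f"unknown version: {version}")
        self._tags[name] = version

    def checkout(self, ref):
        """Return a fresh working copy of the model stored under a tag or version id."""
        return pickle.loads(self._blobs[self._resolve(ref)])

    def diff(self, ref_a, ref_b):
        """Return {param: (value_a, value_b)} for hyperparameters that differ."""
        a = self._params[self._resolve(ref_a)]
        b = self._params[self._resolve(ref_b)]
        keys = sorted(a.keys() | b.keys())
        return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


repo = ModelRepository()
v1 = repo.commit({"coef": [0.5, 1.2]}, {"alpha": 0.1, "max_iter": 100})
v2 = repo.commit({"coef": [0.4, 1.3]}, {"alpha": 0.01, "max_iter": 100})
repo.tag("production", v1)
print(repo.diff("production", v2))  # {'alpha': (0.1, 0.01)}
```

Hashing the pickled bytes gives a convenient id, but pickle output is not guaranteed to be identical across Python or library versions, so a longer-lived setup would likely store the hash at commit time rather than recompute it later.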
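Slide 34 ends with the question of transformer pipelines. One common answer is to serialize the preprocessing steps and the model together as a single object, so a restored model can never be paired with the wrong scaler. The sketch below uses tiny hypothetical stand-in classes (`MeanScaler`, `ThresholdModel`, `TwoStepPipeline`) and the standard-library pickle module; with scikit-learn, the analogous move would be pickling a fitted `sklearn.pipeline.Pipeline` or saving it with `joblib.dump`.

```python
import pickle


class MeanScaler:
    """Hypothetical preprocessing step: centers and scales using training statistics."""

    def fit(self, xs):
        self.mean_ = sum(xs) / len(xs)
        variance = sum((x - self.mean_) ** 2 for x in xs) / len(xs)
        self.scale_ = variance ** 0.5 or 1.0  # avoid dividing by zero on constant input
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.scale_ for x in xs]


class ThresholdModel:
    """Hypothetical model: predicts 1 when the scaled input exceeds a cutoff."""

    def __init__(self, cutoff=0.0):
        self.cutoff = cutoff

    def predict(self, xs):
        return [int(x > self.cutoff) for x in xs]


class TwoStepPipeline:
    """Bundles the fitted scaler with the model so both are serialized together."""

    def __init__(self, scaler, model):
        self.scaler = scaler
        self.model = model

    def predict(self, xs):
        return self.model.predict(self.scaler.transform(xs))


pipeline = TwoStepPipeline(MeanScaler().fit([1.0, 2.0, 3.0, 4.0]), ThresholdModel())
blob = pickle.dumps(pipeline)  # one artifact holds both fitted steps
restored = pickle.loads(blob)
print(restored.predict([1.5, 3.5]))  # [0, 1]
```

Plain pickle stores classes by reference, so these class definitions must be importable wherever the bytes are loaded; that limitation is one reason alternatives such as dill and cloudpickle appear on the same slide. Unpickling untrusted bytes can execute arbitrary code, so artifacts should only be loaded from a trusted store.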
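Slide 35 lists hashing the model, hashing the data, and a relational table for lookup. A minimal standard-library sketch of all three follows; the `models` table schema and the helper names are assumptions for illustration. Training rows are hashed through a canonical JSON encoding (sorted keys, fixed separators) so the same data always yields the same fingerprint regardless of dict key order, and an in-memory SQLite database stands in for whatever relational store a team actually uses. This only covers JSON-serializable rows; large tabular data would need a different hashing strategy.

```python
import hashlib
import json
import pickle
import sqlite3


def fingerprint_data(rows):
    """SHA-256 of a canonical JSON encoding, so dict key order does not matter."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprint_model(model):
    """SHA-256 of the pickled model bytes."""
    return hashlib.sha256(pickle.dumps(model)).hexdigest()


def open_registry():
    """Create a lookup table linking each model hash to the data it was trained on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE models ("
        " model_hash TEXT PRIMARY KEY,"
        " data_hash TEXT NOT NULL,"
        " name TEXT)"
    )
    return conn


def register(conn, name, model, rows):
    """Record a (model hash, data hash, name) row and return both hashes."""
    model_hash = fingerprint_model(model)
    data_hash = fingerprint_data(rows)
    # INSERT OR IGNORE: re-registering an identical model is a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO models (model_hash, data_hash, name) VALUES (?, ?, ?)",
        (model_hash, data_hash, name),
    )
    return model_hash, data_hash


conn = open_registry()
rows = [{"amount": 12.5, "label": 0}, {"amount": 980.0, "label": 1}]
model_hash, data_hash = register(conn, "fraud-v1", {"threshold": 500.0}, rows)
# Which models were trained on exactly this data?
print(conn.execute("SELECT name FROM models WHERE data_hash = ?", (data_hash,)).fetchall())
# [('fraud-v1',)]
```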
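Slide 36 suggests semantic versioning (Major.Minor.Patch) for models. How each level maps onto model changes is a team convention; one hypothetical mapping is major for a changed feature schema, minor for a retrain on new data, and patch for metadata-only fixes. The sketch below parses and bumps versions and shows why versions should be compared numerically rather than as strings.

```python
from typing import NamedTuple


class SemVer(NamedTuple):
    """Major.Minor.Patch version; tuple ordering gives numeric comparison."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SemVer":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, level: str) -> "SemVer":
        # Bumping a level resets every level to its right.
        if level == "major":
            return SemVer(self.major + 1, 0, 0)
        if level == "minor":
            return SemVer(self.major, self.minor + 1, 0)
        if level == "patch":
            return SemVer(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown level: {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = SemVer.parse("1.9.3")
retrained = current.bump("minor")     # e.g. same features, fresh data
print(retrained)                      # 1.10.0
print(retrained > current)            # True  (numeric comparison)
print(str(retrained) > str(current))  # False (string comparison is lexicographic)
```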
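Slide 40 describes the Estimators client's focus as “to persist upon prediction.” The sketch below illustrates that idea only; `PersistOnPredict` and `ThresholdModel` are hypothetical names, not the estimators package's API. The wrapper serializes the model the first time it is asked for a prediction, stores it under its SHA-256 hash, and logs every prediction against that hash, so a past prediction can be traced back to the exact model bytes that produced it (the kind of record the forensic analysis of obsolete models on slide 14 would need).

```python
import hashlib
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a fitted model."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, xs):
        return [int(x > self.cutoff) for x in xs]


class PersistOnPredict:
    """Hypothetical wrapper: persist the model lazily, on its first prediction.

    Assumes the wrapped model is not refit after wrapping; a refit would
    change its bytes without changing the cached hash.
    """

    def __init__(self, model, store):
        self.model = model
        self.store = store          # any dict-like mapping: hash -> pickled bytes
        self.prediction_log = []    # (model_hash, inputs, outputs) tuples
        self._model_hash = None

    def predict(self, xs):
        if self._model_hash is None:
            blob = pickle.dumps(self.model)
            self._model_hash = hashlib.sha256(blob).hexdigest()
            self.store.setdefault(self._model_hash, blob)  # no duplicate writes
        outputs = self.model.predict(xs)
        self.prediction_log.append((self._model_hash, list(xs), list(outputs)))
        return outputs


store = {}
scorer = PersistOnPredict(ThresholdModel(cutoff=0.5), store)
print(len(store))                  # 0: nothing persisted before the first prediction
print(scorer.predict([0.2, 0.9]))  # [0, 1]
print(scorer.predict([0.7]))       # [1]
print(len(store), len(scorer.prediction_log))  # 1 2
```

Persisting lazily means models that are trained but never used for predictions never reach the store; whether that is desirable depends on how much experiment history a team wants to keep.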

Editor's Notes

  1. A handcar (also known as a pump trolley, pump car, jigger, Kalamazoo, velocipede, or draisine) is a railroad car powered by its passengers, or by people pushing the car from behind. It is mostly used as a maintenance-of-way or mining car, but it was also used for passenger service in some cases. A typical design consists of an arm, called the walking beam, that pivots, seesaw-like, on a base, which the passengers alternately push down and pull up to move the car. The handcar reflects the current state of machine learning applications. “To discuss strategies and tools that help organize our ML systems.”
  2. I’m NOT an Expert. I’m a practitioner.
  3. But who knows, maybe the Pokémon mobile is the hottest rental over the weekend.
  4. Quote by Eleanor Roosevelt. We need a lot of tooling to automate and organize this information.
  5. Yellow is the data science sandbox, blue is our business strategy role, and red is our product and engineering role.
  6. “Automation”: when we need to script, schedule, and repeat a particular process. It can be ETL, training a model, retraining models, or parameter optimization. In all these cases, every time we automate, we need to know what we’re automating.
  7. We need help.