Data Science: Good, Bad and Ugly
Do's and don'ts of working with data in production,
for collaboration, and for getting actionable insights
Irina Kukuyeva, Ph.D.
Big Data Day LA
August 5, 2017
My Background
Fashion:
Consulting:
IoT
Healthcare
Media & Entertainment
Finance
CPG
Retail
Video Game Publisher
Online Advertising:
Healthcare:
Ph.D.
Spectrum of (Select) Production Environments
Online Advertising*: Revenue $1M+/year; Near-real time
Consulting: Revenue $3M+/year; ASAP
Fashion: Patentable tech; Near-real time
Finance: Automation; Daily
Ralph Lauren
orange lace top
Women’s Top
(18mo part-time) (11mo part-time) (9mo part-time) (3wk part-time)
What the ML Model Lifecycle Really Looks Like
what customer described
what data looked like
what DS began to build
what budget could buy
what pivot looked like
what code got reused
what got tested [1]
what got pushed to prod
what got documented
what customer wanted
Agenda
Step 1 (Appendix)
Step 2: Data QA (Another Talk)
Step 3: ML Development (Other Talks)
Step 4: Pre-prod
Step 5: Prod
Step 4: Pre-Production
Pre-Production: Pay Down Tech Debt
Technical debt: borrowing against the future by trading code quality for speed of delivery now [2], [3]
• Incur debt: write code, including ML pipelines [4]
• Pay down debt: extensively test and refactor pipeline end-to-end [5]
→ Test 1: Joel’s Test: 12 Steps to Better Code [6]
• Spec?
• Source control? Best practices? [7]
• One-step build? Daily builds?
• Bug database? Release schedule? QA?
• Fix bugs before writing new code?
Pre-Production: Pay Down Tech Debt
[8]
Pre-Production: Pay Down Tech Debt
Test 1: Joel’s Test: 12 Steps to Better Code … in practice:
Consulting        | Online Advertising* | Finance          | Fashion
Daily stand-up    | Daily stand-up      | -                | Daily stand-up
Sprint planning   | Sprint planning     | -                | -
Version control*  | -                   | Version control* | Version control
Fix bugs first    | -                   | Fix bugs first   | Fix bugs first
Bugs emailed/db   | Bug database        | -                | -
Release on bugfix | -                   | -                | -
One-step build    | -                   | One-step build   | One-step build
Atlassian suite   | -                   | Trello           | Trello
Virtual machines  | -                   | -                | Virtual machine
(18mo part-time) (11mo part-time) (3wk part-time) (9mo part-time)
Pre-Production: Pay Down Tech Debt
→ Test 2: ML Test Score [9], [10]
• Data and feature quality
• Model development
• ML infrastructure
• Monitoring ML
[11]
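One way to make Test 2 concrete is to tally the rubric from [9]. The sketch below assumes the scoring scheme described in that paper: a test earns 0 points if it is not done, 0.5 if it is run manually and documented, and 1.0 if it is automated; the final score is the minimum of the four section totals. The section names follow the four ML Test Score areas; the individual test names are hypothetical placeholders.

```python
from typing import Dict

# Points per test status, following the ML Test Score rubric in [9].
POINTS = {"none": 0.0, "manual": 0.5, "automated": 1.0}


def section_score(tests: Dict[str, str]) -> float:
    """Sum the points for every test in one section."""
    return sum(POINTS[status] for status in tests.values())


def ml_test_score(sections: Dict[str, Dict[str, str]]) -> float:
    """Final score is the minimum section total, so one weak area caps the result."""
    return min(section_score(tests) for tests in sections.values())


# Hypothetical audit of a pipeline; the test names are illustrative only.
audit = {
    "Data and feature quality": {"schema check": "automated", "privacy review": "manual"},
    "Model development": {"baseline comparison": "automated", "sliced metrics": "automated"},
    "ML infrastructure": {"reproducible training": "manual", "one-step build": "manual"},
    "Monitoring ML": {"missing data alert": "none", "version check": "manual"},
}

if __name__ == "__main__":
    for name, tests in audit.items():
        print(f"{name}: {section_score(tests)}")
    print("ML Test Score:", ml_test_score(audit))
```

Taking the minimum rather than the sum means a pipeline cannot compensate for absent monitoring with extra model tests.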
Pre-Production: Pay Down Tech Debt
→ Other tips: ML
• Choose the simplest model that is appropriate for the task and the prod env
• Test model against (simulated) “ground truth” or a 2nd implementation [12]
• Evaluate effects of floating point [12]
• Validate the model beyond accuracy [13]
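Two of the ML tips, checking a result against a second implementation and evaluating floating-point effects, can be sketched together. The example is an assumption chosen for illustration, not taken from the talk: it computes a sample variance with a one-pass textbook formula and with Welford's algorithm, then compares the two with a tolerance instead of exact equality.

```python
import math
from typing import Sequence


def variance_naive(xs: Sequence[float]) -> float:
    """One-pass textbook formula; prone to cancellation when the mean is large."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_welford(xs: Sequence[float]) -> float:
    """Second implementation: Welford's numerically stable running update."""
    mean = 0.0
    m2 = 0.0
    for i, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / i
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


def agree(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare with a tolerance; exact equality is too strict for floats."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12)


small = [1.0, 2.0, 3.0, 4.0]
# Same spread, shifted by a large offset to stress the naive formula.
shifted = [x + 1e9 for x in small]

if __name__ == "__main__":
    for name, data in (("small", small), ("shifted", shifted)):
        a, b = variance_naive(data), variance_welford(data)
        print(name, a, b, "agree" if agree(a, b) else "DISAGREE")
```

When the two implementations disagree on the shifted data, the disagreement itself is the useful signal: it shows which inputs expose floating-point cancellation before they appear in production.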
Pre-Production: Pay Down Tech Debt
→ Other tips: Code
• Learn about the production environment → less code to rewrite
• Set up logging
• DRY → refactor
• Add else to if, or try/except + error
• Add regression tests (tests that confirm a bug is fixed)
• Comment liberally, especially where a reader would have to ask “why”
[14]
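Several of the code tips (logging, an explicit else, try/except with an error, and a regression test) fit in one small sketch. Everything named here is hypothetical: the logger name, the `parse_price` helper, and the bug the regression test guards against are assumptions made up for illustration.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")  # hypothetical logger name


def parse_price(raw: str) -> float:
    """Convert a raw price string to a float, failing loudly on bad input."""
    try:
        value = float(raw.replace("$", "").replace(",", ""))
    except ValueError:
        # Log the offending input, then re-raise instead of hiding the error.
        logger.error("Could not parse price from %r", raw)
        raise
    if value < 0:
        logger.error("Negative price %r", raw)
        raise ValueError(f"negative price: {raw!r}")
    else:
        # The explicit else documents the expected path.
        return value


def test_parse_price_handles_thousands_separator() -> None:
    """Regression test for a hypothetical bug where '1,200' failed to parse."""
    assert parse_price("$1,200.50") == 1200.50


if __name__ == "__main__":
    test_parse_price_handles_thousands_separator()
    logger.info("parsed %s", parse_price("$19.99"))
```

Keeping the regression test next to the fix means the same bug cannot quietly return during a later refactor.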
Pre-Production: Pay Down Tech Debt
Test 2: ML Test Score … in practice:
– Data and Feature Quality –
Consulting
Minimal time to add new feature
Unsuitable features excluded
Online Advertising* Fashion Finance
Minimal time to add new feature
Privacy built-in
– Model Development –
Simulated ground truth
Baseline model + 2nd implementation
Rolling refresh
Performance overall + those most likely to click
Proxy + actual metrics
Human-labeled data
Bias correction
Performance overall + main clothing categories
Baseline model
Bias correction
Proxy + actual metrics
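The Model Development entries that pair overall performance with a segment (those most likely to click, main clothing categories) amount to reporting a metric both overall and per slice. A minimal sketch, assuming labeled predictions tagged with a category; the records and category names below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (category, true label, predicted label); hypothetical evaluation records.
Record = Tuple[str, str, str]


def accuracy_by_slice(records: Iterable[Record]) -> Dict[str, float]:
    """Return accuracy overall (key 'overall') and for each category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, truth, predicted in records:
        for key in ("overall", category):
            total[key] += 1
            correct[key] += int(truth == predicted)
    return {key: correct[key] / total[key] for key in total}


records = [
    ("tops", "top", "top"),
    ("tops", "top", "top"),
    ("tops", "top", "dress"),
    ("dresses", "dress", "dress"),
    ("shoes", "shoe", "top"),
]

if __name__ == "__main__":
    for name, value in accuracy_by_slice(records).items():
        print(f"{name}: {value:.2f}")
```

A model can look acceptable overall while failing completely on one slice, which is why the per-category numbers are reported alongside the total.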
Pre-Production: Pay Down Tech Debt
Test 2: ML Test Score … in practice (cont’d):
– ML Infrastructure –
Consulting: Loosely coupled functions; Central repo for clients; Regression testing; One-step build, prod
Online Advertising*: Streaming
Fashion: Loosely coupled functions; Streaming; One-step build, prod; Test for decreasing loss; Unit test classification
Finance: Loosely coupled functions; Streaming; One-step build, prod*; Reproducibility of training
– ML Monitoring –
Logging
Software + package versions check
Data availability check
Missing titles check
Offline = online (virtual) env
Missing data check
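Some of the ML Monitoring checks on this slide, such as the missing data check and the software and package versions check, can be automated in a few lines. The sketch below is an assumption about what such checks might look like, not the speaker's implementation; the `title` field, the sample batch, and the pinned package name are hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional


def missing_rate(rows: Iterable[Dict[str, Optional[str]]], field: str) -> float:
    """Fraction of rows where `field` is absent, None, or blank."""
    rows = list(rows)
    if not rows:
        return 1.0  # Treat an empty batch as fully missing (data availability).
    missing = sum(1 for row in rows if not (row.get(field) or "").strip())
    return missing / len(rows)


def check_versions(expected: Dict[str, str]) -> List[str]:
    """Return a message for every package whose installed version differs."""
    problems = []
    for package, wanted in expected.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {wanted})")
        else:
            if found != wanted:
                problems.append(f"{package}: found {found}, expected {wanted}")
    return problems


# Hypothetical batch of product records with some titles missing.
batch = [
    {"title": "orange lace top"},
    {"title": ""},
    {"title": None},
    {"title": "denim jacket"},
]

if __name__ == "__main__":
    print("missing title rate:", missing_rate(batch, "title"))
    print(check_versions({"surely-not-installed-pkg-xyz": "1.0"}))
```

In practice the returned rate or messages would feed an alert threshold or a log line, so a silent upstream change is noticed before the model output degrades.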
Step 5: Production
Production: Deploy Code and Monitor Performance
→ One-button push to prod branch/repo
→ Model Rollout
→ Monitoring
[15]
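The slide does not say which rollout strategy was used. One common option, shown here purely as an assumption, is a percentage-based rollout in which a stable hash of the user id decides who gets the new model. The function names and the "new"/"current" labels are hypothetical.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def choose_model(user_id: str, rollout_percent: int) -> str:
    """Send roughly `rollout_percent` percent of users to the new model."""
    return "new" if bucket(user_id) < rollout_percent else "current"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (0, 10, 50, 100):
        share = sum(choose_model(u, percent) == "new" for u in users) / len(users)
        print(f"rollout {percent}% -> observed share {share:.2%}")
```

Because the bucket depends only on the user id, a user who sees the new model at 10% keeps seeing it as the percentage grows, which keeps monitoring comparisons consistent.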
Step 6: Post-Production
Post-Production: Keep Code Up and Running
→ Documentation + QA get cut first
→ Debugging, debugging, debugging → code is never perfect
→ Bugfix vs. Feature
→ Post-mortem
→ Use the product
→ Support and Training
[16]
Post-Production: Align Business and Team Goals
→ Team targets: deadlines and revenue goals
→ Team competitions
[17]
Key Takeaways
→ Communication, version control, logging, documentation, debugging
→ Automatically evaluate all components of ML pipeline
→ High model accuracy is not always the answer
→ Scope down, then scale up
[18]
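The takeaway that high accuracy is not always the answer is easiest to see on imbalanced data. The sketch below uses made-up click data (5 clicks in 100 impressions) and a hypothetical model that never predicts a click; the accuracy looks strong while recall is zero.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall; ratios with a zero denominator become 0.0."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical click data: 5 clicks in 100 impressions.
y_true = [1] * 5 + [0] * 95
always_no = [0] * 100  # a "model" that never predicts a click

if __name__ == "__main__":
    print(metrics(y_true, always_no))
```

Reporting precision and recall next to accuracy exposes this failure immediately, which is the point of validating beyond accuracy [13].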
You Did It!
Code is in prod!
Celebrate!
… But not too hard. Tomorrow you start on v2.
Questions?
PS: We’re Hiring! https://www.dia.com/careers
References
[1] http://www.projectcartoon.com/cartoon/1
[2] https://research.google.com/pubs/pub43146.html
[3] https://www.linkedin.com/pulse/when-your-tech-debt-comes-due-kevin-scott
[4] https://www.sec.gov/news/press-release/2013-222
[5] http://dilbert.com/strip/2017-01-03
[6] https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
[7] https://solidgeargroup.com/wp-content/uploads/2016/07/tower_cheatsheet_white_EN_0.pdf
[8] http://geek-and-poke.com/geekandpoke/2014/2/23/dev-cycle-friday-evening-edition
[9] https://www.eecs.tufts.edu/~dsculley/papers/ml_test_score.pdf
[10] http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
[11] https://www.slideshare.net/Tech_InMobi/building-machine-learning-pipelines
[12] https://www.researchgate.net/publication/262569275_Testing_Scientific_Software_A_Systematic_Literature_Review
[13] https://classeval.wordpress.com/introduction/basic-evaluation-measures/
[14] http://geek-and-poke.com/geekandpoke/2015/10/18/why-logging-is-so-important
[15] https://www.pinterest.com/explore/programming-humor/
[16] https://www.devrant.io/search?term=debugging
[17] https://marketoonist.com/2015/03/hackathons.html
[18] https://s-media-cache-ak0.pinimg.com/originals/9c/25/08/9c25082f5c4d3477124356e45673d426.png
[19] https://www.pinterest.com/pin/177258935306351629/
Appendix: Establish Business Use Case
→ Kick-off meeting with stakeholders:
• Discuss use case, motivation and scope
• Find out about format of deliverable and how it will be used by team
• Brainstorm and discuss potential solutions
• Iron out deadlines, checkpoints, and ongoing support structure
• Ask about prod env (if appropriate)
• Scope down, then scale up
• Close meeting with recap of action items
Key Takeaways: communication + clear expectations
[19]

Más contenido relacionado

La actualidad más candente

Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab CreateTuri, Inc.
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessDatabricks
 
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...Dataconomy Media
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Sri Ambati
 
CI/DC in MLOps by J.B. Hunt
CI/DC in MLOps by J.B. HuntCI/DC in MLOps by J.B. Hunt
CI/DC in MLOps by J.B. HuntDatabricks
 
Dataiku data science studio
Dataiku data science studioDataiku data science studio
Dataiku data science studioNorman Poh
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Turi, Inc.
 
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4jBuilding Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4jNeo4j
 
Bringing ML To Production, What Is Missing? AMLD 2020
Bringing ML To Production, What Is Missing? AMLD 2020Bringing ML To Production, What Is Missing? AMLD 2020
Bringing ML To Production, What Is Missing? AMLD 2020Mikio L. Braun
 
GraphTour - Keynote
GraphTour - KeynoteGraphTour - Keynote
GraphTour - KeynoteNeo4j
 
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4jNeo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4jNeo4j
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSSri Ambati
 
Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYCSri Ambati
 
A field guide to the Financial Times, Rhys Evans, Financial Times
A field guide to the Financial Times, Rhys Evans, Financial TimesA field guide to the Financial Times, Rhys Evans, Financial Times
A field guide to the Financial Times, Rhys Evans, Financial TimesNeo4j
 
Your AI Transformation
Your AI Transformation Your AI Transformation
Your AI Transformation Sri Ambati
 
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...polochau
 
Data Science for Smart Manufacturing
Data Science for Smart ManufacturingData Science for Smart Manufacturing
Data Science for Smart ManufacturingCarlo Torniai
 
Ankit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorld
Ankit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorldAnkit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorld
Ankit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorldSri Ambati
 

La actualidad más candente (20)

Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 
AI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for SuccessAI Data Acquisition and Governance: Considerations for Success
AI Data Acquisition and Governance: Considerations for Success
 
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
Girish Sathyanarayana, Senior Data Scientist at AppLift, " Business Value Thr...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
 
CI/DC in MLOps by J.B. Hunt
CI/DC in MLOps by J.B. HuntCI/DC in MLOps by J.B. Hunt
CI/DC in MLOps by J.B. Hunt
 
Dataiku data science studio
Dataiku data science studioDataiku data science studio
Dataiku data science studio
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)
 
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4jBuilding Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
Building Intelligent Solutions with Graphs, Stefan Kolmar, Neo4j
 
Data Warehousing Trends
Data Warehousing TrendsData Warehousing Trends
Data Warehousing Trends
 
Bringing ML To Production, What Is Missing? AMLD 2020
Bringing ML To Production, What Is Missing? AMLD 2020Bringing ML To Production, What Is Missing? AMLD 2020
Bringing ML To Production, What Is Missing? AMLD 2020
 
GraphTour - Keynote
GraphTour - KeynoteGraphTour - Keynote
GraphTour - Keynote
 
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4jNeo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
Neo4j GraphTalk Amsterdam - Next Generation Solutions using Neo4j
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWS
 
Dive into H2O: NYC
Dive into H2O: NYCDive into H2O: NYC
Dive into H2O: NYC
 
A field guide to the Financial Times, Rhys Evans, Financial Times
A field guide to the Financial Times, Rhys Evans, Financial TimesA field guide to the Financial Times, Rhys Evans, Financial Times
A field guide to the Financial Times, Rhys Evans, Financial Times
 
Your AI Transformation
Your AI Transformation Your AI Transformation
Your AI Transformation
 
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...
Human-Centered AI: Scalable, Interactive Tools for Interpretation and Attribu...
 
Data Science for Smart Manufacturing
Data Science for Smart ManufacturingData Science for Smart Manufacturing
Data Science for Smart Manufacturing
 
Ankit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorld
Ankit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorldAnkit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorld
Ankit Sinha, Experian - Ascend Analytical Sandbox - #H2OWorld
 

Similar a Data Science: Good, Bad and Ugly by Irina Kukuyeva

Productionalizing Machine Learning Models: The Good, the Bad, and the Ugly
Productionalizing Machine Learning Models: The Good, the Bad, and the UglyProductionalizing Machine Learning Models: The Good, the Bad, and the Ugly
Productionalizing Machine Learning Models: The Good, the Bad, and the UglyIrina Kukuyeva, Ph.D.
 
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyProductionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyIrina Kukuyeva, Ph.D.
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence WorkshopDavid Tan
 
Six cigma AJAL
Six cigma AJALSix cigma AJAL
Six cigma AJALAJAL A J
 
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ..."Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...Fwdays
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemVMware Tanzu
 
Enterprise Architecture in Practice: from Datastore to APIs and Apps
Enterprise Architecture in Practice: from Datastore to APIs and AppsEnterprise Architecture in Practice: from Datastore to APIs and Apps
Enterprise Architecture in Practice: from Datastore to APIs and AppsWSO2
 
DevopsBusinessCaseTemplate
DevopsBusinessCaseTemplateDevopsBusinessCaseTemplate
DevopsBusinessCaseTemplatePeter Lamar
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019DataKitchen
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoDataKitchen
 
MassChallenge Time Machine for .NET
MassChallenge Time Machine for .NETMassChallenge Time Machine for .NET
MassChallenge Time Machine for .NETTimeMachinefor
 
The Need for Speed
The Need for SpeedThe Need for Speed
The Need for SpeedCapgemini
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesJeff Bertman
 
Learn to see, measure and automate with value stream management
Learn to see, measure and automate with value stream managementLearn to see, measure and automate with value stream management
Learn to see, measure and automate with value stream managementLance Knight
 
Devoteam itsmf 2021 - from business automation to continuous value-driven i...
Devoteam   itsmf 2021 - from business automation to continuous value-driven i...Devoteam   itsmf 2021 - from business automation to continuous value-driven i...
Devoteam itsmf 2021 - from business automation to continuous value-driven i...itSMF Belgium
 
Software Solutions to Increase Construction Profits
Software Solutions to Increase Construction ProfitsSoftware Solutions to Increase Construction Profits
Software Solutions to Increase Construction ProfitsTheNetEffectContract
 
Six sigma ajal
Six sigma ajalSix sigma ajal
Six sigma ajalAJAL A J
 
Pyptug atdd agile_ci_uploaded
Pyptug atdd agile_ci_uploadedPyptug atdd agile_ci_uploaded
Pyptug atdd agile_ci_uploadedMacharla Pradeep
 

Similar a Data Science: Good, Bad and Ugly by Irina Kukuyeva (20)

Productionalizing Machine Learning Models: The Good, the Bad, and the Ugly
Productionalizing Machine Learning Models: The Good, the Bad, and the UglyProductionalizing Machine Learning Models: The Good, the Bad, and the Ugly
Productionalizing Machine Learning Models: The Good, the Bad, and the Ugly
 
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyProductionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
 
Six cigma AJAL
Six cigma AJALSix cigma AJAL
Six cigma AJAL
 
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ..."Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation System
 
Enterprise Architecture in Practice: from Datastore to APIs and Apps
Enterprise Architecture in Practice: from Datastore to APIs and AppsEnterprise Architecture in Practice: from Datastore to APIs and Apps
Enterprise Architecture in Practice: from Datastore to APIs and Apps
 
Introduction to Six Sigma
Introduction to Six SigmaIntroduction to Six Sigma
Introduction to Six Sigma
 
DevopsBusinessCaseTemplate
DevopsBusinessCaseTemplateDevopsBusinessCaseTemplate
DevopsBusinessCaseTemplate
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
MassChallenge Time Machine for .NET
MassChallenge Time Machine for .NETMassChallenge Time Machine for .NET
MassChallenge Time Machine for .NET
 
The Need for Speed
The Need for SpeedThe Need for Speed
The Need for Speed
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and Practices
 
Learn to see, measure and automate with value stream management
Learn to see, measure and automate with value stream managementLearn to see, measure and automate with value stream management
Learn to see, measure and automate with value stream management
 
Devoteam itsmf 2021 - from business automation to continuous value-driven i...
Devoteam   itsmf 2021 - from business automation to continuous value-driven i...Devoteam   itsmf 2021 - from business automation to continuous value-driven i...
Devoteam itsmf 2021 - from business automation to continuous value-driven i...
 
Software Solutions to Increase Construction Profits
Software Solutions to Increase Construction ProfitsSoftware Solutions to Increase Construction Profits
Software Solutions to Increase Construction Profits
 
DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
 
Six sigma ajal
Six sigma ajalSix sigma ajal
Six sigma ajal
 
Pyptug atdd agile_ci_uploaded
Pyptug atdd agile_ci_uploadedPyptug atdd agile_ci_uploaded
Pyptug atdd agile_ci_uploaded
 

Más de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Más de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI



Data Science: Good, Bad and Ugly by Irina Kukuyeva

  • 1. Data Science: Good, Bad and Ugly. Do's and don'ts of working with data in production, for collaboration, and for getting actionable insights. Irina Kukuyeva, Ph.D. Big Data Day LA, August 5, 2017
  • 2. My Background: Fashion; Consulting (IoT, Healthcare, Media & Entertainment, Finance, CPG, Retail, Video Game Publisher); Online Advertising; Healthcare; Ph.D.
  • 3. Spectrum of (Select) Production Environments
      • Online Advertising*: Revenue $1M+/year; Near-real time
      • Consulting: Revenue $3M+/year; ASAP
      • Fashion: Patentable tech; Near-real time
      • Finance: Automation; Daily
      Ralph Lauren; orange lace top; Women’s Top
      (18mo part-time) (11mo part-time) (9mo part-time) (3wk part-time)
  • 4. What ML Model Lifecycle Really Looks Like: what customer described; what data looked like; what DS began to build; what budget could buy; what pivot looked like; what code got reused; what got tested [1]; what got pushed to prod; what got documented; what customer wanted
  • 5. Agenda: what customer described; what data looked like; what DS began to build; what budget could buy; what pivot looked like; what code got reused; what got tested [1]; what got pushed to prod; what got documented; what customer wanted
      Panel labels: Step 1 (Appendix); Step 2: Data QA (Another Talk); Step 3: ML Development (Other Talks); Step 4: Pre-prod; Step 5: Prod; Step 5: Prod; Step 4: Pre-prod; Step 1 (Appendix)
  • 7. Pre-Production: Pay Down Tech Debt
      Technical debt: borrowing against the future by trading code quality for speed of delivery now [2], [3]
      • Incur debt: write code, including ML pipelines [4]
      • Pay down debt: extensively test and refactor the pipeline end-to-end [5]
  • 8. Pre-Production: Pay Down Tech Debt [8]
      → Test 1: Joel’s Test: 12 Steps to Better Code [6]
      • Spec?
      • Source control? Best practices? [7]
      • One-step build? Daily builds?
      • Bug database? Release schedule? QA?
      • Fix bugs before writing new code?
  • 9. Pre-Production: Pay Down Tech Debt. Test 1: Joel’s Test: 12 Steps to Better Code … in practice:
      • Consulting: Daily stand-up; Sprint planning; Version control*; Fix bugs first; Bugs emailed/db; Release on bugfix; One-step build; Atlassian suite; Virtual machines
      • Online Advertising*: Daily stand-up; Sprint planning; Bug database
      • Finance: Version control*; Fix bugs first; One-step build; Trello
      • Fashion: Daily stand-up; Version control; Fix bugs first; One-step build; Trello; Virtual machine
      (18mo part-time) (11mo part-time) (3wk part-time) (9mo part-time)
  • 10. Pre-Production: Pay Down Tech Debt [11]
      → Test 2: ML Test Score [9], [10]
      • Data and feature quality
      • Model development
      • ML infrastructure
      • Monitoring ML
  • 11. Pre-Production: Pay Down Tech Debt
      → Other tips, ML:
      • Choose the simplest model that is appropriate for the task and prod env
      • Test model against (simulated) “ground truth” or a 2nd implementation [12]
      • Evaluate effects of floating point [12]
      • Validate the model beyond accuracy [13]
  • 12. Pre-Production: Pay Down Tech Debt [14]
      → Other tips, Code:
      • Learn about production environment → less code to rewrite
      • Set up logging
      • DRY → refactor
      • Add else to if or try/except + error
      • Add regression tests (tests to confirm bug fixed)
      • Comment liberally, especially if you have to ask “why”
  • 13. Consulting Minimal time to add new feature Unsuitable features excluded Online Advertising* Fashion Finance Minimal time to add new feature Privacy built-in Test 2: ML Test Score … in practice: – Data and Feature Quality – Pre-Production: Pay Down Tech Debt – Model Development – Simulated ground truth Baseline model + 2nd implementation Rolling refresh Performance overall + those most likely to click Proxy + actual metrics Human labeled data Bias correction Performance overall + main clothing categories Baseline model Bias correction Proxy + actual metrics
  • 14. Pre-Production: Pay Down Tech Debt. Test 2: ML Test Score … in practice (cont’d):
      – ML Infrastructure –
      • Consulting: Loosely coupled fcns; Central repo for clients; Regression testing; One-step build, prod
      • Online Advertising*: Streaming
      • Fashion: Loosely coupled fcns; Streaming; One-step build, prod; Test for decreasing loss; Unit test classification
      • Finance: Loosely coupled fcns; Streaming; One-step build, prod*; Reproducibility of training
      – ML Monitoring –
      Logging; Software + package versions check; Data availability check; Missing titles check; Offline = online (virtual) env; Missing data check
  • 16. Production: Deploy Code and Monitor Performance
      → One-button push to prod branch/repo
      → Model Rollout
      → Monitoring [15]
  • 18. Post-Production: Keep Code Up and Running [16]
      → Documentation + QA gets cut first
      → Debugging, debugging, debugging → code is never perfect
      → Bugfix vs. Feature
      → Post-mortem
      → Use the product
      → Support and Training
  • 19. Post-Production: Align Business and Team Goals
      → Team targets: deadlines and revenue goals
      → Team competitions [17]
  • 20. Key Takeaways
      → Communication, version control, logging, documentation, debugging
      → Automatically evaluate all components of ML pipeline
      → High model accuracy is not always the answer
      → Scope down, then scale up [18]
  • 21. You Did It! Code is in prod! Celebrate!
  • 22. You Did It! Code is in prod! Celebrate! … But not too hard. Tomorrow you start on v2.
  • 23. Questions? PS: We’re Hiring! https://www.dia.com/careers
  • 24. References
      [1] http://www.projectcartoon.com/cartoon/1
      [2] https://research.google.com/pubs/pub43146.html
      [3] https://www.linkedin.com/pulse/when-your-tech-debt-comes-due-kevin-scott
      [4] https://www.sec.gov/news/press-release/2013-222
      [5] http://dilbert.com/strip/2017-01-03
      [6] https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
      [7] https://solidgeargroup.com/wp-content/uploads/2016/07/tower_cheatsheet_white_EN_0.pdf
      [8] http://geek-and-poke.com/geekandpoke/2014/2/23/dev-cycle-friday-evening-edition
      [9] https://www.eecs.tufts.edu/~dsculley/papers/ml_test_score.pdf
      [10] http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
      [11] https://www.slideshare.net/Tech_InMobi/building-machine-learning-pipelines
      [12] https://www.researchgate.net/publication/262569275_Testing_Scientific_Software_A_Systematic_Literature_Review
      [13] https://classeval.wordpress.com/introduction/basic-evaluation-measures/
      [14] http://geek-and-poke.com/geekandpoke/2015/10/18/why-logging-is-so-important
      [15] https://www.pinterest.com/explore/programming-humor/
      [16] https://www.devrant.io/search?term=debugging
      [17] https://marketoonist.com/2015/03/hackathons.html
      [18] https://s-media-cache-ak0.pinimg.com/originals/9c/25/08/9c25082f5c4d3477124356e45673d426.png
      [19] https://www.pinterest.com/pin/177258935306351629/
  • 25. Appendix: Establish Business Use Case [19]
      → Kick-off meeting with stakeholders:
      • Discuss use case, motivation and scope
      • Find out about format of deliverable and how it will be used by team
      • Brainstorm and discuss potential solutions
      • Iron out deadlines, checkpoints and on-going support structure
      • Ask about prod env (if appropriate)
      • Scope down, then scale up
      • Close meeting with recap of action items
      Key Takeaways: communication + clear expectations
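
The code tips on slide 12 (set up logging, add an else to every if or try/except with an error, add regression tests) can be illustrated with a minimal Python sketch. It is not from the talk: the function names, the logger name, and the zero-impressions bug scenario are all hypothetical, chosen only to show the pattern.

```python
import logging

# Set up logging once, near the entry point of the pipeline.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")  # hypothetical logger name


def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks / impressions (hypothetical metric for illustration)."""
    if impressions > 0:
        return clicks / impressions
    elif impressions == 0:
        # Log the unusual case instead of failing with ZeroDivisionError.
        logger.warning("No impressions recorded; returning 0.0")
        return 0.0
    else:
        # Explicit else: an impossible input raises an error instead of
        # silently falling through.
        raise ValueError(f"impressions must be >= 0, got {impressions}")


def parse_count(raw: str) -> int:
    """Parse an integer count from text, logging and re-raising on failure."""
    try:
        value = int(raw.strip())
    except ValueError:
        logger.error("Could not parse count from %r", raw)
        raise
    else:
        # Runs only when no exception was raised in the try block.
        return value


def test_zero_impressions_regression() -> None:
    # Regression test: confirms a (hypothetical) divide-by-zero bug stays fixed.
    assert click_through_rate(0, 0) == 0.0
```

A test runner such as pytest would collect `test_zero_impressions_regression` automatically; calling it directly also works.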