SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
Three lessons learned
from building a production
machine learning system
Michael Manapat
Stripe
@mlmanapat
Fraud
• Card numbers are stolen by hacking, malware, etc.
• “Dumps” are sold in “carding” forums
• Fraudsters use numbers in dumps to buy goods,
which they then resell
• Cardholders dispute transactions
• Merchant ends up bearing cost of fraud
• We train binary classifiers to predict fraud
• We use open source tools
• Scalding/Summingbird for feature generation
• scikit-learn for model training

(eventually: github.com/stripe/brushfire)
1
Don’t treat models as
black boxes
Early ML at Stripe
• Focused on training with more and more data and
adding more and more features
• Didn’t think much about
• ML algorithms (tuning hyperparameters, e.g.)
• The deeper reasons behind any particular set of
results
Substantial reduction in fraud rate
Product development
From a product standpoint:
• We were blocking high risk charges and surfacing
just the decision
• We wanted to provide Stripe users insight into our
actions—reasons for scores
Score reasons
X = 5, Y = 3: score = 0.1
Which feature is “driving” the score more?
X < 10
Y < 5 X < 15
0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40)
True False
Score reasons
X = ?, Y = 3:

(20/70) * 0.1 + (10/70) * 0.5 + (40/70) * 0.9 = 0.61
Score Δ = |holdout - original| = |0.61 - 0.1| = 0.51
Now producing richer reasons with multiple predicates
X < ?
Y < 5 X < ?
0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40)
Model introspection
If a model didn’t look good in validation, it wasn’t clear what
to do (besides trying more features/data)
What if we used our “score reasons” to debug model issues?
• Take all false positives (in validation data or in
production) and group by generated reason
• Were a substantial fraction of the false positives
driven by a few features?
• Did all the comparisons in the explanation
predicates make sense? (Were they comparisons a
human might make for fraud?)
• Our models were overfit!
Actioning insights
• Hyperparameter optimization
• Feature selection
Precision
Recall
Summary
• Don’t treat models as black boxes
• Thinking about the learning process (vs. just
features and data) can yield significant payoffs
• Tooling for introspection can accelerate model
development/“debugging”
Julia Evans, Alyssa Frazee, Erik Osheim, Sam Ritchie,
Jocelyn Ross, Tom Switzer
2
Have a plan for
counterfactual
evaluation
• December 31st, 2013
• Train a binary classifier for
disputes on data from Jan
1st to Sep 30th
• Validate on data from Oct
1st to Oct 31st (need to wait
~60 days for labels)
• Based on validation data, pick
a policy for actioning scores:

block if score > 50
Questions (1)
• Business complains about high false positive rate:
what would happen if we changed the policy to
"block if score > 70"?
• What are the production precision and recall of the
model?
• December 31st, 2014. We
repeat the exercise from a
year earlier
• Train a model on data
from Jan 1st to Sep 30th
• Validate on data from Oct
1st to Oct 31st (need to
wait ~60 days for labels)
• Validation results look
~ok (but not great)
• We put the model into
production and the results
are terrible
Questions (2)
• Why did the validation results for the new model
look so much worse?
• How do we know if the retrained model really is
better than the original model?
Counterfactual evaluation
• Our model changes reality (the world is different
because of its existence)
• We can answer some questions (around model
comparisons) with A/B tests
• For all these questions, we want an approximation
of the charge/outcome distribution that would exist
if there were no model
One approach
• Probabilistically reverse a
small fraction of our block
decisions
• The higher the score, the lower
probability we let the charge
through
• Weight samples by 1 / P(allow)
• Get information on the area we
want to improve on
ID Score p(Allow)
Original
Action
Selected
Action
Outcome
1 10 1.0 Allow Allow OK
2 45 1.0 Allow Allow Fraud
3 55 0.30 Block Block -
4 65 0.20 Block Allow Fraud
5 100 0.0005 Block Block -
6 60 0.25 Block Allow OK
ID Score P(Allow) Weight
Original
Action
Selected
Action
Outcome
1 10 1.0 1 Allow Allow OK
2 45 1.0 1 Allow Allow Fraud
4 65 0.20 5 Block Allow Fraud
6 60 0.25 4 Block Allow OK
Evaluating the "block if score > 50" policy
Precision = 5 / 9 = 0.56
Recall = 5 / 6 = 0.83
• The propensity function controls the exploration/
exploitation tradeoff
• Precision, recall, etc. are estimators
• Variance of the estimators decreases the more we
allow through
• Bootstrap to get error bars (pick rows from the table
uniformly at random with replacement)
• Li, Chen, Kleban, Gupta: "Counterfactual Estimation
and Optimization of Click Metrics for Search Engines"
Summary
• Have a plan for counterfactual evaluation before
you productionize your first model
• You can back yourself into a corner (with no data to
retrain on) if you address this later
• You should be monitoring the production
performance of your model anyway (cf. next lesson)
Alyssa Frazee, Julia Evans, Roban Kramer, Ryan
Wang
3
Invest in production
monitoring for your
models
Production vs. data stack
• Ruby/Mongo vs. Scala/Hadoop/Thrift
• Some issues
• Divergence between production and training
definitions
• Upstream changes to library code in production
feature generation can change feature definitions
• True vs. “True”
Domain-specific
scoring
service
(business
logic)
“Pure”
model
evaluation
service
Aggregation
jobs
Aggregation jobs keep
track of
• Overall action rate and
rate per Stripe user
• Score distributions
• Feature distributions (%
null, p50/p90 for
numerical values, etc.)
Logged
scoring
requests
Aggregation
jobs (get all
aggregates per
model)
Logged
scoring
requests
Domain-specific
scoring
service
(business
logic)
“Pure”
model
evaluation
service
Summary
• Monitor the production inputs to and outputs of
your models
• Have dashboards that can be watched on deploys
and alerting for significant anomalies
• Bake the monitoring into generic ML infrastructure
(so that each ML application isn’t redoing this)
Steve Mardenfeld, Tom Switzer
• Don’t treat models as black boxes
• Have a plan for counterfactual evaluation before
productionizing your first model
• Build production monitoring for action rates, score
distributions, and feature distributions (and bake
into ML infra)

Thanks
Stripe is hiring data scientists, engineers, and
engineering managers!
mlm@stripe.com | @mlmanapat

Más contenido relacionado

La actualidad más candente

Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Andreas Grabner
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyAndreas Grabner
 
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and TestersHugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and TestersAndreas Grabner
 
Agile Data
Agile DataAgile Data
Agile Dataodsc
 
Database DevOps Anti-patterns
Database DevOps Anti-patternsDatabase DevOps Anti-patterns
Database DevOps Anti-patternsAlex Yates
 
How to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance TipsHow to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance TipsAndreas Grabner
 
Getting CI right for SQL Server
Getting CI right for SQL ServerGetting CI right for SQL Server
Getting CI right for SQL ServerAlex Yates
 
MLconf NYC Ted Willke
MLconf NYC Ted WillkeMLconf NYC Ted Willke
MLconf NYC Ted WillkeMLconf
 
DevOps 101 for data professionals
DevOps 101 for data professionalsDevOps 101 for data professionals
DevOps 101 for data professionalsAlex Yates
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in RWork-Bench
 
Scaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsRandy Shoup
 
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!Andreas Grabner
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in MicroservicesRandy Shoup
 
HSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksHSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksAndreas Grabner
 
Four Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsFour Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsAndreas Grabner
 
Workbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDFWorkbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDFKyle Lambert
 
Making operations visible - devopsdays tokyo 2013
Making operations visible  - devopsdays tokyo 2013Making operations visible  - devopsdays tokyo 2013
Making operations visible - devopsdays tokyo 2013Nick Galbreath
 
Web and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the NewsWeb and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the NewsAndreas Grabner
 
DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018Michael Man
 

La actualidad más candente (20)

Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
 
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and TestersHugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
 
Agile Data
Agile DataAgile Data
Agile Data
 
DMCA#21: reactive-programming
DMCA#21: reactive-programmingDMCA#21: reactive-programming
DMCA#21: reactive-programming
 
Database DevOps Anti-patterns
Database DevOps Anti-patternsDatabase DevOps Anti-patterns
Database DevOps Anti-patterns
 
How to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance TipsHow to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance Tips
 
Getting CI right for SQL Server
Getting CI right for SQL ServerGetting CI right for SQL Server
Getting CI right for SQL Server
 
MLconf NYC Ted Willke
MLconf NYC Ted WillkeMLconf NYC Ted Willke
MLconf NYC Ted Willke
 
DevOps 101 for data professionals
DevOps 101 for data professionalsDevOps 101 for data professionals
DevOps 101 for data professionals
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
 
Scaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and Events
 
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
 
HSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksHSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy Checks
 
Four Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsFour Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance Problems
 
Workbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDFWorkbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDF
 
Making operations visible - devopsdays tokyo 2013
Making operations visible  - devopsdays tokyo 2013Making operations visible  - devopsdays tokyo 2013
Making operations visible - devopsdays tokyo 2013
 
Web and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the NewsWeb and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the News
 
DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018
 

Destacado

DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchHakka Labs
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataHakka Labs
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringHakka Labs
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleHakka Labs
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkHakka Labs
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQHakka Labs
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestHakka Labs
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInHakka Labs
 
DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesHakka Labs
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityHakka Labs
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale Hakka Labs
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...Hakka Labs
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...Hakka Labs
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartHakka Labs
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Hakka Labs
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresHakka Labs
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock Hakka Labs
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleHakka Labs
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity ResolutionBenjamin Bengfort
 

Destacado (20)

DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
 
DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with Ourselves
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data Structures
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 

Similar a Three lessons from building a production ML system

Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Dataiku
 
Why Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOpsWhy Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOpsdpaulmerrill
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)Amazon Web Services
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Alok Singh
 
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOpsFuture Processing
 
10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptxkpcp
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemPierre Gutierrez
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life CycleSrujanaMerugu1
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
A New Model for Testing
A New Model for TestingA New Model for Testing
A New Model for TestingSQALab
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
A New Model For Testing
A New Model For TestingA New Model For Testing
A New Model For TestingTEST Huddle
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceProduct School
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?Michaela Greiler
 
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013Why-What-How Consulting, LLC
 

Similar a Three lessons from building a production ML system (20)

Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem
 
Before Kaggle
Before KaggleBefore Kaggle
Before Kaggle
 
Why Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOpsWhy Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOps
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
 
10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life Cycle
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
A New Model for Testing
A New Model for TestingA New Model for Testing
A New Model for Testing
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
New model
New modelNew model
New model
 
A New Model For Testing
A New Model For TestingA New Model For Testing
A New Model For Testing
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
 
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
 

Más de Hakka Labs

DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopHakka Labs
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...Hakka Labs
 
DataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsDataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsHakka Labs
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineHakka Labs
 
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastDataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastHakka Labs
 
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...Hakka Labs
 
DataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedDataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedHakka Labs
 
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...Hakka Labs
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataHakka Labs
 
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...Hakka Labs
 
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInDataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInHakka Labs
 

Más de Hakka Labs (11)

DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
 
DataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsDataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris Wiggins
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation Engine
 
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastDataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
 
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
 
DataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedDataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeed
 
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
 
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
 
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInDataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
 

Último

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Three lessons from building a production ML system

  • 1. Three lessons learned from building a production machine learning system Michael Manapat Stripe @mlmanapat
  • 2. Fraud • Card numbers are stolen by hacking, malware, etc. • “Dumps” are sold in “carding” forums • Fraudsters use numbers in dumps to buy goods, which they then resell • Cardholders dispute transactions • Merchant ends up bearing cost of fraud
  • 3. • We train binary classifiers to predict fraud • We use open source tools • Scalding/Summingbird for feature generation • scikit-learn for model training
 (eventually: github.com/stripe/brushfire)
  • 4. 1 Don’t treat models as black boxes
  • 5. Early ML at Stripe • Focused on training with more and more data and adding more and more features • Didn’t think much about • ML algorithms (tuning hyperparameters, e.g.) • The deeper reasons behind any particular set of results Substantial reduction in fraud rate
  • 6. Product development From a product standpoint: • We were blocking high risk charges and surfacing just the decision • We wanted to provide Stripe users insight into our actions—reasons for scores
  • 7. Score reasons X = 5, Y = 3: score = 0.1 Which feature is “driving” the score more? X < 10 Y < 5 X < 15 0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40) True False
  • 8. Score reasons X = ?, Y = 3:
 (20/70) * 0.1 + (10/70) * 0.5 + (40/70) * 0.9 = 0.61 Score Δ = |holdout - original| = |0.61 - 0.1| = 0.51 Now producing richer reasons with multiple predicates X < ? Y < 5 X < ? 0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40)
  • 9. Model introspection If a model didn’t look good in validation, it wasn’t clear what to do (besides trying more features/data) What if we used our “score reasons” to debug model issues?
  • 10. • Take all false positives (in validation data or in production) and group by generated reason • Were a substantial fraction of the false positives driven by a few features? • Did all the comparisons in the explanation predicates make sense? (Were they comparisons a human might make for fraud?) • Our models were overfit!
  • 11. Actioning insights • Hyperparameter optimization • Feature selection Precision Recall
  • 12. Summary • Don’t treat models as black boxes • Thinking about the learning process (vs. just features and data) can yield significant payoffs • Tooling for introspection can accelerate model development/“debugging” Julia Evans, Alyssa Frazee, Erik Osheim, Sam Ritchie, Jocelyn Ross, Tom Switzer
  • 13. 2 Have a plan for counterfactual evaluation
  • 14. • December 31st, 2013 • Train a binary classifier for disputes on data from Jan 1st to Sep 30th • Validate on data from Oct 1st to Oct 31st (need to wait ~60 days for labels) • Based on validation data, pick a policy for actioning scores:
 block if score > 50
  • 15. Questions (1) • Business complains about high false positive rate: what would happen if we changed the policy to "block if score > 70"? • What are the production precision and recall of the model?
  • 16. • December 31st, 2014. We repeat the exercise from a year earlier • Train a model on data from Jan 1st to Sep 30th • Validate on data from Oct 1st to Oct 31st (need to wait ~60 days for labels) • Validation results look ~ok (but not great) • We put the model into production and the results are terrible
  • 17. Questions (2) • Why did the validation results for the new model look so much worse? • How do we know if the retrained model really is better than the original model?
  • 18. Counterfactual evaluation • Our model changes reality (the world is different because of its existence) • We can answer some questions (around model comparisons) with A/B tests • For all these questions, we want an approximation of the charge/outcome distribution that would exist if there were no model
  • 19. One approach • Probabilistically reverse a small fraction of our block decisions • The higher the score, the lower probability we let the charge through • Weight samples by 1 / P(allow) • Get information on the area we want to improve on
  • 20. ID Score p(Allow) Original Action Selected Action Outcome 1 10 1.0 Allow Allow OK 2 45 1.0 Allow Allow Fraud 3 55 0.30 Block Block - 4 65 0.20 Block Allow Fraud 5 100 0.0005 Block Block - 6 60 0.25 Block Allow OK
  • 21. ID Score P(Allow) Weight Original Action Selected Action Outcome 1 10 1.0 1 Allow Allow OK 2 45 1.0 1 Allow Allow Fraud 4 65 0.20 5 Block Allow Fraud 6 60 0.25 4 Block Allow OK Evaluating the "block if score > 50" policy Precision = 5 / 9 = 0.56 Recall = 5 / 6 = 0.83
  • 22. • The propensity function controls the exploration/ exploitation tradeoff • Precision, recall, etc. are estimators • Variance of the estimators decreases the more we allow through • Bootstrap to get error bars (pick rows from the table uniformly at random with replacement) • Li, Chen, Kleban, Gupta: "Counterfactual Estimation and Optimization of Click Metrics for Search Engines"
  • 23. Summary • Have a plan for counterfactual evaluation before you productionize your first model • You can back yourself into a corner (with no data to retrain on) if you address this later • You should be monitoring the production performance of your model anyway (cf. next lesson) Alyssa Frazee, Julia Evans, Roban Kramer, Ryan Wang
  • 25. Production vs. data stack • Ruby/Mongo vs. Scala/Hadoop/Thrift • Some issues • Divergence between production and training definitions • Upstream changes to library code in production feature generation can change feature definitions • True vs. “True”
  • 26. Domain-specific scoring service (business logic) “Pure” model evaluation service Aggregation jobs Aggregation jobs keep track of • Overall action rate and rate per Stripe user • Score distributions • Feature distributions (% null, p50/p90 for numerical values, etc.) Logged scoring requests
  • 27. Aggregation jobs (get all aggregates per model) Logged scoring requests Domain-specific scoring service (business logic) “Pure” model evaluation service
  • 28. Summary • Monitor the production inputs to and outputs of your models • Have dashboards that can be watched on deploys and alerting for significant anomalies • Bake the monitoring into generic ML infrastructure (so that each ML application isn’t redoing this) Steve Mardenfeld, Tom Switzer
  • 29. • Don’t treat models as black boxes • Have a plan for counterfactual evaluation before productionizing your first model • Build production monitoring for action rates, score distributions, and feature distributions (and bake into ML infra)

  • 30. Thanks Stripe is hiring data scientists, engineers, and engineering managers! mlm@stripe.com | @mlmanapat