[PositConf 2023] How Data Scientists Broke A/B Testing (and How We Can Fix It)

Carl Vogel, Data Scientist
How Data Scientists Broke A/B Testing (and how we can fix it)
Questions?
pos.it/slido-A
A Completely True Story
Launch on Neutral
(But thanks anyway)
Existential Dread
(Get used to it)
A real PM
“If it’s something we really believe
in, I’ll launch on a flat result … if
it’s part of a broader strategy.”
“My features are hard as shit to build,
but easy to tweak, so I’m not always
worried about statistical significance.”
Another real PM
Not just NHST
Features aren’t IID
Path dependencies in
feature roadmaps
We develop experiences by
building up features over
time and it’s helpful to
launch them incrementally
MDE is basically zero
Feature costs are nearly all
sunk before the test
Any lift pays off
Not just NHST
Risk is mismeasured
Decision makers don’t
think about Type I and II
error rates, per se
They just want to make
more money than they lose
Can I make good decisions about small to moderate effects quickly?
You can’t make reliable inferences about small to moderate effects quickly.
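Why not quickly? A minimal sketch using the textbook normal-approximation sample-size formula for a two-sided, two-proportion z-test (our choice of test, not necessarily the one behind the talk). The required sample grows with 1/δ², where δ is the minimum detectable effect, so when the MDE is "basically zero" the test length explodes. The 5% baseline conversion rate and the lifts are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(p_base, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided, two-proportion z-test to detect an
    absolute lift of `mde_abs` over baseline conversion `p_base` (normal approximation)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_alt = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical 5% baseline: shrinking the detectable lift 10x costs roughly 100x the users.
for lift in (0.01, 0.005, 0.001):
    print(f"lift of {lift:.3f}: {n_per_arm(0.05, lift):,} users per arm")
```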
Did they misuse the tool?
Or did we hand them the wrong one?
Non-Inferiority Designs
Non-inferiority designs
Let’s try not to wreck the place
[Figure: Superiority vs. Non-Inferiority]
• Inferiority margins (Δ) prompt us to ask:
  • How much do we believe in this feature?
  • How quickly will we improve on it?
• Stakeholders can give meaningful answers to these questions
  • Compare to MDE/minimal lift, which is often made up
• Avoid meaningless minimum effect estimates
• Can power against a “no effect” alternative
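A minimal sketch of a non-inferiority test on conversion rates, assuming a one-sided z-test with an unpooled standard error (a common textbook choice, not necessarily the speaker's). The margin Δ shifts the null hypothesis to "B is worse than A by at least Δ", so a flat result with enough data can justify a launch. The second function powers the test against the "no effect" alternative (true difference of zero). All counts and the 0.5-point margin are hypothetical.

```python
import math
from statistics import NormalDist


def non_inferiority_test(conv_a, n_a, conv_b, n_b, margin, alpha=0.05):
    """One-sided z-test that B's conversion rate is not worse than A's by more than `margin`.

    H0: p_b - p_a <= -margin   (B is meaningfully worse)
    H1: p_b - p_a >  -margin   (B is non-inferior)
    Returns (p_value, non_inferior).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)  # unpooled standard error
    z = (p_b - p_a + margin) / se
    p_value = 1 - NormalDist().cdf(z)
    return p_value, p_value < alpha


def n_per_arm_non_inferiority(p, margin, alpha=0.05, power=0.8):
    """Users per arm for the desired power when the true difference is zero
    (the "no effect" alternative), using the normal approximation."""
    z = NormalDist()
    return math.ceil((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) ** 2 * 2 * p * (1 - p) / margin ** 2)


# Hypothetical flat result: B converts slightly below A, with a margin of 0.5 points.
p_value, ok = non_inferiority_test(2_500, 50_000, 2_475, 50_000, margin=0.005)
print(f"p = {p_value:.4f}, non-inferior: {ok}")
print(n_per_arm_non_inferiority(0.05, 0.005))
```

A superiority test on the same data would never reject (B's observed rate is lower), which is exactly the "launch on neutral" situation; here the decision rests on an explicit, stakeholder-chosen Δ instead.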
What’s the rush?
The costs of long experiments
Time is money, folks
• Opportunity cost of time:
  • Experimental features live on a roadmap; waiting for launch decisions delays development of subsequent features
• Opportunity cost of sampling:
  • As long as the experiment runs, many users aren’t getting the best variant
• Maintenance costs:
  • More experiments running means more complexity in the codebase, more effort, etc.
Value of Information Designs
When is data worth it?
Good things are worth waiting for
• Waiting is costly, but data is valuable.
• We should keep going as long as the value of more data exceeds the cost of more time
• Quantify our impatience as part of test design
[Figure: Expected Value vs. Cost of Data. Expected value, cost, and net expected value (y-axis, $0 to $80,000) plotted against test length (x-axis, 0 to 60)]
Why is data valuable?
How dumb am I, in dollars?
• Before we have data, our range of potential lifts is wide
• Our best guess could be way off; we could make a big mistake
• Observing data narrows the range: even if our new guess is wrong, it won’t be wrong by as much.
• If the value of being less wrong (in expectation) exceeds the cost of waiting for the data, LFG!
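One standard way to put "how dumb am I, in dollars?" into a formula is the normal-normal expected value of sample information (EVSI). The model here is our assumption, not necessarily the talk's: a prior θ ~ N(μ₀, σ₀²) on the per-user lift of B over A, an observed lift with sampling variance 2σ²/n after n users per arm (σ is the per-user outcome standard deviation), and a payoff of Nθ if B is launched to N future users (0 if A is kept). We launch B whenever our current mean lift is positive.

```latex
\begin{aligned}
\sigma_1^2 &= \left(\frac{1}{\sigma_0^2} + \frac{n}{2\sigma^2}\right)^{-1}
  && \text{posterior variance after } n \text{ users per arm} \\
s_n^2 &= \sigma_0^2 - \sigma_1^2
  && \text{so the not-yet-observed posterior mean } \mu_1 \sim N(\mu_0,\, s_n^2) \\
\mathrm{EVSI}(n) &= N\left(\mathbb{E}\left[\max(\mu_1, 0)\right] - \max(\mu_0, 0)\right) \\
  &= N\left(s_n\,\phi\!\left(\tfrac{\mu_0}{s_n}\right) + \mu_0\,\Phi\!\left(\tfrac{\mu_0}{s_n}\right) - \max(\mu_0, 0)\right)
\end{aligned}
```

Here φ and Φ are the standard normal density and CDF. EVSI(n) rises with n but is capped by the value of perfect information (the same expression with s_n replaced by σ₀), so any cost of waiting that keeps growing with test length eventually outweighs it.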
Expected Value of Sample Information
[Figure: Expected Value of Sample Information (y-axis ticks: $0, $10K, $200K)]
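A minimal sketch of "quantify our impatience as part of test design": compute EVSI for each candidate test length under the hypothetical normal-normal model (normal prior on per-user lift, launch B only if the posterior mean lift is positive), subtract an assumed linear cost per day of waiting, and pick the length with the highest net expected value. Every number below (prior, outcome spread, traffic, future users, daily cost) is made up for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def evsi(n_per_arm, prior_mean, prior_sd, outcome_sd, future_users):
    """Expected value of sample information, in dollars, for a launch/keep decision.

    Hypothetical model: normal prior N(prior_mean, prior_sd**2) on the per-user
    lift of B over A, observed lift with variance 2 * outcome_sd**2 / n_per_arm,
    and a payoff of future_users * lift if B launches (0 if A is kept).
    """
    if n_per_arm <= 0:
        return 0.0
    sampling_var = 2 * outcome_sd ** 2 / n_per_arm
    post_var = 1 / (1 / prior_sd ** 2 + 1 / sampling_var)
    s = math.sqrt(max(prior_sd ** 2 - post_var, 0.0))  # sd of the posterior mean
    if s == 0:
        return 0.0
    z = prior_mean / s
    gain = s * STD_NORMAL.pdf(z) + prior_mean * STD_NORMAL.cdf(z) - max(prior_mean, 0.0)
    return future_users * gain


def best_test_length(max_days, users_per_arm_per_day, cost_per_day, **model):
    """Scan test lengths 0..max_days and return (days, net value) maximizing
    EVSI minus a linear cost of waiting (cost_per_day is a hypothetical input)."""
    candidates = [
        (days, evsi(days * users_per_arm_per_day, **model) - cost_per_day * days)
        for days in range(max_days + 1)
    ]
    return max(candidates, key=lambda c: c[1])


# Illustrative numbers only: none of these come from the talk.
model = dict(prior_mean=0.0, prior_sd=0.05, outcome_sd=5.0, future_users=1_000_000)
days, net = best_test_length(60, users_per_arm_per_day=2_000, cost_per_day=500, **model)
print(f"stop after {days} days; net expected value ${net:,.0f}")
```

The linear daily cost is the simplest stand-in for the opportunity, sampling, and maintenance costs listed earlier; a real design would replace it with whatever cost curve the team can defend.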
Sequential testing decisions
Don’t stop ’til you get enough
• We can do this again after collecting some data
• This changes the core decision from “is B > A?” to “should I stop or continue testing?”
• Good fit for A/B tests, where we collect data passively just by waiting
• Once more data isn’t worth it, launch the best observed variant; the inference problem is irrelevant (Claxton ’96)
  • This is our best information, and it’s not worth getting more
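A minimal sketch of the stop-or-continue decision, under the same hypothetical normal-normal model: update the belief about the lift with the data so far, ask whether one more block of traffic is worth its cost, and if not, launch whichever variant has the better posterior mean. Note this is a myopic one-step lookahead, a simplification we chose for brevity; it can stop earlier than a fully optimal sequential policy, which would weigh several future blocks at once. The observed lift, traffic, and block cost are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def update(prior_mean, prior_sd, observed_lift, n_per_arm, outcome_sd):
    """Normal-normal update of our belief about the per-user lift of B over A."""
    sampling_var = 2 * outcome_sd ** 2 / n_per_arm
    post_var = 1 / (1 / prior_sd ** 2 + 1 / sampling_var)
    post_mean = post_var * (prior_mean / prior_sd ** 2 + observed_lift / sampling_var)
    return post_mean, math.sqrt(post_var)


def evsi(n_per_arm, mean, sd, outcome_sd, future_users):
    """Dollar value of n_per_arm more users per arm, given current belief N(mean, sd**2)."""
    sampling_var = 2 * outcome_sd ** 2 / n_per_arm
    post_var = 1 / (1 / sd ** 2 + 1 / sampling_var)
    s = math.sqrt(max(sd ** 2 - post_var, 0.0))  # sd of the next posterior mean
    if s == 0:
        return 0.0
    z = mean / s
    gain = s * STD_NORMAL.pdf(z) + mean * STD_NORMAL.cdf(z) - max(mean, 0.0)
    return future_users * gain


def stop_or_continue(mean, sd, outcome_sd, future_users, block_users_per_arm, block_cost):
    """Myopic rule: run one more block only if its expected value exceeds its cost;
    otherwise stop and launch the variant with the better posterior mean."""
    if evsi(block_users_per_arm, mean, sd, outcome_sd, future_users) > block_cost:
        return "continue"
    return "launch B" if mean > 0 else "keep A"


# Illustrative numbers only: one week in at 2,000 users per arm per day,
# the observed lift is $0.02 per user; another week costs a hypothetical $3,500.
mean, sd = update(prior_mean=0.0, prior_sd=0.05, observed_lift=0.02,
                  n_per_arm=14_000, outcome_sd=5.0)
print(stop_or_continue(mean, sd, outcome_sd=5.0, future_users=1_000_000,
                       block_users_per_arm=14_000, block_cost=3_500))
```

When the rule says stop, no significance test is consulted: the posterior mean is our best information, which is the point of the slide.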
Lessons
What’s the Problem?
Going back to basics
There’s no silver bullet
You may have other problems; you’ll need
other solutions
Misuse of tools should prompt us to
rethink the problem
What are we actually trying to solve?
What are the costs, benefits, and risks?
What’s the Problem?
Going back to basics
Are we solving the problem, or treating
symptoms?
Launch-on-neutral, run-til-significant, peeking,
etc. are symptoms, not the root problem
Lots of advanced techniques speed up tests, but
don’t actually address reasons for impatience
Here, there, and everywhere
You’re soaking in it
This isn’t just about A/B testing
But it’s a domain where we have very
familiar tools close at hand
What are we here for?
People who solve problems for people are the luckiest people in the world
This is the fun stuff
This is where we add value as data
scientists
These problems aren’t solved
Try new stuff!
Carl Vogel
Principal Data Scientist
carl.vogel@babylist.com
Thanks!