A/B Testing: Avoiding
Common Pitfalls
Danielle Jabin

March 5, 2014

Make all the world’s music
available instantly to everyone,
wherever and whenever they
want it

As of March 5, 2014

Over 24 million active users


Access to more than 20 million
songs


But can we make it even
easier?

We can try…
…with A/B testing!

So…what’s an A/B test?

Control

A
Pitfall #1: Not limiting
your error rate

Source: assets.20bits.com/20081027/normal-curve-small.png

What if I flip a coin 100 times and get 51 heads?

What if I flip a coin 100 times and get 5 heads?

The p-value measures how likely it
is to obtain a value at least as
extreme as the one observed
under a given distribution
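
As a rough illustration of the coin-flip questions above, the sketch below computes an exact two-sided p-value for a fair coin. The function name `two_sided_p_value` is hypothetical, and the two-sided definition (outcomes at least as far from the expected count as the observed one) is an assumption the slides do not spell out.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Probability, for a fair coin, of a head count at least as far
    from flips / 2 as the observed one (exact binomial sum)."""
    distance = abs(heads - flips / 2)
    # Count every outcome that is at least as extreme as the observation.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips


for observed in (51, 5):
    print(observed, "heads:", two_sided_p_value(observed, 100))
```

Under these assumptions, 51 heads gives a p-value of about 0.92 (entirely unremarkable), while 5 heads gives a value so small that chance alone is not a plausible explanation.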

If there is a low likelihood that a
change is due to chance alone,
we call our results statistically
significant

What if I flip a coin 100 times and get 5 heads?

The threshold for statistical significance is set by alpha
● alpha levels of 5% and 1% are most commonly used
– Equivalently: P(significant), the chance of a significant result when there is no real difference, = .05 or .01

Each alpha has a corresponding Z-score

alpha    Z-score (two-sided test)
.10      1.65
.05      1.96
.01      2.58
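
The Z-scores in the table can be recovered from the standard normal distribution. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the function name `critical_z` is hypothetical:

```python
from statistics import NormalDist


def critical_z(alpha, two_sided=True):
    """Z-score beyond which a standard normal value counts as significant."""
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3))
```

The exact value for alpha = .10 is about 1.645, which is why it appears as 1.64 or 1.65 depending on how a table rounds.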

The Z-score tells us how far a
particular value is from the
mean, measured in standard deviations
(and what the corresponding
likelihood is)

Source: assets.20bits.com/20081027/normal-curve-small.png

Compute the Z-score at the end of the test

Standard deviation (σ) tells us
how spread out the numbers
are
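
One way to compute a Z-score for a difference in means at the end of a test is sketched below. The transcript does not include a formula for this step, so the standard unpooled two-sample form is assumed here, and the name `z_score_for_means` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def z_score_for_means(control, variant):
    """Difference in sample means divided by its standard error."""
    # Standard error of the difference, from each group's sample
    # standard deviation and size (assumption: unpooled variances).
    standard_error = sqrt(
        stdev(control) ** 2 / len(control)
        + stdev(variant) ** 2 / len(variant)
    )
    return (mean(variant) - mean(control)) / standard_error


# Tiny made-up samples, only to show the call.
print(z_score_for_means([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

The resulting value is then compared with the Z-score for the chosen alpha level.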

To lock in error rates before you
start, fix your sample size

What should my sample size be?
● To lock in error rates before you start a test, fix your sample size

$$n = \frac{2\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{\text{difference}^2}$$

– n: sample size in each group (assumes equal sized groups)
– σ: standard deviation of the outcome variable
– Z_β: represents the desired power (typically .84 for 80% power)
– Z_α/2: represents the desired level of statistical significance (typically 1.96)
– difference: effect size (the difference in means)

Source: www.stanford.edu/~kcobb/hrp259/lecture11.ppt
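
The sample size formula translates directly into code. A minimal sketch, with the hypothetical name `sample_size_per_group`; the Z-scores are passed in as the rounded table values rather than computed, which is an assumption made to keep the arithmetic easy to follow.

```python
def sample_size_per_group(sigma, difference, z_alpha_half, z_beta):
    """n = 2 * sigma^2 * (z_beta + z_alpha_half)^2 / difference^2."""
    return 2 * sigma ** 2 * (z_beta + z_alpha_half) ** 2 / difference ** 2


# sigma = 10, difference in means = 1, 80% power (z_beta = 0.84)
for z_alpha_half in (1.96, 2.58):
    n = sample_size_per_group(
        sigma=10, difference=1, z_alpha_half=z_alpha_half, z_beta=0.84
    )
    print(z_alpha_half, round(n))
```

With these inputs the result is 1568 per group at alpha = .05 and 2339 per group at alpha = .01. Halving the detectable difference quadruples the required sample, since the difference is squared in the denominator.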

Recap: running an A/B test
● Compute your sample size
– Using alpha, beta, standard deviation of your metric, and effect size
● Run your test, and stop only once you’ve reached the fixed sample size stopping point
● Compute your z-score and compare it with the z-score for the chosen alpha level

Control

A

Resulting Z-score?

33.3
Pitfall #2: Stopping your
test before the fixed
sample size stopping
point

Sample size for varying alpha levels
● With σ = 10, difference in means = 1, two-sided test

alpha    power    Sample size per group
.10      .80      1230
.05      .80      1568
.01      .80      2339

Let’s see some numbers
● 1,000 experiments with 200,000 fake participants divided randomly into two groups, both receiving the exact same version, A, with a 3% conversion rate

Significance level reached    Stop at first point of significance    Ended as significant
90%                           654 of 1,000                           100 of 1,000
95%                           427 of 1,000                           49 of 1,000
99%                           146 of 1,000                           14 of 1,000

Source: destack.home.xs4all.nl/projects/significance/
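
The effect of peeking can be reproduced with a small A/A simulation. The sketch below is a scaled-down illustration, not the code behind the cited numbers: the function names, the number of interim checks, the pooled two-proportion Z-score, and the group sizes are all assumptions.

```python
import math
import random


def z_for_proportions(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-score; 0.0 if the standard error is zero."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_a / n_a - conv_b / n_b) / se


def simulate_peeking(n_experiments, n_per_group, rate, n_checks, z_crit, seed=0):
    """Run A/A tests and count how many were significant at any interim
    check versus how many were significant at the final check."""
    rng = random.Random(seed)
    step = n_per_group // n_checks
    ever_significant = 0
    significant_at_end = 0
    for _ in range(n_experiments):
        conv_a = conv_b = seen = 0
        crossed = False
        z = 0.0
        for _ in range(n_checks):
            # Both groups get the same version, so the same conversion rate.
            for _ in range(step):
                conv_a += rng.random() < rate
                conv_b += rng.random() < rate
            seen += step
            z = z_for_proportions(conv_a, seen, conv_b, seen)
            if abs(z) > z_crit:
                crossed = True
        ever_significant += crossed
        significant_at_end += abs(z) > z_crit
    return ever_significant, significant_at_end


ever, final = simulate_peeking(
    n_experiments=200, n_per_group=2000, rate=0.03, n_checks=20, z_crit=1.645
)
print(f"significant at some check: {ever} of 200, at the end: {final} of 200")
```

Because the final check is also one of the interim checks, the first count can never be smaller than the second; the gap between them is the cost of stopping at the first significant result.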

Remedies
● Don’t peek
● Okay, maybe you can peek, but don’t stop or make a decision before you reach the fixed
sample size stopping point
● Sequential sampling

Control

A

B
Pitfall #3: Making
multiple comparisons in
one test

A test can be one of two things: significant or not significant
● P(significant) + P(not significant) = 1
● Let’s take an alpha of .05
– P(significant) = .05
– P(not significant) = 1 – P(significant) = 1 - .05 = .95

What about for two comparisons?
● P(at least 1 significant) = 1 - P(none of the 2 are significant)
● P(none of the 2 are significant) = P(not significant)*P(not significant) = .95*.95 = .9025 (assuming the two comparisons are independent)
● P(at least 1 significant) = 1 - .9025 = .0975

What about for two comparisons?

● That’s almost 2x (1.95x, to be precise) your .05
significance rate!

And it just gets worse…

                 P(at least 1 significant)    An increase of…
5 variations     1 – (1-.05)^5 = .226         4.5x
10 variations    1 – (1-.05)^10 = .401        8.0x
20 variations    1 – (1-.05)^20 = .642        12.8x
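
The growth in false positives with the number of comparisons can be checked in a few lines. A minimal sketch, assuming independent comparisons and no real effect in any of them; the name `family_wise_error_rate` is hypothetical.

```python
def family_wise_error_rate(alpha, comparisons):
    """P(at least 1 significant) across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


for k in (2, 5, 10, 20):
    rate = family_wise_error_rate(0.05, k)
    print(f"{k} comparisons: {rate:.4f} ({rate / 0.05:.2f}x)")
```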

How can we remedy this?
● Bonferroni correction
– Divide P(significant), your alpha, by the number of variations you are testing, n
– alpha/n becomes the new level of statistical significance

So what about two comparisons now?
● Our new P(significant) = .05/2 = .025
● Our new P(not significant) = 1 - .025 = .975
● P(at least 1 significant) = 1 - P(none of the 2 are significant)
● P(none of the 2 are significant) = P(not significant)*P(not significant) = .975*.975 = .9506
● P(at least 1 significant) = 1 - .9506 = .0494

P(at least 1 significant) stays under .05

                 Corrected alpha    P(at least 1 significant)
5 variations     .05/5 = .01        1 – (1-.01)^5 = .049
10 variations    .05/10 = .005      1 – (1-.005)^10 = .049
20 variations    .05/20 = .0025     1 – (1-.0025)^20 = .049
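
The Bonferroni correction and its effect on the overall error rate can be sketched as follows. Both function names are hypothetical, and independent comparisons are assumed.

```python
def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level after the Bonferroni correction."""
    return alpha / comparisons


def family_wise_error_rate(alpha, comparisons):
    """P(at least 1 significant) across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


for k in (2, 5, 10, 20):
    corrected = bonferroni_alpha(0.05, k)
    overall = family_wise_error_rate(corrected, k)
    print(f"{k} comparisons: alpha {corrected:.4f}, overall {overall:.4f}")
```

In each case the overall rate lands just under the original .05, at the price of a stricter threshold for every individual comparison.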
Questions?
Appendix

A/B test steps:
1. Decide what to test
2. Determine a metric to test
3. Formulate your hypothesis
   1. Select an effect size threshold: what change of the metric would make a rollout worthwhile?
4. Calculate sample size (your stopping point)
   1. Decide your Type I (alpha) and Type II (beta) error levels and the corresponding z-scores
   2. Determine the standard deviation of your metric
5. Run your test, and stop only once you’ve reached the fixed sample size stopping point
6. Compute your z-score and compare it with the z-score for your chosen alpha level

Type I and Type II error
● Type I error: incorrectly reject a true null hypothesis
– alpha
● Type II error: fail to reject a false null hypothesis
– beta
– Power: 1 - beta

Z-score reference table

alpha    One-sided test    Two-sided test
.10      1.28              1.65
.05      1.65              1.96
.01      2.33              2.58

Z-score for proportions (e.g. conversion)
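
The formula for this slide does not appear in the transcript. A commonly used form, assumed here rather than taken from the deck, is the pooled two-proportion Z-score, where $x_A, x_B$ are the conversion counts, $n_A, n_B$ the group sizes, and $\hat{p}_A = x_A / n_A$, $\hat{p}_B = x_B / n_B$ the observed conversion rates (all symbols are illustrative):

```latex
z = \frac{\hat{p}_A - \hat{p}_B}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}},
\qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}
```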

Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usage
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptx
 
RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors Data
 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
 

A/B Testing Pitfalls and Lessons Learned at Spotify

  • 1. A/B Testing: Avoiding Common Pitfalls. Danielle Jabin, March 5, 2014
  • 2. Make all the world’s music available instantly to everyone, wherever and whenever they want it
  • 3. As of March 5, 2014
  • 4. Over 24 million active users (as of March 5, 2014)
  • 5. Access to more than 20 million songs (as of March 5, 2014)
  • 7. But can we make it even easier?
  • 8. We can try… with A/B testing!
  • 11. Pitfall #1: Not limiting your error rate
  • 13. What if I flip a coin 100 times and get 51 heads?
  • 14. What if I flip a coin 100 times and get 5 heads?
  • 16. The likelihood of obtaining a value at least as extreme as the one observed, under a given distribution, is measured by its p-value
  • 17. If a change this large is unlikely to arise from chance alone, we call our results statistically significant
  • 18. What if I flip a coin 100 times and get 5 heads?
  • 19. Statistical significance is measured by alpha
        ● alpha levels of 5% and 1% are most commonly used
          – Alternatively: P(significant) = .05 or .01
  • 20. Each alpha has a corresponding Z-score
        alpha   Z-score (two-sided test)
        .10     1.65
        .05     1.96
        .01     2.58
  • 21. The Z-score tells us how far a particular value is from the mean (and what the corresponding likelihood is)
  • 23. Compute the Z-score at the end of the test
  • 24. Standard deviation (σ) tells us how spread out the numbers are
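The coin-flip questions from the earlier slides can be worked through with a Z-score. The following is a minimal sketch, assuming a fair coin as the null hypothesis and the usual normal approximation to the binomial; the function name `coin_flip_z_score` is hypothetical and not part of the deck.

```python
import math


def coin_flip_z_score(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """Z-score of an observed head count, assuming a coin with P(heads) = p_heads."""
    mean = flips * p_heads                                # expected number of heads
    std_dev = math.sqrt(flips * p_heads * (1 - p_heads))  # binomial standard deviation
    return (heads - mean) / std_dev


for heads in (51, 5):
    z = coin_flip_z_score(heads, 100)
    # 1.96 is the two-sided Z-score for alpha = .05 from the table above.
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{heads} heads out of 100: z = {z:+.1f} ({verdict} at alpha = .05)")
```

With 100 flips the standard deviation is 5 heads, so 51 heads sits well inside the expected range while 5 heads is nine standard deviations below the mean.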
  • 26. To lock in error rates before you start, fix your sample size
  • 27. What should my sample size be?
        ● To lock in error rates before you start a test, fix your sample size:
          $$ n = \frac{2\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{\text{difference}^2} $$
          – n: sample size in each group (assumes equal-sized groups)
          – σ: standard deviation of the outcome variable
          – Z_β: represents the desired power (typically .84 for 80% power)
          – Z_{α/2}: represents the desired level of statistical significance (typically 1.96)
          – difference: effect size (the difference in means)
        Source: www.stanford.edu/~kcobb/hrp259/lecture11.ppt
  • 28. Recap: running an A/B test
        ● Compute your sample size
          – Using alpha, beta, standard deviation of your metric, and effect size
        ● Run your test! But stop once you’ve reached the fixed sample size stopping point
        ● Compute your z-score and compare it with the z-score for the chosen alpha level
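The sample size step of the recap can be sketched in a few lines. This is an illustration under stated assumptions rather than the speaker's own code: the function name `sample_size_per_group` is hypothetical, and the defaults use the rounded Z-scores quoted in the deck (1.96 for a two-sided alpha of .05, .84 for 80% power).

```python
def sample_size_per_group(sigma: float, difference: float,
                          z_alpha_half: float = 1.96,
                          z_beta: float = 0.84) -> float:
    """Per-group sample size: n = 2 * sigma^2 * (Z_beta + Z_alpha/2)^2 / difference^2."""
    return 2 * sigma ** 2 * (z_beta + z_alpha_half) ** 2 / difference ** 2


# Example values: standard deviation 10, smallest difference in means worth detecting 1.
n = sample_size_per_group(sigma=10, difference=1)
print(f"Participants needed in each group: {round(n)}")
```

Rounding to the nearest integer is used here for readability; rounding up is the more conservative choice when fixing the stopping point.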
  • 32. Pitfall #2: Stopping your test before the fixed sample size stopping point
  • 33. Sample size for varying alpha levels
        ● With σ = 10, difference in means = 1 (two-sided test)
        alpha   power   Sample size per group
        .10     .80     1230
        .05     .80     1568
        .01     .80     2339
  • 34. Let’s see some numbers
        ● 1,000 experiments with 200,000 fake participants divided randomly into two groups, both receiving the exact same version, A, with a 3% conversion rate
                                    Stop at first point of significance   Ended as significant
        90% significance reached    654 of 1,000                          100 of 1,000
        95% significance reached    427 of 1,000                          49 of 1,000
        99% significance reached    146 of 1,000                          14 of 1,000
        Source: destack.home.xs4all.nl/projects/significance/
  • 35. Remedies
        ● Don’t peek
        ● Okay, maybe you can peek, but don’t stop or make a decision before you reach the fixed sample size stopping point
        ● Sequential sampling
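The effect of peeking can be illustrated with a small A/A simulation. This is a scaled-down sketch, not a reproduction of the cited experiment: the experiment count, group size, peeking interval, seed, and the pooled two-proportion z-test are all assumptions chosen so the script runs quickly, and the function names are hypothetical.

```python
import math
import random


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z-statistic; returns 0.0 when no variance is observed."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_a / n_a - conv_b / n_b) / se


def simulate_aa_tests(experiments: int = 200, per_group: int = 2000,
                      check_every: int = 100, rate: float = 0.03,
                      z_crit: float = 1.96, seed: int = 0) -> tuple[int, int]:
    """Count A/A tests that ever looked significant vs. those significant at the end."""
    rng = random.Random(seed)
    ever_significant = 0
    ended_significant = 0
    for _ in range(experiments):
        conv_a = conv_b = 0
        crossed = False
        z = 0.0
        for n in range(1, per_group + 1):
            # Both groups receive the same version, so any "win" is a false positive.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % check_every == 0:  # a peek at the running results
                z = two_proportion_z(conv_a, n, conv_b, n)
                if abs(z) > z_crit:
                    crossed = True
        ever_significant += crossed
        ended_significant += abs(z) > z_crit  # z from the final peek at n = per_group
    return ever_significant, ended_significant


peeked, final = simulate_aa_tests()
print(f"Would have stopped at a peek: {peeked} of 200")
print(f"Significant at the fixed stopping point: {final} of 200")
```

Because the final check is also one of the peeks, the first count can never be smaller than the second; the gap between them is the extra false-positive risk introduced by stopping early.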
  • 37. Pitfall #3: Making multiple comparisons in one test
  • 38. A test can be one of two things: significant or not significant
        ● P(significant) + P(not significant) = 1
        ● Let’s take an alpha of .05
          – P(significant) = .05
          – P(not significant) = 1 – P(significant) = 1 - .05 = .95
  • 39. What about for two comparisons?
        ● P(at least 1 significant) = 1 - P(none of the 2 are significant)
        ● P(none of the 2 are significant) = P(not significant)*P(not significant) = .95*.95 = .9025
        ● P(at least 1 significant) = 1 - .9025 = .0975
  • 40. What about for two comparisons?
        ● That’s almost 2x (1.95x, to be precise) your .05 significance rate!
  • 41. And it just gets worse…
                         P(at least 1 significant)    An increase of…
        5 variations     1 – (1-.05)^5 = .23          4.6x
        10 variations    1 – (1-.05)^10 = .40         8x
        20 variations    1 – (1-.05)^20 = .64         12.8x
  • 42. How can we remedy this?
        ● Bonferroni correction
          – Divide P(significant), your alpha, by the number of variations you are testing, n
          – alpha/n becomes the new level of statistical significance
  • 43. So what about two comparisons now?
        ● Our new P(significant) = .05/2 = .025
        ● Our new P(not significant) = 1 - .025 = .975
        ● P(at least 1 significant) = 1 - P(none of the 2 are significant)
        ● P(none of the 2 are significant) = P(not significant)*P(not significant) = .975*.975 = .951
        ● P(at least 1 significant) = 1 - .951 = .049
  • 44. P(significant) stays under .05
                         Corrected alpha     P(at least 1 significant)
        5 variations     .05/5 = .01         1 – (1-.01)^5 = .049
        10 variations    .05/10 = .005       1 – (1-.005)^10 = .049
        20 variations    .05/20 = .0025      1 – (1-.0025)^20 = .049
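The multiple-comparison arithmetic above can be checked with a short script. This sketch assumes the comparisons are independent, as the slides' multiplication of probabilities implies; the function names are hypothetical.

```python
def family_wise_error_rate(alpha: float, comparisons: int) -> float:
    """P(at least 1 significant) across independent comparisons with no real effect."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance level after the Bonferroni correction."""
    return alpha / comparisons


for k in (2, 5, 10, 20):
    uncorrected = family_wise_error_rate(0.05, k)
    corrected = family_wise_error_rate(bonferroni_alpha(0.05, k), k)
    print(f"{k:>2} comparisons: uncorrected {uncorrected:.3f}, "
          f"Bonferroni-corrected {corrected:.3f}")
```

The corrected rate stays just below .05 for every number of comparisons, at the cost of a stricter threshold for each individual comparison.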
  • 47. A/B test steps:
        1. Decide what to test
        2. Determine a metric to test
        3. Formulate your hypothesis
           1. Select an effect size threshold: what change of the metric would make a rollout worthwhile?
        4. Calculate sample size (your stopping point)
           1. Decide your Type I (alpha) and Type II (beta) error levels and the corresponding z-scores
           2. Determine the standard deviation of your metric
        5. Run your test! But stop once you’ve reached the fixed sample size stopping point
        6. Compute your z-score and compare it with the z-score for your chosen alpha level
  • 48. Type I and Type II error
        ● Type I error: incorrectly reject a true null hypothesis
          – alpha
        ● Type II error: fail to reject a false null hypothesis
          – beta
          – Power: 1 - beta
  • 49. Z-score reference table
        alpha   One-sided test   Two-sided test
        .10     1.28             1.65
        .05     1.65             1.96
        .01     2.33             2.58
  • 50. Z-score for proportions (e.g. conversion)
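The formula on this last slide did not survive in the transcript. One commonly used form is the pooled two-proportion z-statistic, sketched below as an assumption rather than as the deck's exact formula; the function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def z_score_for_proportions(conv_control: int, n_control: int,
                            conv_variant: int, n_variant: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in two conversion rates."""
    p_control = conv_control / n_control
    p_variant = conv_variant / n_variant
    # Pool both groups to estimate the conversion rate under "no difference".
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    if se == 0:
        return 0.0, 1.0
    z = (p_variant - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 1,000 control users convert vs. 45 of 1,000 variant users.
z, p = z_score_for_proportions(30, 1000, 45, 1000)
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```

The resulting z can be compared against the reference table above, for example 1.96 for a two-sided test at alpha = .05.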