A/B testing
Shlomo Lahav
The problem

Measuring the effect of multiple alternatives
on the performance over a given population.

Performance

A list of objective measurements

Possible solutions

• A model that describes the results and evaluates the marginal effect of the alternatives
• Test the alternatives side by side while all the rest is equal

Example

• The problem: testing two different layouts of a web page (A and B)
• Population: visitors/visits
• Performance: conversion rate
• Alternatives: two different layouts
• Objective: to find the better layout and assess the performance difference

What does "all the rest being equal" mean?

• Fairness: every member of the population has the same probability of being allocated to A.
• For each member, any other decision is independent of the test allocation (A/B).
• Observations are independent.
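Population: Visitor vs. visit

| Population | Measurement                      | Issues                                                         |
|------------|----------------------------------|----------------------------------------------------------------|
| Visitor    | Visit conversion rate            | Independence is violated                                       |
| Visitor    | Lifetime conversions per visitor |                                                                |
| Visit      | Visit conversion rate            | A visitor may be exposed to both A and B (in different visits) |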

Errors

• When we compare a test alternative to the control alternative:
• False positive: declaring the test alternative the winner by mistake
• False negative: declaring the control the winner by mistake

When do we end the test?

• After a predefined period or number of observations.
• When the difference is significant.

What does "all the rest being equal" mean?

• Fairness: every member of the population has the same probability of being allocated to A.
• For each member, any other decision is independent of the test allocation (A/B).
• Observations are independent.

Example

• We want to test two alternatives and select the better one.
• The results are: CR(A)=9.21%, CR(B)=11.93%. The win of B is statistically significant (p-value<5%).
• We need to estimate the gain of B vs. A.
• Is our estimate of 2.72% (11.93% - 9.21%) a fair estimate?
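
The significance check in this example could be done with a pooled two-proportion z-test, sketched below. The slides do not say which test was used or how many visitors each alternative received, so both the test and any counts passed to `two_proportion_z_test` (a hypothetical helper name) are assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    Returns (observed gain of B over A, z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts chosen only to match the rates on the slide.
gain, z, p_value = two_proportion_z_test(921, 10_000, 1_193, 10_000)
print(f"gain={gain:.4f} z={z:.2f} p={p_value:.3g}")
```

The test answers whether B differs from A; it does not say that the observed 2.72% is an unbiased estimate of the gain, which is the question the slide raises.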

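Results

|        | p-value | Rate  | A      | B      | Gain B over A |
|--------|---------|-------|--------|--------|---------------|
| Actual |         |       | 10.00% | 11.00% | 1.00%         |
| B wins | 5%      | 92.5% | 9.21%  | 11.93% | 2.72%         |
| A wins | 5%      | 7.5%  | 13.71% | 7.61%  | -6.10%        |
| B wins | 1%      | 98.5% | 9.59%  | 11.43% | 1.84%         |
| A wins | 1%      | 1.5%  | 14.94% | 7.05%  | -7.89%        |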
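
A table like the one above can be approximated by simulation. The slides do not state the sample size or the test used, so `n=1000` visitors per alternative and the pooled two-sided z-test are assumptions, and the numbers produced will differ from the table. The pattern should match, though: among significant "B wins" results, the average observed gain overstates the true gain of 1%.

```python
import math
import random


def simulate_conditional_gain(p_a=0.10, p_b=0.11, n=1000, trials=2000,
                              alpha=0.05, seed=0):
    """Repeat the same A/B test many times and summarise only the
    runs that ended with a significant difference."""
    rng = random.Random(seed)
    gains = {"A wins": [], "B wins": []}
    for _ in range(trials):
        conv_a = sum(rng.random() < p_a for _ in range(n))
        conv_b = sum(rng.random() < p_b for _ in range(n))
        diff = (conv_b - conv_a) / n
        pooled = (conv_a + conv_b) / (2 * n)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n)
        if se == 0:
            continue  # no conversions at all, nothing to test
        p_value = math.erfc(abs(diff / se) / math.sqrt(2))
        if p_value < alpha:
            gains["B wins" if diff > 0 else "A wins"].append(diff)
    significant = sum(len(g) for g in gains.values())
    return {
        outcome: {"share": len(g) / significant,
                  "mean_gain": sum(g) / len(g)}
        for outcome, g in gains.items() if g
    }


for outcome, stats in simulate_conditional_gain().items():
    print(outcome, f"share={stats['share']:.1%}",
          f"mean gain={stats['mean_gain']:.2%}")
```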

Selection bias

• An A/B test is conducted between A1, A2, …, An.
• After the test is completed, we select Ak.
• Should we expect Ak to perform as it did during the test?
• Does the test outcome (the rank of k) affect our expectation?
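
The questions above can be explored with a small simulation in which all alternatives are truly identical, so any difference between them is noise. The setup (5 alternatives, 1000 visitors each, a true rate of 10%) and the function name `winner_overestimate` are assumptions made for illustration.

```python
import random


def winner_overestimate(true_rate=0.10, n_alternatives=5, visitors=1000,
                        trials=300, seed=1):
    """Average amount by which the best-looking alternative's observed
    conversion rate exceeds its true rate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        observed = [
            sum(rng.random() < true_rate for _ in range(visitors)) / visitors
            for _ in range(n_alternatives)
        ]
        # Select the top-ranked alternative, as we would after a test.
        total += max(observed) - true_rate
    return total / trials


print(f"average overestimate of the winner: {winner_overestimate():.2%}")
```

The selected alternative looks better during the test than it really is, simply because it was chosen for having the highest observed rate.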

What else can go wrong?

• Independence is not maintained (traffic, changes, etc.).
• Fairness is handled by random allocation, which can be biased due to chance.
• The effective significance level is usually higher than planned (because of continuous evaluation), which results in a higher false positive rate.
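
The last point can be illustrated with simulated A/A tests, where both arms share the same true rate and every "significant" result is a false positive. The batch size, number of looks, and pooled z-test below are assumptions; the sketch compares judging the test once at the planned end with stopping at the first significant interim look on the same data.

```python
import math
import random


def is_significant(conv_a, conv_b, n, alpha):
    """Two-sided pooled z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    z = (conv_b - conv_a) / n / se
    return math.erfc(abs(z) / math.sqrt(2)) < alpha


def false_positive_rates(looks=10, batch=200, rate=0.10, trials=400,
                         alpha=0.05, seed=2):
    """Return (rate when tested once at the end,
               rate when stopping at the first significant look)."""
    rng = random.Random(seed)
    fixed = peeking = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        seen_significant = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            significant = is_significant(conv_a, conv_b, n, alpha)
            seen_significant = seen_significant or significant
        peeking += seen_significant  # any look was significant
        fixed += significant         # only the final look counts
    return fixed / trials, peeking / trials


fixed, peeking = false_positive_rates()
print(f"single test at the end: {fixed:.1%}, continuous evaluation: {peeking:.1%}")
```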

How to control the traffic split?

• By percentage or round robin?
• Can we change the split?

Another example

• Need to test two design layouts in multiple locations, where each location has a different conversion rate.
• Different populations: use lifts and accumulate the lifts.
• How do we calculate the lift: A over B or B over A?

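Lifts

|         | A   | B   | Lift B over A | Lift A over B |
|---------|-----|-----|---------------|---------------|
|         | 8%  | 10% | 25%           | -20%          |
|         | 10% | 8%  | -20%          | 25%           |
| Average |     |     | 2.5%          | 2.5%          |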
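
The lift arithmetic can be checked directly. With lift defined as new/base - 1, the simple average of the two locations comes out at +2.5% in both directions, so each layout appears to beat the other. The log-ratio average at the end is one symmetric alternative; it is not from the slides and is shown only as an assumption-labelled option.

```python
import math
from statistics import mean


def lift(new, base):
    """Relative lift of `new` over `base`, e.g. 0.25 for +25%."""
    return new / base - 1


# (CR(A), CR(B)) for each of the two locations.
locations = [(0.08, 0.10), (0.10, 0.08)]

b_over_a = [lift(b, a) for a, b in locations]
a_over_b = [lift(a, b) for a, b in locations]

print(f"average lift B over A: {mean(b_over_a):+.1%}")
print(f"average lift A over B: {mean(a_over_b):+.1%}")

# Assumption, not from the slides: averaging log ratios is symmetric,
# because log(b/a) = -log(a/b).
mean_log_ratio = mean(math.log(b / a) for a, b in locations)
print(f"average log ratio B over A: {mean_log_ratio:+.4f}")
```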

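Change in split - Simpson's paradox

Conversion rate by visitor type:

|       | New | Returning |
|-------|-----|-----------|
| CR(A) | 6%  | 15%       |
| CR(B) | 5%  | 14%       |

Visitor mix, traffic split and resulting conversion rates:

|         | New | Returning | A   | B   | CR(A)  | CR(B)  |
|---------|-----|-----------|-----|-----|--------|--------|
| Weekday | 80% | 20%       | 90% | 10% | 7.80%  | 6.80%  |
| Weekend | 10% | 90%       | 50% | 50% | 14.10% | 13.10% |
| Total   |     |           |     |     | 10.05% | 12.05% |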
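
The totals in the table can be reproduced with a few lines of arithmetic. The sketch assumes equal visitor volume on weekdays and weekends; the slide does not state this, but it is the assumption under which the totals of 10.05% and 12.05% come out. All names in the code are illustrative.

```python
# Conversion rate of each alternative by visitor type (from the slide).
SEGMENT_CR = {
    "A": {"new": 0.06, "returning": 0.15},
    "B": {"new": 0.05, "returning": 0.14},
}

# Visitor mix and traffic split per period (from the slide).
PERIODS = {
    "weekday": {"mix": {"new": 0.80, "returning": 0.20},
                "split": {"A": 0.90, "B": 0.10}},
    "weekend": {"mix": {"new": 0.10, "returning": 0.90},
                "split": {"A": 0.50, "B": 0.50}},
}


def period_cr(alt, period):
    """Conversion rate of one alternative within one period."""
    mix = PERIODS[period]["mix"]
    return sum(mix[seg] * SEGMENT_CR[alt][seg] for seg in mix)


def total_cr(alt, traffic=None):
    """Overall conversion rate of one alternative across periods.

    traffic: relative visitor volume per period; equal volumes are
    assumed when it is omitted.
    """
    traffic = traffic or {period: 1.0 for period in PERIODS}
    visitors = {p: traffic[p] * PERIODS[p]["split"][alt] for p in PERIODS}
    conversions = sum(visitors[p] * period_cr(alt, p) for p in PERIODS)
    return conversions / sum(visitors.values())


for alt in ("A", "B"):
    print(alt, f"weekday={period_cr(alt, 'weekday'):.2%}",
          f"weekend={period_cr(alt, 'weekend'):.2%}",
          f"total={total_cr(alt):.2%}")
```

A is one point better than B in every row, yet B wins on the total, because the 90/10 weekday split sends most of A's traffic to the period with the lower-converting visitor mix.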

Can we remove alternatives

• Start with 3 alternatives (equal split)
• Remove one

[Figure: values at "start" and after "modify"]
start:  0, 0, 0.5, 0.5, 1, 1
modify: 0, 0, 0, 1, 1, 1

Multiple tests

• Is it valid to run multiple AB tests
simultaneously?


How can A/B testing go wrong?
