Limits of A/B Tests

A/B tests don’t give you perfect decisions.
No matter what you do, you’re never 100% certain.
If we’re not careful, winners aren’t really winners.
Your conversions go up… and then they come back down.

The Standard Solution

Run your test until you hit 95% statistical significance.
Go to getdatadriven.com if you need a significance calculator.
Martin Goodson’s PDF on poor testing methods: kiss.ly/bad-testing
This gives us the best data but not necessarily the best ROI.
So how far do we take this?

Simulation Time!

We modeled several A/B testing strategies.
Using Monte Carlo simulations, we tested different strategies over 1 million observations (people).
Will Kurt gets full credit for all this. @willkurt

The Scientist:
1. Pick the minimal improvement
2. Determine your sample size
3. Determine degree of certainty (95%)
4. Start test but don’t check it early
5. If results aren’t significant, keep control

Results for the Scientist:

The Reckless Marketer:
1. Waits until 80% significance
2. Calls a winner as soon as 80% gets hit

Results for the Reckless Marketer:

The Impatient Marketer:
1. Waits for 95% significance
2. Moves on to the next test after 500 people

Results for the Impatient Marketer:

The Realist:
1. Waits for 99% significance
2. Moves on to the next test after 2,000 people

Results for the Realist:

The Persistent Realist:
1. Waits for 99% significance
2. Moves on to the next test after 20,000 people

Results for the Persistent Realist:

The Blitz Realist:
1. Waits for 99% significance
2. Moves on to the next test after 200 people

Results for the Blitz Realist:

Let’s compare them using the area under the curve.

Don’t make decisions at less than 95% significance.
You’ll waste all the time you spend testing.

We have 3 viable strategies for making this work:
1. Be a scientist at 95%
2. Only make changes at 99%
3. Sloppy 95% but make it up in volume

Be a scientist when you have lots of data and resources:
1. Pick the minimal improvement
2. Determine your sample size
3. Determine degree of certainty (95%)
4. Start test but don’t check it early
5. If results aren’t significant, keep control

If you don’t have the data or resources to be a scientist, go fast at 99%.

And if you still want to play at 95% without being a scientist, never stop testing.

How We A/B Test

First, get volume to 4,000+ people/month.
Only make changes at 99% significance.
Let the test run at least 1 week before checking results.
If it’s not at 99% after two weeks, launch the next test.
If the next test isn’t ready, let the current test keep running while you build it.

The KISSmetrics A/B Testing Strategy
1. Get to 4,000 people/month for test
2. Only change the control if you reach 99%
3. Check results after 1 week
4. Launch the next test at 2 weeks
5. Let old tests run if you’re still building

This strategy isn’t perfect. It’s a balance between good data and speed.
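
The simulation code behind these slides isn’t shown, so the following is only a rough sketch of how the "wait for X% significance, move on after N people" strategies could be modeled with a Monte Carlo simulation. Everything in it is an assumption for illustration: the function names, the pooled two-proportion z-test used for "significance", the 5% starting conversion rate, the way each variant’s true rate is drawn, and the `min_per_arm` guard. Total conversions is used as a rough stand-in for the area under the conversion curve.

```python
import math
import random


def confidence_b_beats_a(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that B converts better than A (pooled z-test)."""
    if n_a == 0 or n_b == 0:
        return 0.5
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.5  # no conversions yet (or all converted): no evidence
    z = (conv_b / n_b - conv_a / n_a) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def run_strategy(threshold, max_per_test, total_people=20_000,
                 base_rate=0.05, min_per_arm=50, seed=0):
    """Run back-to-back tests until total_people visitors have been seen.

    Returns (final control conversion rate, total conversions).
    """
    rng = random.Random(seed)
    control = base_rate
    seen = 0
    conversions = 0
    while seen < total_people:
        # Assumption: each variant's true rate is the control's rate scaled
        # by a random factor, so roughly half of all variants are worse.
        variant = min(max(control * rng.gauss(1.0, 0.1), 0.001), 0.999)
        conv = [0, 0]
        n = [0, 0]
        for i in range(min(max_per_test, total_people - seen)):
            arm = i % 2  # alternate visitors: 0 = control, 1 = variant
            converted = rng.random() < (variant if arm else control)
            conv[arm] += converted
            n[arm] += 1
            conversions += converted
            seen += 1
            if min(n) < min_per_arm:
                continue  # assumption: don't peek before a minimum sample
            conf = confidence_b_beats_a(conv[0], n[0], conv[1], n[1])
            if conf >= threshold:
                control = variant  # variant declared the winner
                break
            if conf <= 1 - threshold:
                break  # control declared the winner
        # If the loop ran out, the visitor cap was hit: keep the control.
    return control, conversions


if __name__ == "__main__":
    strategies = {
        "Reckless Marketer": (0.80, 20_000),  # no cap: waits as long as needed
        "Impatient Marketer": (0.95, 500),
        "Realist": (0.99, 2_000),
        "Blitz Realist": (0.99, 200),
    }
    for name, (threshold, cap) in strategies.items():
        rate, total = run_strategy(threshold, cap)
        print(f"{name}: final rate {rate:.4f}, conversions {total}")
```

A single run is noisy. To compare strategies fairly you would average over many seeds and a much larger `total_people`, closer to the 1 million observations mentioned in the slides.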