This presentation by Anna Marie Clifton, Product Manager at Yammer, covers when to use A/B testing, how to implement it, and, most importantly, how to measure the results.
The content is aimed at software engineers who want to transition into product management, MBAs with finance or consulting backgrounds who want to work at high-tech companies as product managers, and project managers, marketers, and designers seeking opportunities in product management.
New Signup — sequence test
1. The rosy side of A/B testing
● 80% of users complete the sign up
● Release a new flow to all your users
● Measure again
● Now only 70% of users complete.
New Signup — A/B test
1. The rosy side of A/B testing
● 80% of users complete the sign up
● Release a new flow to 50% of your users — “Treatment”
● 70% of users in the treatment complete
● Meanwhile, only 60% of control users completed
Our roadmap
1. The rosy side of A/B testing
2. The grey side of A/B testing
3. Somewhere in the middle
Engineering issues:
2. The grey side of A/B testing
1. Takes longer to build
2. Can’t make changes in flight
3. Confusion about expected experience
4. Code cleanup isn’t always perfect
Product issues:
2. The grey side of A/B testing
1. It can take Forever™
2. Need tons of users
3. Lots of up front work
4. Lots of after-the-fact work
5. Geared toward small optimizations
6. Only useful in single-player mode
7. Can’t be scientific
Our roadmap
1. The rosy side of A/B testing
2. The grey side of A/B testing
3. Somewhere in the middle
In most talks, the speaker presents a nicely packaged take on some topic or other. The narrative is nice and clean, and at the end, you’ll understand the exact ideal way of doing something.
They may start with “Here’s an anecdote from when we were doing this poorly.” Or they might even share some hiccups along the way. But, in the end, they never leave the audience with any ambiguity. Any grey zone. I think this does everyone a disservice, especially an audience of product managers.
Most of our job is pushing back the boundaries of ambiguity. Finding that line beyond which things don’t make sense, and organizing information at the fringes into coherent narratives so there’s a bigger space of “knowns” than you had before.
Now, of course we can’t really be sure of the things that we “know”… Say 5 out of 5 people we user tested this experience with couldn’t find the entry point for creating new groups. Does that mean it’s too hard to find? … Probably. Almost certainly. But there’s always room for doubt.
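(A rough way to size that doubt, with numbers chosen purely for illustration: even if the entry point were actually findable by half of all users, the chance that five testers in a row all miss it is (1/2)^5 = 1/32, about 3%. Unlikely, but not impossible, which is exactly the sliver in question.)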
The problem is, if you focus on that sliver of doubt, you’ll never build anything, because you’ll spend all your time trying to squash that doubt into smaller and smaller pieces while your competitors are building product based on hunches and best guesses and totally obliterating your company and income.
So it’s your job as a PM to absorb as much of that ambiguity as possible, to shield your team from it so they have a clear path forward.
To that end, let’s put on our rose-colored glasses and take a journey through the world of A/B testing. After that, we’ll take those glasses off and look at some of the less pleasant realities.
A/B testing, also called split testing, is essentially a statistically valid way to see how good (or bad) your feature idea is.
The core principle is that you release two versions of your feature to two randomly assigned groups of users, then measure what those groups do relative to each other.
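To make that concrete, here is a minimal sketch of one common way to do the random split: deterministic, hash-based bucketing. Everything here (the function name, the 50/50 split, the example user ID) is an illustrative assumption, not something from the talk.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to "treatment" or "control".

    Hashing the user ID together with the experiment name makes the split
    effectively random across users but stable for any one user, so both
    arms run at the same time and nobody flips between experiences.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket_value = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash to [0, 1]
    return "treatment" if bucket_value < treatment_share else "control"

# Route a user through the new or old signup flow.
flow = assign_bucket("user-12345", "new-signup-flow")
```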
It’s important to note that the two groups have to see these alternate versions of the feature at the same time. It’s tempting to take a feature (say, a sign up flow) with a metric you want to move (like sign up completion rate), build a new flow, release it to 100% of your users, and measure the new completion rate. That’s called sequential testing. Here’s why it’s not A/B testing and not statistically valid:
Say you had a completion rate of 80% in the old design, and in the new experience you see that only 70% of users complete the flow. Womp womp. Looks like a loser of an idea: you just lost ten points of signup completion.
But maybe you didn’t. Maybe it was a holiday in France, or it rained in Ohio, or a competitor launched something great. There are so many external factors to consider, you can’t rely on tests in sequence.
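With both arms running at once, the comparison itself is simple statistics. Below is a minimal sketch of a two-proportion z-test using the deck’s 70%-vs-60% numbers; the 5,000-users-per-arm counts are invented for illustration, since the talk doesn’t give sample sizes.

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Two-sided z-test for a difference between two completion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled rate under the null hypothesis that both arms are the same.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# 70% of 5,000 treatment users vs 60% of 5,000 control users (counts are hypothetical).
z, p = two_proportion_z_test(3500, 5000, 3000, 5000)
# At these sizes the ten-point gap is decisively significant (p is essentially 0).
```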
This is the beauty of A/B testing. You can *know* how good your feature is. It’s like ______ {analogy}, and wow does it feel good.
After sorting through opposing opinions from two designers on your team, and trying to navigate tradeoffs between technical debt and velocity, and a thousand other things each week that have no “right” answer, having something definitive in hand feels like magic.
Notes from the rest of the talk:
● Retention
● Throw out early data
● Stats on how many users you need
● An Android project with 31k users
● How many users do you need? It comes down to what you measure, the base rate, what lift you want to detect, and your risk tolerance (see the sketch below).
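Those four inputs are essentially the parameters of a power calculation. Here is a minimal sketch using the standard normal-approximation formula for comparing two proportions; the 80% base rate and 2-point lift are illustrative, not numbers from the talk.

```python
from math import ceil
from scipy.stats import norm

def users_per_arm(base_rate: float, lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Rough sample size per arm to detect an absolute lift in a rate.

    base_rate: the metric's current value (what you measure, and its base rate).
    lift:      the smallest absolute change you care about detecting.
    alpha:     risk tolerance for a false positive.
    power:     chance of catching the lift if it's really there.
    """
    p1, p2 = base_rate, base_rate + lift
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(variance * ((z_alpha + z_beta) / (p2 - p1)) ** 2)

# Detecting a 2-point drop from an 80% base rate needs roughly 6,500 users per arm.
n = users_per_arm(0.80, -0.02)
```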
● Log events
● Data can’t tell you Why; that’s the reason we get paid a lot