The document discusses common mistakes in A/B testing and offers guidance on properly designing tests, interpreting results, and making decisions from them. It notes that the most common serious errors include ignoring multiple testing, choosing the wrong metrics, and using improper stopping rules. It emphasizes considering significance and risk during test design, focusing on actual key performance indicators such as profit, and recognizing that the risks in medical testing differ from those in business experimentation.
80. Analogy
Question: How likely is it that my analytics or site are broken?
Non-Answer: We only go a whole day with no conversions once every 2 months.
82. Interpreting the P-value
Question: How likely is it that this variation actually does nothing?
Non-Answer: We’d only see a difference this big 5% of the time.
83. Meanwhile in Industry Tools:
● “Chance to beat baseline”
● “We are 95% certain that the changes in test “B” will improve your conversion rate”
98. A Good A/B Test Result:
“10% Uplift, With 95% Significance”
99. But what about this?
“10% Uplift, With 60% Significance”
100. Jargon: P-Value
● The chance of a result at least this extreme if the null hypothesis is true
● E.g. p = 0.05 for 95% significance
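The definition above can be made concrete with a minimal sketch: a two-sided, two-proportion z-test on conversion counts, using only the standard library. The conversion numbers are made up for illustration; real tools (e.g. statsmodels) offer hardened versions of this.

```python
import math

def two_proportion_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates
    (two-proportion z-test, normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) under the standard normal: the chance of data
    # at least this extreme if the variation actually does nothing
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical data: baseline 500/10000, variation 550/10000 (a 10% uplift)
p = two_proportion_pvalue(500, 10_000, 550, 10_000)
```

With these illustrative numbers the p-value lands around 0.11, i.e. roughly "89% significance" in the deck's vocabulary: a 10% observed uplift that still has a sizeable chance of arising from a do-nothing variation.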
101. “10% Uplift, With 60% Significance”
● 40% chance of data at least this extreme if the variation is functionally identical to the baseline
102. “10% Uplift, With 60% Significance”
● 40% chance of data at least this extreme if the variation is functionally identical to the baseline
● The variation is still probably better than the baseline
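The 40% figure can be checked by simulation: run A/A tests in which the "variation" is functionally identical to the baseline, and count how often the p-value still clears the 60%-significance bar purely by chance. All parameters below (sample size, conversion rate, trial count) are illustrative.

```python
import math
import random

def pvalue(conv_a, n_a, conv_b, n_b):
    # Two-sided, two-proportion z-test (normal approximation)
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
n, rate, trials = 2_000, 0.05, 1_000
hits = 0
for _ in range(trials):
    # Both "arms" draw from the same 5% conversion rate: a pure A/A test
    a = sum(random.random() < rate for _ in range(n))
    b = sum(random.random() < rate for _ in range(n))
    if pvalue(a, n, b, n) <= 0.40:  # clears "60% significance"
        hits += 1
frac = hits / trials
# frac should come out near 0.40: identical variations reach the
# "60% significance" bar about 40% of the time by chance alone
```

This is why the deck's second bullet is only a "probably": a result at the 60%-significance level is weak evidence, since two identical variations produce something that extreme two times in five.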