2. Outline
● About Me
● Start with an example
● Awareness:
o Consumer, Business, Academic
● Learnings
o Experiments & Science, Psychology
● And a fresh, detailed example
3. About Me
20 years developing internet experiences
at eBay, Microsoft, smaller players
Studied Cognitive Science/Psychology
left PhD program in ’95, went back for a Masters
Example Learning: Clicking “Page 2” vs Next
indicates user intent.
4. Making Images Larger at eBay
● Images on search result page (“SERP”) increased from 160px to 220px
● Consistent results across tests in US, UK, and DE for millions of users
● Traditional metrics (search clickthrough, # of products viewed, etc.)
typically showed negative outcomes
● Actual outcome was tens of millions in incremental revenue
5. E-Commerce Experimentation
● Metrics
o Conversion Rate
o Total Revenue (Overall Evaluation Criterion, Kohavi)
Conversion Rate * Average Order Value
● Challenges
o Outliers: rare high-dollar transactions are valuable, but
not evenly distributed.
o Short term vs long term value
o Durability of findings & effect sizes
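A minimal sketch of how the metrics above fit together, assuming one spend total per user (0 for non-buyers) and treating each buyer's total as one order. The function name `summarize`, the `cap` parameter, and the sample numbers are hypothetical, not from the talk. Capping each user's total is one common way to limit the influence of rare high-dollar transactions; it is shown here as an illustration, not as the method used at eBay.

```python
from statistics import mean

def summarize(order_totals, cap=None):
    """Summarize one experiment arm.

    order_totals: one number per user, the amount spent (0.0 for non-buyers).
    cap: optional ceiling on each user's total, to limit outlier impact.
    Returns (conversion_rate, average_order_value, revenue_per_user).
    """
    if cap is not None:
        order_totals = [min(x, cap) for x in order_totals]
    buyers = [x for x in order_totals if x > 0]
    conversion_rate = len(buyers) / len(order_totals)
    average_order_value = mean(buyers) if buyers else 0.0
    # Revenue per user = Conversion Rate * Average Order Value
    revenue_per_user = conversion_rate * average_order_value
    return conversion_rate, average_order_value, revenue_per_user

# Hypothetical arm: 10 users, 3 buyers, one very large order.
arm = [0, 0, 0, 20, 30, 0, 0, 0, 0, 5000]
raw = summarize(arm)
capped = summarize(arm, cap=100)
print("raw revenue per user:   ", raw[2])
print("capped revenue per user:", capped[2])
```

A single large order dominates the raw figure; the capped figure is far less sensitive to it, at the cost of understating real revenue.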
6. Understanding Behavior
Regressing time to first click treats the
new result presentation as a sort of
repeated measures design.
+200 msec of evaluation time per result.
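A minimal sketch of the regression idea, assuming time to first click is regressed on the rank of the clicked result, so that the slope estimates evaluation time per result. The data below is synthetic and built with a 0.2 second slope purely for illustration; it is not the data from the talk, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic example: rank of first clicked result vs. seconds until that click.
positions = [1, 2, 3, 4, 5, 6, 7, 8]
seconds = [1.0 + 0.2 * p for p in positions]

slope, intercept = fit_line(positions, seconds)
print(f"estimated evaluation time per result: {slope * 1000:.0f} msec")
```

Comparing the fitted slope between control and treatment is one way to ask whether a new result presentation makes each result slower or faster to evaluate.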
13. Research Publications
● Methodology publications dominate
● Kohavi (MSFT) started publishing 2007
● CHI Workshop 2014
● Google 2010 / Facebook 2014
14. Information Retrieval Concentration
Much of the ongoing A/B work in published
research is driven by search
● Search is hard to evaluate
● Algorithms are highly amenable to A/B
o Transparent to user
o Cheap to permute
● Conferences: ACM WWW, SIGIR, KDD,
CIKM
19. Close but no Cigar
A/B in business is not science:
● Trading accuracy for velocity is OK in some
cases
● Creating a culture of testing is challenging
o Requires a shared baseline of skill in interpreting results
o User Experience & Design professionals are often
under-skilled
20. Iterative Learning
● Low cost of experiments promotes iteration
● Lack of control of online experiments
promotes discovery
● Triangulation across lab-based studies,
survey methods, and analytic baselines key
More: Designing and Deploying Online Field Experiments. Eytan Bakshy, Dean Eckles, Michael
Bernstein. WWW 2014.
21. Interactions are Rare?
Common practice is to run massively parallel
experiments
● Lightly segmented across user experiences
(e.g. search, registration, checkout)
● Interactions are also informative!
o I prefer small factorial (2x2, 2x3, etc)
Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. Proceedings 16th
Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC (2010), pp. 17-26
http://research.google.com/pubs/pub36500.html
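A minimal sketch of what a small 2x2 factorial gives you that two separate A/B tests do not: an estimate of the interaction. It assumes the mean of some metric is already known for each of the four cells; `factorial_2x2` and the cell values are hypothetical.

```python
def factorial_2x2(cell_means):
    """Estimate effects from a 2x2 factorial design.

    cell_means maps (a, b) to the mean metric in that cell, where a and b
    are 0 (control) or 1 (treatment) for factors A and B.
    Returns (main_effect_a, main_effect_b, interaction).
    """
    m00 = cell_means[(0, 0)]
    m10 = cell_means[(1, 0)]
    m01 = cell_means[(0, 1)]
    m11 = cell_means[(1, 1)]
    # Main effect: effect of one factor averaged over levels of the other.
    main_a = ((m10 - m00) + (m11 - m01)) / 2
    main_b = ((m01 - m00) + (m11 - m10)) / 2
    # Interaction: how much the effect of A changes when B is switched on.
    interaction = (m11 - m01) - (m10 - m00)
    return main_a, main_b, interaction

# Hypothetical conversion rates per cell.
cells = {(0, 0): 0.10, (1, 0): 0.12, (0, 1): 0.11, (1, 1): 0.16}
main_a, main_b, interaction = factorial_2x2(cells)
print(main_a, main_b, interaction)
```

A nonzero interaction here would mean the two changes work better together than their separate effects suggest, which is exactly the information lost when experiments are assumed to be independent.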
23. Design is Hard, Intuition Flawed
Industry success rate of A/B tests, while not
cleanly reported, is less than ⅓.
Causes: Technical issues, learning
experiments, incorrect intuitions on functionality
and design.
24. Change is Challenging
Practically, user change resistance is one of
the biggest problems for successful internet
companies evaluating new experiences.
Learnability and avoiding proactive
interference are key areas for research.
25. Micro-Economic Theory
Key Concepts
● Cost of Action
o Perceived Cost
o Predicted Cost
o Actual Cost
● Utility
o Prediction of Utility
o Actual Utility
● Orienting Reference: Azzopardi, L. (2014). Modeling Interaction with Economic Models of
Search. Proceedings of the 37th International ACM SIGIR Conference on
Research and Development in Information Retrieval.
27. Searchers go deep at RB
Aside: Single-user visualization is a very useful technique
when combined with large-scale analytics.
28. Faster Search at Redbubble
● 2nd and subsequent searches went from 4+ seconds to < 1 second
o By using “partial page updates” (e.g. AJAX) vs full page reloads
Results, two-sample t-test
Treated = users who did a search.
About 300k users per condition, 200k users treated.
1 of several ongoing tests.
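A minimal sketch of a two-sample comparison of this kind, using only the Python standard library. It uses Welch's unequal-variance form of the t statistic and a normal approximation for the p-value, which is reasonable only for large samples such as the user counts above; both choices are assumptions, not a description of the analysis actually run. `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_t(control, treatment):
    """Welch's two-sample t statistic with a large-sample p-value.

    Returns (t, p), where p is two-sided and uses the normal
    approximation to the t distribution (assumes large samples).
    """
    mean_c, mean_t = mean(control), mean(treatment)
    var_c, var_t = variance(control), variance(treatment)
    standard_error = sqrt(var_c / len(control) + var_t / len(treatment))
    t = (mean_t - mean_c) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Tiny hypothetical per-user metric values, only to show the call shape.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treatment = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, p_value = welch_t(control, treatment)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Note that only treated users (those who searched) can be affected, so comparing all users per condition dilutes the effect, while comparing only treated users requires that the treatment not change who searches.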
30. Micro-economic Explanation?
Users click more on the last position (or row).
Why? Why oh why?
The Ski Jump Hypothesis: People are making a locally rational decision, or
satisficing, between the last set of results and the next button.
32. Useful Links
Videos
● ACM CHI Tutorial: https://www.youtube.com/watch?v=jQDnBIeoN3E
● Planout (Facebook’s EXP Platform): https://www.youtube.com/watch?v=Ayd4sqPH2DE
● EXP Platform at Microsoft, Kohavi et al. http://www.exp-platform.com/Pages/default.aspx
Articles
● Wired Magazine 2012, The A/B Test: Inside the Technology That’s Changing the Rules of Business
● Obama Multivariate Button & Video test, https://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/
Research
● Facebook’s “Experimental evidence of massive-scale emotional contagion through social networks”,
http://www.pnas.org/content/111/24/8788.full.pd
● Micro-economic Behavioral Explanations, Citations of:
o Azzopardi, L. (2014). Modeling Interaction with Economic Models of Search. Proceedings of the 37th International ACM SIGIR
Conference on Research and Development in Information Retrieval.