Video available at: http://youtu.be/Q1T5J0KXUwY
At this very moment, Indeed is running more than one hundred A/B experiments. In previous @IndeedEng talks, we have discussed how we use A/B testing to develop better products.
In this tech talk, software engineer Matt Schemmel and product manager Tom Bergman describe Proctor, the system we developed to define and manage all of these experiments. They explain how we use Proctor to target users using data-driven rules, adjust experiments on-the-fly, and ensure clean results for multi-variate tests. Over time, Proctor has evolved from a system designed for managing experiments to one that manages overall system behavior through dynamic "feature toggle" functionality. Matt and Tom also share lessons we have learned from years of experimenting at web scale.
Matt Schemmel is a Senior Software Engineer working primarily on our Resume products.
Tom Bergman is a Product Manager currently working on our Aggregation systems. He previously helped evolve many of Indeed's data analysis tools, and also helped us launch and grow our sites in Japan, Korea, and China.
7. A/B Testing: Definition
A/B testing is an experimental
methodology comparing at least
two variants, a control group A
and test group B, in a controlled
experiment
8. A/B Testing Key Points
Test and Control Groups should be:
1. Unbiased
2. Independent
3. Representative
50. Selecting a Test Bucket
Good science requires good sampling:
● Independent
● Unbiased
Good user experience does, too:
● Fast
● Consistent
51. Round robin assignment
Assign each subsequent visitor to the next
bucket.
● Requires global state for "next bucket"
● Requires state for assigned buckets
✘Fast
~ Consistent
✓Independent ✓Unbiased
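To make the "global state" cost concrete, here is a minimal Java sketch of round-robin assignment. It assumes a single in-process counter; the class and method names are illustrative and do not come from Proctor.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical round-robin assigner (illustrative; not part of Proctor).
final class RoundRobinAssigner {
    private final int bucketCount;
    // The global "next bucket" state: every server handling traffic would have to share it.
    private final AtomicLong next = new AtomicLong();

    RoundRobinAssigner(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        this.bucketCount = bucketCount;
    }

    // Each call hands out the next bucket in sequence, wrapping around at bucketCount.
    int assignNextVisitor() {
        return (int) (next.getAndIncrement() % bucketCount);
    }
}
```

The sketch only covers picking the next bucket. Keeping a returning visitor in the same bucket would still require storing each assignment somewhere, which is the second bullet above.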
52. Randomized Assignment
At small scale, you might need round-robin
to ensure equal sample sizes.
At large scale, randomized assignment is
uniform enough.
? Fast
? Consistent
✓Independent ✓Unbiased
53. Roll the dice as needed
Select a bucket at random at the point of
execution.
✓Fast
✘Consistent
✓Independent ✓Unbiased
54. Roll Once and Cache in a Cookie
● Single-domain, Single-device
● N cookies: Hard to evolve
● One cookie: Fragile to edit
● Size scales with # experiments
~ Fast
~ Consistent
✓Independent ✓Unbiased
55. Roll Once and Cache in Session
● Consistent only to length of session
● Tied to one server / data-center
● Many apps don’t use sessions
~ Fast
✘Consistent
✓Independent ✓Unbiased
56. Roll Dice and Cache in DB
● DB hit on every request
● More infrastructure
✘Fast
~ Consistent
✓Independent ✓Unbiased
57. We can do better
Flaws stem from the need to record selected
buckets.
What if we didn't?
58. Don’t Record. Recalculate.
1. Assign each user a unique ID
2. Map that ID to a bucket
3. Store the ID, not the assignments
? Fast
? Consistent
? Independent ? Unbiased
59. Simple Mapping: Mod N
id mod N => bucket
Doesn’t work:
● The mapping should provide a uniform distribution;
mod N only assumes the IDs already have one.
● Limited bucket distributions
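One common way around both flaws is to hash the ID before mapping it, and to map the hash onto configurable ranges rather than N equal slots. The Java sketch below illustrates that idea under stated assumptions: the MD5 hash, the per-test salt, and the names `HashedBucketSelector`, `uniformValue`, and `selectBucket` are all illustrative choices, not a description of Proctor's actual algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical deterministic bucket selector (illustrative; not Proctor's implementation).
final class HashedBucketSelector {

    // Hashes salt + id and scales the first 32 bits of the digest into [0, 1).
    static double uniformValue(String testSalt, String id) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((testSalt + "|" + id).getBytes(StandardCharsets.UTF_8));
            long bits = ((digest[0] & 0xFFL) << 24)
                    | ((digest[1] & 0xFFL) << 16)
                    | ((digest[2] & 0xFFL) << 8)
                    | (digest[3] & 0xFFL);
            return bits / 4294967296.0; // divide by 2^32
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // ranges[i] is the share of traffic for bucket i; the shares are assumed to total 1.0.
    // Returns the index of the selected bucket.
    static int selectBucket(String testSalt, String id, double[] ranges) {
        double u = uniformValue(testSalt, id);
        double upper = 0.0;
        for (int i = 0; i < ranges.length; i++) {
            upper += ranges[i];
            if (u < upper) {
                return i;
            }
        }
        return ranges.length - 1; // guards against floating-point rounding in the total
    }
}
```

Because the result depends only on the salt, the ID, and the ranges, the same user lands in the same bucket on every request without any assignment being stored. Using a different salt per test keeps one test's grouping from lining up with another's.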
89. Division of Responsibilities
Define the Experiment: proctor data (Test Definition)
Apply the Experiment: Proctor Library in each product (Test Specification)
91. Product Test Specification lists active tests
References into the global pool:
"tests": [{
    "buttonBgColorTst": {
        "buckets": {
            "gray": 0,
            "blue": 1
        }
    }
}]
92. Apply the Experiment
On every request…
1. Select Groups
2. Render the Response
3. Log the Action
93. Determining Buckets in Code
On every request…
1. Collect identifiers
2. Select buckets for opted-in tests
94. Collect identifiers for all ID Types
// Product code
String cookie = getTrackingCookie(request);
String accountId = getAccountIdOrNull(request);
// Proctor preparation
Identifiers identifiers = Identifiers.of(
    TestType.USER, cookie,
    TestType.ACCOUNT, accountId
);
96. Apply the Experiment
On every request…
1. Select Groups
2. Render the Response
3. Log the Action
97. Choose behavior for selected bucket
int bgColorBucket;
/* … */
// Choose a background color for templates
if (bgColorBucket == 1) {
    // Test
    model.put("buttonBgColor", "#00f");
} else {
    // Control group
    model.put("buttonBgColor", "#ccc");
}
98. ProctorResult exposes buckets… verbosely
// Proctor assignments
ProctorResult assignments =
    proctor.determineBuckets(identifiers);
// Get selected bucket for this user
int bgColorBucket = assignments
    // Map<String, TestBucket>: All tests
    .getBuckets()
    // TestBucket: This assignment
    .get("buttonBgColorTst")
    // int: Enumerated ID
    .getValue();
99. "Redundant" names in test spec…
"buttonBgColorTst": {
    "buckets": {
        "gray": 0,
        "blue": 1
    }
}
100. … are used to generate helper methods
// Choose a background color for templates
ResumeSearchGroups groups =
new ResumeSearchGroups(assignments);
// Enumerated value by test name
groups.getButtonBgColorTstValue();
// Boolean accessors for each test & bucket
groups.isButtonBgColorTstGray();
groups.isButtonBgColorTstBlue();
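The slides do not show the generated class itself. As a rough idea of what such a helper could contain, here is a hand-written Java sketch. It is an assumption throughout: the class name `ButtonGroupsSketch` is hypothetical, and a plain `Map<String, Integer>` of test name to bucket value stands in for `ProctorResult` so the example is self-contained.

```java
import java.util.Map;

// Hypothetical hand-written stand-in for a generated groups helper (not real generated code).
final class ButtonGroupsSketch {
    private static final String TEST_NAME = "buttonBgColorTst";
    // Bucket names and values taken from the test specification.
    private static final int GRAY = 0;
    private static final int BLUE = 1;

    private final int value;

    // bucketValues maps test name to selected bucket value (a stand-in for ProctorResult).
    ButtonGroupsSketch(Map<String, Integer> bucketValues) {
        // Assumption: -1 marks a user who was not placed in the test.
        this.value = bucketValues.getOrDefault(TEST_NAME, -1);
    }

    // Enumerated value by test name.
    int getButtonBgColorTstValue() {
        return value;
    }

    // Boolean accessors, one per bucket named in the specification.
    boolean isButtonBgColorTstGray() {
        return value == GRAY;
    }

    boolean isButtonBgColorTstBlue() {
        return value == BLUE;
    }
}
```

The bucket names in the specification are what make accessors like `isButtonBgColorTstBlue()` possible; with only numeric values, callers would be back to comparing against magic numbers.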
101. Helper designed for use in UI layer
This immutable bean is trivial to:
● Read from JSP/JSF
● Read from Templates
○ Freemarker, Velocity, Closure, etc
● Serialize as JSON
102. Apply the Experiment
On every request…
1. Select Groups
2. Render the Response
3. Log the Action
103. Logging Bucket Assignments
Proctor just selects the buckets.
When and how you log are up to you:
● On related events only
● On every event
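Whichever policy you pick, the assignments need a compact form to attach to a log entry. The Java sketch below shows one possible encoding: each test name followed directly by its bucket value, comma-separated and sorted by name. The format and the names `AssignmentLogFormat` and `format` are assumptions for illustration, not Proctor's logging API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical log formatter for bucket assignments (illustrative; not part of Proctor).
final class AssignmentLogFormat {

    // Produces entries such as "buttonBgColorTst1", joined with commas in test-name order.
    static String format(Map<String, Integer> bucketValues) {
        return new TreeMap<>(bucketValues).entrySet().stream()
                .map(entry -> entry.getKey() + entry.getValue())
                .collect(Collectors.joining(","));
    }
}
```

Sorting by test name keeps the string stable across requests, which makes log lines easier to group during analysis.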
107. Publication is also via Source Control
Individual test changes pushed to a named
branch:
/trunk
/branches/production
108. Overwriting Tests on a Named Branch
Not required to use Proctor, but beneficial:
● Same features for free
History, Diff, ACL
● No merging
● Easy roll-back, roll-forward
113. Segmentation through Test Rules
● Test definition allows one optional rule
● A rule is simply a boolean expression
● If the rule passes, the user is assigned to a test
bucket
Rules are written in Unified EL
114. Simple Things are Simple
● No deployment needed
● Changes live within minutes
{
    "description": "Button colors",
    "rule": "country == 'CA'",
    "buckets": […]
}
115. Primitive and rich data types
"userAgent.phone || userAgent.tablet"
"userAgent.supports.html5"
"userAgent.supports.geolocation"
"userAgent.supports.fileUpload"
116. Commons EL is Easily Extended
JSTL Standard Functions
"rule":
"fn:endsWith(
account.email, '@indeed.com')"
Custom code
"rule":
"proctor:contains(
['US', 'CA'], country)"
118. What context is available?
So far we've seen:
● country
● language
● userAgent
● account
What's the full list of available context variables?
119. Context Defined in Test Specification
● Test spec declares available context variables
● This is a contract to provide values at runtime
{
    "tests": […],
    "providedContext": {
        "country": "String",
        "language": "String",
        "userAgent":
            "com.indeed.web.UserAgent"
    }
}
120. Provided While Determining Buckets
Also generated from test specification:
private ResumeSearchProctor proctor;
// Proctor assignments
ProctorResult assignments =
proctor.determineBuckets(
identifiers,
country,
language,
userAgent);
122. Even Tiny Changes Need Deploys
// Choose a background color for templates
if (bgColorBucket == 1) {
    // Test
    model.put("btnBgcolor", "#00f");
} else {
    // Control group
    model.put("btnBgcolor", "#ccc");
}
123. Some Tests Just Vary Data
Many tests have no behavioral change:
● CSS Colors
● Display Text
● Algorithm Weights
125. Payloads
● Values added for each bucket in a test
● Proctor verifies payloads are "all or none"
Control: Gray
"#ccc"
Test: Blue
"#00f"
126. Part of Test Definition
● No deployment needed
● Changes live within minutes
"buckets": [{
"id": 0, "name": "gray",
"description": "Control group",
"payload": {
"stringValue": "#ccc"
}
}, …]
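The "all or none" check on payloads can be sketched in a few lines of Java. This is a minimal illustration, assuming a simplified bucket type with a single nullable string payload; `BucketSketch`, `PayloadCheck`, and `isAllOrNone` are hypothetical names, not Proctor classes.

```java
import java.util.List;

// Hypothetical, simplified bucket: a name plus an optional string payload.
final class BucketSketch {
    final String name;
    final String payload; // null when the bucket defines no payload

    BucketSketch(String name, String payload) {
        this.name = name;
        this.payload = payload;
    }
}

// Hypothetical validator for the "all or none" payload rule.
final class PayloadCheck {

    // Returns true when every bucket carries a payload, or when no bucket does.
    static boolean isAllOrNone(List<BucketSketch> buckets) {
        long withPayload = buckets.stream()
                .filter(bucket -> bucket.payload != null)
                .count();
        return withPayload == 0 || withPayload == buckets.size();
    }
}
```

Rejecting a partially filled test at definition time means product code can read a payload for any bucket without a null check for the buckets someone forgot.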
127. Declared in Project Test Specification
● Type definition only
● Must match test definition
"buttonBgColorTst": {
"buckets": […],
"payload": {
"type": "stringValue"
}
}
128. Cleaner Code, Only Data Deploy
// Choose a background color
model.put(
"btnBgcolor",
groups.getButtonBgColorTstPayload()
);
131. Cross-Product Tests
Even more ways to coordinate tests
● Tracking parameters on links, requests
● Service response metadata
● Different service calls
Proctor offers an interesting alternative
132. Two products can share test groups
As long as both products
● Share the test’s identifier
● Provide the context variables it uses
Deterministic selection guarantees
identical bucket assignment.
143. Allocations
Each test definition
● has one or more allocations
Each allocation
● has a rule and ranges totaling 1.0
● except the last, which has no rule.
144. Allocation Rules
● Use Unified EL, same as test rules.
● Use the same context variables as test rules.
● Choose the first matching allocation.
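The "first matching allocation" rule can be sketched as follows. To stay self-contained, this Java example swaps Unified EL expressions for plain `Predicate` objects over a context map; `AllocationSketch`, `AllocationChooser`, and `firstMatching` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical allocation: an optional rule plus bucket ranges that total 1.0.
final class AllocationSketch {
    final String label;
    final Predicate<Map<String, Object>> rule; // null means "no rule" (the final allocation)
    final double[] ranges;

    AllocationSketch(String label, Predicate<Map<String, Object>> rule, double... ranges) {
        double total = 0.0;
        for (double range : ranges) {
            total += range;
        }
        if (Math.abs(total - 1.0) > 1e-9) {
            throw new IllegalArgumentException("ranges must total 1.0");
        }
        this.label = label;
        this.rule = rule;
        this.ranges = ranges.clone();
    }
}

// Hypothetical chooser: walks the allocations in order and returns the first match.
final class AllocationChooser {

    static AllocationSketch firstMatching(List<AllocationSketch> allocations,
                                          Map<String, Object> context) {
        for (AllocationSketch allocation : allocations) {
            if (allocation.rule == null || allocation.rule.test(context)) {
                return allocation;
            }
        }
        throw new IllegalStateException("the last allocation should have no rule");
    }
}
```

Because the last allocation has no rule, it acts as the default: any user whose context matches none of the earlier rules falls through to it.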
171. Description:
Group 0: control - Job alert label: Save Alert (control)
Group 1: labelSubscribe - Job alert label: Subscribe
Group 2: labelSignUp - Job alert label: Sign up
Group 3: labelGetJobs - Job alert label: Get jobs
Group 4: labelSendMeNewJobs - Job alert label: Send me new jobs
Group 5: labelActivate - Job alert label: Activate
Group 6: labelSave - Job alert label: Save
172. History:
jack @ 2013-03-12 (r203267): Promoting jasxjabtnlbltst (trunk r203089) to
production JASX-11365: jasxjabtnlbltst disabled
ketan @ 2012-12-11 (r190675): merged r190418: JASX-10663: Stop
jasxjabtnlbltst in all languages except nl
will @ 2012-11-29 (r188801): merged r187452: JASX-10457: exclude US from
jasxjabtnlbltst
ketan @ 2012-10-25 (r182881): merged r182688: JASX-10234 - Adding new
langauges to job alert button label test
ketan @ 2012-10-25 (r182876): merged r181938: JASX-10234 - Adding test
definition and allocations for job alert button label test
178. http://go.indeed.com/demo
Also a reference implementation
Running on Heroku -- feel free to clone!
http://indeedeng-hello-proctor.herokuapp.com
Source:
github.com/indeedeng/proctor-demo