The document discusses using queueing theory to optimize the performance of a real-time crowdsourcing platform. It models the relationship between task arrivals, worker response times, and crowd size as an M/M/c/c queue. This allows calculating the probability of all workers being busy and tasks needing to wait, as well as the cost of maintaining different sized retainer pools. The goal is to optimize the tradeoff between recruiting too many workers and dropping too many tasks.
Analytic Methods for Optimizing Realtime Crowdsourcing
1. Analytic Methods for
Optimizing Realtime
Crowdsourcing
Michael Bernstein, David Karger, Rob Miller, and Joel Brandt
MIT CSAIL and Adobe Systems
MIT HUMAN-COMPUTER INTERACTION
Tuesday, May 8, 12
2. Use queueing theory to understand
and optimize performance of a
paid, realtime crowdsourcing platform.
• Relationship between crowd size and
response time
• Algorithm for optimizing crowd size & cost
vs. response time
• Improvements to the platform: 500
millisecond feedback
MIT HUMAN-COMPUTER INTERACTION
Tuesday, May 8, 12
3. Realtime Crowds
Answering
visual questions
for blind users
[Bigham et al. 2010]
Tuesday, May 8, 12
4. Realtime Crowds
Answering
visual questions
for blind users
[Bigham et al. 2010]
Crowd-assisted
photography
[Bernstein et al. 2011]
Tuesday, May 8, 12
5. Realtime Crowds
Answering
visual questions
for blind users
[Bigham et al. 2010]
Crowd-assisted
photography
[Bernstein et al. 2011]
Tuesday, May 8, 12
6. Paid Crowdsourcing
Pay small amounts of money for short tasks
Amazon Mechanical Turk: Roughly five million tasks
completed per year at 1-5¢ each [Ipeirotis 2010]
Label an image Transcribe short audio clip
Requester: Matt C. Requester: Gordon L.
Reward: $0.01 Reward: $0.04
Tuesday, May 8, 12
7. Retainer Recruitment
Workers sign up in advance
½¢ per minute to remain on call
Alert when the task is ready
Wait at most:
5 minutes
Task:
Click on the verbs He leapt the fence and
dashed toward the door.
in the paragraph
[Bernstein et al. 2011]
Tuesday, May 8, 12
8. Retainer Recruitment
Workers sign up in advance
½¢ per minute to remain on call
Alert when the task is ready
Wait at most: alert()
5 minutes Start now! OK
Task:
Click on the verbs He leapt the fence and
dashed toward the door.
in the paragraph
[Bernstein et al. 2011]
Tuesday, May 8, 12
9. Retainer Recruitment
Workers sign up in advance
½¢ per minute to remain on call
Alert when the task is ready
50% of workers return in two seconds, and
75% of workers return in three seconds.
[Bernstein et al. 2011]
Tuesday, May 8, 12
10. State of the Literature
Realtime Crowds
• Recruit crowds in two seconds, execute
traditional tasks (e.g., votes) in five seconds
• Maintain continuous control
of remote interfaces
• Opportunities in deployable, intelligently
reactive software
[Bigham et al. 2010, Bernstein et al. 2011, Lasecki et al. 2011]
Tuesday, May 8, 12
11. The Challenge
Running Out of Retainer Workers
Tuesday, May 8, 12
12. The Challenge
Running Out of Retainer Workers
Tuesday, May 8, 12
13. The Challenge
Running Out of Retainer Workers
Tuesday, May 8, 12
14. The Challenge
Running Out of Retainer Workers
Tuesday, May 8, 12
15. The Challenge
Running Out of Retainer Workers
Tuesday, May 8, 12
16. The Challenge
Running Out of Retainer Workers
Tuesday, May 8, 12
17. The Challenge
Running Out of Retainer Workers
Loss
Non-realtime response
Tuesday, May 8, 12
18. The Tradeoff
Missed tasks, Extra retainer workers,
non-realtime results extra cost
Tuesday, May 8, 12
19. The Goal
Optimize the tradeoff between
recruiting too many workers and
dropping too many tasks.
Tuesday, May 8, 12
20. The Goal
Optimize the tradeoff between
recruiting too many workers and
dropping too many tasks.
Budget-optimal crowdsourcing is possible
in non-realtime scenarios
[Dai, Mausam and Weld 2010; Kamar, Hacker and Horvitz 2012;
Karger, Oh, and Shah 2011]
Tuesday, May 8, 12
21. 1 Model
Outline
2 Optimization
3 Platform
Tuesday, May 8, 12
22. Queueing Theory
• Formal framework for stochastic arrival and
service processes
• Basic idea: random task arrivals and random
processing times for workers
• Quantify how long tasks will need to wait
in line
Model
Outline
Optimize
Queueing theory for completion times: [Ipeirotis 2010] Platform
Tuesday, May 8, 12
23. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
24. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
25. Queueing Theory
M/M/1 queue
Task Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
26. Queueing Theory
M/M/1 queue
Task Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
27. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
28. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
29. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
30. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
31. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
32. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
33. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
34. Queueing Theory
M/M/1 queue
Server
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
1 One server
Tuesday, May 8, 12
35. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
36. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
37. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
38. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
39. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
40. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
41. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
42. Queueing Theory
M/M/c/c queue
All servers busy
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
43. Queueing Theory
M/M/c/c queue
M Markovian (Poisson process) task arrivals, rate
M Markovian (Poisson process) server work time, rate µ
c c servers
c c max tasks in servers and queue
Tuesday, May 8, 12
45. Retainer Queue
M/M/c/c queue
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
46. Retainer Queue
M/M/c/c queue
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
47. Retainer Queue
M/M/c/c queue
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
48. Retainer Queue
M/M/c/c queue
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
49. Retainer Queue
M/M/c/c queue
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
50. Retainer Queue
M/M/c/c queue
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
51. Retainer Queue
Loss
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
52. Retainer Queue
Loss
Crowd
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
53. Retainer Queue
Loss
Crowd
All servers busy
c workers, no waiting queue
Task arrivals: Poisson process, rate
Worker recruitment time: Poisson process, rate µ
Tuesday, May 8, 12
54. Retainer Queue
Loss
Crowd
All servers busy
Tuesday, May 8, 12
55. Retainer Queue
Loss
Crowd
All servers busy
Tuesday, May 8, 12
56. Retainer Queue
Loss
P (i servers busy) = ⇡(i)
Crowd
P (all servers busy) = ⇡(c)
All servers busy
Tuesday, May 8, 12
57. Retainer Queue
Loss
P (i servers busy) = ⇡(i)
Crowd
P (all servers busy) = ⇡(c)
All servers busy
Tuesday, May 8, 12
58. Retainer Queue
Loss
P (i servers busy) = ⇡(i)
Crowd
P (all servers busy) = ⇡(c)
All servers busy
Tuesday, May 8, 12
59. Retainer Queue
Loss
P (i servers busy) = ⇡(i)
Crowd
P (all servers busy) = ⇡(c)
All servers busy
Tuesday, May 8, 12
60. Model Predictions
1. Probability that all workers are busy: ⇡(c)
→ the task has to wait for expected time 1/µ
2. Cost of keeping a retainer pool of size c
→ cost depends on number of idle servers
Tuesday, May 8, 12
61. Probability of Loss
• Draw on Erlang’s Loss Formula from
queueing theory: probability of a rejected
request in an M/M/c/c queue
• Let ⇢ be the traffic intensity:
⇢ = /µ
(roughly, the number of new tasks that will
arrive in the time it takes to recruit a worker)
Tuesday, May 8, 12
62. Probability of Loss
Erlang’s Loss Formula says:
⇡(c) = P (c servers busy)
c
⇢ /c!
= Pc
i=0 ⇢i /i!
Remarkably, this result makes no assumptions
about the arrival distribution.
Tuesday, May 8, 12
64. Expected
Waiting Time
ers⇥ ⇥ (expected
P (expected recruitment time) recruitment time)
(c servers busy) recruitment time)
y) busy) ⇥ (expected
1 1 1
= ⇡(c) = ⇡(c) = ⇡(c)
µ µ µ
c
c
⇢ /c! ⇢ 1 c
/c! = Pc
1 ⇢ /c! 1
= Pc = Pc ⇢i /i! µ
i=0 ⇢i /i! µ ⇢i /i! µ i=0
i=0
Tuesday, May 8, 12
65. Expected Cost
How much do we pay in steady-state?
Depends on how many workers are usually
waiting on retainer.
Tuesday, May 8, 12
66. Expected Cost
Probability of i busy servers in an M/M/c/c queue
is a more general version of Erlang’s Loss Formula:
i
⇢ /i!
⇡(i) = Pc
i=0 ⇢i /i!
Derive the expected number of busy workers:
E[i] = ⇢[1 ⇡(c)]
Tuesday, May 8, 12
67. Expected Cost
Probability of i busy servers in an M/M/c/c queue
is a more general version of Erlang’s Loss Formula:
i
⇢ /i!
⇡(i) = Pc
i=0 ⇢i /i!
Derive the expected number of busy workers:
E[i] = ⇢[1 ⇡(c)]
Total cost is the number of idle workers:
c ⇢[1 ⇡(c)]
Tuesday, May 8, 12
70. Optimal Retainer Size
• Size of retainer pool is typically the only
value that requesters can manipulate
• Minimize costs by keeping the retainer pool
small while keeping ⇡(c) low
Model
Outline
Optimize
Platform
Tuesday, May 8, 12
71. Optimal Retainer Size
Based on Maximum Miss Probability
Cost vs. probability of waiting
Given a maximum ● ●●●
●● ●
● ●
●
● ●
● ● ●● ● ●●
● ●
●
● ●● ●
● ●
Probability of waiting
●
desired probability ● ● ● ● ●
● ●
● ● ● ●
● ●
1e−04
● ●
of a miss pmax : Traffic intensity ρ
0.1
●
●
●
●
●
●
● ●
0.5
Minimize c subject 1
5
● ●
●
●
to ⇡(c) pmax
●
1e−10
10
●
0 2 4 6 8
●
Expected payments per unit time
●
Tuesday, May 8, 12
72. Optimal Retainer Size
Based on Maximum Miss Probability
Cost vs. probability of waiting
Given a maximum ● ●●●
●● ●
● ●
●
● ●
● ● ●● ● ●●
● ●
●
● ●● ●
● ●
Probability of waiting
●
desired probability ● ● ● ● ●
● ●
● ● ● ●
● ●
1e−04
● ●
of a miss pmax : Traffic intensity ρ
0.1
●
●
●
●
●
●
● ●
0.5
Minimize c subject 1
5
● ●
●
●
to ⇡(c) pmax
●
1e−10
10
●
0 2 4 6 8
●
Expected payments per unit time
●
Tuesday, May 8, 12
73. Optimal Retainer Size
Based on Joint Cost
Combined cost of retainer and missed task
Total cost of task and retainer
500
●
Cost of a missed task
If the “pizza delivery” ●
1
100
10
property holds: we ●
●
100
20
● 1000
can quantify the ●
●
● ● ● ●
● ● ●
cost of loss ●
5
● ●
●
● ●
●
●
●
●
1
2 4 6 8 10
Size of retainer pool
Tuesday, May 8, 12
74. Improving the
Retainer Model
1 Subscriptions
2 Shared Pools
3 Predictive
Recruitment Model
Outline
Optimize
Platform
Tuesday, May 8, 12
75. Retainer
Subscriptions
• Proposal: increase µ by allowing workers to
subscribe to realtime tasks
• Instead of posting to the global task list, the
platform sends a message to subscribers
• Change crowdsourcing from a pull model
to a push model
Tuesday, May 8, 12
76. Global Retainer Pools
• Sharing one global retainer pool across
requesters improves performance
• Intuition: Most workers are padding for
unlikely runs of arrivals)
Time
Task 1
Tuesday, May 8, 12
77. Global Retainer Pools
• Sharing one global retainer pool across
requesters improves performance
• Intuition: Most workers are padding for
unlikely runs of arrivals)
Time
Task 1
Task 2
Tuesday, May 8, 12
78. Global Retainer Pools
• Sharing one global retainer pool across
requesters improves performance
• Intuition: Most workers are padding for
unlikely runs of arrivals)
Time
Combined
Tuesday, May 8, 12
79. Global Retainer Pools
• Sharing one global retainer pool across
requesters improves performance
• Intuition: Most workers are padding for
unlikely runs of arrivals)
Time
Combined
Tuesday, May 8, 12
80. Global Retainer Pools
• Through approximation, individual pools:
p ⇢ c
⇡(c) ⇡ p 2⇡c e (e⇢/c) c
⇢
2⇡c e (e⇢/c)
p across k requesters: c k
• Shared pools
p ⇢
2⇡kc e (e⇢/c) c k
⇢
⇡(c) ⇡ 2⇡kc e (e⇢/c)
• Loss rate declines exponentially with the
number of bundled retainer pools
Tuesday, May 8, 12
81. Global Retainer Pools
• Through approximation, individual pools:
p ⇢ c
⇡(c) ⇡ p 2⇡c e (e⇢/c) c
⇢
2⇡c e (e⇢/c)
p across k requesters: c k
• Shared pools
p ⇢
2⇡kc e (e⇢/c) c k
⇢
⇡(c) ⇡ 2⇡kc e (e⇢/c)
• Loss rate declines exponentially with the
number of bundled retainer pools
Tuesday, May 8, 12
82. Global Retainer Pools
Cost dramatically decreases
as you combine retainers:
k dollars to log(k) dollars
Tuesday, May 8, 12
83. Global Retainer
Routing
• Not every worker in a global retainer pool is
good at every task
• If we assigned each worker to any task they
could do, some tasks would starve
Tuesday, May 8, 12
84. Global Retainer
Routing
• We want to maintain a buffer of workers to
respond to all kinds of tasks
• A linear programming technique can
balance the traffic intensities across all tasks
Tuesday, May 8, 12
85. Precruitment
• Predictive Recruitment: notify workers
before the task arrives
• Recall workers in expectation of having
a task by the time they arrive 2–3
seconds later
Tuesday, May 8, 12
86. Precruitment
Formative Study, N=373 tasks
• 3¢ for 3-minute retainer task: whack-a-mole
• ‘Loading...’ screen for randomly-selected time
[0, 20] seconds after worker returns
• Click on randomly-placed mole
Tuesday, May 8, 12
87. Precruitment
Formative Study, N=373 tasks
• 3¢ for 3-minute retainer task: whack-a-mole
• ‘Loading...’ screen for randomly-selected time
[0, 20] seconds after worker returns
• Click on randomly-placed mole
Tuesday, May 8, 12