Crowdsourcing systems are widely used to overcome challenges that require human intervention. Yet, while adoption of the crowdsourcing paradigm keeps increasing, there are no established guidelines or tangible recommendations for task design with respect to key parameters such as task length, monetary incentive, and the time required for task completion. In this paper, we propose tuning these parameters based on our findings from extensive experiments and analysis of categorization tasks. We delve into the behavior of workers who complete categorization tasks to determine measures that can make task design more effective.
1. Breaking Bad - Understanding Behavior of Crowd Workers in Categorization Microtasks
Ujwal Gadiraju, Ricardo Kawase, Patrick Siehndel and Besnik Fetahu
METU NCC, 2nd September 2015
3. What is the problem?
● Increase in the number of new task requesters on AMT (1000 per month) [Difallah et al., WWW'15].
○ Not all task requesters are familiar with task-specific settings.
○ No tangible guidelines for task design:
■ task length
■ monetary incentive
■ task completion time
4. Worker Behavior in Categorization Tasks
● Categorization tasks are one of the most common types of crowdsourced tasks. [Gadiraju et al., A Taxonomy of Microtasks on the Web, HT'14]
● Experimental setup (a sketch of the resulting design follows below):
○ 9 tasks deployed on CrowdFlower
○ Task length: 20, 30, 40 units
○ Monetary reward: 1, 2, 3 USD cents
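The nine tasks arise from fully crossing the two parameters above. A minimal sketch of that 3 × 3 design in Python, with illustrative variable names that are not taken from the original study's artifacts:

```python
from itertools import product

# A minimal sketch of the 3 x 3 design described above
# (task length x monetary reward); names are illustrative.
TASK_LENGTHS = [20, 30, 40]       # units per task
REWARDS_CENTS = [1, 2, 3]         # USD cents per task

configurations = [
    {"length": length, "reward_cents": reward}
    for length, reward in product(TASK_LENGTHS, REWARDS_CENTS)
]

assert len(configurations) == 9   # the 9 deployed tasks
for cfg in configurations:
    print(cfg)
```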
5. Task Design
● Clear instructions and help snippets.
● Workers have to select the most suitable category in each set (Set-1 through Set-5), with 10 different categories to choose from.
● Category options were manually tailored to avoid ambiguity.
● Set-1 was made compulsory; Set-2 through Set-5 were optional.
● Tasks were deployed non-concurrently, and the order of units was randomized within each task.
● Tasks were designed to facilitate 100% accuracy in responses (with the aim of studying worker behavior).
6. Data Collection
● Responses gathered from 100 workers in each task; 900 workers in total.
● We collected 27,000 unit judgments in total. In 88% of the cases, workers provided responses for all sets (incl. optional ones).
● Average task completion time:
○ Tasks with a length of 20 units: 11.3 mins
○ Tasks with a length of 30 units: 16.4 mins
○ Tasks with a length of 40 units: 18.6 mins
7. Definitions
● Tipping Point: the first point (unit index) at which a worker provides an unacceptable response after having provided at least one acceptable response. [Gadiraju et al., CHI'15]
● Beaver Workers: workers who exert additional effort by answering optional questions in order to help task requesters.
A sketch of the tipping-point computation follows below.
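The tipping point reduces to a single scan over a worker's responses. A minimal sketch, assuming each worker's responses are encoded as a hypothetical per-unit correctness sequence (True = acceptable), which is our assumption and not a format from the original study:

```python
from typing import Optional, Sequence

def tipping_point(responses: Sequence[bool]) -> Optional[int]:
    """First unit index (0-based) with an unacceptable response that
    follows at least one acceptable response; None if the worker
    never tips."""
    seen_acceptable = False
    for index, acceptable in enumerate(responses):
        if acceptable:
            seen_acceptable = True
        elif seen_acceptable:
            return index
    return None

# Two acceptable responses, then a slip at unit index 2.
assert tipping_point([True, True, False, True]) == 2
assert tipping_point([False, False]) is None  # never acceptable first
```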
8. Consistency of Units within Tasks
● Avg. accuracy of around 90% with little std. dev.
● We tolerate 10% incorrect responses from workers, owing to possible drifts in attention span / boredom.
● Bad Workers: workers who answer 10% or more of the units within a categorization task incorrectly.
● Poor Starters: workers whose first 2 responses within a categorization task are incorrect.
Both definitions are sketched in code below.
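Both categories reduce to simple predicates over the same hypothetical per-unit correctness sequence used above; a sketch:

```python
from typing import Sequence

def is_bad_worker(responses: Sequence[bool]) -> bool:
    # Bad Worker: 10% or more of the task's units answered incorrectly.
    incorrect = sum(1 for acceptable in responses if not acceptable)
    return incorrect / len(responses) >= 0.10

def is_poor_starter(responses: Sequence[bool]) -> bool:
    # Poor Starter: the first 2 responses within the task are incorrect.
    return len(responses) >= 2 and not responses[0] and not responses[1]

# 3 wrong out of 20 units (15%) exceeds the 10% tolerance.
assert is_bad_worker([False] * 3 + [True] * 17)
assert is_poor_starter([False, False] + [True] * 18)
```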
11. Worker Behavior Within a Task
Key Findings
● A worker's accuracy decreases through the course of a task (optional sets are not considered).
○ This is more prominent as the task length increases.
● Workers who exert additional effort exhibit higher accuracy within tasks.
● The additional effort that workers exert decreases through the course of a task.
○ This is more prominent as the task length increases.
12. Scrutiny of Additional Responses
● The % of correct additional responses gradually decreases from Set-1 to Set-5.
● On average, workers skip more optional sets as they proceed from Set-2 to Set-5.
A sketch for computing these per-set statistics follows below.
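Both measurements can be derived from per-worker records of the optional sets. A sketch, assuming a hypothetical structure (our assumption, not the study's data format) in which a skipped set is recorded as None:

```python
from typing import Mapping, Optional, Sequence

# Hypothetical input: one dict per worker, mapping optional set names
# to a per-unit correctness list, or None if the worker skipped the set.
WorkerRecord = Mapping[str, Optional[Sequence[bool]]]

def per_set_stats(workers: Sequence[WorkerRecord]) -> dict:
    """Fraction of correct additional responses and skip rate per set."""
    stats = {}
    for name in ("Set-2", "Set-3", "Set-4", "Set-5"):
        answered = [w[name] for w in workers if w.get(name) is not None]
        total = sum(len(resp) for resp in answered)
        correct = sum(sum(resp) for resp in answered)
        stats[name] = {
            "frac_correct": correct / total if total else None,
            "skip_rate": 1 - len(answered) / len(workers),
        }
    return stats

workers = [
    {"Set-2": [True, True], "Set-3": [True, False], "Set-4": None, "Set-5": None},
    {"Set-2": [True, False], "Set-3": None, "Set-4": None, "Set-5": None},
]
print(per_set_stats(workers))
```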
13. Workers Breaking Bad
Adjusted Tipping Point (ATP): workers who respond incorrectly to at least 10% of the units in a task consecutively are said to have an ATP. The index of the first unit at which this is observed is called the ATP of the worker. Such a worker is called a BREAKER. (A sketch of this rule follows below.)
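A minimal sketch of the ATP rule, assuming the same hypothetical per-unit correctness encoding as above; taking the starting unit of the offending run as the ATP is our reading of the definition:

```python
import math
from typing import Optional, Sequence

def adjusted_tipping_point(responses: Sequence[bool],
                           threshold: float = 0.10) -> Optional[int]:
    """0-based index at which a worker starts a run of consecutive
    incorrect responses covering at least `threshold` of the task's
    units (10% in the talk); None if the worker never breaks bad."""
    run_needed = math.ceil(len(responses) * threshold)
    run_start, run_len = None, 0
    for index, acceptable in enumerate(responses):
        if acceptable:
            run_start, run_len = None, 0
        else:
            if run_len == 0:
                run_start = index
            run_len += 1
            if run_len >= run_needed:
                return run_start
    return None

def is_breaker(responses: Sequence[bool]) -> bool:
    return adjusted_tipping_point(responses) is not None

# 20-unit task: 10% => a run of 2 consecutive errors marks an ATP.
resp = [True] * 5 + [False, False] + [True] * 13
assert adjusted_tipping_point(resp) == 5 and is_breaker(resp)
```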
14. Conclusions & Future Work
To achieve good quality in categorization tasks…
● It is better to err on the lower side of the monetary incentives offered.
● Use the minimum time required as a filter, but give ample time for task completion. It is better to err on the higher side of the maximum task completion time.
● It is better to err on the shorter side of task length.
● We can gauge worker intentions through the nature of their responses to optional questions.
● We plan to quantify the limits of these guidelines in the near future.
16. Removal of Ineligible Workers
Ineligible workers: workers who do not conform to the previously stated prerequisites.
● We found 9 ineligible workers who used browser-embedded translator tools in order to participate in the tasks.
● Ineligible workers were not considered in the further analysis.