A BRIEF
INTRODUCTION TO
CROWDSOURCED
DATA COLLECTION
ELENA SIMPERL
UNIVERSITY OF SOUTHAMPTON
25-May-14
USEWOD@ESWC2014
EXECUTIVE SUMMARY
• Crowdsourcing can help
with research data forensics
• But
• There are things computers do better
than humans → hybrid approaches are
the ultimate solution
• There is crowdsourcing and
crowdsourcing → pick your faves and
mix them
• Human intelligence is a valuable
resource → experiment design is key
CROWDSOURCING:
PROBLEM SOLVING VIA
OPEN CALLS
"Simply defined, crowdsourcing represents the act of a
company or institution taking a function once performed by
employees and outsourcing it to an undefined (and generally
large) network of people in the form of an open call. This can
take the form of peer-production (when the job is performed
collaboratively), but is also often undertaken by sole
individuals. The crucial prerequisite is the use of the open
call format and the large network of potential
laborers."
[Howe, 2006]
CROWDSOURCING
COMES IN DIFFERENT
FORMS AND FLAVORS
DIMENSIONS OF CROWDSOURCING
WHAT IS
OUTSOURCED
• Tasks based on
human skills not
easily replicable by
machines
• Visual recognition
• Language
understanding
• Knowledge acquisition
• Basic human
communication
• ...
WHO IS THE CROWD
• Open call (crowd
accessible through a
platform)
• Call may target
specific skills and
expertise
(qualification tests)
• Requester typically
knows less about the
‘workers’ than in other
‘work’ environments
See also [Quinn & Bederson, 2012]
USEWOD EXPERIMENT: TASK AND CROWD
WHAT IS
OUTSOURCED
• Annotating research papers with
data set information.
• Alternative representations of the
domain
• Bibliographic reference
• Abstract + title
• Paragraph
• Full paper
• What if the domain is not known in
advance or is infinite?
• Do we know the list of potential
answers?
• Is there only one correct solution to
each atomic task?
• How many people would solve the
same task?
WHO IS THE CROWD
• People who know the
papers or the data sets
• Experts in the (broader)
field
• Casual gamers
• Librarians
• Anyone (knowledgeable
of English, with a
computer/cell phone…)
• Combinations thereof…
Tutorial@ISWC2013
CROWDSOURCING AS
‘HUMAN COMPUTATION’
Outsourcing tasks that machines find difficult to solve
to humans
DIMENSIONS OF CROWDSOURCING (2)
HOW IS THE TASK OUTSOURCED
• Explicit vs. implicit participation
• Tasks broken down into smaller units
undertaken in parallel by different people
• Coordination required to handle cases with
more complex workflows
• Partial or independent answers consolidated
and aggregated into complete solution
See also [Quinn & Bederson, 2012]
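The consolidation step named on this slide can be sketched as a plain majority vote over redundant answers. This is only one possible aggregation rule, not necessarily the one used in the experiment; the function names, item identifiers and labels below are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return (winning_label, agreement_ratio) for one item's redundant answers."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    # Ties are resolved in favour of the label that was seen first.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def aggregate(assignments):
    """Group (item_id, worker_id, label) triples by item and consolidate them."""
    by_item = {}
    for item_id, _worker_id, label in assignments:
        by_item.setdefault(item_id, []).append(label)
    return {item: majority_vote(labels) for item, labels in by_item.items()}


# Hypothetical answers from four workers on two atomic tasks.
assignments = [
    ("paper-1", "w1", "DBpedia"),
    ("paper-1", "w2", "DBpedia"),
    ("paper-1", "w3", "Freebase"),
    ("paper-2", "w1", "LinkedGeoData"),
    ("paper-2", "w4", "LinkedGeoData"),
]
result = aggregate(assignments)
```

The agreement ratio returned with each label can serve as a crude signal for which items need more answers or a different crowd.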
EXAMPLE: CITIZEN SCIENCE
WHAT IS OUTSOURCED
• Object recognition, labeling,
categorization in media content
WHO IS THE CROWD
• Anyone
HOW IS THE TASK
OUTSOURCED
• Highly parallelizable tasks
• Every item is handled by multiple
annotators
• Every annotator provides an answer
• Consolidated answers solve scientific
problems
EXPLICIT VS. IMPLICIT
CONTRIBUTION - AFFECTS
MOTIVATION AND ENGAGEMENT
Explicit: users are aware of how their
input contributes to the
achievement of the
application’s goal (and
identify themselves with it)
vs.
Implicit: tasks are hidden behind
the application narratives.
Engagement is ensured
through other incentives
USEWOD EXPERIMENT: TASK
DESIGN
HOW IS THE TASK OUTSOURCED:
ALTERNATIVE MODELS
• Use the data collected here to train an IE algorithm
• Use paid microtask workers to do a first screening, then an expert
crowd to sort out challenging cases
• What if you have very long documents potentially mentioning
different/unknown data sets?
• Competition via Twitter
• ‘Which version of DBpedia does this paper use?’
• One question a day, prizes
• Needs a gold standard to bootstrap, and redundancy
• Involve the authors
• Use crowdsourcing to find out Twitter accounts, then launch campaign on
Twitter
• Write an email to the authors…
• Change the task
• Which papers use DBpedia 3.X?
• Competition to find all papers
EXAMPLE: SOYLENT AND COMPLEX
WORKFLOWS
http://www.youtube.com/watch?v=n_miZqsPwsc
WHAT IS OUTSOURCED
• Text shortening, proofreading,
open editing
WHO IS THE CROWD
• MTurk
HOW IS THE TASK
OUTSOURCED
• Text divided into paragraphs
• Select-fix-verify pattern
• Multiple workers in each step
See also [Bernstein et al., 2010]
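The select-fix-verify pattern listed above (Bernstein et al. call it Find-Fix-Verify) can be sketched as three independent worker groups applied to one paragraph. This is a toy illustration and not Soylent's implementation: workers are simulated by plain functions, and `min_agreement` is an assumed threshold.

```python
from collections import Counter


def find_fix_verify(paragraph, finders, fixers, verifiers, min_agreement=2):
    """Run one paragraph through three crowd stages.

    finders:   callables paragraph -> span (substring) worth improving
    fixers:    callables span -> suggested rewrite
    verifiers: callables (span, candidates) -> preferred candidate
    Assumes at least one worker per stage.
    """
    # Find: keep only spans flagged independently by enough workers.
    flagged = Counter(find(paragraph) for find in finders)
    spans = [span for span, n in flagged.items() if n >= min_agreement]

    result = paragraph
    for span in spans:
        # Fix: a second group of workers proposes rewrites.
        candidates = sorted({fix(span) for fix in fixers})
        # Verify: a third group votes; the most-voted rewrite wins.
        votes = Counter(verify(span, candidates) for verify in verifiers)
        best, _ = votes.most_common(1)[0]
        result = result.replace(span, best)
    return result


# Simulated (hypothetical) workers for a text-shortening task.
text = "The results are, in actual fact, very good."
finders = [
    lambda p: ", in actual fact,",
    lambda p: ", in actual fact,",
    lambda p: "very ",
]
fixers = [lambda s: "", lambda s: ", in fact,"]
verifiers = [lambda s, c: c[0], lambda s, c: c[0], lambda s, c: c[-1]]

shortened = find_fix_verify(text, finders, fixers, verifiers)
```

Splitting the work this way keeps any single worker from both proposing and approving a change, which is the point of using multiple workers in each step.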
DIMENSIONS OF CROWDSOURCING (3)
HOW ARE THE
RESULTS VALIDATED
• Solution space closed
vs. open
• Performance
measurements/ground
truth
• Statistical techniques
employed to predict
accurate solutions
• May take into account
confidence values of
algorithmically
generated solutions
HOW CAN THE
PROCESS BE
OPTIMIZED
• Incentives and
motivators
• Assigning tasks to
people based on their
skills and performance
(as opposed to random
assignments)
• Symbiotic
combinations of
human- and machine-
driven computation,
including combinations
of different forms of
crowdsourcing
See also [Quinn & Bederson, 2012]
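One way to combine ground truth with a statistical technique, as the validation column suggests, is to estimate each worker's accuracy on questions with known answers and weight votes accordingly. The sketch below is a simplification chosen for illustration (it is not a full model such as expectation maximisation); the `prior` value and all identifiers are assumptions.

```python
from collections import defaultdict


def worker_accuracy(gold, answers, prior=0.5):
    """Estimate each worker's accuracy on items with known (gold) answers."""
    hits, seen = defaultdict(int), defaultdict(int)
    for item, worker, label in answers:
        if item in gold:
            seen[worker] += 1
            hits[worker] += int(label == gold[item])
    workers = {worker for _, worker, _ in answers}
    # Workers never seen on gold items fall back to the prior.
    return {w: (hits[w] / seen[w]) if seen[w] else prior for w in workers}


def weighted_vote(answers, accuracy):
    """Pick, per item, the label with the largest summed worker accuracy."""
    scores = defaultdict(lambda: defaultdict(float))
    for item, worker, label in answers:
        scores[item][label] += accuracy[worker]
    return {item: max(labels, key=labels.get) for item, labels in scores.items()}


# Hypothetical data: g1 and g2 have known answers, q1 does not.
gold = {"g1": "A", "g2": "B"}
answers = [
    ("g1", "w1", "A"), ("g1", "w2", "A"), ("g1", "w3", "B"),
    ("g2", "w1", "B"), ("g2", "w2", "A"), ("g2", "w3", "A"),
    ("q1", "w1", "X"), ("q1", "w2", "Y"), ("q1", "w3", "Y"),
]
accuracy = worker_accuracy(gold, answers)
consensus = weighted_vote(answers, accuracy)
```

On `q1` a plain majority would pick "Y", while the weighted vote follows the worker who performed best on the gold questions. The same accuracy estimates could also drive skill-based task assignment.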
USEWOD EXPERIMENT:
VALIDATION
• Domain is fairly restricted
• Spam and obviously wrong answers can be detected easily
• When are two answers the same? Can there be more
than one correct answer per question?
• Redundancy may not be the final answer
• Most people will be able to identify the data set, but
sometimes the actual version is not trivial to reproduce
• Make educated version guess based on time intervals
and other features
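Two of the points above lend themselves to a small sketch: deciding when two free-text answers are "the same", and making an educated version guess from time intervals. The normalisation rule and the release table below are assumptions made for illustration; the dates are placeholders, not real release dates.

```python
import re
from datetime import date


def normalize(answer):
    """Map spelling variants of a data set answer to one canonical string."""
    text = re.sub(r"[^a-z0-9.]+", " ", answer.lower()).strip()
    # Drop a "v" prefix in front of a version number ("v3.8" -> "3.8").
    return re.sub(r"\bv(?=\d)", "", text)


# Hypothetical release dates, for illustration only; replace with real ones.
RELEASES = [
    ("dataset 1.0", date(2012, 1, 15)),
    ("dataset 2.0", date(2013, 3, 1)),
    ("dataset 3.0", date(2014, 2, 10)),
]


def guess_version(submitted, releases=RELEASES):
    """Return the latest release published on or before the submission date."""
    candidates = [(released, name) for name, released in releases
                  if released <= submitted]
    if not candidates:
        return None
    return max(candidates)[1]


same = normalize("DBpedia 3.8") == normalize("dbpedia-v3.8")
guess = guess_version(date(2013, 6, 1))
```

Such a guess is only a prior: authors may have used an older release, so it is better treated as a suggestion shown to the crowd than as a final answer.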
ALIGNING INCENTIVES
IS ESSENTIAL
Motivation: driving force that
makes humans achieve their
goals
Incentives: ‘rewards’ assigned
by an external ‘judge’ to a
performer for undertaking a
specific task
• Common belief (among
economists): incentives can be
translated into a sum of money
for all practical purposes.
Incentives can be related to
both extrinsic and intrinsic
motivations.
Extrinsic motivation if task is
considered boring, dangerous,
useless, socially undesirable,
dislikable by the performer.
Intrinsic motivation is driven by
an interest or enjoyment in the
task itself.
EXAMPLE: DIFFERENT
CROWDS FOR DIFFERENT
TASKS
Find stage: contest
• Crowd: Linked Data experts
• Difficult task
• Incentive: final prize
• Tool: TripleCheckMate [Kontokostas2013]
Verify stage: microtasks
• Crowd: workers
• Easy task
• Incentive: micropayments
• Platform: MTurk (http://mturk.com)
Adapted from [Bernstein2010]
See also [Acosta et al., 2013]
IT’S NOT ALWAYS
JUST ABOUT MONEY
http://www.crowdsourcing.org/editorial/how-to-motivate-the-crowd-infographic/
http://www.oneskyapp.com/blog/tips-to-motivate-participants-of-crowdsourced-translation/
[Kaufmann, Schulze, Veit, 2011]
USEWOD EXPERIMENT:
OTHER INCENTIVE
MODELS
• Twitter-based contest
• ‘Which version of DBpedia does this paper use?’
• One question a day, prizes
• If question is not answered correctly, increase the prize
• If low participation, re-focus the audience or change the
incentive.
• Altruism: for each ten papers annotated we send a
student to ESWC…
PRICING ON MTURK: AFFORDABLE,
BUT SCALE OF EXPERIMENTS
DOES MATTER
[Ipeirotis, 2008]
USEWOD EXPERIMENT:
HYBRID APPROACH
• Use IE
algorithm to
select best
candidates
• Use different
types of
crowds
• Publish
results as
Linked Data
See also [Demartini et al., 2012]
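The hybrid approach above can be sketched as a routing step: an IE algorithm proposes candidates with a confidence score, and only the uncertain ones go to a crowd. The thresholds, queue names and sample data are assumptions for illustration, and the IE algorithm itself is out of scope here.

```python
def route(candidates, accept_at=0.9, expert_below=0.5):
    """Split machine-extracted candidates by confidence.

    candidates: iterable of (paper_id, dataset, confidence) triples.
    Returns a dict with three queues: accepted automatically, sent to
    paid microtask workers, or sent to an expert crowd.
    """
    routed = {"accept": [], "microtask": [], "expert": []}
    for paper_id, dataset, confidence in candidates:
        if confidence >= accept_at:
            queue = "accept"
        elif confidence >= expert_below:
            queue = "microtask"
        else:
            queue = "expert"
        routed[queue].append((paper_id, dataset))
    return routed


# Hypothetical output of an IE algorithm.
extracted = [
    ("p1", "DBpedia", 0.97),
    ("p2", "DBpedia", 0.72),
    ("p3", "LinkedGeoData", 0.31),
]
routed = route(extracted)
```

Sending mid-confidence items to microtask workers and the hardest ones to experts mirrors the first-screening idea from the task design slide; in practice the thresholds would be tuned against a gold standard.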
SUMMARY AND FINAL
REMARKS
• There are things computers do
better than humans → hybrid
approaches are the ultimate
solution
• There is crowdsourcing and
crowdsourcing → pick your faves
and mix them
• Human intelligence is a valuable
resource → experiment design is
key

Más contenido relacionado

Similar a A brief introduction to crowdsourcing for data collection

Crowdsourcing for research libraries
Crowdsourcing for research librariesCrowdsourcing for research libraries
Crowdsourcing for research librariesElena Simperl
 
Social machines: theory design and incentives
Social machines: theory design and incentivesSocial machines: theory design and incentives
Social machines: theory design and incentivesElena Simperl
 
Fundamentals of human computation
Fundamentals of human computationFundamentals of human computation
Fundamentals of human computationElena Simperl
 
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014 Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014 eswcsummerschool
 
Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...
Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...
Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...Paige Morgan
 
Social Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsSocial Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsPatti Anklam
 
Embracing student innovation in the age of Generative AI
Embracing student innovation in the age of Generative AIEmbracing student innovation in the age of Generative AI
Embracing student innovation in the age of Generative AICharles Darwin University
 
UX Burlington 2017: Exploratory Research in UX Design
UX Burlington 2017: Exploratory Research in UX DesignUX Burlington 2017: Exploratory Research in UX Design
UX Burlington 2017: Exploratory Research in UX DesignSarah Fathallah
 
Learning from Complex Online Behavior with Andy Edmonds - Big Brains
Learning from Complex Online Behavior with Andy Edmonds - Big BrainsLearning from Complex Online Behavior with Andy Edmonds - Big Brains
Learning from Complex Online Behavior with Andy Edmonds - Big BrainsBloomReach
 
Incentives-driven technology design
Incentives-driven technology designIncentives-driven technology design
Incentives-driven technology designElena Simperl
 
Evaluation and User Study in HCI
Evaluation and User Study in HCIEvaluation and User Study in HCI
Evaluation and User Study in HCIByungkyu (Jay) Kang
 
Crowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Crowdsourcing & Human Computation Labeling Data & Building Hybrid SystemsCrowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Crowdsourcing & Human Computation Labeling Data & Building Hybrid SystemsMatthew Lease
 
Human computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspectiveHuman computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspectiveoralonso
 
Crowdsourcing the Semantic Web
Crowdsourcing the Semantic WebCrowdsourcing the Semantic Web
Crowdsourcing the Semantic WebElena Simperl
 
Did you mean crowdsourcing for recommender systems?
Did you mean crowdsourcing for recommender systems?Did you mean crowdsourcing for recommender systems?
Did you mean crowdsourcing for recommender systems?oralonso
 
Reimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AIReimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AICharles Darwin University
 

Similar a A brief introduction to crowdsourcing for data collection (20)

Crowdsourcing for research libraries
Crowdsourcing for research librariesCrowdsourcing for research libraries
Crowdsourcing for research libraries
 
Social machines: theory design and incentives
Social machines: theory design and incentivesSocial machines: theory design and incentives
Social machines: theory design and incentives
 
We are the data
We are the dataWe are the data
We are the data
 
Fundamentals of human computation
Fundamentals of human computationFundamentals of human computation
Fundamentals of human computation
 
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014 Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
Tutorial: Social Semantic Web and Crowdsourcing - E. Simperl - ESWC SS 2014
 
Social Network Analysis Applications and Approach
Social Network Analysis Applications and ApproachSocial Network Analysis Applications and Approach
Social Network Analysis Applications and Approach
 
Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...
Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...
Demystifying Digital Scholarship Slides: Big Project, Small Project: Steps in...
 
Social Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to ToolsSocial Network Analysis & an Introduction to Tools
Social Network Analysis & an Introduction to Tools
 
Embracing student innovation in the age of Generative AI
Embracing student innovation in the age of Generative AIEmbracing student innovation in the age of Generative AI
Embracing student innovation in the age of Generative AI
 
UTS CIC2 Briefing, 17 June 2016
UTS CIC2 Briefing, 17 June 2016UTS CIC2 Briefing, 17 June 2016
UTS CIC2 Briefing, 17 June 2016
 
UX Burlington 2017: Exploratory Research in UX Design
UX Burlington 2017: Exploratory Research in UX DesignUX Burlington 2017: Exploratory Research in UX Design
UX Burlington 2017: Exploratory Research in UX Design
 
Learning from Complex Online Behavior with Andy Edmonds - Big Brains
Learning from Complex Online Behavior with Andy Edmonds - Big BrainsLearning from Complex Online Behavior with Andy Edmonds - Big Brains
Learning from Complex Online Behavior with Andy Edmonds - Big Brains
 
Incentives-driven technology design
Incentives-driven technology designIncentives-driven technology design
Incentives-driven technology design
 
Evaluation and User Study in HCI
Evaluation and User Study in HCIEvaluation and User Study in HCI
Evaluation and User Study in HCI
 
Crowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Crowdsourcing & Human Computation Labeling Data & Building Hybrid SystemsCrowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
Crowdsourcing & Human Computation Labeling Data & Building Hybrid Systems
 
Tell me what you want and I’ll show you what you can have: who drives design ...
Tell me what you want and I’ll show you what you can have: who drives design ...Tell me what you want and I’ll show you what you can have: who drives design ...
Tell me what you want and I’ll show you what you can have: who drives design ...
 
Human computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspectiveHuman computation, crowdsourcing and social: An industrial perspective
Human computation, crowdsourcing and social: An industrial perspective
 
Crowdsourcing the Semantic Web
Crowdsourcing the Semantic WebCrowdsourcing the Semantic Web
Crowdsourcing the Semantic Web
 
Did you mean crowdsourcing for recommender systems?
Did you mean crowdsourcing for recommender systems?Did you mean crowdsourcing for recommender systems?
Did you mean crowdsourcing for recommender systems?
 
Reimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AIReimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AI
 

Más de Elena Simperl

This talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing scienceThis talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing scienceElena Simperl
 
Knowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generationKnowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generationElena Simperl
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so farElena Simperl
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringElena Simperl
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactElena Simperl
 
Ten myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdfTen myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdfElena Simperl
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringElena Simperl
 
Data commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdfData commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdfElena Simperl
 
Are our knowledge graphs trustworthy?
Are our knowledge graphs trustworthy?Are our knowledge graphs trustworthy?
Are our knowledge graphs trustworthy?Elena Simperl
 
The web of data: how are we doing so far?
The web of data: how are we doing so far?The web of data: how are we doing so far?
The web of data: how are we doing so far?Elena Simperl
 
Crowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart citiesCrowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart citiesElena Simperl
 
Pie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on TwitterPie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on TwitterElena Simperl
 
High-value datasets: from publication to impact
High-value datasets: from publication to impactHigh-value datasets: from publication to impact
High-value datasets: from publication to impactElena Simperl
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data StoriesElena Simperl
 
The human face of AI: how collective and augmented intelligence can help sol...
The human face of AI:  how collective and augmented intelligence can help sol...The human face of AI:  how collective and augmented intelligence can help sol...
The human face of AI: how collective and augmented intelligence can help sol...Elena Simperl
 
Qrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart citiesQrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart citiesElena Simperl
 
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...Elena Simperl
 
Inclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approachInclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approachElena Simperl
 

Más de Elena Simperl (20)

This talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing scienceThis talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing science
 
Knowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generationKnowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generation
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
 
Ten myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdfTen myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdf
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Data commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdfData commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdf
 
Are our knowledge graphs trustworthy?
Are our knowledge graphs trustworthy?Are our knowledge graphs trustworthy?
Are our knowledge graphs trustworthy?
 
The web of data: how are we doing so far?
The web of data: how are we doing so far?The web of data: how are we doing so far?
The web of data: how are we doing so far?
 
Crowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart citiesCrowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart cities
 
Pie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on TwitterPie chart or pizza: identifying chart types and their virality on Twitter
Pie chart or pizza: identifying chart types and their virality on Twitter
 
High-value datasets: from publication to impact
High-value datasets: from publication to impactHigh-value datasets: from publication to impact
High-value datasets: from publication to impact
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data Stories
 
The human face of AI: how collective and augmented intelligence can help sol...
The human face of AI:  how collective and augmented intelligence can help sol...The human face of AI:  how collective and augmented intelligence can help sol...
The human face of AI: how collective and augmented intelligence can help sol...
 
Qrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart citiesQrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart cities
 
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
 
Qrowd and the city
Qrowd and the cityQrowd and the city
Qrowd and the city
 
Inclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approachInclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approach
 

Último

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo

A brief introduction to crowdsourcing for data collection

  • 1. A BRIEF INTRODUCTION TO CROWDSOURCED DATA COLLECTION ELENA SIMPERL UNIVERSITY OF SOUTHAMPTON 25-May-14 USEWOD@ESWC2014 1
  • 2. EXECUTIVE SUMMARY • Crowdsourcing can help with research data forensics • But • There are things computers do better than humans → hybrid approaches are the ultimate solution • There is crowdsourcing and crowdsourcing → pick your faves and mix them • Human intelligence is a valuable resource → experiment design is key
  • 3. CROWDSOURCING: PROBLEM SOLVING VIA OPEN CALLS "Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. This can take the form of peer-production (when the job is performed collaboratively), but is also often undertaken by sole individuals. The crucial prerequisite is the use of the open call format and the large network of potential laborers." [Howe, 2006]
  • 4. CROWDSOURCING COMES IN DIFFERENT FORMS AND FLAVORS
  • 6. DIMENSIONS OF CROWDSOURCING WHAT IS OUTSOURCED • Tasks based on human skills not easily replicable by machines • Visual recognition • Language understanding • Knowledge acquisition • Basic human communication • ... WHO IS THE CROWD • Open call (crowd accessible through a platform) • Call may target specific skills and expertise (qualification tests) • Requester typically knows less about the ‘workers’ than in other ‘work’ environments See also [Quinn & Bederson, 2012]
  • 7. USEWOD EXPERIMENT: TASK AND CROWD WHAT IS OUTSOURCED • Annotating research papers with data set information • Alternative representations of the domain • Bibliographic reference • Abstract + title • Paragraph • Full paper • What if the domain is not known in advance or is infinite? • Do we know the list of potential answers? • Is there only one correct solution to each atomic task? • How many people would solve the same task? WHO IS THE CROWD • People who know the papers or the data sets • Experts in the (broader) field • Casual gamers • Librarians • Anyone (knowledgeable of English, with a computer/cell phone…) • Combinations thereof…
  • 8. CROWDSOURCING AS ‘HUMAN COMPUTATION’ Outsourcing to humans the tasks that machines find difficult to solve Tutorial@ISWC2013
  • 9. DIMENSIONS OF CROWDSOURCING (2) HOW IS THE TASK OUTSOURCED • Explicit vs. implicit participation • Tasks broken down into smaller units undertaken in parallel by different people • Coordination required to handle cases with more complex workflows • Partial or independent answers consolidated and aggregated into complete solution See also [Quinn & Bederson, 2012]
  • 10. EXAMPLE: CITIZEN SCIENCE WHAT IS OUTSOURCED • Object recognition, labeling, categorization in media content WHO IS THE CROWD • Anyone HOW IS THE TASK OUTSOURCED • Highly parallelizable tasks • Every item is handled by multiple annotators • Every annotator provides an answer • Consolidated answers solve scientific problems
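The consolidation step on slide 10 (every item handled by multiple annotators, answers then merged) can be sketched as a simple majority vote. This is only one possible consolidation rule, not necessarily the one any particular citizen science project uses; the item names and labels below are made up for illustration.

```python
from collections import Counter


def consolidate(annotations):
    """Return (majority label, share of votes) for each annotated item."""
    result = {}
    for item, labels in annotations.items():
        # most_common(1) yields the label with the highest count; ties are
        # broken in favour of the label that was encountered first.
        label, votes = Counter(labels).most_common(1)[0]
        result[item] = (label, votes / len(labels))
    return result


# Hypothetical labels from three annotators per image.
annotations = {
    "image-1": ["spiral", "spiral", "elliptical"],
    "image-2": ["elliptical", "elliptical", "elliptical"],
}
consolidated = consolidate(annotations)
```

The vote share gives a rough agreement signal: items with a low share are natural candidates for extra annotators or expert review.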
  • 11. EXPLICIT VS. IMPLICIT CONTRIBUTION: AFFECTS MOTIVATION AND ENGAGEMENT • Explicit: users are aware of how their input contributes to the achievement of the application’s goal (and identify themselves with it) • Implicit: tasks are hidden behind the application narratives; engagement is ensured through other incentives
  • 12. USEWOD EXPERIMENT: TASK DESIGN HOW IS THE TASK OUTSOURCED: ALTERNATIVE MODELS • Use the data collected here to train an IE algorithm • Use paid microtask workers to do a first screening, then an expert crowd to sort out challenging cases • What if you have very long documents potentially mentioning different/unknown data sets? • Competition via Twitter • ‘Which version of DBpedia does this paper use?’ • One question a day, prizes • Needs a gold standard to bootstrap and redundancy • Involve the authors • Use crowdsourcing to find out Twitter accounts, then launch campaign on Twitter • Write an email to the authors… • Change the task • Which papers use DBpedia 3.X? • Competition to find all papers
  • 13. EXAMPLE: SOYLENT AND COMPLEX WORKFLOWS http://www.youtube.com/watch?v=n_miZqsPwsc WHAT IS OUTSOURCED • Text shortening, proofreading, open editing WHO IS THE CROWD • MTurk HOW IS THE TASK OUTSOURCED • Text divided into paragraphs • Find-Fix-Verify pattern • Multiple workers in each step See also [Bernstein et al., 2010]
  • 14. DIMENSIONS OF CROWDSOURCING (3) HOW ARE THE RESULTS VALIDATED • Solution space closed vs. open • Performance measurements/ground truth • Statistical techniques employed to predict accurate solutions • May take into account confidence values of algorithmically generated solutions HOW CAN THE PROCESS BE OPTIMIZED • Incentives and motivators • Assigning tasks to people based on their skills and performance (as opposed to random assignments) • Symbiotic combinations of human- and machine-driven computation, including combinations of different forms of crowdsourcing See also [Quinn & Bederson, 2012]
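One way to combine the "ground truth" and "statistical techniques" bullets on slide 14 is to score each worker on a few tasks with known answers and then weight their votes on the remaining tasks by that score. The sketch below assumes this accuracy-weighted voting scheme; the worker names, task ids, and answers are hypothetical, and more elaborate estimators exist.

```python
from collections import defaultdict


def worker_accuracy(answers, gold):
    """Fraction of gold-standard tasks each worker answered correctly."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for (worker, task), answer in answers.items():
        if task in gold:
            seen[worker] += 1
            correct[worker] += int(answer == gold[task])
    return {worker: correct[worker] / seen[worker] for worker in seen}


def weighted_vote(answers, task, accuracy, default=0.5):
    """Pick the answer whose supporters have the highest summed accuracy."""
    scores = defaultdict(float)
    for (worker, t), answer in answers.items():
        if t == task:
            # Workers never seen on gold tasks get an assumed default weight.
            scores[answer] += accuracy.get(worker, default)
    return max(scores, key=scores.get)


# Hypothetical data: g1 and g2 have known answers, t1 is the open task.
gold = {"g1": "3.8", "g2": "3.9"}
answers = {
    ("ann", "g1"): "3.8", ("ann", "g2"): "3.9", ("ann", "t1"): "3.8",
    ("bob", "g1"): "3.7", ("bob", "g2"): "3.9", ("bob", "t1"): "3.9",
    ("cat", "g1"): "3.7", ("cat", "g2"): "3.8", ("cat", "t1"): "3.9",
}
accuracy = worker_accuracy(answers, gold)
best = weighted_vote(answers, "t1", accuracy)
```

In this toy data the plain majority on `t1` is "3.9", but the weighted vote returns "3.8" because the only worker who was right on both gold tasks chose it.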
  • 15. USEWOD EXPERIMENT: VALIDATION • Domain is fairly restricted • Spam and obviously wrong answers can be detected easily • When are two answers the same? Can there be more than one correct answer per question? • Redundancy may not be the final answer • Most people will be able to identify the data set, but sometimes the actual version is not trivial to reproduce • Make educated version guess based on time intervals and other features
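The "educated version guess based on time intervals" on slide 15 could look roughly like the sketch below: assume a paper used the latest release available on its submission date. The release names and dates are placeholders, not the real DBpedia release history, and the rule itself is an assumption that other features would need to refine.

```python
from bisect import bisect_right
from datetime import date

# Placeholder release calendar (NOT actual DBpedia release dates),
# sorted by release date.
RELEASES = [
    (date(2012, 1, 1), "release-A"),
    (date(2013, 1, 1), "release-B"),
    (date(2014, 1, 1), "release-C"),
]


def guess_version(submitted, releases=RELEASES):
    """Return the latest release published on or before the submission date."""
    dates = [released for released, _ in releases]
    idx = bisect_right(dates, submitted)
    # idx == 0 means the paper predates every known release.
    return releases[idx - 1][1] if idx else None
```

For example, `guess_version(date(2013, 6, 1))` falls in the second interval and yields `"release-B"`. Such a guess is best treated as a prior to be confirmed or overruled by crowd answers.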
  • 16. ALIGNING INCENTIVES IS ESSENTIAL • Motivation: driving force that makes humans achieve their goals • Incentives: ‘rewards’ assigned by an external ‘judge’ to a performer for undertaking a specific task • Common belief (among economists): incentives can be translated into a sum of money for all practical purposes • Incentives can be related to both extrinsic and intrinsic motivations • Extrinsic motivation applies if the task is considered boring, dangerous, useless, socially undesirable, or dislikable by the performer • Intrinsic motivation is driven by an interest or enjoyment in the task itself
  • 17. EXAMPLE: DIFFERENT CROWDS FOR DIFFERENT TASKS • Find: contest, Linked Data experts, difficult task, final prize, TripleCheckMate [Kontokostas2013] • Verify: microtasks, workers, easy task, micropayments, MTurk http://mturk.com • Adapted from [Bernstein2010] • See also [Acosta et al., 2013]
  • 18. IT’S NOT ALWAYS JUST ABOUT MONEY http://www.crowdsourcing.org/editorial/how-to-motivate-the-crowd-infographic/ http://www.oneskyapp.com/blog/tips-to-motivate-participants-of-crowdsourced-translation/ [Kaufmann, Schulze, Veit, 2011]
  • 19. USEWOD EXPERIMENT: OTHER INCENTIVE MODELS • Twitter-based contest • ‘Which version of DBpedia does this paper use?’ • One question a day, prizes • If question is not answered correctly, increase the prize • If low participation, re-focus the audience or change the incentive • Altruism: for each ten papers annotated we send a student to ESWC…
  • 20. PRICING ON MTURK: AFFORDABLE, BUT SCALE OF EXPERIMENTS DOES MATTER [Ipeirotis, 2008]
  • 21. USEWOD EXPERIMENT: HYBRID APPROACH • Use IE algorithm to select best candidates • Use different types of crowds • Publish results as Linked Data See also [Demartini et al., 2012]
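A minimal sketch of the hybrid approach on slide 21, assuming the IE algorithm emits a confidence score per (paper, data set) candidate. The thresholds, queue names, and sample candidates are illustrative assumptions, not values from the experiment: confident extractions are accepted, mid-range ones go to microtask workers, and the hardest go to an expert crowd.

```python
def route(candidates, auto_accept=0.9, expert_below=0.5):
    """Split extractor output into accepted, microtask, and expert queues."""
    queues = {"accepted": [], "microtask": [], "expert": []}
    for paper, dataset, confidence in candidates:
        if confidence >= auto_accept:
            queues["accepted"].append((paper, dataset))
        elif confidence >= expert_below:
            queues["microtask"].append((paper, dataset))
        else:
            queues["expert"].append((paper, dataset))
    return queues


# Hypothetical extractor output: (paper id, candidate data set, confidence).
candidates = [
    ("paper-1", "DBpedia", 0.95),
    ("paper-2", "DBpedia", 0.70),
    ("paper-3", "unknown", 0.20),
]
queues = route(candidates)
```

Tuning the two thresholds trades crowd cost against the risk of accepting wrong extractions, which is where a gold standard helps.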
  • 23. SUMMARY AND FINAL REMARKS • There are things computers do better than humans → hybrid approaches are the ultimate solution • There is crowdsourcing and crowdsourcing → pick your faves and mix them • Human intelligence is a valuable resource → experiment design is key