Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Getting Semantics from the Crowd
1. Getting Semantics from the Crowd
Gianluca Demartini
eXascale Infolab, University of Fribourg, Switzerland
2. Semantic Web 2.0
• not the Web 3.0
• Getting semantics from (non-expert) people
– From few publishers and many consumers (SW 1.0)
– To many publishers and many consumers (SW 2.0)
27-Apr-12, Gianluca Demartini, eXascale Infolab
3. read/write SW
• Wikidata http://meta.wikimedia.org/wiki/Wikidata
• Semantics is about the meaning
• Get people in the loop!
• Social computing for SemWeb applications
4. Crowdsourcing
• Exploit human intelligence to solve
– Tasks simple for humans, complex for machines
– With a large number of humans (the Crowd)
– Small problems: micro-tasks (Amazon MTurk)
• Examples
– Wikipedia, Flickr
• Incentives
– Financial, fun, visibility
5. Crowdsourcing
• Success Stories
– Training set for ML
– Image tagging
– Document annotation/translation
– IR evaluation [Blanco et al. SIGIR 2011]
– CrowdDB [Franklin et al. SIGMOD 2011]
7. ZenCrowd
• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a probabilistic reasoning framework
[Figure labels: Crowd, Algorithms, Machines]
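The idea of combining an algorithmic matcher with crowd answers, while tracking how reliable each worker is, can be illustrated with a small sketch. The slide does not describe ZenCrowd's actual probabilistic model, so the code below is only an assumption: a naive-Bayes style update in which the matcher score acts as a prior and each worker's vote shifts the log-odds by that worker's estimated accuracy. The names `link_posterior`, `update_reliability`, and their parameters are hypothetical.

```python
import math


def link_posterior(prior, votes, reliability):
    """Probability that a candidate link is correct (illustrative only).

    prior:       algorithmic matcher confidence, strictly between 0 and 1
    votes:       dict worker_id -> True (link is correct) or False
    reliability: dict worker_id -> estimated accuracy, strictly between 0 and 1
    """
    log_odds = math.log(prior / (1.0 - prior))
    for worker, says_correct in votes.items():
        r = reliability[worker]
        step = math.log(r / (1.0 - r))
        # A reliable worker moves the estimate further than an unreliable one.
        log_odds += step if says_correct else -step
    return 1.0 / (1.0 + math.exp(-log_odds))


def update_reliability(stats, worker, agreed):
    """Laplace-smoothed running accuracy of a worker (assumed scheme).

    stats maps worker_id -> (agreements, answers) and is updated in place.
    """
    right, total = stats.get(worker, (0, 0))
    right, total = right + int(agreed), total + 1
    stats[worker] = (right, total)
    return (right + 1) / (total + 2)
```

With a neutral prior of 0.5, a single "correct" vote from a worker with estimated accuracy 0.8 yields a posterior of about 0.8, and two equally reliable workers who disagree cancel each other out.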
8. ZenCrowd Architecture
[Architecture diagram. Labels: Input: HTML Pages; Output: HTML+RDFa Pages; LOD Open Data Cloud; Crowdsourcing Platform; Micro Matching Tasks; Workers Decisions; ZenCrowd: Entity Extractors, LOD Index (Get Entity), Algorithmic Matchers, Probabilistic Network, Decision Engine, Micro-Task Manager]
13. Challenges for Crowd-SW
• How to design the micro-task
• Where to find the crowd
– MTurk, Facebook (900M users)
• Evaluation
– Which ground truth?!
• Quality control / Spam
– Need for spam benchmarks in Crowdsourcing [Mechanical Cheat at CrowdSearch 2012]
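One common way to approach the quality control point is to mix tasks with known answers ("gold" questions) into the micro-tasks and flag workers who do poorly on them. The slides do not say which method the author uses, so the sketch below is purely illustrative: the function name, the thresholds `min_accuracy` and `min_gold`, and the data layout are all assumptions.

```python
def flag_suspected_spammers(answers, gold, min_accuracy=0.6, min_gold=3):
    """Return the ids of workers whose accuracy on gold tasks is too low.

    answers: dict worker_id -> dict task_id -> answer
    gold:    dict task_id -> known correct answer
    """
    flagged = set()
    for worker, given in answers.items():
        judged = [task for task in given if task in gold]
        if len(judged) < min_gold:
            continue  # too little evidence to judge this worker
        correct = sum(given[task] == gold[task] for task in judged)
        if correct / len(judged) < min_accuracy:
            flagged.add(worker)
    return flagged
```

A check like this also depends on the ground-truth question raised under Evaluation: it is only as good as the gold answers it compares against.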