HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal
Motivation (1)
HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing – Acosta et al.
[Figure: RDF graph fragment with the nodes dbr:Bad_Hair and ?x connected by dbp:producer.]
Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.
  
Motivation (2)
Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .
  ?movie dbp:producer ?producer .
  ?movie dct:subject dbc:Universal_Pictures_film .
  ?movie dct:subject dbc:Films_shot_in_New_York_City .
}

Result: 39 movies (v. 2015-04).
  
Motivation (2)
Without the triple pattern ?movie dbp:producer ?producer, the same query returns 46 movies (v. 2015-04): there are 7 movies without producers.

Motivation
Movies (shot in NYC by Universal Pictures) with no producers in DBpedia (v. 2015-04):
•  dbr:Legal_Eagles
•  dbr:Wanderlust
•  dbr:Barney’s_Version_(film)
•  dbr:Non_Stop_(film)
•  dbr:The_Wolf_of_Wall_Street_(2013_film) (Leonardo DiCaprio is a producer)
•  dbr:Broadway_Love
•  dbr:Trainwreck_(film)
All images licensed under Fair use via Wikipedia.

Problem Definition
Given an RDF data set D and a SPARQL query Q against D, consider D* the virtual data set that contains all the data that should be in D.
P1) Identifying portions of Q that yield missing values
P2) Resolving missing values
Example:
[[(?movie, dbp:producer, ?producer)]]D ⊂ [[(?movie, dbp:producer, ?producer)]]D*
µ = {movie → dbr:The_Wolf_of_Wall_Street_(2013_film), producer → dbr:Leonardo_DiCaprio}
µ ∉ [[(?movie, dbp:producer, ?producer)]]D (does not belong to DBpedia)
∧ µ ∈ [[(?movie, dbp:producer, ?producer)]]D* (should belong to DBpedia)

OUR APPROACH: HARE
  
HARE
•  A hybrid machine/human SPARQL query engine that is able to increase the size of query answers.
•  Based on a novel RDF completeness model, HARE implements query optimization and execution techniques:
   P1) Identifying portions of queries that yield missing values.
•  HARE resorts to microtask crowdsourcing:
   P2) Resolving missing values.
  
HARE Architecture
[Figure: HARE architecture. Input: SPARQL query Q and threshold τ. Output: results for Q. Components: RDF Completeness Model, Query Optimizer, Query Engine, Microtask Manager, Crowd Knowledge (CKB+, CKB–, CKB~), Crowd, and the RDF data set (LOD Cloud). Edge labels: query plan, RDF data, crowdsourcing triple patterns, tasks, human input, aggregated human input, bindings from the crowd.]
  
RDF Completeness Model (1)
Movies have producers (e.g. dbr:The_Interpreter).
[Figure: RDF graph. dbr:The_Interpreter, dbr:Tower_Heist and dbr:Bad_Hair are connected by rdf:type to schema.org:Movie. dbr:The_Interpreter has dbp:producer edges to dbr:Eric_Fellner, dbr:Tim_Bevan and dbr:Kevin_Misher; further dbp:producer edges lead to unknown values (?).]
  
RDF Completeness Model (2)
①  Predicate multiplicity of an RDF resource
Number of different objects that a resource has for a certain predicate.
Example: dbr:The_Interpreter has the dbp:producer objects dbr:Eric_Fellner, dbr:Tim_Bevan and dbr:Kevin_Misher, so
MD(dbr:The_Interpreter | dbp:producer) = 3
  
RDF Completeness Model (3)
②  Aggregated predicate multiplicity of a class
Given a predicate, the median number of distinct objects that the resources belonging to a class have for that predicate.
MD(dbr:The_Interpreter | dbp:producer) = 3
MD(dbr:Legal_Eagles | dbp:producer) = 2
AMD(schema.org:Movie | dbp:producer) = 3
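The two multiplicities can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HARE's implementation: the toy triple list, the placeholder producers `ex:p1` to `ex:p5`, the resource `ex:Movie_A`, and the function names `md` and `amd` are all invented for the example.

```python
from statistics import median

# Toy RDF data set D as (subject, predicate, object) triples.
# ex:Movie_A and the ex:p* producers are hypothetical placeholders.
D = [
    ("dbr:The_Interpreter", "rdf:type", "schema.org:Movie"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Eric_Fellner"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Tim_Bevan"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Kevin_Misher"),
    ("dbr:Legal_Eagles", "rdf:type", "schema.org:Movie"),
    ("dbr:Legal_Eagles", "dbp:producer", "ex:p1"),
    ("dbr:Legal_Eagles", "dbp:producer", "ex:p2"),
    ("ex:Movie_A", "rdf:type", "schema.org:Movie"),
    ("ex:Movie_A", "dbp:producer", "ex:p3"),
    ("ex:Movie_A", "dbp:producer", "ex:p4"),
    ("ex:Movie_A", "dbp:producer", "ex:p5"),
]


def md(data, subject, predicate):
    """Predicate multiplicity: distinct objects of (subject, predicate)."""
    return len({o for s, p, o in data if s == subject and p == predicate})


def amd(data, cls, predicate):
    """Aggregated multiplicity: median of md over all resources typed cls."""
    members = {s for s, p, o in data if p == "rdf:type" and o == cls}
    if not members:
        return 0  # assumption: an empty class has multiplicity 0
    return median(md(data, s, predicate) for s in members)


md_interpreter = md(D, "dbr:The_Interpreter", "dbp:producer")
md_legal_eagles = md(D, "dbr:Legal_Eagles", "dbp:producer")
amd_movie = amd(D, "schema.org:Movie", "dbp:producer")
```

On this toy data the per-movie multiplicities are 3, 2 and 3, so the median for the class is 3.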
  
RDF Completeness Model (4)
③  Completeness of an RDF resource (with respect to a predicate)
Given a predicate, the completeness of an RDF resource is determined by its predicate multiplicity (computed in ①) relative to the aggregated predicate multiplicity of the classes that it belongs to (computed in ②).
CompD(dbr:The_Interpreter | dbp:producer) = 3/3
CompD(dbr:Legal_Eagles | dbp:producer) = 2/3
CompD(dbr:Bad_Hair | dbp:producer) = 0/3
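As a small worked sketch, the completeness ratio can be written as a Python function that takes the two multiplicities as inputs. The function name `completeness` and the handling of a zero denominator are assumptions for illustration; the slides do not say what happens when the aggregated multiplicity is 0.

```python
def completeness(md_value, amd_value):
    """Ratio of a resource's multiplicity to its class's aggregated multiplicity."""
    if amd_value == 0:
        # Assumption: if no objects are expected, treat the resource as complete.
        return 1.0
    return md_value / amd_value


# Multiplicities taken from the running example (numerator, denominator).
examples = {
    "dbr:The_Interpreter": completeness(3, 3),
    "dbr:Legal_Eagles": completeness(2, 3),
    "dbr:Bad_Hair": completeness(0, 3),
}
```

A value of 1.0 means the resource has as many objects as is typical for its class; 0.0 means no object is present at all.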
  
Crowd Knowledge
•  The knowledge collected from the crowd is captured in three knowledge bases:
   CKB = (CKB+, CKB–, CKB~)
•  CKB+, CKB–, CKB~ are fuzzy sets over RDF data composed of 4-tuples of the form:
   (subject, predicate, object, membership_degree)
   where (subject, predicate, object) is an RDF triple.
  
Crowd Knowledge
Types of Crowd Knowledge Bases
CKB+: “Brian Grazer is a producer of Tower Heist.”
      (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
CKB–: “Tower Heist does not have a producer.”
      (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
CKB~: “I am not sure if Bad Hair has a producer.”
      (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
The CKB+ and CKB– entries for dbr:Tower_Heist illustrate contradiction; the CKB~ entry illustrates uncertainty.
  
Crowd Knowledge
Measuring Contradiction
•  Contradiction occurs when triples with the same subject and predicate belong to CKB+ and CKB–.
•  It is measured as follows:
   CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
   CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
   Contradiction(dbr:Tower_Heist | dbp:producer) = 1 – |0.9 – 0.05| = 0.15
•  Contradiction values close to 0.0 indicate high consensus.
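The contradiction measure can be sketched in Python as below. The slide shows one tuple per knowledge base; averaging the membership degrees when there are several tuples, and treating an empty set as 0.0, are assumptions of this sketch, as are the names `avg_membership` and `contradiction`.

```python
def avg_membership(ckb, subject, predicate):
    """Average membership degree of the tuples for (subject, predicate)."""
    degrees = [m for s, p, o, m in ckb if s == subject and p == predicate]
    # Assumption: no matching tuple is treated as a degree of 0.0.
    return sum(degrees) / len(degrees) if degrees else 0.0


def contradiction(ckb_pos, ckb_neg, subject, predicate):
    """1 - |m+ - m-|: close to 0.0 means the crowd largely agrees."""
    m_pos = avg_membership(ckb_pos, subject, predicate)
    m_neg = avg_membership(ckb_neg, subject, predicate)
    return 1.0 - abs(m_pos - m_neg)


# Fuzzy sets as lists of (subject, predicate, object, membership_degree).
CKB_POS = [("dbr:Tower_Heist", "dbp:producer", "dbr:Brian_Grazer", 0.9)]
CKB_NEG = [("dbr:Tower_Heist", "dbp:producer", "_:o1", 0.05)]

c = contradiction(CKB_POS, CKB_NEG, "dbr:Tower_Heist", "dbp:producer")
```

For the Tower Heist tuples this evaluates to approximately 0.15, matching the worked example.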
  
Crowd Knowledge
Measuring Uncertainty
•  When a triple belongs to CKB~, the value of the triple object is unknown or uncertain.
•  Uncertainty is measured as follows:
   CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
   Uncertainty(dbr:Bad_Hair | dbp:producer) = avg({0.78}) = 0.78
•  Uncertainty values close to 1.0 indicate that the crowd has shown itself to be unknowledgeable about the fact to be vetted.
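A minimal Python sketch of the uncertainty measure follows. The function name `uncertainty` is hypothetical, and returning 0.0 when CKB~ holds no matching tuple is an assumption, chosen because the running example later reports an uncertainty of 0.0 for dbr:Tower_Heist.

```python
def uncertainty(ckb_unknown, subject, predicate):
    """Average membership degree in CKB~ for (subject, predicate)."""
    degrees = [m for s, p, o, m in ckb_unknown
               if s == subject and p == predicate]
    # Assumption: nothing recorded in CKB~ means an uncertainty of 0.0.
    return sum(degrees) / len(degrees) if degrees else 0.0


# CKB~ as a list of (subject, predicate, object, membership_degree).
CKB_UNKNOWN = [("dbr:Bad_Hair", "dbp:producer", "_:o2", 0.78)]

u_bad_hair = uncertainty(CKB_UNKNOWN, "dbr:Bad_Hair", "dbp:producer")
u_tower_heist = uncertainty(CKB_UNKNOWN, "dbr:Tower_Heist", "dbp:producer")
```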
  
Query Optimizer (1)
•  Heuristic-based optimizer that decomposes the basic graph patterns (BGPs) of a SPARQL query into two subsets:
–  SQD: triple patterns executed against the data set D,
–  SQCROWD: triple patterns to be crowdsourced.
  
Query Optimizer (2)
•  Given a SPARQL query Q:
–  Triple patterns in Q with variables in the subject position and object position are added to SQCROWD.
–  The rest of the triple patterns in Q are added to SQD.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .                      # t1: SQD
  ?movie dbp:producer ?producer .                         # t2: SQCROWD
  ?movie dct:subject dbc:Universal_Pictures_film .        # t3: SQD
  ?movie dct:subject dbc:Films_shot_in_New_York_City .    # t4: SQD
}
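The decomposition heuristic above can be sketched in Python. Triple patterns are modelled as plain tuples of strings and variables are recognised by a leading `?`; this representation and the names `is_var` and `decompose` are assumptions made for the example, not part of HARE.

```python
def is_var(term):
    """A term is a variable if it starts with '?' (SPARQL-style)."""
    return term.startswith("?")


def decompose(patterns):
    """Split triple patterns into (SQ_D, SQ_CROWD).

    Patterns with variables in both the subject and the object position
    go to SQ_CROWD; all other patterns go to SQ_D.
    """
    sq_d, sq_crowd = [], []
    for s, p, o in patterns:
        if is_var(s) and is_var(o):
            sq_crowd.append((s, p, o))
        else:
            sq_d.append((s, p, o))
    return sq_d, sq_crowd


t1 = ("?movie", "rdf:type", "schema.org:Movie")
t2 = ("?movie", "dbp:producer", "?producer")
t3 = ("?movie", "dct:subject", "dbc:Universal_Pictures_film")
t4 = ("?movie", "dct:subject", "dbc:Films_shot_in_New_York_City")

sq_d, sq_crowd = decompose([t1, t2, t3, t4])
```

For the running query only t2 has variables in both positions, so it is the single pattern placed in SQCROWD.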
  
Query Optimizer (3)
•  The optimizer builds a query plan TQ for query Q.
•  Triple patterns from SQD are grouped into star-shaped sub-queries in a bushy tree [Vidal et al.].
•  Triple patterns in SQCROWD are added to the plan TQ in a left-linear fashion.
[Figure: query plan TQ with t1, t3 and t4 grouped under SQD and t2 under SQCROWD.]
  
Query Engine (1)
•  Executes the query plan TQ.
•  Sub-queries that are part of SQD (here t1, t3, t4) are executed against the data set:
   Ω = {{movie → dbr:Tower_Heist}, {movie → dbr:Legal_Eagles}, …}
•  For each mapping contained in Ω, the engine instantiates the triple patterns in SQCROWD.
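The instantiation step can be illustrated with a short Python sketch. Mappings are modelled as dictionaries from variable names to RDF terms; this representation and the function name `instantiate` are assumptions for illustration only.

```python
def instantiate(pattern, mapping):
    """Replace every variable of the pattern that the mapping binds."""
    return tuple(
        mapping.get(term[1:], term) if term.startswith("?") else term
        for term in pattern
    )


# Mappings produced by the SQ_D sub-queries (first two of the example).
omega = [{"movie": "dbr:Tower_Heist"}, {"movie": "dbr:Legal_Eagles"}]
sq_crowd = [("?movie", "dbp:producer", "?producer")]

# One instantiated pattern per (mapping, crowd pattern) pair.
instantiated = [instantiate(tp, mu) for mu in omega for tp in sq_crowd]
```

Variables that the mapping does not bind, such as `?producer`, stay in place; they are the values the crowd may later be asked to supply.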
  
Query Engine (2)
Example of an Iteration
•  The engine processes {movie → dbr:Tower_Heist}.
•  Following the running example:
   CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
   CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
   CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
   Comp(dbr:Tower_Heist | dbp:producer) = 1/3 = 0.33
   Contradiction(dbr:Tower_Heist | dbp:producer) = 0.15
   Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0

Query Engine (3)
Example of an Iteration
•  The algorithm computes the probability of crowdsourcing the triple pattern (dbr:Tower_Heist, dbp:producer, ?producer):
   P(CROWD | μ(s), p) = α (1 – 0.33) + (1 – α) min{0.15, 1 – 0.0} = 0.41
   The first term, (1 – 0.33), is the estimated incompleteness; the second term, min{0.15, 1 – 0.0}, is the crowd reliability.
•  α is a score weight between 0.0 and 1.0 (0.5 in the example).
•  If P(CROWD | μ(s), p) is greater than a user threshold τ, then the algorithm crowdsources the triple pattern (μ(s), p, o).
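The worked example can be reproduced with a small Python sketch. Reading the three numbers 0.33, 0.15 and 0.0 as the completeness, contradiction and uncertainty from the previous slide gives the function below; the function name, the parameter names, and the threshold value `tau = 0.3` are hypothetical.

```python
def crowd_probability(comp, contradiction, uncertainty, alpha=0.5):
    """Weighted sum of estimated incompleteness and crowd reliability."""
    incompleteness = 1.0 - comp
    reliability = min(contradiction, 1.0 - uncertainty)
    return alpha * incompleteness + (1.0 - alpha) * reliability


# Values of the running example for (dbr:Tower_Heist, dbp:producer).
p_crowd = crowd_probability(comp=1 / 3, contradiction=0.15, uncertainty=0.0)

tau = 0.3  # hypothetical user threshold
should_crowdsource = p_crowd > tau
```

With α = 0.5 the result rounds to 0.41, so any threshold below that value would send the triple pattern to the crowd.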
  
Query Engine (4)
•  The engine combines mappings obtained from the data set D and mappings from the crowd stored in CKB+.
•  The query evaluation terminates when all the sub-queries are executed.
Theorem 1: The HARE query engine does not increase the time complexity of executing a SPARQL query.
  
Microtask Manager (1)
•  Receives triple patterns to crowdsource, for example:
   (dbr:Tower_Heist, dbp:producer, ?p)
•  Creates human tasks.
•  Submits tasks to the crowdsourcing platform.
  
Microtask Manager (2)
HARE exploits the semantics encoded in RDF resources:
dbr:Tower_Heist, rdfs:label,
dbp:producer, rdfs:label,
dbr:Tower_Heist, foaf:depiction,
dbr:Tower_Heist, dbo:abstract,
dbr:Tower_Heist, foaf:primaryTopic,
  
Microtask Manager (3)
[Figure; labels: CKB+, CKB–, CKB~]
EXPERIMENTAL STUDY
  
Experimental Set-Up
•  Benchmark: 50 queries against DBpedia (v. 2014).
–  Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports.
•  Implementation details:
–  HARE is implemented in Python 2.7.6.
–  CrowdFlower is used as crowdsourcing platform.
•  Crowdsourcing configuration:
–  Four different RDF triples per task, 0.07 US$ per task.
–  At least three judgments were collected per task.
•  Total RDF triple patterns crowdsourced: 502
•  Total answers collected from the crowd: 1,609
  
Results: Size of Query Answer (1)
Metric: Number of answers when queries are executed.
[Figure: bar charts of #Answers per query (Q1–Q10), split into Data Set Answers and Crowd Answers, for the domains Sports, Music and Life Sciences. Ranges annotated on the panels: 1.25 – 2.00, 1.50 – 2.00, 1.08 – 1.92.]
HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
Results: Size of Query Answer (2)
Metric: Number of answers when queries are executed.
[Figure: bar charts of #Answers per query (Q1–Q10), split into Data Set Answers and Crowd Answers, for the domains Movies and History. Ranges annotated on the panels: 1.05 – 3.13, 1.10 – 1.89.]
HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
Results: Crowd Response Time (1)
Metric: Elapsed time since the first task until the last answer is retrieved.
[Figure: judgments completed (%) over time (min) per query (Q1–Q10) for the domains Sports, Music and Life Sciences. Annotations at the 12th minute: 77%, 82%, 97%.]
At the 12th minute after the first task is submitted, the crowd produces at least 75% of the answers.
  
Results: Crowd Response Time (2)
Metric: Elapsed time since the first task until the last answer is retrieved.
[Figure: judgments completed (%) over time (min) per query (Q1–Q10) for the domains Movies and History. Annotations at the 12th minute: 98%, 75%.]
At the 12th minute after the first task is submitted, the crowd produces at least 75% of the answers.
Results: Quality of Crowd Answers
Metric: A true positive is a mapping that belongs to the query answer.

Recall
       Sports  Music  Life Sciences  Movies  History
Q1     1.00    1.00   0.67           0.88    1.00
Q2     1.00    1.00   1.00           0.96    1.00
Q3     1.00    1.00   0.89           0.79    0.67
Q4     0.55    0.67   1.00           1.00    0.96
Q5     0.86    0.67   1.00           1.00    0.95
Q6     0.69    0.83   1.00           1.00    0.96
Q7     1.00    0.63   0.71           1.00    0.57
Q8     1.00    0.67   0.88           0.94    0.72
Q9     0.46    0.73   1.00           1.00    0.64
Q10    0.92    0.49   1.00           1.00    0.95
Avg    0.85    0.77   0.91           0.96    0.84

Precision
       Sports  Music  Life Sciences  Movies  History
Q1     1.00    1.00   1.00           0.47    1.00
Q2     1.00    0.29   1.00           1.00    1.00
Q3     1.00    1.00   1.00           1.00    1.00
Q4     0.83    1.00   1.00           1.00    1.00
Q5     1.00    0.86   1.00           1.00    1.00
Q6     1.00    1.00   1.00           1.00    0.96
Q7     1.00    1.00   1.00           1.00    0.84
Q8     1.00    1.00   1.00           1.00    0.78
Q9     1.00    1.00   1.00           1.00    0.92
Q10    1.00    1.00   1.00           1.00    0.98
Avg    0.98    0.91   1.00           0.95    0.95

The crowd exhibits heterogeneous performance within domains. This supports the importance of HARE's triple-based approach.
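For readers who want to recompute such scores, the following Python sketch shows the usual set-based definitions of precision and recall over mappings. It assumes a gold standard of correct mappings is available; the function name `precision_recall` and the toy mappings are hypothetical and are not taken from the experiments.

```python
def precision_recall(crowd_answers, gold_answers):
    """Precision and recall of crowd mappings against a gold standard.

    A true positive is a crowd mapping that belongs to the gold answer.
    """
    crowd, gold = set(crowd_answers), set(gold_answers)
    true_positives = len(crowd & gold)
    precision = true_positives / len(crowd) if crowd else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy mappings written as (movie, producer) pairs; all values are made up.
crowd = [("ex:m1", "ex:p1"), ("ex:m2", "ex:p2"), ("ex:m3", "ex:p9")]
gold = [("ex:m1", "ex:p1"), ("ex:m2", "ex:p2"),
        ("ex:m3", "ex:p3"), ("ex:m4", "ex:p4")]

precision, recall = precision_recall(crowd, gold)
```

Here two of the three crowd mappings are correct and two of the four gold mappings are found.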
  
RELATED WORK
  
Summary of Related Work
Human/computer query processing architectures
Manual specification of crowdsourcing:
•  CrowdDB [Franklin et al.]: Tables, columns
•  Deco [Park and Widom]: Rules
•  Qurk [Marcus et al.]: Microtask I/O
Automatic crowdsourcing:
•  HARE: relies on the RDF graph and crowd knowledge to resort to crowdsourcing
  
Summary of Related Work
Crowdsourcing in other contexts of Data Management (SPARQL- or RDF-based)
•  OASSIS [Amsterdamer et al.] (recommendation system): mines crowdsourced patterns specified in a SPARQL-like language.
•  KATARA [Chu et al.] (tabular data cleansing): compares tabular data against RDF data sets via crowdsourced mappings.
•  HARE (SPARQL query processing): resorts to crowdsourcing to complete missing values in RDF data sets.
  
CONCLUSIONS & FUTURE WORK
  
Conclusions
•  HARE: Hybrid query engine against RDF data sets.
•  Supports microtasks to enhance query answers on-the-fly.
•  Experimental results confirmed that:
–  Size of query answer: up to 3.13 times
–  Crowd response time: (12th min.): 98%
–  Accuracy: 0.84 – 0.96
Future work
•  Study further approaches to capture crowd reliability.
•  Consider other quality dimensions on the knowledge collected from the crowd.
  
References
•  [Amsterdamer et al.] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: query driven crowd mining. In SIGMOD, pages 589–600, 2014.
•  [Chu et al.] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. Katara: A data cleaning system powered by knowledge bases and crowdsourcing. In SIGMOD, pages 1247–1261, 2015.
•  [Marcus et al.] A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012.
•  [Park and Widom] H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781–792, 2013.
•  [Vidal et al.] M.E. Vidal, E. Ruckhaus, T. Lampo, A. Martínez, J. Sierra, and A. Polleres. Efficiently joining group patterns in SPARQL queries. In ESWC, pages 228–242, 2010.
HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing – Acosta et al.!
46	
  
HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal!
SPARQL Query Q, τ"
RDF
Completeness
Model !
Tasks!
Human
input!
Crowd Knowledge!
Query Engine!
Crowd!
CKB+! CKB-! CKB~!
Query
Optimizer!
Microtask
Manager!
LOD Cloud!
Query plan!
Crowdsourcing triple patterns!
RDF !
Data Set!
Input!
Results for Q"
Bindings from
the crowd!
RDF
data!
Output!
Aggregated!
Human Input!

More Related Content

What's hot

Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...Amazon Web Services
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchNeo4j
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Google Knowledge Graph
Google Knowledge GraphGoogle Knowledge Graph
Google Knowledge Graphkarthikzinavo
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Machine Learning project presentation
Machine Learning project presentationMachine Learning project presentation
Machine Learning project presentationRamandeep Kaur Bagri
 
Web ontology language (owl)
Web ontology language (owl)Web ontology language (owl)
Web ontology language (owl)Ameer Sameer
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Collibra : Designing Workflows
Collibra : Designing WorkflowsCollibra : Designing Workflows
Collibra : Designing WorkflowsElse Kuipers
 
Credit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global ScaleCredit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global ScaleOrchestra Networks
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...
Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...
Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...KnowledgeGraph
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 

What's hot (20)

Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Google Knowledge Graph
Google Knowledge GraphGoogle Knowledge Graph
Google Knowledge Graph
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Machine Learning project presentation
Machine Learning project presentationMachine Learning project presentation
Machine Learning project presentation
 
Web ontology language (owl)
Web ontology language (owl)Web ontology language (owl)
Web ontology language (owl)
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Collibra : Designing Workflows
Collibra : Designing WorkflowsCollibra : Designing Workflows
Collibra : Designing Workflows
 
Credit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global ScaleCredit Suisse, Reference Data Management on a Global Scale
Credit Suisse, Reference Data Management on a Global Scale
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...
Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...
Knowledge Graph Reasoning Techniques through Studies on Mystery Stories - Rep...
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 

Similar to HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

SPARQL in the Semantic Web
SPARQL in the Semantic WebSPARQL in the Semantic Web
SPARQL in the Semantic WebJan Beeck
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...Maribel Acosta Deibe
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1andreas_schultz
 
Context-Aware Access Control for RDF Graph Stores
Context-Aware Access Control for RDF Graph StoresContext-Aware Access Control for RDF Graph Stores
Context-Aware Access Control for RDF Graph StoresSerena Villata
 
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Chris Fregly
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsJean-Paul Calbimonte
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Dependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLFariz Darari
 
CliqueSquare processing
CliqueSquare processingCliqueSquare processing
CliqueSquare processingINRIA-OAK
 
Querying data on the Web – client or server?
Querying data on the Web – client or server?Querying data on the Web – client or server?
Querying data on the Web – client or server?Ruben Verborgh
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
The Lonesome LOD Cloud
The Lonesome LOD CloudThe Lonesome LOD Cloud
The Lonesome LOD CloudRuben Verborgh
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraStratio
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsSpeck&Tech
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02eswcsummerschool
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackTypenathanmarz
 
Semantic web assignment 2
Semantic web assignment 2Semantic web assignment 2
Semantic web assignment 2BarryK88
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesData Ninja API
 

HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

  • 1. HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing. Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal. Title graphic: ?x dbp:producer dbr:Bad_Hair
  • 2. Motivation (1)
  • 3. Motivation (1)
      Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.
  • 4. Motivation (2)
      SELECT DISTINCT ?movie WHERE {
        ?movie rdf:type schema.org:Movie .
        ?movie dbp:producer ?producer .
        ?movie dct:subject dbc:Universal_Pictures_film .
        ?movie dct:subject dbc:Films_shot_in_New_York_City .
      }
      Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.
      Result: 39 movies (v. 2015-04).
  • 5. Motivation (2)
      SELECT DISTINCT ?movie WHERE {
        ?movie rdf:type schema.org:Movie .
        ?movie dbp:producer ?producer .
        ?movie dct:subject dbc:Universal_Pictures_film .
        ?movie dct:subject dbc:Films_shot_in_New_York_City .
      }
      Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.
      Result: 46 movies (there are 7 movies without producers) (v. 2015-04).
  • 6. Motivation
      Movies (shot in NYC by Universal Pictures) with no producers in DBpedia (v. 2015-04):
      - dbr:Legal_Eagles
      - dbr:Wanderlust
      - dbr:Barney's_Version_(film)
      - dbr:Non_Stop_(film)
      - dbr:The_Wolf_of_Wall_Street_(2013_film) (Leonardo DiCaprio is a producer!)
      - dbr:Broadway_Love
      - dbr:Trainwreck_(film)
      All images licensed under Fair use via Wikipedia.
  • 7. Problem Definition
      Given an RDF data set D and a SPARQL query Q against D, consider D* the virtual data set that contains all the data that should be in D.
      [[(?movie, dbp:producer, ?producer)]]_D ⊂ [[(?movie, dbp:producer, ?producer)]]_D*
      Example: for µ = {movie → dbr:The_Wolf_of_Wall_Street_(2013_film), producer → dbr:Leonardo_DiCaprio},
        µ ∉ [[(?movie, dbp:producer, ?producer)]]_D (does not belong to DBpedia) ∧ µ ∈ [[(?movie, dbp:producer, ?producer)]]_D* (should belong to DBpedia).
      P1) Identifying portions of Q that yield missing values.
      P2) Resolving missing values.
  • 8. OUR APPROACH: HARE
  • 9. HARE
      - A hybrid machine/human SPARQL query engine that is able to enhance the size of query answers.
      - Based on a novel RDF completeness model, HARE implements query optimization and execution techniques for P1) identifying portions of queries that yield missing values.
      - HARE resorts to microtask crowdsourcing for P2) resolving missing values.
  • 10. HARE Architecture
      [Figure: architecture diagram. Input: SPARQL query Q and threshold τ. Components: RDF Completeness Model, Crowd Knowledge (CKB+, CKB-, CKB~), Query Optimizer, Query Engine, Microtask Manager. External sources: RDF data set (LOD Cloud) and the crowd. Edge labels: query plan, crowdsourcing triple patterns, tasks, human input, aggregated human input, bindings from the crowd, RDF data. Output: results for Q.]
  • 11. HARE Architecture (architecture diagram repeated)
  • 12. RDF Completeness Model (1)
      Movies have producers (e.g. dbr:The_Interpreter).
      [Figure: RDF graph in which dbr:The_Interpreter, dbr:Tower_Heist, dbr:Bad_Hair, ... are of rdf:type schema.org:Movie; dbr:The_Interpreter has the dbp:producer values dbr:Eric_Fellner, dbr:Tim_Bevan, and dbr:Kevin_Misher, while other dbp:producer values (e.g. for dbr:Bad_Hair) are shown as "?".]
  • 13. RDF Completeness Model (2)
      ① Predicate multiplicity of an RDF resource: number of different objects that a resource has for a certain predicate.
      Example: M_D(dbr:The_Interpreter | dbp:producer) = 3, since dbr:The_Interpreter has the dbp:producer values dbr:Eric_Fellner, dbr:Tim_Bevan, and dbr:Kevin_Misher.
  • 14. RDF Completeness Model (3)
      ② Aggregated predicate multiplicity of a class: for a given predicate, the median number of distinct objects over all the resources that belong to the class.
      Example: AM_D(schema.org:Movie | dbp:producer) = 3, with for instance M_D(dbr:The_Interpreter | dbp:producer) = 3 and M_D(dbr:Legal_Eagles | dbp:producer) = 2.
  • 15. RDF Completeness Model (4)
      ③ Completeness of an RDF resource (with respect to a predicate): given a predicate, the completeness of an RDF resource is determined by the aggregated predicate multiplicity of the classes that it belongs to.
      Comp_D(dbr:The_Interpreter | dbp:producer) = 3/3
      Comp_D(dbr:Legal_Eagles | dbp:producer) = 2/3
      Comp_D(dbr:Bad_Hair | dbp:producer) = 0/3
      The numerator is computed in ① and the denominator in ②.
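The three definitions of the completeness model can be sketched in a few lines of Python. This is a minimal illustration, not HARE's implementation: the function names, the toy graph (every `ex:` resource is a hypothetical filler), the choice to count class members without any value for the predicate in the median, the cap at 1.0, and the single-class simplification are all assumptions.

```python
from statistics import median

RDF_TYPE = "rdf:type"

# Toy graph. Only the three producers of dbr:The_Interpreter come from the
# slides; every ex:* resource is a hypothetical filler for illustration.
TRIPLES = [
    ("dbr:The_Interpreter", RDF_TYPE, "schema.org:Movie"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Eric_Fellner"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Tim_Bevan"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Kevin_Misher"),
    ("dbr:Legal_Eagles", RDF_TYPE, "schema.org:Movie"),
    ("dbr:Legal_Eagles", "dbp:producer", "ex:Producer_1"),
    ("dbr:Legal_Eagles", "dbp:producer", "ex:Producer_2"),
    ("dbr:Bad_Hair", RDF_TYPE, "schema.org:Movie"),
]
for movie in ("ex:Movie_A", "ex:Movie_B"):
    TRIPLES.append((movie, RDF_TYPE, "schema.org:Movie"))
    TRIPLES.extend((movie, "dbp:producer", f"ex:Producer_{i}") for i in range(3, 6))


def multiplicity(triples, subject, predicate):
    """(1) Number of distinct objects a resource has for a predicate."""
    return len({o for s, p, o in triples if s == subject and p == predicate})


def aggregated_multiplicity(triples, cls, predicate):
    """(2) Median multiplicity over all resources typed with the class."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    if not members:
        return 0
    return median([multiplicity(triples, m, predicate) for m in members])


def completeness(triples, subject, cls, predicate):
    """(3) Multiplicity of the resource relative to its class aggregate."""
    aggregate = aggregated_multiplicity(triples, cls, predicate)
    if aggregate == 0:
        return 1.0  # assumption: nothing is expected, so nothing is missing
    return min(1.0, multiplicity(triples, subject, predicate) / aggregate)


for movie in ("dbr:The_Interpreter", "dbr:Legal_Eagles", "dbr:Bad_Hair"):
    print(movie, completeness(TRIPLES, movie, "schema.org:Movie", "dbp:producer"))
```

On this toy graph the aggregate for schema.org:Movie is 3, so the three movies from the slide obtain 3/3, 2/3, and 0/3.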
  • 16. HARE Architecture (architecture diagram repeated)
  • 17. Crowd Knowledge
      - The knowledge collected from the crowd is captured in three knowledge bases: CKB = (CKB+, CKB-, CKB~).
      - CKB+, CKB-, and CKB~ are fuzzy sets over RDF data composed of 4-tuples of the form (subject, predicate, object, membership_degree), where (subject, predicate, object) is an RDF triple.
  • 18. Crowd Knowledge: Types of Crowd Knowledge Bases
      - CKB+: "Brian Grazer is a producer of Tower Heist." → (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
      - CKB-: "Tower Heist does not have a producer." → (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
      - CKB~: "I am not sure if Bad Hair has a producer." → (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
  • 19. Crowd Knowledge: Types of Crowd Knowledge Bases
      - CKB+: "Brian Grazer is a producer of Tower Heist." → (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
      - CKB-: "Tower Heist does not have a producer." → (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
      - CKB~: "I am not sure if Bad Hair has a producer." → (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
      Annotations: Contradiction (between the CKB+ and CKB- entries), Uncertainty (the CKB~ entry).
  • 20. Crowd Knowledge: Measuring Contradiction
      - Contradiction occurs when triples with the same subject and predicate belong to both CKB+ and CKB-.
      - It is measured as follows:
          Contradiction(dbr:Tower_Heist | dbp:producer) = 1 - |0.9 - 0.05| = 0.15
        given (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9) in CKB+ and (dbr:Tower_Heist, dbp:producer, _:o1, 0.05) in CKB-.
      - Contradiction values close to 0.0 indicate high consensus.
  • 21. Crowd Knowledge: Measuring Uncertainty
      - When a triple belongs to CKB~, the value of the triple object is unknown or uncertain.
      - Uncertainty is measured as follows:
          Uncertainty(dbr:Bad_Hair | dbp:producer) = avg({0.78}) = 0.78
        given (dbr:Bad_Hair, dbp:producer, _:o2, 0.78) in CKB~.
      - Uncertainty values close to 1.0 indicate that the crowd has shown itself to be unknowledgeable about the fact being vetted.
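The two crowd-knowledge measures can be reproduced with a short Python sketch. The slides only show single-valued examples, so the generalization used here is an assumption: membership degrees for the same subject and predicate are averaged within each knowledge base, and a knowledge base with no matching tuple contributes an average of 0.0. The function and variable names are hypothetical.

```python
# Knowledge bases as lists of (subject, predicate, object, membership_degree),
# filled with the example tuples from the slides.
CKB_POS = [("dbr:Tower_Heist", "dbp:producer", "dbr:Brian_Grazer", 0.9)]
CKB_NEG = [("dbr:Tower_Heist", "dbp:producer", "_:o1", 0.05)]
CKB_UNK = [("dbr:Bad_Hair", "dbp:producer", "_:o2", 0.78)]


def _degrees(ckb, subject, predicate):
    """Membership degrees of all tuples about (subject, predicate)."""
    return [m for s, p, _obj, m in ckb if s == subject and p == predicate]


def _avg(values):
    # Assumption: an empty selection averages to 0.0.
    return sum(values) / len(values) if values else 0.0


def contradiction(ckb_pos, ckb_neg, subject, predicate):
    """1 - |m+ - m-|; values near 0.0 mean the crowd largely agrees."""
    pos = _avg(_degrees(ckb_pos, subject, predicate))
    neg = _avg(_degrees(ckb_neg, subject, predicate))
    return 1.0 - abs(pos - neg)


def uncertainty(ckb_unk, subject, predicate):
    """Average membership degree in CKB~ for (subject, predicate)."""
    return _avg(_degrees(ckb_unk, subject, predicate))


print(round(contradiction(CKB_POS, CKB_NEG, "dbr:Tower_Heist", "dbp:producer"), 2))
print(uncertainty(CKB_UNK, "dbr:Bad_Hair", "dbp:producer"))
print(uncertainty(CKB_UNK, "dbr:Tower_Heist", "dbp:producer"))
```

With the tuples from the slides this yields a contradiction of 0.15 for dbr:Tower_Heist, an uncertainty of 0.78 for dbr:Bad_Hair, and an uncertainty of 0.0 for dbr:Tower_Heist, which has no entry in CKB~.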
  • 22. HARE Architecture (architecture diagram repeated)
  • 23. Query Optimizer (1)
      Heuristic-based optimizer that decomposes the BGPs of a SPARQL query into two subsets:
      - SQ_D: triple patterns executed against the data set D,
      - SQ_CROWD: triple patterns to be crowdsourced.
  • 24. Query Optimizer (2)
      Given a SPARQL query Q:
      - Triple patterns in Q with variables in both the subject position and the object position are added to SQ_CROWD.
      - The rest of the triple patterns in Q are added to SQ_D.
      SELECT DISTINCT ?movie WHERE {
        ?movie rdf:type schema.org:Movie .                       (t1, SQ_D)
        ?movie dbp:producer ?producer .                          (t2, SQ_CROWD)
        ?movie dct:subject dbc:Universal_Pictures_film .         (t3, SQ_D)
        ?movie dct:subject dbc:Films_shot_in_New_York_City .     (t4, SQ_D)
      }
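The decomposition heuristic is simple enough to sketch directly. In this minimal Python illustration, triple patterns are plain 3-tuples of strings and a term counts as a variable when it starts with `?`; that representation and the names `is_variable` and `decompose` are assumptions made for the example, not part of HARE.

```python
# Triple patterns t1..t4 of the running example, as (subject, predicate, object).
T1 = ("?movie", "rdf:type", "schema.org:Movie")
T2 = ("?movie", "dbp:producer", "?producer")
T3 = ("?movie", "dct:subject", "dbc:Universal_Pictures_film")
T4 = ("?movie", "dct:subject", "dbc:Films_shot_in_New_York_City")


def is_variable(term):
    """SPARQL variables are written with a leading question mark here."""
    return term.startswith("?")


def decompose(bgp):
    """Split a basic graph pattern into (SQ_D, SQ_CROWD).

    Patterns whose subject and object are both variables go to SQ_CROWD;
    all remaining patterns go to SQ_D.
    """
    sq_d, sq_crowd = [], []
    for pattern in bgp:
        subject, _predicate, obj = pattern
        if is_variable(subject) and is_variable(obj):
            sq_crowd.append(pattern)
        else:
            sq_d.append(pattern)
    return sq_d, sq_crowd


SQ_D, SQ_CROWD = decompose([T1, T2, T3, T4])
print("SQ_D:", SQ_D)
print("SQ_CROWD:", SQ_CROWD)
```

For the running example only t2 has variables in both positions, so it is the single pattern marked for crowdsourcing.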
  • 25. Query Optimizer (3)
      - The optimizer builds a query plan T_Q for query Q.
      - Triple patterns from SQ_D are grouped into star-shaped sub-queries in a bushy tree [Vidal et al.].
      - Triple patterns in SQ_CROWD are added to the plan T_Q in a left-linear fashion.
      [Figure: plan for the running example, with t1, t3, and t4 in SQ_D and t2 in SQ_CROWD.]
  • 26. Query Engine (1)
      - Executes the query plan T_Q.
      - Sub-queries that are part of SQ_D (t1, t3, t4 in the running example) are executed against the data set:
          Ω = {{movie → dbr:Tower_Heist}, {movie → dbr:Legal_Eagles}, …}
      - For each mapping contained in Ω, the engine instantiates the triple patterns in SQ_CROWD.
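Instantiating a crowdsourced triple pattern with a mapping from Ω amounts to substituting the bound variables. The following Python sketch illustrates that step only; the tuple and dict representations and the name `instantiate` are assumptions for the example.

```python
def instantiate(pattern, mapping):
    """Replace every variable of the pattern that is bound in the mapping.

    Variables are strings with a leading '?'; the mapping is keyed by the
    variable name without the question mark. Unbound variables are kept.
    """
    return tuple(
        mapping.get(term[1:], term) if term.startswith("?") else term
        for term in pattern
    )


# Two mappings of the running example and the crowdsourced pattern t2.
OMEGA = [{"movie": "dbr:Tower_Heist"}, {"movie": "dbr:Legal_Eagles"}]
T2 = ("?movie", "dbp:producer", "?producer")

INSTANTIATED = [instantiate(T2, mu) for mu in OMEGA]
for pattern in INSTANTIATED:
    print(pattern)
```

Each instantiated pattern keeps `?producer` unbound; that is the value the crowd may be asked to supply.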
  • 27. Query Engine (2): Example of an Iteration
      - The engine processes {movie → dbr:Tower_Heist}.
      - Following the running example:
          Comp(dbr:Tower_Heist | dbp:producer) = 1/3 = 0.33
          Contradiction(dbr:Tower_Heist | dbp:producer) = 0.15
          Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0
      - Crowd knowledge used:
          CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
          CKB-: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
          CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
  • 28. Query Engine (3): Example of an Iteration
      - The algorithm computes the probability of crowdsourcing the triple pattern (dbr:Tower_Heist, dbp:producer, ?producer):
          P(CROWD | μ(s), p) = α (1 - 0.33) + (1 - α) min{0.15, 1 - 0.0} = 0.41
        where (1 - 0.33) is the estimated incompleteness and min{0.15, 1 - 0.0} is the crowd reliability.
      - α is a score weight between 0.0 and 1.0 (0.5 in the example).
      - If P(CROWD | μ(s), p) is greater than a user threshold τ, then the algorithm crowdsources the triple pattern (μ(s), p, o).
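The worked example can be checked numerically. The Python sketch below assumes that the general form behind the instantiated numbers is α · (1 − Comp) + (1 − α) · min{Contradiction, 1 − Uncertainty}, which is inferred from the values of the previous slide rather than stated on this one; the function names are hypothetical.

```python
def crowd_probability(comp, contradiction, uncertainty, alpha=0.5):
    """Score for crowdsourcing a triple pattern.

    Assumed general form, inferred from the worked example:
    alpha * (1 - comp) + (1 - alpha) * min(contradiction, 1 - uncertainty)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0.0 and 1.0")
    estimated_incompleteness = 1.0 - comp
    crowd_reliability = min(contradiction, 1.0 - uncertainty)
    return alpha * estimated_incompleteness + (1.0 - alpha) * crowd_reliability


def should_crowdsource(probability, tau):
    """Crowdsource only when the score exceeds the user threshold tau."""
    return probability > tau


# Values of the running example for (dbr:Tower_Heist, dbp:producer, ?producer).
P = crowd_probability(comp=1 / 3, contradiction=0.15, uncertainty=0.0, alpha=0.5)
print(round(P, 2))
print(should_crowdsource(P, tau=0.3))
```

With α = 0.5 the score rounds to 0.41, matching the slide; whether the pattern is actually sent to the crowd then depends only on the user-supplied τ (0.3 above is an arbitrary illustrative value).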
  • 29. Query Engine (4)
      - The engine combines mappings obtained from the data set D and mappings from the crowd stored in CKB+.
      - The query evaluation terminates when all the sub-queries are executed.
      Theorem 1: The HARE query engine does not increase the time complexity of executing a SPARQL query.
  • 30. HARE Architecture (architecture diagram repeated)
  • 31. Microtask Manager (1)
      - Receives triple patterns to crowdsource, for example: (dbr:Tower_Heist, dbp:producer, ?p)
      - Creates human tasks.
      - Submits tasks to the crowdsourcing platform.
  • 32. Microtask Manager (2)
      HARE exploits the semantics encoded in RDF resources.
      [Figure: task interface annotated with the (subject, predicate) pairs whose values it displays: (dbr:Tower_Heist, rdfs:label), (dbp:producer, rdfs:label), (dbr:Tower_Heist, foaf:depiction), (dbr:Tower_Heist, dbo:abstract), (dbr:Tower_Heist, foaf:primaryTopic).]
  • 33. Microtask Manager (3)
      [Figure: task screenshot with the labels CKB+, CKB-, and CKB~.]
  • 34. EXPERIMENTAL STUDY
  • 35. Experimental Set-Up
      - Benchmark: 50 queries against DBpedia (v. 2014).
          - Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports.
      - Implementation details:
          - HARE is implemented in Python 2.7.6.
          - CrowdFlower is used as the crowdsourcing platform.
      - Crowdsourcing configuration:
          - Four different RDF triples per task, 0.07 US$ per task.
          - At least three judgments were collected per task.
      - Total RDF triple patterns crowdsourced: 502
      - Total answers collected from the crowd: 1,609
  • 36. Results: Size of Query Answer (1)
      Metric: Number of answers when queries are executed.
      [Figure: bar charts of #Answers per query (Q1 to Q10), split into data set answers and crowd answers, for the domains Sports, Music, and Life Sciences. Annotated ranges: 1.25 - 2.00, 1.50 - 2.00, 1.08 - 1.92.]
      HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
  • 37. Results: Size of Query Answer (2)
      Metric: Number of answers when queries are executed.
      [Figure: bar charts of #Answers per query (Q1 to Q10), split into data set answers and crowd answers, for the domains Movies and History. Annotated ranges: 1.05 - 3.13, 1.10 - 1.89.]
      HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
  • 38. Results: Crowd Response Time (1)
      Metric: Elapsed time from the submission of the first task until the last answer is retrieved.
      [Figure: judgments completed (%) over time (min) for queries Q1 to Q10, for the domains Sports, Music, and Life Sciences. Annotations at the 12th minute: 77%, 82%, 97%.]
      By the 12th minute after the first task is submitted, the crowd has produced at least 75% of the answers.
  • 39. Results: Crowd Response Time (2). Metric: elapsed time from the first task until the last answer is retrieved. [Line charts, one panel each for Movies and History: judgments completed (%) over time (min), one curve per query (Q1 to Q10). Values annotated at the 12th minute: 98%, 75%.] At the 12th minute after the first task is submitted, the crowd has produced at least 75% of the answers.
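The response-time charts plot the percentage of judgments completed against the minutes elapsed since the first task was submitted. A minimal sketch of how one point on such a curve could be computed is shown below; the function name `percent_completed` and the sample timestamps are hypothetical, not data from the experiments.

```python
from bisect import bisect_right
from typing import Sequence


def percent_completed(completion_minutes: Sequence[float], t: float) -> float:
    """Percentage of judgments completed no later than minute t.

    completion_minutes holds, for each judgment, the minutes elapsed
    between submission of the first task and arrival of that judgment.
    """
    if not completion_minutes:
        raise ValueError("at least one judgment is required")
    times = sorted(completion_minutes)
    # bisect_right counts the judgments whose completion time is <= t.
    return 100.0 * bisect_right(times, t) / len(times)


# Hypothetical completion times for five judgments of one query.
print(percent_completed([1, 3, 5, 12, 20], 12))
```

Evaluating the function at every minute from 0 up to the last completion time yields a cumulative curve of the kind shown per query.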
  • 40. Results: Quality of Crowd Answers. Metric: a true positive is a mapping that belongs to the query answer. Two tables follow; the slide labels them Recall and Precision.

      First table:
             Sports  Music  Life Sciences  Movies  History
      Q1     1.00    1.00   0.67           0.88    1.00
      Q2     1.00    1.00   1.00           0.96    1.00
      Q3     1.00    1.00   0.89           0.79    0.67
      Q4     0.55    0.67   1.00           1.00    0.96
      Q5     0.86    0.67   1.00           1.00    0.95
      Q6     0.69    0.83   1.00           1.00    0.96
      Q7     1.00    0.63   0.71           1.00    0.57
      Q8     1.00    0.67   0.88           0.94    0.72
      Q9     0.46    0.73   1.00           1.00    0.64
      Q10    0.92    0.49   1.00           1.00    0.95
      Avg    0.85    0.77   0.91           0.96    0.84

      Second table:
             Sports  Music  Life Sciences  Movies  History
      Q1     1.00    1.00   1.00           0.47    1.00
      Q2     1.00    0.29   1.00           1.00    1.00
      Q3     1.00    1.00   1.00           1.00    1.00
      Q4     0.83    1.00   1.00           1.00    1.00
      Q5     1.00    0.86   1.00           1.00    1.00
      Q6     1.00    1.00   1.00           1.00    0.96
      Q7     1.00    1.00   1.00           1.00    0.84
      Q8     1.00    1.00   1.00           1.00    0.78
      Q9     1.00    1.00   1.00           1.00    0.92
      Q10    1.00    1.00   1.00           1.00    0.98
      Avg    0.98    0.91   1.00           0.95    0.95

      The crowd exhibits heterogeneous performance within domains. This supports the importance of HARE's triple-based approach.
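The quality metric counts a crowd mapping as a true positive when it belongs to the query answer. A minimal sketch of precision and recall under that definition follows, assuming mappings are represented as hashable tuples and that a gold-standard answer set is available; the names `precision_recall`, `crowd`, and `gold` and the sample mappings are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    crowd_mappings: Iterable[Hashable],
    gold_mappings: Iterable[Hashable],
) -> Tuple[float, float]:
    """Precision and recall of crowd mappings against a gold-standard answer.

    A true positive is a crowd mapping that also appears in the gold answer.
    Empty inputs yield 0.0 for the affected measure (a convention chosen
    here, not taken from the slides).
    """
    crowd, gold = set(crowd_mappings), set(gold_mappings)
    true_positives = len(crowd & gold)
    precision = true_positives / len(crowd) if crowd else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (movie, producer) mappings.
crowd = {("m1", "p1"), ("m2", "p2"), ("m3", "px")}
gold = {("m1", "p1"), ("m2", "p2"), ("m4", "p4"), ("m5", "p5")}
p, r = precision_recall(crowd, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Computing both measures per query, then averaging per domain, gives tables with the same layout as the ones on the slide.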
  • 41. RELATED WORK
  • 42. Summary of Related Work: Human/computer query processing architectures. When to resort to crowdsourcing:
      •  Manual specification: CrowdDB [Franklin et al.]: tables, columns. Deco [Park and Widom]: rules. Qurk [Marcus et al.]: microtask I/O.
      •  Automatically: HARE.
      HARE relies on the RDF graph and crowd knowledge to decide when to resort to crowdsourcing.
  • 43. Summary of Related Work: Crowdsourcing in other contexts of Data Management (SPARQL- or RDF-based)
      •  OASSIS [Amsterdamer et al.], Recommendation System: mines crowdsourced patterns specified in a SPARQL-like language.
      •  KATARA [Chu et al.], Tabular Data Cleansing: compares tabular data against RDF data sets via crowdsourced mappings.
      •  HARE, SPARQL Query Processing: resorts to crowdsourcing to complete missing values in RDF data sets.
  • 44. CONCLUSIONS & FUTURE WORK
  • 45. Conclusions
      •  HARE: Hybrid query engine against RDF data sets.
      •  Supports microtasks to enhance query answers on-the-fly.
      •  Experimental results confirmed that:
         Size of query answer: 3.13 times.
         Crowd response time: (12th min.): 98%.
         Accuracy: 0.84 – 0.96.
      Future work
      •  Study further approaches to capture crowd reliability.
      •  Consider other quality dimensions on the knowledge collected from the crowd.
  • 46. References
      •  [Amsterdamer et al.] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: Query driven crowd mining. In SIGMOD, pages 589–600, 2014.
      •  [Chu et al.] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. KATARA: A data cleaning system powered by knowledge bases and crowdsourcing. In SIGMOD, pages 1247–1261, 2015.
      •  [Marcus et al.] A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012.
      •  [Park and Widom] H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781–792, 2013.
      •  [Vidal et al.] M. E. Vidal, E. Ruckhaus, T. Lampo, A. Martínez, J. Sierra, and A. Polleres. Efficiently joining group patterns in SPARQL queries. In ESWC, pages 228–242, 2010.
  • 47. [Closing slide: HARE architecture diagram.] Input: SPARQL query Q, τ; RDF data set (LOD Cloud). Components: RDF Completeness Model; Crowd Knowledge (CKB+, CKB-, CKB~); Query Optimizer; Query Engine; Microtask Manager; Crowd. Flows: query plan; crowdsourcing triple patterns; tasks; human input; aggregated human input; bindings from the crowd; RDF data. Output: results for Q.