These are slides of a tutorial at ECIR by Gerard de Melo and Katja Hose.
Search is currently undergoing a major paradigm shift away from the traditional document-centric “10 blue links” towards more explicit and actionable information. Recent advances in this area are Google’s Knowledge Graph, Virtual Personal Assistants such as Siri and Google Now, as well as the now ubiquitous entity-oriented vertical search results for places, products, etc. Apart from novel query understanding methods, these developments are largely driven by structured data that is blended into the Web Search experience. We discuss efficient indexing and query processing techniques to work with large amounts of structured data. Finally, we present query interpretation and understanding methods to map user queries to these structured data sources.
23. Searching the Web of Data
Outline I
1 Structured data on the Web
Semantic markup
Semantic Web and Linked Open Data
Data management
2 Querying Linked Open Data
Browser-based link traversal
Keyword search for Linked Data
Structured queries
Katja Hose Searching the Web of Data March 24, 2013 – ECIR 2013 2 / 73
24. Searching the Web of Data
Structured data on the Web
Semantic Markup (metadata, tags) embedded in HTML
Microformats (hCard, hCalendar), RDFa
Knowledge bases
Large collections of RDF data
Linked (Open) Data
References between collections of RDF data
25. Searching the Web of Data
Semantic markup
If the search engine gets some help in better “understanding” the content of a web page, it can create rich snippets that highlight and display certain information.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=99170
26. Searching the Web of Data
Microformats
Efforts started in 2003
Fixed formats for specific types of information
hCard: people, companies, organizations, and places
hCalendar: calendaring and events
hReview: reviews of products, companies, events
...
Cannot represent arbitrary data
Indexed by Google and Yahoo since 2009
27. Searching the Web of Data
hCard example
Plain HTML:
<div>
  <img src="www.example.com/bobsmith.jpg" />
  <strong>Bob Smith</strong>
  Senior editor at ACME Reviews
  200 Main St
  Desertville, AZ 12345
</div>
With hCard markup:
<div class="vcard">
  <img class="photo" src="www.example.com/bobsmith.jpg" />
  <strong class="fn">Bob Smith</strong>
  <span class="title">Senior editor</span> at
  <span class="org">ACME Reviews</span>
  <span class="adr">
    <span class="street-address">200 Main St</span>
    <span class="locality">Desertville</span>,
    <span class="region">AZ</span>
    <span class="postal-code">12345</span>
  </span>
</div>
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146897
28. Searching the Web of Data
RDFa
Proposed in 2004; a W3C Recommendation
Can be used together with any vocabulary (no restriction on schema)
Can assign URIs as global primary keys to entities
29. Searching the Web of Data
RDFa example
Plain HTML:
<div>
  My name is Bob Smith but people call me Smithy. Here is my home page:
  <a href="http://www.example.com">www.example.com</a>.
  I live in Albuquerque, NM and work as an engineer at ACME Corp.
</div>
With RDFa markup:
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  My name is <span property="v:name">Bob Smith</span>,
  but people call me <span property="v:nickname">Smithy</span>.
  Here is my homepage:
  <a href="http://www.example.com" rel="v:url">www.example.com</a>.
  I live in Albuquerque, NM and work as an
  <span property="v:title">engineer</span>
  at <span property="v:affiliation">ACME Corp</span>.
</div>
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146898
30. Searching the Web of Data
Facebook’s Open Graph Protocol
Introduction of “like” buttons in 2010
Allows site owners to determine how entities are displayed in Facebook
Relies on RDFa for encoding data in HTML pages
http://developers.facebook.com/docs/opengraphprotocol/
31. Searching the Web of Data
Microdata
Proposed in 2009 as part of HTML5
Alternative technique for embedding structured data
Tries to be simpler than RDFa
32. Searching the Web of Data
Microdata example
Plain HTML:
<div>
  My name is Bob Smith but people call me Smithy. Here is my home page:
  <a href="http://www.example.com">www.example.com</a>.
  I live in Albuquerque, NM and work as an engineer at ACME Corp.
</div>
With Microdata markup:
<div itemscope itemtype="http://data-vocabulary.org/Person">
  My name is <span itemprop="name">Bob Smith</span>,
  but people call me <span itemprop="nickname">Smithy</span>.
  Here is my homepage:
  <a href="http://www.example.com" itemprop="url">www.example.com</a>.
  I live in Albuquerque, NM and work as an
  <span itemprop="title">engineer</span>
  at <span itemprop="affiliation">ACME Corp</span>.
</div>
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=176035
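To illustrate how a consumer might read such markup, the following is a minimal sketch in Python using only the standard library. It handles a single, flat (non-nested) item, which is all the example above needs; the class name MicrodataParser is a hypothetical helper, not part of any Microdata toolkit, and a real extractor would have to deal with nested items and many more element types.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collects itemprop values from one flat (non-nested) Microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None      # value of the itemtype attribute
        self.properties = {}       # itemprop name -> extracted value
        self._current_prop = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the link target, not the text.
            self.properties[prop] = attrs["href"]
        else:
            self._current_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current_prop = None


HTML = """
<div itemscope itemtype="http://data-vocabulary.org/Person">
  My name is <span itemprop="name">Bob Smith</span>
  but people call me <span itemprop="nickname">Smithy</span>.
  <a href="http://www.example.com" itemprop="url">www.example.com</a>
  I work as an <span itemprop="title">engineer</span>
  at <span itemprop="affiliation">ACME Corp</span>.
</div>
"""

parser = MicrodataParser()
parser.feed(HTML)
print(parser.item_type)
print(parser.properties)
```

The unmarked text ("My name is", "I work as an") is ignored, which is the point of the markup: only the annotated spans become structured data.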
33. Searching the Web of Data
schema.org
“Standardized” vocabulary launched in 2011, supported by Bing, Google, Yahoo, and Yandex
Asks site owners to embed data to enrich search results
200+ types: event, organization, person, place, product, review, ...
Encoding: basically Microdata (alternatively RDFa)
Main usage: highlighting and enriching data snippets in search results
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=99170
34. Searching the Web of Data
Embedded data in HTML
Usage of RDFa and Microdata is growing; microformats are still present
A rather small set of vocabularies is used
The content and the vocabularies are strongly focused on the major consumers
35. Searching the Web of Data
Semantic Web and Linked Open Data
Google Knowledge Graph
Currently, more than 500 million entities, such as celebrities, cities, movies, ... (source: Google Official Blog, http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html)
Consists of commercial third-party data and Web data
Enriches search results with summaries
Is increasingly being used by Google to answer queries
41. Searching the Web of Data
Semantic Web
Web 1.0 (standard Web)
Web 2.0 (data generated by users)
Web 3.0 (Semantic Web)
Machine-readable data
URIs for documents and concepts (entities)
“Web of data”
42. Searching the Web of Data
(Figure source: http://www.slideshare.net/cloudofdata/toward-the-data-cloud)
43. Searching the Web of Data
Standards
Sharing structured data across the Web relies on standards
Standard graph-based data model
RDF
Different syntaxes and formats
RDF/XML, RDFa
Powerful, logic-based schema languages and reasoning
OWL
Query languages and protocols
HTTP, SPARQL
44. Searching the Web of Data
Linked Open Data: design issues
Design issues (rules)
1 Use URIs as names for things
2 Use HTTP URIs so that people can look up those names
3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4 Include links to URIs in other datasets
Goal: linking URIs in different data sets that describe the same real-world entity
45. Searching the Web of Data
Growth: Semantic Web and Linked Open Data
(Figures: snapshots dated May 2007, March 2008, July 2009, and September 2011)
50. Searching the Web of Data
Data management
Knowledge bases
Large collections of semantic data
YAGO [9], Freebase [10], DBpedia [2], ...
Mostly the result of information extraction
Data format is generally RDF
Often participate as sources in the Linked Open Data cloud
51. Searching the Web of Data
RDF
Each resource (entity) is identified by a globally unique URI
Data is stored in the form of facts
Triple: (subject, property, object)
Subject: URI
Predicate/Property: URI
Object: URI or literal (strings, integers, booleans, etc.)
http://dbpedia.org/resource/Aalborg
http://dbpedia.org/ontology/country
http://dbpedia.org/resource/Denmark
Using prefixes
dbpedia:Aalborg
dbpedia-owl:country
dbpedia:Denmark
(dbpedia:Aalborg, dbpedia-owl:country, dbpedia:Denmark)
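The relationship between prefixed names and full URIs can be sketched in a few lines of Python. The prefix table below is read off the two notations shown above; the helper name expand is an assumption made for this illustration only.

```python
# Prefix table inferred from the full URIs and prefixed names shown above.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def expand(term):
    """Expand a prefixed name to a full URI; leave anything else unchanged."""
    if isinstance(term, str) and ":" in term:
        prefix, local = term.split(":", 1)
        if prefix in PREFIXES:
            return PREFIXES[prefix] + local
    return term


# A triple is simply a (subject, property, object) tuple.
triple = ("dbpedia:Aalborg", "dbpedia-owl:country", "dbpedia:Denmark")
full = tuple(expand(term) for term in triple)
print(full)
```

Literals such as the integer 123432 pass through expand untouched, since only strings with a known prefix are rewritten.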
54. Searching the Web of Data
RDF
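Triples connect to graphs ... and possibly other sources
(Figure: graph of the example triples, with dbpedia:Aalborg linked to dbpedia:Denmark via dbpedia-owl:country, to dbpedia:North_Denmark_Region via dbpedia-owl:isPartOf, and to the literal 123432 via dbpedia:populationTotal; dbpedia:North_Denmark_Region is linked to dbpedia:Denmark via dbpedia-owl:country; yago:Denmark and geonames:2624886 appear as nodes from other sources)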
55. Searching the Web of Data
Schema
The schema (vocabulary, ontology) can be expressed in OWL
Definition of classes, properties, restrictions, ...
Allows for validation and reasoning
Schema information is also represented as RDF triples
56. Searching the Web of Data
Managing large amounts of RDF data
Relational RDF data management
A single relational table
Three columns (subject, property, object)
Property tables
n-ary tables with one row per subject and one column per property
Binary tables
One two-column table for each property
57. Searching the Web of Data
A single relational table
Example triples
(dbpedia:Aalborg, dbpedia-owl:country, dbpedia:Denmark)
(dbpedia:Aalborg, dbpedia-owl:isPartOf, dbpedia:North_Denmark_Region)
(dbpedia:North_Denmark_Region, dbpedia-owl:country, dbpedia:Denmark)
(dbpedia:Aalborg, dbpedia-owl:populationTotal, 123432)
subject | property | object
dbpedia:Aalborg | dbpedia-owl:country | dbpedia:Denmark
dbpedia:Aalborg | dbpedia-owl:isPartOf | dbpedia:North_Denmark_Region
dbpedia:North_Denmark_Region | dbpedia-owl:country | dbpedia:Denmark
dbpedia:Aalborg | dbpedia-owl:populationTotal | 123432
Works with standard relational DBMS and SQL
Problems: queries need many self-joins, which complicates query optimization
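The self-join problem becomes visible as soon as a query has more than one triple pattern. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name triples and the query itself are assumptions chosen for illustration, and the data are the example triples from this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("dbpedia:Aalborg", "dbpedia-owl:country", "dbpedia:Denmark"),
        ("dbpedia:Aalborg", "dbpedia-owl:isPartOf", "dbpedia:North_Denmark_Region"),
        ("dbpedia:North_Denmark_Region", "dbpedia-owl:country", "dbpedia:Denmark"),
        ("dbpedia:Aalborg", "dbpedia-owl:populationTotal", "123432"),
    ],
)

# Subjects located in Denmark together with their population.
# Two triple patterns share the subject, so the table is joined with itself;
# every additional pattern would add another self-join.
rows = conn.execute(
    """
    SELECT t1.subject, t2.object
    FROM triples AS t1
    JOIN triples AS t2 ON t1.subject = t2.subject
    WHERE t1.property = 'dbpedia-owl:country'
      AND t1.object = 'dbpedia:Denmark'
      AND t2.property = 'dbpedia-owl:populationTotal'
    """
).fetchall()
print(rows)
```

dbpedia:North_Denmark_Region also lies in Denmark but has no population triple, so it drops out of the join.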
60. Searching the Web of Data
Property tables
Example triples
(dbpedia:Aalborg, dbpedia-owl:country, dbpedia:Denmark)
(dbpedia:Aalborg, dbpedia-owl:isPartOf, dbpedia:North_Denmark_Region)
(dbpedia:North_Denmark_Region, dbpedia-owl:country, dbpedia:Denmark)
(dbpedia:Aalborg, dbpedia-owl:populationTotal, 123432)
Table: city
subject | country | isPartOf | populationTotal
dbpedia:Aalborg | dbpedia:Denmark | dbpedia:North_Denmark_Region | 123432
dbpedia:Kassel | dbpedia:Germany | NULL | 195530
Table: region
subject | country
dbpedia:North_Denmark_Region | dbpedia:Denmark
Grouping information about entities with similar properties
n-ary tables for the same subject
Difficult to create a proper layout
Null values
Problems with multi-valued attributes
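The null-value and multi-value problems can be reproduced with a small sketch. It assumes the column layout of the city table has already been chosen by hand (which is exactly the hard part noted above); build_property_table is a hypothetical helper, and property names are shortened to the column names for readability.

```python
# Triples behind the two rows of the city table (property names shortened).
CITY_TRIPLES = [
    ("dbpedia:Aalborg", "country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "isPartOf", "dbpedia:North_Denmark_Region"),
    ("dbpedia:Aalborg", "populationTotal", 123432),
    ("dbpedia:Kassel", "country", "dbpedia:Germany"),
    ("dbpedia:Kassel", "populationTotal", 195530),
]
CITY_COLUMNS = ["country", "isPartOf", "populationTotal"]


def build_property_table(triples, columns):
    """Pivot triples into one row per subject with a fixed set of columns."""
    rows = {}
    for subject, prop, obj in triples:
        if prop not in columns:
            raise ValueError(f"no column for property {prop!r}")
        # Every row starts with None (NULL) in all columns.
        row = rows.setdefault(subject, dict.fromkeys(columns))
        if row[prop] is not None:
            # A second value for the same property does not fit in one cell.
            raise ValueError(f"{subject} has more than one value for {prop!r}")
        row[prop] = obj
    return rows


city = build_property_table(CITY_TRIPLES, CITY_COLUMNS)
for subject, row in city.items():
    print(subject, row)
```

dbpedia:Kassel ends up with None in the isPartOf column, and a second country triple for the same city would raise an error, mirroring the last two bullet points.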
62. Searching the Web of Data
Binary tables
Example triples
(dbpedia:Aalborg, dbpedia-owl:country, dbpedia:Denmark)
(dbpedia:Aalborg, dbpedia-owl:isPartOf, dbpedia:North_Denmark_Region)
(dbpedia:North_Denmark_Region, dbpedia-owl:country, dbpedia:Denmark)
(dbpedia:Aalborg, dbpedia-owl:populationTotal, 123432)
Table: dbpedia-owl:country
subject | object
dbpedia:Aalborg | dbpedia:Denmark
dbpedia:North_Denmark_Region | dbpedia:Denmark
Table: dbpedia-owl:isPartOf
subject | object
dbpedia:Aalborg | dbpedia:North_Denmark_Region
Table: dbpedia-owl:populationTotal
subject | object
dbpedia:Aalborg | 123432
Create a separate table for each property
Can become inefficient for queries involving many common properties
Becomes inefficient if there are too many different properties
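A minimal sketch of this layout in Python, assuming plain in-memory lists in place of database tables; vertical_partition is a hypothetical helper name. Each property gets its own list of (subject, object) pairs, and a query over two properties becomes a join of two small tables on the subject.

```python
from collections import defaultdict

TRIPLES = [
    ("dbpedia:Aalborg", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:isPartOf", "dbpedia:North_Denmark_Region"),
    ("dbpedia:North_Denmark_Region", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:populationTotal", 123432),
]


def vertical_partition(triples):
    """Build one two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    for rows in tables.values():
        rows.sort(key=lambda row: row[0])  # keep each table sorted by subject
    return dict(tables)


tables = vertical_partition(TRIPLES)

# Join the country and populationTotal tables on the subject.
# dict() keeps one value per subject, which is enough for this example.
country = dict(tables["dbpedia-owl:country"])
population = dict(tables["dbpedia-owl:populationTotal"])
in_denmark_with_pop = {
    s: population[s]
    for s, c in country.items()
    if c == "dbpedia:Denmark" and s in population
}
print(in_denmark_with_pop)
```

With many distinct properties the number of tables grows accordingly, and a query touching k properties needs k-1 joins, which is where the inefficiencies listed above come from.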
64. Searching the Web of Data
Native triple stores
RDF-3X [7]
Dictionary encoding to reduce storage space
Extensive use of B+-tree indexes
(SPO, OPS, PSO, SOP, OSP, POS)
Aggregated indexes: S, P, O, SP, SO, PO, PS, OP, OS
Triples are materialized in the indexes
Histograms provide the query optimizer with further statistics
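The two central ideas, dictionary encoding and exhaustive permutation indexes, can be imitated in a toy form. This is only a sketch and not RDF-3X itself: sorted Python lists stand in for the B+-trees, there is no compression and there are no aggregated indexes, and the class name TinyTripleIndex is made up for this example.

```python
from bisect import bisect_left
from itertools import permutations


class TinyTripleIndex:
    def __init__(self, triples):
        self.ids = {}    # term -> integer id (dictionary encoding)
        self.terms = []  # integer id -> term
        encoded = [tuple(self._encode(t) for t in triple) for triple in triples]
        # One sorted list per ordering of subject (S), property (P), object (O).
        self.indexes = {}
        for order in permutations(range(3)):
            name = "".join("SPO"[i] for i in order)
            self.indexes[name] = sorted(tuple(t[i] for i in order) for t in encoded)

    def _encode(self, term):
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def lookup(self, order, prefix):
        """Return decoded triples, in `order`, whose leading terms equal `prefix`."""
        key = tuple(self.ids[t] for t in prefix)
        index = self.indexes[order]
        pos = bisect_left(index, key)  # first entry that is >= the key prefix
        result = []
        while pos < len(index) and index[pos][: len(key)] == key:
            result.append(tuple(self.terms[i] for i in index[pos]))
            pos += 1
        return result


idx = TinyTripleIndex([
    ("dbpedia:Aalborg", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:isPartOf", "dbpedia:North_Denmark_Region"),
    ("dbpedia:North_Denmark_Region", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:populationTotal", 123432),
])

# Pattern "?s dbpedia-owl:country dbpedia:Denmark": P and O are bound,
# so the POS ordering turns the lookup into a single range scan.
print(idx.lookup("POS", ("dbpedia-owl:country", "dbpedia:Denmark")))
```

Because all six orderings exist, any combination of bound positions in a triple pattern corresponds to a prefix of some index, at the price of storing every triple six times.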
65. Searching the Web of Data
Column stores
SW-Store [1]
Binary tables in combination with a column-oriented DBMS (C-store)
Sorted tables
Supports multi-valued attributes (listed in successive rows)
Increased costs for updates and tuple reconstruction
66. Searching the Web of Data
More alternatives
Store RDF data in a matrix with bit-vector compression
Store RDF as XML and use XML technology
Graph databases
...
68. Searching the Web of Data
Querying Linked Open Data
Browser-based link traversal
Who is Carlo Pedersoli?
80. Searching the Web of Data
Browser-based link traversal
Browser-based link traversal is the most “natural” way of looking up information using Linked Data
Might be very tedious and frustrating
Takes a lot of time
But you will discover much information that you never intended to search for
82. Searching the Web of Data
Keyword search for Linked Data
Given a set of keywords, find all relevant information.
83. Searching the Web of Data
Falcons
Focus: entities
Crawling
Follow links contained in RDF documents
Construct virtual documents for entities
Literals
Human-readable names and descriptions (rdfs:label, rdfs:comment)
Create indexes
Terms in virtual documents
Entity classes
84. Searching the Web of Data
Falcons
Query processing
Keywords
Filtering based on classes/types
Output: entities with snippets
85. Searching the Web of Data
Falcons
http://ws.nju.edu.cn/falcons/objectsearch/index.jsp
88. Searching the Web of Data
Sindice
Original focus: documents and sources
Indexes
URI (Uniform Resource Identifier)
IFP (Inverse Functional Properties)
Literal
89. Searching the Web of Data
Sindice
http://sindice.com
91. Searching the Web of Data
Indexing
Indexes on virtual documents
Creating virtual documents based on the data (entity, source, triple, ...)
Create inverted indexes on URIs, properties, classes, literals, ... or combinations
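As a rough sketch of entity-centric indexing, the following Python snippet treats the literals attached to an entity as its virtual document and builds an inverted index over the terms. The entity URIs and the rdfs:comment texts are illustrative assumptions (only the names themselves appear in this tutorial), and real systems add ranking, class filters, and snippets on top.

```python
import re
from collections import defaultdict

# Illustrative literal triples: (entity, property, literal text).
LITERAL_TRIPLES = [
    ("dbpedia:Bud_Spencer", "rdfs:label", "Bud Spencer"),
    ("dbpedia:Bud_Spencer", "rdfs:comment", "Carlo Pedersoli, known as Bud Spencer"),
    ("dbpedia:Terence_Hill", "rdfs:label", "Terence Hill"),
    ("dbpedia:Terence_Hill", "rdfs:comment", "Mario Girotti, known as Terence Hill"),
]


def tokenize(text):
    """Lower-case a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(literal_triples):
    """Map each term to the set of entities whose virtual document contains it."""
    index = defaultdict(set)
    for entity, _prop, text in literal_triples:
        for term in tokenize(text):
            index[term].add(entity)
    return index


def search(index, query):
    """Return the entities whose virtual document contains all query keywords."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


index = build_inverted_index(LITERAL_TRIPLES)
print(search(index, "Carlo Pedersoli"))
print(search(index, "known"))
```

Indexing by source or by triple instead of by entity only changes what plays the role of the document identifier in the postings.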
93. Searching the Web of Data
Structured queries
What are the movies that both Carlo Pedersoli (Bud Spencer) and Mario Girotti (Terence Hill) acted in?
Or what are the movies that only one of them (without the other) acted in?
94. Searching the Web of Data
Structured queries
Structured query language
SPARQL
Query processing strategies
Materialized query processing
Lookup-based query processing
Federated query processing
95. Searching the Web of Data
SPARQL
Example query
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?city ?pop WHERE {
  ?city dbpedia-owl:country dbpedia:Denmark .
  ?city dbpedia-owl:populationTotal ?pop .
  FILTER (?pop > 100000)
}
SPARQL
Similar to SQL
Variables start with “?”
Queries consist of triple patterns, e.g.:
?city dbpedia-owl:country dbpedia:Denmark
Joins between triple patterns are expressed by common variables
Filters express additional constraints
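How triple patterns, shared variables, and filters interact can be sketched with a tiny evaluator in Python. This is not a SPARQL engine: it is a naive nested-loop matcher over the example triples used earlier in this tutorial, the function names are assumptions, and the filter is written as an ordinary Python predicate.

```python
TRIPLES = [
    ("dbpedia:Aalborg", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:isPartOf", "dbpedia:North_Denmark_Region"),
    ("dbpedia:North_Denmark_Region", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:populationTotal", 123432),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # A variable must keep the value it was bound to earlier (the join).
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate a list of triple patterns; returns a list of variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


patterns = [
    ("?city", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("?city", "dbpedia-owl:populationTotal", "?pop"),
]
# The FILTER becomes a predicate applied to each solution.
results = [b for b in evaluate(patterns, TRIPLES) if b["?pop"] > 100000]
print(results)
```

The join on ?city happens implicitly: the second pattern can only match triples whose subject equals the value already bound by the first pattern.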
98. Searching the Web of Data
SPARQL
(Figure: the query graph, in which ?city is linked to dbpedia:Denmark via dbpedia-owl:country and to ?pop via dbpedia:populationTotal, is matched against the data graph containing dbpedia:Aalborg, dbpedia:Denmark, dbpedia:North_Denmark_Region, the literal 123432, yago:Denmark, and geonames:2624886)
102. Searching the Web of Data
Materialized query processing
Characteristics
Centralized storage
Crawl and download the data
Evaluate queries locally
As an alternative to evaluating queries on huge data sets on a single machine, we can make use of distributed architectures and parallel processing, e.g., MapReduce, NoSQL, P2P, Grid, ...
Problem
Hash partitioning is not optimal for complex queries
Possible solution
Clustered RDF management using graph partitioning [6]
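Why hash partitioning struggles with complex queries can be seen in a small sketch. It assumes triples are assigned to machines by hashing the subject (one common choice, not necessarily the one used by any particular system) and uses zlib.crc32 only because it gives a hash that is stable across Python runs; the function names are hypothetical.

```python
import zlib

TRIPLES = [
    ("dbpedia:Aalborg", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:isPartOf", "dbpedia:North_Denmark_Region"),
    ("dbpedia:North_Denmark_Region", "dbpedia-owl:country", "dbpedia:Denmark"),
    ("dbpedia:Aalborg", "dbpedia-owl:populationTotal", 123432),
]


def partition_of(subject, num_partitions):
    """Map a subject to a partition using a run-independent hash."""
    return zlib.crc32(subject.encode("utf-8")) % num_partitions


def hash_partition(triples, num_partitions):
    """Assign each triple to the partition determined by its subject."""
    parts = [[] for _ in range(num_partitions)]
    for triple in triples:
        parts[partition_of(triple[0], num_partitions)].append(triple)
    return parts


parts = hash_partition(TRIPLES, 2)
for number, part in enumerate(parts):
    print(number, part)
```

All triples with the same subject land on the same machine, so star-shaped queries around one subject stay local. A path such as ?city isPartOf ?region followed by ?region country ?c involves two different subjects, whose triples may sit on different machines, so answering it can require communication. Graph partitioning tries to keep such neighboring triples together.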
105. Searching the Web of Data
Clustered RDF management
Data graph
Graph partitioning
Assigning triples to partitions – 1-hop guarantee
2-hop guarantee
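A minimal sketch of how triples could be assigned to a partition under an n-hop guarantee. It assumes the vertex partitioning has already been computed elsewhere, follows edges only from subject to object (a directed guarantee), and uses a made-up `ex:` data graph; the function name `triples_for_partition` is illustrative and not taken from [6].

```python
def triples_for_partition(triples, owned_vertices, hops):
    """Return the triples stored on the partition that owns `owned_vertices`.

    With hops=1 the partition holds every triple whose subject it owns;
    each further hop also replicates the triples that start at vertices
    reached so far (directed edges, subject -> object).
    """
    reached = set(owned_vertices)
    stored = set()
    for _ in range(hops):
        frontier = {t for t in triples if t[0] in reached}
        stored |= frontier
        reached |= {obj for _, _, obj in frontier}
    return stored


# Hypothetical data graph: a chain player -> club -> region -> population.
DATA = [
    ("ex:player1", "ex:playsFor", "ex:club1"),
    ("ex:club1", "ex:region", "ex:region1"),
    ("ex:region1", "ex:population", "ex:pop1"),
]

one_hop = triples_for_partition(DATA, {"ex:player1"}, hops=1)
two_hop = triples_for_partition(DATA, {"ex:player1"}, hops=2)
```

With the 2-hop guarantee, a two-pattern path query starting at ex:player1 can be answered inside this partition alone. Larger guarantees trade storage for fewer cross-partition joins.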
109. Searching the Web of Data
Querying Linked Open Data
Structured queries
Clustered RDF management
Query execution is more efficient in RDF stores than in Hadoop [6]
Goals
Pushing as much of the processing as possible into RDF stores
Minimizing the number of Hadoop jobs
The larger the hop guarantee, the more work is done in RDF stores
Query processing
Choose center of the query graph
Calculate distance from the center to the furthest edge
If distance ≤ n (n = hop guarantee): query can be handled by nodes
independently without communication
If distance > n: communication is needed; the query is split up into
smaller subqueries whose results are joined using Hadoop
111. Searching the Web of Data
Querying Linked Open Data
Structured queries
Clustered RDF management
Find football players playing for clubs in a region where they were born
SELECT ?player ?club ?region WHERE {
  ?player rdf:type ex:footballer .
  ?player ex:playsFor ?club .
  ?player ex:bornIn ?region .
  ?club ex:region ?region .
  ?region ex:population ?pop .
}
[Figure: query graph with nodes ?player, ?club, ?region, ?pop, and ex:footballer, connected by the edges rdf:type, ex:playsFor, ex:bornIn, ex:region, and ex:population]
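The center selection and distance check can be sketched for the football query as follows. This is an illustration under stated assumptions, not the implementation from [6]: the distance from a node to an edge is taken to be one plus the hop distance to the nearer endpoint, edge direction is ignored, and the query graph is assumed to be connected. All function names are illustrative.

```python
from collections import deque

# Triple patterns of the football query: (subject, predicate, object).
FOOTBALL_QUERY = [
    ("?player", "rdf:type", "ex:footballer"),
    ("?player", "ex:playsFor", "?club"),
    ("?player", "ex:bornIn", "?region"),
    ("?club", "ex:region", "?region"),
    ("?region", "ex:population", "?pop"),
]


def hop_distances(patterns, start):
    """Breadth-first search over the query graph, ignoring edge direction."""
    neighbors = {}
    for s, _, o in patterns:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


def distance_to_furthest_edge(patterns, center):
    """Assumed edge distance: 1 + hops from the center to the nearer endpoint."""
    dist = hop_distances(patterns, center)
    return max(min(dist[s], dist[o]) + 1 for s, _, o in patterns)


def choose_center(patterns):
    """Pick a node that minimizes the distance to the furthest edge."""
    nodes = sorted({n for s, _, o in patterns for n in (s, o)})
    return min(nodes, key=lambda n: distance_to_furthest_edge(patterns, n))


def needs_communication(patterns, hop_guarantee):
    """True if the query cannot be answered by each node on its own."""
    center = choose_center(patterns)
    return distance_to_furthest_edge(patterns, center) > hop_guarantee
```

Under these assumptions the best centers of the football query have distance 2, so `needs_communication(FOOTBALL_QUERY, 2)` is false, while a 1-hop guarantee would force the query to be split.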
115. Searching the Web of Data
Querying Linked Open Data
Structured queries
Clustered RDF management
If the query is too “big”, it is decomposed into multiple smaller
subqueries, and the results are combined using MapReduce.
[Figure: query graph with nodes ?art1, ?art2, ?title2, ?author1, ?author2, ?name1, ?name2 and edges ex:cites, ex:writtenBy, ex:hasTitle, ex:hasName; some parts are labeled isOwned]
Workload-Aware Replication [5]
Replicate additional queries at the boundaries
Avoid MapReduce by using a designated coordinator node
117. Searching the Web of Data
Querying Linked Open Data
Structured queries
Query processing strategies
Query processing strategies
Materialized query processing
Lookup-based query processing
No statistics or indexes
Evaluate parts of the query locally
Dereference URIs in intermediate solutions, download the data
Use downloaded data to compute other parts of the query,
dereference, ...
Federated query processing
Based on technologies originally developed for distributed database
systems, P2P systems, and data integration
Data is stored at the sources
Evaluate parts of the query on remote sources
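The lookup-based strategy listed above can be sketched as a loop that alternates between local evaluation and URI dereferencing. This is a simplified illustration: the `WEB` dictionary is a hypothetical stand-in for HTTP lookups, the `ex:` data is made up, and a real system would need to handle literals, failures, and termination limits.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, else return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, triples):
    """Evaluate a list of triple patterns over local triples (nested loops)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            for triple in triples:
                extended = match(pattern, triple, solution)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def lookup_based_query(patterns, seed_uris, dereference):
    """Interleave local evaluation with dereferencing until nothing new is found."""
    local, seen, todo = set(), set(), list(seed_uris)
    while todo:
        uri = todo.pop()
        if uri in seen:
            continue
        seen.add(uri)
        local.update(dereference(uri))  # "download the data"
        # Intermediate solutions of every query prefix yield new URIs to look up.
        for k in range(1, len(patterns) + 1):
            for solution in evaluate(patterns[:k], local):
                todo.extend(v for v in solution.values() if v not in seen)
    return evaluate(patterns, local)


# Hypothetical stand-in for the Web: URI -> triples returned when dereferencing it.
WEB = {
    "ex:Germany": [("ex:Germany", "ex:capital", "ex:Berlin")],
    "ex:Berlin": [("ex:Berlin", "ex:population", "ex:pop_berlin")],
}
QUERY = [
    ("ex:Germany", "ex:capital", "?capital"),
    ("?capital", "ex:population", "?pop"),
]

result = lookup_based_query(QUERY, ["ex:Germany"], lambda uri: WEB.get(uri, []))
```

No statistics or indexes are consulted: the only way the second pattern can be answered is that ex:Berlin shows up in an intermediate solution and is then dereferenced.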
119. Searching the Web of Data
Querying Linked Open Data
Structured queries
Federated query processing
Federated SPARQL query processing [8]
[Figure: federated query processing pipeline. A SPARQL request is parsed; source selection sends SPARQL ASK queries per triple pattern (with a cache); global optimizations determine groupings and join order; query execution uses bound joins. Subqueries are generated and evaluated at the relevant SPARQL endpoints (1 ... N), and the partial results are aggregated locally into the query result.]
Assumption
The sources are capable of (and willing to) evaluate SPARQL queries,
i.e., they are SPARQL endpoints.
120. Searching the Web of Data
Querying Linked Open Data
Structured queries
Source selection
Goal
Identify sources that might contribute to the query result
Approaches
Naive
SPARQL ASK requests and caching
Statistics and indexes
Keyword indexes
Predicate URIs, types of instances
URI indexes
Frequent paths
Service-level descriptions
VoID statistics
Histograms
...
122. Searching the Web of Data
Querying Linked Open Data
Structured queries
Naive federated query processing
Scenario
SPARQL endpoints
No statistics/indexes
Example query
SELECT ?Country ?Capital ?CountryPop ?CapitalPop WHERE {
  ?Country ex:capital ?Capital .
  ?Country ex:population ?CountryPop .
  ?Capital ex:population ?CapitalPop .
}
Send a message with the 1st triple pattern to all 4 sources → 4 requests
Receive 200 mappings for ?Country and ?Capital,
e.g., ?Country=ex:Germany, ?Capital=ex:Berlin
Use the results to evaluate the 2nd triple pattern (nested loop)
200 × 4 requests,
e.g., SELECT ?CountryPop WHERE { ex:Germany ex:population ?CountryPop . }
150 mappings
Use the results to evaluate the 3rd triple pattern (nested loop)
150 × 4 requests,
e.g., SELECT ?CapitalPop WHERE { ex:Berlin ex:population ?CapitalPop . }
In total: 4 + 200 × 4 + 150 × 4 = 1404 requests
Many (unnecessary) requests sent to the sources!
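The request count can be reproduced with a few lines of arithmetic. The second function is a hedged illustration of why grouping several mappings into one request helps, which is the idea behind the bound joins of [8]; the block size of 15 is an arbitrary value chosen for this example.

```python
import math


def naive_requests(num_sources, mappings_per_join):
    """One request per source for the first pattern, then one per mapping and source."""
    return num_sources + sum(m * num_sources for m in mappings_per_join)


def grouped_requests(num_sources, mappings_per_join, block_size):
    """Same plan, but `block_size` mappings are shipped together in one request."""
    return num_sources + sum(
        math.ceil(m / block_size) * num_sources for m in mappings_per_join
    )


total = naive_requests(4, [200, 150])                      # 4 + 200*4 + 150*4
grouped = grouped_requests(4, [200, 150], block_size=15)   # 4 + 14*4 + 10*4
```

With these numbers the naive plan needs 1404 requests, while grouping 15 mappings per request would need 100. Source selection reduces the count further by dropping the factor of 4 wherever only some sources are relevant.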
132. Searching the Web of Data
Querying Linked Open Data
Structured queries
Source selection with SPARQL ASK
Does not require special cooperation from the sources
Example query
SELECT ?Country ?Capital ?CountryPop ?CapitalPop WHERE {
  ?Country ex:capital ?Capital .
  ?Country ex:population ?CountryPop .
  ?Capital ex:population ?CapitalPop .
}
Each source receives a message for each triple pattern
Response: true/false
ASK { ?Country ex:capital ?Capital . }
ASK { ?Country ex:population ?CountryPop . }
ASK { ?Capital ex:population ?CapitalPop . }
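A small sketch of ASK-based source selection with caching. The endpoints are simulated by in-memory lists of triples, the source names and `ex:` data are made up, and `ask` only stands in for sending an ASK query over HTTP; repeated variables within one pattern are not handled.

```python
def ask(triples, pattern):
    """Stand-in for sending ASK { pattern } to an endpoint holding `triples`."""
    return any(
        all(p.startswith("?") or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )


class AskSourceSelector:
    def __init__(self, sources):
        self.sources = sources      # source name -> list of triples
        self.cache = {}             # (source name, normalized pattern) -> bool
        self.requests_sent = 0

    def relevant_sources(self, pattern):
        # Variable names do not affect the answer, so they are normalized away.
        key = tuple("?" if term.startswith("?") else term for term in pattern)
        selected = []
        for name, triples in self.sources.items():
            if (name, key) not in self.cache:
                self.requests_sent += 1
                self.cache[(name, key)] = ask(triples, pattern)
            if self.cache[(name, key)]:
                selected.append(name)
        return selected


# Hypothetical sources.
SOURCES = {
    "geo": [("ex:Germany", "ex:capital", "ex:Berlin")],
    "stats": [
        ("ex:Germany", "ex:population", "ex:pop_germany"),
        ("ex:Berlin", "ex:population", "ex:pop_berlin"),
    ],
}

selector = AskSourceSelector(SOURCES)
capital_sources = selector.relevant_sources(("?Country", "ex:capital", "?Capital"))
country_pop_sources = selector.relevant_sources(("?Country", "ex:population", "?CountryPop"))
capital_pop_sources = selector.relevant_sources(("?Capital", "ex:population", "?CapitalPop"))
```

The second and third pattern differ only in their variable names, so the third lookup is served entirely from the cache and only four ASK requests are sent in total.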
134. Searching the Web of Data
Querying Linked Open Data
Structured queries
Source selection with VoID statistics
Example: DBpedia (footnote 2)
Prefixes
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpo: <http://dbpedia.org/ontology/> .
...
General information
void:sparqlEndpoint <http://dbpedia.org/sparql> ;
...
Basic statistics
void:triples "43620475"^^xsd:integer ;
void:entities "2222456"^^xsd:integer ;
void:properties "1063"^^xsd:integer ;
void:distinctSubjects "9495865"^^xsd:integer ;
void:distinctObjects "13636604"^^xsd:integer ;
Predicate statistics
void:propertyPartition [
  void:property dbpo:aSide ;
  void:triples "1600"^^xsd:integer ;
  void:distinctSubjects "1552"^^xsd:integer ;
  void:distinctObjects "1554"^^xsd:integer
] , [
  void:property dbpo:abbreviation ;
  void:triples "1144"^^xsd:integer ;
  void:distinctSubjects "1141"^^xsd:integer ;
  void:distinctObjects "1096"^^xsd:integer
] , [
  ...
] ;
Class statistics
void:classPartition [
  void:class dbpo:Activity ;
  void:entities "1234"^^xsd:integer
] , [
  void:class dbpo:Actor ;
  void:entities "37898"^^xsd:integer
] , [
  ...
]
2 http://code.google.com/p/fbench/source/browse/trunk/EvalBenchmark/suites/SPLENDID/void/dbpedia3.5.1_subset-void.n3?r=119
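A minimal sketch of how such VoID statistics could drive source selection and give a first cardinality estimate. The DBpedia numbers are copied from the excerpt above; the second source `ex:other`, its endpoint URL, and the dictionary layout are assumptions made for illustration. Patterns with predicate rdf:type and a variable object are not covered.

```python
# Statistics per source; "ex:other" is a made-up second source.
VOID = {
    "dbpedia": {
        "endpoint": "http://dbpedia.org/sparql",
        "triples": 43620475,
        "properties": {"dbpo:aSide": 1600, "dbpo:abbreviation": 1144},
        "classes": {"dbpo:Activity": 1234, "dbpo:Actor": 37898},
    },
    "ex:other": {
        "endpoint": "http://example.org/sparql",
        "triples": 500,
        "properties": {"dbpo:abbreviation": 20},
        "classes": {},
    },
}


def select_sources(pattern, void):
    """Map each relevant source to an estimated number of matching triples."""
    _, predicate, obj = pattern
    estimates = {}
    for name, stats in void.items():
        if predicate.startswith("?"):
            # Unbound predicate: any triple of the source may match.
            estimates[name] = stats["triples"]
        elif predicate == "rdf:type" and not obj.startswith("?"):
            # Class statistics: number of entities of that class.
            if obj in stats["classes"]:
                estimates[name] = stats["classes"][obj]
        elif predicate in stats["properties"]:
            # Predicate statistics: number of triples with that predicate.
            estimates[name] = stats["properties"][predicate]
    return estimates
```

For example, `select_sources(("?s", "dbpo:aSide", "?o"), VOID)` selects only the DBpedia source, and the estimates can later help to order joins.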
140. Searching the Web of Data
Querying Linked Open Data
Structured queries
Histogram-based indexing
Index complete triples
Index (s,p,o) in combinations capturing correlation
Based on multidimensional histograms
Identify relevant sources for triple patterns
Identify relevant sources for joins
141. Searching the Web of Data
Querying Linked Open Data
Structured queries
Histogram-based indexing
Histograms
Transform triple statements into numerical space (hash functions)
(ex:BudSpencer ex:actedIn ex:m1234) → (323, 232, 124)
Insert into the matching bucket
Lookup: transform triple pattern into numerical space
(ex:BudSpencer ex:actedIn ?m)
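The insert and lookup steps can be sketched with an equal-width, three-dimensional histogram. The hash function (CRC32), the coordinate space of 1000, and the 10 buckets per dimension are assumptions for this example, so the coordinates differ from the (323, 232, 124) shown above; the source names and the second triple are made up.

```python
import zlib
from collections import defaultdict

SPACE = 1000     # coordinates fall into [0, SPACE)
BUCKETS = 10     # buckets per dimension


def coordinate(term):
    """Hash an RDF term to a number (deterministic across runs)."""
    return zlib.crc32(term.encode("utf-8")) % SPACE


def bucket_of(term):
    """Index of the equal-width bucket that contains the term's coordinate."""
    return coordinate(term) * BUCKETS // SPACE


class HistogramIndex:
    def __init__(self):
        # (s bucket, p bucket, o bucket) -> source -> number of triples
        self.counts = defaultdict(lambda: defaultdict(int))

    def insert(self, source, triple):
        key = tuple(bucket_of(term) for term in triple)
        self.counts[key][source] += 1

    def lookup(self, pattern):
        """Sources that may match the pattern, with triple counts per source."""
        wanted = [None if t.startswith("?") else bucket_of(t) for t in pattern]
        result = defaultdict(int)
        for key, per_source in self.counts.items():
            if all(w is None or w == k for w, k in zip(wanted, key)):
                for source, n in per_source.items():
                    result[source] += n
        return dict(result)


index = HistogramIndex()
index.insert("src1", ("ex:BudSpencer", "ex:actedIn", "ex:m1234"))
index.insert("src2", ("ex:TerenceHill", "ex:bornIn", "ex:Venice"))
hits = index.lookup(("ex:BudSpencer", "ex:actedIn", "?m"))
```

Because different terms can hash into the same bucket, a lookup may return sources that do not actually contribute (false positives), but it never misses a source that holds a matching triple.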
Alternative to histograms (QTree or clustering) [11]
[Figure: three plots over the subject (s) and object (o) dimensions; two of them show regions labeled A, A1, A2, B, B1, B2, and C]
146. Searching the Web of Data
Querying Linked Open Data
Structured queries
Histogram-based indexing
Join cardinality estimation and source selection
Determine buckets for 1st triple pattern
Determine buckets for 2nd triple pattern
Determine buckets that overlap in the join dimension (e.g., subject)
Estimate cardinality based on the degree of overlap
[Figure: buckets of the 1st and 2nd BGP plotted over the subject and object dimensions]
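The overlap-based estimate can be illustrated in one dimension. This sketch assumes equal-width buckets over the join dimension and uniformly distributed values inside each bucket, so two overlapping buckets with n1 and n2 entries contribute n1 · n2 / width join results; this textbook-style formula, the bucket numbers, and the source names are assumptions, not the estimator of [11].

```python
def estimate_join(hist_a, hist_b, bucket_width):
    """Estimated join cardinality from two histograms over the join dimension.

    Each histogram maps a bucket index to the number of entries in that bucket.
    """
    shared = hist_a.keys() & hist_b.keys()
    return sum(hist_a[b] * hist_b[b] / bucket_width for b in shared)


def relevant_source_pairs(first, second, bucket_width):
    """Pairs of sources whose buckets overlap, with the estimated cardinality."""
    pairs = {}
    for name_a, hist_a in first.items():
        for name_b, hist_b in second.items():
            estimate = estimate_join(hist_a, hist_b, bucket_width)
            if estimate > 0:
                pairs[(name_a, name_b)] = estimate
    return pairs


# Hypothetical buckets on the subject dimension for two triple patterns.
FIRST_PATTERN = {"A": {0: 10, 3: 5}}
SECOND_PATTERN = {"B": {3: 20}, "C": {7: 4}}

pairs = relevant_source_pairs(FIRST_PATTERN, SECOND_PATTERN, bucket_width=100)
```

Source C is pruned because none of its buckets overlaps with source A in the join dimension; only the pair (A, B) remains.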
147. Searching the Web of Data
Querying Linked Open Data
Summary: querying Linked Open Data
Querying Linked Open Data (LOD)
Browser-based link traversal
Most natural way of looking up Linked Data
Keyword search for Linked Open Data
Several search engines are available, in different flavors
SPARQL query processing
Materialized query processing
Lookup-based query processing
Federated query processing
Relies on powerful and available sources
Statistics require additional cooperation
148. Searching the Web of Data
Querying Linked Open Data
References I
[1] Daniel J. Abadi, Adam Marcus, Samuel Madden, and Kate Hollenbach. SW-Store:
a vertically partitioned DBMS for Semantic Web data management. VLDB J.,
18(2):385–406, 2009.
[2] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak,
and Zachary G. Ives. DBpedia: A nucleus for a Web of open data. In ISWC, 2007.
[3] Christian Bizer. Topology of the Web of Data, 2012. Keynote LWDM 2012.
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Bizer-TopologyWoD-LWDM2012-BEWEB2012.pdf?1361005355
[4] Gianluca Demartini, Peter Mika, Thanh Tran, and Arjen P. de Vries. From Expert
Finding to Entity Search on the Web, 2012. Tutorial ECIR 2012.
http://diuf.unifr.ch/main/xi/EntitySearchTutorial
[5] Katja Hose and Ralf Schenkel. WARP: Workload-Aware Replication and
Partitioning for RDF. In DESWEB’13, 2013.
[6] Jiewen Huang, Daniel J. Abadi, and Kun Ren. Scalable SPARQL Querying of Large
RDF Graphs. PVLDB, 4(11):1123–1134, 2011.
149. Searching the Web of Data
Querying Linked Open Data
References II
[7] Thomas Neumann and Gerhard Weikum. RDF-3X: a RISC-style engine for RDF.
PVLDB, 1(1):647–659, 2008.
[8] Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt.
FedX: Optimization techniques for federated query processing on linked data. In
ISWC 2011, pages 601–616, 2011.
[9] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of
semantic knowledge. In WWW, 2007.
[10] Metaweb Technologies. The Freebase project. http://freebase.com.
[11] Jürgen Umbrich, Katja Hose, Marcel Karnstedt, Andreas Harth, and Axel Polleres.
Comparing data summaries for processing live queries over linked data. World Wide
Web, 14:495–544, 2011.