DBpedia and the Emerging Web of
Linked Data
Sören Auer
Creating Knowledge
out of Interlinked Data
Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 2 http://lod2.eu
• 2000 Mathematics and Computer Science studies in
Hagen, Dresden and Екатеринбург
• Managing director of adVIS GmbH – SME focused
on Web-Application and Content Management technology
• IT consultant for various companies (T-Mobile AG, RDL Corp., Science
Computing AG)
• 2006 doctorate in Information Systems / Computer Science at Universität Leipzig
• 2006-2008 post-doctoral researcher at the DB Group at University of
Pennsylvania (USA)
• Head of AKSW research group – DBpedia, OntoWiki, LinkedGeoData, Triplify
• Research interests: Information Systems, Database and Web Technologies,
Semantic Web and Knowledge Engineering, Adaptive Methodologies, HCI, E-
Science, Digital Libraries
• Coordinator of the EU FP7 IP Project “LOD2 – Creating Knowledge out of
Interlinked Data”
• Work as expert for W3C, EU FP6/FP7/CIP, University City Keystone Innovation
Zone, Swiss National Science Foundation
Dr. Sören Auer
1. The Vision & Big Picture
2. Linked Data 101
3. The Linked Data Life-cycle
Agenda
1. Reasoning does not scale on the Web
• IR / one dimensional indexing scales (Google)
• Next step conjunctive querying (OWL-QL?, dynamic
scale-out / clustering)
• Web scalable DL reasoning is out-of-sight (maybe fragment,
fuzzy reasoning has some chances)
2. Even if it scaled, it would not be affordable
• “What is the only former Yugoslav republic in the
European Union?”
• 2880 POWER7 cores, 16 Terabytes memory, 4 Terabytes
clustered storage (IBM Watson) still cannot answer this
question
Why the Semantic Web won‘t work (soon)
Problem: Try to search for these things on the current Web:
• Apartments near German-Russian bilingual childcare in Berlin.
• ERP service providers with offices in Vienna and London.
• Researchers working on multimedia topics in Eastern Europe.
Information is available on the Web, but opaque to current search.
Why do we need the Data Web?
[Figure: berlin.de (has everything about childcare in Berlin) and Immobilienscout.de (knows all about real estate offers in Germany) each serve content from a DB through a Web server. The search engine receives HTML from both; RDF is added alongside the HTML.]
Solution: complement text on Web pages with structured linked
open data & intelligently combine/integrate such structured
information from different sources:
From the Document Web to the
Semantic Data Web
Web (since 1992)
• HTTP
• HTML/CSS/JavaScript
Semantic Web
(Vision 1998, starting ???)
• Reasoning
• Logic, Rules
• Trust
Social Web (since 2003)
• Folksonomies/Tagging
• Reputation, sharing
• Groups, relationships
Data Web (since 2006)
• URI dereferenceability
• Web Data integration
• RDF serializations
Web 1.0 Web 2.0 Web 3.0
Many Web sites
containing unstructured,
textual content
Few large Web sites
are specialized on
specific content types
Many Web sites containing
& semantically syndicating
arbitrarily structured
content
Pictures
Video
Encyclopedic
articles
+ +
The Long Tail of Information Domains
[Figure: popularity curve over content types. Head (currently supported structured content types): pictures, news, video, recipes, calendar. Long tail (not or insufficiently supported content types, SemWeb supported structured content for special interest communities): gene sequences, itinerary of King George, talent management, requirements engineering, …]
The Long Tail by Chris Anderson (Wired, Oct. 2004), adapted to information domains
Linked Data in a Nutshell
1. Uses the RDF data model
[Graph: SBC organizes SBBD2011; SBBD2011 starts 3.10.2011; SBBD2011 takesPlaceIn Florianopolis]
2. Is serialised in triples:
SBC organizes SBBD2011 .
SBBD2011 starts "2011-10-03"^^xsd:date .
SBBD2011 takesPlaceIn Florianopolis .
3. Uses content negotiation
The emerging Web of Data
[Figure: snapshots dated 2007 to 2009]
Tools: Virtuoso, SemMF, SILK, PoolParty, DL-Learner, Sindice, Sigma, ORE, OntoWiki, MonetDB, DXX Engine, WiQA
Tasks: repair, interlink, fuse, classify, enrich, create
Conceptual Level
Data Access and Integration
Object-relational mappings (ORM)
• NeXT’s EOF / WebObjects
• ADO.NET Entity Framework
• Hibernate
Entity-attribute-value
(EAV)
• HELP medical record
system, TrialDB
Column-oriented DBMS
• Collocates column
values rather than row
values
• Vertica, C-Store,
MonetDB
Data Web
• URIs as entity identifiers
• HTTP as data access
protocol
• Local-As-View (LAV)
RDBMS
• Organize data in
relations, rows, cells
• Oracle, DB2, MS-
SQL
Triple/Quad Stores
• RDF data model
• Virtuoso, Oracle,
Sesame
Data Models
Others
• XML, hierarchical,
tree, graph-oriented
DBMS
Procedural APIs
• ODBC
• JDBC
Data Access
Query Languages
• Datalog, SQL
• SPARQL
• XPATH/XQuery
Data Integration
Linked Data
• dereferenceable URIs
• RDF serialization
formats
Enterprise Information
Integration
sets of heterogeneous data
sources appear as a single,
homogeneous data source
Data Warehousing
• Based on extract,
transform, load (ETL)
• Global-As-View (GAV)
Research
Mediators
Ontology-based
P2P
Web service-based
1. The Vision & Big Picture
2. Linked Data 101
(based on Michael Hausenblas‘ slides)
3. The Linked Data Life-cycle
Agenda
Orientation
Linked Data 101
Linked Data provides a standardised API for:
• Data and metadata discovery
• Data integration
• Distributed query
Linked Data principles
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can
look them up on the web
3. When a URI is looked up, return a description of
the thing (in RDF format)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
Linked Data principles
• They are principles, not implementation advice
• Not humans or machines but humans and machines!
• Content negotiation (e.g. HTML and RDF/XML)
• HTML + RDFa
• Metcalfe’s Law
http://en.wikipedia.org/wiki/Metcalfe%27s_law
Linked Data example
HTTP URIs
• A Uniform Resource Identifier (URI) is a compact
sequence of characters that identifies an abstract or
physical resource. [RFC3986]
• Syntax
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
• Example
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
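The component breakdown above can be checked with a short sketch. It assumes Python's standard `urllib.parse` module, which is not part of the original slides; the `components` dictionary is only an illustrative name.

```python
from urllib.parse import urlsplit

# Split the RFC 3986 example URI into its five generic components.
parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")

components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,  # urllib calls the authority "netloc"
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:10} {value}")
```

Note that the fragment is handled on the client side and is not sent to the server in an HTTP request.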
HTTP URIs
• URI references
An RDF URI reference is a Unicode string that does not contain any
control characters (#x00 - #x1F, #x7F - #x9F) and that would produce
a valid URI character sequence representing an absolute URI
when subjected to UTF-8 encoding along with %-escaping of
non-US-ASCII octets.
• Qualified Names (QNames)
XML’s way to allow namespaced elements/attributes, of the form
QName = Prefix ':' LocalPart
• Compact URIs (CURIEs)
Generic, abbreviated syntax for expressing URIs
HTTP
The Hypertext Transfer Protocol (HTTP) is an application-
level protocol for distributed, collaborative, hypermedia
information systems.
It is a generic, stateless, protocol which can be used for
many tasks beyond its use for hypertext, such as name
servers and distributed object management systems,
through extension of its request methods, error codes
and headers.
A feature of HTTP is the typing and negotiation of data
representation, allowing systems to be built independently
of the data being transferred.
HTTP
• HTTP messages consist of requests from client to
server and responses from server to client
• Set of methods is predefined
• GET
• POST
• PUT
• DELETE
• HEAD
• (OPTIONS)
HTTP
Status codes
• Informational 1xx, provisional response (100 Continue)
• Successful 2xx, request successfully received, understood,
and accepted (201 Created)
• Redirection 3xx, further action needs to be taken by user
agent to fulfill the request (301 Moved Permanently)
• Client Error 4xx, client erred (405 Method Not Allowed)
• Server Error 5xx, server encountered an unexpected
condition (501 Not Implemented)
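As a small illustration of the five status classes, the following sketch maps a status code to its class by its first digit. The function name `status_class` is hypothetical and only serves this example.

```python
# Class names as used on the slide, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the class of an HTTP status code (100-599)."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (100, 201, 301, 405, 501):
    print(code, status_class(code))
```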
HTTP
Request:
GET /html/rfc2616 HTTP/1.1
Host: tools.ietf.org
User-Agent: Mozilla/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Response:
HTTP/1.x 200 OK
Date: Thu, 05 Mar 2009 08:17:33 GMT
Server: Apache/2.2.11
Content-Location: rfc2616.html
Last-Modified: Tue, 20 Jan 2009 09:16:04 GMT
Content-Type: text/html; charset=UTF-8
HTTP
• Content Negotiation: selecting representation for a given
response when multiple representations available
• Three types of CN: server-driven CN, agent-driven CN,
transparent CN
• Example:
curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Galway
HTTP/1.1 303 See Other
Content-Type: application/rdf+xml
Location: http://dbpedia.org/data/Galway.rdf
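Server-driven content negotiation can be sketched as follows. This is a simplified illustration, not the logic of any particular server: it ranks media ranges by their q value only and ignores the specificity rules of the HTTP specification. The function names and the `REPRESENTATIONS` table are assumptions made for the example; the `/page/` URL for the HTML representation follows DBpedia's convention.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as text/* or */* covers the media type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def negotiate(accept: str, available: list[str]):
    """Pick the first available media type acceptable to the client."""
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


# Hypothetical table of representations for one resource.
REPRESENTATIONS = {
    "text/html": "http://dbpedia.org/page/Galway",
    "application/rdf+xml": "http://dbpedia.org/data/Galway.rdf",
}

chosen = negotiate("application/rdf+xml", list(REPRESENTATIONS))
print("303 See Other ->", REPRESENTATIONS[chosen])
```

A browser-style Accept header would select the HTML representation with the same function, which is how one URI can serve both humans and machines.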
HTTP
• Caching (see Cache-Control header field) is
essential for scalability
http://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/
• HTTPbis IETF WG chaired by Mark Nottingham, mainly
about: patches, clarifications, deprecation of unused
features, documentation of security properties
REST - HTTP
Representational State Transfer (REST)
resource: intended conceptual target of a hypertext reference
resource identifier: URL, URN
representation: HTML document, JPEG image
representation metadata: media type, last-modified time
resource metadata: source link, alternates, vary
control data: if-modified-since, cache-control
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
http://webofdata.wordpress.com/2009/10/09/linked-data-for-restafarians/
Web's Standard Retrieval Algorithm
1. parse URI and find HTTP protocol
2. look up DNS name to determine the associated
IP address
3. open a TCP stream to port 80 at the IP address
determined above
4. format an HTTP GET request for the resource and
send it to the server
5. read response from the server
6. from the status code (200) determine that a
representation of the resource is available
7. inspect the returned Content-Type
8. pass the entity-body to its HTML rendering
engine
http://www.w3.org/2001/tag/doc/selfDescribingDocuments
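The retrieval steps above can be sketched with the Python standard library. This is a minimal illustration under simplifying assumptions: it handles only http:// URIs, always asks for text/html, and does not decode chunked transfer encoding. The function names are hypothetical. `fetch` needs network access, so the usage lines below exercise only the offline parts against a canned response.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(uri: str) -> tuple[str, int, bytes]:
    """Steps 1 and 4: parse the URI and format an HTTP GET request."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URIs")
    host = parts.hostname
    port = parts.port or 80
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Accept: text/html\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, request.encode("ascii")


def parse_response(raw: bytes) -> tuple[int, str, bytes]:
    """Steps 5 to 7: read the status code, Content-Type and entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers.get("content-type", ""), body


def fetch(uri: str) -> tuple[int, str, bytes]:
    """Steps 2 and 3 (DNS lookup, TCP connection) plus send and receive."""
    host, port, request = build_get_request(uri)
    # create_connection resolves the DNS name and opens the TCP stream
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return parse_response(b"".join(chunks))


# Offline usage with a canned response instead of a live server.
canned = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html></html>"
)
status, content_type, body = parse_response(canned)
if status == 200 and content_type.startswith("text/html"):
    print("step 8: hand", len(body), "bytes to the HTML renderer")
```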
RDF
• A data model - directed, labeled graph
• Triple: (subject predicate object)
• subject … URIref or bNode
• predicate … URIref
• object … URIref or bNode or literal
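The position rules for triples can be expressed as a small type check. This is a sketch with hypothetical class names (`URIRef`, `BNode`, `Literal`) defined here for illustration; it is not the API of any RDF library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[URIRef] = None
    lang: Optional[str] = None


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check which kind of term is allowed in which triple position."""
    return (
        isinstance(subject, (URIRef, BNode))
        and isinstance(predicate, URIRef)
        and isinstance(obj, (URIRef, BNode, Literal))
    )


leipzig = URIRef("http://dbpedia.org/resource/Leipzig")
has_mayor = URIRef("http://dbpedia.org/property/hasMayor")
jung = URIRef("http://dbpedia.org/resource/Burkhard_Jung")

print(is_valid_triple(leipzig, has_mayor, jung))                # URI object
print(is_valid_triple(Literal("Leipzig"), has_mayor, leipzig))  # literal subject
```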
RDF Triple
• Inspired by linguistic categories
• Allowed usage:
Subject: URI or blank node
Predicate: URI (also called property)
Object: URI, blank node or literal
Example: Burkhard Jung (subject) isMayorOf (predicate) Leipzig (object)
Example RDF Graph
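[Graph, listed as edges:]
Leipzig hasAreaCode 0341
Leipzig hasMayor Burkhard Jung
Leipzig locatedIn Saxony
Leipzig latitude 51.3333
Leipzig longitude 12.3833
Saxony locatedIn Germany
Burkhard Jung isMayorOf Leipzig
Burkhard Jung isMemberOf Social Democratic Party
Burkhard Jung born 1958-03-07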
Literals
• Representation of data values
• Serialization as strings
• Interpretation based on the datatype
• Literals without Datatype are treated as strings
[Figure: the example graph around Leipzig and Burkhard Jung (isMayorOf, hasMayor) with its literal values: latitude 51.3333, longitude 12.3833, born 1958-03-07]
RDF Serialization
N3: "Notation 3" - extensive formalism
N-Triples: part of N3
Turtle: Extension of N-Triples (shortcuts)
Source: http://www.w3.org/DesignIssues/Notation3.html
Turtle Syntax
• URIs in angle brackets
• Literals in quotes
• Triples separated by dot
• Whitespace is ignored
Turtle Syntax:
Shortcuts
<http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/property/hasMayor> <http://dbpedia.org/resource/Burkhard_Jung> ;
<http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de ;
<http://www.w3.org/2003/01/geo/wgs84_pos#lat> "51.333332"^^xsd:float ;
<http://www.w3.org/2003/01/geo/wgs84_pos#lon> "12.383333"^^xsd:float .
Shortcuts for namespace prefixes:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung .
dbp:Leipzig rdfs:label "Leipzig"@de .
dbp:Leipzig geo:lat "51.333332"^^xsd:float .
dbp:Leipzig geo:lon "12.383333"^^xsd:float .
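What a Turtle parser does with these prefix shortcuts can be sketched in a few lines: a prefixed name is expanded by replacing the prefix with its namespace URI. The `expand` function and the `PREFIXES` table are illustrative names, not part of any Turtle library.

```python
# Prefix table mirroring the @prefix declarations on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbp": "http://dbpedia.org/resource/",
    "dbpp": "http://dbpedia.org/property/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as dbp:Leipzig into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


for name in ("dbp:Leipzig", "dbpp:hasMayor", "dbp:Burkhard_Jung"):
    print(f"<{expand(name)}>")
```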
Turtle Syntax: Shortcuts
Group triples with the same subject using ";" instead of ".":
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung ;
rdfs:label "Leipzig"@de ;
geo:lat "51.333332"^^xsd:float ;
geo:lon "12.383333"^^xsd:float .
Group objects of triples with the same subject and predicate using ",":
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dbpp: <http://dbpedia.org/property/> .
dbp:Leipzig dbpp:locatedIn dbp:Saxony, dbp:Germany ;
dbpp:hasMayor dbp:Burkhard_Jung .
XML Syntax of RDF
• Turtle intuitively readable and machine processable
• but: better tool support and programming libraries for XML
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dbpp="http://dbpedia.org/property/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
<dbpp:hasMayor
rdf:resource="http://dbpedia.org/resource/Burkhard_Jung" />
<rdfs:label xml:lang="de">Leipzig</rdfs:label>
<geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
<geo:lon rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:lon>
</rdf:Description>
</rdf:RDF>
RDF/JSON
• JSON = JavaScript Object Notation
• Compact format for data exchange between applications
• JSON documents are valid JavaScript
• Programming language independent, since parsers exist for all popular
programming languages
• Less overhead when parsing and serialising than XML
{ "S" : { "P" : [ O ] } }
• Subject: URI or bnode
• Predicate: URI
• Object:
type: "uri", "literal" or "bnode"
value: data value
lang: language tag
datatype: URI of the datatype
JSON Example
{
"http://dbpedia.org/resource/Leipzig" : {
"http://dbpedia.org/property/hasMayor":
[ { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],
"http://www.w3.org/2000/01/rdf-schema#label":
[ { "type":"literal", "value":"Leipzig", "lang":"en" } ],
"http://www.w3.org/2003/01/geo/wgs84_pos#lat":
[ { "type":"literal", "value":"51.3333",
"datatype":"http://www.w3.org/2001/XMLSchema#float" } ],
"http://www.w3.org/2003/01/geo/wgs84_pos#lon":
[ { "type":"literal", "value":"12.3833",
"datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
}
}
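Because the structure is plain JSON, such a document can be walked with a standard JSON parser. The sketch below turns a shortened document of the same shape into N-Triples lines; `to_ntriples` is a hypothetical helper, and its literal escaping covers only backslashes and double quotes.

```python
import json

RDF_JSON = """
{
  "http://dbpedia.org/resource/Leipzig": {
    "http://dbpedia.org/property/hasMayor":
      [ { "type": "uri", "value": "http://dbpedia.org/resource/Burkhard_Jung" } ],
    "http://www.w3.org/2000/01/rdf-schema#label":
      [ { "type": "literal", "value": "Leipzig", "lang": "en" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat":
      [ { "type": "literal", "value": "51.3333",
          "datatype": "http://www.w3.org/2001/XMLSchema#float" } ]
  }
}
"""


def to_ntriples(doc: dict) -> list[str]:
    """Flatten { S : { P : [ O ] } } into one N-Triples line per object."""
    lines = []
    for subject, predicates in doc.items():
        s = subject if subject.startswith("_:") else f"<{subject}>"
        for predicate, objects in predicates.items():
            for obj in objects:
                if obj["type"] == "uri":
                    o = f"<{obj['value']}>"
                elif obj["type"] == "bnode":
                    o = obj["value"]
                else:
                    escaped = obj["value"].replace("\\", "\\\\").replace('"', '\\"')
                    o = f'"{escaped}"'
                    if "lang" in obj:
                        o += "@" + obj["lang"]
                    elif "datatype" in obj:
                        o += f"^^<{obj['datatype']}>"
                lines.append(f"{s} <{predicate}> {o} .")
    return lines


ntriples = to_ntriples(json.loads(RDF_JSON))
print("\n".join(ntriples))
```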
RDFa Syntax
• RDFa = Resource Description Framework in attributes
• Embeds RDF in XHTML
• UTF-8 and UTF-16 encodings, since it extends XML-based XHTML
• More overhead than other serialisations due to embedding in HTML
• Less readable
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html version="XHTML+RDFa 1.0" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dbpp="http://dbpedia.org/property/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<head><title>Leipzig</title></head>
<body about="http://dbpedia.org/resource/Leipzig">
<h1 property="rdfs:label" xml:lang="de">Leipzig</h1>
<p>Leipzig is a city in Germany. Leipzig's mayor is
<a href="Burkhard_Jung" rel="dbpp:hasMayor">Burkhard Jung</a>. It is located
at latitude <span property="geo:lat" datatype="xsd:float">51.3333</span>
and longitude <span property="geo:lon" datatype="xsd:float">12.3833</span>.</p>
</body>
</html>
Vocabularies
• Schema layer of RDF
• Defines terms (classes and properties)
• Typically RDFS or OWL family
• Common vocabularies:
• Dublin Core, SKOS
• FOAF, SIOC, vCard
• DOAP
• Core Organization Ontology
• VoID
http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies
Vocabularies: Friend-of-a-Friend (FOAF)
defines classes and properties for representing
information about people and their
relationships
Soeren rdf:type foaf:Person .
Soeren currentProject http://OntoWiki.net .
Soeren foaf:homepage http://aksw.org/Soeren .
Soeren foaf:knows http://sembase.at/Tassilo .
Soeren foaf:sha1 09ac456515dee .
Vocabularies: Semantically-Interlinked
Online Communities (SIOC)
Represents content from blogs, wikis, forums,
mailing lists, chats etc.
Vocabularies: Simple Knowledge
Organization System (SKOS)
support the use of thesauri, classification schemes, subject
heading systems and taxonomies
Instance data
Instances are associated with one or several classes:
Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .
The Linked Open Data cloud
[Figure: cloud snapshots dated 2007 to 2010]
http://lod-cloud.net/
Media
Government
Geo
Publications
User-generated
Life sciences
Cross-domain
LOD cloud stats
triples distribution
links distribution
http://lod-cloud.net/state/
TimBL’s 5-star plan for open data
★ Make your data available on the
Web under an open license
★★ Make it available as structured
data
(Excel sheet instead of image scan of a table)
★★★ Use a non-proprietary format
(CSV file instead of an Excel sheet)
★★★★ Use Linked Data format
(URIs to identify things, RDF to represent data)
★★★★★ Link your data to other
people’s data to provide
context
More: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
Why go for the 5th star?
Central Contractor Registration (CCR)
Geonames
http://webofdata.wordpress.com/2011/05/22/why-we-link/
Effort distribution
[Figure: the overall data integration effort is fixed and is split among publisher's effort, third-party effort and consumer's effort]
Datasets
A dataset is a set of RDF triples that are published,
maintained or aggregated by a single provider
Linksets
• An RDF link is an RDF triple whose subject and object
are described in different datasets
• A linkset is a collection of such RDF links between two
datasets
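The two definitions can be made concrete with a small sketch. It models each dataset simply as the set of resource URIs it describes, which is a simplification, and all URIs under example.org are made up for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical datasets, each reduced to the resources it describes.
dataset_a = {"http://example.org/a/Berlin", "http://example.org/a/Germany"}
dataset_b = {"http://example.org/b/city/1", "http://example.org/b/country/2"}

triples = [
    # RDF link: subject described in dataset A, object in dataset B
    ("http://example.org/a/Berlin", OWL_SAME_AS, "http://example.org/b/city/1"),
    # internal triple: both ends are described in dataset A
    ("http://example.org/a/Berlin", "http://example.org/a/locatedIn",
     "http://example.org/a/Germany"),
    # RDF link in the other direction
    ("http://example.org/b/country/2", OWL_SAME_AS, "http://example.org/a/Germany"),
]


def linkset(triples, first, second):
    """Collect the RDF links between two datasets."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (s in first and o in second) or (s in second and o in first)
    ]


links = linkset(triples, dataset_a, dataset_b)
print(len(links), "RDF links between the two datasets")
```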
Describing Datasets - VoID
• General dataset metadata
• Access metadata
• Structural metadata
• Describing linksets
• Deployment and discovery of VoID files
http://www.w3.org/TR/void/
General dataset metadata
• Dataset homepage
• Publisher
• Title and description
• Categorisation
• Licensing
• Technical features
General dataset metadata
:DBpedia a void:Dataset ;
dcterms:title "DBpedia" ;
dcterms:description "RDF data extracted from Wikipedia" ;
dcterms:contributor :FU_Berlin ;
dcterms:contributor :Uni_Leipzig ;
dcterms:contributor :Openlink ;
dcterms:source <http://dbpedia.org/resource/Wikipedia> ;
void:feature <http://www.w3.org/ns/formats/RDF_XML> ;
dcterms:modified "2008-11-17"^^xsd:date .
:Geonames a void:Dataset ;
dcterms:subject <http://dbpedia.org/resource/Location> .
:GeoSpecies a void:Dataset ;
dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/us/> .
Access metadata
• SPARQL endpoints
• RDF data dumps
• Root resources
• URI lookup endpoints
• OpenSearch description documents
Access metadata
:exampleDS a void:Dataset ;
void:sparqlEndpoint <http://example.org/sparql> ;
void:dataDump <http://example.org/dump1.rdf> ;
void:uriLookupEndpoint <http://api.example.org/search?qt=term> .
Structural metadata
• Provides high-level information about the schema and
internal structure of a dataset and can be helpful when
exploring or querying datasets:
• Example resources
• Patterns for resource URIs
• Vocabularies
• Dataset partitions
• Statistics
Structural metadata
:DBpedia a void:Dataset;
void:exampleResource <http://dbpedia.org/resource/Berlin> .
:LiveJournal a void:Dataset;
void:vocabulary <http://xmlns.com/foaf/0.1/> .
:DBpedia a void:Dataset;
void:classPartition [
void:class foaf:Person;
void:entities 312000;
];
void:propertyPartition [
void:property foaf:name;
void:triples 312000;
];
.
Describing linksets
:DBpedia a void:Dataset ;
void:subset :DBpedia2Geonames .
:Geonames a void:Dataset .
:DBpedia2Geonames a void:Linkset ;
void:target :DBpedia ;
void:target :Geonames ;
void:linkPredicate owl:sameAs .
Deployment and discovery
• Choosing URIs for datasets
• Publishing a VoID file alongside a dataset
• Turtle
• RDFa
• SPARQL Service Description Vocabulary
http://www.w3.org/TR/sparql11-service-description/
• Discovery (well-known URI), based on [RFC5785],
registered with IANA
http://www.example.com/.well-known/void
Consumption - Essentials
• Linked Data provides for a global data-space with a
uniform API (due to RDF as the data model)
• Access methods
• Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)
• SPARQL (‘the SQL of RDF’)
• Data dumps (RDF/XML, etc.)
Consumption - Technologies
• Linked Data access mechanisms widely supported
• all major platforms and languages (HTTP interface & RDF
parsing), such as Java, Python, PHP, C/C++/.NET, etc.
• Command line tools (curl, rapper, etc.)
• Online tools
– http://redbot.org/ (HTTP/low-level)
– http://sindice.com/developers/inspector (RDF/data-level)
• Structured query: SPARQL (more later)
Consumption - Technologies
• Distributed setup -> need for central point of access
(indexer, aggregator)
• Sindice, an index of the Web of Data
• http://sindice.com/
• Sig.ma, Web of Data aggregator & browser
• http://sig.ma/
• Relationship discovery
• http://relfinder.semanticweb.org/
Technologies – FYN
http://dbpedia.org/resource/Galway
Technologies – Sig.ma
http://sig.ma/search?q=Galway
Sig.ma is a Web of
Data platform
enabling entity
visualisation and
consolidation both for
humans and
machines (API)
Technologies – sameas.org
Sameas.org is a
service to find co-
references on the
Web of Data
http://sameas.org/html?q=Galway
• All Linked Data datasets share a uniform data model,
the RDF statement data model
• Information is represented in facts expressed as
(subject, predicate, object) triples
• Components: globally unique IRI/URI entity identifiers
& typed data values (literals) as objects
Linked Data Benefits: Uniformity
• URIs not just used for identifying entities, but also (as
URLs) for locating and retrieving resources that
describe these entities on the Web
Linked Data Benefits: Dereferenceability
• Triples containing URIs from different namespaces as
subject and object establish a link between the entity
identified by the subject and the entity identified by the object
(typed RDF links)
Linked Data Benefits: Coherence
[Figure: Berlin isCapitalOf Germany; Germany isMemberOf European Union. The triples span knowledge base 1 and knowledge base 2.]
• The RDF data model is based on a single mechanism for representing
information (triples), which makes it very easy to attain a syntactic and simple semantic
integration of different Linked Data sets.
• Higher-level semantic integration can be achieved by employing schema and
instance matching techniques and expressing the matches found as
additional triple facts
Linked Data Benefits: Integrability
• Publishing and updating Linked Data is relatively simple,
thus facilitating timely availability
• Once a Linked Data source is updated, it is straightforward to
access and use the updated data source (time-consuming
and error-prone extraction, transformation and loading is not
required)
Linked Data Benefits: Timeliness
1. The Vision & Big Picture
2. Linked Data 101
3. The Linked Data Life-cycle
Agenda
What works now? What has to be done?
• Web - a global, distributed platform for data, information and knowledge integration
• Exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
[Figure: snapshots dated July 2007, April 2008, September 2008 and July 2009]
Achievements
1. Extension of the Web with a data commons (25B facts)
2. Vibrant, global RTD community
3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
4. Emerging governmental adoption in sight
5. Establishing Linked Data as a deployment path for the Semantic Web
Challenges
1. Coherence: relatively few, expensively maintained links
2. Quality: partly low-quality data and inconsistencies
3. Performance: still substantial penalties compared to relational
4. Data consumption: large-scale processing, schema mapping and data fusion still in their infancy
5. Usability: establishing direct end-user tools and network effect
Linked Data Lifecycle
[Figure: cycle of lifecycle stages]
• Extraction
• Storage/Querying
• Manual revision/authoring
• Interlinking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration
Extraction
From unstructured sources
• NLP, text mining, annotation
From semi-structured sources
• DBpedia, LinkedGeoData, SCOVO/DataCube
From structured sources
• RDB2RDF
Extraction
extract structured information from Wikipedia
& make this information available on the Web as LOD:
• ask sophisticated queries against Wikipedia (e.g.
universities in Brandenburg, mayors of elevated towns, soccer
players),
• link other data sets on the Web to Wikipedia data
• Represents a community consensus
Recently launched DBpedia Live transforms Wikipedia
into a structured knowledge base
Transforming Wikipedia into a
Knowledge Base
Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
– other language versions
– other Wikipedia pages
– To the Web
– Redirects
– Disambiguations
Infobox templates
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
http://dbpedia.org/resource/Busan
dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
Wikitext-Syntax
RDF representation
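The step from wikitext to triples shown above can be sketched in a few lines. This is only an illustration, not the DBpedia extraction framework: the function name `infobox_to_triples`, the regular expressions and the typing rules (whole numbers become `xsd:int`, decimals `xsd:float`, a value that is exactly one `[[link]]` becomes a `dbp:` resource) are assumptions made for the example.

```python
import re

def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' lines of an infobox into (s, p, o) tuples."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if not match:
            continue  # template header, '...' or closing braces
        key, value = match.groups()
        if not value:
            continue  # empty infobox field
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:
            # A value that is exactly one wiki link becomes a resource reference
            obj = "dbp:" + link.group(1).strip().replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'  # plain literal
        triples.append((subject, "dbpp:" + key, obj))
    return triples

SAMPLE = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""

for triple in infobox_to_triples("dbp:Busan", SAMPLE):
    print(" ".join(triple))
```

Values that mix text and links (such as the image caption) stay plain literals in this sketch; a real extractor would need dedicated parsers for dates, units and lists.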
A vast multi-lingual, multi-domain
knowledge base
DBpedia extraction results in:
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent
ontology, including 312,000 persons, 413,000 places, 94,000 music albums,
49,000 films, 15,000 video games, 140,000 organizations, 146,000
species and 4,600 diseases)
• labels and abstracts for these 3.2 million things in up to 92 different languages;
1,460,000 links to images and 5,543,000 links to external web pages;
4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories,
and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from
English edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) &
Mappings Wiki (http://mappings.dbpedia.org)
integrate the community into a refinement cycle
• Upcoming: DBpedia inline
2011/05/12 CONSEGI - Sören Auer: DBpedia 84
DBpedia Architecture
[Figure: DBpedia extraction architecture. PageCollections (Wikipedia dumps; live Wikipedia via OAI-PMH, update stream and article queue; Wikipedia database) feed the Extraction Manager, which runs Extraction Jobs. Extractors: Generic Infobox, Mapping-based Infobox, Label, Geo, Redirect, Disambiguation, Image, Abstract, Pagelink, Wikipedia Category. Parsers: DateTime, Units, String-List, Numbers, Geo. Ontology mappings. Destinations: N-Triple serializer (N-Triple dumps) and SPARQL-Update destination (Virtuoso triple store). The triple store serves a SPARQL endpoint and Linked Data to the Web: RDF browsers, HTML browsers, SPARQL clients, DBpedia apps.]
Hierarchies
DBpedia Ontology Schema:
manually created for DBpedia (infoboxes)
275 classes + 1335 properties; 20mio triples
YAGO:
large hierarchy linking Wikipedia leaf categories to WordNet
250,000 classes
UMBEL (Upper Mapping and Binding Exchange Layer):
20000 classes derived from OpenCyc
Wikipedia Categories:
Not a class hierarchy (e.g. cycles), represented using SKOS
415,000+ categories
DBpedia SPARQL Endpoint
http://dbpedia.org/sparql
hosted on an OpenLink Virtuoso server
can answer SPARQL queries like
Give me all Sitcoms that are set in NYC?
All tennis players from Moscow?
All films by Quentin Tarantino?
All German musicians that were born in Berlin in the 19th
century?
All soccer players with shirt number 11, playing for a club
having a stadium with over 40,000 seats and born in a
country with over 10 million inhabitants?
DBpedia SPARQL Endpoint
SELECT ?name ?birth ?description ?person WHERE {
?person dbp:birthPlace dbp:Berlin .
?person skos:subject dbp:Cat:German_musicians .
?person dbp:birth ?birth .
?person foaf:name ?name .
?person rdfs:comment ?description .
FILTER (LANG(?description) = 'en') .
} ORDER BY ?name
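A query like the one above is sent to the endpoint over HTTP. The sketch below only builds the request URL, using the `query` parameter of the SPARQL protocol; it does not contact any server. The helper name `sparql_get_url` and the shortened example query are assumptions made for illustration, and the slide's own prefixes (`dbp:`, `skos:`) would need matching PREFIX declarations before the full query could be sent this way.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def sparql_get_url(endpoint, query):
    """Build a SPARQL-protocol GET URL that carries the query text in 'query='."""
    return endpoint + "?" + urlencode({"query": query})

# Shortened, self-contained query (hypothetical example, not from the slides)
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:name ?name .
} LIMIT 5"""

URL = sparql_get_url("http://dbpedia.org/sparql", QUERY)
print(URL)

# Round trip: decoding the URL gives back the original query text
print(parse_qs(urlsplit(URL).query)["query"][0] == QUERY)
```

The result format is normally negotiated through the HTTP Accept header, which is left out here.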
DBpedia Applications
DBpedia Mobile: location aware mobile client for DBpedia
Uses current location and DBpedia to display map
Can navigate into other knowledge bases
DBpedia Query Builder: user front end for building
queries
DBpedia Relationship Finder finds relation between two
objects in DBpedia
DBpedia Applications: Relfinder
http://www.visualdataweb.org/relfinder.php
DBpedia Applications: Zemanta
DBpedia Applications: Faceted-Browser
DBpedia Applications (3rd party)
Muddy Boots (BBC): Annotate actors in BBC News
with DBpedia identifiers
Open Calais (Reuters): named entity recognition;
entities are connected via owl:sameAs to DBpedia,
Freebase, Geonames
Faviki: Social Bookmarking Tool uses DBpedia in
backend to group tags etc. and multi-language
support
Topbraid Composer: ontology editor, which links
entities to DBpedia based on their labels
Universität Leipzig ▪ Agile Knowledge Engineering and Semantic Web (AKSW) Authors: Sören Auer, Jens Lehmann,
Slide 94
LinkedGeoData
Conversion, interlinking and publishing of
OpenStreetMap.org* data sets as RDF.
* ”Wikipedia for geographic data”
Motivation
● Ease information integration tasks that require spatial
knowledge, such as
● Offerings of bakeries next door
● Map of distributed branches of a company
● Historical sights along a bicycle track
● Therefore use RDF/OWL in order to overcome structural and semantic
heterogeneity.
● Requires a vocabulary – which we try to establish.
● LOD cloud contains data sets with spatial features
● e.g. Geonames, DBpedia, US census, EuroStat
● But: they are restricted to popular or large entities like countries, famous
places etc.
● Therefore they lack buildings, roads, mailboxes, etc.
OpenStreetMap - Datamodel
● Basic entities are:
● Nodes Latitude, Longitude
● Ways Sequence of nodes
● Relations Associations between any number of nodes, ways and relations.
● Each entity may be described with tags (= key-value
pairs)
Example: Leipzig's zoo
Data/Mapping Example
node_id   | k                | v
----------+------------------+------------------------------------------------
259212302 | name             | Universität Leipzig, Mathematik und Informatik
259212302 | amenity          | university
259212302 | addr:street      | Johannisgasse
259212302 | addr:postcode    | 04103
259212302 | addr:housenumber | 26
259212302 | addr:city        | Leipzig
lgd:node259212302
    a lgdo:University ;
    rdfs:label "Universität Leipzig, Mathematik und Informatik" ;
    lgdo:hasCity "Leipzig" ;
    lgdo:hasHouseNumber "26" ;
    lgdo:hasPostalCode "04103" ;
    lgdo:hasStreet "Johannisgasse" ;
    georss:point "51.3369334 12.385401" ;
    geo:lat 51.3369334 ;
    geo:long 12.385401 .
Mapping Types
● Three Mapping Types
● Text
– (5, name, Leipzig) → lgd:node5 rdfs:label "Leipzig"
– (5, name:de, Leipzig) → lgd:node5 rdfs:label "Leipzig"@de
● Datatypes
– (6, seats, 4) → lgd:node6 lgdo:seats "4"^^xsd:integer
● Classes/Object Properties
– (7, place, city) → lgdn:7 a lgdo:City
– (7, religion, pastafarian) → lgdn:7 lgdo:religion lgdo:Pastafarian
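The three mapping types can be illustrated with a small dispatcher over OpenStreetMap tags. This is a sketch under stated assumptions, not the LinkedGeoData implementation: the lookup tables `DATATYPES`, `CLASSES` and `OBJECT_PROPERTIES` are hypothetical and contain only the slide's examples, and all subjects use the `lgd:node` prefix for simplicity.

```python
# Hypothetical mapping tables, filled with the examples from the slide
DATATYPES = {"seats": "xsd:integer"}
CLASSES = {("place", "city"): "lgdo:City"}
OBJECT_PROPERTIES = {"religion"}

def map_tag(node_id, key, value):
    """Map one (node_id, key, value) tag to an (s, p, o) triple, or None."""
    subject = f"lgd:node{node_id}"
    # Text mapping: 'name' and 'name:<lang>' become rdfs:label literals
    if key == "name":
        return (subject, "rdfs:label", f'"{value}"')
    if key.startswith("name:"):
        lang = key.split(":", 1)[1]
        return (subject, "rdfs:label", f'"{value}"@{lang}')
    # Datatype mapping: the value becomes a typed literal
    if key in DATATYPES:
        return (subject, f"lgdo:{key}", f'"{value}"^^{DATATYPES[key]}')
    # Class mapping: a (key, value) pair selects a class
    if (key, value) in CLASSES:
        return (subject, "a", CLASSES[(key, value)])
    # Object property mapping: the value becomes a resource
    if key in OBJECT_PROPERTIES:
        return (subject, f"lgdo:{key}", "lgdo:" + value.capitalize())
    return None  # tag has no mapping

for tag in [(5, "name", "Leipzig"), (5, "name:de", "Leipzig"),
            (6, "seats", "4"), (7, "place", "city"),
            (7, "religion", "pastafarian")]:
    print(" ".join(map_tag(*tag)))
```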
Access
● REST interface (based on a PostGIS DB, full OSM dataset loaded, > 1 billion triples)
● Supports limited queries (e.g. circular/rectangular
area, filtering by labels)
● SPARQL endpoints (based on a Virtuoso DB, subset of the OSM dataset, ~222M triples)
● Static (http://linkedgeodata.org/sparql)
● Live (http://live.linkedgeodata.org/sparql)
● Downloads (http://downloads.linkedgeodata.org)
● Monthly updates on the above datasets envisioned
LinkedGeoData Live
● OpenStreetMap provides full dumps and minutely
changesets for download
● Changesets are numbered, e.g. "001/234/567.osc.gz"
● We also convert the changesets to sets of added and
removed triples (relative to our store) and publish
them
● 001/234/567.added.nt.gz
● 001/234/567.removed.nt.gz
● Advantage: Other users could easily sync their RDF
store with LinkedGeoData
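How a consumer might apply one published pair of added/removed files can be sketched as set operations. The sketch assumes the files are already decompressed and treats each N-Triples line as an opaque string; the function names and the `example.org` data are hypothetical.

```python
def parse_nt(text):
    """Treat every non-empty, non-comment line as one opaque N-Triples statement."""
    lines = (line.strip() for line in text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}

def apply_changeset(store, added_nt, removed_nt):
    """Return a new store: drop the removed triples, then add the new ones."""
    return (store - parse_nt(removed_nt)) | parse_nt(added_nt)

LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
STORE = parse_nt(f"""
<http://example.org/node1> {LABEL} "Old name" .
<http://example.org/node2> {LABEL} "Zoo" .
""")
REMOVED = f'<http://example.org/node1> {LABEL} "Old name" .'
ADDED = f'<http://example.org/node1> {LABEL} "New name" .'

SYNCED = apply_changeset(STORE, ADDED, REMOVED)
for triple in sorted(SYNCED):
    print(triple)
```

Changesets would have to be applied in sequence-number order, since each one is relative to the state left by its predecessor.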
DBpedia Mapping – Step By Step
Given a DBpedia point, query LGD points within type specific
maximum distance
Basic idea (performed with Silk):
● Compute spatial score
● Compute name similarity (rdfs:label)
● Combine both scores
● Depending on final score, either
automatically accept/reject links or mark
for manual verification.
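The scoring idea above can be sketched as follows. The slides state that the actual linking was performed with Silk; this stand-alone sketch only mirrors the steps. The equal weighting, the thresholds 0.9 and 0.5, the linear distance decay and the use of `difflib` for name similarity are all assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def spatial_score(distance_km, max_km):
    """1.0 at distance 0, falling linearly to 0.0 at the type-specific maximum."""
    return max(0.0, 1.0 - distance_km / max_km)

def name_score(label_a, label_b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()

def decide(score, accept=0.9, reject=0.5):
    """Accept, reject, or hand over to manual verification."""
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "verify"

def link_decision(dbpedia_point, lgd_point, max_km):
    """Each point is a (label, lat, lon) tuple; scores are combined with equal weight."""
    distance = haversine_km(dbpedia_point[1], dbpedia_point[2], lgd_point[1], lgd_point[2])
    combined = 0.5 * spatial_score(distance, max_km) + 0.5 * name_score(dbpedia_point[0], lgd_point[0])
    return decide(combined)

print(link_decision(("Zoo Leipzig", 51.3485, 12.3713), ("Zoo Leipzig", 51.3485, 12.3713), 1.0))
```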
Statistics (2011-Feb-23)
● 222.539.712 Triples
● 6.666.865 Ways
● 5.882.306 Nodes
● Among them
● 352.673 PlaceOfWorship
● 60.573 RailwayStation
● 59.468 Recycling
● 50.955 Town
● 30.099 Toilet
● 7.222 City
Conclusion
● OpenStreetMap
● immensely successful project for collaboratively creating free spatial data
● Community uses key value structures, which provide a rich source of information
● Key strength: broad coverage
● LGD Contributions
● Established mapping to DBpedia
● Geonames mapping partially done (37 different entity types: cities, churches, ...)
● Facet-based LGD Browser provides an interface for OSM/LGD, which highlights
its structural aspects
● Live sync
● Goal: Make LGD as useful (successful) as DBpedia for the geospatial domain
Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)
No agreement on a formal semantics of RDB2RDF mapping
• LOD readiness, SPARQL-SQL translation
W3C RDB2RDF WG
Extraction Relational Data
Tool               | Triplify                    | D2RQ           | Virtuoso RDF Views
-------------------+-----------------------------+----------------+--------------------------
Technology         | Scripting languages (PHP)   | Java           | Whole middleware solution
SPARQL endpoint    | -                           | X              | X
Mapping language   | SQL                         | RDF based      | RDF based
Mapping generation | Manual                      | Semi-automatic | Manual
Scalability        | Medium-high (but no SPARQL) | Medium         | High
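To make the "SQL as mapping language" column concrete, here is a minimal sketch in which a SQL query defines the mapping: the first result column identifies the resource and every further column becomes a property. This is an illustration in that spirit only, not the configuration format or code of Triplify or of any other tool in the table; the URI layout and the `users` table are hypothetical.

```python
import sqlite3

def rows_to_triples(conn, sql, base_uri, type_name):
    """Map each result row to triples: first column = id, other columns = properties."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    triples = []
    for row in cursor:
        subject = f"<{base_uri}{type_name}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is not None:  # NULLs simply produce no triple
                triples.append((subject, f"<{base_uri}vocabulary/{column}>", f'"{value}"'))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'Leipzig'), (2, 'Bob', NULL)")

TRIPLES = rows_to_triples(conn, "SELECT id, name, city FROM users ORDER BY id",
                          "http://example.org/", "user")
for triple in TRIPLES:
    print(" ".join(triple) + " .")
```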
From unstructured sources
• Deploy existing NLP approaches (OpenCalais, Ontos API)
• Develop standardized, LOD enabled interfaces between NLP tools
(NLP2RDF)
From semi-structured sources
• Efficient bi-directional synchronization
From structured sources
• Declarative syntax and semantics of data model transformations
(W3C WG RDB2RDF)
Orthogonal challenges
• Using LOD as background knowledge
• Provenance
Extraction Challenges
Storage and Querying
RDF data management is still slower than relational data management by a factor of 5-50
(BSBM, DBpedia Benchmark)
Performance increases steadily
Comprehensive, well-supported open-source and commercial
implementations are available:
• OpenLink’s Virtuoso (os+commercial)
• Big OWLIM (commercial), Swift OWLIM (os)
• 4store (os)
• Talis (hosted)
• Bigdata (distributed)
• Allegrograph (commercial)
• Mulgara (os)
RDF Data Management
• Uses DBpedia as data and
a selection of 25 frequently
executed queries
• Can generate fractions and
multiples of DBpedia‘s size
• Does not resemble
relational data
Performance differences observed with other
benchmarks are amplified
DBpedia Benchmark
Geometric Mean
• Reduce the performance gap between
relational and RDF data management
• SPARQL Query extensions
• Spatial/semantic/temporal data management
• More advanced query result caching
• View maintenance / adaptive reorganization
based on common access patterns
• More realistic benchmarks
Storage and Querying Challenges
Authoring
1. Semantic (Text) Wikis
• Authoring of semantically
annotated texts
2. Semantic Data Wikis
• Direct authoring of
structured information
(i.e. RDF, RDF-Schema,
OWL)
Two Kinds of Semantic Wikis
Versatile domain-independent tool
Serves as Linked Data / SPARQL endpoint on the Data
Web
Open-source project hosted at Google code
Not just a Wiki UI, but a whole framework for the
development of Semantic Web applications
Developed in PHP based on the Zend framework
Very active developer and user community
More than 500 downloads monthly
Large number of use cases
OntoWiki – a semantic data wiki
OntoWiki Dynamic views on
knowledge bases
OntoWiki
RDF triples on
resource details
page
OntoWiki
Dynamic suggestions
from the
Data Web
Catalogus Professorum Lipsiensis
OntoWiki: Caucasian Spiders
RDFauthor in OntoWiki
Semantic Portal with OntoWiki: Vakantieland
RDFaCE- RDFa Content Editor
RDFaCE Architecture
Integrating various NLP APIs
© CC-BY-NC-ND by ~Dezz~ (residae on flickr)
Linking
Automatic
Semi-automatic
• SILK
• LIMES
Manual
• Sindice integration into UIs
• Semantic Pingback
LOD Linking
LIMES 0.3: Basic Idea
• Uses the characteristics of metric spaces
• Especially consequences of the triangle inequality
◦ d(x, y) ≤ d(x, z) + d(z, y)
◦ d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y)
• Basic idea
◦ Use pessimistic approximations of distances instead of computing them
◦ Only compute distances when needed
Overview
Knowledge sources → Computation of exemplars → Filtering → Similarity computation → Serialization
Computation of Exemplars
• Assumption: number of exemplars is given
• Goal: Segment target data set
• NB: Distances from exemplars to all other points are known
Filtering
(x: source point, y: exemplar, z: target point assigned to y)
1. Measure distance from each x to each exemplar
2. Apply d(x, y) - d(y, z) > t ⇒ d(x, z) > t
Similarity Computation
d(x, y) - d(y, z) ≤ t ⇒ Compute d(x, z)
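The filtering and similarity steps can be sketched for a toy metric space. This is a simplified illustration of the rule on the slides, not the LIMES code: exemplars are passed in rather than computed, points are plain numbers with the absolute difference as metric, and the function name `limes_match` is hypothetical.

```python
def limes_match(sources, targets, exemplars, dist, t):
    """Return (matching pairs with distance <= t, number of exact source-target distances computed)."""
    # Segment the target set: assign each target z to its nearest exemplar y
    # and remember d(y, z), which is then known for the rest of the run.
    clusters = {y: [] for y in exemplars}
    for z in targets:
        nearest = min(exemplars, key=lambda y: dist(y, z))
        clusters[nearest].append((z, dist(nearest, z)))

    matches, computed = [], 0
    for x in sources:
        for y, members in clusters.items():
            d_xy = dist(x, y)  # step 1: distance from x to the exemplar
            for z, d_yz in members:
                # Step 2: triangle inequality gives d(x, z) >= d(x, y) - d(y, z)
                if d_xy - d_yz > t:
                    continue  # lower bound already exceeds t, skip the exact distance
                computed += 1
                if dist(x, z) <= t:
                    matches.append((x, z))
    return matches, computed

def absolute_difference(a, b):
    return abs(a - b)

MATCHES, COMPUTED = limes_match(
    sources=[0, 10], targets=[1, 2, 50, 51], exemplars=[1, 50],
    dist=absolute_difference, t=2)
print(MATCHES, COMPUTED)
```

In this toy run only 2 of the 8 source-target distances are computed exactly; the rest are discarded by the lower bound, and the result equals what a brute-force comparison would return.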
Serialization
• Results are returned as RDF
• For example mapping DBpedia and Drugbank
@prefix drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> .
@prefix dbpedia: <http://dbpedia.org/ontology/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
dbpedia:Cefaclor owl:sameAs drugbank:DB00833 .
dbpedia:Clortermine owl:sameAs drugbank:DB01527 .
dbpedia:Prednicarbate owl:sameAs drugbank:DB01130 .
dbpedia:Linezolid owl:sameAs drugbank:DB00601 .
dbpedia:Valaciclovir owl:sameAs
Experiments
• Q1: What is the best number of exemplars?
• Q2: What is the relation between the similarity threshold q and the total
number of comparisons?
• Q3: Does the assignment of S and T matter?
• Q4: How does LIMES compare to SILK?
Q1 and Q2
• Experiments on synthetic data
• Knowledge bases of sizes 2000, 3000, 5000, 7500 and 10000
• Varied number of exemplars
• Varied thresholds
• Experiments were repeated 5 times
• Average results are presented
Q1 and Q2
[Figure: chart with one series per threshold (0.75, 0.8, 0.85, 0.9, 0.95) plus brute force; axis ticks run from 0 to 300 and from 0 to 120,000,000]
Q1 and Q2
• Q1
◦ Best number of exemplars depends on q
◦ For q > 0.9, best number lies around |T|^(1/2)
• Q2
◦ As expected, number of comparisons diminishes with growing q
Q3 (order of S and T)
• Experiments on synthetic data
• Knowledge bases of sizes 1000, 2000, 3000, …, 10000
• Number of exemplars was |T|^(1/2)
• Experiments were repeated 5 times
• Average results are presented
Q3
TS     | 1000 | 2000 | 3000 | 4000 | 5000 | 6000 | 7000 | 8000 | 9000 | 10000
-------+------+------+------+------+------+------+------+------+------+------
1000   | 0.20 | 0.37 | 0.53 | 0.69 | 0.88 | 1.04 | 1.14 | 1.40 | 1.58 | 1.67
2000   | 0.36 | 0.64 | 0.88 | 1.24 | 1.37 | 1.63 | 1.97 | 2.25 | 2.50 | 2.70
3000   | 0.51 | 0.86 | 1.17 | 1.57 | 2.00 | 2.09 | 2.69 | 2.91 | 3.35 | 3.58
4000   | 0.70 | 1.11 | 1.59 | 2.00 | 2.45 | 2.88 | 3.10 | 3.61 | 3.94 | 4.50
5000   | 0.85 | 1.36 | 1.87 | 2.28 | 2.81 | 3.39 | 3.91 | 4.20 | 4.84 | 5.54
6000   | 1.02 | 1.60 | 2.14 | 2.81 | 3.29 | 3.93 | 4.44 | 4.96 | 5.39 | 6.08
7000   | 1.22 | 1.86 | 2.58 | 3.15 | 3.66 | 4.35 | 5.11 | 5.69 | 6.44 | 6.62
8000   | 1.41 | 2.04 | 2.78 | 3.43 | 4.06 | 4.98 | 5.51 | 6.55 | 7.14 | 7.53
9000   | 1.63 | 2.36 | 2.99 | 3.85 | 4.72 | 5.44 | 6.25 | 6.88 | 7.59 | 8.20
10000  | 1.80 | 2.62 | 3.51 | 4.25 | 4.97 | 6.01 | 6.33 | 7.81 | 8.31 | 9.15
• Green = S first is more time-efficient
• Overall less than 5% difference
Q4 (comparison with SILK)
• 3 Experiments on real data
◦ Drugs
◦ Diseases
◦ SimCities
• Number of exemplars was |T|^(1/2)
• Comparison of runtime with SILK
• Experiments were repeated thrice
• Best runtimes are presented
Q4
[Figure: bar chart on a 0% to 100% scale for Drugbank, SimCities and Diseases, with bars for LIMES at thresholds 0.95, 0.90, 0.85, 0.80 and 0.75 and for SILK]
Q4
• We outperform SILK 2 by 1.5 orders of magnitude
• The larger the data sources, the higher our speedup (64 for SimCities)
update and notification services for LOD
Downward compatible with Pingback (blogosphere)
http://aksw.org/Projects/SemanticPingBack
Creating a network effect around
Linking Data: Semantic Pingback
Visualizing Pingbacks in OntoWiki
Only 5% of the information on the Data Web is actually linked
• Make sense of work in the de-duplication/record linkage
literature
• Consider the open world nature of Linked Data
• Use LOD background knowledge
• Zero-configuration linking
• Explore active learning approaches, which integrate users in a
feedback loop
• Maintain a 24/7 linking service: Linked Open Data Around-The-
Clock project (LATC-project.eu)
Interlinking Challenges
Enrichment
Linked Data is mainly instance data and !!!
The ORE (Ontology Repair and Enrichment) tool helps improve an
OWL ontology by fixing inconsistencies & making suggestions for
adding further axioms.
• Ontology Debugging: uses OWL reasoning to detect inconsistencies and
unsatisfiable classes + detect the most likely sources of the problems.
The user can create a repair plan while maintaining full control.
• Ontology Enrichment: uses the DL-Learner framework to suggest
definitions & super classes for existing classes in the KB. Works if
instance data is available for harmonising schema and data.
http://aksw.org/Projects/ORE
Enrichment & Repair
Quality Analysis
Quality on the Data Web varies widely
• Hand crafted or expensively curated knowledge
base (e.g. DBLP, UMLS) vs. extracted from text
or Web 2.0 sources (DBpedia)
Research Challenge
• Establish measures for assessing the authority,
provenance, reliability of Data Web resources
Linked Data Quality Analysis
Evolution (© CC-BY-SA by alasis on flickr)
• unified method, for both data evolution and ontology refactoring.
• modularized, declarative definition of evolution patterns is relatively
simple compared to an imperative description of evolution
• allows domain experts and knowledge engineers to amend the ontology
structure and modify data with just a few clicks
• Combined with RDF representation of evolution patterns and their
exposure on the Linked Data Web, EvoPat facilitates the development
of an evolution pattern ecosystem
• patterns can be shared and reused on the Data Web.
• declarative definition of bad smells and corresponding evolution
patterns promotes the (semi-)automatic improvement of information
quality.
EvoPat – Pattern based KB Evolution
Evolution Patterns
Exploration
An ecosystem of LOD visualizations
LOD Exploration Widgets
• Spatial faceted browsing
• Faceted browsing
• Statistical visualization
• Entity-/facet-based browsing
• Domain-specific visualizations
• …
LOD Datasets
Choreography layer
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets
TODO: Put ULEI slides
Faceted spatial-semantic browsing
component
Pure JavaScript; requires only a SPARQL endpoint with Cross-Origin Resource
Sharing (CORS) enabled for data access.
Operates on local spatial regions; does not depend on global metadata about the data
Source code:
• https://github.com/AKSW/SpatialSemanticBrowsingWidgets
Online Demo - LinkedGeoData Browser:
• http://browser.linkedgeodata.org
Next steps
• Polygon/curve markers, domain-specific visualization templates, integration of other
sources, mobile interface
Publication:
• Claus Stadler, Jens Lehmann, Konrad Höffner, Sören Auer: LinkedGeoData: A Core for a
Web of Spatial Open Data. To appear in Semantic Web Journal - Special Issue on Linked
Spatiotemporal Data and Geo-Ontologies.
Faceted spatial-semantic browsing - Availability
Generic entity-based exploration with OntoWiki
http://fintrans.publicdata.eu
Domain-specific visualization:
http://energy.publicdata.eu
Visualization of statistical
data (Data Cube vocabulary)
http://scoreboard.lod2.eu
Visual Query Builder
Relationship Finder in CPL
Distributed Social Semantic Networking
Social Networks are walled gardens
• Take users' data out of their hands,
• predefined privacy & data security regulations
• infrastructure of a single provider (lock-in)
• Facebook (600M+ users) = Web inside the Web
• Interoperability is limited to proprietary APIs
Social networks should be open and evolving
• allow users to control what to enter & keep control over their data
• users should be able to host the data on infrastructure, which is under
their direct control, the same way as they host their own website (TBL)
We need a truly Distributed Social Semantic Network (DSSN)
• Initial approaches appeared with GNU social and more recently Diaspora
• a DSSN should be based on semantic resource descriptions and de-referenceability
so as to ensure versatility, reusability and openness in order to accommodate unforeseen usage scenarios
• a number of standards and best-practices for social, Semantic Web applications such as FOAF, WebID and
Semantic Pingback emerged.
Distributed
Social
Semantic
Networking
(1) Resources announce services and feeds, feeds announce services – in particular a push service.
(2) Applications initiate ping requests to spin the Linked Data network
(3) Applications subscribe to feeds on push services and receive instant notifications on updates.
(4) Update services are able to modify resources and feeds (e.g. on request of an application)
(5) Personal and global search services index social network resources and are used by applications
(6) Access to resources & services can be delegated to applications by a WebID, i.e. application can act in name of WebID owner
(7) The majority of all access operations is executed through standard web requests.
DSSN Architecture
• Open-source, MVC architecture
• Platform-independent, based on HTML5, CSS,
JavaScript
• jQuery, jQuery Mobile, jQuery UI
• rdfQuery – simple triple store in JavaScript
• PhoneGap (Apache Device ready) native apps for
iOS, Android, Blackberry OS, WebOS, Symbian,
Bada
• http://aksw.org/Projects/MobileSocialSemanticWeb
DSSN Mobile Client
DSSN Mobile Browsing
DSSN Mobile Editing
EU-FP7 LOD2 Project Overview . Page 182 http://lod2.eu
Creating Knowledge out of Interlinked Data
LOD Lifecycle
• Extraction
• Storage/Querying
• Manual revision/authoring
• Interlinking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration
supported by the Debian-based LOD2 Stack (released next week)
First release of the LOD2 Stack: stack.lod2.eu & demo.lod2.eu/lod2de
AKSW Team
Thanks for your attention!
Sören Auer
http://www.uni-leipzig.de/~auer/ | http://aksw.org | http://lod2.org
auer@uni-leipzig.de

Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data

  • 1. DBpedia and the Emerging Web of Linked Data Sören Auer
  • 2. Dr. Sören Auer (Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data, 5.10.2011, http://lod2.eu)
      • 2000: Mathematics and Computer Science studies in Hagen, Dresden and Екатеринбург
      • Managing director of adVIS GmbH, an SME focused on Web-application and content management technology
      • IT consultant for various companies (T-Mobile AG, RDL Corp., Science Computing AG)
      • 2006: doctorate in Information Systems / Computer Science at Universität Leipzig
      • 2006-2008: post-doctoral researcher at the DB Group at the University of Pennsylvania (USA)
      • Head of the AKSW research group: DBpedia, OntoWiki, LinkedGeoData, Triplify
      • Research interests: Information Systems, Database and Web Technologies, Semantic Web and Knowledge Engineering, Adaptive Methodologies, HCI, E-Science, Digital Libraries
      • Coordinator of the EU FP7 IP Project “LOD2 – Creating Knowledge out of Interlinked Data”
      • Works as an expert for W3C, EU FP6/FP7/CIP, University City Keystone Innovation Zone, Swiss National Science Foundation
  • 3. Agenda
      1. The Vision & Big Picture
      2. Linked Data 101
      3. The Linked Data Life-cycle
  • 4. Why the Semantic Web won’t work (soon)
      1. Reasoning does not scale on the Web
          • IR / one-dimensional indexing scales (Google)
          • Next step: conjunctive querying (OWL-QL?, dynamic scale-out / clustering)
          • Web-scalable DL reasoning is out of sight (maybe fragments or fuzzy reasoning have some chance)
      2. If it did scale, it would not be affordable
          • “What is the only former Yugoslav republic in the European Union?”
          • 2880 POWER7 cores, 16 terabytes of memory and 4 terabytes of clustered storage (IBM Watson) still cannot answer this question
  • 5. Why do we need the Data Web?
      Problem: try to search for these things on the current Web:
          • Apartments near German-Russian bilingual childcare in Berlin.
          • ERP service providers with offices in Vienna and London.
          • Researchers working on multimedia topics in Eastern Europe.
      The information is available on the Web, but opaque to current search.
      Solution: complement the text on Web pages with structured linked open data, and intelligently combine and integrate such structured information from different sources.
      [Figure: two sources, berlin.de (has everything about childcare in Berlin) and Immobilienscout.de (knows all about real estate offers in Germany), each shown as a DB behind a Web server; further labels: Search engine, HTML, RDF]
  • 6. From the Document Web to the Semantic Data Web
      • Web (since 1992): HTTP; HTML/CSS/JavaScript
      • Semantic Web (vision 1998, starting ???): reasoning; logic, rules; trust
      • Social Web (since 2003): folksonomies/tagging; reputation, sharing; groups, relationships
      • Data Web (since 2006): URI dereferenceability; Web data integration; RDF serializations
  • 7. Web 1.0, Web 2.0, Web 3.0
      • Web 1.0: many Web sites containing unstructured, textual content
      • Web 2.0: a few large Web sites specialized in specific content types (pictures + video + encyclopedic articles)
      • Web 3.0: many Web sites containing and semantically syndicating arbitrarily structured content
  • 8. The Long Tail of Information Domains
      [Figure: information domains ordered by popularity. Labels: “Currently supported structured content types” (pictures, news, video, recipes, calendar); “SemWeb supported structured content”; “Not or insufficiently supported content types” (gene sequences, itinerary of King George, talent management, requirements engineering, special interest communities, …)]
      The Long Tail by Chris Anderson (Wired, Oct. ’04), adapted to information domains.
  • 9. Linked Data in a Nutshell
      1. Uses the RDF data model
         [Figure: RDF graph with nodes SBC, SBBD2011, Florianopolis and 3.10.2011, and edges organizes, starts and takesPlaceIn]
      2. Is serialised in triples:
             SBC organizes SBBD2011
             SBBD2011 starts "2011-10-03"^^xsd:date
             SBBD2011 takesPlaceAt Florianopolis
      3. Uses content negotiation
  • 10. The emerging Web of Data [Figure: panels dated 2007, 2008 and 2009. Tool labels: Virtuoso, SemMF, SILK, poolparty, DL-Learner, Sindice, Sigma, ORE, OntoWiki, MonetDB, DXX Engine, WiQA. Activity labels: repair, interlink, fuse, classify, enrich, create]
  • 11. Data Access and Integration (conceptual level)
    Data models
    • RDBMS: organize data in relations, rows, cells (Oracle, DB2, MS-SQL)
    • Entity-attribute-value (EAV): HELP medical record system, TrialDB
    • Column-oriented DBMS: collocates column values rather than row values (Vertica, C-Store, MonetDB)
    • Triple/Quad stores: RDF data model (Virtuoso, Oracle, Sesame)
    • Others: XML, hierarchical, tree, graph-oriented DBMS
    Data access
    • Procedural APIs: ODBC, JDBC
    • Query languages: Datalog, SQL, SPARQL, XPath/XQuery
    • Object-relational mappings (ORM): NeXT’s EOF / WebObjects, ADO.NET Entity Framework, Hibernate
    • Linked Data: de-referencable URIs, RDF serialization formats
    Data integration
    • Enterprise Information Integration: sets of heterogeneous data sources appear as a single, homogeneous data source
    • Data Warehousing: based on extract, transform, load (ETL); Global-As-View (GAV)
    • Data Web: URIs as entity identifiers, HTTP as data access protocol, Local-As-View (LAV)
    • Research: mediators, ontology-based, P2P, Web service-based
  • 12. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 12 http://lod2.eu 1. The Vision & Big Picture 2. Linked Data 101 (based on Michael Hausenblas‘ slides) 3. The Linked Data Life-cycle Agenda
  • 13. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 13 http://lod2.eu Orientation
  • 14. Linked Data 101. Linked Data provides a standardised API for: • Data and metadata discovery • Data integration • Distributed query
  • 15. Linked Data principles 1. Use URIs to identify the “things” in your data 2. Use http:// URIs so people (and machines) can look them up on the web 3. When a URI is looked up, return a description of the thing (in RDF format) 4. Include links to related things http://www.w3.org/DesignIssues/LinkedData.html
  • 16. Linked Data principles • They are principles, not implementation advice • Not humans or machines but humans and machines! • Content negotiation (e.g. HTML and RDF/XML) • HTML + RDFa • Metcalfe’s Law http://en.wikipedia.org/wiki/Metcalfe%27s_law
  • 17. Linked Data example
  • 18. HTTP URIs
    • A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]
    • Syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    • Example:
        foo://example.com:8042/over/there?name=ferret#nose
        \_/   \______________/\_________/ \_________/ \__/
         |           |            |            |        |
      scheme     authority       path        query   fragment
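The component breakdown of the RFC 3986 example can be checked with a short script. This is a minimal sketch using Python's standard `urllib.parse`; the dictionary keys simply mirror the labels on the slide and are not part of any API.

```python
from urllib.parse import urlsplit

# The example URI from RFC 3986, split into its five generic components.
uri = "foo://example.com:8042/over/there?name=ferret#nose"
parts = urlsplit(uri)

# Map the parser's field names onto the labels used in the slide.
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} = {value}")
```

Note that `urlsplit` calls the authority `netloc`; the port (8042) is part of it.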
  • 19. HTTP URIs • URI references: an RDF URI reference is a Unicode string that does not contain any control characters (#x00 - #x1F, #x7F - #x9F) and that would produce a valid URI character sequence representing an absolute URI when subjected to a UTF-8 encoding along with %-escaping of non-US-ASCII octets. • Qualified Names (QNames): XML’s way to allow namespaced elements/attributes, of the form QName = Prefix ':' LocalPart • Compact URIs (CURIEs): generic, abbreviated syntax for expressing URIs
  • 20. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 20 http://lod2.eu HTTP The Hypertext Transfer Protocol (HTTP) is an application- level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
  • 21. HTTP • HTTP messages consist of requests from client to server and responses from server to client • The set of methods is predefined: GET, POST, PUT, DELETE, HEAD, (OPTIONS)
  • 22. HTTP Status codes • Informational 1xx: provisional response (100 Continue) • Successful 2xx: request successfully received, understood, and accepted (201 Created) • Redirection 3xx: further action needs to be taken by the user agent to fulfill the request (301 Moved Permanently) • Client Error 4xx: client erred (405 Method Not Allowed) • Server Error 5xx: server encountered an unexpected condition (501 Not Implemented)
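The five status classes follow directly from the first digit of the code. A minimal sketch, using the standard `http.HTTPStatus` enum for the reason phrases; the function name `status_class` is made up for this illustration.

```python
from http import HTTPStatus

# Class names as listed on the slide, keyed by the first digit of the code.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def status_class(code: int) -> str:
    """Return the status class for an HTTP status code."""
    return CLASSES.get(code // 100, "Unknown")

# The example codes from the slide.
for code in (100, 201, 301, 405, 501):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```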
  • 23. HTTP
    Request:
      GET /html/rfc2616 HTTP/1.1
      Host: tools.ietf.org
      User-Agent: Mozilla/5.0
      Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Response:
      HTTP/1.x 200 OK
      Date: Thu, 05 Mar 2009 08:17:33 GMT
      Server: Apache/2.2.11
      Content-Location: rfc2616.html
      Last-Modified: Tue, 20 Jan 2009 09:16:04 GMT
      Content-Type: text/html; charset=UTF-8
  • 24. HTTP
    • Content negotiation (CN): selecting the representation for a given response when multiple representations are available
    • Three types of CN: server-driven, agent-driven and transparent
    • Example:
      curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Galway
      HTTP/1.1 303 See Other
      Content-Type: application/rdf+xml
      Location: http://dbpedia.org/data/Galway.rdf
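Server-driven content negotiation can be sketched as a function that ranks the media types in the Accept header by their q-value and returns the first one the server can serve. This is a simplified illustration, not the full HTTP algorithm (it ignores media-type specificity and other parameters); `parse_accept`, `negotiate` and the `AVAILABLE` list are names chosen for this example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0  # q defaults to 1.0 when omitted
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, available):
    """Pick the first available media type acceptable to the client, or None."""
    for media_type, q in parse_accept(header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        for candidate in available:
            if media_type in (candidate, "*/*"):
                return candidate
            if media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
                return candidate
    return None

# Representations a server might offer for one resource (assumed for illustration).
AVAILABLE = ["application/rdf+xml", "text/html"]

print(negotiate("application/rdf+xml", AVAILABLE))
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", AVAILABLE))
```

A Linked Data server would then answer with a 303 redirect to the document holding the chosen representation, as in the curl example.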
  • 25. HTTP • Caching (see the Cache-Control header field) is essential for scalability http://webofdata.wordpress.com/2009/11/23/linked-open-data-http-caching/ • HTTPbis IETF WG chaired by Mark Nottingham, mainly about: patches, clarifications, deprecating unused features, documentation of security properties
  • 26. REST - HTTP. Representational State Transfer (REST) data elements and examples:
    • resource: intended conceptual target of a hypertext reference
    • resource identifier: URL, URN
    • representation: HTML document, JPEG image
    • representation metadata: media type, last-modified time
    • resource metadata: source link, alternates, vary
    • control data: if-modified-since, cache-control
    http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm http://webofdata.wordpress.com/2009/10/09/linked-data-for-restafarians/
  • 27. Web's Standard Retrieval Algorithm
    1. parse the URI and find the HTTP protocol
    2. look up the DNS name to determine the associated IP address
    3. open a TCP stream to port 80 at the IP address determined above
    4. format an HTTP GET request for the resource and send it to the server
    5. read the response from the server
    6. from the status code (200) determine that a representation of the resource is available
    7. inspect the returned Content-Type
    8. pass the entity-body to the HTML rendering engine
    http://www.w3.org/2001/tag/doc/selfDescribingDocuments
  • 28. RDF • A data model: directed, labeled graph • Triple: (subject predicate object) • subject: URIref or bNode • predicate: URIref • object: URIref or bNode or literal
  • 29. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 29 http://lod2.eu RDF Triple • • Inspired by linguistic categories • Allowed usage: Subject : URI or blank node Predicate: URI (also called properties) Object : URI or blank nodes or literal Burkhard Jung Leipzig isMayorOf Subject Predicate Object
  • 30. Example RDF Graph [Graph: Leipzig hasAreaCode 0341; Leipzig hasMayor Burkhard Jung; Leipzig locatedIn Saxony; Leipzig latitude 51.3333; Leipzig longitude 12.3833; Saxony locatedIn Germany; Burkhard Jung isMayorOf Leipzig; Burkhard Jung isMemberOf Social Democratic Party; Burkhard Jung born 1958-03-07]
  • 31. Literals • Representation of data values • Serialization as strings • Interpretation based on the datatype • Literals without datatype are treated as strings [Graph excerpt: Leipzig with latitude 51.3333 and longitude 12.3833; Burkhard Jung with born 1958-03-07; hasMayor and isMayorOf between the two]
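A graph like this is just a set of (subject, predicate, object) triples, which makes simple pattern matching easy to sketch. The edge list below is one reading of the flattened diagram, and plain strings stand in for URIs and literals; `match` and `located_in` are helper names invented for this illustration.

```python
# Edges as read from the example graph (an interpretation of the diagram).
TRIPLES = [
    ("Leipzig", "hasAreaCode", "0341"),
    ("Leipzig", "hasMayor", "Burkhard Jung"),
    ("Leipzig", "locatedIn", "Saxony"),
    ("Leipzig", "latitude", "51.3333"),
    ("Leipzig", "longitude", "12.3833"),
    ("Saxony", "locatedIn", "Germany"),
    ("Burkhard Jung", "isMayorOf", "Leipzig"),
    ("Burkhard Jung", "isMemberOf", "Social Democratic Party"),
    ("Burkhard Jung", "born", "1958-03-07"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given terms; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def located_in(triples, place):
    """Follow locatedIn edges transitively from a starting node."""
    found = []
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(triples, s=current, p="locatedIn"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found

print(match(TRIPLES, p="hasMayor"))
print(located_in(TRIPLES, "Leipzig"))
```

The wildcard lookup is the same idea that a SPARQL triple pattern expresses with variables.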
  • 32. RDF Serialization • N3: "Notation 3", extensive formalism • N-Triples: part of N3 • Turtle: extension of N-Triples (shortcuts) Source: http://www.w3.org/DesignIssues/Notation3.html
  • 33. Turtle Syntax • URIs in angle brackets • Literals in quotes • Triples separated by dot • Whitespace is ignored
  • 34. Turtle Syntax: Shortcuts
    Without shortcuts:
      <http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/property/hasMayor> <http://dbpedia.org/resource/Burkhard_Jung> ;
          <http://www.w3.org/2000/01/rdf-schema#label> "Leipzig"@de ;
          <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "51.333332"^^<http://www.w3.org/2001/XMLSchema#float> ;
          <http://www.w3.org/2003/01/geo/wgs84_pos#lon> "12.383333"^^<http://www.w3.org/2001/XMLSchema#float> .
    Shortcuts for namespace prefixes:
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix dbp: <http://dbpedia.org/resource/> .
      @prefix dbpp: <http://dbpedia.org/property/> .
      @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
      dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung .
      dbp:Leipzig rdfs:label "Leipzig"@de .
      dbp:Leipzig geo:lat "51.333332"^^xsd:float .
      dbp:Leipzig geo:lon "12.383333"^^xsd:float .
  • 35. Turtle Syntax: Shortcuts
    Group triples with the same subject using “;” instead of “.”:
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix dbp: <http://dbpedia.org/resource/> .
      @prefix dbpp: <http://dbpedia.org/property/> .
      @prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
      dbp:Leipzig dbpp:hasMayor dbp:Burkhard_Jung ;
          rdfs:label "Leipzig"@de ;
          geo:lat "51.333332"^^xsd:float ;
          geo:lon "12.383333"^^xsd:float .
    Likewise, group triples with the same subject and predicate using “,”:
      @prefix dbp: <http://dbpedia.org/resource/> .
      @prefix dbpp: <http://dbpedia.org/property/> .
      dbp:Leipzig dbpp:locatedIn dbp:Saxony, dbp:Germany ;
          dbpp:hasMayor dbp:Burkhard_Jung .
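A prefixed name is only an abbreviation: a parser replaces the prefix with its namespace URI. The sketch below shows that expansion for the prefixes used on these slides; `expand` and `to_ntriple` are hypothetical helper names, and the code handles only URI-valued terms, not literals.

```python
# Prefix table taken from the @prefix declarations in the Turtle examples.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbp": "http://dbpedia.org/resource/",
    "dbpp": "http://dbpedia.org/property/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

def expand(name):
    """Expand a prefixed name such as 'dbp:Leipzig' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {name!r}")
    return PREFIXES[prefix] + local

def to_ntriple(subject, predicate, obj):
    """Serialize a triple of prefixed names as one N-Triples line."""
    return f"<{expand(subject)}> <{expand(predicate)}> <{expand(obj)}> ."

print(to_ntriple("dbp:Leipzig", "dbpp:hasMayor", "dbp:Burkhard_Jung"))
```

The output is the long form that the “;” and “,” shortcuts ultimately stand for.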
  • 36. XML Syntax of RDF
    • Turtle is intuitively readable and machine processable
    • but: better tool support and programming libraries exist for XML
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
               xmlns:dbpp="http://dbpedia.org/property/"
               xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
        <rdf:Description rdf:about="http://dbpedia.org/resource/Leipzig">
          <dbpp:hasMayor rdf:resource="http://dbpedia.org/resource/Burkhard_Jung" />
          <rdfs:label xml:lang="de">Leipzig</rdfs:label>
          <geo:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#float">51.3333</geo:lat>
          <geo:lon rdf:datatype="http://www.w3.org/2001/XMLSchema#float">12.3833</geo:lon>
        </rdf:Description>
      </rdf:RDF>
  • 37. RDF/JSON
    • JSON = JavaScript Object Notation
    • Compact format for data exchange between applications
    • JSON documents are valid JavaScript
    • Programming language independent, since parsers exist for all popular programming languages
    • Less overhead when parsing and serialising than XML
    Structure: { "S" : { "P" : [ O ] } }
    • Subject: URI or bnode
    • Predicate: URI
    • Object: type ("uri", "literal" or "bnode"), value (the data value), lang (language tag), datatype (URI of the datatype)
  • 38. JSON Example
      {
        "http://dbpedia.org/resource/Leipzig" : {
          "http://dbpedia.org/property/hasMayor": [
            { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],
          "http://www.w3.org/2000/01/rdf-schema#label": [
            { "type":"literal", "value":"Leipzig", "lang":"en" } ],
          "http://www.w3.org/2003/01/geo/wgs84_pos#lat": [
            { "type":"literal", "value":"51.3333", "datatype":"http://www.w3.org/2001/XMLSchema#float" } ],
          "http://www.w3.org/2003/01/geo/wgs84_pos#lon": [
            { "type":"literal", "value":"12.3833", "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
        }
      }
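Because the structure is a plain nested object, turning RDF/JSON back into triples takes only a few loops. The following is a minimal sketch using the standard `json` module on a shortened version of the Leipzig document; `rdf_json_to_triples` and `format_term` are illustrative names, and the N-Triples-like output does no escaping beyond what `json.dumps` provides.

```python
import json

# Shortened version of the Leipzig example in { "S" : { "P" : [ O ] } } form.
DOC = """
{ "http://dbpedia.org/resource/Leipzig" : {
    "http://dbpedia.org/property/hasMayor": [
      { "type":"uri", "value":"http://dbpedia.org/resource/Burkhard_Jung" } ],
    "http://www.w3.org/2000/01/rdf-schema#label": [
      { "type":"literal", "value":"Leipzig", "lang":"en" } ],
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat": [
      { "type":"literal", "value":"51.3333",
        "datatype":"http://www.w3.org/2001/XMLSchema#float" } ]
} }
"""

def format_term(obj):
    """Render one RDF/JSON object description as an N-Triples-like term."""
    if obj["type"] == "uri":
        return f"<{obj['value']}>"
    if obj["type"] == "bnode":
        return obj["value"]
    literal = json.dumps(obj["value"], ensure_ascii=False)  # adds the quotes
    if "lang" in obj:
        return f"{literal}@{obj['lang']}"
    if "datatype" in obj:
        return f"{literal}^^<{obj['datatype']}>"
    return literal

def rdf_json_to_triples(text):
    """Flatten an RDF/JSON document into (subject, predicate, object) strings."""
    triples = []
    for subject, predicates in json.loads(text).items():
        s = subject if subject.startswith("_:") else f"<{subject}>"
        for predicate, objects in predicates.items():
            for obj in objects:
                triples.append((s, f"<{predicate}>", format_term(obj)))
    return triples

for triple in rdf_json_to_triples(DOC):
    print(" ".join(triple), ".")
```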
  • 39. RDFa Syntax
    • RDFa = Resource Description Framework in attributes
    • Embedding RDF in XHTML
    • UTF-8 and UTF-16, since it extends XML-based XHTML
    • Due to the embedding in HTML, more overhead than other serialisations
    • Less readable
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
      <html version="XHTML+RDFa 1.0" xml:lang="en"
            xmlns="http://www.w3.org/1999/xhtml"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
            xmlns:dbpp="http://dbpedia.org/property/"
            xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
        <head><title>Leipzig</title></head>
        <body about="http://dbpedia.org/resource/Leipzig">
          <h1 property="rdfs:label" xml:lang="de">Leipzig</h1>
          <p>Leipzig is a city in Germany. Leipzig's mayor is
            <a href="http://dbpedia.org/resource/Burkhard_Jung" rel="dbpp:hasMayor">Burkhard Jung</a>.
            It is located at latitude <span property="geo:lat" datatype="xsd:float">51.3333</span>
            and longitude <span property="geo:lon" datatype="xsd:float">12.3833</span>.</p>
        </body>
      </html>
  • 40. Vocabularies • Schema layer of RDF • Defines terms (classes and properties) • Typically RDFS or OWL family • Common vocabularies: Dublin Core, SKOS; FOAF, SIOC, vCard; DOAP; Core Organization Ontology; VoID http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies
  • 41. Vocabularies: Friend-of-a-Friend (FOAF) defines classes and properties for representing information about people and their relationships: Soeren rdf:type foaf:Person . Soeren foaf:currentProject http://OntoWiki.net . Soeren foaf:homepage http://aksw.org/Soeren . Soeren foaf:knows http://sembase.at/Tassilo . Soeren foaf:sha1 09ac456515dee .
  • 42. Vocabularies: Semantically Interlinked Online Communities (SIOC). Represents content from blogs, wikis, forums, mailing lists, chats etc.
  • 43. Vocabularies: Simple Knowledge Organization System (SKOS) supports the use of thesauri, classification schemes, subject heading systems and taxonomies
  • 44. SS2011 Instance data Instances are associated with one or several classes: Boddingtons rdf:type Ale . Grafentrunk rdf:type Bock . Hoegaarden rdf:type White . Jever rdf:type Pilsner .
  • 45. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 45 http://lod2.eu The Linked Open Data cloud
  • 47. Linked Open Data cloud http://lod-cloud.net/ Media Government Geo Publications User-generated Life sciences Cross-domain
  • 48. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 48 http://lod2.eu LOD cloud stats triples distribution links distribution http://lod-cloud.net/state/
  • 49. TimBL’s 5-star plan for open data ★ Make your data available on the Web under an open license ★★ Make it available as structured data (Excel sheet instead of image scan of a table) ★★★ Use a non-proprietary format (CSV file instead of an Excel sheet) ★★★★ Use Linked Data format (URIs to identify things, RDF to represent data) ★★★★★ Link your data to other people’s data to provide context More: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
  • 50. Why go for the 5th star? [Example: linking Central Contractor Registration (CCR) data to Geonames] http://webofdata.wordpress.com/2011/05/22/why-we-link/
  • 51. Effort distribution [Figure: a fixed overall data integration effort, split between publisher's effort, consumer's effort and third-party effort]
  • 52. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 52 http://lod2.eu Datasets A dataset is a set of RDF triples that are published, maintained or aggregated by a single provider
  • 53. Linksets • An RDF link is an RDF triple whose subject and object are described in different datasets • A linkset is a collection of such RDF links between two datasets
  • 54. Describing Datasets - VoID • General dataset metadata • Access metadata • Structural metadata • Describing linksets • Deployment and discovery of VoID files http://www.w3.org/TR/void/
  • 55. General dataset metadata • Dataset homepage • Publisher • Title and description • Categorisation • Licensing • Technical features
  • 56. General dataset metadata
      :DBpedia a void:Dataset ;
          dcterms:title "DBpedia" ;
          dcterms:description "RDF data extracted from Wikipedia" ;
          dcterms:contributor :FU_Berlin ;
          dcterms:contributor :Uni_Leipzig ;
          dcterms:contributor :Openlink ;
          dcterms:source <http://dbpedia.org/resource/Wikipedia> ;
          void:feature <http://www.w3.org/ns/formats/RDF_XML> ;
          dcterms:modified "2008-11-17"^^xsd:date .
      :Geonames a void:Dataset ;
          dcterms:subject <http://dbpedia.org/resource/Location> .
      :GeoSpecies a void:Dataset ;
          dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/us/> .
  • 57. Access metadata • SPARQL endpoints • RDF data dumps • Root resources • URI lookup endpoints • OpenSearch description documents
  • 58. Access metadata
      :exampleDS a void:Dataset ;
          void:sparqlEndpoint <http://example.org/sparql> ;
          void:dataDump <http://example.org/dump1.rdf> ;
          void:uriLookupEndpoint <http://api.example.org/search?qt=term> .
  • 59. Structural metadata provides high-level information about the schema and internal structure of a dataset and can be helpful when exploring or querying datasets: • Example resources • Patterns for resource URIs • Vocabularies • Dataset partitions • Statistics
  • 60. Structural metadata
      :DBpedia a void:Dataset ;
          void:exampleResource <http://dbpedia.org/resource/Berlin> .
      :LiveJournal a void:Dataset ;
          void:vocabulary <http://xmlns.com/foaf/0.1/> .
      :DBpedia a void:Dataset ;
          void:classPartition [ void:class foaf:Person ; void:entities 312000 ] ;
          void:propertyPartition [ void:property foaf:name ; void:triples 312000 ] .
  • 61. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 61 http://lod2.eu Describing linksets
  • 62. Describing linksets
      :DBpedia a void:Dataset ;
          void:subset :DBpedia2Geonames .
      :Geonames a void:Dataset .
      :DBpedia2Geonames a void:Linkset ;
          void:target :DBpedia ;
          void:target :Geonames ;
          void:linkPredicate owl:sameAs .
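The linkset definition (triples whose subject and object live in different datasets) can be illustrated with a small filter. This sketch makes two simplifying assumptions that are not from the slides: dataset membership is decided by URI namespace, and the Geonames identifier is a placeholder rather than a real entry.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Assumption: a resource belongs to the dataset whose namespace prefixes its URI.
DATASETS = {
    "DBpedia": "http://dbpedia.org/resource/",
    "Geonames": "http://sws.geonames.org/",
}

TRIPLES = [
    # Placeholder Geonames ID, for illustration only.
    ("http://dbpedia.org/resource/Berlin", OWL_SAME_AS,
     "http://sws.geonames.org/0000000/"),
    # Both ends in DBpedia: an ordinary triple, not an RDF link.
    ("http://dbpedia.org/resource/Berlin",
     "http://dbpedia.org/property/locatedIn",
     "http://dbpedia.org/resource/Germany"),
]

def dataset_of(uri):
    """Return the name of the dataset whose namespace contains the URI."""
    for name, namespace in DATASETS.items():
        if uri.startswith(namespace):
            return name
    return None

def linkset(triples, source, target, predicate):
    """Collect RDF links from source to target that use the given predicate."""
    return [t for t in triples
            if t[1] == predicate
            and dataset_of(t[0]) == source
            and dataset_of(t[2]) == target]

links = linkset(TRIPLES, "DBpedia", "Geonames", OWL_SAME_AS)
print(len(links), "link(s) in DBpedia2Geonames")
```

The three arguments correspond to the two `void:target` values and the `void:linkPredicate` of the VoID description.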
  • 63. Deployment and discovery • Choosing URIs for datasets • Publishing a VoID file alongside a dataset (Turtle, RDFa) • SPARQL Service Description Vocabulary http://www.w3.org/TR/sparql11-service-description/ • Discovery via a well-known URI, based on RFC 5785 and registered with IANA: http://www.example.com/.well-known/void
  • 64. Consumption - Essentials • Linked Data provides for a global data-space with a uniform API (due to RDF as the data model) • Access methods: dereference URIs via HTTP GET (RDF/XML, RDFa, etc.); SPARQL (‘the SQL of RDF’); data dumps (RDF/XML, etc.)
  • 65. Consumption - Technologies • Linked Data access mechanisms are widely supported on all major platforms and languages (HTTP interface & RDF parsing), such as Java, Python, PHP, C/C++/.NET, etc. • Command line tools (curl, rapper, etc.) • Online tools: http://redbot.org/ (HTTP/low-level), http://sindice.com/developers/inspector (RDF/data-level) • Structured query: SPARQL (more later)
  • 66. Consumption - Technologies • Distributed setup: need for a central point of access (indexer, aggregator) • Sindice, an index of the Web of Data: http://sindice.com/ • Sig.ma, Web of Data aggregator & browser: http://sig.ma/ • Relationship discovery: http://relfinder.semanticweb.org/
  • 67. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 67 http://lod2.eu Technologies – FYN http://dbpedia.org/resource/Galway 67
  • 68. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 68 http://lod2.eu Technologies – Sig.ma http://sig.ma/search?q=Galway Sig.ma is a Web of Data platform enabling entity visualisation and consolidation both for humans and machines (API) 68
  • 69. Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 69 http://lod2.eu Technologies – sameas.org Sameas.org is a service to find co- references on the Web of Data http://sameas.org/html?q=Galway
  • 70. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 70 http://lod2.eu • All Linked Data datasets share a uniform data model, the RDF statement data model • Information is represented in facts expressed as (subject, predicate, object) triples • Components: globally unique IRI/URI entity identifiers & typed data values (literals) as objects Linked Data Benefits: Uniformity
  • 71. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 71 http://lod2.eu • URIs not just used for identifying entities, but also (as URLs) for locating and retrieving resources that describe these entities on the Web Linked Data Benefits: De-referencability
  • 72. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 72 http://lod2.eu • triples containing URIs from different namespaces as subject and object, establish a link between (the entity identified by the) subject with (the entity identified by the) object (typed RDF links) Linked Data Benefits: Coherence Berlin Germany European Union isCapitalOf isMemberOfKnowledge base 1 Knowledge base 2
  • 73. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 73 http://lod2.eu • RDF data model, is based on a single mechanism for representing information (triples) -> very easy to attain a syntactic and simple semantic integration of different Linked Data sets. • higher level semantic integration can be achieved by employing schema and instance matching techniques and expressing found matches again as additional triple facts Linked Data Benefits: Integrateability
  • 74. Linked Data Benefits: Timeliness • Publishing and updating Linked Data is relatively simple, thus facilitating timely availability • Once a Linked Data source is updated, it is straightforward to access and use the updated data source (time-consuming and error-prone extraction, transformation and loading is not required)
  • 75. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 75 http://lod2.eu 1. The Vision & Big Picture 2. Linked Data 101 3. The Linked Data Life-cycle Agenda
  • 76. What works now? What has to be done?
    • Web: a global, distributed platform for data, information and knowledge integration
    • Exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
    [Figure panels dated July 2007, April 2008, September 2008, July 2009]
    Achievements
    1. Extension of the Web with a data commons (25B facts)
    2. Vibrant, global RTD community
    3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
    4. Emerging governmental adoption in sight
    5. Establishing Linked Data as a deployment path for the Semantic Web
    Challenges
    1. Coherence: relatively few, expensively maintained links
    2. Quality: partly low-quality data and inconsistencies
    3. Performance: still substantial penalties compared to relational databases
    4. Data consumption: large-scale processing, schema mapping and data fusion still in their infancy
    5. Usability: establishing direct end-user tools and network effects
  • 77. Linked Data Lifecycle [Cycle diagram: Extraction; Storage/Querying; Manual revision/authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration]
  • 78. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 78 http://lod2.eu Extraction
  • 79. Creating Knowledge out of Interlinked Data Sören Auer – SBBD: DBpedia and the Emerging Web of Linked Data 5.10.2011 Page 79 http://lod2.eu From unstructured sources • NLP, text mining, annotation From semi-structured sources • DBpedia, LinkedGeoData, SCOVO/DataCube From structured sources • RDB2RDF Extraction
  • 80. Transforming Wikipedia into a Knowledge Base. Extract structured information from Wikipedia and make this information available on the Web as LOD, in order to: • ask sophisticated queries against Wikipedia (e.g. universities in Brandenburg, mayors of elevated towns, soccer players) • link other data sets on the Web to Wikipedia data. The result represents a community consensus. The recently launched DBpedia Live transforms Wikipedia into a structured knowledge base.
  • 81. Structure in Wikipedia • Title • Abstract • Infoboxes • Geo-coordinates • Categories • Images • Links – other language versions – other Wikipedia pages – To the Web – Redirects – Disambiguations
  • 82. Infobox templates
    Wikitext syntax:
      {{Infobox Korean settlement
      | title = Busan Metropolitan City
      | img = Busan.jpg
      | imgcaption = A view of the [[Geumjeong]] district in Busan
      | hangul = 부산 광역시
      ...
      | area_km2 = 763.46
      | pop = 3635389
      | popyear = 2006
      | mayor = Hur Nam-sik
      | divs = 15 wards (Gu), 1 county (Gun)
      | region = [[Yeongnam]]
      | dialect = [[Gyeongsang]]
      }}
    RDF representation (http://dbpedia.org/resource/Busan):
      dbp:Busan dbpp:title "Busan Metropolitan City"
      dbp:Busan dbpp:hangul "부산 광역시"@Hang
      dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
      dbp:Busan dbpp:pop "3635389"^^xsd:int
      dbp:Busan dbpp:region dbp:Yeongnam
      dbp:Busan dbpp:dialect dbp:Gyeongsang
      ...
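The mapping from infobox attributes to triples can be approximated in a few lines. This is a toy sketch, not the DBpedia extraction framework: it assumes one `| key = value` pair per line, treats a value that is exactly one `[[wiki link]]` as a resource, and guesses datatypes from the value's shape. The function name `extract` is invented for this example.

```python
import re

# A subset of the Busan infobox from the slide.
INFOBOX = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}"""

def extract(subject, wikitext):
    """Turn '| key = value' lines into (subject, property, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        m = re.match(r"\|\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if not m:
            continue  # template header, closing braces, etc.
        key, value = m.groups()
        link = re.fullmatch(r"\[\[([^\]|]+)\]\]", value)
        if link:
            obj = "dbp:" + link.group(1).replace(" ", "_")  # wiki link -> resource
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'  # plain string literal
        triples.append((subject, "dbpp:" + key, obj))
    return triples

for triple in extract("dbp:Busan", INFOBOX):
    print(" ".join(triple))
```

Real infoboxes contain nested templates, units and mixed text-plus-link values, which is why the actual extractors use dedicated parsers and ontology mappings.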
  • 83. A vast multi-lingual, multi-domain knowledge base. DBpedia extraction results in: • descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases) • labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories • altogether over 1 billion pieces of information (i.e. RDF triples): 257M from the English edition, 766M from other language editions • DBpedia Live (http://live.dbpedia.org/sparql/) & Mappings Wiki (http://mappings.dbpedia.org) integrate the community into a refinement cycle • Upcoming: DBpedia inline
  • 84. 2011/05/12 CONSEGI - Sören Auer: DBpedia. DBpedia Architecture [Architecture diagram. Sources (PageCollections): Wikipedia dumps, Wikipedia database, Wikipedia Live via OAI-PMH (update stream, article queue). Extraction: Extraction Manager running Extraction Jobs. Extractors: Generic Infobox, Mapping-based Infobox, Label, Geo, Redirect, Disambiguation, Image, Abstract, Pagelink, Wikipedia Category. Parsers: DateTime, Units, String-List, Numbers, Geo; Ontology Mappings. Destinations: N-Triple Serializer (N-Triple dumps), SPARQL-Update Destination (Virtuoso triple store). Access: SPARQL endpoint and Linked Data on the Web, used by RDF browsers, HTML browsers, SPARQL clients and DBpedia apps]
  • 85. Hierarchies
    • DBpedia Ontology Schema: manually created for DBpedia (infoboxes); 275 classes + 1335 properties; 20 million triples
    • YAGO: large hierarchy linking Wikipedia leaf categories to WordNet; 250,000 classes
    • UMBEL (Upper Mapping and Binding Exchange Layer): 20,000 classes derived from OpenCyc
    • Wikipedia Categories: not a class hierarchy (e.g. it contains cycles), represented using SKOS; 415,000+ categories
  • 86. DBpedia SPARQL Endpoint: http://dbpedia.org/sparql, hosted on an OpenLink Virtuoso server, can answer SPARQL queries like: • All sitcoms that are set in NYC? • All tennis players from Moscow? • All films by Quentin Tarantino? • All German musicians that were born in Berlin in the 19th century? • All soccer players with jersey number 11, playing for a club having a stadium with over 40,000 seats, and born in a country with over 10 million inhabitants?
  • 87. DBpedia SPARQL Endpoint
      SELECT ?name ?birth ?description ?person WHERE {
        ?person dbp:birthPlace dbp:Berlin .
        ?person skos:subject dbp:Cat:German_musicians .
        ?person dbp:birth ?birth .
        ?person foaf:name ?name .
        ?person rdfs:comment ?description .
        FILTER (LANG(?description) = 'en') .
      }
      ORDER BY ?name
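Sending such a query to an endpoint is a plain HTTP GET with the query text in the `query` parameter, as defined by the SPARQL protocol. The sketch below only builds the request and does not send it; the query is copied from the slide with its prefixes undeclared, so a real endpoint would need them predefined or added. `build_request` and `query_of` are names invented for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

# Query text as on the slide; prefix declarations are assumed to be
# predefined on the endpoint.
QUERY = """SELECT ?name ?birth ?description ?person WHERE {
  ?person dbp:birthPlace dbp:Berlin .
  ?person skos:subject dbp:Cat:German_musicians .
  ?person dbp:birth ?birth .
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en') .
}
ORDER BY ?name"""

def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def query_of(request):
    """Recover the query text from a request URL, e.g. for logging."""
    return parse_qs(urlsplit(request.full_url).query)["query"][0]

request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:60] + "...")
```

Passing the request to `urllib.request.urlopen` would execute it; that step is left out here so the example runs without network access.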
  • 88. 2011/05/12 CONSEGI - Sören Auer: DBpedia 88 DBpedia Applications DBpedia Mobile: location aware mobile client for DBpedia Uses current location and DBpedia to display map Can navigate into other knowledge bases DBpedia Query Builder: user front end for building queries DBpedia Relationship Finder finds relation between two objects in DBpedia
  • 89. DBpedia Applications
  • 90. DBpedia Applications: Relfinder http://www.visualdataweb.org/relfinder.php
  • 91. DBpedia Applications: Zemanta
  • 92. DBpedia Applications: Faceted-Browser
  • 93. DBpedia Applications (3rd party). Muddy Boots (BBC): annotates actors in BBC News with DBpedia identifiers. Open Calais (Reuters): named entity recognition; entities are connected via owl:sameAs to DBpedia, Freebase, Geonames. Faviki: social bookmarking tool that uses DBpedia in the backend to group tags etc. and for multi-language support. Topbraid Composer: ontology editor that links entities to DBpedia based on their labels.
  • 94. Universität Leipzig ▪ Agile Knowledge Engineering and Semantic Web (AKSW) Authors: Sören Auer, Jens Lehmann, Slide 94 LinkedGeoData: conversion, interlinking and publishing of OpenStreetMap.org data sets as RDF (OpenStreetMap being a "Wikipedia for geographic data").
  • 95. Motivation. Ease information integration tasks that require spatial knowledge, such as: offerings of bakeries next door; a map of the distributed branches of a company; historical sights along a bicycle track. Therefore use RDF/OWL in order to overcome structural and semantic heterogeneity. This requires a vocabulary, which we try to establish. The LOD cloud contains data sets with spatial features (e.g. Geonames, DBpedia, US census, EuroStat), but they are restricted to popular or large entities like countries, famous places etc. Therefore they lack buildings, roads, mailboxes, etc.
  • 96. OpenStreetMap Data Model. Basic entities are: Nodes (latitude, longitude); Ways (sequence of nodes); Relations (associations between any number of nodes, ways and relations). Each entity may be described with tags (= key-value pairs).
  • 97. Example: Leipzig's zoo
  • 98. Data/Mapping Example
        node_id  | k                | v
      -----------+------------------+------------------------------------------------
       259212302 | name             | Universität Leipzig, Mathematik und Informatik
       259212302 | amenity          | university
       259212302 | addr:street      | Johannisgasse
       259212302 | addr:postcode    | 04103
       259212302 | addr:housenumber | 26
       259212302 | addr:city        | Leipzig
  • 99. Data/Mapping Example
        node_id  | k                | v
      -----------+------------------+------------------------------------------------
       259212302 | name             | Universität Leipzig, Mathematik und Informatik
       259212302 | amenity          | university
       259212302 | addr:street      | Johannisgasse
       259212302 | addr:postcode    | 04103
       259212302 | addr:housenumber | 26
       259212302 | addr:city        | Leipzig

      lgd:node259212302 a lgdo:University ;
        rdfs:label "Universität Leipzig, Mathematik und Informatik" ;
        lgdo:hasCity "Leipzig" ;
        lgdo:hasHouseNumber "26" ;
        lgdo:hasPostalCode "04103" ;
        lgdo:hasStreet "Johannisgasse" ;
        georss:point "51.3369334 12.385401" ;
        geo:lat 51.3369334 ;
        geo:long 12.385401 .
  • 100. Mapping Types. Three mapping types:
      Text: (5, name, Leipzig) → lgd:node5 rdfs:label "Leipzig"
            (5, name:de, Leipzig) → lgd:node5 rdfs:label "Leipzig"@de
      Datatypes: (6, seats, 4) → lgd:node6 lgdo:seats "4"^^xsd:integer
      Classes/Object Properties: (7, place, city) → lgdn:7 a lgdo:City
            (7, religion, pastafarian) → lgdn:7 lgdo:religion lgdo:Pastafarian
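
The three mapping types can be illustrated with a small rule-based converter. This is only a sketch under stated assumptions: the rule tables (`DATATYPE_KEYS`, `CLASS_VALUES`, `OBJECT_KEYS`) and the function name `map_tag` are hypothetical, the output is a prefixed triple string rather than real RDF, and every subject uses the `lgd:node` form even where the slide writes `lgdn:`.

```python
# Hypothetical rule tables, filled only with the examples from the slide.
DATATYPE_KEYS = {"seats": "xsd:integer"}
CLASS_VALUES = {("place", "city"): "lgdo:City"}
OBJECT_KEYS = {"religion"}


def map_tag(node_id, key, value):
    """Map one OSM tag (node_id, key, value) to a prefixed triple string."""
    subject = f"lgd:node{node_id}"
    # Classes: a specific key/value pair becomes an rdf:type statement.
    if (key, value) in CLASS_VALUES:
        return f"{subject} a {CLASS_VALUES[(key, value)]} ."
    # Object properties: the value becomes a resource in the ontology namespace.
    if key in OBJECT_KEYS:
        return f"{subject} lgdo:{key} lgdo:{value.capitalize()} ."
    # Datatypes: the value becomes a typed literal.
    if key in DATATYPE_KEYS:
        return f'{subject} lgdo:{key} "{value}"^^{DATATYPE_KEYS[key]} .'
    # Text: names become labels, optionally with a language tag.
    if key == "name":
        return f'{subject} rdfs:label "{value}" .'
    if key.startswith("name:"):
        lang = key.split(":", 1)[1]
        return f'{subject} rdfs:label "{value}"@{lang} .'
    return None  # no rule applies


for tag in [(5, "name", "Leipzig"), (6, "seats", "4"), (7, "place", "city")]:
    print(map_tag(*tag))
```

A real converter would need escaping of literal values and a much larger rule set; the point here is only the dispatch on key and value.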
  • 101. Access. REST interface (based on a PostGIS DB, full OSM dataset loaded, > 1 billion triples): supports limited queries (e.g. circular/rectangular area, filtering by labels). SPARQL endpoints (based on a Virtuoso DB, subset of the OSM dataset, ~222M triples): static (http://linkedgeodata.org/sparql) and live (http://live.linkedgeodata.org/sparql). Downloads (http://downloads.linkedgeodata.org). Monthly updates of the above datasets are envisioned.
  • 102. LinkedGeoData Live. OpenStreetMap provides full dumps and minutely changesets for download. Changesets are numbered, e.g. "001/234/567.osc.gz". We also convert the changesets to sets of added and removed triples (relative to our store) and publish them as 001/234/567.added.nt.gz and 001/234/567.removed.nt.gz. Advantage: other users can easily sync their RDF store with LinkedGeoData.
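
A consumer-side sync step could look roughly like the following sketch. It models the local store as a Python set of N-Triples lines; `changeset_files` and `apply_changeset` are hypothetical helper names, and applying removals before additions is an assumption, not something the slides specify.

```python
def changeset_files(sequence: int):
    """Derive the published file names for a changeset sequence number.

    Follows the naming pattern on the slide: 1234567 -> 001/234/567.*.nt.gz
    """
    digits = f"{sequence:09d}"
    base = "/".join(digits[i:i + 3] for i in range(0, 9, 3))
    return base + ".added.nt.gz", base + ".removed.nt.gz"


def apply_changeset(store: set, added, removed) -> set:
    """Return a new store with removed triples dropped and added triples inserted."""
    return (store - set(removed)) | set(added)


store = {"<s1> <p> <o1> .", "<s2> <p> <o2> ."}
store = apply_changeset(
    store,
    added=["<s3> <p> <o3> ."],
    removed=["<s1> <p> <o1> ."],
)
print(sorted(store))
```

Downloading and decompressing the `.nt.gz` files is left out; each file would simply supply the `added` or `removed` lines.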
  • 103. DBpedia Mapping, Step by Step. Given a DBpedia point, query LGD points within a type-specific maximum distance. Basic idea (performed with Silk): compute a spatial score; compute name similarity (rdfs:label).
  • 104. DBpedia Mapping, Step by Step. Given a DBpedia point, query LGD points within a type-specific maximum distance. Basic idea (performed with Silk): compute a spatial score; compute name similarity (rdfs:label); combine both scores; depending on the final score, either automatically accept/reject links or mark them for manual verification.
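
The scoring steps above can be sketched in plain Python. This is not Silk and does not reproduce its configuration: the haversine distance, the `difflib` string similarity, the equal weighting of both scores, and the `accept`/`reject` thresholds are all illustrative assumptions, as are the function names.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def spatial_score(distance_km, max_distance_km):
    """1.0 at zero distance, falling linearly to 0.0 at the maximum distance."""
    return max(0.0, 1.0 - distance_km / max_distance_km)


def name_score(a, b):
    """Case-insensitive label similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(dbp, lgd, max_distance_km, accept=0.9, reject=0.5):
    """Return 'accept', 'reject' or 'verify' for a candidate link."""
    d = haversine_km(dbp["lat"], dbp["long"], lgd["lat"], lgd["long"])
    if d > max_distance_km:  # outside the type-specific search radius
        return "reject"
    score = 0.5 * spatial_score(d, max_distance_km) + 0.5 * name_score(dbp["label"], lgd["label"])
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "verify"  # left for manual verification


zoo = {"label": "Zoo Leipzig", "lat": 51.3486, "long": 12.3711}
print(classify(zoo, dict(zoo), max_distance_km=1.0))
```

The middle band between `reject` and `accept` corresponds to the links the slide marks for manual verification.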
  • 105. Statistics (2011-Feb-23). 222,539,712 triples; 6,666,865 ways; 5,882,306 nodes. Among them: 352,673 PlaceOfWorship; 60,573 RailwayStation; 59,468 Recycling; 50,955 Town; 30,099 Toilet; 7,222 City.
  • 106. Conclusion. OpenStreetMap: an immensely successful project for collaboratively creating free spatial data; the community uses key-value structures, which provide a rich source of information; key strength: broad coverage. LGD contributions: established a mapping to DBpedia; Geonames mapping partially done (37 different entity types: cities, churches, ...); the facet-based LGD Browser provides an interface for OSM/LGD that highlights its structural aspects; live sync. Goal: make LGD as useful (successful) for the geospatial domain as DBpedia.
  • 107. Extraction: Relational Data. Many different approaches (D2R, Virtuoso RDF Views, Triplify, …). No agreement on a formal semantics of RDB2RDF mapping (LOD readiness, SPARQL-SQL translation). W3C RDB2RDF WG.
       Tool               | Triplify                      | D2RQ           | Virtuoso RDF Views
      --------------------+-------------------------------+----------------+---------------------------
       Technology         | Scripting languages (PHP)     | Java           | Whole middleware solution
       SPARQL endpoint    | -                             | X              | X
       Mapping language   | SQL                           | RDF based      | RDF based
       Mapping generation | Manual                        | Semi-automatic | Manual
       Scalability        | Medium-high (but no SPARQL)   | Medium         | High
  • 108. Extraction Challenges. From unstructured sources: deploy existing NLP approaches (OpenCalais, Ontos API); develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF). From semi-structured sources: efficient bi-directional synchronization. From structured sources: declarative syntax and semantics of data model transformations (W3C WG RDB2RDF). Orthogonal challenges: using LOD as background knowledge; provenance.
  • 109. Storage and Querying
  • 110. RDF Data Management. Still slower than relational data management by a factor of 5-50 (BSBM, DBpedia Benchmark). Performance increases steadily. Comprehensive, well-supported open-source and commercial implementations are available: OpenLink's Virtuoso (open source + commercial); Big OWLIM (commercial), Swift OWLIM (open source); 4store (open source); Talis (hosted); Bigdata (distributed); Allegrograph (commercial); Mulgara (open source).
  • 111. DBpedia Benchmark. Uses DBpedia as data and a selection of 25 frequently executed queries. Can generate fractions and multiples of DBpedia's size. Does not resemble relational data. Performance differences observed with other benchmarks are amplified. [Chart: geometric mean]
  • 112. Storage and Querying Challenges. Reduce the performance gap between relational and RDF data management. SPARQL query extensions: spatial/semantic/temporal data management. More advanced query result caching. View maintenance / adaptive reorganization based on common access patterns. More realistic benchmarks.
  • 114. Two Kinds of Semantic Wikis. 1. Semantic (text) wikis: authoring of semantically annotated texts. 2. Semantic data wikis: direct authoring of structured information (i.e. RDF, RDF-Schema, OWL).
  • 115. OntoWiki: a semantic data wiki. Versatile, domain-independent tool. Serves as a Linked Data / SPARQL endpoint on the Data Web. Open-source project hosted at Google Code. Not just a wiki UI, but a whole framework for the development of Semantic Web applications. Developed in PHP, based on the Zend framework. Very active developer and user community. More than 500 downloads monthly. Large number of use cases.
  • 116. OntoWiki: Dynamic views on knowledge bases
  • 117. OntoWiki: RDF triples on resource details page
  • 118. OntoWiki: Dynamic suggestions from the Data Web
  • 119. Catalogus Professorum Lipsiensis
  • 121. RDFauthor in OntoWiki
  • 122. Semantic Portal with OntoWiki: Vakantieland
  • 123. RDFaCE: RDFa Content Editor
  • 124. RDFaCE Architecture
  • 125. Integrating various NLP APIs
  • 126. Linking (image: © CC-BY-NC-ND by ~Dezz~, residae on flickr)
  • 127. LOD Linking. Automatic. Semi-automatic: SILK, LIMES. Manual: Sindice integration into UIs, Semantic Pingback.
  • 128. LIMES 0.3: Basic Idea. Uses the characteristics of metric spaces, especially the consequences of the triangle inequality: d(x, y) ≤ d(x, z) + d(z, y), and hence d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y). Basic idea: use pessimistic approximations of distances instead of computing them; only compute distances when needed.
  • 130. Computation of Exemplars. Assumption: the number of exemplars is given. Goal: segment the target data set.
  • 135. Computation of Exemplars. NB: distances from exemplars to all other points are known.
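  • 136. Filtering (source point x, exemplar y, target point z). 1. Measure the distance from each x to each exemplar.
  • 137. Filtering (source point x, exemplar y, target point z). 2. Apply d(x, y) - d(y, z) > t ⇒ d(x, z) > t.
  • 138. Similarity Computation (source point x, exemplar y, target point z). d(x, y) - d(y, z) ≤ t ⇒ compute d(x, z).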
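
The filtering and similarity-computation steps can be sketched as follows. This is an illustration of the triangle-inequality filter described on the slides, not the LIMES implementation: exemplars are simply taken at regular intervals from the target list (an assumption; the slides do not spell out the selection procedure here), `t` is treated as a distance threshold, and `limes_match` is a hypothetical name.

```python
def limes_match(source, target, dist, threshold, n_exemplars):
    """Find all (x, z) with dist(x, z) <= threshold, skipping pairs that the
    triangle inequality already rules out.

    Returns the matches and the number of d(x, z) computations performed.
    """
    # Simplified exemplar choice: evenly spaced picks from the target list.
    step = max(1, len(target) // n_exemplars)
    exemplars = target[::step][:n_exemplars]

    # Segment the target set: assign each z to its nearest exemplar and
    # remember d(y, z), so it is known before any source point is seen.
    clusters = [[] for _ in exemplars]
    for z in target:
        dists = [dist(y, z) for y in exemplars]
        i = min(range(len(exemplars)), key=dists.__getitem__)
        clusters[i].append((z, dists[i]))

    matches, computed = [], 0
    for x in source:
        for i, y in enumerate(exemplars):
            d_xy = dist(x, y)  # step 1: distance from x to each exemplar
            for z, d_yz in clusters[i]:
                # Step 2: d(x, z) >= d(x, y) - d(y, z), so if this lower
                # bound exceeds the threshold, z cannot match x.
                if d_xy - d_yz > threshold:
                    continue
                computed += 1
                if dist(x, z) <= threshold:
                    matches.append((x, z))
    return matches, computed


source = [0.0, 10.0, 20.0]
target = [float(i) for i in range(30)]
found, computed = limes_match(source, target, lambda a, b: abs(a - b), 1.5, 5)
print(len(found), "matches using", computed, "distance computations")
```

Because a pair is skipped only when the lower bound already exceeds the threshold, the result is the same as a brute-force comparison, just with fewer distance computations.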
  • 139. Serialization. Results are returned as RDF. For example, mapping DBpedia and Drugbank:
      @prefix drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/> .
      @prefix dbpedia: <http://dbpedia.org/ontology/> .
      @prefix owl: <http://www.w3.org/2002/07/owl#> .
      dbpedia:Cefaclor owl:sameAs drugbank:DB00833 .
      dbpedia:Clortermine owl:sameAs drugbank:DB01527 .
      dbpedia:Prednicarbate owl:sameAs drugbank:DB01130 .
      dbpedia:Linezolid owl:sameAs drugbank:DB00601 .
      dbpedia:Valaciclovir owl:sameAs
  • 140. Experiments. Q1: What is the best number of exemplars? Q2: What is the relation between the similarity threshold q and the total number of comparisons? Q3: Does the assignment of S and T matter? Q4: How does LIMES compare to SILK?
  • 141. Q1 and Q2. Experiments on synthetic data. Knowledge bases of sizes 2000, 3000, 5000, 7500 and 10000. Varied number of exemplars. Varied thresholds. Experiments were repeated 5 times. Average results are presented.
  • 142. Q1 and Q2 [chart: one axis from 0 to 120,000,000, the other from 0 to 300; series for thresholds 0.75, 0.8, 0.85, 0.9, 0.95 and brute force]
  • 143. Q1 and Q2. Q1: the best number of exemplars depends on q; for q > 0.9, the best number lies around √|T|. Q2: as expected, the number of comparisons diminishes with growing q.
  • 144. Q3 (order of S and T). Experiments on synthetic data. Knowledge bases of sizes 1000, 2000, 3000, …, 10000. Number of exemplars was √|T|. Experiments were repeated 5 times. Average results are presented.
  • 145. Q3
        T/S  | 1000 | 2000 | 3000 | 4000 | 5000 | 6000 | 7000 | 8000 | 9000 | 10000
      -------+------+------+------+------+------+------+------+------+------+-------
        1000 | 0.20 | 0.37 | 0.53 | 0.69 | 0.88 | 1.04 | 1.14 | 1.40 | 1.58 | 1.67
        2000 | 0.36 | 0.64 | 0.88 | 1.24 | 1.37 | 1.63 | 1.97 | 2.25 | 2.50 | 2.70
        3000 | 0.51 | 0.86 | 1.17 | 1.57 | 2.00 | 2.09 | 2.69 | 2.91 | 3.35 | 3.58
        4000 | 0.70 | 1.11 | 1.59 | 2.00 | 2.45 | 2.88 | 3.10 | 3.61 | 3.94 | 4.50
        5000 | 0.85 | 1.36 | 1.87 | 2.28 | 2.81 | 3.39 | 3.91 | 4.20 | 4.84 | 5.54
        6000 | 1.02 | 1.60 | 2.14 | 2.81 | 3.29 | 3.93 | 4.44 | 4.96 | 5.39 | 6.08
        7000 | 1.22 | 1.86 | 2.58 | 3.15 | 3.66 | 4.35 | 5.11 | 5.69 | 6.44 | 6.62
        8000 | 1.41 | 2.04 | 2.78 | 3.43 | 4.06 | 4.98 | 5.51 | 6.55 | 7.14 | 7.53
        9000 | 1.63 | 2.36 | 2.99 | 3.85 | 4.72 | 5.44 | 6.25 | 6.88 | 7.59 | 8.20
       10000 | 1.80 | 2.62 | 3.51 | 4.25 | 4.97 | 6.01 | 6.33 | 7.81 | 8.31 | 9.15
      Green = S first is more time-efficient. Overall less than 5% difference.
  • 146. Q4 (comparison with SILK). 3 experiments on real data: Drugs, Diseases, SimCities. Number of exemplars was √|T|. Comparison of runtime with SILK. Experiments were repeated three times. Best runtimes are presented.
  • 148. Q4. We outperform SILK 2 by 1.5 orders of magnitude. The larger the data sources, the higher our speedup (64 for SimCities).
  • 149. Creating a network effect around Linking Data: Semantic Pingback. Update and notification services for LOD. Downward compatible with Pingback (blogosphere). http://aksw.org/Projects/SemanticPingBack
  • 150. Visualizing Pingbacks in OntoWiki
  • 151. Interlinking Challenges. Only 5% of the information on the Data Web is actually linked. Make sense of work in the de-duplication/record linkage literature. Consider the open-world nature of Linked Data. Use LOD background knowledge. Zero-configuration linking. Explore active learning approaches, which integrate users in a feedback loop. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu).
  • 152. Enrichment
  • 153. Enrichment & Repair. Linked Data is mainly instance data and !!! The ORE (Ontology Repair and Enrichment) tool helps improve an OWL ontology by fixing inconsistencies and suggesting further axioms to add. Ontology debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes and to identify the most likely sources of the problems; the user can create a repair plan while maintaining full control. Ontology enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB; works if instance data is available, for harmonising schema and data. http://aksw.org/Projects/ORE
  • 154. Quality Analysis
  • 155. Linked Data Quality Analysis. Quality on the Data Web varies a lot: hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. data extracted from text or Web 2.0 sources (DBpedia). Research challenge: establish measures for assessing the authority, provenance and reliability of Data Web resources.
  • 156. Evolution (image: © CC-BY-SA by alasis on flickr)
  • 157. EvoPat: Pattern-based KB Evolution. Unified method for both data evolution and ontology refactoring. The modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution. Allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks. Combined with the RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem. Patterns can be shared and reused on the Data Web. The declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.
  • 158. Evolution Patterns
  • 159.
  • 160. Exploration
  • 161. An ecosystem of LOD visualizations [diagram]. LOD exploration widgets: spatial faceted browsing, faceted browsing, statistical visualization, entity-/facet-based browsing, domain-specific visualizations, … Choreography layer: dataset analysis (size, vocabularies, property histograms etc.); selection of suitable visualization widgets. LOD datasets.
  • 162. Faceted spatial-semantic browsing component (TODO: Put ULEI slides)
  • 163. Faceted spatial-semantic browsing: Availability. Pure JavaScript; requires only a SPARQL endpoint with Cross-Origin Resource Sharing (CORS) enabled for data access. Operates on local spatial regions and does not depend on global metadata about the data. Source code: https://github.com/AKSW/SpatialSemanticBrowsingWidgets. Online demo (LinkedGeoData Browser): http://browser.linkedgeodata.org. Next steps: polygon/curve markers, domain-specific visualization templates, integration of other sources, mobile interface. Publication: Claus Stadler, Jens Lehmann, Konrad Höffner, Sören Auer: LinkedGeoData: A Core for a Web of Spatial Open Data. To appear in Semantic Web Journal, Special Issue on Linked Spatiotemporal Data and Geo-Ontologies.
  • 164. Generic entity-based exploration with OntoWiki: http://fintrans.publicdata.eu
  • 165. Domain-specific visualization: http://energy.publicdata.eu
  • 166. Visualization of statistical data (datacube vocab.): http://scoreboard.lod2.eu
  • 167.
  • 168.
  • 169.
  • 170.
  • 171. Visual Query Builder
  • 172. Relationship Finder in CPL
  • 173. Distributed Social Semantic Networking
  • 174. Distributed Social Semantic Networking. Social networks are walled gardens: they take users' data out of their hands; predefined privacy and data security regulations; infrastructure of a single provider (lock-in); Facebook (600M+ users) = Web inside the Web; interoperability is limited to proprietary APIs. Social networks should be open and evolving: allow users to control what to enter and keep control over their data; users should be able to host the data on infrastructure that is under their direct control, the same way as they host their own website (TBL). We need a truly Distributed Social Semantic Network (DSSN): initial approaches appeared with GNU social and more recently Diaspora; a DSSN should be based on semantic resource descriptions and de-referenceability so as to ensure versatility, reusability and openness in order to accommodate unforeseen usage scenarios; a number of standards and best practices for social Semantic Web applications, such as FOAF, WebID and Semantic Pingback, have emerged.
  • 175. DSSN Architecture. (1) Resources announce services and feeds; feeds announce services, in particular a push service. (2) Applications initiate ping requests to spin the Linked Data network. (3) Applications subscribe to feeds on push services and receive instant notifications on updates. (4) Update services are able to modify resources and feeds (e.g. on request of an application). (5) Personal and global search services index social network resources and are used by applications. (6) Access to resources and services can be delegated to applications by a WebID, i.e. an application can act in the name of the WebID owner. (7) The majority of all access operations are executed through standard web requests.
  • 176. DSSN Mobile Client. Open-source, MVC architecture. Platform independent, based on HTML5, CSS, JavaScript. jQuery, jQuery Mobile, jQuery UI. rdfQuery: simple triple store in JavaScript. PhoneGap (Apache Device ready): native apps for iOS, Android, Blackberry OS, WebOS, Symbian, Bada. http://aksw.org/Projects/MobileSocialSemanticWeb
  • 177.
  • 178. DSSN Mobile Browsing
  • 179. DSSN Mobile Editing
  • 180. EU-FP7 LOD2 Project Overview . Page 182 http://lod2.eu Creating Knowledge out of Interlinked Data. LOD Lifecycle supported by the Debian-based LOD2 Stack (released next week). Lifecycle stages [diagram]: Extraction; Storage/Querying; Manual revision/authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration.
  • 181. First release of the LOD2 Stack: stack.lod2.eu & demo.lod2.eu/lod2de
  • 182.
  • 183. AKSW Team
  • 184. Thanks for your attention! Sören Auer http://www.uni-leipzig.de/~auer/ | http://aksw.org | http://lod2.org auer@uni-leipzig.de

Editor's notes

  1. Initially, the Web consisted of many websites containing only unstructured/textual content. Web 2.0 extended this traditional Web with a few extremely large websites specializing in specific content types. Examples are: Wikipedia for encyclopedic articles; Del.icio.us for Web links; Flickr for pictures; YouTube for videos; Digg for news; … These websites provide searching, querying, sharing and authoring specifically adapted to the content type of their concern.
  2. Popular content types such as pictures, movies, calendars, encyclopedic articles, news, recipes etc. are already sufficiently well supported on the Web. However, there is a long tail of special-interest content (profiles of expertise, historic data and events, bio-medical knowledge, intra-corporate knowledge etc.) which has very little or no current support (for filtering, aggregation, searching, querying, collaborative editing) on the Web.
  3. http://www.flickr.com/photos/residae/2560241604/#/
  4. http://www.flickr.com/photos/alasis/3541341601/sizes/l/in/photostream/