3. Creating Knowledge
out of Interlinked Data
Problem: Try to search for these things on the current Web:
• Apartments near German-English bilingual childcare in Berlin
• ERP service providers with offices in Vienna and London
• Researchers working on Digital Library topics in Eastern Europe
Information is available on the Web, but opaque to current search.
Why do we need the Data Web?
[Figure: two sites, each a database (DB) behind a Web server, deliver HTML and RDF to a search engine.
passau.de has everything about childcare in Passau.
Immobilienscout.de knows all about real estate offers in Germany.]
Solution: complement text on Web pages with structured Linked Open Data, and intelligently combine, integrate and join such structured information from different sources.
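A minimal sketch of such a join, in Python, with each source reduced to a set of (subject, predicate, object) tuples. All identifiers (`kita:1`, `flat:7`, `locatedIn`, the district names) and all data values are hypothetical and are not taken from passau.de or Immobilienscout.de. "Near" is simplified to "in the same district"; a real setup would use RDF data and a query language such as SPARQL.

```python
# Hypothetical triples as (subject, predicate, object); not real data.
childcare_source = {
    ("kita:1", "type", "Childcare"),
    ("kita:1", "language", "de-en"),
    ("kita:1", "locatedIn", "district:Mitte"),
    ("kita:2", "type", "Childcare"),
    ("kita:2", "language", "de"),
    ("kita:2", "locatedIn", "district:Pankow"),
}
real_estate_source = {
    ("flat:7", "type", "Apartment"),
    ("flat:7", "locatedIn", "district:Mitte"),
    ("flat:8", "type", "Apartment"),
    ("flat:8", "locatedIn", "district:Pankow"),
}


def subjects(triples, predicate, obj):
    """Return all subjects that carry the given predicate/object pair."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """Return all objects attached to a subject via the given predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def apartments_near_bilingual_childcare(childcare, real_estate):
    """Join both sources on the shared 'locatedIn' value."""
    bilingual = subjects(childcare, "type", "Childcare") & subjects(
        childcare, "language", "de-en"
    )
    districts = set()
    for kita in bilingual:
        districts |= objects(childcare, kita, "locatedIn")
    apartments = subjects(real_estate, "type", "Apartment")
    return sorted(
        a for a in apartments if objects(real_estate, a, "locatedIn") & districts
    )


print(apartments_near_bilingual_childcare(childcare_source, real_estate_source))
```

The join only works because both sources use the same identifier for the district, which is the point of publishing interlinked data rather than isolated HTML pages.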
4. Linked Data in a Nutshell
1. Uses the RDF data model
[Figure: graph with the nodes Uni Malta, TPDL2013, Valletta and 24.9.2013, connected by the edges organizes, starts and takesPlaceIn]
2. Is serialised in triples (subject, predicate, object):
Uni_Malta organizes TPDL2013 .
TPDL2013 starts "2013-09-24"^^xsd:date .
TPDL2013 takesPlaceIn Valletta .
3. Uses content negotiation
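The three points can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the `http://example.org/` namespace is hypothetical, the date is written in the `xsd:date` lexical form (YYYY-MM-DD), literal escaping is omitted, and `negotiate` ignores q-values, which a real server would have to honour.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace, not from the slides
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


class Literal(NamedTuple):
    """A typed literal: lexical value plus datatype IRI."""
    value: str
    datatype: str


# The slide's example graph as (subject, predicate, object) triples.
triples = [
    (EX + "Uni_Malta", EX + "organizes", EX + "TPDL2013"),
    (EX + "TPDL2013", EX + "starts", Literal("2013-09-24", XSD_DATE)),
    (EX + "TPDL2013", EX + "takesPlaceIn", EX + "Valletta"),
]


def to_ntriples(triples):
    """Serialise triples in N-Triples syntax (no escaping of special characters)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            obj = f'"{o.value}"^^<{o.datatype}>'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def negotiate(accept_header):
    """Naive content negotiation: serve RDF if the client lists it, else HTML."""
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "application/n-triples" in accepted:
        return "application/n-triples"
    return "text/html"


print(to_ntriples(triples))
print(negotiate("text/html, application/n-triples;q=0.9"))
```

With content negotiation, the same URI can return an HTML page to a browser and RDF to a data client, depending on the HTTP `Accept` header.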
5. The emerging Web of Data
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, shown as snapshots from 2007 to 2010]
6. Linked Data Lifecycle
• Extraction
• Storage/Querying
• Manual revision/authoring
• Interlinking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration
7. Extraction
8. Storage and Querying
11. Enrichment
12. Quality Analysis
CC BY SA Wikipedia
14. Exploration
15. LOD2 Stack Components
[Figure: the components of the LOD2 Stack arranged around the Linked Data Lifecycle (Extraction, Storage/Querying, Manual revision/authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration)]
Components shown: Virtuoso RDF Store (conductor, sparql, isparql, ods), Virtuoso sponger, lod2webapi, RDFAuthor, Limes, ORE, D2R, Semantic Spatial Browser, sparqlproxy, SigmaEE, OntoWiki, LOD open refine, Silk-latc, stanbol, valiant, DBpedia Spotlight, SPARQLed, PoolParty, Sieve, PoolParty Extractor, dl-learner, CubeViz, LOD2 demonstrator, R2R, rdf-dataset-integration, Silk, SIREn, Sparqlify, CSVImport, DBpedia Spotlight UI, CKAN (source), Mondeca Sparql Endpoint Status, Sindice (source), Web Linkage Validator, DBpedia (source), lodms, LOD2 stat. Workbench, LOD2 Authentication Component, Data Cube Validation Tool, Data Cube Merging Tool, Data Cube Slicing Tool, LOD2 Provenance Component
21. Digital Libraries
Two definitions:
• Online access to digitized/digital artefacts (articles, books, manuscripts, photographs, …)
• Digital knowledge hubs: new ways of sharing knowledge online
22. Hypothesis
In our digital world, with completely new technology (internet, crowd-sourcing, linked data) and devices (ultrabooks, smart TVs/phones, tablets), it is not sufficient to just "digitize" the concept of a library.
We must re-invent the digital library as a place for knowledge sharing on the Web.
24. How can we reinvent the Library online?
Think about new types of artefacts
• Thesauri, ontologies / knowledge bases
• Courseware / learning objects
• Data / knowledge assets
• Semantic descriptions of the content of publications
• …
25. How can we reinvent the Library online?
Think about new types of collaboration and interaction
• Crowd-sourcing
• Social networking
• Serious games
• …
26. How can we reinvent the Library online?
Think about new technologies
• Semantic Web / Linked Data
• Wikis
• Mashups
• Mobile Apps
• …
28. Are these digital libraries?
1. OntoWiki – a semantic data wiki
2. Cortex – a semantic digital library search backend
3. SlideWiki – a platform for crowd-sourcing multilingual OpenCourseWare
4. SemanticPapers – capturing the meaning of scientific publications
30. Two Kinds of Semantic Wikis
1. Semantic (Text) Wikis
• Authoring of semantically annotated texts
2. Semantic Data Wikis
• Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
36. Cortex – a semantic digital library
search backend
37. Cortex – a flexible and future-proof architecture
[Figure: architecture diagram with the components Import Manager, Search Manager, License Manager, Availability Manager, REST API, ARCHIV, INDEX, BMS, DICT, DS, CMS / User management, Presentation and Import]
38. CORTEX Performance
Metric | Description | Performance
Queries per second (qps) | Number of search requests that can be processed per second | 2000
Search response time | Maximum response time (up to 100,000,000 objects and 2000 qps) | < 100 ms
Number of objects | Number of objects (resources) for which CORTEX was developed and tested | 100,000,000
40. SlideWiki – a platform for crowd-sourcing
multilingual OpenCourseWare
45. How is SlideWiki different?
There are a number of online tools for presentations, such as Google Docs Presentations, Prezi and SlideShare. SlideWiki differs from these through its focus on:
• E-learning: you can add questions to slides and thus compose comprehensive self-assessment tests for learners
• Collaboration: SlideWiki aims at empowering whole communities to create presentations collaboratively
• Translation: with SlideWiki, content can easily be translated into more than 50 languages
No other tool provides this combination, so SlideWiki offers a unique feature set.
46. Semantic Publications
Researchers spend a lot of time
• encoding information in text
• decoding information from text
Can we make this more efficient?
47. Vision of scientific publishing
Researchers publish their findings in structured form (e.g. encoded in an RDF knowledge base).
This would enormously simplify:
• Finding related work
• Creating a survey
• Assessing a contribution
• …
48. Semantically describing the content of scientific publications
limes-paper describes appr123
appr123 a approach
appr123 for Link_Discovery
appr123 hasProp lossless
...
limes-paper describes impl123
impl123 a implementation
impl123 implements appr123
impl123 language Java
...
limes-paper describes eval123
eval123 a evaluation
eval123 evaluates impl123
eval123 uses DBpedia
...
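Once a paper is described this way, questions such as "which papers address link discovery?" become simple pattern matches. The sketch below holds the slide's statements as Python tuples and answers two such questions; the helper names `match`, `papers_on` and `evaluation_datasets` are hypothetical, and a real system would run SPARQL queries over an RDF store instead.

```python
# The slide's statements as (subject, predicate, object) tuples.
triples = [
    ("limes-paper", "describes", "appr123"),
    ("appr123", "a", "approach"),
    ("appr123", "for", "Link_Discovery"),
    ("appr123", "hasProp", "lossless"),
    ("limes-paper", "describes", "impl123"),
    ("impl123", "a", "implementation"),
    ("impl123", "implements", "appr123"),
    ("impl123", "language", "Java"),
    ("limes-paper", "describes", "eval123"),
    ("eval123", "a", "evaluation"),
    ("eval123", "evaluates", "impl123"),
    ("eval123", "uses", "DBpedia"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (
            o is None or t[2] == o
        ):
            yield t


def papers_on(triples, topic):
    """Find papers that describe an approach for the given topic."""
    approaches = {s for s, _, _ in match(triples, p="for", o=topic)}
    return sorted(
        {s for s, _, o in match(triples, p="describes") if o in approaches}
    )


def evaluation_datasets(triples, topic):
    """Follow approach <- implementation <- evaluation -> dataset."""
    approaches = {s for s, _, _ in match(triples, p="for", o=topic)}
    impls = {s for s, _, o in match(triples, p="implements") if o in approaches}
    evals = {s for s, _, o in match(triples, p="evaluates") if o in impls}
    return sorted({o for s, _, o in match(triples, p="uses") if s in evals})


print(papers_on(triples, "Link_Discovery"))
print(evaluation_datasets(triples, "Link_Discovery"))
```

The second query chains three hops, which is the kind of lookup that is tedious to do by reading papers and cheap to do over structured descriptions.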
50. Wrap-up
• Digital libraries must support new types of structured artefacts, interaction and collaboration paradigms, and technologies
• The Linked Data paradigm helps to connect knowledge from distributed heterogeneous sources.
51. AKSW Team
EU-FP7 LOD2 Project Overview, http://lod2.eu
52. The LOD2 Gang
53. Thanks for your attention!
Sören Auer
http://www.iai.uni-bonn.de/~auer | http://aksw.org | http://lod2.org
auer@cs.uni-bonn.de
Editor's notes
Storage. RDF data management is still more challenging than relational data management. We aim to close this performance gap by employing column-store technology, dynamic query optimization, adaptive caching of joins, optimized graph processing and cluster/cloud scalability.
Authoring. LOD2 facilitates the authoring of rich semantic knowledge bases, by leveraging Semantic Wiki technology, the WYSIWIM paradigm (What You See Is What You Mean) and distributed social, semantic collaboration and networking techniques.
Interlinking. Creating and maintaining links in a (semi-)automated fashion is still a major challenge and crucial for establishing coherence and facilitating data integration. We aim at linking approaches yielding high precision and recall, which configure themselves automatically or based on end-user feedback.
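A naive illustration of threshold-based link discovery and of how precision and recall are measured against a reference link set. This is a sketch under assumptions: the labels, identifiers and the 0.9 threshold are hypothetical, string similarity comes from Python's `difflib`, and the all-pairs comparison shown here is not how LOD2 tools such as Limes or Silk work internally.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Compare every source label with every target label (naive O(n*m))."""
    return {
        (s_id, t_id)
        for s_id, s_label in source.items()
        for t_id, t_label in target.items()
        if similarity(s_label, t_label) >= threshold
    }


def precision_recall(found, reference):
    """Precision and recall of the found links against a reference link set."""
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: two datasets labelling the same cities differently.
source = {"a:1": "Berlin", "a:2": "Vienna"}
target = {"b:1": "berlin", "b:2": "Wien", "b:3": "Bern"}
reference = {("a:1", "b:1"), ("a:2", "b:2")}

links = discover_links(source, target)
print(links, precision_recall(links, reference))
```

The missed Vienna/Wien link shows why purely string-based matching limits recall, and why link specifications that configure themselves or learn from user feedback are attractive.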
Classification. Linked Data on the Web is mainly raw instance data. For data integration, fusion, search and many other applications, however, we need this raw instance data to be linked and integrated with upper level ontologies.
Quality. Quality on the Data Web varies, just as it does on the document Web. LOD2 develops techniques that help to assess quality based on characteristics such as provenance, context, coverage or structure.
Evolution/Repair. Data on the Web is dynamic. We need to facilitate the evolution of data while keeping things stable. Changes and modifications to knowledge bases, vocabularies and ontologies should be transparent and observable. LOD2 also develops methods to spot problems in knowledge bases and to automatically suggest repair strategies.
Search/Browsing/Exploration. For many users, the Data Web is still invisible below the surface. LOD2 develops search, browsing, exploration and visualization techniques for different kinds of Linked Data (e.g. spatial, temporal, statistical), which make the Data Web perceptible for real users.
In the first part of the technical presentation, I want to focus in particular on the backend part of the architecture, consisting of the import and discovery systems.
The performance figures listed were measured on the reference architecture of the operator of the Deutsche Digitale Bibliothek (German Digital Library).