Moving from a web of documents to a web of knowledge is one of the key goals set by the creator of the World Wide Web, Sir Tim Berners-Lee. This vision is becoming more of a reality each day with the development and integration of new formats and technologies that represent data as knowledge graphs, interlinking concepts within documents or databases. This presentation provides an overview of the generic concepts supporting Linked Data, including the formats and the existing technologies that support them, and introduces the key initiatives relying on these technologies. We will also address the challenge of semantic/knowledge modeling in science and in other domains, and the need for more tools to support the use of these technologies. In particular, we will present the semantic annotation service B2NOTE and show how these formats and technologies are used to extend the description of datasets within EUDAT and to create new datasets from multiple sources and multiple domains.
Visit https://eudat.eu/eudat-summer-school
Linked Data and Semantic Web - EUDAT Summer School (Yann Le Franc, e-Science Data Factory)
1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Introduction to Linked Data
and Semantic Web
Yann Le Franc, PhD
This work is licensed under the Creative
Commons CC-BY 4.0 licence.
Attribution: EUDAT – www.eudat.eu
Version 2017-1
2. How to cope with an expanding
universe of scientific data?
“The Hitchhiker’s guide to the Semantic Web Galaxy”
4. EUDAT Summer School, 3-7 July 2017, Crete
Introduction: a bit of context
The general principles of Linked Data and standards
Application: data annotations with B2NOTE
Outline
5. EUDAT Summer School, 3-7 July 2017, Crete
Problem: the volume of scientific data is
expanding
6. EUDAT Summer School, 3-7 July 2017, Crete
Challenge: Aggregating multi-dimensional data from multiple data sources
10. EUDAT Summer School, 3-7 July 2017, Crete
Modeling
Multiple species
Multi-scale data: connectivity, genes, molecules, electrical activity, functional
Data analysis
Data aggregation
Similar problem and challenge in Neuroscience
11. EUDAT Summer School, 3-7 July 2017, Crete
Data enclosed in information silos: distinct APIs, data published within HTML or unstructured
2710 databases related to neuroscience (Neuroscience Information Framework)
How can we make these data resources interoperable and link them together?
The current situation: distributed data resources in a large variety of formats
[Figure: data sources exposed through Web APIs and HTML pages]
12. EUDAT Summer School, 3-7 July 2017, Crete
https://fr.wikipedia.org/wiki/Tim_Berners-Lee
A global problem
World Wide Web is a global document space
Documents are interconnected with links
Data is hidden in HTML pages: Easy to use by humans but
not by machines
Large diversity of Web APIs
Impossible to access and interlink data
Need for semantics for transforming the global document
space into a global data space
13. EUDAT Summer School, 3-7 July 2017, Crete
A solution for Life Science, the Universe
and Everything
14. EUDAT Summer School, 3-7 July 2017, Crete
What is Linked Data?
Tim Berners-Lee (2006) - Design Issues
Use URIs as names for things
Use HTTP URIs so that people can look up those names (dereferenceable)
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things
https://www.w3.org/DesignIssues/LinkedData.html
15. EUDAT Summer School, 3-7 July 2017, Crete
Use URIs instead of URNs (Uniform Resource Names) and DOIs
Example
Real Person
http://www.esciencedatafactory.com/people/yann_le_franc
Description RDF (for machines)
http://www.esciencedatafactory.com/people/yann_le_franc.rdf
Description HTML (for humans)
http://www.esciencedatafactory.com/people/yann_le_franc.html
Separate the URI representing the real object or concept from its description
Name things with URIs
16. EUDAT Summer School, 3-7 July 2017, Crete
Make use of HTTP content negotiation
Two technical solutions for designing the URIs:
1 - Use content negotiation with a 303 (See Other) redirect
2 - Hash URIs
https://www.w3.org/TR/cooluris/
Make URIs dereferenceable
https://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html
19. EUDAT Summer School, 3-7 July 2017, Crete
Client -> Server: GET URI
Server -> Client: 303 - See URI2
Make URI dereferenceable
Use content negotiation with a 303 (See Other) redirect
Client HEADER
GET /people/yann_le_franc HTTP/1.1
Host: esciencedatafactory.com
Accept: text/html, application/rdf+xml
Server Answer
HTTP/1.1 303 See Other
Location: http://www.esciencedatafactory.com/people/yann_le_franc.rdf
Vary: Accept
20. EUDAT Summer School, 3-7 July 2017, Crete
Client -> Server: GET URI
Server -> Client: 303 - See URI2
Client -> Server: GET URI2
Make URI dereferenceable
Use content negotiation with a 303 (See Other) redirect
Client HEADER
GET /people/yann_le_franc.rdf HTTP/1.1
Host: esciencedatafactory.com
Accept: text/html, application/rdf+xml
22. EUDAT Summer School, 3-7 July 2017, Crete
Client -> Server: GET URI
Server -> Client: 303 - See URI2
Client -> Server: GET URI2
Server -> Client: Content URI2
Make URI dereferenceable
Use content negotiation with a 303 (See Other) redirect
Client HEADER
GET /people/yann_le_franc.rdf HTTP/1.1
Host: esciencedatafactory.com
Accept: text/html, application/rdf+xml
Server Answer
HTTP/1.1 200 OK
Content-Type: application/rdf+xml
…
Requires two HTTP round trips (four messages) per item
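The 303 exchange above can be sketched in a few lines of Python. This is a minimal simulation and not a real web server: the functions `preferred_type`, `handle` and `dereference` are hypothetical names introduced for illustration, the Accept header parsing ignores q-values, and the fallback to HTML is an assumption. Only the URIs and media types come from the slides.

```python
from typing import Dict, List, Tuple

BASE = "http://www.esciencedatafactory.com"
THING = "/people/yann_le_franc"  # names the person, not a document

# Documents describing the person, keyed by media type (URIs from the slides).
DOCUMENTS: Dict[str, str] = {
    "application/rdf+xml": THING + ".rdf",
    "text/html": THING + ".html",
}


def preferred_type(accept: str) -> str:
    """Return the first media type in the Accept header that we can serve.

    Simplification: q-values are ignored; HTML is an assumed fallback.
    """
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in DOCUMENTS:
            return media_type
    return "text/html"


def handle(path: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """Toy server: 303 for the thing itself, 200 for its descriptions."""
    if path == THING:
        location = BASE + DOCUMENTS[preferred_type(accept)]
        return 303, {"Location": location, "Vary": "Accept"}
    for media_type, document_path in DOCUMENTS.items():
        if path == document_path:
            return 200, {"Content-Type": media_type}
    return 404, {}


def dereference(path: str, accept: str) -> List[Tuple[str, int]]:
    """Toy client: follow at most one 303 and record each (path, status)."""
    exchanges = []
    status, headers = handle(path, accept)
    exchanges.append((path, status))
    if status == 303:
        path = headers["Location"][len(BASE):]  # strip scheme and host
        status, headers = handle(path, accept)
        exchanges.append((path, status))
    return exchanges


if __name__ == "__main__":
    for step in dereference(THING, "application/rdf+xml"):
        print(step)
```

Each entry returned by `dereference` corresponds to one request/response pair, which makes the cost of the 303 approach visible: one extra round trip for every resource that is looked up.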
25. EUDAT Summer School, 3-7 July 2017, Crete
Make URI dereferenceable
2 - Use hash URIs
Client -> Server: GET URI
Server -> Client: Content URI
Cache
http://www.esciencedatafactory.com/people
List of people
• http://www.esciencedatafactory.com/people#yann_le_franc
• http://www.esciencedatafactory.com/people#john_doe
Client HEADER
GET /people HTTP/1.1
Host: esciencedatafactory.com
Accept: application/rdf+xml
Server Answer
HTTP/1.1 200 OK
Content-Type: application/rdf+xml
The whole list
26. EUDAT Summer School, 3-7 July 2017, Crete
Make URI dereferenceable
2 - Use hash URIs
Client -> Server: GET URI
Server -> Client: Content URI
Cache
Get the whole file, then look inside it to find the items identified by the hash
http://www.esciencedatafactory.com/people
List of people
• http://www.esciencedatafactory.com/people#yann_le_franc
• http://www.esciencedatafactory.com/people#john_doe
27. EUDAT Summer School, 3-7 July 2017, Crete
RDF Triple (subject, predicate, object)
Subject: Resource A (URI)
Predicate: Relation (URI)
Object: Resource B (URI)
Example
Subject: My website (http://www.example.com/index.html)
Predicate: Created by
Object: Me (http://myprofile/name)
The RDF Data Model
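A triple is small enough to model directly. The sketch below represents the "My website, created by, Me" example as a Python named tuple and prints it in N-Triples syntax. The slide does not give a URI for the "created by" relation, so the Dublin Core term `http://purl.org/dc/terms/creator` is used here as an assumption; `Triple` and `to_ntriples` are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; all three positions hold URIs in this sketch."""
    subject: str
    predicate: str
    object: str


# Assumed predicate URI for "created by" (not given on the slide).
CREATOR = "http://purl.org/dc/terms/creator"

triple = Triple(
    subject="http://www.example.com/index.html",  # My website
    predicate=CREATOR,                            # Created by
    object="http://myprofile/name",               # Me
)


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.object}> ."


if __name__ == "__main__":
    print(to_ntriples(triple))
```

Because subject, predicate and object are all URIs, any of them can be dereferenced to find more triples, which is what turns a set of statements into a graph.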
28. EUDAT Summer School, 3-7 July 2017, Crete
Labeled directed graph
From W3C RDF 1.1. Primer https://www.w3.org/TR/rdf11-primer/
RDF in action
30. EUDAT Summer School, 3-7 July 2017, Crete
RDFa
RDF serializations
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8"/>
    <title>Profile page for Yann Le Franc</title>
  </head>
  <body>
    <!-- "about" names the subject; "typeof" states that it is a foaf:Person -->
    <div about="http://www.esciencedatafactory.com/people/yann_le_franc" typeof="foaf:Person">
      <!-- "property" attaches the text content as the person's foaf:name -->
      <span property="foaf:name">Yann Le Franc</span>
    </div>
  </body>
</html>
31. EUDAT Summer School, 3-7 July 2017, Crete
Subject | Predicate | Object
Alice | is a friend of | Bob
Bob | is interested in | The Mona Lisa
Bob | is a | Person
Bob | is born | 14 July 1990
The Mona Lisa | was created by | Leonardo Da Vinci
La Joconde in Washington | is about | The Mona Lisa
Triple Store
SPARQL endpoint
SPARQL Queries
Publishing RDF
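To give a feel for what a triple store does with a query, here is a minimal in-memory sketch that holds the triples from this slide and matches them against a pattern in which `None` plays the role of a variable. This is not SPARQL and not a real triple store: `match` is a hypothetical helper that only mimics a single triple pattern such as `?s ?p "The Mona Lisa"`.

```python
# The triples from the slide, stored as plain (subject, predicate, object) tuples.
TRIPLES = [
    ("Alice", "is a friend of", "Bob"),
    ("Bob", "is interested in", "The Mona Lisa"),
    ("Bob", "is a", "Person"),
    ("Bob", "is born", "14 July 1990"),
    ("The Mona Lisa", "was created by", "Leonardo Da Vinci"),
    ("La Joconde in Washington", "is about", "The Mona Lisa"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(wanted is None or wanted == value
               for wanted, value in zip(pattern, triple))
    ]


if __name__ == "__main__":
    # Everything that points at the Mona Lisa, whatever the relation.
    for triple in match(TRIPLES, obj="The Mona Lisa"):
        print(triple)
```

A SPARQL endpoint exposes the same kind of pattern matching over HTTP, with joins across several patterns, which this sketch deliberately leaves out.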
32. EUDAT Summer School, 3-7 July 2017, Crete
RDF triple store / Graph database
M. Junghanns and A. Petermann, “Management and Analysis of Big Graph Data: Current Systems and Open Challenges,” … (ed. S. Sakr), 2017.
B. Haslhofer, E. Momeni Roochi, B. Schandl, and S. Zander, “Europeana RDF Store Report,” Mar. 2011.
Z. Kaoudi and G. Weikum, “RDF in the clouds: a survey,” The VLDB Journal, 2014.
Technologies to publish RDF
35. EUDAT Summer School, 3-7 July 2017, Crete
Resource 1: http://www.incf.org/images/newsroom/le-franc
Resource 2: http://m.c.lnkd.licdn.com/mpr/mpr/shrink_200_200/p/2/000/22d/056/2bdc24c.jpg
Last Name: Le Franc
<last_name>Le Franc</last_name>
Family Name: Le Franc
<family_name>Le Franc</family_name>
Do we need anything else?
Synonym/Equivalent?
WE NEED COMMON VOCABULARIES TO SHARE THE SAME SEMANTICS
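The `last_name` versus `family_name` mismatch can be made concrete with a small sketch. It parses the two XML fragments from the slide and maps both element names onto a single shared term. The choice of the FOAF property `http://xmlns.com/foaf/0.1/familyName` as that shared term is an assumption made for illustration, as are the names `TERM_MAPPING` and `normalise`.

```python
import xml.etree.ElementTree as ET

# Assumed shared vocabulary term for both local field names.
FOAF_FAMILY_NAME = "http://xmlns.com/foaf/0.1/familyName"

# Each source uses its own element name; the mapping records that they
# mean the same thing.
TERM_MAPPING = {
    "last_name": FOAF_FAMILY_NAME,
    "family_name": FOAF_FAMILY_NAME,
}


def normalise(xml_fragment: str) -> dict:
    """Parse one XML element and re-key its value with the shared term."""
    element = ET.fromstring(xml_fragment)
    shared_term = TERM_MAPPING[element.tag]
    return {shared_term: element.text.strip()}


record_1 = normalise("<last_name>Le Franc</last_name>")
record_2 = normalise("<family_name>Le Franc</family_name>")

if __name__ == "__main__":
    # Once both records use the same term, they can be compared or merged.
    print(record_1 == record_2)
```

Without the mapping, a program has no way to know that the two element names are equivalent. With a common vocabulary, each publisher can state the equivalence once instead of every consumer rediscovering it.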
36. EUDAT Summer School, 3-7 July 2017, Crete
Yes if you are interested in:
Sharing data with others
Data aggregation from multiple sources
Not if you are a lone scientist in your ivory tower
Do we really need vocabularies?
37. EUDAT Summer School, 3-7 July 2017, Crete
“In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of the concepts” - Wikipedia
You need to create a controlled vocabulary, also called an ontology, that can be used as a common “standardized” vocabulary to annotate your resources
W3C semantic web standards:
RDF Schema
OWL (Web Ontology Language)
SKOS (Simple Knowledge Organization System)
What is an ontology?
How do you encode this in practice?
How can we make it better?
39. EUDAT Summer School, 3-7 July 2017, Crete
Class
Unique identifier
Label
Human-readable definition
Other metadata
(creator, version, date,…)
What is an ontology in practice?
40. EUDAT Summer School, 3-7 July 2017, Crete
Superclass
Unique identifier
Label
Human-readable definition
Other metadata
(creator, version, date,…)
Subclass
Unique identifier
Label
Human-readable definition
is_a: subsumption relation
Example: Macaca mulatta is an animal
What is an ontology in practice?
41. EUDAT Summer School, 3-7 July 2017, Crete
Person
Unique identifier
Label
Human-readable definition
Other metadata
(creator, version, date,…)
Yann
Le Franc
Unique identifier
Label
Human-readable definition
is_a: subsumption relation
What is an ontology in practice?
42. EUDAT Summer School, 3-7 July 2017, Crete
Superclass
Subclass
is_a: subsumption relation
Superclass 2
has_a: associative relation
What is an ontology in practice?
43. EUDAT Summer School, 3-7 July 2017, Crete
Person
Yann
Le Franc
is_a: subsumption relation
Name
has_a: associative relation
Relations between concepts are based on first-order logic
Use reasoners/classifiers, machine learning algorithms
What is an ontology in practice?
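The class structure and the is_a relation described on these slides can be sketched as follows. Each class carries an identifier, a label, a definition and an optional superclass, and `is_a` follows the superclass links transitively, which is the simplest inference a reasoner performs. The `ex:` identifiers, the intermediate `primate` class and the definitions are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OntologyClass:
    identifier: str                # unique identifier
    label: str                     # human-readable label
    definition: str = ""           # human-readable definition
    parent: Optional[str] = None   # identifier of the superclass (is_a)


# Hypothetical mini-ontology; identifiers and definitions are illustrative.
ONTOLOGY: Dict[str, OntologyClass] = {
    c.identifier: c
    for c in [
        OntologyClass("ex:Animal", "animal", "A living organism that moves."),
        OntologyClass("ex:Primate", "primate", "A kind of mammal.",
                      parent="ex:Animal"),
        OntologyClass("ex:MacacaMulatta", "Macaca mulatta",
                      "The rhesus macaque.", parent="ex:Primate"),
    ]
}


def is_a(sub: str, sup: str, ontology: Dict[str, OntologyClass] = ONTOLOGY) -> bool:
    """True if `sub` is subsumed by `sup`, following is_a links upwards."""
    current = ontology[sub].parent
    while current is not None:
        if current == sup:
            return True
        current = ontology[current].parent
    return False


if __name__ == "__main__":
    print(is_a("ex:MacacaMulatta", "ex:Animal"))
```

The statement "Macaca mulatta is an animal" is never written down explicitly here; it follows from the two stored is_a links. Real reasoners over OWL ontologies handle far richer axioms than this single-inheritance walk.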
50. EUDAT Summer School, 3-7 July 2017, Crete
Examples of vocabularies
FOAF (Friend Of A Friend)
DCAT (Data Catalog Vocabulary)
PROV (Provenance vocabulary)
Web Annotation
Music Ontology
SIOC (Semantically-Interlinked Online Communities)
51. EUDAT Summer School, 3-7 July 2017, Crete
By user:Marobi1 [CC0], via Wikimedia Commons
https://en.wikipedia.org/wiki/Semantic_Web_Stack
The semantic web stack
52. EUDAT Summer School, 3-7 July 2017, Crete
Limitation of a unique formal model: monolithic ontologies
Difficult to reconcile different models
Lack of validation and quality testing for ontologies
Difficult to reach consensus on research topics
Slow integration of new concepts into existing ontologies
Hard to use for scientists
However, designing common terminologies is valuable and Mostly Harmless
Limits of the approach
53. EUDAT Summer School, 3-7 July 2017, Crete
Google Knowledge Graph: https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html
Facebook graph: https://developers.facebook.com/docs/graph-api/overview/
Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page
Freebase
DBpedia
https://datahub.io/dataset
EBI RDF store
Some major RDF resources
55. EUDAT Summer School, 3-7 July 2017, Crete
Metadata
Different types of metadata to describe the context, the
content, the format and the history of the data
Metadata are generally frozen after publication of a data
record
Descriptive Metadata can be incomplete and/or biased
by the data publisher perspective
Annotations
How to add new metadata/information in a flexible way?
56. EUDAT Summer School, 3-7 July 2017, Crete
What do we mean by annotation?
By definition, an annotation is “a note added to a text, book, drawing, etc., as a comment or an explanation” (from Merriam-Webster).
In our context, it is an assertion we want to make about a digital resource such as a text file, an image, a recording or a movie.
57. EUDAT Summer School, 3-7 July 2017, Crete
Semantic Annotation: General Principles
58. EUDAT Summer School, 3-7 July 2017, Crete
Web Annotation Data Model
Use the W3C Web Annotation data model (https://www.w3.org/TR/annotation-model/)
Serialized in JSON-LD (https://www.w3.org/TR/json-ld/), a JSON-based representation of RDF graphs
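As an illustration of what such an annotation looks like, the sketch below builds a semantic tag following the W3C Web Annotation data model and serializes it as JSON-LD with Python's `json` module. The annotation identifier, the target file and the vocabulary term are hypothetical `example.org` URIs, and `make_semantic_tag` is an illustrative helper. The exact structure B2NOTE produces may differ.

```python
import json


def make_semantic_tag(annotation_id: str, target_url: str, term_uri: str) -> dict:
    """Build a Web Annotation that tags a target with a vocabulary term."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "body": {
            "type": "SpecificResource",
            "source": term_uri,     # the concept used as a tag
            "purpose": "tagging",
        },
        "target": target_url,       # the annotated digital resource
    }


# Hypothetical identifiers, for illustration only.
annotation = make_semantic_tag(
    annotation_id="http://example.org/annotations/1",
    target_url="http://example.org/datasets/42/recording.csv",
    term_uri="http://example.org/vocab/MacacaMulatta",
)

if __name__ == "__main__":
    print(json.dumps(annotation, indent=2))
```

Because the body points to a term URI rather than a free-text keyword, two annotations made by different people with the same term can be matched exactly.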
59. EUDAT Summer School, 3-7 July 2017, Crete
The annotation “use-cases”
Manual annotation of data elements: semantic tagging and file linking
Semi-automatic annotation of data element content: related to the LTER Data Pilot
Data curation: curation status tags
Create aggregated datasets from multi-scale or multi-domain datasets
60. EUDAT Summer School, 3-7 July 2017, Crete
B2NOTE
Crowdsourcing annotator
  All annotations are public
  Private annotations in the next release
Easy to use
  Auto-completion with terms from domain-specific controlled vocabularies
  Intuitive user interface
  Easily create new datasets selected based on annotations
Easy integration based on a widget/iframe approach
  Integrates with EUDAT services
  Integrates with community web UIs
Easy to deploy
  Stores triples as JSON-LD in a MongoDB backend
  Uses Django as web framework
63. EUDAT Summer School, 3-7 July 2017, Crete
B2NOTE at work
Try it @ http://b2note.bsc.es
Screenshots: Login/Register, Annotation interface, Access to annotations
64. EUDAT Summer School, 3-7 July 2017, Crete
B2NOTE at work
Access semantic term information
Search files using annotations
Export annotations and selected data for reuse
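Searching files by annotation amounts to indexing annotation targets by the term used as the body. The sketch below shows the idea on a handful of simplified annotations. It does not use the B2NOTE API: the `example.org` URIs, the flattened `body`/`target` structure and the helper `files_by_term` are all assumptions made for illustration.

```python
from collections import defaultdict

# Simplified, hypothetical annotations: one vocabulary term per target file.
ANNOTATIONS = [
    {"target": "http://example.org/files/a.csv",
     "body": "http://example.org/vocab/Neuron"},
    {"target": "http://example.org/files/b.csv",
     "body": "http://example.org/vocab/Neuron"},
    {"target": "http://example.org/files/b.csv",
     "body": "http://example.org/vocab/MacacaMulatta"},
    {"target": "http://example.org/files/c.csv",
     "body": "http://example.org/vocab/MacacaMulatta"},
]


def files_by_term(annotations):
    """Index annotated files by the semantic term they are tagged with."""
    index = defaultdict(set)
    for annotation in annotations:
        index[annotation["body"]].add(annotation["target"])
    return index


if __name__ == "__main__":
    index = files_by_term(ANNOTATIONS)
    # Files tagged with the "Neuron" term form a candidate aggregated dataset.
    print(sorted(index["http://example.org/vocab/Neuron"]))
```

The files selected this way can come from different repositories and domains, since the only thing they need to share is the term they were annotated with.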
65. EUDAT Summer School, 3-7 July 2017, Crete
Test integration with B2SHARE
https://trng-b2share.eudat.eu/
66. EUDAT Summer School, 3-7 July 2017, Crete
The added-value of annotations
Enrich digital content with your personal keywords without modifying the data record
Structure data differently using annotations
Support data curation before and after publication
Create aggregated datasets from multi-scale or multi-domain datasets
67. EUDAT Summer School, 3-7 July 2017, Crete
Additional Resources
EUDAT Webinar: Organise, retrieve and aggregate data using annotations with B2NOTE