The document discusses Linked Open Data (LOD) and its applications in e-government and commercial publishing. It describes a LOD2 project demo application that allows searching legislative documents from the CELLAR database using SPARQL queries. Metadata about the documents, including licenses, can be retrieved in different formats. The documents and metadata can be integrated with other vocabularies like EUROVOC. This allows the content to be reused, with references to the original sources, in applications and products from commercial publishers.
The demo application shows how LOD allows more direct access to primary sources of content and metadata, and helps publishers enhance their offerings by linking to and reusing this open data.
1. Linked (Open) Data in e-Government
and Commercial Publishing
EU FP7 project LOD2
partner TenForce (BE)
Johan De Smedt
2014-01-17
TenForce – project: LOD2
4. Internet and HTTP - example
• The internet as it is familiar now:
– text, photo, video, ....
– hyperlinks
• URL format: http://{domain}/{path}
• Hyperlinked delivery over the HTTP protocol
– With an immense infrastructure (servers for
DNS, Proxy, cache management, DHCP, ...)
– Supporting HTTP parameters and content
negotiation (format/mime-type, language, ...)
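Content negotiation can be illustrated with a minimal Python sketch. It only builds the request and does not send it; the resource URL is a hypothetical example following the http://{domain}/{path} pattern, and the requested format and language are arbitrary choices for illustration.

```python
from urllib.request import Request


def build_negotiated_request(url: str, mime_type: str, language: str) -> Request:
    """Build (but do not send) a GET request asking for a given format and language."""
    return Request(
        url,
        headers={
            "Accept": mime_type,          # preferred representation (format/mime-type)
            "Accept-Language": language,  # preferred language of the content
        },
        method="GET",
    )


# Hypothetical resource URL, used only for illustration.
request = build_negotiated_request(
    "http://example.org/resource/doc-1", "application/rdf+xml", "de"
)
print(request.get_method(), request.full_url)
print(request.header_items())
```

The same URL can thus serve HTML to a browser and RDF to an application, depending on the headers sent by the client.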
5. Categories of Internet Users (1/3)
• Categories of users
– Humans
– Applications (software)
• Information handling
– Consumers
– Publishers
– Aggregators
6. Categories of Internet Users 2/3
• Examples of non-human users ...
– Index and search robots
– Mobile applications
– Browsers
– Information aggregators and suppliers
• Portals – scientific editors (and others)
• Weather forecast
• Traffic
• News
• e-Government
• Hotel and travel booking
• ...
7. Categories of Internet Users 3/3
• ... at the service of humans
– economic activities
– curiosity
– Control (processing procedures, security, ...)
– implementation of policies and directives
– traffic control and guidance
– ...
8. The objective of the Semantic Web
• Provide the tools (semantic language) to enable
communication between Internet users
(especially between applications)
– Manipulation of raw data to produce value-added
information is a key element of the knowledge
service industry
• Establish
– "Common understanding"
– "Interoperability"
– "Collaboration"
9. Key elements for building a
"common understanding"
• Publish knowledge models for specific domains
– Taxonomy, classification, thesaurus, subject register, Named Authority Lists, ...
– About general publications, the labor market, legislation, geolocation, sports, politics, ...
• Publish vocabularies to express relationships, dependencies, data values
– knowledge base schema (ontology)
– Works of art, rights, licenses, trade, ...
– Establish a framework to build and publish (update and maintain) the above
publications
– Help make the Internet a growing collection of related databases
– Use standard or reference ontologies and taxonomies
• Publishing in a semantic format:
– content (HTML/human) AND metadata (RDF/application)
• Reliable publishers of quality data add value
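As a minimal sketch of the "metadata (RDF/application)" side, the Python below serializes one taxonomy concept as N-Triples using the SKOS vocabulary. The concept and parent URIs under example.org are hypothetical, and the helper names are invented for this illustration; a real publication would normally use an RDF library rather than string formatting.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(concept_uri, labels, broader_uri=None):
    """Return N-Triples lines describing one SKOS concept of a taxonomy."""
    subject = f"<{concept_uri}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang in sorted(labels):  # one preferred label per language
        lines.append(f"{subject} <{SKOS}prefLabel> {literal(labels[lang], lang)} .")
    if broader_uri is not None:  # optional link to the parent concept
        lines.append(f"{subject} <{SKOS}broader> <{broader_uri}> .")
    return lines


# Hypothetical URIs; the labels are only an example of a multilingual concept.
lines = concept_to_ntriples(
    "http://example.org/theme/taxation",
    {"en": "Taxation", "de": "Besteuerung"},
    broader_uri="http://example.org/theme/finance",
)
print("\n".join(lines))
```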
11. The Demo Application:
CELLAR - LOD2
• What is CELLAR
– Owner: The Publication Office of the European Union
– On-line publications:
• EU legislation - content and metadata
• Coming soon: EU and national jurisprudence and case law.
• What is LOD2
– LOD: Linked Open Data
– links = hypertext links (HTTP)
• A research project of the 7th EU Framework Programme
• Participants: Industry, publishers, Universities, ICT enterprises
• The demo application
– Use CELLAR as the original source of content within privately
published content.
• (example, the publisher: Wolters Kluwer – Germany [WKD])
12. Demo Use Case (1/3)
• Legislation related products or tools used by:
– editorial staff of commercial publishers,
– their customers,
– their customers' customers and
– the general public
... are getting direct access to linked EU primary
source content and metadata to:
– improve information quality
– reduce editorial work
– broaden content and metadata product offering
13. Products – without LOD 2/5
Cloud products: 1 source
– Single source of content and metadata in the product
14. Products – without LOD 3/5
• Without LOD
– access is via Eur-Lex which is not the primary
information source but a publication on its own
• delay, availability, not the raw content or metadata
– Scraped information is reviewed and stored locally
• task for WKD editorial staff
– WKD products need to be complete and self-contained, with limited linking to the
online original source
15. Products – with LOD 4/5
Cloud products: 3 sources
1) original source of raw content and metadata
– access by REST API
2) content and metadata sources
– human interface
3) enriched content and enriched metadata sources
16. Products – with LOD 5/5
• With LOD there is:
– Direct access to the primary information source
• content and metadata
– Application assistance for linking with and reusing
content and metadata from the original source
– The WKD product offering is complemented by the
available online original source, by exposing the
origins
17. The Demo
• Advanced search (SPARQL) in web databases
– uses the DCAT vocabulary: a schema for catalogs of
datasets
• License information is added to datasets using linked
data (LD)
• Retrieve CELLAR stored content and metadata via LD
• Integrate with EUROVOC using LD
• Reuse CELLAR metadata in WKD content and add
provenance (PROV) referring to the original source.
• Go to the public URL
– http://212.71.25.157:8080/wp9IntAppEx-1.0/
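The reuse of source metadata with provenance can be sketched as follows. This is an illustration under stated assumptions, not the demo's implementation: triples are modelled as plain Python tuples, all URIs under example.org are hypothetical, and the function name is invented. Only the property prov:wasDerivedFrom comes from the W3C PROV vocabulary.

```python
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_ISSUED = "http://purl.org/dc/terms/issued"


def reuse_with_provenance(source_uri, source_metadata, product_uri, properties):
    """Copy selected metadata of the source onto the product and record its origin."""
    triples = [
        (product_uri, prop, source_metadata[prop])
        for prop in properties
        if prop in source_metadata  # skip properties the source does not provide
    ]
    # The provenance statement points back to the original source document.
    triples.append((product_uri, PROV_WAS_DERIVED_FROM, source_uri))
    return triples


# Hypothetical source record and product URI, for illustration only.
source = "http://example.org/cellar/document-1"
metadata = {DCT_TITLE: "Example regulation", DCT_ISSUED: "1990-01-01"}
product = "http://example.org/wkd/commentary-1"

for triple in reuse_with_provenance(source, metadata, product, [DCT_TITLE, DCT_ISSUED]):
    print(triple)
```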
18. Demo (1/.)
• Demo in @en and @de, could be in 20+
languages
• Combined search on CELLAR WP7 LOD DCAT
– Full text = “Agrarstruktur Griechenland”
– Title = “Kommission”
– Issue date = “[ 1986-07-05 , 2000-01-15 [”
– Theme = “Besteuerung”
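A combined search of this kind could be expressed as one SPARQL query. The sketch below only builds the query text; it is not the query used by the demo. It assumes the catalog describes entries with dcat:Dataset, dct:title, dct:description, dct:issued (typed as xsd:date) and dcat:theme, and it approximates full-text search with CONTAINS on the description, whereas a real endpoint would typically use its own full-text index.

```python
from datetime import date


def sparql_string(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_search_query(full_text, title, issued_from, issued_before, theme, lang="de"):
    """Combine full text, title, issue date and theme criteria into one query."""
    # fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(issued_from).isoformat()
    end = date.fromisoformat(issued_before).isoformat()

    # Every word of the full-text criterion must occur in the description.
    filters = [
        f"FILTER (CONTAINS(LCASE(STR(?text)), LCASE({sparql_string(word)})))"
        for word in full_text.split()
    ]
    filters.append(
        f"FILTER (CONTAINS(LCASE(STR(?title)), LCASE({sparql_string(title)})))"
    )
    # "[ from , before [" is a half-open interval: start included, end excluded.
    filters.append(
        f'FILTER (?issued >= "{start}"^^xsd:date && ?issued < "{end}"^^xsd:date)'
    )
    filters.append(
        f"FILTER (LANG(?themeLabel) = {sparql_string(lang)}"
        f" && STR(?themeLabel) = {sparql_string(theme)})"
    )

    lines = [
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>",
        "PREFIX dct: <http://purl.org/dc/terms/>",
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
        "SELECT DISTINCT ?dataset ?title WHERE {",
        "  ?dataset a dcat:Dataset ;",
        "           dct:title ?title ;",
        "           dct:description ?text ;",
        "           dct:issued ?issued ;",
        "           dcat:theme ?theme .",
        "  ?theme skos:prefLabel ?themeLabel .",
    ]
    lines += ["  " + f for f in filters]
    lines.append("}")
    return "\n".join(lines)


query = build_search_query(
    "Agrarstruktur Griechenland", "Kommission", "1986-07-05", "2000-01-15", "Besteuerung"
)
print(query)
```

Each added criterion contributes one more FILTER, which matches the stepwise narrowing of results shown in the demo slides.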
19. Demo (1.1/.)
• full text = Agrarstruktur Griechenland
– score/rank
20. Demo (1.2/.)
• full text = Agrarstruktur Griechenland
• title = Kommission
21. Demo (1.3/.)
• full text = Agrarstruktur Griechenland
• title = Kommission
• publication date [ 1986-07-05 , 2000-01-15 [
22. Demo (1.4/.)
• full text = Agrarstruktur Griechenland
• title = Kommission
• publication date [ 1986-07-05 , 2000-01-15 [
• theme = Besteuerung
23. Demo (2/.)
• License information
– Should be available in the original source
– Can be merged into the source by a download
service, addressed via DCAT distribution
information
– License reference provides
• Work title
• Publication Office publisher
• License statement
• Primary source content
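Looking up a license through DCAT distribution information amounts to following two links: from the dataset to its distributions, and from each distribution to its license. The sketch below shows that traversal over triples held as Python tuples. The dataset, distribution and license URIs are hypothetical; only the properties dcat:distribution and dcterms:license come from the published vocabularies.

```python
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"
DCT_LICENSE = "http://purl.org/dc/terms/license"


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def licenses_for_dataset(triples, dataset):
    """Follow dataset -> distribution -> license and return the distinct licenses."""
    found = []
    for distribution in objects(triples, dataset, DCAT_DISTRIBUTION):
        for license_uri in objects(triples, distribution, DCT_LICENSE):
            if license_uri not in found:
                found.append(license_uri)
    return found


# Hypothetical catalog content, for illustration only.
catalog = [
    ("http://example.org/dataset/1", DCAT_DISTRIBUTION, "http://example.org/dist/1-xml"),
    ("http://example.org/dataset/1", DCAT_DISTRIBUTION, "http://example.org/dist/1-rdf"),
    ("http://example.org/dist/1-xml", DCT_LICENSE, "http://example.org/license/reuse"),
    ("http://example.org/dist/1-rdf", DCT_LICENSE, "http://example.org/license/reuse"),
]
print(licenses_for_dataset(catalog, "http://example.org/dataset/1"))
```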
24. Demo (2.1/.)
• license reference with primary source title (from DCAT register)
37. Examples of related use cases
38. Scenario 1 – Employment
Use Case:
An SME in the Aachen area has a job vacancy for a Java programmer
Background:
It is getting harder to find good software developers, especially
outside urban centres. Applicants in areas close to national
borders need very practical information about mobility, which
is currently hardly available
Eurovoc topics covered:
Labour, Labour Market, Job Mobility, Job Vacancy
Sources involved:
European Legislation, Eurostat, destat, ESCO, Open Street Map,
Public transport Aachen, European Agency for Safety and Health
at Work
Solution:
EC contributes core ingredients for a central hub for
transnational job mobility challenges
39. Scenario 2 – Environment
Use Case:
A German supermarket chain wants to start an image campaign on
seafood that is not at risk of overfishing in the coming years
Background:
In Germany, the market for organic food is growing rapidly, as is the
support for sustainability. Unfortunately, the information on sustainability
is so scattered that there is no way, e.g. for the advertising industry, to
respond properly and seriously to this consumer trend
Eurovoc topics covered:
Nature reserve, environmental policy, management of resources,
Fishing industry, fresh fish, catch quota
Sources involved:
European legislation, Eurostat, destat, FAO, World Bank, European
Environment Agency
Solution:
EC contributes core ingredients for a central hub for environmental
protection
40. Scenario 3 – Energy
Use Case:
A homeowner in the Netherlands wants to install solar cells on his roof
Background:
Due to the „Energiewende“ in Germany, a lot of knowledge on
renewable energy, its impact, technologies and vendors has been
created on a national level. This information is also relevant for other
EU member states and their citizens
Eurovoc topics covered:
Energy industry, solar energy, photovoltaic cell
Sources involved:
European legislation, Eurostat, destat, Joint Research Center,
Agency for the Cooperation of Energy Regulators, International
Energy Agency, Stiftung Warentest
Solution:
EC contributes core ingredients for transnational energy challenges
41. Next for CELLAR (2014)
• Transform all published CELLAR legislation
according to the ELI directive
• Publish case law according to the ECLI directive
• Publish the catalog of available legislation and
case law (possibly using the W3C DCAT
recommendation)
• Publish all taxonomies used by the EU following LOD
best practices.
43. The ESCO Project
• ESCO
– Project owner: DG-EMPL
– ESCO
• https://ec.europa.eu/esco/home (version 0)
• European Skills, Competences, Qualifications and
Occupations
• The knowledge base details concepts in three pillars
(taxonomies) and provides semantically rich relations
between the concepts.
• Re-uses several other taxonomies
(Eurostat, Unesco, DG-EAC, PO of the EU)
44. ESCO Data Model
Occupation Pillar
• Organized by economic activity sectors
– Agriculture
– Education
– ...
• O [Occupation] mapped to
– ISCO xx (standard of ILO/UNO)
– ROME (French labor market standard)
– ...
[Diagram labels: O [Occupation]; NACE (subject, correspondence); ISCO08, ISCO88, ROME; relations exactMatch and broaderMatch]
45. ESCO Data Model
Occupation Pillar
• relation Description
– text document - unstructured or semi structured
[Diagram: a text document with the sections "Occupation Description", "Skills" and "Qualifications", linked to an Occupation via aboutOccupation]
46. ESCO Data Model
Occupation Pillar
• Skills are
– transversal (across activity sectors)
– specific to an activity sector
• Types of skills
– knowledge, skill, competence, ability
• Group of skills
– Leaf Group of skills
• Skill (member of a skill group)
• relation occupation - skill
– skill essential
– skill desired
[Diagram: a text document (unstructured or semi structured) with the sections "Occupation Description", "Skills" and "Qualifications", linked to an Occupation via aboutOccupation]
47. ESCO Data Model
• Skill and Skill facet
1. Define the different aspects/dimensions of a concept:
- main facet (0..1)
- sub facets (0..n)
2. Define/specify the standard to use or give a good
description of the concepts contained by each facet
3. For each list of values from step 2. a collection of
concepts (Facet Group) is created.
4. Manage the members of the facet group
[Diagram labels: Foreign Language expertise; main facet; sub facet; Language Facet; Language usage Facet; english, german, dutch; skos:exactMatch to oasis, LoC, EU-PO; understanding, Listening, Reading, Speaking, Spoken interaction, Spoken production, Writing; relations member, topMember, narrower. The numbers (1) to (4) in the diagram refer to the steps above.]
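The four steps above can be sketched as a small Python data structure. This is an illustration, not the ESCO implementation: the class names are invented, and treating the language facet as the main facet, with ISO 639 and CEFR as the respective standards, is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FacetGroup:
    """A collection of concepts holding the values of one facet (step 3)."""
    name: str
    standard: str  # step 2: the standard or description that defines the values
    members: List[str] = field(default_factory=list)

    def add_member(self, concept: str) -> None:
        """Step 4: add a member, ignoring duplicates."""
        if concept not in self.members:
            self.members.append(concept)

    def remove_member(self, concept: str) -> None:
        """Step 4: remove a member if it is present."""
        if concept in self.members:
            self.members.remove(concept)


@dataclass
class FacetedConcept:
    """A concept with at most one main facet and any number of sub facets (step 1)."""
    label: str
    main_facet: Optional[FacetGroup] = None
    sub_facets: List[FacetGroup] = field(default_factory=list)


# Assumed assignment of facets and standards, for illustration only.
language = FacetGroup("Language Facet", standard="ISO 639")
for name in ("english", "german", "dutch"):
    language.add_member(name)

usage = FacetGroup(
    "Language usage Facet",
    standard="CEFR",
    members=["Listening", "Reading", "Spoken interaction", "Spoken production", "Writing"],
)

expertise = FacetedConcept("Foreign Language expertise", main_facet=language, sub_facets=[usage])
language.add_member("german")  # duplicate, ignored
print(expertise.main_facet.members)
```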
48. ESCO Data Model
Qualification Pillar
• EQF, FoET, Awarding Body
[Diagram labels: ESCO Q-Pillar; Q-groups; Q-members; FoET; EQF; Awarding Body description; relations exactMatch, tagging, hasAwardingBodyDescription]
51. ESCO Data Model
Qualification Pillar
• Qualifications are maintained (direct) or included (indirect)
• Direct qualifications are maintained by DG-EMPL/ESCO.
Inclusion is on an “as needed” basis
– International qualification schemes (outside of the EU)
• USA, China, ...
– Qualifications awarded by enterprises
• ORACLE, CISCO, Microsoft, ...
• Qualification subject to indirect inclusion
– Are maintained by national (EU member) organizations
– Registered and structured by DG EAC
(Education and Culture)
– Transferred to DG EMPL using the XML schema of DG-EAC
– Uploaded in ESCO by DG-EMPL/ESCO
52. ESCO Data Model
Qualification Pillar
• Relationship description
– XML document + occasional description
[Diagram: an XML document with the sections "Qualification Description" and "Skills", linked via aboutQualification to a qualification; further labels: hasAwardingBody, awarding body, skill, competence]
53. ESCO Data Model - summary
• ESCO consists of three pillars (a pillar is a class of concepts)
– occupation
– competence
– qualification
• ESCO concepts are mapped to concepts of similar taxonomies. The mapping is
expressed using SKOS mapping properties.
– The correspondence between ESCO and ISCO (an ESCO occupation has an ISCO
occupation group as its broader match)
– Planned: mapping ESCO to ROME (French occupation taxonomy)
– ... other mappings may be established as needed (O*NET)
• The ESCO semantics are expressed using standard support taxonomies
– To tag ESCO pillar concepts (using DCMI property dcterms:subject)
– To structure recurring specializations in the ESCO model (using facets, collections or groups of
concepts)
– Examples
• Location (Eurostat: NUTS; ISO 3166)
• Economic activity sectors (Eurostat: NACE)
• European Qualifications Framework (EQF)
• CEFR (Common European Framework of Reference for Languages)
• UNESCO (UIS): FoET, ISCED
• Languages (Publication Office of the EU, Library of Congress, OASIS-psi, ISO 639)
• ...
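The ESCO to ISCO correspondence can be sketched with SKOS mapping properties. In the SKOS specification the "broader match" property is named skos:broadMatch, which is what the sketch uses. The occupation and group URIs are hypothetical placeholders, not real ESCO or ISCO identifiers, and triples are modelled as plain Python tuples.

```python
SKOS_BROAD_MATCH = "http://www.w3.org/2004/02/skos/core#broadMatch"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def mapped_concepts(triples, concept, mapping_property):
    """Return the external concepts a concept is mapped to via one SKOS property."""
    return [o for s, p, o in triples if s == concept and p == mapping_property]


def occupations_in_group(triples, group):
    """Return the occupations whose broader match is the given occupation group."""
    return [s for s, p, o in triples if p == SKOS_BROAD_MATCH and o == group]


# Hypothetical URIs, for illustration only.
mappings = [
    ("http://example.org/esco/occupation/a", SKOS_BROAD_MATCH, "http://example.org/isco/group/x"),
    ("http://example.org/esco/occupation/b", SKOS_BROAD_MATCH, "http://example.org/isco/group/x"),
    ("http://example.org/esco/occupation/a", SKOS_EXACT_MATCH, "http://example.org/rome/occupation/1"),
]

print(mapped_concepts(mappings, "http://example.org/esco/occupation/a", SKOS_BROAD_MATCH))
print(occupations_in_group(mappings, "http://example.org/isco/group/x"))
```

Because the mapping is stored as data, an application can move from a national or international classification to the ESCO concepts and back without hard-coded lookup tables.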
54. Tools for Linked Open Data
55. A small list of tools for LOD
• SPARQL endpoint – NoSQL database (RDF graph, column store)
– Virtuoso, Oracle, AllegroGraph
• Frameworks integrating semantic libraries
– Jena, Sesame
• Analyser
– TopBraid, Protégé
• Alignment of knowledge bases
– SILK:
• http://lod2.eu/Project/Silk.html
• http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
• LOD best practices
– https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html
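Whichever SPARQL endpoint is used, SELECT results are commonly returned in the W3C SPARQL Query Results JSON format. The sketch below flattens such a response into plain rows. The sample response is hand-written for illustration, with hypothetical URIs, and does not come from any of the tools listed above.

```python
import json


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON result document into a list of {variable: value} rows."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable may be unbound in a given solution, so it is skipped if missing.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hand-written sample response with hypothetical URIs.
sample = """
{
  "head": {"vars": ["dataset", "title"]},
  "results": {"bindings": [
    {"dataset": {"type": "uri", "value": "http://example.org/dataset/1"},
     "title": {"type": "literal", "xml:lang": "de", "value": "Beispiel"}},
    {"dataset": {"type": "uri", "value": "http://example.org/dataset/2"}}
  ]}
}
"""

for row in bindings_to_rows(sample):
    print(row)
```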