1. ULI meeting – 2013/05/28 – Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
AKSW, Universität Leipzig
Sebastian Hellmann
Linked Data for Abbreviations and Segmentation
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
Sebastian Hellmann – researcher working on LOD2 EU Project
AKSW – Agile Knowledge and the Semantic Web research group in Leipzig -
http://aksw.org
InfAI – Institute for Applied Informatics - http://infai.org
Contents:
• Introduction to Linked Data
• Linked data close-up: DBpedia data set
• Exploitation of free and open data for CLDR
• Collaboration points
Introduction
http://lod-cloud.net
Linked Open Data
- All datasets provide open access to individual records via HTTP
- Many are free (no payment required, as in royalty-free)
- Some are openly licensed, e.g. CC-0 or CC-BY-SA
=> Open access also applies to published HTML on the WWW, but here the data
itself is published unrendered via RDF
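The difference between rendered HTML and unrendered RDF can be illustrated with HTTP content negotiation. The following is a minimal sketch that only builds the request and does not send it. The helper name `rdf_request` and the choice of Turtle as the serialization are assumptions for illustration; whether a given dataset honours the `Accept` header depends on its server.

```python
from urllib.request import Request


def rdf_request(resource_uri: str, mime_type: str = "text/turtle") -> Request:
    """Build a GET request that asks for RDF instead of rendered HTML.

    `rdf_request` is a hypothetical helper, not part of any Linked Data library.
    """
    return Request(resource_uri, headers={"Accept": mime_type})


req = rdf_request("http://dbpedia.org/resource/Berlin")
# Sending the request would be: urllib.request.urlopen(req).read()
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The same URI identifies the record in both cases; only the requested representation changes.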
• DBpedia is a crowd-sourced community effort to extract structured
information from Wikipedia and make this information available on the
Web.
• allows sophisticated queries against Wikipedia content
• allows links from the different data sets on the Web to Wikipedia data
• data is extracted continuously: http://live.dbpedia.org
• Wikidata will be integrated within the next four months
via a Google Summer of Code project
http://dbpedia.org
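As an illustration of what a query against Wikipedia content could look like, the sketch below builds a SPARQL request URL for the English abstract of Berlin. It assumes the public endpoint at `http://dbpedia.org/sparql` and the `dbpedia.org/ontology/abstract` property; the `format` parameter is an endpoint-specific convenience and may not be accepted by other SPARQL servers. The helper name `sparql_url` is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = "en")
}
""".strip()


def sparql_url(query: str, endpoint: str = SPARQL_ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL (hypothetical helper)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = sparql_url(QUERY)
print(url)
```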
http://dbpedia.org/resource/Berlin
First paragraph in more
than 20 languages
http://dbpedia.org/resource/Berlin
Facts from Wikipedia infoboxes
• DBpedia Extraction Framework can be extended to easily extract any data
from Wikipedia: https://github.com/dbpedia/extraction-framework
• We are using it to extract corpora for NLP
• e.g. URI, surrounding text, surface form
• Probabilities:
• P(sf|URI): probability that wikipedia:Apple_Inc. appears as “apple” in text
• P(URI|sf): probability that “apple” in text refers to wikipedia:Apple_Inc.
Trend 2: DBpedia 4 NLP
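The two conditional probabilities can be estimated from counts of (surface form, URI) pairs. The sketch below uses a handful of made-up pairs purely for illustration; the data, the variable names and the plain relative-frequency estimate are assumptions, not the method used by the extraction framework.

```python
from collections import Counter

# Hypothetical (surface form, URI) pairs, e.g. harvested from wiki links.
links = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(links)
sf_counts = Counter(sf for sf, _ in links)
uri_counts = Counter(uri for _, uri in links)


def p_uri_given_sf(uri: str, sf: str) -> float:
    """P(URI|sf): how often the surface form links to this URI."""
    return pair_counts[(sf, uri)] / sf_counts[sf]


def p_sf_given_uri(sf: str, uri: str) -> float:
    """P(sf|URI): how often this URI is written with the surface form."""
    return pair_counts[(sf, uri)] / uri_counts[uri]


print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
```

Both functions divide by an observed count, so they raise `ZeroDivisionError` for a surface form or URI that never occurs in the data; a real estimator would need smoothing or an explicit fallback.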
• DBpedia is a data dissemination project:
• as download for reuse
• as Linked Data for interlinking
• Corpora will be published via the NLP Interchange RDF Format (NIF) -
http://nlp2rdf.org
Trend 2: DBpedia 4 NLP
DBpedia Live Abbreviation Example
Up-to-date gazetteer
- The AfD party was founded earlier this year.
- lexical information and statistics could be included
• DBpedia
• Main version and I18n chapters
• http://dbpedia.org/Datasets/NLP
• Wiktionary 2 RDF: http://dbpedia.org/Wiktionary
• Wortschatz from Uni Leipzig (planned as Linked Data)
• http://corpora.informatik.uni-leipzig.de/download.html
• JRC Names: http://langtech.jrc.it/JRC-Names.html
• JRC-Names is a highly multilingual named entity resource for person and
organisation names
• Lexvo.org:
• provides URIs for ISO 639-3
• http://lexvo.org/id/iso639-3/spa
Example data sets from LLOD
http://linguistics.okfn.org/resources/llod/
=> CLDR will make an excellent addition to LLOD
Linguistic LOD
• CLDR as Linked Data
• empowers third parties to link to your authoritative data
• links are reusable
• LIDER EU project (presumably starting in October) will provide some
support for linked data adopters
• ULI members can join the industry and advisory board
• Workshop “DBpedia & NLP” in Oct, 2013
• http://nlp-dbpedia2013.blogs.aksw.org/
• Creation of free and open benchmarks in RDF
• We could promote CLDR and collect contributions
Collaboration points I
• Personally, I can:
• Join ULI mailing list
• Look out for appropriate data
• Look for opportunities (e.g. synergies with other projects)
• Provide some counseling (e.g. pointers, technology Q&A)
=> this will be done as preparation for the LIDER EU project, CLDR
• Academic collaboration:
• Excellent PhD student topic: Create corpora, interlink and fuse data and
benchmark effectiveness for segmentation
• Provide knowledge transfer (e.g. tutorials, visits)
Collaboration points II
Open Community – All feedback is welcome!
http://slideshare.net/kurzum
Websites:
http://dbpedia.org
http://nlp2rdf.org
http://lod2.eu
Thanks for your attention
LOD2 EU Project produces LOD2 Stack.
Three requirements to unlock Natural Language Processing (NLP) for the project:
1. NLP tool output is required to be in RDF
2. Scalability (fewer triples, focus on usefulness)
3. Common vocabulary to integrate and use NLP tools
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
• Version 1.0 published in November 2011
• Version 2.0 is scheduled for completion within 2013
NLP Interchange Format 2.0
Addressing Primary Data
NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
NIF 2.0 uses RFC 5147:
http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
User extensions possible:
http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme
(but you have to link to documentation on how it was created)
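The two addressing schemes can be sketched as small string helpers. The function names are hypothetical, and the fragment shapes follow only the two example URIs above. The resolver assumes the RFC 5147 convention that character positions count the gaps between characters starting at 0, which lines up with Python slicing; it ignores the other features of RFC 5147 (line-based fragments, omitted end positions, integrity checks).

```python
def nif1_uri(document_uri: str, begin: int, end: int) -> str:
    """Offset-based fragment in the style shown for NIF 1.0."""
    return f"{document_uri}#offset_{begin}_{end}"


def nif2_uri(document_uri: str, begin: int, end: int) -> str:
    """RFC 5147 character-range fragment, as used by NIF 2.0."""
    return f"{document_uri}#char={begin},{end}"


def resolve_char_fragment(uri: str, text: str) -> str:
    """Return the substring of `text` addressed by a `#char=b,e` URI."""
    fragment = uri.split("#", 1)[1]
    scheme, _, positions = fragment.partition("=")
    if scheme != "char":
        raise ValueError(f"unsupported fragment scheme: {scheme!r}")
    begin, end = (int(p) for p in positions.split(","))
    return text[begin:end]


doc = "http://www.w3.org/DesignIssues/LinkedData.html"
print(nif1_uri(doc, 717, 729))
print(nif2_uri(doc, 717, 729))
print(resolve_char_fragment("http://example.org/doc#char=0,6", "Linked Data"))
```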
As a Web Service
curl \
  --data-urlencode prefix="http://prefix.given.by/theClient#" \
  --data-urlencode input="[...]" \
  http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
Optional extra parameter:
  --data-urlencode source="http://www.w3.org/DesignIssues/LinkedData.html"
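The same call can be sketched from Python with the standard library. The parameter names `prefix`, `input` and `source` are taken from the curl command; the helper name `nif_request` and the sample sentence are assumptions for illustration. The request is only built here, not sent, since the demo service may not be reachable.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"


def nif_request(text: str, prefix: str, source: str = None) -> Request:
    """Build the form-encoded POST request (hypothetical helper)."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        params["source"] = source  # optional, as in the curl command
    body = urlencode(params).encode("ascii")
    # Passing `data` makes urllib issue a POST, like curl's --data-urlencode.
    return Request(SERVICE, data=body)


req = nif_request(
    "Linked Data is published as RDF.",  # made-up sample input
    "http://prefix.given.by/theClient#",
)
# Sending the request would be: urllib.request.urlopen(req).read()
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```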
• Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
• Russian TreeTagger :
http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
• German STTS: http://purl.org/olia/stts.owl#VAPP
• English Penn: http://purl.org/olia/penn.owl#VBG
→ all map to http://purl.org/olia/olia.owl#NonFiniteVerb
Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free
and open, CC-BY)
Vocabulary Module: OLiA
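The effect of these mappings can be sketched as a flat lookup table. This is a simplification for illustration only: OLiA itself expresses the mappings as OWL models, not as a Python dictionary, and the names `TAG_TO_OLIA` and `same_category` are hypothetical. The four tag URIs and the target concept are the ones listed above.

```python
OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# Tagset-specific tags mapped to their shared OLiA reference concept.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}


def same_category(tag_a: str, tag_b: str) -> bool:
    """True if both tags are known and map to the same OLiA concept."""
    concept = TAG_TO_OLIA.get(tag_a)
    return concept is not None and concept == TAG_TO_OLIA.get(tag_b)


print(same_category("http://purl.org/olia/stts.owl#VAPP",
                    "http://purl.org/olia/penn.owl#VBG"))
```

A tool that only understands the OLiA reference concepts can thus consume output from taggers with different tagsets without knowing each tagset.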
• NIF 2.0 aims to be compatible with (Vocabulary Module):
• ITS 2.0
• FISE used in Apache Stanbol (IKS-EU Project)
• LAF/GrAF XML – ISO standard, recently published
• Fragment Identifiers by IETF and W3C
• Lemon ontology from Monnet EU Project
• NERD ontology from EURECOM and LinkedTV EU Project
• XPointer/XPath URI scheme
• Open Annotation
NIF 2.0 - plans
NIF 2.0 :
• NIF is free and open (CC-0 or CC-BY)
• All ontologies will be hosted persistently by Universität Leipzig
• Sign up on the mailing list at http://nlp2rdf.org
• Provide Use Cases, Requirements, Implementations at:
• http://wiki.nlp2rdf.org/wiki/Use_cases#Use_cases
• http://wiki.nlp2rdf.org/wiki/Requirements#Requirements
How you can contribute:
LOD2 Stack
• The project is currently at its halfway point
• Most of the tools are free and open source
• Commercial rollout planned
• Many webinars available
• You can integrate your tool via Debian package
http://lod2.eu
http://stack.lod2.eu/
How you can contribute: