NIF Tutorial – 2013/09/24 – Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
AKSW, Universität Leipzig
Sebastian Hellmann
Content Analysis
and the Semantic Web
NIF 2.0 Tutorial
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
Sebastian Hellmann – researcher working on LOD2 EU Project
AKSW – Agile Knowledge and the Semantic Web research group in Leipzig -
http://aksw.org
InfAI – Institute for Applied Informatics - http://infai.org
ALL DEMOS ARE AVAILABLE AT:
http://nlp2rdf.org/leipzig-24-9-2013
Introduction
End users have tasks for NLP, but:
Each new tool is a challenge:
• How to download and start it?
• What kind of annotations does it use?
• How well does it perform (on my domain)?
• If badly, are there any alternatives? How can I find them?
• Open source?
• Lots of know-how needed to exploit NLP.
• Lots of data needed to exploit NLP.
Barriers to NLP
The Semantic Gap
• Part 1: exploiting free, open and interoperable (FOI)
language resources
• Part 2: Connecting text to these resources
• Part 3: tools, demos, infrastructure
From a walled garden to
an interoperable infrastructure
• Part 1: exploiting free, open and interoperable (FOI)
language resources
From a walled garden to
an interoperable infrastructure
http://lod-cloud.net
Linguistic/NLP Data currently filed
under “cross-domain”
http://lod-cloud.net
Linked Open Data
- All datasets provide open access to individual records via HTTP
- Many are free (no payment required, as in royalty-free)
- Some are openly licensed, e.g. CC-0 or CC-BY-SA
=> Open access also applies to published HTML on the WWW, but in LOD the data
itself is published unrendered via RDF
Question:
• Who knows how to add a new bubble to the LOD cloud?
From a walled garden to
an interoperable infrastructure
• Who knows how to add a new bubble to the LOD cloud?
http://datahub.io/group/linguistics
https://github.com/jmccrae/llod-cloud.py
http://validator.lod-cloud.net/validate.php
From a walled garden to
an interoperable infrastructure
Question:
• What are the most important data sets and ontologies for NLP?
• Who has used what?
FOI data
Analysis of mentions of Wikipedia / DBpedia at LREC 2012:
• https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2
→ 163 papers
• https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2
→ 24 papers
FOI data 1: Wikipedia / DBpedia
• Training data for NLP, e.g. URI, surrounding text, surface form
• Probabilities:
• P(sf|URI): probability that wikipedia:Apple_Inc. is written as “apple” in text
• P(URI|sf): probability that “apple” in text refers to wikipedia:Apple_Inc.
FOI data 1: Wikipedia / DBpedia
http://wiki.dbpedia.org/Datasets/NLP
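The two probabilities can be estimated from counts of (surface form, URI) link pairs. The following is a minimal sketch only: the toy mentions and their counts are invented for illustration and are not taken from the DBpedia NLP datasets, and the function names are hypothetical.

```python
from collections import Counter

# Toy (surface form, URI) pairs. These counts are invented for illustration.
mentions = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(mentions)
sf_counts = Counter(sf for sf, _ in mentions)
uri_counts = Counter(uri for _, uri in mentions)


def p_uri_given_sf(uri, sf):
    """P(URI|sf): share of occurrences of surface form `sf` that link to `uri`."""
    return pair_counts[(sf, uri)] / sf_counts[sf]


def p_sf_given_uri(sf, uri):
    """P(sf|URI): share of links to `uri` that use surface form `sf`."""
    return pair_counts[(sf, uri)] / uri_counts[uri]


print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
```

In this toy data the two values differ, which is the point of keeping both: one answers "what does this string usually mean?", the other "how is this entity usually written?".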
FOI data: Wikipedia / DBpedia
http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=sodium
Available data for “Sodium”
http://dbpedia.org/snorql
select ?labels where {
<http://dbpedia.org/resource/Sodium> rdfs:label ?labels .
} LIMIT 100
select ?altlabel where {
?redirect dbpedia-owl:wikiPageRedirects <http://dbpedia.org/resource/Sodium> .
?redirect rdfs:label ?altlabel .
} LIMIT 100
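Outside the SNORQL web form, the same queries need explicit PREFIX declarations. A minimal sketch of how the redirect-label query could be sent from a script, assuming the public endpoint lives at http://dbpedia.org/sparql and that the two prefixes map to the namespaces shown (both are assumptions, not stated on the slide). The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

# SNORQL predefines these prefixes; a standalone query must declare them.
PREFIXES = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
"""

ALT_LABEL_QUERY = PREFIXES + """
SELECT ?altlabel WHERE {
  ?redirect dbpedia-owl:wikiPageRedirects <http://dbpedia.org/resource/Sodium> .
  ?redirect rdfs:label ?altlabel .
} LIMIT 100
"""


def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_request(ALT_LABEL_QUERY)
# To run it: urllib.request.urlopen(request), then parse the JSON response body.
```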
http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
Wiktionary2RDF – Mediator Wrapper
http://dbpedia.org/Wiktionary
Mediator
Lemon
Wiktionary2RDF – Mediator Wrapper
http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
https://en.wiktionary.org/wiki/sodium#English
http://wiktionary.dbpedia.org/resource/sodium
Lemon Ontology - http://lemon-model.net
IntersectiveDataPropertyAdjective ("extinct" ,
dbpedia:conservationStatus ,"EX")
IntersectiveDataPropertyAdjective ("endangered" ,
dbpedia:conservationStatus ,"EN")
https://github.com/cunger/lemon.dbpedia
Christina Unger, John McCrae, Sebastian Walter, Sara Winter and Philipp Cimiano (2013):
A lemon lexicon for DBpedia. NLP & DBpedia Workshop
• Part 2: Connecting text to these resources
From a walled garden to
an interoperable infrastructure
From a walled garden to
an interoperable infrastructure
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
From a walled garden to
an interoperable infrastructure
Overview of existing tools:
• http://en.wikipedia.org/wiki/Knowledge_extraction#Tools
From a walled garden to
an interoperable infrastructure
Developer's nightmare:
• All tools belong to a similar class of NLP tools
→ Wikifier or Named Entity Linking, SOA principle
But they all have:
• Heterogeneous output formats (JSON, XML)
• Heterogeneous API parameters
• Heterogeneous ways of annotating text:
• Some remove HTML internally, offsets not usable
• Some use byte offset instead of char offset
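The byte-versus-character mismatch is easy to reproduce. A minimal sketch, assuming the tool reports UTF-8 byte offsets; the helper name byte_to_char_offset is hypothetical.

```python
def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Convert a byte offset into the encoded text to a character offset."""
    prefix = text.encode(encoding)[:byte_offset]
    # Raises UnicodeDecodeError if the byte offset falls inside a character.
    return len(prefix.decode(encoding))


text = "Universität Leipzig"

# A tool counting bytes reports where "Leipzig" starts in the UTF-8 data.
byte_start = text.encode("utf-8").index("Leipzig".encode("utf-8"))
char_start = byte_to_char_offset(text, byte_start)

# 'ä' occupies two bytes in UTF-8, so the two offsets differ by one.
print(byte_start, char_start)
print(text[char_start:char_start + len("Leipzig")])
```

Using the byte offset directly as a string index would select the wrong span as soon as any non-ASCII character precedes the annotation.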
From a walled garden to
an interoperable infrastructure
Demo
• http://rdface.aksw.org/new/tinymce/examples/rdface.html
ITS 2.0 - http://www.w3.org/TR/its20/
The Internationalization Tag Set (ITS) 2.0 – enhances the foundation to
integrate automated processing of human language into core Web
technologies.
• Currently last call
• Driven by localization industry
• Embed translation aids into HTML and XML
• Robust way to encode NLP information in HTML
• ITS 2.0 describes 20 data categories → ontology
NIF overview
Summary
• Motivated the Walled Garden problem
• Overview of the emerging Web of Language resources
• Motivated the NLP tool heterogeneity problem
• Introduced the ITS 2.0 use case for NIF
• Now: NIF 2.0
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to
achieve interoperability between Natural Language Processing (NLP) tools,
language resources and annotations.
• Reuse of existing standards such as RDF, OWL 2, the PROV Ontology, LAF
(ISO 24612), Unicode and RFC 5147
• Standardize access parameters, annotations (e.g. tokenization), validation
and log messages.
• A NIF workflow, however, can obviously not provide any better performance
(F-measure, speed) than a properly configured UIMA or GATE pipeline with
the same components.
• Lower entry barrier, easy data integration, reusability of tools and
conceptualisation, off-the-shelf solutions for common tasks.
NIF Overview
Relation of NIF to UIMA and GATE
• A Formal Framework for Linguistic Annotation (2000) by Steven Bird, Mark
Liberman
• take home message: generic annotation formats should be based on
graphs
• Ontologies in NIF (e.g. OLiA, lemon) can be hard-compiled for internal use (as
is done in Stanbol)
WP3 Task 3.2 – Community work: NLP2RDF
Not primarily aimed at
increasing features or
performance (F-Measure)
WP3 Task 3.2 – NIF overview
• NIF turns out to have a unique selling proposition regarding NLP and RDF
• NIF will be the recommended RDF conversion of the W3C Internationalization
Tag Set 2.0 (ITS 2.0) - http://www.w3.org/TR/its20/
• There was no alternative RDF vocabulary for this conversion available.
NIF Overview
WP3 Task 3.2 – Community work: NLP2RDF
RDFa parsers lose all provenance information:
<http://examples.com/books/wikinomics> dc:title "Wikinomics" .
https://en.wikipedia.org/wiki/RDFa
Available resources:
http://persistence.uni-leipzig.org/nlp2rdf/
Disclaimer
Migration to the online presence is still on-going, but there are 15 scientific
publications, e.g.
Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th
International Semantic Web Conference, 21-25 October 2013, Sydney, Australia, (2013) -
http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf
NIF Overview
Question:
• What is a String?
NIF Basics
Counting strings is more difficult than it seems:
• Three ways to count Unicode:
• Code Units
• Code Points
• Graphemes
• Encoding:
• UTF-8, 16, 32
NIF Basics Unicode
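The different counts can be compared directly. A minimal sketch: the helper count_units is hypothetical, and grapheme counting is left out because Python's standard library has no grapheme segmenter.

```python
def count_units(text):
    """Count one string in code points and in code units per encoding form."""
    return {
        # Python 3 strings are sequences of code points.
        "code_points": len(text),
        # UTF-8 uses 8-bit code units, so units equal bytes.
        "utf8_code_units": len(text.encode("utf-8")),
        # UTF-16 uses 16-bit code units (2 bytes each, no BOM with -le).
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # UTF-32 uses 32-bit code units (4 bytes each, no BOM with -le).
        "utf32_code_units": len(text.encode("utf-32-le")) // 4,
    }


for sample in ["a", "대", "\U0001F600", "e\u0301"]:
    print(repr(sample), count_units(sample))
```

The last sample, "e" followed by a combining acute accent, is two code points although a reader perceives one grapheme, which is why graphemes form a third way of counting.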
• Code Unit. The minimal bit combination that can represent a unit of encoded
text for processing or interchange. The Unicode Standard uses 8-bit code
units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding
form, and 32-bit code units in the UTF-32 encoding form.
• Code Point. (1) Any value in the Unicode codespace; that is, the range of
integers from 0 to 10FFFF₁₆. Not all code points are assigned to encoded
characters. See code point type. (2) A value, or position, for a character, in
any coded character set.
• Unicode Normal Form C
• http://unicode.org/reports/tr15/#Norm_Forms
Unicode
• Recommendation for RDF Literals
• http://unicode.org/reports/tr15/#Norm_Forms
Unicode Normal Form C
• NIF uses Unicode Normal Form C
• NIF counts in Code Points
Unicode
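Both rules together give a reproducible length for any string. A minimal sketch using Python's standard unicodedata module; the function name nif_length is hypothetical and not part of any NIF library.

```python
import unicodedata


def nif_length(text):
    """Length as NIF would count it: code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


composed = "Universität"              # 'ä' as the single code point U+00E4
decomposed = "Universita\u0308t"      # 'a' followed by combining diaeresis U+0308

print(len(composed), len(decomposed))                # raw code point counts differ
print(nif_length(composed), nif_length(decomposed))  # equal after NFC
```

Without normalization, two visually identical inputs would yield different offsets for every annotation after the first accented character.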
• Sadly, there are still implementation problems:
• Java length() vs. PHP strlen() function
• curl --data-urlencode i="대" -d f=text "http://nlp2rdf.lod2.eu/nif-ws.php"
• The Korean character is URL-encoded (#%EB%8C%80) and counted as 3
characters (PHP strlen() counts bytes, not code points)
Demo
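The discrepancy from the curl call can be reproduced offline. A minimal sketch: Java's String.length() counts UTF-16 code units and PHP's strlen() counts bytes, so the two disagree on this character; the variable names are illustrative.

```python
from urllib.parse import quote

char = "대"  # Hangul syllable U+B300

url_encoded = quote(char)                       # percent-encoded UTF-8 bytes
code_points = len(char)                         # what NIF counts
utf8_bytes = len(char.encode("utf-8"))          # what a byte-based strlen() reports
utf16_units = len(char.encode("utf-16-le")) // 2  # what Java's length() reports

print(url_encoded, code_points, utf8_bytes, utf16_units)
```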
• Now some RDF (finally):
• Note that in NIF the document is not the same as the content of the document.
• Two different documents can have the same content
=> they must not have the same URI
Context
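A context resource could be serialized roughly as follows. This is a sketch under assumptions: the nif-core namespace, the property names (nif:isString, nif:beginIndex, nif:endIndex), the classes nif:Context and nif:RFC5147String, and the RFC 5147 style #char=begin,end fragment reflect my reading of NIF 2.0 Core and should be checked against the ontology. The example.org document URIs and the helper names are hypothetical.

```python
# Assumption: namespace of the NIF 2.0 Core ontology.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def context_uri(doc_uri, text):
    """URI for the whole string of a document, using an RFC 5147 style fragment."""
    return f"{doc_uri}#char=0,{len(text)}"


def context_turtle(doc_uri, text):
    """Serialize a context resource as Turtle (handles only quotes and backslashes)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"@prefix nif: <{NIF}> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
        f"<{context_uri(doc_uri, text)}>\n"
        "    a nif:Context , nif:RFC5147String ;\n"
        f'    nif:isString "{escaped}" ;\n'
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;\n'
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .\n'
    )


content = "NIF is RDF."
# Same content, two documents: the context URIs stay distinct.
print(context_turtle("http://example.org/doc1", content))
print(context_uri("http://example.org/doc2", content))
```

Because the URI is derived from the document URI rather than from the text, identical content in two documents never collapses into one resource.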
Annotations
Tokenization
Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations.
Language Resources and Evaluation 46(1): 53-74 (2012)
NIF
Demo:
http://nlp2rdf.lod2.eu/demo.php
• SPARQL queries produce (find) errors
• http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t
• RLOG – An RDF Logging Ontology
• ./validate.jar -i nif-erroneous-model.ttl -t file
• Demo → character count
• Demo → all errors
Validation over specification
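The actual test suite is expressed as SPARQL queries with RLOG messages. As a rough Python analogue of one kind of check such a suite might perform (offsets versus annotated string), here is a minimal sketch; the function name, argument names and error messages are all hypothetical.

```python
def check_annotation(context, begin, end, anchor_of):
    """Return a list of error messages for one annotation (empty if it is valid)."""
    errors = []
    if not 0 <= begin <= end <= len(context):
        errors.append(
            f"indices {begin},{end} lie outside the context of length {len(context)}"
        )
    elif context[begin:end] != anchor_of:
        errors.append(
            f"anchor {anchor_of!r} does not match context[{begin}:{end}] "
            f"= {context[begin:end]!r}"
        )
    return errors


context = "Universität Leipzig"
print(check_annotation(context, 12, 19, "Leipzig"))  # correct character offsets
print(check_annotation(context, 13, 20, "Leipzig"))  # byte offsets used by mistake
```

A check like this catches exactly the character-count errors discussed earlier, since a byte-based offset points at the wrong span or past the end of the string.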
• http://www.w3.org/TR/its20/#conversion-to-nif
• http://www.w3.org/TR/its20/#nif-backconversion
NIF
• Demo
• Load Terminological model or Inference Model
Reasoning
Open Community – All feedback is welcome!
http://slideshare.net/kurzum
Websites:
http://dbpedia.org
http://nlp2rdf.org
http://lod2.eu
Thanks for your attention
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

NIF 2.0 Tutorial: Content Analysis and the Semantic Web

  • 1. NIF Tutorial – 2013/09/24 – Page 1 http://lod2.eu Creating Knowledge out of Interlinked Data LOD2 Presentation . 02.09.2010 . Page http://lod2.eu AKSW, Universität Leipzig Sebastian Hellmann Content Analysis and the Semantic Web NIF 2.0 Tutorial http://nlp2rdf.org http://lod2.eu http://slideshare.net/kurzum
  • 2. NIF Tutorial – 2013/09/24 – Page 2 http://lod2.eu Sebastian Hellmann – researcher working on LOD2 EU Project AKSW – Agile Knowledge and the Semantic Web research group in Leipzig - http://aksw.org InfAI – Institute for Applied Informatics - http://infai.org ALL DEMOS ARE AVAILABLE AT: http://nlp2rdf.org/leipzig-24-9-2013 Introduction
  • 3. NIF Tutorial – 2013/09/24 – Page 3 http://lod2.eu Introduction ALL DEMOS ARE AVAILABLE AT: http://nlp2rdf.org/leipzig-24-9-2013
  • 4. NIF Tutorial – 2013/09/24 – Page 4 http://lod2.eu End users have tasks for NLP, but: Each new tool is a challenge: • How to download and start it? • What kind of annotations does it use? • How well does it perform (on my domain)? • If badly, are there any alternatives? How can I find them? • Is it open source? • Lots of know-how needed to exploit NLP. • Lots of data needed to exploit NLP. Barriers to NLP
  • 5. NIF Tutorial – 2013/09/24 – Page 5 http://lod2.eu The Semantic Gap
  • 6. NIF Tutorial – 2013/09/24 – Page 6 http://lod2.eu
  • 7. NIF Tutorial – 2013/09/24 – Page 7 http://lod2.eu • Part 1: exploiting free, open and interoperable (FOI) language resources • Part 2: Connecting text to these resources • Part 3: tools, demos, infrastructure From a walled garden to an interoperable infrastructure
  • 8. NIF Tutorial – 2013/09/24 – Page 8 http://lod2.eu • Part 1: exploiting free, open and interoperable (FOI) language resources From a walled garden to an interoperable infrastructure
  • 9. NIF Tutorial – 2013/09/24 – Page 9 http://lod2.eu http://lod-cloud.net Linguistic/NLP Data currently filed under “cross-domain”
  • 10. NIF Tutorial – 2013/09/24 – Page 10 http://lod2.eu http://lod-cloud.net Linked Open Data - All datasets provide open access to individual records via HTTP - Many are free (no payment required, as in royalty-free) - Some are openly licensed, e.g. CC-0 or CC-BY-SA => Open access also applies to published HTML on the WWW, but in LOD the data itself is published unrendered via RDF
  • 11. NIF Tutorial – 2013/09/24 – Page 11 http://lod2.eu Question: • Who knows how to add a new bubble to the LOD cloud? From a walled garden to an interoperable infrastructure
  • 12. NIF Tutorial – 2013/09/24 – Page 12 http://lod2.eu • Who knows how to add a new bubble to the LOD cloud? http://datahub.io/group/linguistics https://github.com/jmccrae/llod-cloud.py http://validator.lod-cloud.net/validate.php From a walled garden to an interoperable infrastructure
  • 13. NIF Tutorial – 2013/09/24 – Page 13 http://lod2.eu
  • 14. NIF Tutorial – 2013/09/24 – Page 14 http://lod2.eu
  • 15. NIF Tutorial – 2013/09/24 – Page 15 http://lod2.eu Question: • What are the most important data sets and ontologies for NLP? • Who has used what? FOI data
  • 16. NIF Tutorial – 2013/09/24 – Page 16 http://lod2.eu Analysis of mentions of Wikipedia / DBpedia at LREC 2012: • https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2 → 163 papers • https://www.google.com/webhp?q=site:http%3A%2F%2Fwww.lrec-conf.org%2 → 24 papers FOI data 1: Wikipedia / DBpedia
  • 17. NIF Tutorial – 2013/09/24 – Page 17 http://lod2.eu • Training data for NLP, e.g. URI, surrounding text, surface form (sf) • Probabilities: • P(sf|URI): probability that wikipedia:Apple_Inc. appears as “apple” in text • P(URI|sf): probability that “apple” in text refers to wikipedia:Apple_Inc. FOI data 1: Wikipedia / DBpedia http://wiki.dbpedia.org/Datasets/NLP
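The two probabilities on this slide can be estimated from counts of (surface form, URI) pairs, for example taken from link anchor texts. The following is a minimal sketch; the observations are made up for illustration, and the function names are hypothetical, not part of the DBpedia NLP datasets.

```python
from collections import Counter

# Hypothetical observations: (surface form, linked URI) pairs.
observations = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(observations)
sf_counts = Counter(sf for sf, _ in observations)
uri_counts = Counter(uri for _, uri in observations)


def p_uri_given_sf(uri, sf):
    """P(URI|sf): how often this surface form links to this URI."""
    total = sf_counts[sf]
    return pair_counts[(sf, uri)] / total if total else 0.0


def p_sf_given_uri(sf, uri):
    """P(sf|URI): how often this URI is written with this surface form."""
    total = uri_counts[uri]
    return pair_counts[(sf, uri)] / total if total else 0.0


print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
```

With these toy counts, "apple" points to wikipedia:Apple_Inc. in two of three cases, while wikipedia:Apple_Inc. is written as "apple" in two of four cases.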
  • 18. NIF Tutorial – 2013/09/24 – Page 18 http://lod2.eu FOI data: Wikipedia / DBpedia Available data for “Sodium” http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=sodium http://dbpedia.org/snorql select ?labels where { <http://dbpedia.org/resource/Sodium> rdfs:label ?labels . } LIMIT 100 select ?altlabel where { ?redirect dbpedia-owl:wikiPageRedirects <http://dbpedia.org/resource/Sodium> . ?redirect rdfs:label ?altlabel . } LIMIT 100 http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN
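The two queries from this slide can also be sent to a SPARQL endpoint programmatically. The sketch below only builds the request URLs and does not perform any network call. The endpoint address http://dbpedia.org/sparql, the `format` parameter and the explicit PREFIX declarations are assumptions (the SNORQL page predefines the prefixes), and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; the slide itself only links the SNORQL query form.
ENDPOINT = "http://dbpedia.org/sparql"

PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
)


def labels_query(resource):
    """Labels attached directly to the resource."""
    return PREFIXES + (
        "select ?labels where { <%s> rdfs:label ?labels . } LIMIT 100" % resource
    )


def redirect_labels_query(resource):
    """Labels of pages redirecting to the resource (alternative names)."""
    return PREFIXES + (
        "select ?altlabel where { "
        "?redirect dbpedia-owl:wikiPageRedirects <%s> . "
        "?redirect rdfs:label ?altlabel . } LIMIT 100" % resource
    )


def request_url(query):
    """GET URL for the query; the 'format' parameter is an assumption."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


sodium = "http://dbpedia.org/resource/Sodium"
print(request_url(labels_query(sodium)))
print(request_url(redirect_labels_query(sodium)))
```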
  • 19. NIF Tutorial – 2013/09/24 – Page 19 http://lod2.eu Wiktionary2RDF – Mediator Wrapper http://dbpedia.org/Wiktionary
  • 20. NIF Tutorial – 2013/09/24 – Page 20 http://lod2.eu http://dbpedia.org/Wiktionary
  • 21. NIF Tutorial – 2013/09/24 – Page 21 http://lod2.eu http://dbpedia.org/Wiktionary
  • 22. NIF Tutorial – 2013/09/24 – Page 22 http://lod2.eu Wiktionary2RDF – Mediator Wrapper http://dbpedia.org/Wiktionary Mediator Lemon
  • 23. NIF Tutorial – 2013/09/24 – Page 23 http://lod2.eu Wiktionary2RDF – Mediator Wrapper http://lcl.uniroma1.it/babelnet/explore.jsp?word=sodium&lang=EN https://en.wiktionary.org/wiki/sodium#English http://wiktionary.dbpedia.org/resource/sodium
  • 24. NIF Tutorial – 2013/09/24 – Page 24 http://lod2.eu Lemon Ontology - http://lemon-model.net
  • 25. NIF Tutorial – 2013/09/24 – Page 25 http://lod2.eu Lemon Ontology - http://lemon-model.net IntersectiveDataPropertyAdjective ("extinct" , dbpedia:conservationStatus ,"EX") IntersectiveDataPropertyAdjective ("endangered" , dbpedia:conservationStatus ,"EN") https://github.com/cunger/lemon.dbpedia Christina Unger, John McCrae, Sebastian Walter, Sara Winter and Philipp Cimiano (2013): A lemon lexicon for DBpedia. NLP & DBpedia Workshop
  • 26. NIF Tutorial – 2013/09/24 – Page 26 http://lod2.eu • Part 2: Connecting text to these resources From a walled garden to an interoperable infrastructure
  • 27. NIF Tutorial – 2013/09/24 – Page 27 http://lod2.eu From a walled garden to an interoperable infrastructure https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
  • 28. NIF Tutorial – 2013/09/24 – Page 28 http://lod2.eu From a walled garden to an interoperable infrastructure Overview of existing tools: • http://en.wikipedia.org/wiki/Knowledge_extraction#Tools
  • 29. NIF Tutorial – 2013/09/24 – Page 29 http://lod2.eu From a walled garden to an interoperable infrastructure Developer's nightmare: • All tools belong to a similar class of NLP tools → Wikifier or Named Entity Linking, SOA principle But they all have: • Heterogeneous output formats (JSON, XML) • Heterogeneous API parameters • Heterogeneous ways of annotating text: • Some remove HTML internally, so offsets are not usable • Some use byte offsets instead of character offsets
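The byte offset versus character offset problem from the last bullet can be reproduced in a few lines. This is a minimal sketch with a made-up sentence; UTF-8 is assumed as the byte encoding, and the helper names are hypothetical.

```python
text = "Zürich liegt in der Schweiz."
mention = "Schweiz"


def char_to_byte_offset(s, char_offset):
    """UTF-8 byte offset corresponding to a character offset."""
    return len(s[:char_offset].encode("utf-8"))


def byte_to_char_offset(s, byte_offset):
    """Character offset for a UTF-8 byte offset.

    Raises UnicodeDecodeError if the byte offset falls inside a
    multi-byte character.
    """
    return len(s.encode("utf-8")[:byte_offset].decode("utf-8"))


char_start = text.index(mention)
byte_start = char_to_byte_offset(text, char_start)

# "ü" takes two bytes in UTF-8, so the two offsets differ by one.
print(char_start, byte_start)

# Using the byte offset as if it were a character offset cuts the wrong span.
print(text[byte_start:byte_start + len(mention)])
print(text[char_start:char_start + len(mention)])
```

A client that mixes tools with different offset conventions has to convert explicitly, and it needs the exact original text (before any HTML stripping) to do so.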
  • 30. NIF Tutorial – 2013/09/24 – Page 30 http://lod2.eu From a walled garden to an interoperable infrastructure Demo • http://rdface.aksw.org/new/tinymce/examples/rdface.html
  • 31. NIF Tutorial – 2013/09/24 – Page 31 http://lod2.eu ITS 2.0 - http://www.w3.org/TR/its20/ The Internationalization Tag Set (ITS) 2.0 – enhances the foundation to integrate automated processing of human language into core Web technologies. • Currently last call • Driven by localization industry • Embed translation aids into HTML and XML • Robust way to encode NLP information in HTML • ITS 2.0 describes 20 data categories → ontology
  • 32. NIF Tutorial – 2013/09/24 – Page 32 http://lod2.eu NIF overview Summary • Motivated the Walled Garden problem • Overview of the emerging Web of Language resources • Motivated the NLP tool heterogeneity problem • Introduction of ITS 2.0 Use case for NIF • Now: NIF 2.0
  • 33. NIF Tutorial – 2013/09/24 – Page 33 http://lod2.eu The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. • Reuse of existing standards such as RDF, OWL 2, the PROV Ontology, LAF (ISO 24612), Unicode and RFC 5147 • Standardize access parameters, annotations (e.g. tokenization), validation and log messages. • A NIF workflow, however, can obviously not provide any better performance (F-measure, speed) than a properly configured UIMA or GATE pipeline with the same components. • Lower entry barrier, easy data integration, reusability of tools and conceptualisation, off-the-shelf solutions for common tasks. NIF Overview
  • 34. NIF Tutorial – 2013/09/24 – Page 34 http://lod2.eu Relation of NIF to UIMA and GATE • A Formal Framework for Linguistic Annotation (2000) by Steven Bird, Mark Liberman • take-home message: generic annotation formats should be based on graphs • Ontologies in NIF (e.g. OLiA, lemon) can be hard-compiled for internal use (as is done in Stanbol) WP3 Task 3.2 – Community work: NLP2RDF Not primarily aimed at increasing features or performance (F-Measure)
  • 35. NIF Tutorial – 2013/09/24 – Page 35 http://lod2.eu WP3 Task 3.2 – NIF overview
  • 36. NIF Tutorial – 2013/09/24 – Page 36 http://lod2.eu • NIF turns out to have a unique selling proposition regarding NLP and RDF • NIF will be the recommended RDF conversion of the W3C Internationalization Tag Set 2.0 (ITS 2.0) - http://www.w3.org/TR/its20/ • No alternative RDF vocabulary was available for this conversion. NIF Overview
  • 37. NIF Tutorial – 2013/09/24 – Page 37 http://lod2.eu WP3 Task 3.2 – Community work: NLP2RDF RDFa parsers lose all provenance information: <http://examples.com/books/wikinomics> dc:title "Wikinomics" . https://en.wikipedia.org/wiki/RDFa
  • 38. NIF Tutorial – 2013/09/24 – Page 38 http://lod2.eu Available resources: http://persistence.uni-leipzig.org/nlp2rdf/ Disclaimer Migration to the online presence is still on-going, but there are 15 scientific publications, e.g. Integrating NLP using Linked Data. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer. 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia, (2013) - http://svn.aksw.org/papers/2013/ISWC_NIF/public.pdf NIF Overview
  • 39. NIF Tutorial – 2013/09/24 – Page 39 http://lod2.eu Question: • What is a String? NIF Basics
  • 40. NIF Tutorial – 2013/09/24 – Page 40 http://lod2.eu Counting the characters of a string is more difficult than it seems: • Three ways to count Unicode: • Code Units • Code Points • Graphemes • Encoding: • UTF-8, UTF-16, UTF-32 NIF Basics Unicode
  • 41. NIF Tutorial – 2013/09/24 – Page 41 http://lod2.eu • Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. • Code Point. (1) Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF (hexadecimal). Not all code points are assigned to encoded characters. See code point type. (2) A value, or position, for a character, in any coded character set. • Unicode Normal Form C • http://unicode.org/reports/tr15/#Norm_Forms Unicode
  • 42. NIF Tutorial – 2013/09/24 – Page 42 http://lod2.eu • Recommendation for RDF Literals • http://unicode.org/reports/tr15/#Norm_Forms Unicode Normal Form C
  • 43. NIF Tutorial – 2013/09/24 – Page 43 http://lod2.eu • NIF uses Unicode Normal Form C • NIF counts in Code Points Unicode
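The difference between code units and code points, and the effect of Normal Form C, can be checked with the Python standard library. This is an illustrative sketch, not part of the NIF tooling; the helper name `counts` is hypothetical.

```python
import unicodedata


def counts(s):
    """Length of a string in code points and in UTF-8 / UTF-16 code units."""
    return {
        "code_points": len(s),  # Python 3 strings are indexed by code point
        "utf8_code_units": len(s.encode("utf-8")),
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
    }


# "e" followed by COMBINING ACUTE ACCENT: one grapheme, two code points.
decomposed = "e\u0301"
# Normal Form C composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(counts(decomposed))
print(counts(composed))

# A character outside the Basic Multilingual Plane needs two UTF-16 code units.
print(counts("\U0001F600"))
```

Normalizing to NFC before counting code points means that both spellings of "é" get the same length and therefore the same offsets.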
  • 44. NIF Tutorial – 2013/09/24 – Page 44 http://lod2.eu • Sadly, there are still implementation problems: • Java length() counts UTF-16 code units, PHP strlen() counts bytes • curl --data-urlencode i="대" -d f=text "http://nlp2rdf.lod2.eu/nif-ws.php" • The Korean character is URL-encoded (%EB%8C%80) and counted as 3 characters in PHP, although it is a single code point Demo ALL DEMOS ARE AVAILABLE AT: http://nlp2rdf.org/leipzig-24-9-2013
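The numbers in this demo can be reproduced without calling the web service. The sketch below percent-encodes the Korean character the way `curl --data-urlencode` would (UTF-8 is assumed) and compares the byte count with the code point count; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

char = "대"  # U+B300, the character used in the demo

# Percent-encoding works on the UTF-8 bytes of the character.
encoded = quote(char)

# A byte-oriented length function sees three bytes.
byte_count = len(char.encode("utf-8"))

# Decoding first and then counting code points gives the length NIF expects.
code_point_count = len(unquote(encoded))

print(encoded, byte_count, code_point_count)
```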
  • 45. NIF Tutorial – 2013/09/24 – Page 45 http://lod2.eu • Now some RDF (finally): • Note that in NIF the document is not the same as the content of the document. • Two different documents can have the same content, so they must not have the same URI Context
  • 46. NIF Tutorial – 2013/09/24 – Page 46 http://lod2.eu Annotations
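Since the RDF on the Context and Annotations slides was shown as images, here is a rough sketch of what such triples can look like, generated as Turtle text. It assumes the NIF 2.0 Core namespace and the RFC 5147 `#char=begin,end` URI scheme; the document URI and the sentence are made up, the helper names are hypothetical, and JSON string escaping is used as an approximation of Turtle literal escaping.

```python
import json

PREFIXES = (
    "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def char_uri(doc_uri, begin, end):
    """RFC 5147 style fragment identifying a character range of a document."""
    return f"{doc_uri}#char={begin},{end}"


def literal(s):
    # JSON escaping is close enough to Turtle escaping for this sketch.
    return json.dumps(s, ensure_ascii=False)


def index(n):
    return f'"{n}"^^xsd:nonNegativeInteger'


def context_turtle(doc_uri, text):
    """The whole text as nif:Context; the URI depends on the document, not the content."""
    return (
        f"<{char_uri(doc_uri, 0, len(text))}>\n"
        f"    a nif:Context , nif:RFC5147String ;\n"
        f"    nif:isString {literal(text)} ;\n"
        f"    nif:beginIndex {index(0)} ;\n"
        f"    nif:endIndex {index(len(text))} .\n"
    )


def annotation_turtle(doc_uri, text, begin, end):
    """A substring of the context, addressed by code point offsets."""
    return (
        f"<{char_uri(doc_uri, begin, end)}>\n"
        f"    a nif:String , nif:RFC5147String ;\n"
        f"    nif:referenceContext <{char_uri(doc_uri, 0, len(text))}> ;\n"
        f"    nif:anchorOf {literal(text[begin:end])} ;\n"
        f"    nif:beginIndex {index(begin)} ;\n"
        f"    nif:endIndex {index(end)} .\n"
    )


doc = "http://example.org/doc1"  # hypothetical document URI
text = "Zürich liegt in der Schweiz."
print(PREFIXES)
print(context_turtle(doc, text))
print(annotation_turtle(doc, text, 20, 27))
```

Because the URIs are built from the document URI, two documents with identical text still receive different context URIs.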
  • 47. NIF Tutorial – 2013/09/24 – Page 47 http://lod2.eu Tokenization Christian Chiarcos, Julia Ritz, Manfred Stede: By all these lovely tokens... Merging conflicting tokenizations. Language Resources and Evaluation 46(1): 53-74 (2012)
  • 48. NIF Tutorial – 2013/09/24 – Page 48 http://lod2.eu NIF Demo: http://nlp2rdf.lod2.eu/demo.php
  • 49. NIF Tutorial – 2013/09/24 – Page 49 http://lod2.eu • SPARQL queries produce (find) errors • http://persistence.uni-leipzig.org/nlp2rdf/ontologies/testcase/lib/nif-2.0-suite.t • RLOG – An RDF Logging Ontology • ./validate.jar -i nif-erroneous-model.ttl -t file • Demo → character count • Demo → all errors Validation over specification ALL DEMOS ARE AVAILABLE AT: http://nlp2rdf.org/leipzig-24-9-2013
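The kind of check behind the character count demo can be illustrated in plain Python. This is not the NIF validator, which according to the slide runs SPARQL queries and reports through RLOG; it is a simplified sketch over hypothetical dictionaries that mirrors the idea that offsets and anchor text must agree with the context string.

```python
def validate(context, annotations):
    """Return a list of error messages; an empty list means no errors found."""
    errors = []
    for i, ann in enumerate(annotations):
        begin, end = ann["beginIndex"], ann["endIndex"]
        # Offsets must describe a range inside the context string.
        if not 0 <= begin <= end <= len(context):
            errors.append(
                f"annotation {i}: offsets {begin},{end} outside 0..{len(context)}"
            )
            continue
        # If the anchor text is given, it must equal the addressed substring.
        anchor = ann.get("anchorOf")
        if anchor is not None and context[begin:end] != anchor:
            errors.append(
                f"annotation {i}: anchorOf {anchor!r} != substring {context[begin:end]!r}"
            )
    return errors


context = "Zürich liegt in der Schweiz."
annotations = [
    {"beginIndex": 20, "endIndex": 27, "anchorOf": "Schweiz"},  # correct
    {"beginIndex": 21, "endIndex": 28, "anchorOf": "Schweiz"},  # byte offsets used
    {"beginIndex": 5, "endIndex": 99},                          # out of range
]
for message in validate(context, annotations):
    print(message)
```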
  • 50. NIF Tutorial – 2013/09/24 – Page 50 http://lod2.eu NIF Demo: http://nlp2rdf.lod2.eu/demo.php
  • 51. NIF Tutorial – 2013/09/24 – Page 51 http://lod2.eu NIF
  • 52. NIF Tutorial – 2013/09/24 – Page 52 http://lod2.eu • http://www.w3.org/TR/its20/#conversion-to-nif • http://www.w3.org/TR/its20/#nif-backconversion NIF
  • 53. NIF Tutorial – 2013/09/24 – Page 53 http://lod2.eu • Demo • Load Terminological model or Inference Model Reasoning
  • 54. NIF Tutorial – 2013/09/24 – Page 54 http://lod2.eu Open Community – All feedback is welcome! http://slideshare.net/kurzum Websites: http://dbpedia.org http://nlp2rdf.org http://lod2.eu Thanks for your attention ALL DEMOS ARE AVAILABLE AT: http://nlp2rdf.org/leipzig-24-9-2013