Oc wg-nif-20130711

© Copyright 2008 STI INNSBRUCK www.sti-innsbruck.at
NLP Interchange Format
José M. García

Outline
• What is NIF?
• Design requirements
• URI schemes
• NIF ontologies
• Use cases
• Relationship with ELRA
• Roadmap for NIF 2.0
• Conclusions

What is NIF?
• Natural Language Processing Interchange Format
• NIF is an RDF/OWL-based format that aims to achieve interoperability
between Natural Language Processing (NLP) tools, language
resources and annotations.
• Building blocks
– URI scheme for identifying elements in texts
– Ontology for describing common NLP terms
• Created and maintained by the AKSW group at the University of Leipzig as
part of the LOD2 EU project.
• Community project: http://persistence.uni-leipzig.org/nlp2rdf/

NIF design requirements
• Compatibility with RDF
• Coverage
• Structural Interoperability
• Conceptual Interoperability
• Granularity
• Provenance and Confidence
• Simplicity
• Scalability

URI schemes
• Text needs to be referenceable by URIs
• With URI references, text can be used as a resource in RDF statements
• NIF distinguishes:
– Documents
– Text of the document
– Substrings of the text.
• A URI scheme is an algorithm that creates IDs for text and substrings
• URI elements
– Document URI
– Separator
– Character indices
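
The three URI elements listed above can be combined mechanically. The following is a minimal sketch, assuming the `char=begin,end` fragment form of RFC 5147 and `#` as the separator; the function name `nif_uri` and the `example.org` document URI are hypothetical and only serve the illustration.

```python
def nif_uri(document_uri: str, begin: int, end: int, separator: str = "#") -> str:
    """Build an identifier from document URI + separator + character indices."""
    if begin < 0 or end < begin:
        raise ValueError("indices must satisfy 0 <= begin <= end")
    return f"{document_uri}{separator}char={begin},{end}"


text = "NIF identifies substrings."
doc = "http://example.org/doc.txt"  # hypothetical document URI

whole_text = nif_uri(doc, 0, len(text))  # identifier for the text of the document
word = nif_uri(doc, 0, 3)                # identifier for the substring "NIF"

print(whole_text)
print(word)
```

Because the identifier is derived only from the document URI and the offsets, two tools that process the same text independently arrive at the same URI for the same substring.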

RFC 5147
• The canonical URI scheme for NIF is based on RFC 5147
• RFC 5147 standardizes fragment identifiers for the text/plain media type
• Example
– Document: http://www.w3.org/DesignIssues/LinkedData.html
– With RFC 5147 fragment (characters 0 to 26610):
http://www.w3.org/DesignIssues/LinkedData.html#char=0,26610
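
Going the other way, a `#char=begin,end` fragment such as `#char=0,26610` can be resolved against the plain text of a document. This is a rough sketch rather than a full RFC 5147 implementation: it only handles the two-number `char=` form and ignores the other options the RFC defines (line ranges, open-ended ranges, integrity checks). The helper name `resolve_char_fragment` and the sample URI are assumptions.

```python
import re
from urllib.parse import urldefrag

# Only the simple "char=<begin>,<end>" form is supported in this sketch.
_CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)$")


def resolve_char_fragment(uri: str, text: str) -> str:
    """Return the substring of `text` addressed by a `#char=begin,end` fragment."""
    _, fragment = urldefrag(uri)
    match = _CHAR_RANGE.match(fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if not 0 <= begin <= end <= len(text):
        raise ValueError("character range outside the text")
    # RFC 5147 positions count the gaps between characters, like Python slices.
    return text[begin:end]


sample = "Linked Data is simply about using the Web"
selected = resolve_char_fragment("http://example.org/doc.txt#char=0,11", sample)
print(selected)
```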

NIF Core Ontology
• Classes and properties to describe the relations between
– Documents
– Text
– Substrings
– Corresponding URI schemes

NIF Core Ontology
• Additional classes and properties (unstable/testing)
– More URI schemes
– Text structure (words, sentences, paragraphs…)
– Part of Speech (POS)
– Annotations with Stanbol
– Confidence

Workflows, Modularity and Extensibility of NIF
• Workflows for NLP integration
– Normalization
– Tokenization
– Merge RDF annotations
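
The three workflow steps can be sketched end to end. This is an illustration under stated assumptions, not the NIF reference workflow: normalization is taken to mean bringing the text into one agreed form (here Unicode NFC) so that offsets from different tools agree, tokenization is plain whitespace splitting, and annotations are modelled as sets of (subject, predicate, object) tuples. The `ex:` predicates, the tool outputs and the document URI are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Bring the text into one agreed form so character offsets are comparable."""
    return unicodedata.normalize("NFC", text)


def tokenize(doc_uri: str, text: str):
    """Return (uri, token) pairs for whitespace-delimited tokens."""
    return [(f"{doc_uri}#char={m.start()},{m.end()}", m.group())
            for m in re.finditer(r"\S+", text)]


def merge(*graphs):
    """Merging RDF annotations amounts to a set union of their triples."""
    return set().union(*graphs)


doc = "http://example.org/doc.txt"  # hypothetical document URI
text = normalize("Leipzig is nice")
tokens = tokenize(doc, text)
leipzig_uri = tokens[0][0]

# Two hypothetical tools annotate the same substring independently.
pos_tagger = {(leipzig_uri, "ex:pos", "NNP")}
entity_linker = {(leipzig_uri, "ex:entity", "http://dbpedia.org/resource/Leipzig")}

merged = merge(pos_tagger, entity_linker)
for triple in sorted(merged):
    print(triple)
```

Since both tools derive the subject URI from the same normalized text and offsets, their statements end up on the same resource without any alignment step.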

• NIF ontology logical modules
– Terminological model
– Inference model
– Validation model
• Vocabulary modules
– FISE
– ITS
– OLiA
– NERD
– …

• Granularity profiles

ITS Use Case
• The Internationalization Tag Set (ITS) 2.0 is a W3C Working Draft that is
on its way to becoming a Recommendation.
• ITS standardizes HTML and XML attributes which can be used to
annotate nodes with processing information for language service
providers (i18n, l10n)
• The ITS 2.0 RDF ontology was developed using NIF, including a round-trip
conversion algorithm from ITS to NIF.
• NIF is expected to receive wide adoption by translation & language
service providers
• The ITS 2.0 RDF ontology provides properties that can serve as best
practice for NLP annotations.

OLiA Use Case
• The Ontologies of Linguistic Annotation (OLiA) provide stable identifiers
for morphosyntactic annotation tag sets, so that NLP tools can use these
IDs for better interoperability.
• OLiA provides Annotation Models and a Reference Model, comprising
more than 110 OWL ontologies for over 34 tag sets in 69 languages
• Features
– Documentation
– Flexible Granularity
– Language Independence
• NIF provides two properties
– nif:oliaIndividual (links a nif:String to an OLiA Annotation Model)
– nif:oliaCategory (links to the Reference Model)
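
As an illustration of the two properties, the sketch below attaches a part-of-speech tag to a substring URI as N-Triples. The property names are taken from the slide; the released ontology may name them differently. The NIF namespace, the OLiA URIs for the Penn tag `NNP` and the reference category `ProperNoun`, the tag table and the substring URI are all assumptions chosen for the example.

```python
# Assumed namespace of the NIF Core ontology.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"

# Hypothetical lookup from a tool's tag to (Annotation Model individual,
# Reference Model category); the OLiA URIs are illustrative assumptions.
PENN_TO_OLIA = {
    "NNP": ("http://purl.org/olia/penn.owl#NNP",
            "http://purl.org/olia/olia.owl#ProperNoun"),
}


def olia_triples(string_uri: str, tag: str):
    """Return N-Triples lines linking a substring to both OLiA models."""
    individual, category = PENN_TO_OLIA[tag]
    return [
        f"<{string_uri}> <{NIF}oliaIndividual> <{individual}> .",
        f"<{string_uri}> <{NIF}oliaCategory> <{category}> .",
    ]


annotation = olia_triples("http://example.org/doc.txt#char=0,7", "NNP")
print("\n".join(annotation))
```

A consumer that only understands the Reference Model can query for the category and ignore the tool-specific tag set.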

RDFaCE Use Case
• RDFa Content Editor is a rich text editor that supports WYSIWYM
authoring including various views of the semantically enriched textual
content.
• It combines results of different NLP APIs for automatic content
annotation
– Heterogeneous API access, URI generation and output data structures
– Solution: server-side proxy, hard-coded input and connection of each API.
• NIF simplified the integration, adding an interoperability layer

What is ELRA?
• European Language Resources Association
• http://www.elra.info
• Effort to make Language Resources (LR) available for language
engineering and to evaluate language engineering technologies.
• LR marketplace
• Related organizations
– ELDA (ELRA’s operational body)
– LREC conferences

Relationship with NIF
• Different objectives
• Written LRs (esp. corpora) can be annotated with NIF for
further interoperability and integration with NLP tools
• ADVANTAGE: Large test data collection to evaluate NLP tools
• DISADVANTAGE: Cost of LR (though there are free ones)

Roadmap for NIF 2.0
• Release of NIF 1.0
– DONE (Nov 2009)
• Release of NIF 2.0 Draft
– CURRENT effort on solving pending issues
– Adoption in ITS 2.0 W3C (soon-to-be) Recommendation
– NIF-Core ontology is becoming stable
– RLOG - an RDF Logging Ontology
– NIF Validator software available
• Release of NIF 2.0 Core
• Release of NIF 2.0 Extensions
– ITS ontology, PROV ontology, Lemon Ontology, NERD, UIMA, MARL opinion
ontology…

Conclusions
• NIF allows NLP tools to be integrated using Linked Data
• Ongoing effort
• Many adopters and supporters
– LOD2 EU project
– Several W3C working groups
– Named Entity Recognition and Disambiguation (NERD)
– Ontologies of Linguistic Annotation (OLiA)
– …
• 27 different implementations and use cases
– Some available at http://persistence.uni-leipzig.org/nlp2rdf/

References
1. http://persistence.uni-leipzig.org/nlp2rdf/
2. Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer.
Integrating NLP using Linked Data. In 12th International Semantic Web
Conference, 21-25 October 2013, Sydney, Australia.
