Methodology for Linguistic Linked Open Data generation. The Apertium RDF case

Presentation at Eurolan'15 on a methodology for generating Linguistic Linked Open Data, applied to a particular case: the Apertium family of bilingual dictionaries.

  1. “Methodology for Linguistic Linked Open Data generation. The Apertium RDF case”. Jorge Gracia, Daniel Vila-Suero (jgracia, dvila@fi.upm.es). The Summer School on Linguistic Linked Open Data, 12th EUROLAN School. 20th July 2015
  2. Outline
     • Introduction
     • Methodology
       – Analysis of data sources
       – Modelling
       – URI/IRI design
       – RDF Generation
       – Publication
     • Traversing the Apertium RDF graph
     • Conclusions
  3. Introduction
  4. Introduction: Current multilingual lexica and electronic dictionaries
     • Proprietary formats
     • Non-standard APIs
     • Disconnected from other resources
  5. Introduction: GOAL: to expose linguistic data contained in language resources as Linked Data on the Web
  6. Introduction: Different methods and guidelines available:
     • LOD2
     • Datalift
     • W3C Linked Data cookbook
     • W3C Best Practices for Linked Data
     But… multilingualism and linguistic linked data are not explicitly treated
  7. Introduction: Guidelines for Multilingual Linked Data
     Guidelines for LD generation of language resources (at W3C BPMLOD community group):
     • Bilingual dictionaries
     • Multilingual dictionaries (BabelNet)
     • WordNets
     • Terminologies in TBX
     D. Vila-Suero, A. Gómez-Pérez, E. Montiel-Ponsoda, J. Gracia, and G. Aguado-de Cea, "Publishing Linked Data: the multilingual dimension". Springer Berlin Heidelberg, Aug. 2014, pp. 101-118.
  8. Introduction: Reference cards for Linguistic Linked Data
     • How to publish Linguistic Linked Data
     • Language Resource Licensing - ODRL Reference Card
     • Inclusion in the LLOD Cloud
     • Data ID
     • Discovering Language Resources with Ling
     • NIF corpus
     • How to represent crosslingual links
     • Documenting a language resource in Datahub
     http://www.lider-project.eu/guidelines
  9. Introduction: Motivating example: the Apertium bilingual dictionaries
     ...although the methodology is general enough to be applied to many other scenarios
  10. Introduction: Apertium [http://www.apertium.org] is an open source platform for Machine Translation. Its bilingual dictionaries are available in XML.
  11. Introduction: More than 40 language pairs, 22 of them (the more stable ones) available in LMF:
     Afrikaans <-> Dutch; Breton --> French; Catalan <-> Italian; Welsh <-> English; Danish <-- Norwegian; English <-> Catalan; English <-> Spanish; English <-> Galician; Esperanto <-- Catalan; Esperanto <-> English; Esperanto <-- Spanish; Esperanto <-- French; Spanish <-> Aragonese; Spanish <-> Asturian; Spanish <-> Catalan; Spanish <-> Galician; Spanish <-> Italian; Spanish <-> Portuguese; Spanish <-> Romanian; Basque --> English; Basque --> Spanish; French <-> Catalan; French <-> Spanish; Serbo-Croatian <-> English; Serbo-Croatian <-> Macedonian; Serbo-Croatian <-> Slovenian; Indonesian <-> Malaysian; Icelandic <-> Swedish; Icelandic --> English; Kazakh <-> Tatar; Macedonian <-> Bulgarian; Macedonian --> English; Norwegian Nynorsk <-> Norwegian Bokmål; Occitan <-> Catalan; Occitan <-> Spanish; Portuguese <-> Catalan; Portuguese <-> Galician; Northern Sami --> Norwegian Bokmål; Swedish <-> Danish; …
  12. Methodology
  13. Main activities:
     1. Analysis of data sources
     2. Modelling
     3. URI/IRI design
     4. RDF Generation
     5. Publication
     Each activity is composed of several tasks
  14. Main activities: 1. Analysis of data sources 2. Modelling 3. URI/IRI design 4. RDF Generation 5. Publication
  15. Analysis of data sources. The goal is to:
     • Specify and analyse the data sources in order to plan and manage the subsequent activities
     • Main aspects to specify are:
       – Format
       – Identifiers structure
       – Access methods: file, webservice, etc.
       – Data models: standards, terminologies, etc.
       – Language representation: how languages are tagged, represented, etc.
       – License and provenance: existing license of data sources
  16. Analysis of data sources: EXAMPLE. Documentation of data source:
       – Type of data: Bilingual dictionary (English and Spanish)
       – Data model: LMF (Lexical Markup Framework)
       – Format: XML files
       – License: GPL 3.0
       – Provenance: Apertium EN-ES
       – …
  17. Analysis of data sources: EXAMPLE
     <Lexicon>
       <feat att="language" val="en"/>
       ...
       <LexicalEntry id="bench-n-en">
         <feat att="partOfSpeech" val="n"/>
         <Lemma>
           <feat att="writtenForm" val="bench"/>
         </Lemma>
         <Sense id="bench_banco-n-l"/>
       </LexicalEntry>
       …
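As a way of inspecting such a source during the analysis step, the LMF fragment above can be read with Python's standard XML parser. This is only a minimal sketch: the snippet is closed by hand (the slide elides the rest of the file), and the helper names `feat` and `read_lexicon` are hypothetical, not part of any Apertium tooling.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, closed by hand so that it parses.
LMF_SNIPPET = """
<Lexicon>
  <feat att="language" val="en"/>
  <LexicalEntry id="bench-n-en">
    <feat att="partOfSpeech" val="n"/>
    <Lemma>
      <feat att="writtenForm" val="bench"/>
    </Lemma>
    <Sense id="bench_banco-n-l"/>
  </LexicalEntry>
</Lexicon>
"""


def feat(element, att):
    """Return the val of the direct <feat att="..."> child, or None."""
    for child in element.findall("feat"):
        if child.get("att") == att:
            return child.get("val")
    return None


def read_lexicon(xml_text):
    """Collect language, entry ids, part of speech, lemma and sense ids."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("LexicalEntry"):
        lemma = entry.find("Lemma")
        entries.append({
            "id": entry.get("id"),
            "pos": feat(entry, "partOfSpeech"),
            "written_form": feat(lemma, "writtenForm") if lemma is not None else None,
            "senses": [sense.get("id") for sense in entry.findall("Sense")],
        })
    return {"language": feat(root, "language"), "entries": entries}


lexicon = read_lexicon(LMF_SNIPPET)
print(lexicon["language"], lexicon["entries"][0]["written_form"])
```

The resulting dictionary exposes exactly the aspects listed for the analysis activity: identifiers (`id`), language tagging, and the data model's features.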
  18. Main activities: 1. Analysis of data sources 2. Modelling 3. URI/IRI design 4. RDF Generation 5. Publication
  19. Modelling: Modelling tasks
     1. Analysis and selection of domain vocabularies
     2. Selection of vocabularies for representing licensing, provenance and other metadata
     3. Mapping of data sources and vocabularies
  20. Modelling: Analysis of vocabularies. Use http://lov.okfn.org/
     • NIF (NLP Interchange Format)
     • LexInfo
     • Dublin Core
     • DCAT
     • PROV (W3C Provenance Ontology)
     • ODRL (Open Digital Rights Language)
  21. Modelling
  22. Modelling: Translation Module (http://purl.org/net/translation.owl)
     [Diagram. Classes: TranslationSet, Translation, LexicalSense, Resource. Properties: trans, translationSource, translationTarget, context, translationCategory, translationConfidence: double]
     Translation Categories (http://purl.org/net/translation-categories): directEquivalent, culturalEquivalent, lexicalEquivalent
  23. Modelling: EXAMPLE. Mapping of data sources
  24. Modelling: EXAMPLE
  25. Main activities: 1. Analysis of data sources 2. Modelling 3. URI/IRI design 4. RDF Generation 5. Publication
  26. URI/IRI design. The goal is to:
     • Define URI/IRI patterns and namespaces to be used
     • Ensure that LD best practices are followed
  27. URI/IRI design: Some good practises…
     1. Define namespace(s) (that you own or have control over).
     2. Define how to create the ID of resources (reuse original data source keys if possible).
     3. Define the structure of the URI space to organize the resources in different addresses and avoid collisions.
     Useful guidance at:
     • ISA - Study on persistent URIs (Archer et al.)
     • Linked Data patterns book online: URI patterns
  28. URI/IRI design: Following ISA recommendations:
     http://{domain}/{type}/{concept}/{reference}
     where {type} is a value from the set of resource types, for example 'id' or 'item' for real-world objects; 'doc' for documents that describe those objects; 'def' for concepts; 'set' for datasets.
     Archer, P., Goedertier, S., & Loutas, N. (2012). “Study on persistent URIs”. Technical report
  29. URI/IRI design: EXAMPLE. Following ISA recommendations:
     # Apertium English lexicon:
     http://linguistic.linkeddata.es/id/apertium/lexiconEN
     # Apertium Spanish lexicon:
     http://linguistic.linkeddata.es/id/apertium/lexiconES
     # Apertium English-Spanish translation set:
     http://linguistic.linkeddata.es/id/apertium/tranSetEN-ES
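The `http://{domain}/{type}/{concept}/{reference}` pattern can be captured in a small helper so that every URI is minted the same way. The sketch below is illustrative only: `mint_uri` and `ALLOWED_TYPES` are hypothetical names, and percent-encoding the path segments is an assumption rather than something the slides prescribe.

```python
from urllib.parse import quote

# Resource types named in the ISA pattern on the slide.
ALLOWED_TYPES = {"id", "item", "doc", "def", "set"}


def mint_uri(domain, type_, concept, reference):
    """Build http://{domain}/{type}/{concept}/{reference}."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unknown resource type: {type_!r}")
    # Percent-encode anything unsafe; '/' stays allowed in the reference
    # so that nested identifiers remain possible.
    return "http://{}/{}/{}/{}".format(
        domain, type_, quote(concept, safe=""), quote(reference, safe="/")
    )


DOMAIN = "linguistic.linkeddata.es"
print(mint_uri(DOMAIN, "id", "apertium", "lexiconEN"))
print(mint_uri(DOMAIN, "id", "apertium", "tranSetEN-ES"))
```

Reusing the source's own keys (for example the LMF entry id) as the `reference` keeps the URIs stable across regenerations of the dataset.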
  30. Main activities: 1. Analysis of data sources 2. Modelling 3. URI/IRI design 4. RDF Generation 5. Publication
  31. RDF Generation
     1. Selection, extension or development of technologies for RDF generation
       – Open Refine
       – D2RQ
       – XMLS
       – …
     2. Mapping of data sources to RDF
     3. Transformation of data sources to RDF
  32. RDF Generation: EXAMPLE. Goal:
     apertium:lexiconEN a lemon:Lexicon ;
         dc:source <http://hdl.handle.net/10230/17110> .
     ...
     apertium:lexiconEN lemon:entry apertium:lexiconEN/bench-n-en .
     apertium:lexiconEN/bench-n-en a lemon:LexicalEntry ;
         lemon:lexicalForm apertium:lexiconEN/bench-n-en-form ;
         lexinfo:partOfSpeech lexinfo:noun .
     apertium:lexiconEN/bench-n-en-form a lemon:Form ;
         lemon:writtenRep "bench"@en .
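A transformation that produces triples of this shape can be sketched with plain string handling. The code below is not the authors' pipeline; it emits N-Triples with full IRIs, which sidesteps the fact that a prefixed name such as `apertium:lexiconEN/bench-n-en` would need its slash escaped in strict Turtle. The namespace values for `apertium:` and `lemon:` are assumptions (the first follows the URI examples in the slides, the second is the commonly used lemon namespace), and the part-of-speech mapping covers only the `n` tag seen in the source example.

```python
# Assumed namespaces; only the lexinfo one appears verbatim in the slides.
APERTIUM = "http://linguistic.linkeddata.es/id/apertium/"
LEMON = "http://lemon-model.net/lemon#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Mapping from LMF part-of-speech tags to LexInfo terms (only 'n' is attested).
POS = {"n": "noun"}


def iri(value):
    return f"<{value}>"


def literal(text, lang):
    # Escape backslashes and quotes so the literal stays well-formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entry_triples(lexicon_name, lang, entry_id, pos_tag, written_form):
    """Return (subject, predicate, object) strings for one lexical entry."""
    lexicon = APERTIUM + lexicon_name
    entry = f"{lexicon}/{entry_id}"
    form = f"{entry}-form"
    triples = [
        (iri(lexicon), iri(RDF_TYPE), iri(LEMON + "Lexicon")),
        (iri(lexicon), iri(LEMON + "entry"), iri(entry)),
        (iri(entry), iri(RDF_TYPE), iri(LEMON + "LexicalEntry")),
        (iri(entry), iri(LEMON + "lexicalForm"), iri(form)),
        (iri(form), iri(RDF_TYPE), iri(LEMON + "Form")),
        (iri(form), iri(LEMON + "writtenRep"), literal(written_form, lang)),
    ]
    if pos_tag in POS:
        triples.append(
            (iri(entry), iri(LEXINFO + "partOfSpeech"), iri(LEXINFO + POS[pos_tag]))
        )
    return triples


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(entry_triples("lexiconEN", "en", "bench-n-en", "n", "bench")))
```

In practice the arguments would come from the parsed LMF entries, so the mapping from source fields to vocabulary terms stays in one place.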
  33. RDF Generation: EXAMPLE
  34. Main activities: 1. Analysis of data sources 2. Modelling 3. URI/IRI design 4. RDF Generation 5. Publication
  35. Publication. The goal is to:
     • Make available the RDF dataset following Linked Data best practices
     • Facilitate dataset discovery and consumption
  36. Publication: Metadata definition using the previously selected vocabularies (DCAT, DC, VOID, …). DCAT: Data Catalog Vocabulary
     1. Register dataset in Datahub
     2. Extend generated DCAT file and link to Datahub’s one
     3. Publish both data and metadata files
  37. Publication: rights metadata
     1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
     2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
     3. Standard license available?
       – 3a. Yes: use the URI of the standard license, e.g., CC0
       – 3b. No: use a rights declaration language, e.g., ODRL (Open Digital Rights Language)
  38. Publication: Dataset and vocabulary publication on the Web
     [Diagram. Components: LD frontend (HTTP), SPARQL endpoint (SPARQL query language), SPARQL store]
     Configuration file of the frontend:
       – Location of the RDF data
       – Define access methods
       – and even the presentation of the data
  39. Publication: EXAMPLE
     • SPARQL endpoint: http://linguistic.linkeddata.es/apertium/sparql-editor/
     • Web interface: http://linguistic.linkeddata.es/apertium/
     • Datahub: http://datahub.io/dataset?q=apertium+rdf&organization=oeg-upm
  40. Publication: EXAMPLE. http://datahub.io/dataset/apertium-rdf-en-es
  41. Publication: EXAMPLE. Extending DCAT description
     <dcat:Dataset rdf:about="http://linguistic.linkeddata.es/set/apertium/EN-ES">
       <owl:sameAs rdf:resource="http://datahub.io/dataset/apertium-rdf-en-es"></owl:sameAs>
       <dct:source rdf:resource="http://hdl.handle.net/10230/17110"></dct:source>
       <dct:license rdf:resource="http://purl.oclc.org/NET/rdflicense/gpl-3.0"></dct:license>
       <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Apertium"></rdfs:seeAlso>
       <rdfs:seeAlso rdf:resource="http://purl.org/ms-lod/UPF-MetadataRecords.ttl#Apertium-en-es_resource-5v2"></rdfs:seeAlso>
     </dcat:Dataset>
  42. Publication: SOME TIPS
     • Loading the RDF data into a SPARQL endpoint is not enough for publishing LD:
       – Why? We provide a queryable repository, but URIs are not de-referenceable
     • We need a mechanism to make our URIs de-referenceable:
       – Through a common web server (as files)
       – Linked Data front-ends: Pubby
       – More sophisticated: LD APIs (Puelia, Elda)
  43. Apertium RDF: Traversing the Apertium RDF graph
  44. Apertium RDF: 22 generated datasets
     Lang. pair   # triples   # trans.
     CA-IT        180,851     7,869
     EN-CA        759,601     33,029
     EN-ES        576,316     25,830
     EN-GL        425,117     20,034
     EO-CA        426,301     19,964
     EO-EN        617,772     31,474
     EO-ES        380,198     17,212
     EO-FR        726,281     35,791
     ES-AN        71,997      3,110
     ES-AST       825,54      36,096
     ES-CA        730,501     31,291
     ES-GL        206,284     8,985
     ES-PT        279,245     12,054
     ES-RO        400,366     17,318
     EU-ES        262,336     11,838
     EU-EN        265,466     13,089
     FR-CA        152,002     6,550
     FR-ES        495,614     21,475
     OC-CA        346,346     15,983
     OC-ES        317,162     14,561
     PT-CA        163,149     7,111
     PT-GL        234,065     10,144
  45. Apertium RDF: The LLOD cloud
  46. Apertium RDF
  47. Apertium RDF: Direct translations for “bank”@en
     Translated written repr.   Part of Speech
     "banc"@ca                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "riba"@ca                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "banco"@es                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "orilla"@es                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "ribera"@es                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "beira"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "banco"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "ourela"@gl                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "orela"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "banku"@eu                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "erribera"@eu              http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "ertz"@eu                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
     "amuntegar"@ca             http://www.lexinfo.net/ontology/2.0/lexinfo#verb
     "agolpar"@es               http://www.lexinfo.net/ontology/2.0/lexinfo#verb
     "amontonar"@es             http://www.lexinfo.net/ontology/2.0/lexinfo#verb
     "apelotonar"@es            http://www.lexinfo.net/ontology/2.0/lexinfo#verb
     "hacinar"@es               http://www.lexinfo.net/ontology/2.0/lexinfo#verb
     ...                        ...
  48. Apertium RDF
     [Diagram. Each Apertium LMF bilingual dictionary (EN-ES, EN-CA) is converted in Apertium RDF into monolingual lexicons (Lexicon EN, Lexicon ES, Lexicon CA) and translation sets (Translation Set EN-ES, Translation Set EN-CA)]
  49. Apertium RDF
     [Diagram. LexiconEN: “bank”@en, “bench”@en. LexiconES: “banco”@es, “orilla”@es, “ribera”@es. LexiconPT: “banco”@pt, “orla”@pt. TranslationSetEN-ES: bank-banco, bank-orilla, bank-ribera, bench-banco. TranslationSetES-PT: banco-banco, orilla-orla]
  50. Apertium RDF: Indirect translations for “bank”, EN -> ES -> PT
     Pivot translation written repr.   Indirect translation written repr.
     "banco"@es                        "banco"@pt
     "orilla"@es                       "orla"@pt
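The pivot lookup behind this table amounts to composing two translation sets. The deck obtains such results via SPARQL over the RDF graph; the sketch below only mirrors the logic in memory, using the translation pairs shown in the preceding diagram. The dictionary layout and the function name `indirect_translations` are hypothetical.

```python
# Translation pairs taken from the EN-ES and ES-PT translation sets in the slides.
EN_ES = {"bank": ["banco", "orilla", "ribera"], "bench": ["banco"]}
ES_PT = {"banco": ["banco"], "orilla": ["orla"]}


def indirect_translations(word, source_to_pivot, pivot_to_target):
    """Return (pivot, target) pairs reachable from word through the pivot language."""
    pairs = []
    for pivot in source_to_pivot.get(word, []):
        for target in pivot_to_target.get(pivot, []):
            pairs.append((pivot, target))
    return pairs


for pivot, target in indirect_translations("bank", EN_ES, ES_PT):
    print(f'"{pivot}"@es -> "{target}"@pt')
```

Note that “ribera”@es contributes no row because the ES-PT sample contains no translation for it, which matches the two rows in the table.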
  51. Apertium RDF: Dijkstra algorithm to choose shortest path
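The slide does not spell out the graph, so the following is only one plausible reading: languages are nodes, each available translation set is an undirected edge of weight 1, and the shortest path gives the chain of pivot languages with the fewest hops. The edge list is a subset of the generated datasets, and `build_graph` and `shortest_path` are hypothetical names.

```python
import heapq

# A subset of the generated language pairs; unit edge weights are an assumption.
PAIRS = [("EN", "ES"), ("ES", "RO"), ("EN", "CA"), ("ES", "CA"),
         ("CA", "IT"), ("ES", "PT"), ("PT", "CA")]


def build_graph(pairs, weight=1):
    """Undirected adjacency map: language -> {neighbour: weight}."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) or None if unreachable."""
    queue = [(0, source, [source])]
    best = {source: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None


graph = build_graph(PAIRS)
print(shortest_path(graph, "EN", "RO"))
```

With non-uniform weights (for instance, penalising smaller dictionaries) the same function would prefer higher-quality pivots over merely shorter chains.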
  52. Apertium RDF: How to measure confidence
     [Diagram. LexiconES: banco, orilla, ribera. LexiconEN: bank, bench. LexiconCA: banc, riba]
  53. Apertium RDF: One time inverse consultation (OTIC)
     Given a lexical entry s:
     1. Get the direct translations of s in the pivot language (Ps)
     2. ∀ p ∈ Ps, get its translations in the target language (Tp)
     3. For every t ∈ Tp,
       (a) get its set of translations in the pivot language (Pt)
       (b) calculate the score for t:
           \[ \mathrm{score}(t) = 2 \times \frac{|P_s \cap P_t|}{|P_s| + |P_t|} \]
     Tanaka, K., & Umemura, K. (1994). “Construction of a bilingual dictionary intermediated by a third language”. In COLING, pp. 297–303.
  54. Apertium RDF
     [Diagram. LexiconES: banco, orilla, ribera. LexiconEN: bank, bench. LexiconCA: banc, riba]
     s = “banco”@es
     Pbanco = {“bank”@en, “bench”@en}
     Tbank = {“banc”@ca, “riba”@ca}
     Tbench = {“banc”@ca}
     Pbanc = {“bank”@en, “bench”@en}
     Priba = {“bank”@en}
     score(“banc”@ca) = 1.0
     score(“riba”@ca) = 0.5
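The OTIC steps can be sketched directly on the sets from this example. The code assumes the score is 2·|Ps ∩ Pt| / (|Ps| + |Pt|), the usual statement of one time inverse consultation; the dictionaries and the name `otic_scores` are hypothetical. Under that formula “riba”@ca scores 2·1/(2+1) ≈ 0.67, whereas the slide reports 0.5, so the example values should be checked against the original paper.

```python
# Translation sets from the worked example (ES source, EN pivot, CA target).
ES_EN = {"banco": {"bank", "bench"}}
EN_CA = {"bank": {"banc", "riba"}, "bench": {"banc"}}
CA_EN = {"banc": {"bank", "bench"}, "riba": {"bank"}}


def otic_scores(s, source_to_pivot, pivot_to_target, target_to_pivot):
    """Score each candidate translation of s reached through the pivot language."""
    ps = source_to_pivot.get(s, set())           # step 1: pivot translations of s
    scores = {}
    for p in ps:
        for t in pivot_to_target.get(p, set()):  # step 2: target candidates
            pt = target_to_pivot.get(t, set())   # step 3a: inverse consultation
            # step 3b: overlap between the two pivot sets, normalised to [0, 1]
            scores[t] = 2 * len(ps & pt) / (len(ps) + len(pt))
    return scores


scores = otic_scores("banco", ES_EN, EN_CA, CA_EN)
for candidate, score in sorted(scores.items()):
    print(candidate, round(score, 2))
```

A candidate that shares every pivot word with the source entry reaches the maximum score of 1.0, which is why “banc”@ca is ranked above “riba”@ca.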
  55. Apertium RDF: Linking Apertium to external resources
     Around 270,000 links between Apertium RDF and BabelNet
  56. Apertium RDF: Translations for “bank”@en
     Translated written repr.   BabelSynset                           BabelNet gloss
     "banco"@es                 http://babelnet.org/rdf/s00008371n    “A building in which the business of banking transacted”
     "banco"@es                 http://babelnet.org/rdf/s00008366n    “An arrangement of similar objects in a row or in tiers”
     "banco"@es                 http://babelnet.org/rdf/s15346085n    “An ocean bank, sometimes referred to as a fishing bank or simply bank, ...”
     …                          …                                     …
     "orilla"@es                http://babelnet.org/rdf/s00008363n    “Sloping land (especially the slope beside a body of water)”
     "ribera"@es                http://babelnet.org/rdf/s00008363n    “Sloping land (especially the slope beside a body of water)”
  57. Conclusions
  58. Conclusions
     • Methodology, guidelines, and reference cards for LLOD generation
     • Exemplified with the Apertium RDF case
       – Apertium data on the Web following SW standards
       – Common entry point for all the dictionaries
       – Direct and indirect translations can be easily obtained via SPARQL
       – Linked with BabelNet
  59. Further readings
     • J. Gracia, "Multilingual dictionaries and the web of data," Kernerman Dictionaries News, no. 23, pp. 1-4, Jun. 2015.
     • J. Gracia, D. Vila-Suero, J. McCrae, T. Flati, C. Baron, and M. Dojchinovski, "Language resources and linked data: A practical perspective," in Proc. of Knowledge Engineering and Knowledge Management (EKAW'14), ser. Lecture Notes in Computer Science, Springer International Publishing, Nov. 2014, vol. 8982, pp. 3-17.
     • J. Gracia, E. Montiel-Ponsoda, D. Vila-Suero, and G. Aguado-de Cea, "Enabling language resources to expose translations as linked data on the web," in Proc. of 9th Language Resources and Evaluation Conference (LREC'14), Reykjavik (Iceland). European Language Resources Association (ELRA), May 2014, pp. 409-413.
     • J. Gracia, E. Montiel-Ponsoda, P. Cimiano, A. Gómez-Pérez, P. Buitelaar, and J. McCrae, "Challenges for the multilingual web of data," Journal of Web Semantics, vol. 11, pp. 63-71, Mar. 2012.
  60. THANK YOU!
