Sd llod-15 apertium


This presentation describes how the RDF version of the Apertium bilingual dictionaries was modelled, generated, and linked to BabelNet.


  1. 18/05/2016. Apertium RDF: an experience in generating linguistic linked open data. Jorge Gracia, Ontology Engineering Group (OEG), Universidad Politécnica de Madrid (UPM), jgracia@fi.upm.es. 1st Summer Datathon on Linguistic Linked Open Data, Cercedilla (Spain), 15-19th June 2015.
  2. 16/06/2015, Jorge Gracia. Outline: Motivation; The Apertium platform; Representing translations in RDF; Building the Apertium RDF graph; Traversing the graph; Linking with external sources; Conclusions.
  3. Motivation
  4. Motivation. Current multilingual lexica and electronic dictionaries:
     • Proprietary formats
     • Non-standard APIs
     • Disconnected from other resources
  5. Motivation. GOAL: to expose translations contained in bilingual dictionaries as Linked Data on the Web. Joint effort by [partner logos not preserved in the transcript]
  6. The Apertium platform
  7. Apertium. Apertium [http://www.apertium.org] is an open source platform for Machine Translation. Its bilingual dictionaries are available in XML.
  8. Apertium. Language pairs: Afrikaans <-> Dutch; Breton --> French; Catalan <-> Italian; Welsh <-> English; Danish <-- Norwegian; English <-> Catalan; English <-> Spanish; English <-> Galician; Esperanto <-- Catalan; Esperanto <-> English; Esperanto <-- Spanish; Esperanto <-- French; Spanish <-> Aragonese; Spanish <-> Asturian; Spanish <-> Catalan; Spanish <-> Galician; Spanish <-> Italian; Spanish <-> Portuguese; Spanish <-> Romanian; Basque --> English; Basque --> Spanish; French <-> Catalan; French <-> Spanish; Serbo-Croatian <-> English; Serbo-Croatian <-> Macedonian; Serbo-Croatian <-> Slovenian; Indonesian <-> Malaysian; Icelandic <-> Swedish; Icelandic --> English; Kazakh <-> Tatar; Macedonian <-> Bulgarian; Macedonian --> English; Norwegian Nynorsk <-> Norwegian Bokmål; Occitan <-> Catalan; Occitan <-> Spanish; Portuguese <-> Catalan; Portuguese <-> Galician; Northern Sami --> Norwegian Bokmål; Swedish <-> Danish; ...
     More than 40 language pairs, 22 of them (the more stable ones) available in LMF.
  9. Representing translations in RDF
  10. lemon
  11. The translation module [diagram]. Translation Module (http://purl.org/net/translation.owl): classes TranslationSet, Translation, LexicalSense, Resource; properties trans, translationSource, translationTarget, context, translationCategory, translationConfidence:double. Translation Categories (http://purl.org/net/translation-categories): directEquivalent, culturalEquivalent, lexicalEquivalent.
  12. Translation example [diagram]. Two lemon:Lexicon nodes (lexiconEN, lexiconES) each link via lemon:entry to a lemon:LexicalEntry, whose lemon:lexicalForm is a lemon:Form with lemon:writtenRep “bench”@en and “banco”@es respectively. Each entry has a lemon:LexicalSense (lemon:isSenseOf). A tr:Translation connects the two senses via tr:translationSource and tr:translationTarget, and is attached to the tr:TranslationSet translationSetEN-ES via tr:trans.
  13. Building the Apertium RDF graph
  14. Methodology: 1. Data analysis and vocabulary selection 2. Modelling 3. URI design 4. RDF generation 5. Publication as linked data
  15. Modelling. Mapping of data sources
  16. URIs design
      # Apertium English lexicon:
      http://linguistic.linkeddata.es/id/apertium/lexiconEN
      # Apertium Spanish lexicon:
      http://linguistic.linkeddata.es/id/apertium/lexiconES
      # Apertium English-Spanish translation set:
      http://linguistic.linkeddata.es/id/apertium/tranSetEN-ES
      Following ISA recommendations [Archer et al.]: Archer, P., Goedertier, S., & Loutas, N. (2012). Study on persistent URIs. Tech. rep.
  17. RDF Generation. RDF generation based on OpenRefine
      • E.g., RDF generated:
      apertium:lexiconEN a lemon:Lexicon ;
          dc:source <http://hdl.handle.net/10230/17110> .
      ...
      apertium:lexiconEN lemon:entry apertium:lexiconEN/bench-n-en .
      apertium:lexiconEN/bench-n-en a lemon:LexicalEntry ;
          lemon:lexicalForm apertium:lexiconEN/bench-n-en-form ;
          lexinfo:partOfSpeech lexinfo:noun .
      apertium:lexiconEN/bench-n-en-form a lemon:Form ;
          lemon:writtenRep "bench"@en .
  18. Publication
      • SPARQL endpoint: http://linguistic.linkeddata.es/apertium/sparql-editor/
      • Web interface: http://linguistic.linkeddata.es/apertium/
      • Datahub: http://datahub.io/dataset?q=apertium+rdf&organization=oeg-upm
  19. Traversing the graph
  20. 22 generated datasets
      Lang. pair   # triples   # trans.
      CA-IT        180,851     7,869
      EN-CA        759,601     33,029
      EN-ES        576,316     25,830
      EN-GL        425,117     20,034
      EO-CA        426,301     19,964
      EO-EN        617,772     31,474
      EO-ES        380,198     17,212
      EO-FR        726,281     35,791
      ES-AN        71,997      3,110
      ES-AST       825,54      36,096
      ES-CA        730,501     31,291
      ES-GL        206,284     8,985
      ES-PT        279,245     12,054
      ES-RO        400,366     17,318
      EU-ES        262,336     11,838
      EU-EN        265,466     13,089
      FR-CA        152,002     6,550
      FR-ES        495,614     21,475
      OC-CA        346,346     15,983
      OC-ES        317,162     14,561
      PT-CA        163,149     7,111
      PT-GL        234,065     10,144
  21. Apertium RDF in the LLOD cloud
  22. Apertium RDF in the LLOD cloud
  23. Direct translations. Direct translations for “bank”@en
      Translated written repr.   Part of Speech
      "banc"@ca                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "riba"@ca                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "banco"@es                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "orilla"@es                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "ribera"@es                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "beira"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "banco"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "ourela"@gl                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "orela"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "banku"@eu                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "erribera"@eu              http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "ertz"@eu                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
      "amuntegar"@ca             http://www.lexinfo.net/ontology/2.0/lexinfo#verb
      "agolpar"@es               http://www.lexinfo.net/ontology/2.0/lexinfo#verb
      "amontonar"@es             http://www.lexinfo.net/ontology/2.0/lexinfo#verb
      "apelotonar"@es            http://www.lexinfo.net/ontology/2.0/lexinfo#verb
      "hacinar"@es               http://www.lexinfo.net/ontology/2.0/lexinfo#verb
      ...                        ...
  24. [Diagram] Apertium LMF (EN-ES, EN-CA) to Apertium RDF. Monolingual lexicons: Lexicon EN, Lexicon ES, Lexicon CA. Translation sets: Translation Set EN-ES, Translation Set EN-CA.
  25. [Diagram] LexiconEN (“bank”@en, “bench”@en), LexiconES (“banco”@es, “orilla”@es, “ribera”@es) and LexiconPT (“banco”@pt, “orla”@pt), connected through TranslationSetEN-ES (bank-banco, bank-ribera, bank-orilla, bench-banco) and TranslationSetES-PT (banco-banco, orilla-orla).
  26. Indirect translations. Indirect translations for “bank” EN -> ES -> PT
      Pivot translation written repres.   Indirect translation written repres.
      "banco"@es                          "banco"@pt
      "orilla"@es                         "orla"@pt
  27. Apertium RDF graph. Dijkstra's algorithm to choose the shortest path
  28. How to measure confidence [diagram: LexiconEN (bench, bank), LexiconES (banco, orilla, ribera), LexiconCA (banc, riba)]
  29. One time inverse consultation (OTIC). Given a lexical entry s:
      1. Get the direct translations of s in the pivot language ($P_s$)
      2. $\forall p \in P_s$, get its translations in the target language ($T_p$)
      3. For every $t \in T_p$, (a) get its set of translations in the pivot language ($P_t$); (b) calculate the score for t:
         $\mathrm{score}(t) = \frac{2 \cdot |P_s \cap P_t|}{|P_s| + |P_t|}$
      Tanaka, K., & Umemura, K. (1994). Construction of a bilingual dictionary intermediated by a third language. In COLING, pp. 297–303.
  30. One time inverse consultation [diagram: LexiconEN (bench, bank), LexiconES (banco, orilla, ribera), LexiconCA (banc, riba)]
      s = “banco”@es
      P_banco = {“bank”@en, “bench”@en}
      T_bank = {“banc”@ca, “riba”@ca}
      T_bench = {“banc”@ca}
      P_banc = {“bank”@en, “bench”@en}
      P_riba = {“bank”@en}
      score(“banc”@ca) = 2·2 / (2 + 2) = 1.0
      score(“riba”@ca) = 2·1 / (2 + 1) ≈ 0.67
  31. Linking with external sources
  32. Linking to BabelNet. Around 130,000 links between Apertium RDF and BabelNet
  33. Linking to BabelNet. Translations for “bank”@en
      Translated Written Repr.   BabelSynset                          BabelNet gloss
      "banco"@es                 http://babelnet.org/rdf/s00008371n   “A building in which the business of banking transacted”
      "banco"@es                 http://babelnet.org/rdf/s00008366n   “An arrangement of similar objects in a row or in tiers”
      "banco"@es                 http://babelnet.org/rdf/s15346085n   “An ocean bank, sometimes referred to as a fishing bank or simply bank, ...”
      ...                        ...                                  ...
      "orilla"@es                http://babelnet.org/rdf/s00008363n   “Sloping land (especially the slope beside a body of water)”
      "ribera"@es                http://babelnet.org/rdf/s00008363n   “Sloping land (especially the slope beside a body of water)”
  34. Conclusions
  35. Conclusions
      • Apertium data on the Web following SW standards
      • Common entry point for all the Apertium dictionaries
      • Direct and indirect translations can be easily obtained via SPARQL
      • Confidence degree for indirect translations
      • Linked with BabelNet
  36. Conclusions. Related reading: http://kdictionaries.com/kdn/kdn23.pdf
  37. Thanks for your attention! http://linguistic.linkeddata.es/apertium/
  38. Some results of applying OTIC
      Language path   Threshold   Precision   Recall
      EN-CA-ES        0.0         76%         48%
                      0.5         77%         48%
                      1.0         82%         43%
      ES-EN-CA        0.0         53%         39%
                      0.5         55%         39%
                      1.0         61%         36%
      EN-ES-CA        0.0         73%         38%
                      0.5         76%         38%
                      1.0         83%         33%
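
The OTIC scoring from slides 29 and 30 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory pair lists and the helper names `index` and `otic_scores` are hypothetical, and the score is read as 2·|Ps ∩ Pt| / (|Ps| + |Pt|). With the slide-30 data this reading gives 1.0 for “banc”@ca and 2·1/(2+1) ≈ 0.67 for “riba”@ca.

```python
from collections import defaultdict

# Toy dictionaries holding only the pairs listed on slide 30 (assumed layout:
# a list of (entry in first language, entry in second language) tuples).
ES_EN = [("banco", "bank"), ("banco", "bench")]
EN_CA = [("bank", "banc"), ("bank", "riba"), ("bench", "banc")]


def index(pairs):
    """Build forward and inverse lookup tables from translation pairs."""
    forward, inverse = defaultdict(set), defaultdict(set)
    for left, right in pairs:
        forward[left].add(right)
        inverse[right].add(left)
    return forward, inverse


def otic_scores(source, source_to_pivot, pivot_to_target, target_to_pivot):
    """Score every target candidate reachable from `source` via the pivot."""
    p_s = source_to_pivot.get(source, set())       # step 1: P_s
    candidates = set()
    for p in p_s:                                  # step 2: union of all T_p
        candidates |= pivot_to_target.get(p, set())
    scores = {}
    for t in candidates:                           # step 3: inverse lookup
        p_t = target_to_pivot.get(t, set())        # (a) P_t
        # (b) Dice-style overlap between P_s and P_t
        scores[t] = 2 * len(p_s & p_t) / (len(p_s) + len(p_t))
    return scores


es_to_en, _ = index(ES_EN)
en_to_ca, ca_to_en = index(EN_CA)
scores = otic_scores("banco", es_to_en, en_to_ca, ca_to_en)

# Keep only candidates at or above a confidence threshold.
threshold = 1.0
accepted = sorted(t for t, sc in scores.items() if sc >= threshold)
print(scores, accepted)
```

Raising the threshold keeps fewer candidates, which matches the precision and recall trade-off reported on slide 38.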
