Sd llod-15 apertium

  1. 18/05/2016. Apertium RDF: an experience in generating linguistic linked open data. Jorge Gracia, Ontology Engineering Group (OEG), Universidad Politécnica de Madrid (UPM), jgracia@fi.upm.es. 1st Summer Datathon on Linguistic Linked Open Data, Cercedilla (Spain), 15-19th June 2015.
  2. 16/06/2015, Jorge Gracia. Outline: Motivation; The Apertium platform; Representing translations in RDF; Building the Apertium RDF graph; Traversing the graph; Linking with external sources; Conclusions.
  3. Motivation
  4. Motivation. Current multilingual lexica and electronic dictionaries: proprietary formats; non-standard APIs; disconnected from other resources.
  5. Motivation. GOAL: to expose the translations contained in bilingual dictionaries as Linked Data on the Web. Joint effort by [logos]
  6. The Apertium platform
  7. Apertium. Apertium [http://www.apertium.org] is an open-source platform for machine translation. Its bilingual dictionaries are available in XML.
  8. Apertium. Language pairs:
       Afrikaans <-> Dutch; Breton --> French; Catalan <-> Italian; Welsh <-> English; Danish <-- Norwegian;
       English <-> Catalan; English <-> Spanish; English <-> Galician;
       Esperanto <-- Catalan; Esperanto <-> English; Esperanto <-- Spanish; Esperanto <-- French;
       Spanish <-> Aragonese; Spanish <-> Asturian; Spanish <-> Catalan; Spanish <-> Galician; Spanish <-> Italian; Spanish <-> Portuguese; Spanish <-> Romanian;
       Basque --> English; Basque --> Spanish; French <-> Catalan; French <-> Spanish;
       Serbo-Croatian <-> English; Serbo-Croatian <-> Macedonian; Serbo-Croatian <-> Slovenian;
       Indonesian <-> Malaysian; Icelandic <-> Swedish; Icelandic --> English; Kazakh <-> Tatar;
       Macedonian <-> Bulgarian; Macedonian --> English; Norwegian Nynorsk <-> Norwegian Bokmål;
       Occitan <-> Catalan; Occitan <-> Spanish; Portuguese <-> Catalan; Portuguese <-> Galician;
       Northern Sami --> Norwegian Bokmål; Swedish <-> Danish; …
     More than 40 language pairs; 22 of them (the more stable ones) are available in LMF.
  9. Representing translations in RDF
  10. lemon
  11. The translation module [diagram]. Translation Module: http://purl.org/net/translation.owl. Classes: TranslationSet, Translation, LexicalSense. Properties: trans (TranslationSet to Translation), translationSource and translationTarget (Translation to LexicalSense), translationCategory, context (to a Resource), translationConfidence: double. Translation Categories: http://purl.org/net/translation-categories (directEquivalent, culturalEquivalent, lexicalEquivalent).
  12. Translation example [diagram]. Two lemon:Lexicon nodes (lexiconEN, lexiconES) each point via lemon:entry to a lemon:LexicalEntry. Each entry has a lemon:lexicalForm pointing to a lemon:Form whose lemon:writtenRep is "bench"@en or "banco"@es, respectively. Each entry has a lemon:LexicalSense attached via lemon:isSenseOf. A tr:Translation links the two senses through tr:translationSource and tr:translationTarget, and the tr:TranslationSet translationSetEN-ES points to the translation via tr:trans.
  13. Building the Apertium RDF graph
  14. Methodology: 1. Data analysis and vocabulary selection; 2. Modelling; 3. URIs design; 4. RDF generation; 5. Publication as linked data.
  15. Modelling. Mapping of data sources.
  16. URIs design
       # Apertium English lexicon:
       http://linguistic.linkeddata.es/id/apertium/lexiconEN
       # Apertium Spanish lexicon:
       http://linguistic.linkeddata.es/id/apertium/lexiconES
       # Apertium English-Spanish translation set:
       http://linguistic.linkeddata.es/id/apertium/tranSetEN-ES
     Following the ISA recommendations [Archer et al.]: Archer, P., Goedertier, S., & Loutas, N. (2012). Study on persistent URIs. Tech. rep.
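The URI pattern on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's tooling: the function names are hypothetical, and the entry pattern (`lemma-pos-lang` under the lexicon URI) is inferred from the `lexiconEN/bench-n-en` example on the RDF generation slide, assuming the `apertium:` prefix expands to the base URI used here.

```python
from urllib.parse import quote

# Base URI taken from the slide; everything below it follows the slide's examples.
BASE = "http://linguistic.linkeddata.es/id/apertium/"


def lexicon_uri(lang: str) -> str:
    """URI of the monolingual lexicon for a language code, e.g. 'en'."""
    return f"{BASE}lexicon{lang.upper()}"


def translation_set_uri(source_lang: str, target_lang: str) -> str:
    """URI of the translation set between two languages."""
    return f"{BASE}tranSet{source_lang.upper()}-{target_lang.upper()}"


def entry_uri(lemma: str, pos: str, lang: str) -> str:
    """URI of a lexical entry (pattern inferred, see the note above)."""
    # Percent-encode the lemma so that spaces or slashes cannot break the URI.
    return f"{lexicon_uri(lang)}/{quote(lemma, safe='')}-{pos}-{lang.lower()}"


if __name__ == "__main__":
    print(lexicon_uri("en"))
    print(translation_set_uri("en", "es"))
    print(entry_uri("bench", "n", "en"))
```

Keeping URI minting in one place makes it easier to keep identifiers stable, which is the point of the persistent-URI recommendations cited on the slide.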
  17. RDF Generation. RDF generation based on Open Refine. E.g., RDF generated:
       apertium:lexiconEN a lemon:Lexicon ;
           dc:source <http://hdl.handle.net/10230/17110> .
       ...
       apertium:lexiconEN lemon:entry apertium:lexiconEN/bench-n-en .
       apertium:lexiconEN/bench-n-en a lemon:LexicalEntry ;
           lemon:lexicalForm apertium:lexiconEN/bench-n-en-form ;
           lexinfo:partOfSpeech lexinfo:noun .
       apertium:lexiconEN/bench-n-en-form a lemon:Form ;
           lemon:writtenRep "bench"@en .
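To make the shape of the generated triples concrete, here is a small Python sketch that emits the same kind of Turtle for one lexical entry. It stands in for the Open Refine workflow mentioned on the slide and is not that workflow. The function name is hypothetical, the `lemon:` namespace URI is an assumption, and full IRIs are written in angle brackets instead of the slide's abbreviated `apertium:` notation.

```python
BASE = "http://linguistic.linkeddata.es/id/apertium/"

# Prefix declarations. The lemon namespace is an assumption; the lexinfo
# namespace is the one shown in the deck's direct-translations table.
PREFIXES = "\n".join([
    "@prefix lemon: <http://lemon-model.net/lemon#> .",
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .",
])


def entry_to_turtle(lemma: str, pos_tag: str, lexinfo_pos: str, lang: str) -> str:
    """Return Turtle for one entry, its form and its link to the lexicon.

    Hypothetical helper: the lemma is assumed to need no escaping.
    """
    lexicon = f"{BASE}lexicon{lang.upper()}"
    entry = f"{lexicon}/{lemma}-{pos_tag}-{lang.lower()}"
    form = f"{entry}-form"
    lines = [
        PREFIXES,
        "",
        f"<{lexicon}> lemon:entry <{entry}> .",
        f"<{entry}> a lemon:LexicalEntry ;",
        f"    lemon:lexicalForm <{form}> ;",
        f"    lexinfo:partOfSpeech lexinfo:{lexinfo_pos} .",
        f"<{form}> a lemon:Form ;",
        f'    lemon:writtenRep "{lemma}"@{lang.lower()} .',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_turtle("bench", "n", "noun", "en"))
```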
  18. Publication
       SPARQL endpoint: http://linguistic.linkeddata.es/apertium/sparql-editor/
       Web interface: http://linguistic.linkeddata.es/apertium/
       Datahub: http://datahub.io/dataset?q=apertium+rdf&organization=oeg-upm
  19. Traversing the graph
  20. 22 generated datasets
       Lang. pair   # triples   # trans.
       CA-IT        180,851      7,869
       EN-CA        759,601     33,029
       EN-ES        576,316     25,830
       EN-GL        425,117     20,034
       EO-CA        426,301     19,964
       EO-EN        617,772     31,474
       EO-ES        380,198     17,212
       EO-FR        726,281     35,791
       ES-AN         71,997      3,110
       ES-AST       825,54      36,096
       ES-CA        730,501     31,291
       ES-GL        206,284      8,985
       ES-PT        279,245     12,054
       ES-RO        400,366     17,318
       EU-ES        262,336     11,838
       EU-EN        265,466     13,089
       FR-CA        152,002      6,550
       FR-ES        495,614     21,475
       OC-CA        346,346     15,983
       OC-ES        317,162     14,561
       PT-CA        163,149      7,111
       PT-GL        234,065     10,144
  21. Apertium RDF in the LLOD cloud [figure]
  22. Apertium RDF in the LLOD cloud [figure]
  23. Direct translations. Direct translations for "bank"@en:
       Translated written repr.   Part of Speech
       "banc"@ca                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "riba"@ca                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "banco"@es                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "orilla"@es                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "ribera"@es                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "beira"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "banco"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "ourela"@gl                http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "orela"@gl                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "banku"@eu                 http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "erribera"@eu              http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "ertz"@eu                  http://www.lexinfo.net/ontology/2.0/lexinfo#noun
       "amuntegar"@ca             http://www.lexinfo.net/ontology/2.0/lexinfo#verb
       "agolpar"@es               http://www.lexinfo.net/ontology/2.0/lexinfo#verb
       "amontonar"@es             http://www.lexinfo.net/ontology/2.0/lexinfo#verb
       "apelotonar"@es            http://www.lexinfo.net/ontology/2.0/lexinfo#verb
       "hacinar"@es               http://www.lexinfo.net/ontology/2.0/lexinfo#verb
       ...                        ...
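A table like this one would typically come from a SPARQL query over the lemon and translation-module properties shown in the translation example. The Python sketch below only builds such a query string; it does not contact the endpoint and has not been run against it. The function name is hypothetical, the `lemon:` and `tr:` namespace URIs are assumptions, and the graph shape (form, entry, sense, translation) is taken from the deck's diagrams, not from the published data.

```python
LEMON = "http://lemon-model.net/lemon#"        # assumed namespace URI
TR = "http://purl.org/net/translation#"        # assumed namespace URI
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"


def direct_translations_query(written_rep: str, lang: str) -> str:
    """Build a SPARQL query for the direct translations of a written form."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    literal = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX lemon: <{LEMON}>
PREFIX tr: <{TR}>
PREFIX lexinfo: <{LEXINFO}>

SELECT DISTINCT ?translation ?pos WHERE {{
  ?form lemon:writtenRep "{literal}"@{lang} .
  ?entry lemon:lexicalForm ?form ;
         lexinfo:partOfSpeech ?pos .
  ?sense lemon:isSenseOf ?entry .
  # The word may be the source or the target side of a translation.
  {{ ?t tr:translationSource ?sense ; tr:translationTarget ?other . }}
  UNION
  {{ ?t tr:translationTarget ?sense ; tr:translationSource ?other . }}
  ?other lemon:isSenseOf ?otherEntry .
  ?otherEntry lemon:lexicalForm ?otherForm .
  ?otherForm lemon:writtenRep ?translation .
}}
ORDER BY ?pos ?translation"""


if __name__ == "__main__":
    print(direct_translations_query("bank", "en"))
```

The UNION covers both directions because some source dictionaries are one-directional (for example Basque --> English), so a word may appear only as a translation target.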
  24. [Diagram] From Apertium LMF to Apertium RDF. The LMF sources EN-ES and EN-CA each contain their own Lexicon EN; in Apertium RDF they yield monolingual lexicons (Lexicon EN, Lexicon ES, Lexicon CA) and translation sets (Translation Set EN-ES, Translation Set EN-CA).
  25. [Diagram] LexiconEN: "bank"@en, "bench"@en. TranslationSetEN-ES: bank-banco, bank-orilla, bank-ribera, bench-banco. LexiconES: "banco"@es, "orilla"@es, "ribera"@es. TranslationSetES-PT: banco-banco, orilla-orla. LexiconPT: "banco"@pt, "orla"@pt.
  26. Indirect translations. Indirect translations for "bank", EN -> ES -> PT:
       Pivot translation written repres.   Indirect translation written repres.
       "banco"@es                          "banco"@pt
       "orilla"@es                         "orla"@pt
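The pivot step behind this table amounts to a join of two bilingual dictionaries. The sketch below uses plain Python dictionaries in place of the RDF graph; the function name is hypothetical, and the toy data contains only the translation pairs shown in the deck's EN-ES and ES-PT diagram.

```python
# Toy dictionaries limited to the pairs visible in the deck's diagram.
EN_ES = {"bank": ["banco", "orilla", "ribera"], "bench": ["banco"]}
ES_PT = {"banco": ["banco"], "orilla": ["orla"]}


def indirect_translations(word, source_to_pivot, pivot_to_target):
    """Return (pivot, target) pairs reachable from `word` through the pivot language."""
    results = []
    for pivot in source_to_pivot.get(word, []):
        # A pivot word with no entry in the second dictionary contributes nothing.
        for target in pivot_to_target.get(pivot, []):
            results.append((pivot, target))
    return results


if __name__ == "__main__":
    for pivot, target in indirect_translations("bank", EN_ES, ES_PT):
        print(f'"{pivot}"@es -> "{target}"@pt')
```

Note that "ribera" drops out because the toy ES-PT dictionary has no entry for it, and that a polysemous pivot such as "banco" can carry an unintended sense across, which is why the deck goes on to discuss confidence scores.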
  27. Apertium RDF graph [figure]. Dijkstra's algorithm is used to choose the shortest path between languages.
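As a rough illustration of the path search, the sketch below runs Dijkstra's algorithm over the 22 language pairs listed in the datasets table. Several things here are assumptions, since the slide does not state them: every pair is treated as traversable in both directions, every edge has weight 1, and the function names are hypothetical.

```python
import heapq

# The 22 language pairs from the generated-datasets table.
LANGUAGE_PAIRS = [
    ("CA", "IT"), ("EN", "CA"), ("EN", "ES"), ("EN", "GL"), ("EO", "CA"),
    ("EO", "EN"), ("EO", "ES"), ("EO", "FR"), ("ES", "AN"), ("ES", "AST"),
    ("ES", "CA"), ("ES", "GL"), ("ES", "PT"), ("ES", "RO"), ("EU", "ES"),
    ("EU", "EN"), ("FR", "CA"), ("FR", "ES"), ("OC", "CA"), ("OC", "ES"),
    ("PT", "CA"), ("PT", "GL"),
]


def build_graph(pairs):
    """Undirected adjacency map (assumption: pairs can be crossed both ways)."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_language_path(graph, source, target):
    """Dijkstra with unit edge weights; returns a list of languages or None."""
    if source not in graph or target not in graph:
        return None
    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour in sorted(graph[node]):
            candidate = d + 1
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    graph = build_graph(LANGUAGE_PAIRS)
    print(shortest_language_path(graph, "EU", "RO"))
```

With unit weights a breadth-first search would find the same paths; Dijkstra's algorithm becomes useful once edges carry weights, for instance derived from dictionary size or quality.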
  28. How to measure confidence [diagram]. LexiconEN: bank, bench. LexiconES: banco, orilla, ribera. LexiconCA: banc, riba.
  29. One time inverse consultation (OTIC). Given a lexical entry s:
       1. Get the direct translations of s in the pivot language ($P_s$).
       2. For every $p \in P_s$, get its translations in the target language ($T_p$).
       3. For every $t \in T_p$: (a) get its set of translations in the pivot language ($P_t$); (b) calculate the score for t:
          $\mathrm{score}(t) = \dfrac{2\,|P_s \cap P_t|}{|P_s| + |P_t|}$
     Tanaka, K., & Umemura, K. (1994). Construction of a bilingual dictionary intermediated by a third language. In COLING, pp. 297–303.
  30. One time inverse consultation [diagram: LexiconEN with bank, bench; LexiconES with banco, orilla, ribera; LexiconCA with banc, riba]
       s = "banco"@es
       Pbanco = {"bank"@en, "bench"@en}
       Tbank = {"banc"@ca, "riba"@ca}
       Tbench = {"banc"@ca}
       Pbanc = {"bank"@en, "bench"@en}
       Priba = {"bank"@en}
       score("banc"@ca) = 1.0
       score("riba"@ca) = 0.5
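The OTIC steps can be sketched directly on the sets from this slide. The sketch assumes the Dice-style score 2·|Ps ∩ Pt| / (|Ps| + |Pt|) and a symmetric pivot-target dictionary, so that Pt can be obtained by inverting it; the function names are hypothetical. With exactly the sets listed above, this formula yields 1.0 for "banc"@ca and 2/3 for "riba"@ca, not the 0.5 shown on the slide. The sketch reports what the formula produces and does not try to explain the difference.

```python
# Sets taken from the slide: Spanish source, English pivot, Catalan target.
ES_EN = {"banco": {"bank", "bench"}}
EN_CA = {"bank": {"banc", "riba"}, "bench": {"banc"}}


def invert(mapping):
    """Invert a word -> set-of-translations mapping (assumes a symmetric dictionary)."""
    inverse = {}
    for key, values in mapping.items():
        for value in values:
            inverse.setdefault(value, set()).add(key)
    return inverse


def otic_scores(source, source_to_pivot, pivot_to_target):
    """Score every target candidate reachable from `source` through the pivot."""
    p_s = source_to_pivot.get(source, set())       # step 1: pivot translations of s
    target_to_pivot = invert(pivot_to_target)
    candidates = set()
    for pivot in p_s:                              # step 2: target translations of each p
        candidates |= pivot_to_target.get(pivot, set())
    scores = {}
    for target in candidates:                      # step 3: inverse consultation
        p_t = target_to_pivot[target]
        scores[target] = 2 * len(p_s & p_t) / (len(p_s) + len(p_t))
    return scores


if __name__ == "__main__":
    for target, score in sorted(otic_scores("banco", ES_EN, EN_CA).items()):
        print(f"{target}: {score:.2f}")
```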
  31. Linking with external sources
  32. Linking to BabelNet. Around 130,000 links between Apertium RDF and BabelNet.
  33. Linking to BabelNet. Translations for "bank"@en:
       Translated written repr. | BabelSynset | BabelNet gloss
       "banco"@es | http://babelnet.org/rdf/s00008371n | "A building in which the business of banking transacted"
       "banco"@es | http://babelnet.org/rdf/s00008366n | "An arrangement of similar objects in a row or in tiers"
       "banco"@es | http://babelnet.org/rdf/s15346085n | "An ocean bank, sometimes referred to as a fishing bank or simply bank, ..."
       … | … | …
       "orilla"@es | http://babelnet.org/rdf/s00008363n | "Sloping land (especially the slope beside a body of water)"
       "ribera"@es | http://babelnet.org/rdf/s00008363n | "Sloping land (especially the slope beside a body of water)"
  34. Conclusions
  35. Conclusions: Apertium data on the Web following SW standards; common entry point for all the Apertium dictionaries; direct and indirect translations can be easily obtained via SPARQL; confidence degree for indirect translations; linked with BabelNet.
  36. Conclusions. Related reading: http://kdictionaries.com/kdn/kdn23.pdf
  37. Thanks for your attention! http://linguistic.linkeddata.es/apertium/
  38. Some results of applying OTIC
       Language path   Threshold   Precision   Recall
       EN-CA-ES        0.0         76%         48%
       EN-CA-ES        0.5         77%         48%
       EN-CA-ES        1.0         82%         43%
       ES-EN-CA        0.0         53%         39%
       ES-EN-CA        0.5         55%         39%
       ES-EN-CA        1.0         61%         36%
       EN-ES-CA        0.0         73%         38%
       EN-ES-CA        0.5         76%         38%
       EN-ES-CA        1.0         83%         33%
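The threshold column can be read as a cut-off on the OTIC score: raising it discards low-confidence candidates, which tends to raise precision and lower recall, as in the table. The sketch below illustrates that mechanism on made-up data. The candidate names, scores and gold set are entirely hypothetical, and keeping candidates whose score is greater than or equal to the threshold is an assumption, chosen because a threshold of 1.0 still returns results in the table.

```python
def filter_by_threshold(scores, threshold):
    """Keep candidates whose score reaches the threshold (>= is an assumption)."""
    return {candidate for candidate, score in scores.items() if score >= threshold}


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical candidates and gold standard, for illustration only.
    scores = {"t1": 1.0, "t2": 0.5, "t3": 0.5, "t4": 0.2}
    gold = {"t1", "t2", "t5"}
    for threshold in (0.0, 0.5, 1.0):
        kept = filter_by_threshold(scores, threshold)
        precision, recall = precision_recall(kept, gold)
        print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```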
