AIMS
Is ISO 639 enough for a multilingual
            thesaurus?
            The AGROVOC case

   Caterina Caracciolo, Gudrun Johannsen, Lavanya
               Kiran, Johannes Keizer
    Food and Agriculture Organization of the UN
                      AOS 2012
             Sept. 4, 2012 - Kuching (MY)
Background
• AGROVOC is published in 21 languages, with others
  under development
• Multilinguality has always been an issue
• Since the beginning, multilinguality was
  interpreted as “translation”:
      – One hierarchy of terms (one
        structure), translations in various languages
• This organization remained with the move
  from a term-centered to a concept-centered
  resource
9/5/2012                                                2
AGROVOC as object-centered
                 resource…
• Being mainly a resource for document
  indexing in the area of agriculture, it contains
  a large number of words referring to
  plants, animals, and food in general




Number of concepts below each top concept
[Figure: bar chart, horizontal axis from 0 to 25,000 concepts. Top concepts shown: organism, substances, entities, phenomena, activities, products, methods, properties, features, objects, resources, subjects, systems, locations, groups, measures, state, stages, technology, processes, factors, time, events, site, strategies.]
Differentiating languages
• Salmon (en)
• Salmón (es)
• лососи (ru)




But distribution of languages may
              be wide…




… and names of food tend to vary…


• Aguacate
• Palta
… and names of food tend to vary…
• Ataco morado, sangorache, sergorache, hawarcha
• Achis, Coyos (Cajamarca), Achita (Ayacucho), Kiwicha (Cusco)
• Coime, coimi, cuimi, millmi
Not only food names vary




Requirements for rendering
           multilinguality in AGROVOC
1. Unambiguously express the geographic area
   where a given word is used
      – specification of the area of use of a given word
        should be optional.
2. No limitations on the type of area allowed
      – Countries, groups of countries, geographical or
        administrative regions should be equally available
        for specification.


9/5/2012                    KISAF, Rome                    10
AGROVOC as a SKOS resource
• A skos:Concept indicates a group of words in
  various languages that are considered translations of
  one another
• URIs are kept “abstract” to emphasize the independence
  of the concept from language
      – E.g. http://aims.fao.org/aos/agrovoc/c_12332
• The words grouped are then labels of the given
  concept




SKOS properties to express terms
• skos:prefLabel, skos:altLabel
      – take plain literals as values
      – and an optional language tag expressed by XML
        attribute xml:lang
• skosxl:prefLabel, skosxl:altLabel
      – take entities with URIs as values, so extra information can be
        attached to labels
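
The difference between the two kinds of label properties can be sketched in Python as follows. This is only an illustrative data model, not AGROVOC's actual representation: the class names `PlainLabel` and `XLabel`, the `extra` dictionary, and all `http://example.org/...` URIs are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PlainLabel:
    """Value of skos:prefLabel / skos:altLabel: a literal plus an optional language tag."""
    text: str
    lang: Optional[str] = None


@dataclass
class XLabel:
    """Value of skosxl:prefLabel / skosxl:altLabel: a resource with its own URI."""
    uri: str                      # hypothetical label URI
    literal_form: PlainLabel      # the word itself, with its language tag
    extra: Dict[str, str] = field(default_factory=dict)  # further statements about the label


# A plain label can only say "this word, in this language".
plain = PlainLabel("Kiwicha", "es")

# A label with a URI can be the subject of additional statements.
xl = XLabel("http://example.org/label/kiwicha-es", PlainLabel("Kiwicha", "es"))
xl.extra["http://example.org/usedInRegion"] = "http://example.org/region/Cusco"
```

Because the second kind of label has an identity of its own, information such as the area of use can be attached to the label rather than squeezed into the language tag.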




AGROVOC uses ISO 639-1 two-letter codes
       to tag languages in xml:lang
• ISO 639 provides codes for languages
  independently of
      – the country where they are spoken:
           • Spanish, Basque (same country, both official languages)
           • Dutch, Flemish (different countries, similar enough
             languages…)
      – and their status: French and Breton (same
        country, Breton has no official status)
• Only one code for English, Spanish…
• Limitations shown from previous examples
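
The limitation can be made concrete with a small Python sketch. The two Spanish words come from the earlier slide; the function name `group_by_language` and the list-of-pairs layout are assumptions made for illustration only.

```python
from collections import defaultdict

# Labels of one concept as (text, language tag) pairs.
labels = [("Aguacate", "es"), ("Palta", "es")]


def group_by_language(pairs):
    """Group label texts by their language tag."""
    grouped = defaultdict(list)
    for text, lang in pairs:
        grouped[lang].append(text)
    return dict(grouped)


by_lang = group_by_language(labels)
# Both words end up under the single code "es"; the tag alone
# carries no hint about where each word is used.
```

With only one code per language, the tag cannot tell the two words apart, which is the gap the requirements above are meant to close.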
[Figure: diagram labelled “Multilinguality” and “ISO 639 language codes”]
Are three-letter ISO 639 codes an option?
• More languages are included
      – More contemporary languages
           • Bemba language
      – “Old” languages (no longer spoken)
           • Old French (ca. 842-1400)
      – Groups of languages
           • Caucasian languages
      – Artificial languages
• Same approach as the two-letter version
Is IETF an option?
• Internet Engineering Task Force (IETF)
• RFC 5646, “Tags for Identifying Languages”
      – Basis is ISO 639 for languages
      – Subtags from ISO 3166 for countries and ISO 15924 for
        scripts
• Examples:
      – tr-CY = Turkish as used in Cyprus
      – zh-Hant-HK = Chinese in traditional Chinese script, as used in Hong Kong
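
How such tags break down into subtags can be sketched in Python. This is a deliberately simplified reader that handles only the `language[-script][-region]` shape of the examples above; it is not a full RFC 5646 parser (no variants, extensions, or private-use subtags), and the names `LangTag` and `parse_tag` are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str            # ISO 639 code, e.g. "tr"
    script: Optional[str]    # ISO 15924 code, e.g. "Hant"
    region: Optional[str]    # ISO 3166 code (or 3-digit area code), e.g. "CY"


# language (2-3 letters), optional script (4 letters),
# optional region (2 letters or 3 digits)
_TAG = re.compile(
    r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag: str) -> LangTag:
    """Split a simple language tag into language, script and region subtags."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language, script, region = match.groups()
    return LangTag(
        language.lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


turkish_cyprus = parse_tag("tr-CY")
chinese_hk = parse_tag("zh-Hant-HK")
```

The region subtag is limited to countries and a few large numeric areas, so a tag of this kind cannot name an administrative region such as Cusco.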

Is a relational approach an option?
• Keep tagging approach to mark the language
      – Use ISO 639 or IETF
• And introduce a relational notion of “where a
  given word is used”
• Link together a concept representing a
  geographic area, and the object to name
      – E.g., Kiwicha isNameUsedInRegion Cusco
• Aim at “standard” relations…
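
A minimal Python sketch of the relational idea, using the names and regions from the earlier slide. The relation name `isNameUsedInRegion` is the one given above; the `Triple` type and the helper functions `names_used_in` and `regions_of` are assumptions made for illustration, and plain strings stand in for the URIs a real thesaurus would use.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str     # the word (label)
    predicate: str   # the relation
    obj: str         # the concept representing a geographic area


IS_NAME_USED_IN_REGION = "isNameUsedInRegion"

statements = [
    Triple("Kiwicha", IS_NAME_USED_IN_REGION, "Cusco"),
    Triple("Achita", IS_NAME_USED_IN_REGION, "Ayacucho"),
    Triple("Coyos", IS_NAME_USED_IN_REGION, "Cajamarca"),
]


def names_used_in(region: str, triples: List[Triple]) -> List[str]:
    """Return the words recorded as used in the given region."""
    return [t.subject for t in triples
            if t.predicate == IS_NAME_USED_IN_REGION and t.obj == region]


def regions_of(name: str, triples: List[Triple]) -> List[str]:
    """Return the regions recorded for a word; empty if none is specified."""
    return [t.obj for t in triples
            if t.predicate == IS_NAME_USED_IN_REGION and t.subject == name]
```

In this sketch a word with no recorded region, such as Achis, simply has no statement, which keeps the area of use optional, and the object of the relation can be any kind of area.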
Conclusions?
• This is work in progress
• We continue working out use cases, especially
  from Spanish and Portuguese
• Assess alternatives





Editor's notes

  1. DCMI initiative!