AIMS
Is ISO 639 enough for a multilingual thesaurus?
The AGROVOC case

Caterina Caracciolo, Gudrun Johannsen, Lavanya Kiran, Johannes Keizer
Food and Agriculture Organization of the UN
AOS 2012
Sept 4, 2012 - Kuching (MY)
Background
• AGROVOC is published in 21 languages, with others
  under development
• Multilinguality has always been an issue
• Since the beginning, multilinguality was
  interpreted as “translation”:
      – One hierarchy of terms (one
        structure), translations in various languages
• This organization remained with the move
  from a term-centered to a concept-centered
  resource
9/5/2012                                                2
AGROVOC as object-centered resource…
• Being mainly a resource for document
  indexing in the area of agriculture, it contains
  a large number of words referring to
  plants, animals, and food in general
Number of concepts below top concepts
[Bar chart; horizontal axis from 0 to 25,000 concepts. Top concepts shown: organism, substances, entities, phenomena, activities, products, methods, properties, features, objects, resources, subjects, systems, locations, groups, measures, state, stages, technology, processes, factors, time, events, site, strategies]
Differentiating languages
• Salmon (en)
• Salmón (es)
• лососи (ru)




But distribution of languages may be wide…
… and names of food tend to vary…
• Aguacate
• Palta
… and names of food tend to vary…
• Ataco morado, sangorache, sergorache, hawarcha
• Achis, Coyos (Cajamarca), Achita (Ayacucho), Kiwicha (Cusco)
• Coime, coimi, cuimi, millmi
Not only food names vary




Requirements for rendering multilinguality in AGROVOC
1. Unambiguously express the geographic area
   where a given word is used
      – specification of the area of use of a given word
        should be optional.
2. No limitations on the type of area allowed
      – Countries, groups of countries, geographical or
        administrative regions should be equally available
        for specification.


9/5/2012                    KISAF, Rome                    10
AGROVOC as a SKOS resource
• A skos:Concept indicates a group of words in
  various languages that are considered translations of
  one another
• URIs are kept “abstract” to emphasize independence
  of the concept from language
      – E.g. http://aims.fao.org/aos/agrovoc/c_12332
• The words grouped are then labels of the given
  concept
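
A minimal sketch of the idea on this slide, assuming a plain Python data class instead of an RDF library. The `Concept` class and the example.org URI are hypothetical and used only for illustration; the labels are the ones from the "Differentiating languages" slide. The point is that the identifier carries no language, and words attach to it as language-tagged labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A language-independent concept, identified only by an opaque URI."""
    uri: str
    # Maps a language tag to the words used for the concept in that language.
    labels: Dict[str, List[str]] = field(default_factory=dict)

    def add_label(self, word: str, lang: str) -> None:
        """Attach a word to the concept under the given language tag."""
        self.labels.setdefault(lang, []).append(word)


# Hypothetical URI: it deliberately says nothing about any language.
salmon = Concept("http://example.org/concept/c_0001")
salmon.add_label("Salmon", "en")
salmon.add_label("Salmón", "es")
salmon.add_label("лососи", "ru")

for lang, words in sorted(salmon.labels.items()):
    print(lang, words)
```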




SKOS properties to express terms
• skos:prefLabel, skos:altLabel
      – take plain literals as values
      – and an optional language tag expressed by XML
        attribute xml:lang
• skosxl:prefLabel, skosxl:altLabel
      – take entities with URIs, so extra information can be
        attached to labels
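
The difference between the two families of properties can be sketched as follows. This is an illustration in plain Python, not the SKOS or SKOS-XL data model itself: the class names, the example.org URI and the `usedIn` key are all hypothetical. A plain label is just a literal with an optional language tag, while an XL-style label is an entity with its own identifier, so further statements can be made about it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PlainLabel:
    """Like skos:prefLabel / skos:altLabel: a literal plus an optional language tag."""
    text: str
    lang: Optional[str] = None


@dataclass
class XLLabel:
    """Like a skosxl label: an entity with its own URI wrapping the literal."""
    uri: str
    literal: PlainLabel
    # Extra information about the label itself (hypothetical keys).
    extra: Dict[str, str] = field(default_factory=dict)


plain = PlainLabel("Kiwicha", "es")

# Hypothetical label URI and hypothetical "usedIn" key, for illustration only.
xl = XLLabel("http://example.org/label/xl_0001", plain)
xl.extra["usedIn"] = "Cusco"

print(plain)
print(xl)
```

Nothing can be attached to `plain` beyond its text and language tag, which is why the region of use needs either a richer tag or a label that is itself an entity.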




AGROVOC uses ISO 639 two-letter codes
       to tag languages in xml:lang
• ISO 639 provides codes for languages
  independently of
      – the country where they are spoken:
           • Spanish, Basque (same country, both official languages)
           • Dutch, Flemish (different countries, similar enough
             languages…)
      – their status: French and Breton (same
        country, Breton has no official status)
• Only one code each for English, Spanish…
• The previous examples show the limitations
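
The limitation can be made concrete with a small sketch, assuming labels are stored as (word, language code) pairs. The data layout is hypothetical; the words come from the earlier slides. With a single code for Spanish, "Aguacate" and "Palta" become indistinguishable by tag alone.

```python
from collections import defaultdict

# (word, two-letter language code) pairs taken from the earlier slides.
labels = [
    ("Salmon", "en"),
    ("Aguacate", "es"),
    ("Palta", "es"),
]

# Group the words by language code.
by_lang = defaultdict(list)
for word, lang in labels:
    by_lang[lang].append(word)

# A language code shared by several words cannot tell them apart by region.
ambiguous = {lang: words for lang, words in by_lang.items() if len(words) > 1}
print(ambiguous)
```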
Multilinguality
ISO 639 language codes
Is ISO 639 with three-letter codes an option?
• More languages are included
      – More contemporary languages
           • Bemba language
      – “Old” languages (no longer spoken)
           • Old French (ca. 842-1400)
      – Groups of languages
           • Caucasian languages
      – Artificial languages
• Same approach as the two-letter version
Is IETF an option?
• Internet Engineering Task Force (IETF)
• IETF RFC 5646, Tags for Identifying Languages
      – Basis is ISO for languages (639)
      – Subtags from ISO for countries (3166), ISO for
        scripts (15924)
• Examples:
      – tr-CY = Turkish as used in Cyprus
      – zh-Hant-HK = Chinese in traditional Chinese script, as used in Hong Kong
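
How such tags decompose can be sketched with a deliberately simplified parser. It is an assumption-laden illustration, not a full RFC 5646 implementation: it only handles the shape language[-script][-region] and rejects everything else (extended language subtags, variants, extensions, private-use and grandfathered tags). The names `LanguageTag` and `parse_tag` are hypothetical.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple language[-script][-region] tag into its subtags."""
    parts = tag.split("-")
    language = parts[0].lower()
    # Only two- or three-letter ASCII language subtags are accepted here.
    if not (2 <= len(language) <= 3 and language.isascii() and language.isalpha()):
        raise ValueError(f"unsupported language subtag in {tag!r}")

    script = None
    region = None
    for part in parts[1:]:
        is_ascii_alpha = part.isascii() and part.isalpha()
        if len(part) == 4 and is_ascii_alpha and script is None and region is None:
            # Four letters: a script subtag (ISO 15924), e.g. "Hant".
            script = part.title()
        elif region is None and (
            (len(part) == 2 and is_ascii_alpha) or (len(part) == 3 and part.isdigit())
        ):
            # Two letters (ISO 3166) or three digits: a region subtag.
            region = part.upper()
        else:
            raise ValueError(f"unsupported subtag {part!r} in {tag!r}")
    return LanguageTag(language, script, region)


print(parse_tag("tr-CY"))
print(parse_tag("zh-Hant-HK"))
```

The region subtag only covers countries and a few numeric area codes, so it does not by itself name an administrative region such as Cusco.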

Is a relational approach an option?
• Keep tagging approach to mark the language
      – Use ISO 639 or IETF
• And introduce a relational notion of “where a
  given word is used”
• Link together a concept representing a
  geographic area, and the object to name
      – E.g., Kiwicha isNameUsedInRegion Cusco
• Aim at “standard” relations…
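
The relational idea can be sketched as a small set of subject-predicate-object statements, assuming plain strings in place of real concept and label identifiers. The `Triple` type and the helper functions are hypothetical; the name and region pairs are the ones given on the earlier slide, and `isNameUsedInRegion` is the relation named above.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


IS_NAME_USED_IN_REGION = "isNameUsedInRegion"

# Name/region pairs as listed on the earlier slide.
triples = [
    Triple("Kiwicha", IS_NAME_USED_IN_REGION, "Cusco"),
    Triple("Achita", IS_NAME_USED_IN_REGION, "Ayacucho"),
    Triple("Coyos", IS_NAME_USED_IN_REGION, "Cajamarca"),
]


def names_used_in(region: str, facts: List[Triple]) -> List[str]:
    """Return the names recorded as used in the given region."""
    return [t.subject for t in facts
            if t.predicate == IS_NAME_USED_IN_REGION and t.obj == region]


def regions_of(name: str, facts: List[Triple]) -> List[str]:
    """Return the regions recorded for a name; empty if none was specified."""
    return [t.obj for t in facts
            if t.predicate == IS_NAME_USED_IN_REGION and t.subject == name]


print(names_used_in("Cusco", triples))
print(regions_of("Achis", triples))
```

A name without any such statement, like "Achis" here, simply has no region recorded, which matches the requirement that specifying the area of use stays optional.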
Conclusions?
• This is work in progress
• We continue working out use cases, especially
  from Spanish and Portuguese
• Assess alternatives






Editor's notes

  1. DCMI initiative!