The document discusses challenges in representing multilingual concepts in AGROVOC, a thesaurus on agriculture. It analyzes limitations of using only ISO 639 language codes and explores options for more precisely specifying the geographical context of language usage, including using ISO 639-3 codes, IETF language tags, or a relational approach linking concepts to regions. The aim is to unambiguously express variations in language and names for foods, plants, and animals across locations to better represent multilingual data in the thesaurus.
Is ISO 639 enough for a multilingual thesaurus? The AGROVOC case
1. AIMS
Caterina Caracciolo, Gudrun Johannsen, Lavanya Kiran, Johannes Keizer
Food and Agriculture Organization of the UN
AOS 2012
Sept. 4, 2012 - Kuching (MY)
2. Background
• AGROVOC is published in 21 languages, with others
under development
• Multilinguality has always been an issue
• Since the beginning, multilinguality was
interpreted as “translation”:
– One hierarchy of terms (one
structure), translations in various languages
• This organization was retained in the move
from a term-centered to a concept-centered
resource
9/5/2012 2
3. AGROVOC as an object-centered resource…
• Being mainly a resource for document
indexing in the area of agriculture, it contains
a large number of words referring to
plants, animals, and food in general
4. Number of concepts below top concepts
[Bar chart: number of concepts (axis from 0 to 25000) below each top concept: organism, substances, entities, phenomena, activities, products, methods, properties, features, objects, resources, subjects, systems, locations, groups, measures, state, stages, technology, processes, factors, time, events, site, strategies]
10. Requirements for rendering multilinguality in AGROVOC
1. Unambiguously express the geographic area
where a given word is used
– specification of the area of use of a given word
should be optional.
2. No limitations on the type of area allowed
– Countries, groups of countries, geographical or
administrative regions should be equally available
for specification.
9/5/2012 KISAF, Rome 10
11. AGROVOC as a SKOS resource
• A skos:Concept indicates a group of words in
various languages, to be considered translations of
one another
• URIs are kept “abstract” to emphasize the independence
of the concept from language
– E.g. http://aims.fao.org/aos/agrovoc/c_12332
• The words grouped are then labels of the given
concept
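The grouping described on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how AGROVOC is stored: the `Concept` class and its `label` method are hypothetical names, and the label strings are placeholders rather than actual AGROVOC labels. Only the URI comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Concept:
    """A language-independent concept: an opaque URI plus labels keyed by language code."""
    uri: str
    pref_labels: Dict[str, str] = field(default_factory=dict)

    def label(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the label for `lang`, else the fallback language, else None."""
        if lang in self.pref_labels:
            return self.pref_labels[lang]
        return self.pref_labels.get(fallback)


# URI taken from the slide; the labels are placeholders, not AGROVOC data.
concept = Concept(
    uri="http://aims.fao.org/aos/agrovoc/c_12332",
    pref_labels={
        "en": "label-in-English",
        "fr": "label-in-French",
        "es": "label-in-Spanish",
    },
)

print(concept.label("fr"))  # label stored under "fr"
print(concept.label("de"))  # no German label, so the English fallback is used
```

The URI itself carries no word from any language, which is the sense in which it is kept “abstract”: every natural-language word lives in a label.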
12. SKOS properties to express terms
• skos:prefLabel, skos:altLabel
– take plain literals as values
– and an optional language tag expressed by XML
attribute xml:lang
• skosxl:prefLabel, skosxl:altLabel
– take entities with URIs as values, so extra information can be
attached to labels
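The difference between the two families of properties can be sketched as follows. This is a simplified model under stated assumptions: `PlainLiteral` and `XLLabel` are hypothetical class names, the label URI under `example.org` is invented, the `es` tag on “Kiwicha” is assumed for illustration, and the `extra` dictionary merely stands in for whatever statements one might attach to a label entity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PlainLiteral:
    """Value of skos:prefLabel / skos:altLabel: text plus an optional language tag."""
    text: str
    lang: Optional[str] = None


@dataclass
class XLLabel:
    """Value of skosxl:prefLabel / skosxl:altLabel: a label that is itself an entity."""
    uri: str                      # the label has its own identifier
    literal_form: PlainLiteral    # the text it stands for
    extra: Dict[str, str] = field(default_factory=dict)  # further statements about the label


# Plain SKOS: nothing more can be said about this value.
plain = PlainLiteral("Kiwicha", "es")  # language tag assumed for illustration

# SKOS-XL: the same text, wrapped in an entity that can carry extra information.
xl = XLLabel(
    uri="http://example.org/xl_label_1",  # hypothetical label URI
    literal_form=PlainLiteral("Kiwicha", "es"),
)
xl.extra["isNameUsedInRegion"] = "Cusco"
```

Because the XL label has its own URI, statements such as the region of use can be made about the word itself rather than about the concept it names.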
13. AGROVOC uses ISO 639 two-letter codes
to tag languages in xml:lang
• ISO 639 provides codes for languages
independently of
– the country where they are spoken:
• Spanish, Basque (same country, both official languages)
• Dutch, Flemish (different countries, similar enough
languages…)
– And their status: French and Breton (same
country, Breton has no official status)
• Only one code for English, Spanish…
• Limitations shown from previous examples
15. Are ISO 639 three-letter codes an option?
• More languages are included
– More contemporary languages
• Bemba language
– “Old” languages (no longer spoken)
• Old French (ca. 842-1400)
– Groups of languages
• Caucasian languages
– Artificial languages
• Same approach as the two-letter version
16. Is IETF an option?
• Internet Engineering Task Force (IETF)
• IETF RFC 5646, Tags for Identifying Languages
– Basis is ISO for languages (639)
– Subtags from ISO for countries (3166), ISO for
scripts (15924)
• Examples:
– tr-CY = Turkish from Cyprus
– zh-Hant-HK = Chinese in traditional Chinese script, as used in Hong Kong
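How such tags decompose can be sketched with a deliberately simplified parser. It handles only the language, script, and region subtags in that order; the full RFC 5646 grammar also allows variants, extensions, and private-use subtags, which are ignored here. `LangTag` and `parse_tag` are hypothetical names, and the function checks only the shape of each subtag, not whether it is a registered code.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str           # ISO 639 code, e.g. "tr"
    script: Optional[str]   # ISO 15924 code, e.g. "Hant"
    region: Optional[str]   # ISO 3166 code or 3-digit area code, e.g. "CY"


# Simplified shape: language[-script][-region]; no variants or extensions.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag: str) -> LangTag:
    """Split a tag into subtags, normalizing case by the usual convention."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return LangTag(
        language=match.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )


print(parse_tag("tr-CY"))       # LangTag(language='tr', script=None, region='CY')
print(parse_tag("zh-Hant-HK"))  # LangTag(language='zh', script='Hant', region='HK')
```

Two regional tags such as en-US and en-GB still share the same language subtag, so labels tagged this way remain groupable by language while also recording where they are used. The region subtag, however, is limited to what the tag grammar allows, which is narrower than the open-ended areas named in the requirements.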
17. Is a relational approach an option?
• Keep tagging approach to mark the language
– Use ISO 639 or IETF
• And introduce a relational notion of “where a
given word is used”
• Link together a concept representing a
geographic area, and the object to name
– E.g., Kiwicha isNameUsedInRegion Cusco
• Aim at “standard” relations…
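The relational idea can be sketched as a small set of statements queried in both directions. This is an illustration under assumptions, not the authors' design: the helper functions `regions_of` and `names_used_in` are hypothetical, plain strings stand in for what would be URIs of labels and of geographic concepts, and the only statement comes from the example on the slide.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (name, relation, region)

USED_IN = "isNameUsedInRegion"  # relation name taken from the slide

# In a real thesaurus both ends would be URIs; strings keep the sketch short.
statements: List[Triple] = [
    ("Kiwicha", USED_IN, "Cusco"),
]


def regions_of(name: str, triples: List[Triple]) -> Set[str]:
    """Regions where `name` is recorded as being used (empty if none is stated)."""
    return {obj for subj, pred, obj in triples if subj == name and pred == USED_IN}


def names_used_in(region: str, triples: List[Triple]) -> Set[str]:
    """Names recorded as being used in `region`."""
    return {subj for subj, pred, obj in triples if obj == region and pred == USED_IN}


print(regions_of("Kiwicha", statements))        # {'Cusco'}
print(regions_of("SomeOtherName", statements))  # set(): stating a region is optional
print(names_used_in("Cusco", statements))       # {'Kiwicha'}
```

Since the region is just another concept, it can be a country, a group of countries, or an administrative region, and a name with no statement simply has no recorded area of use.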
18. Conclusions?
• This is work in progress
• We continue working out use cases, especially
from Spanish and Portuguese
• Assess alternatives