UTX is a simple glossary format developed by AAMT for terminology tools and machine translation. It allows for easy creation, sharing, and reuse of glossary data through a standardized tab-delimited format that is manageable in spreadsheets. UTX aims to be non-expert friendly and can serve as an initial format for glossary work that can later be converted to more advanced formats like TBX.
1. UTX, a simple glossary format
Yamamoto Yuji
JISC member
UTX Team Leader, AAMT
Representative, CosmosHouse
(Originally delivered on June 25, 2015. Modified on July 9, 2015)
3. AAMT
(Asia-Pacific Association for MT)
MT users, MT researchers, MT manufacturers
http://www.aamt.info/
IAMT: International Association for MT
• EAMT (Europe)
• AAMT (Asia-Pacific)
• AMTA (Americas)
5. What is UTX?
UTX stands for Universal Terminological eXchange
Developed by AAMT (Asia-Pacific Association for MT)
Simple glossary format for terminology tools and MT
For creating, sharing, and reusing glossary data
6. 4 merits of the UTX glossary format
• Simple tab-delimited: easy to create and manage on a spreadsheet
• Reliable: quality control via term status
• Convertible to other formats: MT and termbase tools
• Bidirectional bi/multilingual: manage bilingual glossaries for both ways
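Two of these merits, quality control via term status and bidirectional use, can be illustrated with a minimal sketch. It assumes glossary entries have already been loaded into memory as hypothetical (source, target, status) tuples; the function names are illustrative and not part of UTX or any UTX tool.

```python
# Hypothetical in-memory entries: (source term, target term, term status).
entries = [
    ("bidirectional", "双向", "approved"),
    ("contributor", "用语提交者", "provisional"),
    ("merge", "合并", "approved"),
]


def approved_only(rows):
    """Keep only entries whose term status is 'approved'."""
    return [row for row in rows if row[2] == "approved"]


def reverse_direction(rows):
    """Swap source and target so the same data serves the opposite direction."""
    return [(tgt, src, status) for src, tgt, status in rows]


# English-to-Chinese data reused as a Chinese-to-English glossary.
zh_to_en = reverse_direction(approved_only(entries))
for src, tgt, status in zh_to_en:
    print(src, tgt, status)
```

Filtering before reversing keeps provisional terms out of the reversed glossary; whether a reversed entry is equally reliable in the other direction is a judgment the glossary owner still has to make.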
7. UTX is non-expert friendly
• A systematic approach to translation is not yet fully developed in Japan.
– No translation major in universities.
• There are many individuals and small LSPs who could benefit from standardized glossary formats.
• UTX is especially easy to use when you start creating a glossary.
8. UTX as a step towards a more complicated format (TBX)
[Diagram: two paths. Without UTX: No glossary → TBX. With UTX: No glossary → UTX → TBX. Callout: "This is the hard part!"]
10. UTX facilitates sharing and reusing of glossaries
[Diagram: UTX glossaries shared among the Translation Client, the Language Service Provider, Translator A, and Translator B]
11. UTX is rule-based MT (RbMT) friendly
• Statistical MT (SMT) is less accurate with Japanese.
– This is due to differences in language structure.
• In Japan, RbMT packages (Toshiba, Fujitsu, Cross Language, Kodensha, and more) are available.
• Some SMT (and hybrid MT) systems can also use UTX.
12. UTX can handle conversions to
simple formats
• Some information might be lost, but
still useful.
• Some users/tools don’t need the
awesome power of TBX.
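A conversion to a simpler format might look like the following minimal sketch. It assumes that lines starting with "#" are header lines and that the first two tab-separated fields are the source and target terms, as in the sample glossary in this deck; the function name `utx_to_two_column_csv` is hypothetical. Part of speech and term status are dropped, which is the kind of information loss mentioned above.

```python
import csv
import io


def utx_to_two_column_csv(utx_text):
    """Convert tab-delimited UTX text to source,target CSV, dropping other columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in utx_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines (assumed to start with "#")
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip rows without both a source and a target term
        writer.writerow(fields[:2])
    return out.getvalue()


sample = (
    "#UTX 1.11; en-US/zh-CN; 2014-09-25; copyright: AAMT (2012); license: CC-by 3.0\n"
    "#src\ttgt\tsrc:pos\tterm status\n"
    "glossary\t词汇表\tnoun\t\n"
    "merge\t合并\tverb\tapproved\n"
)
print(utx_to_two_column_csv(sample))
```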
14. UTX glossary sample
#UTX 1.11; en-US/zh-CN; 2014-09-25; copyright: AAMT (2012); license: CC-by 3.0
#src	tgt	src:pos	term status
Asia-Pacific Association for Machine Translation	亚洲太平洋机器翻译协会	properNoun	approved
dictionary administrator	字典管理员	noun	approved
contributor	用语提交者	noun	provisional
domain	领域	noun
glossary	词汇表	noun
bidirectional	双向	adjective	approved
merge	合并	verb	approved
Columns: source term (American English), target term (Chinese), part of speech, term status
• Manage essential glossary data in a standardized format
• The first line gives information about the glossary (creation date, license, etc.)
• Term status provides reliability
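Reading a glossary shaped like this sample can be sketched as follows. This is an illustration based only on the sample above, not an implementation of the UTX specification: it assumes the first line holds semicolon-separated glossary information, the second line holds tab-separated column names, and a missing trailing field means an empty term status. The name `parse_utx` is hypothetical.

```python
def parse_utx(text):
    """Parse UTX-like text into (glossary info, column names, entries)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: glossary-level information, separated by semicolons.
    meta = [part.strip() for part in lines[0].lstrip("#").split(";")]
    # Second line: column names, separated by tabs.
    columns = lines[1].lstrip("#").split("\t")
    entries = []
    for line in lines[2:]:
        if line.startswith("#"):
            continue  # ignore any further "#" lines
        fields = line.split("\t")
        # Pad short rows so every entry has a value for every column.
        fields += [""] * (len(columns) - len(fields))
        entries.append(dict(zip(columns, fields)))
    return meta, columns, entries


sample = (
    "#UTX 1.11; en-US/zh-CN; 2014-09-25; copyright: AAMT (2012); license: CC-by 3.0\n"
    "#src\ttgt\tsrc:pos\tterm status\n"
    "contributor\t用语提交者\tnoun\tprovisional\n"
    "domain\t领域\tnoun\n"
)
meta, columns, entries = parse_utx(sample)
print(meta[0], meta[1])
for entry in entries:
    print(entry["src"], entry["tgt"], entry["term status"] or "(no status)")
```

Because the column names come from the file's own header line, the same sketch tolerates glossaries with extra columns, although it does nothing to validate them.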
15. Use case examples
• 2.2 million entries (as of 2015) have been created by the Japan Patent Office.
– Chinese-Japanese glossary
• Glossary data for MT/interpretation for
tourism.
16. Conclusion
1. UTX can help non-experts.
2. A UTX glossary serves as a basis for
a TBX glossary.
3. UTX addresses the needs of MT for
non-European languages (Japanese,
Korean, etc.).