SlideShare una empresa de Scribd logo
1 de 12
Locale-Aware Sorting and Text
Handling in the
Open Toolkit
Eliot Kimber
Contrext
DITA OT Day 2017
大
话
功
夫
美丽的话
All translations via Google Translate—may be total nonsense
About the Author
关于作者
• Independent consultant focusing on DITA
analysis, design, and implementation
• Doing SGML and XML for cough 30 years cough
• Founding member of the DITA Technical
Committee
• Founding member of the XML Working Group
• Co-editor of HyTime standard (ISO/IEC 10744)
• Primary developer and founder of the DITA for
Publishers project
• Author of DITA for Practitioners, Vol 1 (XML Press)
DITA OT Day 201710/29/17 2
Requirements 要求
• Sort and group Simplified Chinese (and other
languages)
• Do rough layout of lines
– Requires finding work boundaries
– Requires finding line-break opportunities
• Target applications
– Sort and group glossaries, indexes, etc.
– Fixed-layout EPUBs
• Free, open-source solution
10/29/17 DITA OT Day 2017 3
Challenge: Simplified Chinese
挑战: 简体中文
• Simplified Chinese collates using pinyin
transliteration of words
• All other languages collate based on individual
characters
– Alphabetic and syllabic languages
– Traditional Chinese: Radical and stroke count
– Can be done with static per-character
configuration
10/29/17 DITA OT Day 2017 4
Simplified Chinese Requires a
Dictionary
简体中文需要一个词典
• Pinyin transliteration depends on whole word
– Same character may have different transliteration in
different words (多音字)
– As if “c” was pronounced “see” by itself but “tee” in
“cat”
– E.g.: 行 – xíng or hāng
• Requires a dictionary with pinyin transliteration
– Requires ability to find words
• Need word frequency as secondary sort
parameter or to disambiguate words
– Requires frequency dictionary
10/29/17 DITA OT Day 2017 5
Solution 解
• Use CC-CEDICT open-source dictionary
– ~155,000.00 entries
• ICU4J for word and line breaking
• Java 2D for text length approximation
• ICU4J for grouping and sorting of other languages
• Integrated with Saxon’s collator Java API
• XSLT functions for use from XSLT
• Configured as DITA OT plugin for use in OT
• Java and XSLT library not OT-specific
10/29/17 DITA OT Day 2017 6
Result: DITA Community i18n
Plugin
结果:DITA社区i18n插件
• Java: Custom collator for Simplified Chinese
– Implements Saxon RuleBasedCollator
– Can be used with Saxon 9.1+ or any Java code
– Locale-aware text analysis and metrics
• XSLT function library
– Grouping and sorting keys
– Word and line breaking
– Rendered text length given font details
• General grouping and sorting configuration file
mechanism
10/29/17 DITA OT Day 2017 7
Under Active Development
积极发展
• More accurate pinyin determination
• Use of word frequency data as secondary sort
key
10/29/17 DITA OT Day 2017 8
Not Yet Implemented
尚未落实
• Grouping for glossaries
– And other things you might want to group
• Extension/replacement for built-in OT index
grouping and sorting
10/29/17 DITA OT Day 2017 9
Demo 演示
If time permits
DITA OT Day 201710/29/17 10
Questions? 问
题
DITA OT Day 201710/29/17 11
Resources 资
源
• Me: ekimber@contrext.com
• DITA Community i18n project:
https://github.com/dita-community/org.dita-
community.i18n
• http://blog.tutorming.com/mandarin-chinese-
learning-tips/chinese-characters-with-various-
pronunciations
DITA OT Day 201710/29/17 12

Más contenido relacionado

Similar a Locale-Aware Sorting and Text Handling in the Open Toolkit

Can I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITACan I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITAContrext Solutions
 
Using CSS Paging to Render DITA Documents
Using CSS Paging to Render DITA DocumentsUsing CSS Paging to Render DITA Documents
Using CSS Paging to Render DITA DocumentsContrext Solutions
 
RELAX NG and DITA: An Almost Perfect Match
RELAX NG and DITA: An Almost Perfect MatchRELAX NG and DITA: An Almost Perfect Match
RELAX NG and DITA: An Almost Perfect MatchContrext Solutions
 
DITA Quick Start for Authors - Part I
DITA Quick Start for Authors - Part IDITA Quick Start for Authors - Part I
DITA Quick Start for Authors - Part ISuite Solutions
 
ICANN 51: IDN Program Update
ICANN 51: IDN Program UpdateICANN 51: IDN Program Update
ICANN 51: IDN Program UpdateICANN
 
Style Guides: Fashionable But Also Practical - TC Dojo, Single Sourcing
Style Guides: Fashionable But Also Practical - TC Dojo, Single SourcingStyle Guides: Fashionable But Also Practical - TC Dojo, Single Sourcing
Style Guides: Fashionable But Also Practical - TC Dojo, Single SourcingIXIASOFT
 
RoboCon 2018: How did we get here? Where do we go next?
RoboCon 2018: How did we get here? Where do we go next?RoboCon 2018: How did we get here? Where do we go next?
RoboCon 2018: How did we get here? Where do we go next?Pekka Klärck
 
Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32IXIASOFT
 
Python workshop
Python workshopPython workshop
Python workshopShiraz LUG
 
DSL's with Groovy
DSL's with GroovyDSL's with Groovy
DSL's with Groovypaulbowler
 
Welcome to New Swift: Library Evolution & LSP Support
Welcome to New Swift: Library Evolution & LSP SupportWelcome to New Swift: Library Evolution & LSP Support
Welcome to New Swift: Library Evolution & LSP SupportG ABHISEK
 
DITA for Small Teams Workshop (Tekom 2017)
DITA for Small Teams Workshop (Tekom 2017)DITA for Small Teams Workshop (Tekom 2017)
DITA for Small Teams Workshop (Tekom 2017)Contrext Solutions
 
A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017
A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017
A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017IXIASOFT
 
DITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
DITA Quick Start Webinar Series: Getting Started with the DITA Open ToolkitDITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
DITA Quick Start Webinar Series: Getting Started with the DITA Open ToolkitSuite Solutions
 
Using Markdown and Lightweight DITA in a Collaborative Environment
Using Markdown and Lightweight DITA in a Collaborative EnvironmentUsing Markdown and Lightweight DITA in a Collaborative Environment
Using Markdown and Lightweight DITA in a Collaborative EnvironmentIXIASOFT
 
Engineering Your Product Information for Local Markets
Engineering Your Product Information for Local MarketsEngineering Your Product Information for Local Markets
Engineering Your Product Information for Local MarketsTom Knupp
 
The Ring programming language version 1.7 book - Part 6 of 196
The Ring programming language version 1.7 book - Part 6 of 196The Ring programming language version 1.7 book - Part 6 of 196
The Ring programming language version 1.7 book - Part 6 of 196Mahmoud Samir Fayed
 

Similar a Locale-Aware Sorting and Text Handling in the Open Toolkit (20)

Can I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITACan I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITA
 
Using CSS Paging to Render DITA Documents
Using CSS Paging to Render DITA DocumentsUsing CSS Paging to Render DITA Documents
Using CSS Paging to Render DITA Documents
 
RELAX NG and DITA: An Almost Perfect Match
RELAX NG and DITA: An Almost Perfect MatchRELAX NG and DITA: An Almost Perfect Match
RELAX NG and DITA: An Almost Perfect Match
 
DITA Quick Start for Authors - Part I
DITA Quick Start for Authors - Part IDITA Quick Start for Authors - Part I
DITA Quick Start for Authors - Part I
 
ICANN 51: IDN Program Update
ICANN 51: IDN Program UpdateICANN 51: IDN Program Update
ICANN 51: IDN Program Update
 
Style Guides: Fashionable But Also Practical - TC Dojo, Single Sourcing
Style Guides: Fashionable But Also Practical - TC Dojo, Single SourcingStyle Guides: Fashionable But Also Practical - TC Dojo, Single Sourcing
Style Guides: Fashionable But Also Practical - TC Dojo, Single Sourcing
 
RoboCon 2018: How did we get here? Where do we go next?
RoboCon 2018: How did we get here? Where do we go next?RoboCon 2018: How did we get here? Where do we go next?
RoboCon 2018: How did we get here? Where do we go next?
 
Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32Localization and DITA: What you Need to Know - LocWorld32
Localization and DITA: What you Need to Know - LocWorld32
 
Python workshop
Python workshopPython workshop
Python workshop
 
Python workshop
Python workshopPython workshop
Python workshop
 
DSL's with Groovy
DSL's with GroovyDSL's with Groovy
DSL's with Groovy
 
Welcome to New Swift: Library Evolution & LSP Support
Welcome to New Swift: Library Evolution & LSP SupportWelcome to New Swift: Library Evolution & LSP Support
Welcome to New Swift: Library Evolution & LSP Support
 
DITA for Small Teams Workshop (Tekom 2017)
DITA for Small Teams Workshop (Tekom 2017)DITA for Small Teams Workshop (Tekom 2017)
DITA for Small Teams Workshop (Tekom 2017)
 
A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017
A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017
A Brief Look at DITA in Current Technical Communication Practices_SIGDOC 2017
 
Rosenblum Workflow Choices Introducing XML
Rosenblum Workflow Choices Introducing XMLRosenblum Workflow Choices Introducing XML
Rosenblum Workflow Choices Introducing XML
 
DITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
DITA Quick Start Webinar Series: Getting Started with the DITA Open ToolkitDITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
DITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
 
Using Markdown and Lightweight DITA in a Collaborative Environment
Using Markdown and Lightweight DITA in a Collaborative EnvironmentUsing Markdown and Lightweight DITA in a Collaborative Environment
Using Markdown and Lightweight DITA in a Collaborative Environment
 
Engineering Your Product Information for Local Markets
Engineering Your Product Information for Local MarketsEngineering Your Product Information for Local Markets
Engineering Your Product Information for Local Markets
 
The Ring programming language version 1.7 book - Part 6 of 196
The Ring programming language version 1.7 book - Part 6 of 196The Ring programming language version 1.7 book - Part 6 of 196
The Ring programming language version 1.7 book - Part 6 of 196
 
Generator patterns
Generator patternsGenerator patterns
Generator patterns
 

Más de Contrext Solutions

Stupid DITA Tricks: After-The-Fact Specialization: Treating Aircraft Manuals ...
Stupid DITA Tricks:After-The-Fact Specialization: Treating Aircraft Manuals ...Stupid DITA Tricks:After-The-Fact Specialization: Treating Aircraft Manuals ...
Stupid DITA Tricks: After-The-Fact Specialization: Treating Aircraft Manuals ...Contrext Solutions
 
Loose Leaf Publishing Using Antenna House Formatter and CSS for Pagination
Loose Leaf Publishing Using Antenna House Formatter and CSS for PaginationLoose Leaf Publishing Using Antenna House Formatter and CSS for Pagination
Loose Leaf Publishing Using Antenna House Formatter and CSS for PaginationContrext Solutions
 
Definition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for Free
Definition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for FreeDefinition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for Free
Definition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for FreeContrext Solutions
 
Twisted XSL Tricks: Column Switching for FOP
Twisted XSL Tricks: Column Switching for FOPTwisted XSL Tricks: Column Switching for FOP
Twisted XSL Tricks: Column Switching for FOPContrext Solutions
 
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITACan I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITAContrext Solutions
 
Ki, Qi, Key: The Way of DITA Harmony With Keys and Key References
Ki, Qi, Key: The Way of DITA Harmony With Keys and Key ReferencesKi, Qi, Key: The Way of DITA Harmony With Keys and Key References
Ki, Qi, Key: The Way of DITA Harmony With Keys and Key ReferencesContrext Solutions
 
Content Management on Zero Budget: DITA for Small Teams
Content Management on Zero Budget: DITA for Small TeamsContent Management on Zero Budget: DITA for Small Teams
Content Management on Zero Budget: DITA for Small TeamsContrext Solutions
 
XSLT Magic Tricks with DITA and FrameMaker
XSLT Magic Tricks with DITA and FrameMakerXSLT Magic Tricks with DITA and FrameMaker
XSLT Magic Tricks with DITA and FrameMakerContrext Solutions
 
FrameMaker and the DITA Open Toolkit
FrameMaker and the DITA Open ToolkitFrameMaker and the DITA Open Toolkit
FrameMaker and the DITA Open ToolkitContrext Solutions
 
DITA Reuse Challenges and Response
DITA Reuse Challenges and ResponseDITA Reuse Challenges and Response
DITA Reuse Challenges and ResponseContrext Solutions
 
Managing Multiple Open Toolkit Configurations Using git Lightning Talk
Managing Multiple Open Toolkit Configurations Using git Lightning TalkManaging Multiple Open Toolkit Configurations Using git Lightning Talk
Managing Multiple Open Toolkit Configurations Using git Lightning TalkContrext Solutions
 
DITA OT Day 2015 Lightning Talk On The DITA Community Project
DITA OT Day 2015 Lightning Talk On The DITA Community ProjectDITA OT Day 2015 Lightning Talk On The DITA Community Project
DITA OT Day 2015 Lightning Talk On The DITA Community ProjectContrext Solutions
 
They Worked Before, What Happened? Understanding DITA Cross-Book Links
They Worked Before, What Happened? Understanding DITA Cross-Book Links They Worked Before, What Happened? Understanding DITA Cross-Book Links
They Worked Before, What Happened? Understanding DITA Cross-Book Links Contrext Solutions
 
No Ki Magic: Managing Complex DITA Hyperdocuments
No Ki Magic: Managing Complex DITA HyperdocumentsNo Ki Magic: Managing Complex DITA Hyperdocuments
No Ki Magic: Managing Complex DITA HyperdocumentsContrext Solutions
 
Poster: Cross-Document Linking in DITA
Poster: Cross-Document Linking in DITAPoster: Cross-Document Linking in DITA
Poster: Cross-Document Linking in DITAContrext Solutions
 
Managing Deliverable-Specific Link Anchors: New Suggested Best Practice for Keys
Managing Deliverable-Specific Link Anchors: New Suggested Best Practice for KeysManaging Deliverable-Specific Link Anchors: New Suggested Best Practice for Keys
Managing Deliverable-Specific Link Anchors: New Suggested Best Practice for KeysContrext Solutions
 
What's New in DITA 1.3 (Tekom, Nov 2014)
What's New in DITA 1.3 (Tekom, Nov 2014)What's New in DITA 1.3 (Tekom, Nov 2014)
What's New in DITA 1.3 (Tekom, Nov 2014)Contrext Solutions
 
Taking Cross References to the Next Level: Reltables for Non-Topic Elements
Taking Cross References to the Next Level: Reltables for Non-Topic ElementsTaking Cross References to the Next Level: Reltables for Non-Topic Elements
Taking Cross References to the Next Level: Reltables for Non-Topic ElementsContrext Solutions
 

Más de Contrext Solutions (20)

Stupid DITA Tricks: After-The-Fact Specialization: Treating Aircraft Manuals ...
Stupid DITA Tricks:After-The-Fact Specialization: Treating Aircraft Manuals ...Stupid DITA Tricks:After-The-Fact Specialization: Treating Aircraft Manuals ...
Stupid DITA Tricks: After-The-Fact Specialization: Treating Aircraft Manuals ...
 
Loose Leaf Publishing Using Antenna House Formatter and CSS for Pagination
Loose Leaf Publishing Using Antenna House Formatter and CSS for PaginationLoose Leaf Publishing Using Antenna House Formatter and CSS for Pagination
Loose Leaf Publishing Using Antenna House Formatter and CSS for Pagination
 
Definition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for Free
Definition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for FreeDefinition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for Free
Definition of the DITA Glossary: Or How to Get Some Cool Glossary Tools for Free
 
Twisted XSL Tricks: Column Switching for FOP
Twisted XSL Tricks: Column Switching for FOPTwisted XSL Tricks: Column Switching for FOP
Twisted XSL Tricks: Column Switching for FOP
 
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITACan I Have a Word: Managing Shared Glossaries and References to Terms With DITA
Can I Have a Word: Managing Shared Glossaries and References to Terms With DITA
 
Ki, Qi, Key: The Way of DITA Harmony With Keys and Key References
Ki, Qi, Key: The Way of DITA Harmony With Keys and Key ReferencesKi, Qi, Key: The Way of DITA Harmony With Keys and Key References
Ki, Qi, Key: The Way of DITA Harmony With Keys and Key References
 
Content Management on Zero Budget: DITA for Small Teams
Content Management on Zero Budget: DITA for Small TeamsContent Management on Zero Budget: DITA for Small Teams
Content Management on Zero Budget: DITA for Small Teams
 
XSLT Magic Tricks with DITA and FrameMaker
XSLT Magic Tricks with DITA and FrameMakerXSLT Magic Tricks with DITA and FrameMaker
XSLT Magic Tricks with DITA and FrameMaker
 
FrameMaker and the DITA Open Toolkit
FrameMaker and the DITA Open ToolkitFrameMaker and the DITA Open Toolkit
FrameMaker and the DITA Open Toolkit
 
DITA Reuse Challenges and Response
DITA Reuse Challenges and ResponseDITA Reuse Challenges and Response
DITA Reuse Challenges and Response
 
Managing Multiple Open Toolkit Configurations Using git Lightning Talk
Managing Multiple Open Toolkit Configurations Using git Lightning TalkManaging Multiple Open Toolkit Configurations Using git Lightning Talk
Managing Multiple Open Toolkit Configurations Using git Lightning Talk
 
DITA OT Day 2015 Lightning Talk On The DITA Community Project
DITA OT Day 2015 Lightning Talk On The DITA Community ProjectDITA OT Day 2015 Lightning Talk On The DITA Community Project
DITA OT Day 2015 Lightning Talk On The DITA Community Project
 
Why Is DITA So Hard?
Why Is DITA So Hard?Why Is DITA So Hard?
Why Is DITA So Hard?
 
They Worked Before, What Happened? Understanding DITA Cross-Book Links
They Worked Before, What Happened? Understanding DITA Cross-Book Links They Worked Before, What Happened? Understanding DITA Cross-Book Links
They Worked Before, What Happened? Understanding DITA Cross-Book Links
 
No Ki Magic: Managing Complex DITA Hyperdocuments
No Ki Magic: Managing Complex DITA HyperdocumentsNo Ki Magic: Managing Complex DITA Hyperdocuments
No Ki Magic: Managing Complex DITA Hyperdocuments
 
Poster: Cross-Document Linking in DITA
Poster: Cross-Document Linking in DITAPoster: Cross-Document Linking in DITA
Poster: Cross-Document Linking in DITA
 
DITA for Small Teams
DITA for Small TeamsDITA for Small Teams
DITA for Small Teams
 
Managing Deliverable-Specific Link Anchors: New Suggested Best Practice for Keys
Managing Deliverable-Specific Link Anchors: New Suggested Best Practice for KeysManaging Deliverable-Specific Link Anchors: New Suggested Best Practice for Keys
Managing Deliverable-Specific Link Anchors: New Suggested Best Practice for Keys
 
What's New in DITA 1.3 (Tekom, Nov 2014)
What's New in DITA 1.3 (Tekom, Nov 2014)What's New in DITA 1.3 (Tekom, Nov 2014)
What's New in DITA 1.3 (Tekom, Nov 2014)
 
Taking Cross References to the Next Level: Reltables for Non-Topic Elements
Taking Cross References to the Next Level: Reltables for Non-Topic ElementsTaking Cross References to the Next Level: Reltables for Non-Topic Elements
Taking Cross References to the Next Level: Reltables for Non-Topic Elements
 

Último

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Último (20)

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

Locale-Aware Sorting and Text Handling in the Open Toolkit

  • 1. Locale-Aware Sorting and Text Handling in the Open Toolkit Eliot Kimber Contrext DITA OT Day 2017 大 话 功 夫 美丽的话 All translations via Google Translate—may be total nonsense
  • 2. About the Author 关于作者 • Independent consultant focusing on DITA analysis, design, and implementation • Doing SGML and XML for cough 30 years cough • Founding member of the DITA Technical Committee • Founding member of the XML Working Group • Co-editor of HyTime standard (ISO/IEC 10744) • Primary developer and founder of the DITA for Publishers project • Author of DITA for Practitioners, Vol 1 (XML Press) DITA OT Day 201710/29/17 2
  • 3. Requirements 要求 • Sort and group Simplified Chinese (and other languages) • Do rough layout of lines – Requires finding work boundaries – Requires finding line-break opportunities • Target applications – Sort and group glossaries, indexes, etc. – Fixed-layout EPUBs • Free, open-source solution 10/29/17 DITA OT Day 2017 3
  • 4. Challenge: Simplified Chinese 挑战: 简体中文 • Simplified Chinese collates using pinyin transliteration of words • All other languages collate based on individual characters – Alphabetic and syllabic languages – Traditional Chinese: Radical and stroke count – Can be done with static per-character configuration 10/29/17 DITA OT Day 2017 4
  • 5. Simplified Chinese Requires a Dictionary 简体中文需要一个词典 • Pinyin transliteration depends on whole word – Same character may have different transliteration in different words (多音字) – As if “c” was pronounced “see” by itself but “tee” in “cat” – E.g.: 行 – xíng or hāng • Requires a dictionary with pinyin transliteration – Requires ability to find words • Need word frequency as secondary sort parameter or to disambiguate words – Requires frequency dictionary 10/29/17 DITA OT Day 2017 5
  • 6. Solution 解 • Use CC-CEDICT open-source dictionary – ~155,000.00 entries • ICU4J for word and line breaking • Java 2D for text length approximation • ICU4J for grouping and sorting of other languages • Integrated with Saxon’s collator Java API • XSLT functions for use from XSLT • Configured as DITA OT plugin for use in OT • Java and XSLT library not OT-specific 10/29/17 DITA OT Day 2017 6
  • 7. Result: DITA Community i18n Plugin 结果:DITA社区i18n插件 • Java: Custom collator for Simplified Chinese – Implements Saxon RuleBasedCollator – Can be used with Saxon 9.1+ or any Java code – Locale-aware text analysis and metrics • XSLT function library – Grouping and sorting keys – Word and line breaking – Rendered text length given font details • General grouping and sorting configuration file mechanism 10/29/17 DITA OT Day 2017 7
  • 8. Under Active Development 积极发展 • More accurate pinyin determination • Use of word frequency data as secondary sort key 10/29/17 DITA OT Day 2017 8
  • 9. Not Yet Implemented 尚未落实 • Grouping for glossaries – And other things you might want to group • Extension/replacement for built-in OT index grouping and sorting 10/29/17 DITA OT Day 2017 9
  • 10. Demo 演示 If time permits DITA OT Day 201710/29/17 10
  • 11. Questions? 问 题 DITA OT Day 201710/29/17 11
  • 12. Resources 资 源 • Me: ekimber@contrext.com • DITA Community i18n project: https://github.com/dita-community/org.dita- community.i18n • http://blog.tutorming.com/mandarin-chinese- learning-tips/chinese-characters-with-various- pronunciations DITA OT Day 201710/29/17 12