This session presents a status update on the IDN Variant TLD Program. The update covers progress on implementing the IDN Root LGR Procedure, the status of the Maximal Starting Repertoire (MSR), and updates from the community and the Generation Panels.
3. Agenda
#ICANN50
• Program Update – 15 min
• MSR - 15 min
• Community updates:
o Arabic Generation Panel – 15 min
o CJK Coordination Report – 15 min
o Neo-Brahmi Prospective Generation Panel – 15 min
• Q&A - 15 min
5. IDN Variant Program: A Brief Overview
Phase 1: 2011
o Script Case Studies Conducted: Arabic, Chinese, Cyrillic, Devanagari, Greek, Latin
Phase 2: 2011 – 2012
o Integrated Issues Report development and publication
Phase 3: 2012 – 2013
o Creation of Procedure to develop and maintain the Label Generation Rules for the Root Zone in Respect of IDNA labels (LGR Procedure)
o Development of “Study on Examining the User Experience Implications of Active Variant TLDs” and XML specification for representing Label Generation Rules
Phase 4: 2013 – 2015 (In Progress)
o Implementation of LGR procedure
o Processes development for incorporating the LGR
o Ongoing work on XML specification
6. LGR Procedure Overview
[Diagram: Generation Panels (one Generation Panel per script or writing system) propose to the Integration Panel, which rejects or accepts each proposal and integrates the accepted ones into the Unified LGR for the Root Zone]
Generation Panels
• Generate proposals for script specific LGRs, based on community expertise and requirements
Integration Panel
• Integrates them into common Root Zone LGR while minimizing the risk to Root Zone as shared resource
Label Generation Rules (LGR)
• Which labels are permissible
• Which variant labels exist
• Which variant labels may be allocated
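The three questions an LGR answers can be illustrated with a small sketch. The repertoire, the variant mappings and the rule that a single blocked mapping blocks the whole variant label are all hypothetical simplifications chosen for illustration; they are not taken from the LGR Procedure or from any real script proposal.

```python
from itertools import product

# Hypothetical toy data: a three-letter repertoire with two variant mappings.
REPERTOIRE = {"a", "b", "c"}
# code point -> {variant code point: disposition of that mapping}
VARIANTS = {
    "a": {"b": "allocatable"},
    "b": {"a": "blocked"},
}


def is_permissible(label):
    """Which labels are permissible: every code point is in the repertoire."""
    return len(label) > 0 and all(cp in REPERTOIRE for cp in label)


def variant_labels(label):
    """Which variant labels exist, and which of them may be allocated.

    Assumption for this sketch: a variant label is blocked as soon as one
    of the mappings used to build it is blocked.
    """
    if not is_permissible(label):
        raise ValueError("label is not permissible under this toy LGR")
    # Per position: the original code point, plus each of its variants.
    choices = []
    for cp in label:
        options = [(cp, None)] + list(VARIANTS.get(cp, {}).items())
        choices.append(options)
    result = {}
    for combo in product(*choices):
        candidate = "".join(cp for cp, _ in combo)
        if candidate == label:
            continue  # the original label is not its own variant
        used = {disp for _, disp in combo if disp is not None}
        result[candidate] = "blocked" if "blocked" in used else "allocatable"
    return result


print(is_permissible("ad"))
print(variant_labels("ab"))
```

A real LGR would also carry whole label rules and context rules, which this sketch leaves out.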
[Diagram status labels: Generation Panels to be formed by script communities (Arabic GP seated, Chinese GP in formation); Integration Panel formed; MSR published]
* URLs available on Slide 10 (Resources) and last slide
7. LGR Project Status – Milestones
• Community Work
o Arabic Generation Panel seated
o Chinese Generation Panel in formation
o Japanese, Korean and Neo-Brahmi Generation Panels being organized
o Individual expressions of interest for other scripts
• ICANN/Integration Panel Work
o MSR-1 released for Public Comments
o MSR-1 published – 20 June
o Ongoing outreach efforts
8. MSR-1 Public Comment Process
• Public Comment Report released on 20 June 2014:
https://www.icann.org/en/system/files/files/report-comments-msr-20jun14-en.pdf
• Inputs received:
o General:
§ Community support for the conservative approach
o Process:
§ Need for addition in MSR and LGR (scripts already included)
§ Need for addition in MSR and LGR (scripts not included)
§ Need for review by relevant script community
§ Need for outreach to additional script communities
o Code points:
§ Inclusion of specific code points
§ Considering inclusion of languages spoken by smaller communities
9. Cyclical, additive nature of MSR & LGR
• Motivation
o Portions of MSR are not reviewed by relevant script community
o At this time, there is not sufficient data to decide on a code point
o The status of a code point may change over time
• Recommendation for further feedback
o Cyclical releases of MSR and LGR so communities and ICANN can organize their work – based on community need and practical considerations
o Additive releases of MSR and LGR, when there is new evidence for a code point and no impact on the security and stability of the existing system
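The additive property can be read as a simple set relation: a later release keeps every code point of the earlier one. The sketch below is only an illustration of that reading; the function names and the sample code points are hypothetical and do not describe an ICANN tool or actual MSR contents.

```python
def removed_code_points(previous, new):
    """Code points that were in the previous release but not in the new one."""
    return previous - new


def is_additive(previous, new):
    """A release is additive if it is a superset of the previous release."""
    return previous <= new


# Hypothetical releases, using arbitrary code point numbers.
release_1 = {0x0061, 0x0062, 0x0063}
release_2_ok = release_1 | {0x0064}          # only adds a code point
release_2_bad = (release_1 - {0x0062}) | {0x0064}  # drops one

print(is_additive(release_1, release_2_ok))
print(sorted(hex(cp) for cp in removed_code_points(release_1, release_2_bad)))
```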
10. Outreach Efforts
• Outreach efforts focused on organizing GPs
o Quick Guide Kit for Generation Panels
o IDN interviews and videos
o Email to ICANN executive mailing list to reach out to their contacts to create new GPs
• Keeping community informed
o Targeted events such as IGF, regional IGFs and ICANN meetings
o Web announcements and updates on project Community Wiki
o Brochures and collateral materials
• Facilitate GP-IP interaction
11. Moving Forward
• Suggested Plan
o MSR-2 to cover additional scripts by Q1 2015
o LGR-1 by Q3 2015 (anticipated; based on community proposals)
• Call to Action
o Current panels finishing proposals in early 2015 to submit for LGR-1
o New panels to provide input for future releases of LGR
12. LGR Procedure Depends on Community Work
• Generation Panels and LGR proposals are REQUIRED for IDN variants to be considered for delegation
• Get involved:
o Form a generation panel
o Volunteer to join a generation panel
o Take part in public review of the MSR, LGR proposals, integrated LGR, etc.
o Disseminate information to interested individuals and communities
Arabic
Bengali
Chinese
Cyrillic
Devanagari
Georgian
Greek
Gujarati
Gurmukhi
Hebrew
Japanese
Korean
Latin
Sinhala
Tamil
Telugu
Thai
13. Want To Know More?
Join us for the LGR Workshop!
• IDN Root Zone LGR Generation Panels Workshop
Wednesday, 25 June 2014 — 13:00–15:00 BST
Balmoral Room
14. Resources
• Toolkit for ‘How to form a Generation Panel’
o Quick Guide Kit for Generation Panels
• Project mailing lists:
o LGR@icann.org: Communicate with LGR community members and the Integration Panel on matters related to LGR work
o IntegrationPanel@icann.org: Contact the Integration Panel members directly on all matters related to LGR work
o ArabicGP@icann.org, ChineseGP@icann.org, CyrillicGP@icann.org, KoreanGP@icann.org, NeoBrahmiGP@icann.org: script community dedicated mailing lists
o idntlds@icann.org: Contact ICANN to submit Generation Panel proposals, individual statements of interest, work reports, updates, etc.
o Discuss issues related to the IDN Variant TLDs Program by subscribing to vip@icann.org here: https://mm.icann.org/mailman/listinfo/vip
16. MSR-1 Available
• MSR-1 released on 20 June 2014:
https://www.icann.org/news/announcement-2-2014-06-20-en
• Work can now proceed for 22 scripts to create LGRs for the Root Zone
• Generation Panels will:
o pick repertoire from within the MSR
o decide whether code point variants exist
§ decide whether these should lead to allocatable or blocked variant labels
o generate an LGR proposal for public comment and review (and integration) by Integration Panel
17. MSR-1 Content in Numbers
• 22 scripts
o Arabic, Bengali, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Lao, Latin, Malayalam, Oriya, Sinhala, Tamil, Telugu and Thai
• ‘Common’ and ‘Inherited’ (shared)
• 32,790 code points
o From 97,973 PVALID/CONTEXT code points defined in Unicode 6.3
o 11,172 Hangul syllables and 19,850 Han ideographs
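The figures on this slide can be put in proportion with a few lines of arithmetic. The sketch assumes that the Hangul syllables and Han ideographs are counted inside the 32,790 total, which is how the bullet nesting reads; the variable names are illustrative only.

```python
# Figures as stated on the slide.
MSR1_TOTAL = 32_790
PVALID_CONTEXT_UNICODE_6_3 = 97_973
HANGUL_SYLLABLES = 11_172
HAN_IDEOGRAPHS = 19_850

# Assumption: both counts are subsets of the MSR-1 total.
hangul_and_han = HANGUL_SYLLABLES + HAN_IDEOGRAPHS
other_code_points = MSR1_TOTAL - hangul_and_han
share_of_eligible = MSR1_TOTAL / PVALID_CONTEXT_UNICODE_6_3

print(f"Hangul + Han: {hangul_and_han}")
print(f"All other scripts and shared code points: {other_code_points}")
print(f"MSR-1 share of PVALID/CONTEXT code points: {share_of_eligible:.1%}")
```

Under that assumption, Hangul and Han account for the large majority of MSR-1, and the remaining scripts together contribute fewer than two thousand code points.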
18. MSR-1 Public Comment Process
• Analysis of inputs received:
o Requests for revision in code point analysis
o Request for attention to languages with smaller communities
• Response:
o 7 additional code points
o Updated MSR Overview and Rationale document to address inputs received, including explicit use of languages on EGIDS to assess ‘established vitality’ of scripts in use
• Public Comment Report:
https://www.icann.org/en/system/files/files/report-comments-msr-20jun14-en.pdf
19. Expanded Graded Intergenerational Disruption Scale (EGIDS)
• EGIDS
o Not based on population size, but on “established vitality”
• Used as proxy for “effective demand” for the writing system
o Not a perfect correlation, some writing systems not stable
• For the MSR the IP used the cut-off between Level 4 and Level 5
• 4: Educational
o Language in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education
• 5: Developing
o Language in vigorous use, with literature in a standardized form being used by some though this is not yet widespread or sustainable
https://www.ethnologue.com/about/language-status
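A cut-off between Level 4 and Level 5 can be sketched as a simple threshold test. The sketch assumes that languages at Level 4 or a numerically lower (more vital) level pass the cut-off, and it treats levels as plain integers; the language names and levels are placeholders, not real EGIDS assignments.

```python
# Assumption: Level 4 and numerically lower levels pass the cut-off.
EGIDS_CUTOFF = 4


def meets_vitality_cutoff(egids_level):
    """True if a language at this EGIDS level counts toward 'established vitality'."""
    return egids_level <= EGIDS_CUTOFF


# Placeholder languages with made-up levels, for illustration only.
sample_languages = {"language A": 2, "language B": 4, "language C": 5}
considered = [name for name, level in sample_languages.items()
              if meets_vitality_cutoff(level)]
print(considered)
```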
20. MSR - Next Steps
• MSR-1 is only the start
• MSR-2 will complete the repertoire
o Adds some or all of the deferred scripts:
§ Armenian, Ethiopic, Khmer, Myanmar, Thaana and Tibetan
o Possible further extensions where warranted, including existing script repertoire
• In the meantime, MSR-1 is the basis for LGR-1
21. MSR and LGR Timeline
[Timeline diagram: Generation Panels GP1 to GP5 shown against the MSR-1, MSR-2, LGR-1 and LGR-2 releases]
GP4’s script is not included in MSR-1, but is otherwise eligible. It can engage in early dialogue with ICANN and IP, and may pre-emptively begin work on its LGR proposal.
GP3 requires a code point excluded from MSR-1, and possesses convincing evidence that the code point is eligible. It would pre-emptively work on its LGR proposal, and submit it after MSR-2 is released (incorporating the requested code point).
22. IP-GP Communications
• Integration Panel available to help Generation Panels make progress and to ensure successful submissions of script LGRs
• When scripts are related, coordination between GPs is needed, so that consistency between LGRs is agreed between GPs before submitting the LGR to IP
• Use integrationpanel@icann.org to reach the Integration Panel
o Mailing list is archived and public
24. XML Status and Next Steps
• Finalize specification and move tool(s) to production state
• Internet draft currently under review by the community
o http://tools.ietf.org/html/draft-davies-idntables-07
o Send feedback or discuss on public mailing list: vip@icann.org
• Use the specification as the basis for Root LGR work
• Creating LGRs or converting to XML format is generally a straightforward process
• The internet draft contains detailed examples
o Including how to convert RFC 3743-style IDN tables
• MSR-1 can be used as a template for simple LGR with no variants
• ICANN working on setting up a process to support community in syntactically correct submissions
26. Task Force on Arabic Script IDNs: Overview and Progress @ ICANN London Meeting (June ‘14)
Task Force on Arabic Script IDNs (TF-AIDN)
Middle East Strategy Working Group (MESWG)
tf-aidn@meswg.org
27. Community Driven Way forward: Task Force on Arabic Script IDNs
• Creation and oversight by community based Middle East Strategy Working Group (MESWG; https://community.icann.org/display/MES/MESWG+Members)
• TF-AIDN Objectives: a holistic approach
– Arabic Script Label Generation Ruleset (LGR) for the Root Zone
– Second level LGRs for the Arabic script
– Arabic script Internationalized Registration Data
– Universal acceptability of Arabic script IDNs
– Technical challenges around registration of Arabic script IDNs
– Operational software for registry and registrar operations
– DNS security matters specifically related to Arabic script IDNs
– Technical training material around Arabic script IDNs
28. Membership
• Currently 26 members – applications still being received
• From 15 countries – Australia, Egypt, England, Ethiopia, Germany, Iran, Jordan, Lebanon, Malaysia, Morocco, Pakistan, Palestine, Saudi Arabia, Sudan, and UAE
• Speaking more than nine languages – Arabic, Malay, Saraiki, Sindhi, Pashto, Persian, Punjabi, Torwali, Urdu, with expertise in use of Arabic script from East Asia, South Asia, Middle East, North Africa and Africa
• Coming from diverse disciplines – academia (linguistics and technical), registries, registrars, national and regional policy bodies, community based organizations, technical community
29. Task Force on Arabic Script IDNs
• Membership open, community based
• Details and interests of members posted by MESWG
• Discussions publicly archived
• Details at http://lists.meswg.org/mailman/listinfo/tf-aidn
• Background and Introduction to TF-AIDN
– https://community.icann.org/display/MES/Task+Force+on+Arabic+Script+IDNs
• Workspace, news and document archive
– https://community.icann.org/display/MES/TF-AIDN+Work+Space
• Email Archive
– http://lists.meswg.org/pipermail/tf-aidn/
31. SUMMARY OF THE CODE POINTS
Description                 No. of codes
DISALLOWED by MSR           48
ALLOWED by MSR              227
Not Allowed by IDN2008      64
Total                       339
Submitted to MSR            172
Discussion on codes to be handed over to LGR Work is under process
32. IDN Variants Needs and Challenges
Security and Stability Needs
پاكستان (with U+0643): xn--mgbai9a5eva00b
پاکستان (with U+06A9): xn--mgbai9azgqp6j
• 120+ cases of visually same or similar Arabic script characters identified by case study team
– Variants must not be allocated independently
– Variants may need activation to allow user access (w/ different KB)
• 16 IDN ccTLD applications with 4 applications with variants
Security and Stability Challenges
• Consistency and innumerability
– Consistent across and within TLDs
– Minimal activation for manageability
• Management tools
– Registration
– Configuration and Maintenance
– Security and Monitoring
• Usability in applications
– Browsing, emailing, etc.
– Searching, privacy, etc.
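The U+0643 / U+06A9 example shows why visually similar code points matter: two labels that look alike to a reader become unrelated ASCII strings in the DNS. The sketch below builds the two labels from explicit code points and converts them with Python's built-in `punycode` codec. The helper names are hypothetical, and the sketch skips the IDNA validity checks a registry would apply.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label):
    """Encode a Unicode label as an A-label (Punycode only, no IDNA validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Decode an A-label back to its Unicode form."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


KAF = "\u0643"    # ARABIC LETTER KAF
KEHEH = "\u06A9"  # ARABIC LETTER KEHEH


def label_with(kaf_like):
    # peh, alef, <kaf-like letter>, seen, teh, alef, noon
    return "\u067E\u0627" + kaf_like + "\u0633\u062A\u0627\u0646"


for letter in (KAF, KEHEH):
    label = label_with(letter)
    print(f"U+{ord(letter):04X} {unicodedata.name(letter)}: {to_a_label(label)}")
```

Because the two A-labels share no obvious relationship, the registry has to know from the LGR that they are variants of each other.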
33. Progress
Work Accomplished
– Arabic Script Generation Panel
– Principles for Inclusion, Exclusion, and Deferral of Arabic Script Variants
– MSR Analysis and Feedback
– Principles on Variants
– Code Points for LGR
Outreach to the Community
– Launch at the Arab IGF Meeting in Algiers
– Presentation during the IGF in Bali
– Outreach during the ME DNS Forum
– Presentation to the community at ICANN Singapore
– Presentation to the community at the APTLD Meeting
34. Current Work and Next Steps
• XML Manual [June 30, 2014]
• Finalize the discussions on Code Points [August 28, 2014]
• Finalize the discussions on Variants [September 30, 2014]
• Whole Label Rules – Aug – Oct 14
– Document principles for whole label variants
– Define whole label variants
– Release for Public Comments
• Finalization – Nov – Dec 14
– Finalize LGR for Arabic script
– Submit to ICANN/IP
– Release for Public Comments
40. Constraints for CJK LGR
Independent Tasks
Each CJK Panel creates an LGR
Each LGR includes a repertoire and variants
Define label permissions
Define variant labels
Assign dispositions
•Allocatable
•Blocked
Coordination Tasks
If an LGR includes Han characters:
The variant *mappings* must agree for all the panels
The variant *types* may be different
The repertoires may be different
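The coordination constraint can be sketched as a consistency check between two panels' variant tables: for every Han code point both panels include, the set of mapped variants must match, while the type of each mapping is free to differ. The table layout, the function name and the dispositions shown are hypothetical; only the 国/國 pairing is a commonly cited simplified/traditional example.

```python
def mapping_conflicts(lgr_a, lgr_b):
    """Return shared code points whose variant mappings differ between two LGRs.

    Each LGR is a dict: code point -> {variant code point: variant type}.
    Only the mapping targets are compared; variant types may differ.
    """
    conflicts = []
    for cp in sorted(lgr_a.keys() & lgr_b.keys()):
        if set(lgr_a[cp]) != set(lgr_b[cp]):
            conflicts.append(cp)
    return conflicts


# Hypothetical panel tables for illustration.
panel_c = {"\u56FD": {"\u570B": "allocatable"}, "\u570B": {"\u56FD": "allocatable"}}
panel_j_same = {"\u56FD": {"\u570B": "blocked"}}  # same mapping, different type
panel_j_diff = {"\u56FD": {}}                     # no mapping at all: conflict

print(mapping_conflicts(panel_c, panel_j_same))
print(mapping_conflicts(panel_c, panel_j_diff))
```

Code points present in only one table are skipped, which reflects that the repertoires may be different.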
*Presented by Lee Han Chuan & IP, Shanghai 2014 May 29
42. High Level Conflict Strategies
CJK overlap: code point X, with rules Rcjk (C: rule Rc; J: rule Rj; K: rule Rk)
ID | Strategy | Pros | Cons | Rank
1 | Adopt X; Abandon Rcjk | Permit X | No label rule |
2 | Adopt X; Intersection ∩(Rcjk) | Permit X; Permit ∩(variants/disp) | Rules changed |
3 | Adopt X; Union ∪(Rcjk) | Permit X; Permit ∪(variants/disp) | Rules changed |
4 | Abandon X and Rcjk | No conflict | Label not available |
5 | Adopt rules based on frequency of use | Fair & scientific approach | Rules changed; fairness doesn’t mean appropriate |
44. CJK Integration Methodology
Divide & Conquer (D&C)
[Diagram labels: Diversified CJK Demands (C Demands, J Demands); Split; CJK Overlap, CJ Overlap, CK Overlap, JK Overlap; CJ Usage Pattern, CK Usage Pattern; LGR Constraints; Evaluation Method; Merge; CJK Rules; Variant Dispositions; Unified CJK Rules; Minimal Viable Solution; Root Zone Admin; Strategic Direction; Plan and Define; Resources; Services]
45. Splitting Non-overlapping Code Points From Repertoires
CJK Han-overlap in IANA IDN Repository
C-Han : 19520 (CNNIC/TWNIC)
J-Han : 6356 (JPRS)
K-Han : 0 (KRNIC)
C/J Overlap: 6181 (Develop Conflict Strategy)
No conflict: 13339 (Rc) + 175 (Rj) = 13514 unified code points
Problem Domain (Unsolved Overlap) : 6181
[Diagram: rules Rc, Rj and Rk feed the Chinese LGR, Japanese LGR and Korean LGR]
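The first split is plain set arithmetic, sketched below. The counts are the ones on the slide; the function name and the toy code point sets are illustrative assumptions, since the actual repertoires are not reproduced here.

```python
def split_non_overlapping(c_repertoire, j_repertoire):
    """Split two repertoires into C-only, J-only and shared code points."""
    overlap = c_repertoire & j_repertoire
    return c_repertoire - overlap, j_repertoire - overlap, overlap


# Toy example with arbitrary code point numbers.
c_only_toy, j_only_toy, overlap_toy = split_non_overlapping({1, 2, 3, 4}, {3, 4, 5})
print(c_only_toy, j_only_toy, overlap_toy)

# The same arithmetic on the counts reported on the slide.
C_HAN, J_HAN, CJ_OVERLAP = 19_520, 6_356, 6_181
c_only = C_HAN - CJ_OVERLAP
j_only = J_HAN - CJ_OVERLAP
unified = c_only + j_only
print(c_only, j_only, unified)
```

Only the shared code points need a conflict strategy; everything outside the overlap can go straight into the unified set.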
46. Engineering Design
Computation for Word Usage and Frequency
TC : Apple News
SC : Sina News
JP : Mainichi News
C/J overlap code points → Matching usage → frequency of use
Split unused code points
Split code points of low frequency of use
Sample size is statistically significant
47. Splitting Unused Code Points from The Overlap
C / J Overlap Data Set : 6181
total unused : 2739
C only : 1927 (Rc)
J only : 203 (Rj)
unified code points: 2739 + 203 + 1927 = 4869
total used : 3442
C / J usage overlap : 1312
Problem Domain (Unsolved Overlap) : 1312
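The second split sorts each overlapping code point by whether it was observed in the Chinese corpus, the Japanese corpus, both, or neither. The sketch below shows that bucketing on toy data and then checks that the slide's counts add up; the function name and the sample sets are hypothetical.

```python
def split_by_usage(overlap, used_in_c, used_in_j):
    """Bucket overlapping code points by where they were observed in use."""
    buckets = {"unused": set(), "c_only": set(), "j_only": set(), "both": set()}
    for cp in overlap:
        in_c, in_j = cp in used_in_c, cp in used_in_j
        if in_c and in_j:
            buckets["both"].add(cp)
        elif in_c:
            buckets["c_only"].add(cp)
        elif in_j:
            buckets["j_only"].add(cp)
        else:
            buckets["unused"].add(cp)
    return buckets


toy = split_by_usage({1, 2, 3, 4}, used_in_c={1, 2}, used_in_j={2, 3})
print(toy)

# Counts from the slide.
UNUSED, C_ONLY, J_ONLY, BOTH = 2_739, 1_927, 203, 1_312
resolved = UNUSED + C_ONLY + J_ONLY
total_used = C_ONLY + J_ONLY + BOTH
print(resolved, total_used, resolved + BOTH)
```

Only the "both" bucket remains in the problem domain after this step.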
52. Chinese Frequency of Use = Japanese Frequency of Use
[Bar chart: Frequency of Use % for C-Freq and J-Freq, axis from 0 to 0.025]
Code points: 8FCE 7D20 675F 541B 846C 79E9 82BD 96C0 5857 5353
Bar labels: 0.0222, 0.0144, 0.0112, 0.0056, 0.0056, 0.0022, 0.0012, 0.0012, 0.0012, 0.0012
Generated Data Set : 10
53. Frequency of Use Reassembly
C / J Usage Overlap Data Set : 1312
Freq C > J : 939 (Rc)
Freq J > C : 363 (Rj)
unified code points: 363 + 939 = 1302
J = C : 10
Problem Domain (Unsolved Overlap) : 10
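The reassembly step assigns each remaining code point to whichever side uses it more often and leaves exact ties unresolved. The sketch below shows that rule on made-up frequencies and checks the slide's totals; the function name, the placeholder code point names and the frequency values are hypothetical.

```python
def assign_by_frequency(freq_c, freq_j):
    """Assign code points to C or J by higher frequency of use; ties stay open."""
    to_c, to_j, tied = [], [], []
    for cp in sorted(freq_c.keys() & freq_j.keys()):
        if freq_c[cp] > freq_j[cp]:
            to_c.append(cp)
        elif freq_j[cp] > freq_c[cp]:
            to_j.append(cp)
        else:
            tied.append(cp)
    return to_c, to_j, tied


# Made-up frequencies (percent) for three placeholder code points.
freq_c = {"cp1": 0.030, "cp2": 0.001, "cp3": 0.005}
freq_j = {"cp1": 0.010, "cp2": 0.004, "cp3": 0.005}
to_c, to_j, tied = assign_by_frequency(freq_c, freq_j)
print(to_c, to_j, tied)

# Counts from the slide.
FREQ_C_GT_J, FREQ_J_GT_C, EQUAL = 939, 363, 10
print(FREQ_C_GT_J + FREQ_J_GT_C, FREQ_C_GT_J + FREQ_J_GT_C + EQUAL)
```

With measured frequencies, exact equality is rare, so a real comparison would probably need a tolerance or a statistical test rather than `==`.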
54. Data Processing & Computation Recap
>20K Han Code Points → Splitting Non-overlapping → 6181 CJK Overlap → Splitting Unused → 1312 Usage Overlap → Frequency of Use Computation → 10 Code Points
[Diagram labels: Logic Design, Filtering Process, Methodology Review, CJK Coordination, Re-Sampling & Computation, Statistical Justification]
Problem domain was effectively reduced
56. Re-consider Language Tag
Language tag support
[Diagram: C tag, J tag and K tag carried from Policy through TLD registries, IANA/Verisign provisioning, root server operators (distribution masters, root servers; publication) to DNS resolvers (Internet query)]
Sources of Language Tag
•RFC 2860 : The name space of language tags is administered by IANA
•ISO Standard 639 :
•when a language has both an IANA-registered tag and a tag derived from an ISO registered code, one MUST use the ISO tag.
•Maintenance Agency : International Information Centre for Terminology (Austria)
60. What is Brahmi?
• An ancient script
• Most of the modern scripts in the Indian subcontinent have been derived from Brahmi
• Geographically, the scripts are used in Central Asia, South Asia and South-East Asia
• These scripts are used by multiple language families: largely by Indo-Aryan and Dravidian
61. What is Neo-Brahmi?
• Of all the scripts derived from “Brahmi”, not all are in modern usage
• Approach is in consonance with the conservatism principle of the LGR procedure
62. Neo-Brahmi Generation Panel
• The group currently has 10 members
• Mixed expertise, such as linguistics and Unicode
• Need more members to cover the diversity within the group
• Will try to cover possibly all the major scripts/languages of the Brahmi family.
• The group is currently working on gaining more participation within and outside India.
• Interested individuals can send their expression of interest to neobrahmiGP@icann.org and idntlds@icann.org
63. Progress so far…
• Reviewed and commented on Maximal Starting Repertoire for the Root
• A workshop is planned at APrIGF on “Bringing diverse linguistic communities together for a unified IDN ruleset” to reach out to the community for wider participation in the panel
• Working on the Neo-Brahmi Generation Panel proposal – may submit to ICANN by end of August/early September
66. ICANN IDN Team: Thank You
USEFUL LINKS:
• The LGR Procedure:
http://www.icann.org/en/resources/idn/variant-tlds/lgr-procedure-20mar13-en.pdf
• MSR-1 Public Comment: https://www.icann.org/public-comments/msr-2014-03-03-en
• MSR-1 released: https://www.icann.org/news/announcement-2-2014-06-20-en
• V07 Internet Draft for LGR Rules Toolset Project Published:
http://tools.ietf.org/html/draft-davies-idntables-07
• Call for Generation Panels to Develop Root Zone Label Generation Rules:
http://www.icann.org/en/news/announcements/announcement-11jul13-en.htm
• Setting up and running a Generation Panel:
https://community.icann.org/display/croscomlgrprocedure/Generation+Panels
• Community Wiki LGR Project website:
https://community.icann.org/display/croscomlgrprocedure/Root+Zone+LGR+Project
• For more info on the IDN Variant related pages, please visit:
https://www.icann.org/resources/pages/variant-tlds-2012-05-08-en
• To submit expressions of interest, or if you have additional questions, please contact
ICANN at: idntlds@icann.org