1. Lingoport, Inc.
3985 Wonderland Hill Ave.
Boulder, Colorado
USA 80304
+1 303 444 8020
www.lingoport.com
Successful I18n Project Planning Using Static Analysis
Olivier Libouban, G11n Lead
Adam Asnes, Grand Poisson ("Big Fish")
Copyright: March 2011
Please do not reproduce without authorized permission
3. Agenda
• Business Case
• I18n issues
• Static Analysis Background
• Requirements Gathering
• Static Analysis Detail
• Project Plan Example
• Agile planning
• Continuous Integration for i18n
4. Engineering for Locale Support
• Globalization (g11n) has two components:
– Internationalization (i18n): software engineering to enable localization
– Localization (L10n): culture-specific resources (translation, etc.)
6. I18n Needs – Biz vs. Tech
Business: "Our Software must be in Japanese, French, German, Chinese, and Spanish by November"
Engineering thinks about…
1. Multi-tiered web application?
2. Complex Interface?
3. Database components?
4. Embedded Strings?
5. Locale-aware application?
6. Can it manage multiple data formats?
7. I18n testing plan?
8. Tactics to get it done
7. I18n is Business Driven
• Global initiatives
– Expanding opportunities, New customers
• Competitive pressure
• Lost time to market
• Iterative code fixing, problems keep slipping through
• Development costs in the hundreds of thousands to millions of dollars
8. You Need a Plan – Scope 1st, design later
• Project becomes real with $$$
• CFO thinking in terms of ROI
– Deal Based
• Revenue – Costs = Profit
– Strategic
• Revenue over X years – Costs + effect on equity – risk
• Leverage global investment of organization
– Cost of Time to Market
• If you’re late or lousy, that has significant opportunity cost
9. Engineering: Localization is a Downstream Concern
• “Somebody else’s problem” in the world of many developers
• Creates an opportunity to educate and shepherd teams through globalization
10. Is It Internationalized?
• Teams typically underestimate i18n requirements
• Most don’t know the answer
• Agile or other feature and release requirements often overrun less formally measured i18n requirements
• There is management value in being able to confirm global readiness
11. Example: Hard-Coded English Text
1 million lines of source code
Found: 20,000 embedded strings that cannot be efficiently translated
String orderStatus = "Your order has been processed. "
        + "A confirmation e-mail will be sent to you shortly.";
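To make the fix concrete, here is a minimal Java sketch of externalizing that one hard-coded `orderStatus` string with the standard `java.util.ResourceBundle` mechanism. The bundle classes (`Messages`, `Messages_fr`), the key `order.status.processed`, and the French wording are illustrative assumptions, not taken from any real code base; a real project would more likely keep the text in `.properties` files, but `ListResourceBundle` subclasses keep the example in one file.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class ExternalizedStrings {

    // Base bundle: the English text now lives outside the business logic.
    // Bundle and key names are hypothetical, chosen for this sketch.
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"order.status.processed",
                 "Your order has been processed. A confirmation e-mail will be sent to you shortly."}
            };
        }
    }

    // French bundle, found by the naming convention <base name>_fr.
    // Illustrative translation, written with Unicode escapes to keep the source ASCII-only.
    public static class Messages_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"order.status.processed",
                 "Votre commande a \u00e9t\u00e9 trait\u00e9e. Un e-mail de confirmation vous sera envoy\u00e9 sous peu."}
            };
        }
    }

    private static final String BASE_NAME = ExternalizedStrings.class.getName() + "$Messages";

    // No-fallback control: an unsupported locale falls back to the base bundle,
    // not to whatever the JVM default locale happens to be.
    private static final ResourceBundle.Control NO_FALLBACK =
            ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_CLASS);

    public static String orderStatus(Locale locale) {
        return ResourceBundle.getBundle(BASE_NAME, locale, NO_FALLBACK)
                .getString("order.status.processed");
    }

    public static void main(String[] args) {
        System.out.println(orderStatus(Locale.ENGLISH));
        System.out.println(orderStatus(Locale.FRENCH));
    }
}
```

The no-fallback `Control` is a design choice for predictability: with the default control, a request for an unsupported locale such as Japanese can resolve to the bundle for the JVM's default locale instead of the base English text.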
12. Character Sets/Encodings
• Character set (e.g. Unicode)
– A set of characters used to support a given language or series of languages
• Character encoding (e.g. UTF-16, UTF-8)
– Rules that map each character’s code point (its numeric value in a coded character set such as Unicode) to a sequence of bytes or code units
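A quick way to see the difference between characters and their encodings is to encode the same characters two ways. This Java sketch (class and method names are illustrative) counts the bytes needed in UTF-8 and in UTF-16 (big-endian, without a byte-order mark) for a few sample code points.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Number of bytes needed to store s in UTF-8.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed in UTF-16BE (big-endian, no byte-order mark).
    public static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Samples: LATIN A, e with acute accent, a CJK ideograph, and U+1F600 (outside the BMP).
        String[] samples = {"A", "\u00e9", "\u65e5", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X  code points=%d  Java chars=%d  UTF-8=%d bytes  UTF-16=%d bytes%n",
                    s.codePointAt(0), s.codePointCount(0, s.length()), s.length(),
                    utf8Bytes(s), utf16Bytes(s));
        }
    }
}
```

For these samples, `A` needs 1 byte in UTF-8 and 2 in UTF-16, `é` needs 2 and 2, `日` needs 3 and 2, and U+1F600 needs 4 and 4. That last character is also two Java `char`s (a surrogate pair), which is why byte counts, `char` counts, and character counts must not be used interchangeably.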
17. New Internationalization Project!
• What to do?
– Large amount of code
– Change in requirements
– Change in architecture
– Change in development practices
– Change in testing requirements
18. Practical Challenges
• Sift through hundreds of thousands or millions of lines of code
• Managing the fixing of complex problems
• Creating a product that looks, feels and behaves natively to its worldwide users
• Source code must be adapted to support any language seamlessly, streamlining support and updates
19. Code Review
• What to Identify
– Embedded strings
– Locale-Sensitive methods/functions/classes
– Image references
– Unsafe programming constructs (e.g. regular expressions that assume the US-English alphabet, pointer arithmetic, and more)
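Two of the items above fit in a few lines of Java. The sketch below (hypothetical class and method names) shows a locale-sensitive method whose result depends on the locale, using the well-known Turkish dotted/dotless i case mapping, and a regular expression that assumes the US-English alphabet next to a Unicode-aware alternative.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class LocaleSensitiveDemo {

    // Locale-sensitive: the result depends on the Locale argument
    // (the no-argument toUpperCase() silently uses the JVM default locale).
    public static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    // Unsafe construct: assumes the US-English alphabet, so accented and non-Latin letters fail.
    private static final Pattern US_LETTERS = Pattern.compile("[A-Za-z]+");

    // Unicode-aware alternative: \p{L} matches a letter in any script.
    private static final Pattern ANY_LETTERS = Pattern.compile("\\p{L}+");

    public static boolean isUsLettersOnly(String s) {
        return US_LETTERS.matcher(s).matches();
    }

    public static boolean isLettersOnly(String s) {
        return ANY_LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(upper("title", Locale.ENGLISH)); // TITLE
        // Turkish maps 'i' to U+0130 (capital I with dot above), not to 'I'.
        System.out.println(upper("title", Locale.forLanguageTag("tr")));
        System.out.println(isUsLettersOnly("Ren\u00e9e")); // false
        System.out.println(isLettersOnly("Ren\u00e9e"));   // true
    }
}
```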
20. Code Analysis
• How to Identify Issues
– “Brute force”
• Engineers search for and resolve known issues
• Count display pages
• Pseudo-localization
• Scripts and page by page analysis
– Globalyzer-assisted review, static analysis
• An I18n code analysis tool is employed to examine source code for a large range of potential and known issues
• Issues can be identified and resolved in a more systematic fashion
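Pseudo-localization, listed above, can be automated as a transformation applied to every externalized string before the UI is exercised. The following is a minimal Java sketch under stated assumptions: the vowel-to-accent mapping, the roughly 30% padding, and the square brackets are illustrative conventions, not a standard.

```java
public class PseudoLocalizer {

    // ASCII vowels and their accented stand-ins, in matching positions
    // (escapes keep the source ASCII-only).
    private static final String PLAIN    = "aeiouAEIOU";
    private static final String ACCENTED = "\u00e0\u00e9\u00ee\u00f6\u00fb\u00c0\u00c9\u00ce\u00d6\u00db";

    /**
     * Accents vowels (exposes encoding and font problems), pads the text by about 30%
     * (exposes truncation), and brackets it (a missing bracket on screen suggests clipping;
     * text with no brackets at all was never externalized).
     */
    public static String pseudoLocalize(String source) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            int idx = PLAIN.indexOf(c);
            out.append(idx >= 0 ? ACCENTED.charAt(idx) : c);
        }
        int padding = (source.length() * 3 + 9) / 10; // ceil(30% of the length), integer math
        for (int i = 0; i < padding; i++) {
            out.append('~');
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(pseudoLocalize("Save"));
    }
}
```

This sketch also accents vowels inside any markup embedded in a string (for example HTML attribute names); a production pseudo-localizer would need to skip those.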
21. Traditional Approach - repeat, and repeat, and repeat, and repeat
• GREP, overwhelm developers
• View pages; pore through code for strings, methods, etc.
• Externalize and refactor one by one
• Test, pseudo-localize
• Localize and see what you’re missing
22. Globalyzer Server and Clients
• Static analysis on the source code
• Components: Server, Client, Command Line
Globalyzer is methodology-agnostic. Project managers may use it in a ‘traditional’ approach or an Agile approach.
23. Globalyzer Principles - Customization
• Globalyzer Server manages rule set configuration
– Globalyzer rule sets are used to identify i18n issues in the code base
– Rules embody the i18n issue detection logic
– One rule set targets one programming language (& variant)
– Default rule sets are based on research and years of experience
– Rules must be tailored to a specific project
– Rules can be shared among team members
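To make "detection rules plus project-specific tailoring" concrete, here is a deliberately small Java sketch of a line-based scanner. It is not Globalyzer's rule engine or rule format; the two detection patterns and the ignore filters (comment and import lines, a logger assumed to be called `LOG`, dotted lowercase resource keys) are hypothetical examples of the kind of tailoring a project might need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniScanner {

    public record Issue(int line, String rule, String text) {}

    // --- Detection rules (what counts as a potential i18n issue) ---
    // Double-quoted string literals, allowing escaped characters inside.
    private static final Pattern STRING_LITERAL = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");
    // Case conversion without an explicit Locale argument.
    private static final Pattern NO_LOCALE_CASE = Pattern.compile("\\.to(?:Upper|Lower)Case\\(\\)");

    // --- Project-specific filters (the tailoring step; hypothetical conventions) ---
    // Skip comment lines, imports, and calls on a logger assumed to be named LOG.
    private static final Pattern IGNORE_LINE = Pattern.compile("^\\s*(?://|import\\s|LOG\\.)");
    // Skip literals with no letters at all, and dotted lowercase resource keys such as "order.status".
    private static final Pattern IGNORE_STRING = Pattern.compile("\\P{L}*|[a-z]+(?:\\.[a-z]+)+");

    public static List<Issue> scan(List<String> lines) {
        List<Issue> issues = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n);
            if (IGNORE_LINE.matcher(line).find()) {
                continue;
            }
            Matcher s = STRING_LITERAL.matcher(line);
            while (s.find()) {
                if (!IGNORE_STRING.matcher(s.group(1)).matches()) {
                    issues.add(new Issue(n + 1, "EmbeddedString", s.group(1)));
                }
            }
            Matcher c = NO_LOCALE_CASE.matcher(line);
            while (c.find()) {
                issues.add(new Issue(n + 1, "LocaleSensitiveMethod", c.group()));
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "String status = \"Your order has been processed.\";",
                "String key = \"order.status.processed\";",
                "String code = name.toUpperCase();");
        scan(source).forEach(System.out::println);
    }
}
```

Regular expressions over single lines miss multi-line constructs and can produce false positives, so a sketch like this is only a way to see what rules and filters do, not a substitute for a purpose-built analysis tool.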
24. Globalyzer Principles – Desktop Analysis
• Globalyzer desktop client:
– Scans source code using Globalyzer rule sets
– Detects and reports potential i18n issues
– Manages i18n issues
– Assists in fixing the code to make it i18n-compliant
25. Globalyzer Principles - Automation
• Globalyzer Command Line
– Integrates into the overall software process, running at set frequencies
– Generates reports once a setup has been established
– Different strategies:
• Segment the code base into small scan projects that reflect the i18n effort
• Focus on i18n scope
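Segmenting the code base and scanning on a schedule pays off most when results are compared from run to run. The Java sketch below assumes the findings of each scan are already available as (scan project, rule) pairs; the record, the method names, and the sample scan project names are hypothetical, and no particular Globalyzer report format is implied. It counts findings per scan project and reports the change since the previous run.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ScanReport {

    // One finding from a scan: which scan project (code segment) and which rule fired.
    public record Finding(String scanProject, String rule) {}

    // Count findings per scan project and per rule; TreeMaps keep report order stable.
    public static Map<String, Map<String, Integer>> summarize(List<Finding> findings) {
        Map<String, Map<String, Integer>> summary = new TreeMap<>();
        for (Finding f : findings) {
            summary.computeIfAbsent(f.scanProject(), k -> new TreeMap<>())
                   .merge(f.rule(), 1, Integer::sum);
        }
        return summary;
    }

    private static int total(Map<String, Integer> counts) {
        if (counts == null) {
            return 0;
        }
        int sum = 0;
        for (int v : counts.values()) {
            sum += v;
        }
        return sum;
    }

    // Change in total findings per scan project since the previous run (positive = new issues).
    public static Map<String, Integer> delta(Map<String, Map<String, Integer>> previous,
                                             Map<String, Map<String, Integer>> current) {
        Set<String> projects = new TreeSet<>(previous.keySet());
        projects.addAll(current.keySet());
        Map<String, Integer> result = new TreeMap<>();
        for (String p : projects) {
            result.put(p, total(current.get(p)) - total(previous.get(p)));
        }
        return result;
    }
}
```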
27. Merging Requirements and
• Architectural Changes: what’s not in the code
– Locale support
– Changes to how data is passed around
– Discuss and analyze technical requirements
• Code Analysis: what’s in the code
– Strings
– Refactoring locale-limiting methods/functions
– Find and count issues
28. I18n Architectural Challenge – what’s not in the code
• Marketing requirements
• Locale behavior
• Database
• Application code, e.g. Java, C++, VB
• U/I, e.g. JSP, ASP, ASPX
• Character encoding support
• 3rd party products
• Business logic
• Platforms, browser support requirements
31. Release Path
• Internationalization, 1st time
– Most of U/I
– Breaks the DB
– Data I/O
– Test entire product
• Feature release
– 3-week sprint?
– Focus on code subset
– Concentrated testing
• Static analysis with Globalyzer
Code branch, merge, testing strategy
32. Factors to Plan On
• Programming languages
• How many tiers, what do they do
• Database support
• Locale Requirements
• 3rd Party Products – support for Unicode?
• Size of Application – Lines of Code
• Number of embedded strings to be externalized
• Estimate of string concatenation
• DB refactoring
• Methods/Functions/Classes replacement
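Concatenation matters for the estimate because each concatenated message usually has to be rewritten, not just moved to a resource file. Here is a minimal Java sketch using the standard `java.text.MessageFormat` class; the class and method names are illustrative, and the German pattern is a hypothetical translation.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ConcatenationDemo {

    // Problem: the English word order is baked into the code, and the fragments
    // ("User ", " has ", " new messages") reach translators as separate pieces.
    public static String concatenated(String user, int count) {
        return "User " + user + " has " + count + " new messages";
    }

    // Fix: one externalizable pattern with numbered placeholders, so a translation
    // can move {0} and {1} wherever its grammar needs them.
    public static String formatted(String pattern, Locale locale, String user, int count) {
        return new MessageFormat(pattern, locale).format(new Object[] {user, count});
    }

    public static void main(String[] args) {
        System.out.println(concatenated("Ana", 3));
        System.out.println(formatted("User {0} has {1} new messages", Locale.ENGLISH, "Ana", 3));
        // Hypothetical German pattern that reorders the arguments.
        System.out.println(formatted("{1} neue Nachrichten f\u00fcr {0}", Locale.GERMAN, "Ana", 3));
    }
}
```

Plural forms ("1 new messages") need further handling, for example a `ChoiceFormat` pattern or ICU's plural syntax, which this sketch leaves out.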
33. Tiers and Technologies
• Level 1: Java, C#
• Level 2: JavaScript, VB
• Level 3: C++, older languages (e.g. RPG)
Time and effort increase from level 1 to level 3
34. Other Issues
• Stability of the build
• Quality of the code
– History
• Focus of the developers
• Source code management approach
• New concurrent development introducing new i18n problems
35. Questions & Answers
Adam Asnes: adam@lingoport.com
Olivier Libouban: olivier@lingoport.com
Resources
Lingoport: http://www.lingoport.com
Globalyzer: http://www.globalyzer.com
Blog: http://i18nblog.com
37. Why go through requirements?
• I18n work is software engineering
• To determine the scope of the i18n work, the i18n team cannot simply look at the code and come up with an i18n project
• Scope also drives planning, cost, and resources
• How to describe i18n requirements?
38. Focus on one requirement: Locale
• One product instance per locale?
• Multi-locale support
• Locale detection?
• User account support?
39. Ex: WebSphere Portal Locale Determination
– User logged in: display the user’s preferred language
– No preferred user language: look for the user’s browser language
• If the portal supports that language, it displays in that language.
• If the browser has more than one language defined, the portal uses the first language in the list to display the content.
– If no browser language can be found, for example if the browser does not send a language, the portal resorts to its own default language.
– If the user has a portlet that does not support the language determined by the previous steps, that portlet is shown in its own default language.
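The fallback chain on this slide can be sketched in Java as follows. This is not WebSphere Portal's API; the class, its constructor parameters, and the supported-language set are hypothetical. One assumption goes beyond the slide: when the browser sends several languages, this sketch walks them in preference order and picks the first one the application supports, rather than trying only the first entry. `Locale.LanguageRange.parse` (Java 8+) parses an Accept-Language header and throws `IllegalArgumentException` if the header is malformed.

```java
import java.util.Locale;
import java.util.Set;

public class LocaleResolver {

    private final Set<String> supportedLanguages; // ISO 639 codes the application ships, e.g. "en", "fr"
    private final Locale applicationDefault;      // used when nothing else matches

    public LocaleResolver(Set<String> supportedLanguages, Locale applicationDefault) {
        this.supportedLanguages = supportedLanguages;
        this.applicationDefault = applicationDefault;
    }

    /**
     * @param userPreferred  the logged-in user's stored preference, or null
     * @param acceptLanguage the browser's Accept-Language header, or null/empty
     */
    public Locale resolve(Locale userPreferred, String acceptLanguage) {
        // Step 1: a logged-in user's preferred language wins.
        if (userPreferred != null) {
            return userPreferred;
        }
        // Step 2: browser languages, highest weight first; take the first one we support.
        if (acceptLanguage != null && !acceptLanguage.trim().isEmpty()) {
            for (Locale.LanguageRange range : Locale.LanguageRange.parse(acceptLanguage)) {
                Locale candidate = Locale.forLanguageTag(range.getRange());
                if (supportedLanguages.contains(candidate.getLanguage())) {
                    return candidate;
                }
            }
        }
        // Step 3: no usable browser language, so fall back to the application default.
        return applicationDefault;
    }

    // Step 4: a component (e.g. a portlet) that lacks the resolved language uses its own default.
    public static Locale forComponent(Locale resolved, Set<String> componentLanguages, Locale componentDefault) {
        return componentLanguages.contains(resolved.getLanguage()) ? resolved : componentDefault;
    }
}
```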
42. More of the typical i18n requirements
• Target date(s)
• System requirements
• Existing & potential use cases for UI text entry
• Text display
• Text processing
• Collation
• Handling of locale-sensitive data (dates, numbers, currencies, etc.)
• Client Installer considerations
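For the handling of locale-sensitive data, number formatting is a compact example. The Java sketch below (illustrative names) formats and parses the same value with `java.text.NumberFormat` for the US and German locales. Note the last call in `main`: parsing German-formatted input with the US locale does not fail; it stops at the comma and quietly yields 1.234.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class NumberFormatDemo {

    // Display: grouping and decimal separators come from the locale.
    public static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Input: user-entered numbers must be parsed with the user's locale too.
    public static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(1234.5, Locale.US));        // 1,234.5
        System.out.println(format(1234.5, Locale.GERMANY));   // 1.234,5
        System.out.println(parse("1.234,5", Locale.GERMANY)); // 1234.5
        // Parsed with the wrong locale, the same input silently becomes 1.234.
        System.out.println(parse("1.234,5", Locale.US));
    }
}
```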
44. Conceptual illustrative architecture
Specific development and integration (code):
• UI
• Business
• Persistence
Connected systems: RDBMS, LDAP, CMS, Web Services, Rules Engine, Workflow Engine, JMS, 3rd Parties
April 19, 2011 – p 45
45. Specific i18n software engineering focus
• UI: HTML, server-side, JavaScript, input forms, CSS, content presentation, etc.
• Business logic, searches, comparisons, data exchange with external systems
• Persistence: exchanges with RDBMS, content management, LDAP, file-based persistence (XML, etc.)
46. Specific development i18n issues
• String externalization (outside of code) and i18n resource bundles
• Locale-sensitive methods: searching, retrieving, sorting, date and time, string operations, character operations, etc.
• Code resources (images, etc.)
• Overall programming language specifics
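Sorting is one of the locale-sensitive operations listed above. In this Java sketch (illustrative names), `String.compareTo` orders by UTF-16 code unit values, which puts every uppercase ASCII letter before every lowercase one and pushes accented letters to the end, while `java.text.Collator` applies the locale's collation rules.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class SortingDemo {

    // Naive sort: String.compareTo compares UTF-16 code units, not alphabetical order.
    public static List<String> naiveSort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Locale-aware sort: a Collator applies the language's collation rules.
    public static List<String> collatedSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Zebra", "apple", "\u00c4ther");
        System.out.println(naiveSort(words));
        System.out.println(collatedSort(words, Locale.GERMAN));
    }
}
```

With these sample words, the naive sort gives `Zebra, apple, Äther`, while the German collator gives `apple, Äther, Zebra`.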
47. Data stores i18n issues
• PL/SQL
• Encoding
• Locale files (XML, XLS, CSV, etc.)
• Database-specific issues: date/time, conversion, sorting, soundex, etc.
• Storing and retrieving local data in the local language (vs. a “generic” schema)
• User-entered data
• Columns requiring translation
• Attributes, user names, postal addresses, etc.
• Database design
48. Content Management i18n issues
• Accessing the proper locale
• Encoding of content
49. External system i18n issues
• Modality of data exchange / data loss
• Accessing the proper locale
• Encoding/persistence of content on the external system
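Data loss at a system boundary is easy to reproduce. The Java sketch below (illustrative names) simulates sending text to a system that stores it in ISO-8859-1: `String.getBytes` silently substitutes `?` for characters the target charset cannot represent, and bytes decoded with the wrong charset turn into mojibake. Checking with `CharsetEncoder.canEncode` before sending is one way to catch the problem early.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DataLossDemo {

    // Simulates storing text in a system that uses `target`, then reading it back.
    // String.getBytes replaces unmappable characters with the charset's replacement byte ('?').
    public static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    // Mojibake: bytes written as UTF-8 but read back as ISO-8859-1.
    public static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Check before sending: can the target encoding represent this text at all?
    public static boolean canStore(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("caf\u00e9", StandardCharsets.ISO_8859_1));
        System.out.println(roundTrip("\u65e5\u672c", StandardCharsets.ISO_8859_1));
        System.out.println(misread("caf\u00e9"));
        System.out.println(canStore("\u65e5\u672c", StandardCharsets.ISO_8859_1));
    }
}
```

With these definitions, `"café"` survives the ISO-8859-1 round trip, `"日本"` comes back as `"??"`, and the UTF-8 bytes of `"café"` read as ISO-8859-1 display as `"cafÃ©"`.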
50. I18n Engineering Considerations
• Locale handling
• Character encoding
• Strings
– External, grammar, segments, plurals, wrapping
– String handling (char *, etc.)
– Tabs, spaces, delimiters, etc.
• Region-specific functions
• Resource management – centralized, normalized, re-usable
• Dates – calendar
• Times
• Sorting & searching
• Currency
• Transaction process
• Character set conversions
• Online help
• Sounds
• Data exchange
• Honorific titles
• Telephone formats
• Postal formats
• Shipping conditions
• Numerical formats
• Page layout, LTR, RTL
• Fonts and attributes
• Icons, colors
• Reporting, workflow
• Database support
• Multi-byte enabling
• Business logic
• Measurements, units
• Input methods
51. Process requirements: how to fit into an existing environment
• Lifecycle
• Documentation
• Integration
• QA
• Type of meetings
• Build
• Source control
• Branching
• Reporting structure
• Review boards
• JUnit
• Globalyzer
• Bug reporting
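As one example of an i18n check that could run next to JUnit tests in a build, the Java sketch below compares the keys of a base `.properties` bundle with a localized one and reports anything missing. The sample keys and the exit-code convention are hypothetical; in practice the bundles would be read from the project's resource files rather than from strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

public class BundleCompletenessCheck {

    // Parse .properties content; a build would read the real resource files instead.
    public static Properties load(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Keys defined in the base bundle but absent from the localized bundle.
    public static Set<String> missingKeys(Properties base, Properties localized) {
        Set<String> missing = new TreeSet<>(base.stringPropertyNames());
        missing.removeAll(localized.stringPropertyNames());
        return missing;
    }

    public static void main(String[] args) {
        Properties en = load("order.processed=Your order has been processed.\norder.cancel=Cancel order\n");
        Properties fr = load("order.processed=Votre commande a \u00e9t\u00e9 trait\u00e9e.\n");
        Set<String> missing = missingKeys(en, fr);
        if (!missing.isEmpty()) {
            System.err.println("fr bundle is missing keys: " + missing);
            System.exit(1); // a non-zero exit code fails the CI step
        }
        System.out.println("fr bundle is complete");
    }
}
```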
52. Questions & Answers
53. Static Analysis Detail
Globalyzer example – Running and Reporting
Adam Asnes, President & CEO, Lingoport
Olivier Libouban, Globalization Lead, Lingoport
57. Agile in one slide (smallest nutshell)
• Roles (Product Owner, Scrum Master, Team)
• Product Backlog
• Sprints (user stories are designed, implemented, and tested in a ‘short’ timeframe, e.g. 3 weeks)
• Sprint Backlog
• Daily Scrums
• Demonstrable
• ‘Shippable’
58. i18n and Agile Challenges
• Traditionally, legacy i18n has followed a waterfall model:
– i18n cuts across the code, for instance:
• Encoding problems … in all the code
• Formatting issues … in all the code
• Externalize strings …
– i18n needs a systemic approach
– I18n projects tend to have long life cycles
– (L10n: must get an entire locale done)
• From a methodology perspective, Agile:
– is feature-driven
– runs in “short” sprints
• Sometimes a hybrid approach works best
60. Lingoport Project Assessment - Legacy
• Uncover potential i18n issues from 2 perspectives:
– Code perspective: Globalyzer reporting/metrics
– Architectural: locale/technical i18n requirements
• Makes it possible to create the initial ‘i18n product backlog’
• Can be, but does not need to be, part of a sprint
• Provides an overall scope and effort estimate
• Can feed into a number of processes
– TDD, ADD, Waterfall, … Agile
• Involve the Product Owner: communication resource
61. Lingoport Project Organization
Backlog identification and Scoping
• The i18n product backlog is a prioritized list of requirements, stories, features, etc.
• What the customer wants, described in the customer’s (Product Owner’s) terminology
ID | Name | Imp (importance) | Est (estimate) | How to demo | Notes
1 | Locale Setting and Tracking | 30 | 5 | Log in; splash screen for locale; … | If no login before, default locale; if first time, splash screen, otherwise remembers; …
2 | Locale for languages | 10 | 8 | Log in for an 'en US' user; go to page 'www.'; change locale; … | Locale is default; check pseudo-localization; …
62. Lingoport Project Organization
Sprint Management
• i18n code branching
• Agile typically uses development builds and CI environments
• Must pass ‘regular’ dev criteria
• Must be able to push i18n branch changes to the main line easily, and vice versa
• I18n tests must be available to other teams in CI
• Some items are more sensitive than others
– Database schema changes and implications on all source