The document summarizes the experience of implementing a DITA content management system (CMS) at AMD's graphics division over almost four years. Key points include:
1) Productivity increased 2.3-3x through content reuse, automation, and fewer formatting issues in localization. Output increased while staff decreased.
2) Localization costs dropped to less than half of pre-CMS levels due to greater content reuse and streamlined processes.
3) Tracking metadata allows comprehensive measurement of productivity, including topics created/modified, translation auto-matches, and topic reuse rates. This data aligns with product release cycles.
(Almost) Four Years On: Metrics, ROI, and Other Stories from a Mature DITA CMS Installation
1. (Almost) Four Years On: Metrics, ROI, and Other Stories from a Mature DITA CMS Installation
Keith Schengili-Roberts | November 15, 2010
2. Agenda
• Intro + ROI
• Things We Didn’t Expect
• Measuring Productivity: Uses of Metadata
3. Who is This Guy?
Keith Schengili-Roberts
• Manager for documentation and
localization for AMD’s Professional
Graphics division (formerly ATI)
Prior to becoming manager of the
group, was its information architect
• Lecturer at University of Toronto’s
Professional Learning Center since
1999, teaching courses on information
architecture and content management
(sample slide decks available from:
http://www.infoarchcourse.com/)
• Author of four titles on Internet
technologies; last title was “Core CSS,
2nd Edition” (2001)
4. ROI Executive Summary
Proven return on investment (ROI) benefits from
using a CMS-based DITA over the previous toolchain:
Productivity/output increases
– Somewhere between 2.3 and 3 times more efficient
Can “do more with what we’ve already got”
– Minimalism and content re-use goes a long way
– We have fewer writers than when we started while our
output rate continues to increase
Localization cost savings
– Localization budget is now less than half of what we
needed in the year before we started using the DITA
CMS
– We are much more productive
5. What We Do
Documentation & Localization Group at AMD's Graphics Product
Group (GPG)
Formerly ATI
Based in Markham, Ontario
4 writers, 2 process engineers, 2 localizers, 1 manager
CMS: DITA CMS from Ixiasoft (www.ixiasoft.com)
Responsible for:
End-user documentation, including online help (20%)
Engineering documentation for ODM/OEM partners (60%)
Technical training documentation for partners (20%)
Localize in up to 25 languages (mostly end-user and UI)
Primary outputs are PDF and XHTML
6. Where We Started (i.e., “The Bad Old Days”)
Circa 2003-2006:
• Used unstructured FrameMaker
Localization costs very high
Code page issues made localization QA work hard
Could not reliably keep in sync with major software releases
(monthly cadence required for online help; could only do it twice
a year)
Writers were deeply siloed
Very little content shared
Content re-use (especially between different docs) very low
Output was efficient but quality was highly variable
7. Where We Are Now
Have been using Ixiasoft’s DITA CMS in production since
February 2007
Have published more than 2,200 documents in that time
46% in English
54% in the languages to which we localize (21 maximum)
Writers and the documentation process are more nimble
Any writer can take on another’s projects
Content re-use rate is good (slightly more than 50% monthly)
Quality is uniformly better; re-used topics are edited topics
Localization process is streamlined, with more time now
available to focus on QA than on administration or fixing
formatting issues
8. Getting ROI by Doing More with What We’ve Already Got
• Using the old toolchain, we spent about 50% of our time
formatting content; removing that work with the DITA CMS
translates into a nearly equal boost in productivity.
• We automate things that can (and should) be automated; no
more TOCs or Indexes built by hand.
• Through attrition, we have fewer personnel writing/localizing
content; despite this, our output rate has increased.
An information architecture content audit of existing materials
emphasized minimalism and re-use within and between
document types.
Content re-use is considerable; now, de-siloed writers are more
flexible on what they can work on.
We continue our effort to find out what customers find useful,
and to give them only the information they require.
9. ROI: Doing More with Less
Comparative numbers from 2007:
• Numbers show equivalent work on engineering docs (same
types/sizes of docs/product release cycle)
• DITA CMS made us faster
• More than doubled output using the same headcount while
taking on an expanded range of document types
10. ROI: Doing More with Less (cont.)
What’s happened since 2007?
11. ROI: Doing More with Less (cont.)
In 2009, 4 writers were responsible for 366 docs.
• On average, each writer produced 91.5 docs in a year = ~23 per
writer per quarter
This figure does include revisions; however, on average, we do the same
number of revisions as we did under the old toolchain (we just do them
faster).
• Compare this to some roughly equivalent numbers from another
Tech Writing team covering a similar subject area using our old
toolchain:
They produced 360 docs using 9 writers over the course of a year; their docs
were roughly the same size and type, with a similar release cadence
This = 40 docs per writer per year, or 10 per writer per quarter
– By these numbers, use of the DITA CMS improves efficiency by 2.3 times
(your own results may vary)
• The two localization coordinators were responsible for producing
432 docs in the system during 2009.
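The arithmetic behind the 2.3 times figure can be checked with a short Python sketch. It uses only the numbers quoted on this slide; the variable names are illustrative, and the comparison team's headcount of 9 is taken from the stated 40 docs per writer per year.

```python
# Figures quoted on this slide; variable names are illustrative only.
cms_docs, cms_writers = 366, 4   # 2009, team using the DITA CMS
old_docs, old_writers = 360, 9   # comparison team, old toolchain

cms_per_writer_year = cms_docs / cms_writers        # docs per writer per year
old_per_writer_year = old_docs / old_writers

cms_per_writer_quarter = cms_per_writer_year / 4    # roughly 23
old_per_writer_quarter = old_per_writer_year / 4    # 10

# Ratio of per-writer output: the slide rounds this to 2.3
efficiency_ratio = cms_per_writer_year / old_per_writer_year

print(f"DITA CMS team: {cms_per_writer_year:.1f} docs per writer per year")
print(f"Old toolchain: {old_per_writer_year:.1f} docs per writer per year")
print(f"Efficiency ratio: {efficiency_ratio:.2f}")
```

The ratio compares per-writer output only; it does not account for differences in document length or complexity between the two teams.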
12. ROI: Localization Cost Savings
• Content re-use in English corresponds directly to
translated content re-use
• Eliminated desktop publishing (DTP) charges
• As a result, we are able to produce publications
more quickly, more reliably, and less expensively
than with our old toolchain:
One example is our Catalyst Control Center online help:
prior to the DITA CMS, we could only hope to do this at
most every 6 months; now, we can keep up with the
monthly software release cycle.
13. CMS-based DITA and Localization Costs
[Chart: quarterly localization budget and actual spend, with periods labeled “Bad Old Days”, “Content audit + Single-sourcing”, and “CMS ROI”]
Blue line = localization budget for quarter; Red line = actual localization spend
Our annual localization budget is now 2.5 times smaller than in 2006, the year before we
started using the CMS
• DITA CMS has more than paid for itself based only on reduced localization costs
The volume of localized content has increased over this time period
14. DITA Advantages from a Writer’s Perspective
Moving to and implementing DITA is typically a
management decision, but there are advantages for
the writers:
Learning a new and valued skill (I've had two writers
hired out from under me by another firm looking to "do
DITA").
As content re-use increases over time, the writers act
more as editors, so have a higher "value-add" to the
content process.
Significant topic re-use means that writers learn more
about other subjects using other writers’ topics,
effectively de-siloing the writing team.
Programming skills are increasingly called into play because
there is a need for people who understand XSL and
text-parsing languages (such as Python) and also understand
publishing.
15. Things We Didn’t Expect
• Need for a “house” DITA Style Guide
Also found ways to help enforce it
• Conrefs vs. Cloning
• More nimble options available for doing localization
• Use of tracking-based metadata allows us to do
thorough productivity measures
And allows us to measure useful things we had not
initially anticipated
16. How Much DITA Do You Need?
In terms of the number of tags
you need to use, it may be less
than you think:
Our initial approach was
evolutionary; writers could use
any tag they felt necessary, and
over time DITA tagging styles
were established and made
uniform (DITA Style Guide).
Using fewer tags decreases
formatting issues/clashes when
creating XSL output types.
In all, we actively use fewer than
half of all DITA 1.1 tags.
17. Cloud of Relative Tag Usage
• 67 tags displayed, with a minimum usage threshold of 20
• Tags not included because they are auto-populated/included in
our topic templates: othermeta, metadata, prolog, searchtitle,
shortdesc, titlealts, navtitle
• Created using “Wordle” from www.wordle.net
18. Creating a DITA Style Guide
A recommendation for any tech docs group that uses
DITA extensively:
Helps new writers/contributors come up to speed
Usefully narrows the scope of the XSL work that needs to
be done
Many things are “legal” in DITA but may be poor from a
“house style” standpoint, for example:
– Can have unformatted block content between a header and a table
in a section
– Tables and figures do not have to have a title
– Can have unlimited nested lists
– Alpha lists can contain more than 26 items
– Lists can contain only a single item
19. Schematron Can Help Enforce DITA Style
What is Schematron? “Schematron is a rule-based validation
language for making assertions about the presence or absence
of patterns in XML trees.” (www.wikipedia.org)
We use Schematron to point out to the writers potential
errors/lapses in our DITA House Style:
Text between a section and table not wrapped in block tags:
A list ought to have more than one item (otherwise, why make it
a list?):
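The original Schematron rules are not reproduced in this transcript. As a rough stand-in, the two checks named above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is not Schematron and not the team's actual rules: the sample topic, the function name `check_house_style`, and the warning wording are all assumptions, and the first check is simplified to flag any loose text directly inside a `section`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic, written for this sketch only.
SAMPLE_TOPIC = """
<reference id="sample">
  <title>Sample</title>
  <refbody>
    <section>
      <title>Settings</title>
      Loose text that is not wrapped in a block element.
      <table><tgroup cols="1"><tbody><row><entry>x</entry></row></tbody></tgroup></table>
    </section>
    <section>
      <p>Wrapped text is fine.</p>
      <ul><li>Only one item</li></ul>
    </section>
  </refbody>
</reference>
"""

def check_house_style(xml_text):
    """Return warning strings for two illustrative house-style rules."""
    root = ET.fromstring(xml_text)
    warnings = []
    for section in root.iter("section"):
        # Text sitting directly in <section> appears as the section's .text
        # or as the .tail of one of its child elements.
        loose = [section.text] + [child.tail for child in section]
        if any(text and text.strip() for text in loose):
            warnings.append("section contains text not wrapped in a block element")
    for tag in ("ul", "ol"):
        for lst in root.iter(tag):
            if len(lst.findall("li")) == 1:
                warnings.append(f"<{tag}> has only one item")
    return warnings

warnings = check_house_style(SAMPLE_TOPIC)
for message in warnings:
    print("WARNING:", message)
```

As with the Schematron rules described above, the intent is to point out possible lapses to the writer rather than to reject the topic outright.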
20. XSL Can Help Enforce DITA House Style
We have a DITA house style that says nested lists should be no
more than two levels deep.
Here’s Schematron doing its job:
And here is the result if you try to output it:
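The screenshots from the original slide are not available here. The two-level nesting rule itself is easy to illustrate, though, and a minimal Python sketch follows. It is neither the team's Schematron rule nor their XSL: the names `max_list_depth`, `LIST_TAGS`, and `MAX_DEPTH` and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = {"ul", "ol"}
MAX_DEPTH = 2  # house-style limit quoted on this slide

def max_list_depth(element, depth=0):
    """Return the deepest level of list nesting at or below element."""
    if element.tag in LIST_TAGS:
        depth += 1
    deepest = depth
    for child in element:
        deepest = max(deepest, max_list_depth(child, depth))
    return deepest

# Hypothetical sample: a list nested three levels deep.
SAMPLE = (
    "<section><ul><li>one"
    "<ul><li>two"
    "<ol><li>three</li></ol>"
    "</li></ul>"
    "</li></ul></section>"
)

depth = max_list_depth(ET.fromstring(SAMPLE))
too_deep = depth > MAX_DEPTH
print(f"list depth = {depth}; exceeds house style: {too_deep}")
```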
21. Conrefs vs. Cloning
At a very early stage we decided not to use conrefs in our DITA
content
• Made localization programmatically complicated/inefficient
• Creating a localization kit would mean finding all conrefs in a doc
(however many levels they are nested) and then “flattening” them;
leads to inefficient segment-matching
• Did not seem cost-effective from an author’s perspective
• Would seem to limit reuse as conref targets become “fixed”; dare
not change without affecting many docs
• Searching and then defining a single phrase or paragraph to reuse
not always an efficient use of time
22. Conrefs vs. Cloning
• We instead chose a “clone” approach to topic re-use:
• Essentially, make a copy of an existing topic and use only the
parts that you need in your current document
• Original topic and cloned are completely separate (though
trackable; parent/child relationship is retained in CMS)
• Cloning is only done when the amount of change is sufficient
that the original topic cannot accommodate it
• Writers can more freely re-use existing topics for their own
needs
• When a localization kit is made, the segment matching process
is efficient
23. Nimble Localization Processes with DITA XML
Under the old toolchain, localizing a 200+ page document to a
single language within a week (without huge expense) was
impossible.
DITA XML allows us to be more nimble: for critical large
documents, we can send the localization firm finished “parts” as
we get them (“70/20/10”):
When roughly 70% of a large document is done, we send it off
for translation, followed a week or two later with another 20%
of new and updated material, then the last 10% when we
complete it.
While this process does cost more than sending in a whole
document at once, it reduces the turnaround time from weeks
to days, and quality is much improved because it is not done in
a rush.
This approach was simply not feasible using our old toolchain;
ultimately, the new toolchain is still cheaper and much faster.
24. Measuring Productivity: Uses of Metadata
There are three main purposes for metadata:
Retrieval
Re-use
Tracking
• Everyone who has used a search engine is familiar with
the “Retrieval” part.
• Authors can add their own metadata to topics to aid in
later retrieval for re-use.
Topic and map dependencies can be checked, and
associated topics re-used in other publications.
25. Tracking Metadata
Tracking metadata (in our case, mainly dates, author, and
topic/map status) is used for understanding trends and
managing workflow.
The types of questions we can readily answer include:
Who created the content (author)?
When was it created (date)?
Who modified it (editor)?
Who reviewed it (reviewer/approver)?
Where has it been re-used (map relation)?
Has it been published or translated (status/language)?
26. How We Measure Productivity
The metric we use is a combination of topics created + topics
modified in a monthly/quarterly timeframe:
Each new topic created counts as 1.
Modified topics are also counted, though again only as 1.
Subsequent revisions to the same topic in a given
timeframe are not counted.
Provides us with a very good view of ongoing work, and
the numbers align with known product release cycles.
Works both as an aggregate measure (total output per
month), and as a measure of a writer’s individual
productivity.
Maps are also tracked, but are not as good for measuring
productivity since they come in many sizes and have widely
varying development timelines.
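The counting rule described on this slide can be sketched in Python as follows. The event records, field layout, and function names are hypothetical and do not reflect the CMS's actual data model. The sketch also assumes that a topic created and then modified within the same month counts once, which is one reading of the rule above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tracking events: (topic id, author, event type, date).
EVENTS = [
    ("topic-a", "writer1", "created",  date(2009, 7, 2)),
    ("topic-a", "writer1", "modified", date(2009, 7, 9)),   # same topic, same month
    ("topic-b", "writer2", "modified", date(2009, 7, 15)),
    ("topic-b", "writer2", "modified", date(2009, 7, 20)),  # repeat revision, ignored
    ("topic-a", "writer2", "modified", date(2009, 8, 3)),   # new month, counts again
]

def monthly_counts(events):
    """Aggregate measure: each topic counts at most once per (year, month)."""
    topics = defaultdict(set)
    for topic_id, _author, _kind, when in events:
        topics[(when.year, when.month)].add(topic_id)
    return {month: len(ids) for month, ids in sorted(topics.items())}

def monthly_counts_per_writer(events):
    """Individual measure: each topic counts once per writer per month."""
    topics = defaultdict(set)
    for topic_id, author, _kind, when in events:
        topics[(when.year, when.month, author)].add(topic_id)
    return {key: len(ids) for key, ids in sorted(topics.items())}

totals = monthly_counts(EVENTS)
per_writer = monthly_counts_per_writer(EVENTS)
print(totals)
print(per_writer)
```

Using a set per period is what drops repeat revisions: adding the same topic id twice leaves the count unchanged.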
28. Topic Production Matches Product Cadence
[Chart: topic production over time across Product Release Cycles #1, #2, and #3; each cycle is annotated with a Main Peak and a Secondary Peak]
• Regular peak of production in Q3, typically followed by secondary peak in Q1
29. Localization Segments Auto-translated within CMS Monthly
• Portion in orange is the percentage of segments that were 100%
matches and were never sent to a localization vendor = pure ROI!
• From July 2008 to July 2009, an avg. of 54% of segments were
auto-translated within the system.
30. Sample Topic Reuse Rate (Monthly)
From Jan 2008 to June 2009, average monthly topic reuse rate = 53.53%
31. An Interesting Trend: Topic Ratios
Except in year one, reference topics steadily make up ~74% of all topics used
32. What is the Average Size of a Topic?
Maps avg. = 3.47 kb
Concepts avg. = 2.46 kb
References avg. = 7.88 kb
Tasks avg. = 3.20 kb
1 byte = 1 character
1000 bytes (1 kb) = 1000 characters
• Concepts avg. 0.65 of a page of
Lorem ipsum text in Word
• References avg. 2.6 pages
Smallest: half a page
Largest: ~200 pages
• Tasks avg. 1 page