3. SCHEMA Group 2015 – All rights reserved
Definitions and Terminology: Marcus Kesseler, SCHEMA & DERCOM
Marcus Kesseler
Computer Scientist with a heavy Artificial Intelligence background.
One of two founders and managing directors of SCHEMA GmbH.
SCHEMA
A software company based in Nürnberg.
SCHEMA is 20 years old and we have been making
and selling CCMS from day one.
DERCOM
Is the Association of German Manufacturers of Authoring and
Content Management Systems.
Currently 7 companies, with 1,400 customers between them.
4.
Definitions and Terminology:
CCMS
CCMS
Component Content Management System.
The main difference between a CMS and a CCMS:
A CCMS has the ability to aggregate content
components into larger documents.
A CCMS is able to publish content as “classic”
documents or as Web portal content or app content,
all with very high quality.
5.
Definitions and Terminology:
DITA
DITA
Darwin Information Typing Architecture, an XML- and file-based standard
for the representation of componentized and interlinked content.
Although there are several DITA-based CCMS implementations, DITA can
be used with just an XML Editor, the file system and the DITA Open
Toolkit.
What we like about DITA is the visibility it brings to the enormous
advantages of componentized content.
We fully agree with the DITA community that there really is no alternative
to working with components (or topics) in large-scale, state-of-the-art
technical content authoring, management and distribution.
6.
More Terminology: Essential
and Incidental Complexity
Essential complexity, also called intrinsic or inherent complexity, is the
complexity you cannot hide or get rid of in a software implementation.
It is directly derived from the domain you are modelling.
Example: When moving from document-based content authoring to
a componentized approach, the number of objects you have to deal with
goes up by two or three orders of magnitude. The only way to hide this
increase would be to hide the components, which, of course, would
defeat the purpose.
Incidental complexity is an extra dose of complexity added on top of
the essential complexity by bad choices of architecture, data
representation or user experience design.
8.
Our context is not the Lone
Technical Content Ranger
All arguments in this talk assume that we are talking
about the processes and needs of large technical content
departments operating at a high level of maturity.
We are not talking about the perspective of the Lone
Technical Content Ranger.
Russell Ward presented this perspective in his great
talk last year here at tekom 2014:
Five reasons not to use DITA
[http://conferences.tekom.de/fileadmin/tx_doccon/slides/742_5_Reasons_Not_to_Use_DITA.pdf]
9.
Large Technical Content
Departments: Some Parameters
So, what is a Large Technical Content Department?
5 to several dozen technical writers.
Publications have to be regularly updated in 5 to 30
(or more) languages.
Multiple publication formats, including:
Paginated formats, like PDF (directly or via InDesign,
FrameMaker or Word).
Online formats, like HTML, HTML5, EPUB, etc.
Custom XML formats.
10.
Large Technical Content
Departments: Processes & Workflows
The following are defined and enforced:
Writing standards and terminology
Translation standards and workflows
Artwork & media standards and workflows
Publication workflows
Release workflows
Distribution workflows
11.
Large Technical Content
Departments: Core Challenges
Layout has to be of the highest quality, strictly
adhering to Corporate Design standards.
Products are highly modular or organized in product
families with common base features, both of which
are key requirements for effective and massive
content reuse.
Product innovation is fast and relentless, so
the technical content team is always under pressure
to keep product and information life cycles in sync.
So, just another great day in the wonderful world
of technical content publishing. Life is good!
12. Reason 1
Coverage of Component Content
Management Requirements in
DITA is Surprisingly Small
13.
Requirements Coverage of
XML, DITA and CCMS
#   Process name & requirements                                         Max  XML  DITA  CCMS
1   Topics management                                                    10    0     3     9
    (classes, workflows, versioning, ownership, access control).
2   Manage the links between topics                                      10    0     3     9
    (classes, workflows, versioning, ownership, referential integrity).
3   Management of the maps that build the publications out of the        10    0     3     9
    underlying components
    (versioning, ownership, referential integrity).
4   Manage the metadata on topics, links and maps                        10    1     2     9
    (classes, workflows, versioning, ownership).
5   Translation management with automatic flagging of topics needing     10    1     1     8
    re-translation
    (ownership, workflow, dataflow).
6   Media assets management                                              10    1     2     7
    (classes, workflows, ownership, guidelines, conversion, translation).
7   Publication formats and layout management                            10    0     4     8
    (design within corporate guidelines, implementation, revisions).
8   Automatic publication generation and channel specific distribution   10    0     2     6
    (workflow, IT systems integration).
9   Overall content, links and publications quality assurance and        10    2     3     8
    approval processes
    (correctness, writing style, terminology, translations, links, publication maps,
    graphics and layout).
14.
Requirements Coverage of
XML, DITA and CCMS
#   Process name & requirements                                         Max  XML  DITA  CCMS
10  Information model management                                         10    0     2     9
    (conceptual design, classes, roles, rights, workflows, evolution).
11  Performance & costs management                                       10    0     2     4
    (financial controlling, key performance indicators monitoring, tracking, corrective
    actions)
12  Security                                                             10    0     0     8
    (user management, user roles, access control, change tracking).
13  IT and software infrastructure management                            10    0     0     4
    (change, updates and upgrades).
14  Manage the communication with adjacent departments, like             10    0     0     3
    product management, engineering and marketing
    (responsibilities, workflows).
15  Team management                                                      10    0     0     0
    (skills, training, structure, responsibilities, motivation).
    Coverage [Points]                                                   150    5    27   101
    Coverage [Percent]                                                        3%   18%   67%
    Coverage with CCMS baseline [Percent]                                          27%  100%
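The totals and percentages in the table can be re-derived from the per-row points. The following is only a small arithmetic sketch: the point values are copied from rows 1 to 15 above, while the variable names are ours and not part of the talk.

```python
# Points per requirement (rows 1-15), copied from the two tables above.
MAX_POINTS_PER_ROW = 10
scores = {
    "XML":  [0, 0, 0, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    "DITA": [3, 3, 3, 2, 1, 2, 4, 2, 3, 2, 2, 0, 0, 0, 0],
    "CCMS": [9, 9, 9, 9, 8, 7, 8, 6, 8, 9, 4, 8, 4, 3, 0],
}

# Theoretical maximum: 15 requirements with 10 points each.
max_total = MAX_POINTS_PER_ROW * len(scores["CCMS"])
totals = {name: sum(points) for name, points in scores.items()}

# Coverage relative to the theoretical maximum, rounded to whole percent.
coverage = {name: round(100 * total / max_total) for name, total in totals.items()}

# Coverage of DITA when the CCMS score is taken as the 100% baseline.
dita_vs_ccms = round(100 * totals["DITA"] / totals["CCMS"])

print(totals)
print(coverage)
print(dita_vs_ccms)
```

The "CCMS baseline" row simply divides the DITA total by the CCMS total instead of by the 150-point maximum.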
15.
Requirements Coverage of
XML, DITA and CCMS
[Figure: requirements coverage of XML, DITA, CCMS [DITA] and CCMS [DERCOM].
DITA: business logic in the DITA Open Toolkit.
CCMS: business logic in database, workflow system, TMS interfaces, media assets management, etc.
CCMS [DERCOM]: bonus for non-DITA CCMSs for being on the market for at least 10 years longer (?)]
16.
Drawbacks of a Small
Requirements Coverage
Comparing CCMSs based on their level of DITA compliance
would not yield much insight, since most requirements
are outside of DITA’s scope.
All features not within DITA’s scope would not be trivially
portable to other DITA-based systems. Some examples:
Versioning
Translation states & dataflow
Release and ongoing workflow states
Media assets management
Access rights & user management
Note: Even with a DITA-based CCMS, you would
incur a significant amount of vendor lock-in!
18.
Evolution of DITA is too Slow
An update every five years is just not compatible with the
demands of an ever-accelerating market (variables? scoped
keys?).
Fast evolution of DITA is impeded by the following two
inherently conflicting requirements:
The need to add features that are crucially missing in real-life
application scenarios.
The need to prevent new features that would add even more
incidental complexity to the standard.
19.
Evolution of DITA is too Slow
Scoped keys are a good example:
Under heavy reuse scenarios you are very, very
likely to need them.
On the other hand, should tech writers really need
to be trained in programming-language scoping
concepts, just to be able to handle reuse
variability?
21.
How is a DITA Topic
Represented in a File System?
[Figure: a topic (TOP) stored as an XML DITA topic file, made up of three parts:
File metadata (name, owner, last write date, …)
Metadata within the XML DITA topic (class, author, target audience, …)
XML content]
23.
… and some versions …
[Figure: the topic TOP in the languages EN, FR, JA, PT, …, each in the versions V1, V2, …, Vn.]
24.
… and after several years, a single topic
may have proliferated into m × n files!
[Figure: grid of TOP files, one per language (EN, FR, JA, PT; m languages) and version (V1, V2, …, Vn; n versions).]
25.
How m × n Topics are
Accessed in DITA
In DITA each single translation or version is a unique,
individual file and hence a distinct topic.
The user has to know exactly what language and version is
being referenced.
Keys or file names will likely follow some pattern like this:
Topic_Intro_en_V1
Topic_Intro_fr_V1
Topic_Intro_ja_V1
Topic_Intro_en_V2
Topic_Intro_fr_V2
Topic_Intro_ja_V2
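To make the m × n growth tangible, here is a minimal sketch that generates names following the pattern above. The function name dita_style_names is hypothetical; only the naming pattern itself comes from the list.

```python
from itertools import product


def dita_style_names(topic, languages, versions):
    """Return one name per (version, language) pair, following the pattern above."""
    return [f"{topic}_{lang}_V{version}"
            for version, lang in product(versions, languages)]


names = dita_style_names("Topic_Intro", ["en", "fr", "ja"], [1, 2])
for name in names:
    print(name)

# m languages times n versions: every added language or version adds a
# whole new set of names that authors and maps have to reference explicitly.
print(len(names))
```

With 3 languages and 2 versions this already yields 6 distinct names for what is conceptually a single topic.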
26.
How m × n Topics are
Accessed in a CCMS
In a CCMS implemented on top of a database, all these m × n
topics can be addressed with a single key:
[ID_Intro, Language, LatestReleasedVersion]
where Language and LatestReleasedVersion are variables
that the system will automatically populate as needed.
In Computer Science this is called a composite key; the concept was
invented over 45 years ago at IBM.
Composite keys capture and optimally encode the regularities in
the target domain and let the computer do the tedious
bookkeeping. This is what computers are good at!
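A minimal sketch of how such a composite key could be resolved, assuming an in-memory dictionary as a stand-in for the database. The names TopicVersion, TopicStore and resolve are illustrative assumptions, not the API of any particular CCMS.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TopicVersion:
    topic_id: str
    language: str
    version: int
    released: bool
    content: str


class TopicStore:
    """In-memory stand-in for the database table of topic versions."""

    def __init__(self) -> None:
        # The composite key (topic_id, language, version) identifies one row.
        self._rows: Dict[Tuple[str, str, int], TopicVersion] = {}

    def add(self, row: TopicVersion) -> None:
        self._rows[(row.topic_id, row.language, row.version)] = row

    def resolve(self, topic_id: str, language: str,
                version: Optional[int] = None) -> TopicVersion:
        """Return an explicit version, or the latest released one if none is given."""
        if version is not None:
            return self._rows[(topic_id, language, version)]
        released = [
            row for row in self._rows.values()
            if row.topic_id == topic_id and row.language == language and row.released
        ]
        if not released:
            raise KeyError(f"no released version of {topic_id} in {language}")
        return max(released, key=lambda row: row.version)


store = TopicStore()
store.add(TopicVersion("ID_Intro", "en", 1, True, "Intro, first release"))
store.add(TopicVersion("ID_Intro", "en", 2, True, "Intro, second release"))
store.add(TopicVersion("ID_Intro", "en", 3, False, "Intro, draft"))
store.add(TopicVersion("ID_Intro", "fr", 1, True, "Intro (fr), first release"))

print(store.resolve("ID_Intro", "en").version)
print(store.resolve("ID_Intro", "fr").content)
```

The caller supplies only the topic ID and the language; the version part of the key is filled in by the lookup, and the unreleased draft is skipped.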
27.
How m × n Topics are Accessed
by the Author in a CCMS
Authors will rarely need to see, insert or handle
full CCMS composite topic keys:
[ID_Intro, Language, LatestReleasedVersion]
Since the composite key structure is universal within the system,
there is no need to explicitly represent the variable parts. They are
optional and will be implicitly added at document aggregation time.
What the author sees and handles is just:
[ID_Intro]
And, of course, usually even this is hidden by the GUI.
28.
Advantages of Composite Keys
DITA would be so much easier if references were defined as
composite keys:
Maps would be directly reusable. No need to create and
maintain a map for each language. A change to the map
structure in English is automatically available in all other
languages.
New languages (or versions) can be added to your pool without
touching the maps at all!
No need to develop, train and enforce sophisticated file name
or key patterns to manually capture and encode these rather
trivial domain regularities.
Authors need only insert a reference to the topic; the system
does the tedious and error-prone bookkeeping.
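The map reuse described above can be sketched as follows. This is an illustration under stated assumptions: the nested dictionary pool stands in for the database, and the names user_manual_map and aggregate are hypothetical.

```python
from typing import Dict, List

# A publication map lists topic IDs only; it carries no language or version.
user_manual_map: List[str] = ["ID_Intro", "ID_Safety"]

# Hypothetical content pool: topic ID -> language -> released version -> content.
pool: Dict[str, Dict[str, Dict[int, str]]] = {
    "ID_Intro": {"en": {1: "Intro v1", 2: "Intro v2"}, "fr": {1: "Intro v1 (fr)"}},
    "ID_Safety": {"en": {1: "Safety v1"}, "fr": {1: "Safety v1 (fr)"}},
}


def aggregate(topic_map: List[str], language: str) -> List[str]:
    """Fill in language and latest released version for each topic in the map."""
    document = []
    for topic_id in topic_map:
        versions = pool[topic_id][language]
        document.append(versions[max(versions)])
    return document


english = aggregate(user_manual_map, "en")
french = aggregate(user_manual_map, "fr")

# Adding Portuguese touches only the pool, never the map.
pool["ID_Intro"]["pt"] = {1: "Intro v1 (pt)"}
pool["ID_Safety"]["pt"] = {1: "Safety v1 (pt)"}
portuguese = aggregate(user_manual_map, "pt")

print(english)
print(french)
print(portuguese)
```

The same map object produces all three language variants; the variable parts of the key are supplied only at aggregation time.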
29.
Representation of m × n
Topics in a CCMS
[Figure: a TOPIC represented as nested containers.
Topic container: metadata for all versions in all languages.
Language container (EN, FR, JA, PT): metadata for all versions in this language.
XML container (V1, V2, …, Vn in each language): metadata for this version in this language, plus the XML content.]
36.
DITA’s XML-first Paradigm vs.
a Database-first Paradigm
In DITA, every piece of information or data that is needed to drive business
processes has to be inside the XML files together with the content as
such (= DITA’s XML-first paradigm).
This goes against quite a few Computer Science principles of information
model design.
Any change, however minimal, to a topic can affect content, structure,
linking or metadata and therefore has to be carefully scrutinized to
identify what exactly changed and if any consistency rules were
broken.
Enforcing the principles of Atomicity, Consistency and Isolation in DITA
is quite a challenge (cf. The ACID Principles of Database Design).
37.
DITA’s XML-first vs.
Database-first
Please note that DITA’s XML-first paradigm is a huge incidental complexity driver
for DITA-based CCMS implementations:
There is pressure to improve metadata handling by keeping them in
the database, but, with XML-first, you also have to keep them in the
DITA files. Now there are two distinct and separate representations.
You’ve lost your single source of truth.
The database value and the DITA XML value can get inconsistent
from update conflicts and may have to be manually corrected by the
users.
Controlling change permissions for individual metadata values in a
file is also a huge challenge. It is possible to do it in good XML
editors. But users can still open the XML file in Notepad…
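The double bookkeeping described above can be illustrated with a small consistency check. This is a sketch under assumptions: the topic uses a simplified DITA-style prolog, and db_record and find_conflicts are hypothetical names, not part of any DITA tool or CCMS.

```python
import xml.etree.ElementTree as ET

# Simplified topic with metadata stored inside the XML (XML-first).
TOPIC_XML = """<topic id="intro">
  <title>Introduction</title>
  <prolog>
    <author>A. Writer</author>
    <metadata><audience type="administrator"/></metadata>
  </prolog>
  <body><p>Some content.</p></body>
</topic>"""

# Hypothetical database copy of the same metadata, edited independently.
db_record = {"author": "A. Writer", "audience": "operator"}


def xml_metadata(xml_text):
    """Extract the metadata values that are duplicated in the database."""
    root = ET.fromstring(xml_text)
    audience = root.find("./prolog/metadata/audience")
    return {
        "author": root.findtext("./prolog/author"),
        "audience": audience.get("type") if audience is not None else None,
    }


def find_conflicts(xml_text, record):
    """Return {field: (xml_value, db_value)} for every field that disagrees."""
    in_xml = xml_metadata(xml_text)
    return {
        field: (in_xml.get(field), db_value)
        for field, db_value in record.items()
        if in_xml.get(field) != db_value
    }


conflicts = find_conflicts(TOPIC_XML, db_record)
print(conflicts)
```

A check like this can only report the disagreement. Deciding which of the two values is the correct one still falls to a person, which is exactly the loss of a single source of truth.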
39.
Trend in CCMS: Content
Model Complexity Reduction
In the last 10 years, there has been a very strong trend in the CCMS
market to reduce content model complexity (aka semantic DTDs).
Content departments observed that, in the long term, they never recouped
their investment in the design, implementation, training and especially
maintenance of their sophisticated, made-to-order content models.
The trend is simply to move the needed business data from the XML
content into the database, where it is much easier to implement, manage,
interface with, retrieve and use productively.
40.
Examples of Content
Model Complexity Reduction
Some examples:
Topic types or classes are just metadata in the database. The variability on
the XML Editor (DTD) level is reduced to an absolute minimum.
All metadata assigned to a topic is moved from the XML into the database.
Fine-grained variability in the content is handled by variables, which on the
XML content level are just very simple references into the database. The data
model for variables in the database is very powerful and table-oriented
(like an Excel sheet), so that it is easy to maintain versions, languages and taxonomic
dependencies of variable names and values without touching the XML
content.
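As an illustration of variables as simple references, the sketch below resolves placeholder elements against a lookup table. The element name var, its ref attribute and the table contents are assumptions made for this example; they are not DITA markup and not the format of any specific CCMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable table, keyed by (variable name, language).
variables = {
    ("product_name", "en"): "Power Drill",
    ("product_name", "de"): "Bohrmaschine",
    ("max_torque", "en"): "40 Nm",
    ("max_torque", "de"): "40 Nm",
}


def render(element, language):
    """Return the element's text with every <var ref="..."/> replaced by its value."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "var":
            parts.append(variables[(child.get("ref"), language)])
        else:
            parts.append(render(child, language))
        # Text that follows the child element inside the parent.
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(
    '<p>The <var ref="product_name"/> delivers up to <var ref="max_torque"/>.</p>'
)

print(render(paragraph, "en"))
print(render(paragraph, "de"))
```

Changing a value or adding a language only touches the table; the XML paragraph stays as it is.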
41.
DITA Specialization
As a Computer Scientist, I think DITA Specialization is a really impressive and
elegant solution for the implementation of sophisticated content models.
But again, DITA is adding all this sophistication to the XML level, where it will
incur a big cost in incidental complexity.
I think that there is a consensus that even the default DITA content model is
already challenging for most technical writers new to component-based
authoring.
42.
DITA Specialization
There is a paradox, in that just to trim the content model down to a more
manageable scope, you already need a significant amount of consulting and
configuration.
The OASIS Lightweight DITA Initiative, chaired by Michael Priestley (IBM), is
trying to remedy this situation, so that you can start simple and add more
features later, when you understand the principles and can be sure that you
really need them.
44.
Summary of our
5 Reasons against DITA
1. Coverage of Component Content Management
Requirements in DITA is Surprisingly Small.
2. Evolution of the DITA Standard is too Slow.
3. How DITA deals with the Number of Files Explosion.
4. DITA’s XML-first Paradigm.
5. The Default DITA Content Model is too Complex.
45.
Conclusion
As long as the DITA standard is based on a non-negotiable
XML-first paradigm, it will always incur a tremendous
incidental complexity cost on multiple levels:
Initial configuration, even if just to trim DITA back, is
significant.
Integrating DITA into a CCMS (or database) is fragile and
expensive.
Technical writers will need a lot of training and close
motivation monitoring.
46.
Recommendation
Our recommendation would be to decouple the DITA
business logic from the XML-first principle.
In the end, this means the DITA Open Toolkit would not
be just a smart topic aggregation compiler, but behave
much more like an integrated database application, in
short: just like a state-of-the-art CCMS.
tekom 2015 presents a very convenient opportunity to
take a closer look at these systems!