2. Who Are We?
Our Mission
- To increase our customers' profitability by significantly improving the efficiency of their information development and delivery processes.
Qualitative Advantage
- Content Lifecycle Implementation (CLI) is Suite Solutions' comprehensive approach, from concept to publication, to maximizing the value of your information assets.
- Our professionals are with you at every phase, determining, recommending, and implementing the most cost-effective, flexible, and long-term solution for your business.
4. Introduction
- We will discuss how the DITA-OT is constructed, and why
- We'll give some insight into this:
5. Overview
- Problem Statement
- Solution: The DITA-OT Pipeline
- Overview of Preprocessing
- Overview of XHTML Output
- Overview of PDF Output
6. A Sample DITA Topic

  <?xml version='1.0' encoding='UTF-8'?>
  <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "c:itaottdopic.dtd">
  <topic id="topic" xml:lang="fr-fr">
    <title>Sample Topic</title>
    <body>
      <p conref="conrefs.xml#conrefid/intropara"/>
      <p>For more information, please see <xref href="infotable.xml#topicid/tableid"/>.</p>
      <p audience="developers">Information that only developers want to know.</p>
    </body>
  </topic>
7. Items That Need Preprocessing

  <?xml version='1.0' encoding='UTF-8'?>
  <!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "c:itaottdopic.dtd">
  <topic id="topic" xml:lang="fr-fr">
    <title>Sample Topic</title>
    <body>
      <p conref="conrefs.xml#conrefid/intropara"/>                                          [1]
      <p>For more information, please see <xref href="infotable.xml#topicid/tableid"/>.</p> [2]
      <p audience="developers">Information that only developers want to know.</p>           [3]
    </body>
  </topic>
8. Items That Need Preprocessing
In the previous slide we saw:
- Conrefs
- Cross references
- Conditional text
All of these need some modification in our final output. But there's more!
9. Items That Need Preprocessing
Lots more!
- Navtitle
- Keyref
- Moving metadata from the map
- Reltables
- Sibling and parent/child related links
- Chunking
- Copy-to
- Coderef
And more…
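To make one of these concrete: conditional text is resolved against a DITAVAL filter file supplied to the build. A minimal sketch (the filename is illustrative; the attribute and value match the sample topic's audience="developers" paragraph):

```xml
<!-- developers.ditaval: drop content marked audience="developers" -->
<val>
  <prop att="audience" val="developers" action="exclude"/>
</val>
```

During the debug-and-filter step, elements matching an "exclude" rule are removed from the intermediate DITA before any later step sees them.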
10. Problem: Process DITA Correctly
- We want a solution that delivers on all of the features in the DITA spec
- Our solution should make it easy to reason about correctness
- Our solution should use common processing for most DITA features for many output formats
11. Solution: DITA Pipeline
Our solution: build the DITA-OT as a pipeline
- Each step makes one type of change to the DITA, and the output is also DITA
- Output from each step is the input to the next step
- This makes it easier to reason about the correctness of the implementation: each step should do one thing well
It doesn't solve all of our problems:
- We still need to make sure each step is correct, of course
- We need to ensure the order is correct
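Since the pipeline is driven by Ant (as discussed later), the idea can be sketched as a chain of Ant targets. This is a simplified illustration, not the toolkit's actual build file; the target names here only echo real step names:

```xml
<!-- Sketch: each target performs one DITA-to-DITA step;
     the depends chain fixes the order of execution. -->
<project name="pipeline-sketch" default="preprocess">
  <target name="debug-filter"/>                    <!-- filter conditional text -->
  <target name="conref" depends="debug-filter"/>   <!-- resolve conrefs -->
  <target name="topicpull" depends="conref"/>      <!-- fill in xref text -->
  <target name="preprocess" depends="topicpull"/>
</project>
```

Each target reads the DITA written by the previous one and writes DITA for the next, which is what makes each step testable in isolation.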
12. Example of a Pipeline Step: Conref

Before:
  <p class="- topic/p " conref="conrefs.xml#conrefid/intropara"
     xtrf="C:ygwinomeamilyebinarest.xml" xtrc="p:1"></p>

After:
  <p class="- topic/p " xtrf="C:ygwinomeamilyebinaronrefs.xml"
     xtrc="p:1">This is an introductory paragraph.</p>

Note if you happen to try this and compare the files: this isn't the only difference between the two files, but it's the only meaningful difference (e.g. <xref/> vs. <xref></xref>).
13. Technical Details About the Log
The DITA-OT is mostly implemented in a mixture of three languages:
- Ant
- Java
- XSLT
Ant drives the whole process
- It is not used the way that's familiar to Java users: it does not manage dependencies on changed files, it mostly just runs the steps in sequence
java -jar dost.jar is just a wrapper around Ant, which sets up clearer logging, provides an easier way to enter some parameters, and runs the integrator
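Because the Java entry point is only a wrapper, you can also call the toolkit's build file from your own Ant script. A sketch, assuming DITA-OT 1.x conventions; the property names (args.input, output.dir, transtype) follow that era's documented parameters, and all paths are examples:

```xml
<!-- Illustrative: driving the toolkit's own build.xml from another Ant file -->
<project name="run-ot" default="xhtml">
  <target name="xhtml">
    <ant antfile="build.xml" dir="c:/DITA-OT">
      <property name="args.input" value="samples/sequence.ditamap"/>
      <property name="output.dir" value="out"/>
      <property name="transtype"  value="xhtml"/>
    </ant>
  </target>
</project>
```

Exact property and target names vary between toolkit versions, so check the parameter reference for the release you are running.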
14. Technical Details About the Log
If you run via Java, your log shows messages like:

  Debug and filter input files...
  Debug and filter input files...
  Copy image files...
  Copy html files...
  Copy flag files...
  Copy subsidiary files...
  Copy generated files...
  Resolve conref push...
  Resolve conref in input files...

If you run via Ant, your log shows messages like:

  debug-filter-flag-check:
  debug:
  debug-and-filter:
  debug-filter:
  copy-image-check:
  copy-image:
  copy-html-check:
15. Technical Details About the Log
- Both the Ant logs and the Java logs are reporting the same events
- Ant is more verbose, since it logs each "target", some of which aren't very interesting
- The Java version tries to give messages that are closer to English, using the target description instead of the name
- The "pipeline" that we've been discussing isn't logged separately
  - The "pipeline" is implemented just as a set of steps in Ant
  - So there's no direct way to view the pipeline itself
- Confusing: there's an Ant task called <pipeline> that runs some parts of the pipeline, but not all
16. Order in the Pipeline
Best current source of information, from Robert Anderson: http://dita.xml.org/node/2469
Some steps must come before others:
- If you want to figure out the text for a cross reference to a topic, you need to know the name of the topic; but if you use navtitle and locktitle in your map, the title will change
- Therefore, extract the title after you process locktitle
Some steps should come before others:
- Process conditional attributes early, so that you don't have to waste time with other processing if you will remove the element anyway
At that link, you'll find much more detailed information about each step
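The navtitle/locktitle interaction can be illustrated with a small map fragment. This is a hypothetical example (the topic filename and title text are invented), using the DITA 1.2 element form of navtitle:

```xml
<!-- With locktitle="yes", the navtitle below overrides the <title>
     inside install.dita for navigation, so cross-reference text must be
     pulled only after locktitle processing has run. -->
<map>
  <topicref href="install.dita" locktitle="yes">
    <topicmeta>
      <navtitle>Installing the Product</navtitle>
    </topicmeta>
  </topicref>
</map>
```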
17. Maintaining Valid DITA
- Each step during preprocessing outputs valid DITA
- This reduces the dependence between steps: if you skip a step, everything's fine
  - The OT does skip steps if it knows they're not needed, e.g. it doesn't do conref processing if there are no conrefs
- This also helps catch errors, since the output gets validated at each step
The DITA specification has features to help:
- xtrc and xtrf attributes for debugging: the toolkit fills these in at the beginning, and the values are maintained throughout processing
- A related-links section in each topic to hold the links gathered from reltables and generated from relationships
- Similar metadata that allows metadata from the map to be pushed into the topics
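Putting these features together, a topic partway through preprocessing can look roughly like this. The attribute values, filenames, and link text here are illustrative, not toolkit output:

```xml
<!-- Sketch of a mid-pipeline topic: xtrf/xtrc record the original source
     file and element position; related-links holds links gathered from a
     reltable, while the topic remains valid DITA. -->
<topic id="topic" class="- topic/topic " xtrf="c:/work/sample.xml" xtrc="topic:1">
  <title class="- topic/title ">Sample Topic</title>
  <body class="- topic/body ">...</body>
  <related-links class="- topic/related-links ">
    <link href="sibling.xml" role="sibling" class="- topic/link ">
      <linktext class="- topic/linktext ">A Sibling Topic</linktext>
    </link>
  </related-links>
</topic>
```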
18. Goal of Preprocessing
All DITA features should be processed into simpler but valid DITA
- Example: conrefs are eliminated
- Example: descriptions are filled in within cross references
Each DITA file should stand alone
- All the information needed for output is now in the individual files
There's a single DITA map that stands alone
- All the information needed for output is in that map; all the submaps are joined together
- New files created from chunk and copy-to are also "ready to go"
All that's left to do with the DITA is switching to a new vocabulary, such as HTML or XSL-FO
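"Descriptions filled in within cross references" looks like this in practice. A hypothetical before/after (the pulled link text "Information Table" is invented; in the toolkit it is taken from the target element's title):

```xml
<!-- Before preprocessing, the author writes an empty xref: -->
<xref href="infotable.xml#topicid/tableid"/>

<!-- After preprocessing, the link text has been pulled from the target,
     so the topic no longer needs the target file at output time: -->
<xref href="infotable.xml#topicid/tableid">Information Table</xref>
```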
19. Performance Issues
In theory, the pipeline reads and processes files many times:
- DTDs (once for every step, for every file)
- XSLT stylesheets (once for every time each is run)
- The DITA files themselves (once for every step, for every file)
Mitigation:
- The latest DITA-OTs have a patch from Eliot Kimber that reads the DTDs only once and caches them
- Ant caches stylesheets
  - There's a price to pay: it costs memory. If you run out, you can shut off this cache with dita.preprocess.reloadstylesheet=true
- There's no cache for the DITA files themselves yet
  - There have been some discussions on the developer group about changing the pipeline implementation, so this might be provided as part of that
20. Overview of HTML Processing
HTML processing itself is "simple": it translates the DITA into corresponding XHTML elements
- The stylesheets do have to do work to convert between structures that aren't quite similar, such as between certain DITA tables and HTML tables
- Most are straightforward: <uicontrol> becomes <span class="uicontrol">
- Formatting is handled in the CSS file, not in processing
DITA topics are processed to make the HTML files
The merged map is processed to make the TOC
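The <uicontrol> case can be sketched as an XSLT template in the style the toolkit's stylesheets use: matching on the @class token (here, the ui-d domain token for uicontrol) rather than the element name, so specializations are handled for free. A simplified sketch, not the toolkit's actual template:

```xml
<xsl:template match="*[contains(@class, ' ui-d/uicontrol ')]"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit an XHTML span; visual styling is left to the CSS file -->
  <span class="uicontrol">
    <xsl:apply-templates/>
  </span>
</xsl:template>
```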
21. Overview of PDF Processing
PDF processing in the toolkit is more complicated
- You need at least one more step: convert DITA to an intermediate format that is straightforward to convert to PDF
In theory, it didn't have to be so complicated, since you can generate PDFs from HTML and CSS
In practice, CSS used to be less sophisticated than it is now, and people have more demands for their PDFs:
- Index
- Language-specific font control
  - You might want your Chinese characters in a Chinese font and your English characters in an English font; no font covers all languages
We're not going to discuss all the things that PDF output is doing, just the steps it takes
22. PDF Processing Steps (1)
Topicmerge: merge all the topics and the map into one big file
- This is a good opportunity to do certain kinds of processing, such as creating fake topics that say "MISSING TOPIC" if a topic is missing
- This step already does create fake topics to correspond to <topichead> in the map, since the main PDF processing is done topic by topic
- Note: topicmerge is done in Java, then post-processed by topicmerge.xsl in the FO plugin
_MERGED.ditamap to stage1.xml
- Collects all the indexterms from the topics, and puts them at the end for later processing to create the index
23. PDF Processing Steps (2)
stage1.xml to stage2.fo
- The merged DITA is transformed into XSL-FO
24. PDF Processing Steps (3)
stage2.fo to stage3.fo
- A Java step which looks for characters in other languages, which will be marked for font processing
stage3.fo to topic.fo
- Substitutes real font names for logical fonts, from font-mappings.xml
- If you specify that Chinese should have a different Sans font, then English characters will get the English Sans font and Chinese characters will get the Chinese Sans font, even if they both appear in the same element
topic.fo to PDF
- The actual PDF is created by an FO processor, probably one of:
  - Apache FOP (free, but doesn't support indexes)
  - RenderX XEP
  - Antenna House Formatter
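The logical-to-physical font substitution can be sketched with a font-mappings.xml fragment. The element names follow the PDF plugin's mapping-file format as the author understands it, and the font and character-set names are examples; check your toolkit version's shipped font-mappings.xml for the exact vocabulary:

```xml
<!-- Sketch: one logical font ("Sans") mapped to different physical fonts
     per character set, so mixed-language text in one element can use
     both an English and a Chinese sans-serif face. -->
<font-mappings>
  <font-table>
    <logical-font name="Sans">
      <physical-font char-set="default">
        <font-face>Helvetica</font-face>
      </physical-font>
      <physical-font char-set="Simplified Chinese">
        <font-face>AdobeSongStd-Light</font-face>
      </physical-font>
    </logical-font>
  </font-table>
</font-mappings>
```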