Speeding Up Your DITA-OT Processing
Aryeh Sanders, Suite Solutions

Who Are We?
  • Our Mission: To increase our customers’ profitability by significantly improving the efficiency of their information development and delivery processes.
  • Qualitative Advantage: Content Lifecycle Implementation (CLI) is Suite Solutions’ comprehensive approach, from concept to publication, to maximizing the value of your information assets. Our professionals are with you at every phase, determining, recommending and implementing the most cost-effective, flexible and long-term solution for your business.

Clients and Partners
  • Private and Confidential. Suite Solutions ©2009

Introduction
  • Performance in the DITA-OT
  • “No Silver Bullet”
      - The design of the DITA-OT limits performance unless it is redesigned
      - Some of that redesign is underway
  • Performance relative to what?
      - Examine your needs to figure out which performance issues should be tackled and which can be ignored
  • No hard and fast rules
      - Performance can be assessed only with your data, in your environment
  • Measurement

Overview
  • Overview of the webinar
  • Performance Pain Points in the DITA-OT
  • Hardware and Software Changes for Performance
  • Memory Settings for Java
  • Stylesheet Performance and Code Changes

Performance Issues With the DITA-OT
  • The DITA-OT sacrifices speed for simplicity
      - Constructed as a pipeline of transformations, each step of which does one thing
      - Each step must at least reparse the DITA files
      - Each read of a DITA file with a DOCTYPE used to reparse the DTDs
      - Now it doesn’t: Eliot Kimber added a patch to cache the DTDs
      - Best takeaway from this talk: upgrade to a version with this patch (1.5.1)
  • XSLT
      - High-level language, far removed from the practicalities of performance
      - Often, the easiest way to do something in XSLT involves repeated searches through the document

Importance of Measurement: A Case Study
  • Since the DITA-OT writes many files repeatedly, we have to wait for the hard disk to complete each write, even to temporary files where long-term integrity isn’t that important. This certainly holds up processing, right?
  • Test: stop those writes
      - ImBench, a ramdisk tool
      - Create a temporary disk in memory and use that as the temp directory
      - Now no writes have to wait for the disk
  • Run the OT 20 times with the same data
      - I used a slightly complicated map (98 pages of output)
      - 41.1 seconds average with disk vs. 39.1 seconds in memory
  • For most people, not worth it; on the other hand, it saves about 5% of the time

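The arithmetic behind that 5% figure, and the habit of averaging repeated runs, can be sketched in a few lines of Python. This is a minimal sketch and not part of the original talk: the function names `average_seconds` and `percent_saved` are hypothetical, and the `task` callable is a stand-in for whatever launches your own toolkit build.

```python
import time
from statistics import mean


def average_seconds(task, runs=20):
    """Call task() `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()  # hypothetical stand-in for one full toolkit run
        timings.append(time.perf_counter() - start)
    return mean(timings)


def percent_saved(before, after):
    """Percentage of the original running time saved by a change."""
    return (before - after) / before * 100


# Averages reported in the case study: temp directory on disk vs. on a ramdisk.
saving = percent_saved(41.1, 39.1)
print(f"{saving:.1f}% saved")
```

Averaging over many runs matters here because a two-second difference is small enough to be hidden by run-to-run variation in a single measurement.
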
Hardware Issues
  • Anecdotal: I’ve run the same data and stylesheets on my laptop and on a client’s server
      - 10 minutes on the server vs. 1.5 minutes on the laptop
      - And it’s not a new laptop
  • Since the DITA-OT does a lot of processing, it’s worth using a machine that’s capable of reasonable performance
      - Measure!
      - But a modern low-end $250 Dell desktop is about as fast as my laptop
      - Don’t throw it on an old computer and then make people wait
  • Make sure there’s one core free to run the OT so it doesn’t have to compete with other processes

Hardware Issues (2)
  • Make sure there’s enough memory
      - Very workload dependent
      - For very large workloads (roughly > 600 pages, or > 1000 topics), consider a 64-bit machine with a 64-bit JVM
      - Eliot Kimber is working on a patch to pass the right memory parameters to the OT; if this is an issue, check the developer mailing list or contact him
      - If there’s not enough physical memory, you can get thrashing
  • JVM memory is covered on the next slide

Memory
  • Once you have enough, it won’t help to have extra
      - Slightly surprising to me, but I tested at least one data set
      - -Xmx tells Java the maximum heap size
      - The reason this is slightly surprising is that before Java gives up, it will try garbage collection
      - Frequent garbage collection can be slow
      - Possibly the OT doesn’t tend to release memory
  • When a dataset runs out of memory, the standard advice is to set reloadstylesheets="true"
      - This slows down processing, since stylesheets are re-read
      - Much better to figure out how to give the OT enough memory, if possible
  • One customer solved their memory issues by using JRockit as the JVM

XSLT Performance
  • Stylesheet developers don’t necessarily think about what needs to happen behind the scenes
  • Example:
        <xsl:variable name="example" select="//*[@id=$refid]"/>
        (second example expression not preserved in this transcript)
  • In the context of a document where @id is unique, both would behave the same, but one would be slower than the other
      - Except: this could theoretically be optimized if the @id attribute were an ID type, you have a DTD, and the stylesheet processor has that optimization built in, which leads us back to ...
  • Measurement is also useful for stylesheets
  • Saxon comes in a free version and commercial versions
      - The commercial versions are not that expensive and have more optimizations, which might matter for your workload, or might not

Profiling
  • Good idea; many commercial tools
      - Oxygen, StylusStudio, fancier editions of Visual Studio
  • Essentially another example of measurement to find the real pain points
  • Not always necessary if the pain points are evident

XSLT Performance (2)
  • XPath tends to have one-line requests, but that one line can hide a lot of computation
  • What needs to happen to process this?
        preceding-sibling::*[following-sibling::*[contains(@class, ' topic/ul ')]]
      - preceding-sibling has to check each previous sibling
      - For each one, following-sibling has to check every following sibling
      - And contains() itself can’t be that efficient, because it needs to hunt within @class for ' topic/ul '
  • Some numbers:
      - Let’s look at 100 nodes, and let’s pretend that there is no topic/ul, so the test never succeeds
      - Let’s run this test on all 100 nodes in sequence
      - We could do the math, but it’s easier to write a program

XSLT Performance Example
  • (Calculated in Perl, sorry)

    my $contains = 0;
    for $a (1..100) {                # for each of our 100 nodes
        for $b (1..$a-1) {           # look at the preceding siblings
            for my $c ($b+1..100) {  # look at the following siblings of each of those
                $contains++;         # and call contains()
            }
        }
    }
    print $contains, "\n";

  • Running this tells us there are 328350 (!) calls to contains()
  • Of course, with 10 nodes there are only 285 calls, but the point remains: one line in XSLT might be doing a LOT of computation

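For readers who would rather do the math after all, the count has a simple closed form. Each node b is visited once for every later node a, and each visit checks all the nodes after b, so b contributes (n - b) squared calls; the total is the sum of the squares from 1 to n - 1. The sketch below is an illustration added here, not from the talk, and the function names are hypothetical. It checks the brute-force loops against that formula.

```python
def count_contains_calls(n):
    """Brute-force count, mirroring the three nested loops of the Perl version."""
    calls = 0
    for a in range(1, n + 1):              # each of the n nodes
        for b in range(1, a):              # its preceding siblings
            for c in range(b + 1, n + 1):  # the following siblings of each of those
                calls += 1                 # one contains() call
    return calls


def closed_form(n):
    """Sum of k^2 for k = 1 .. n-1, which equals (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


for n in (10, 100):
    assert count_contains_calls(n) == closed_form(n)
    print(n, closed_form(n))
```

The formula grows roughly as n cubed over three, so doubling the number of siblings multiplies the work by about eight.
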
Tips From Mike Kay
Eight tips for how to write efficient XSLT:
  1. Avoid repeated use of "//item".
  2. Don't evaluate the same node-set more than once; save it in a variable.
  3. Avoid <xsl:number> if you can. For example, by using position().
  4. Use <xsl:key>, for example to solve grouping problems.
  5. Avoid complex patterns in template rules. Instead, use <xsl:choose> within the rule.
  6. Be careful when using the preceding[-sibling] or following[-sibling] axes. This often indicates an algorithm with n-squared performance.
  7. Don't sort the same node-set more than once. If necessary, save it as a result tree fragment and access it using the node-set() extension function.
  8. To output the text value of a simple #PCDATA element, use <xsl:value-of> in preference to <xsl:apply-templates>.

Commentary On Those Tips
  • Use <xsl:number> when appropriate; I’m pretty sure that the cases where his comment applies aren’t found that often in the OT
  • By all means, use xsl:key!
      - This is probably where to find low-hanging fruit in speeding up the built-in stylesheets
  • We can’t realistically avoid complex patterns in template rules, but it’s worth considering why he gave that advice
      - Every <xsl:apply-templates/> runs through each child node
      - For each child node, it has to run the match test of every one of the <xsl:template>s
      - Each match test takes some amount of processing, and it runs for every node, so we’d like to minimize that
      - If you can move processing to an xsl:choose or a moded template, then you only need to run those tests on a smaller subset of nodes

What is an XSLT Key?
  • Somewhere at the top level of the stylesheet, you can use something like:
        <xsl:key name="mapTopics" match="//opentopic:map//*" use="@id" />
  • Then, later in your stylesheets, you can look up items with that key:
        select="key('mapTopics', $id)…"
  • This lets you do the search once, instead of searching through opentopic:map elements many times.
  • Note that this is part of the code that had a 40% speedup in generating the TOC in a large book, which I’ll mention on the next slide, even though
        <xsl:key name="mapTopics" match="/*/opentopic:map//*" use="@id" />
    would have been much more efficient.

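The idea behind a key can be illustrated outside XSLT as the difference between scanning a tree on every lookup and building an index once. The Python below is only an analogy under stated assumptions: the sample document, its element names, and the function names are made up for illustration, and a dictionary holds one element per id, whereas key() returns every matching node.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a merged map; element names and ids are invented.
DOC = ET.fromstring(
    "<map>"
    "<topicref id='intro'/>"
    "<topicref id='install'><topicref id='prereqs'/></topicref>"
    "<topicref id='usage'/>"
    "</map>"
)


def find_by_scan(root, wanted_id):
    """Comparable to //*[@id=$id]: walk the whole tree on every lookup."""
    for element in root.iter():
        if element.get("id") == wanted_id:
            return element
    return None


def build_index(root):
    """Comparable to declaring a key: one pass over the tree, then fast lookups."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


index = build_index(DOC)                  # pay for the search once
assert index["prereqs"] is find_by_scan(DOC, "prereqs")
```

With a handful of elements the difference is invisible; it starts to matter when the same lookup runs once per topic in a large book.
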
More On Slow XSLT
  • Consider what’s inside a loop
  • Example: if you have a template, and the template defines a variable:
        <xsl:variable name="topicrefs" select="//*[contains(@class, ' map/topicref ')]"/>
      - (This isn’t a good idea to start with, because of the //)
      - This variable will have the same value every time
      - So why not construct it only once?
      - Move it out of the template and make it a global variable
  • One customer sped up TOC generation by around 40% on a huge book

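The effect of moving the variable out of the template can be made visible by counting how often the whole-document search runs. This is a rough Python analogy rather than XSLT, and everything in it is assumed for illustration: the sample map, the `ScanCounter` class, and the two `process_*` functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature map with DITA-style class attributes.
DOC = ET.fromstring(
    "<map class='- map/map '>"
    "<topicref id='a' class='- map/topicref '/>"
    "<topicref id='b' class='- map/topicref '/>"
    "<topicmeta class='- map/topicmeta '/>"
    "</map>"
)


class ScanCounter:
    """Counts how often the whole-document search is executed."""

    def __init__(self, root):
        self.root = root
        self.scans = 0

    def topicrefs(self):
        self.scans += 1
        return [el for el in self.root.iter()
                if " map/topicref " in el.get("class", "")]


def process_inside(counter, topic_ids):
    """Variable defined inside the 'template': one full scan per invocation."""
    hits = []
    for topic_id in topic_ids:
        topicrefs = counter.topicrefs()
        hits.extend(el for el in topicrefs if el.get("id") == topic_id)
    return hits


def process_hoisted(counter, topic_ids):
    """Variable hoisted out, like a global variable: one scan in total."""
    topicrefs = counter.topicrefs()
    hits = []
    for topic_id in topic_ids:
        hits.extend(el for el in topicrefs if el.get("id") == topic_id)
    return hits
```

Both functions return the same elements; only the number of full scans differs, growing with the number of invocations in the first case and staying at one in the second.
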
PDF Stylesheet Development Tips
  • Not a general performance issue, but a timesaver for stylesheet developers
  • Useful if, like us, you need to repeatedly tweak a stylesheet and test the tweak, but each test is slow
      - First, try directly editing the topic.fo file and viewing the result before you change the stylesheet, so you won’t have to run the OT at all
      - Second, you can configure the toolkit to have another Ant “target”: simply run your DITA once, and after that, let the toolkit start the PDF stylesheets from the files in the temp directory, skipping the earlier processing
  • Contact us for more information; we don’t have a nicely packaged version of this yet, but we can give you the pieces

Questions?
  • Any questions?
  • Be in touch!
      - Aryeh Sanders
      - aryehs@suite-sol.com

Más contenido relacionado

La actualidad más candente

course slides -- powerpoint
course slides -- powerpointcourse slides -- powerpoint
course slides -- powerpoint
webhostingguy
 
Open Source Package Php Mysql 1228203701094763 9
Open Source Package Php Mysql 1228203701094763 9Open Source Package Php Mysql 1228203701094763 9
Open Source Package Php Mysql 1228203701094763 9
isadorta
 
Advanced PHP: Design Patterns - Dennis-Jan Broerse
Advanced PHP: Design Patterns - Dennis-Jan BroerseAdvanced PHP: Design Patterns - Dennis-Jan Broerse
Advanced PHP: Design Patterns - Dennis-Jan Broerse
dpc
 
P H P Part I I, By Kian
P H P  Part  I I,  By  KianP H P  Part  I I,  By  Kian
P H P Part I I, By Kian
phelios
 

La actualidad más candente (20)

Object Oriented Design Patterns for PHP
Object Oriented Design Patterns for PHPObject Oriented Design Patterns for PHP
Object Oriented Design Patterns for PHP
 
XML and Web Services with PHP5 and PEAR
XML and Web Services with PHP5 and PEARXML and Web Services with PHP5 and PEAR
XML and Web Services with PHP5 and PEAR
 
Developing Plugins
Developing PluginsDeveloping Plugins
Developing Plugins
 
The Big Documentation Extravaganza
The Big Documentation ExtravaganzaThe Big Documentation Extravaganza
The Big Documentation Extravaganza
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHP
 
XML Transformations With PHP
XML Transformations With PHPXML Transformations With PHP
XML Transformations With PHP
 
Go OO! - Real-life Design Patterns in PHP 5
Go OO! - Real-life Design Patterns in PHP 5Go OO! - Real-life Design Patterns in PHP 5
Go OO! - Real-life Design Patterns in PHP 5
 
PEAR For The Masses
PEAR For The MassesPEAR For The Masses
PEAR For The Masses
 
Session Server - Maintaing State between several Servers
Session Server - Maintaing State between several ServersSession Server - Maintaing State between several Servers
Session Server - Maintaing State between several Servers
 
XML and PHP 5
XML and PHP 5XML and PHP 5
XML and PHP 5
 
course slides -- powerpoint
course slides -- powerpointcourse slides -- powerpoint
course slides -- powerpoint
 
Open Power Template 2 presentation
Open Power Template 2 presentationOpen Power Template 2 presentation
Open Power Template 2 presentation
 
PHP MySQL
PHP MySQLPHP MySQL
PHP MySQL
 
Open Source Package Php Mysql 1228203701094763 9
Open Source Package Php Mysql 1228203701094763 9Open Source Package Php Mysql 1228203701094763 9
Open Source Package Php Mysql 1228203701094763 9
 
Project Automation
Project AutomationProject Automation
Project Automation
 
Standards For Java Coding
Standards For Java CodingStandards For Java Coding
Standards For Java Coding
 
AD215 - Practical Magic with DXL
AD215 - Practical Magic with DXLAD215 - Practical Magic with DXL
AD215 - Practical Magic with DXL
 
Advanced PHP: Design Patterns - Dennis-Jan Broerse
Advanced PHP: Design Patterns - Dennis-Jan BroerseAdvanced PHP: Design Patterns - Dennis-Jan Broerse
Advanced PHP: Design Patterns - Dennis-Jan Broerse
 
Dxl As A Lotus Domino Integration Tool
Dxl As A Lotus Domino Integration ToolDxl As A Lotus Domino Integration Tool
Dxl As A Lotus Domino Integration Tool
 
P H P Part I I, By Kian
P H P  Part  I I,  By  KianP H P  Part  I I,  By  Kian
P H P Part I I, By Kian
 

Similar a Ot performance webinar

Article link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docxArticle link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docx
fredharris32
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
InSync2011
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
xlight
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
supertom
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud
 

Similar a Ot performance webinar (20)

Practical catalyst
Practical catalystPractical catalyst
Practical catalyst
 
Article link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docxArticle link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docx
 
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
 
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdfDatabase & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
Database & Technology 1 _ Tom Kyte _ Efficient PL SQL - Why and How to Use.pdf
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
 
How Xslate Works
How Xslate WorksHow Xslate Works
How Xslate Works
 
Performant Django - Ara Anjargolian
Performant Django - Ara AnjargolianPerformant Django - Ara Anjargolian
Performant Django - Ara Anjargolian
 
#GDC15 Code Clinic
#GDC15 Code Clinic#GDC15 Code Clinic
#GDC15 Code Clinic
 
Graylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog Engineering - Design Your Architecture
Graylog Engineering - Design Your Architecture
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Database Performance Tuning
Database Performance Tuning Database Performance Tuning
Database Performance Tuning
 
Oracle Sql Tuning
Oracle Sql TuningOracle Sql Tuning
Oracle Sql Tuning
 
pm1
pm1pm1
pm1
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
 
Best practices in Java
Best practices in JavaBest practices in Java
Best practices in Java
 
Data oriented design and c++
Data oriented design and c++Data oriented design and c++
Data oriented design and c++
 
Entenda de onde vem toda a potência do Intel® Xeon Phi™
Entenda de onde vem toda a potência do Intel® Xeon Phi™ Entenda de onde vem toda a potência do Intel® Xeon Phi™
Entenda de onde vem toda a potência do Intel® Xeon Phi™
 
Sge
SgeSge
Sge
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
 

Más de Suite Solutions

DITA Quick Start for Authors Part II
DITA Quick Start for Authors Part IIDITA Quick Start for Authors Part II
DITA Quick Start for Authors Part II
Suite Solutions
 
Dita ot pipeline webinar
Dita ot pipeline webinarDita ot pipeline webinar
Dita ot pipeline webinar
Suite Solutions
 
CustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputsCustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputs
Suite Solutions
 
StrategiesForUsingMetadata
StrategiesForUsingMetadataStrategiesForUsingMetadata
StrategiesForUsingMetadata
Suite Solutions
 

Más de Suite Solutions (20)

SuiteHelp 4.0: Latest Features in Enterprise Webhelp
SuiteHelp 4.0: Latest Features in Enterprise WebhelpSuiteHelp 4.0: Latest Features in Enterprise Webhelp
SuiteHelp 4.0: Latest Features in Enterprise Webhelp
 
Moving your Organization up the Knowledge Value Chain (Proposal for Lavacon 2...
Moving your Organization up the Knowledge Value Chain (Proposal for Lavacon 2...Moving your Organization up the Knowledge Value Chain (Proposal for Lavacon 2...
Moving your Organization up the Knowledge Value Chain (Proposal for Lavacon 2...
 
Increasing Findability with Subject Schemes (Advanced DITA Webinar)
Increasing Findability with Subject Schemes (Advanced DITA Webinar)Increasing Findability with Subject Schemes (Advanced DITA Webinar)
Increasing Findability with Subject Schemes (Advanced DITA Webinar)
 
SuiteHelp 3.2.5 Latest Features
SuiteHelp 3.2.5 Latest FeaturesSuiteHelp 3.2.5 Latest Features
SuiteHelp 3.2.5 Latest Features
 
Using Taxonomy for Customer-centric Dynamic Publishing
Using Taxonomy for Customer-centric Dynamic PublishingUsing Taxonomy for Customer-centric Dynamic Publishing
Using Taxonomy for Customer-centric Dynamic Publishing
 
DITA Quick Start Webinar: Defining Your Style Sheet Requirements
DITA Quick Start Webinar: Defining Your Style Sheet RequirementsDITA Quick Start Webinar: Defining Your Style Sheet Requirements
DITA Quick Start Webinar: Defining Your Style Sheet Requirements
 
DITA Quick Start Webinar Series: Building a Project Plan
DITA Quick Start Webinar Series: Building a Project PlanDITA Quick Start Webinar Series: Building a Project Plan
DITA Quick Start Webinar Series: Building a Project Plan
 
DITA Quick Start Webinar Series: Building a Project Plan
DITA Quick Start Webinar Series: Building a Project PlanDITA Quick Start Webinar Series: Building a Project Plan
DITA Quick Start Webinar Series: Building a Project Plan
 
DITA Quick Start: System Architecture of a Basic DITA Toolset
DITA Quick Start: System Architecture of a Basic DITA ToolsetDITA Quick Start: System Architecture of a Basic DITA Toolset
DITA Quick Start: System Architecture of a Basic DITA Toolset
 
DITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
DITA Quick Start Webinar Series: Getting Started with the DITA Open ToolkitDITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
DITA Quick Start Webinar Series: Getting Started with the DITA Open Toolkit
 
DITA Quick Start Webinar Series: Getting Started with Information Architecture
DITA Quick Start Webinar Series: Getting Started with Information ArchitectureDITA Quick Start Webinar Series: Getting Started with Information Architecture
DITA Quick Start Webinar Series: Getting Started with Information Architecture
 
Introduction to S1000D
Introduction to S1000DIntroduction to S1000D
Introduction to S1000D
 
DITA Quick Start for Authors Part II
DITA Quick Start for Authors Part IIDITA Quick Start for Authors Part II
DITA Quick Start for Authors Part II
 
DITA Quick Start for Authors - Part I
DITA Quick Start for Authors - Part IDITA Quick Start for Authors - Part I
DITA Quick Start for Authors - Part I
 
Suite Labs: Generating SuiteHelp Output
Suite Labs: Generating SuiteHelp OutputSuite Labs: Generating SuiteHelp Output
Suite Labs: Generating SuiteHelp Output
 
Overview of SuiteHelp 3.1 for DITA
Overview of SuiteHelp 3.1 for DITAOverview of SuiteHelp 3.1 for DITA
Overview of SuiteHelp 3.1 for DITA
 
Svg and graphics
Svg and graphicsSvg and graphics
Svg and graphics
 
Dita ot pipeline webinar
Dita ot pipeline webinarDita ot pipeline webinar
Dita ot pipeline webinar
 
CustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputsCustomizingStyleSheetsForHTMLOutputs
CustomizingStyleSheetsForHTMLOutputs
 
StrategiesForUsingMetadata
StrategiesForUsingMetadataStrategiesForUsingMetadata
StrategiesForUsingMetadata
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Ot performance webinar

  • 1. Speeding Up Your DITA-OT Processing Aryeh Sanders, Suite Solutions
  • 2. Who Are We? Our Mission To increase our customers’ profitability by significantly improving the efficiency of their information development and delivery processes. Qualitative Advantage Content Lifecycle Implementation (CLI) is Suite Solutions’ comprehensive approach – from concept to publication – to maximizing the value of your information assets. Our professionals are with you at every phase, determining, recommending and implementing the most cost-effective, flexible and long term solution for your business.
  • 3. Clients and Partners 3 Private and Confidential Suite Solutions©2009
  • 4. Introduction Performance in the DITA-OT “No Silver Bullet” Design of the DITA-OT puts limits on performance without a redesign Some of which is underway Performance relative to what? Try to examine needs to figure out which performance issues should be tackled and which can be ignored No hard and fast rules Performance can be assessed only with your data, in your environment Measurement
  • 5. Overview Overview of the webinar Performance Pain Points in the DITA-OT Hardware and Software Changes for Performance Memory Settings for Java Stylesheet Performance and Code Changes
  • 6. Performance Issues With the DITA-OT The DITA-OT sacrifices speed for simplicity Constructed as a pipeline of transformations, each step of which does one thing Each step must at least reparse the DITA files Each read of a DITA file with DOCTYPE reparsed the DTDs Now it doesn’t – Eliot Kimber added a patch to cache the DTDs Best takeaway from this talk – upgrade to a version with this patch - 1.5.1 XSLT High level language, far removed from the practicalities of performance Often, the easiest way to do something is XSLT involves repeated searches through the document
  • 7. Importance of Measurement A Case Study Since the DITA-OT writes many files repeatedly, we have to wait for the hard disk to complete the write, even to temporary files where long term integrity isn’t that important. This certainly holds up processing, right? Test: Stop those writes ImBench – ramdisk tool Create a temporary disk in memory and use that as the temp directory Now, no writes have to wait for the disk Run the OT 20 times with the same data I used a slightly complicated map (98 pages on output) 41.1 seconds average with disk vs. 39.1 seconds in memory For most people, not worth it; on the other hand, saves 5% of the time
  • 8. Hardware Issues Anecdotal: I’ve run the same data and stylesheets on my laptop, and on a client’s server 10 minutes on the server vs. 1.5 minutes on the laptop And it’s not a new laptop Since the DITA-OT is doing a lot of processing, it’s worth using a machine that’s capable of reasonable performance Measure! But a modern low-end $250 Dell desktop is about as fast as my laptop Don’t throw it on an old computer and then make people wait Make sure there’s one core free to run the OT so it doesn’t have to compete with other processes
  • 9. Hardware Issues (2) Make sure there’s enough memory Very workload dependent For very large workloads (roughly > 600 pages, or > 1000 topics), consider a 64-bit machine with a 64-bit JVM Eliot Kimber is working on a patch to pass the right memory parameters to the OT – if this is an issue, check the developer mailing list or contact him If there’s not enough physicalmemory, you can get thrashing JVM memory on next slide
  • 10. Memory Once you have enough, it won’t help to have extra Slightly surprising to me, but I tested at least one data set -Xmx tells Java the maximum heap size The reason this is slightly surprising is that before Java gives up, it will try garbage collection Frequent garbage collection can be slow Possibly the OT doesn’t tend to release memory Some datasets run out of memory, then the standard advice is to set reloadstylesheets=“true” Slows down processing, since stylesheets are re-read Much better to figure out how to give the OT enough memory if possible One customer solved their memory issues with JRockit as JVM
  • 11.
  • 12. Profiling Good idea, many commercial tools Oxygen, StylusStudio, fancier editions of Visual Studio Essentially another example of measurement to find the real pain points Not always necessary if the pain points are evident
  • 13. XSLT Performance (2) XPath tends to have one line requests, but that one line can hide a lot of computation What needs to happen to process this?preceding-sibling::*[following-sibling::*[contains(@class, ‘ topic/ul ‘)]] Preceding-sibling has to check each previous sibling For each one, following-sibling has to check every following-sibling And contains() itself can’t be that efficient because it needs to hunt within @class for ‘ topic/ul ‘ Some numbers: Let’s look at 100 nodes, and let’s pretend that there is no topic/ul, so the test never succeeds. Let’s run this test on all 100 nodes in sequence We could do the math, but it’s easier to write a program
  • 14. XSLT Performance Example (Calculated in Perl, sorry)
        for $a (1..100) {            # for each of our 100 nodes
          for $b (1..$a-1) {         # look at the preceding siblings
            for $c ($b+1..100) {     # look at the following siblings of each of those
              $contains++;           # and call contains()
            }
          }
        }
        print $contains, "\n";
    Running this tells us there are 328,350 (!) calls to contains(). Of course, with 10 nodes, there are only 285 calls, but the point remains: one line in XSLT might be doing a LOT of computation.
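For readers who would rather see the math than the program, the count has a closed form. This is a sketch under the same assumptions as the slide: n sibling nodes, numbered 1 to n, and a test that never succeeds. A preceding sibling at position b is visited once for each of the n − b context nodes after it, and each visit scans its n − b following siblings, so:

```latex
\[
  \mathrm{calls}(n) \;=\; \sum_{b=1}^{n-1} (n-b)^2
  \;=\; \sum_{k=1}^{n-1} k^2
  \;=\; \frac{(n-1)\,n\,(2n-1)}{6}
\]
\[
  \mathrm{calls}(100) = \frac{99 \cdot 100 \cdot 199}{6} = 328350,
  \qquad
  \mathrm{calls}(10) = \frac{9 \cdot 10 \cdot 19}{6} = 285
\]
```

The count grows roughly as n³/3, so doubling the number of siblings multiplies the number of contains() calls by about eight.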
  • 15. Tips From Mike Kay Eight tips for how to write efficient XSLT: Avoid repeated use of "//item". Don't evaluate the same node-set more than once; save it in a variable. Avoid <xsl:number> if you can. For example, by using position(). Use <xsl:key>, for example to solve grouping problems. Avoid complex patterns in template rules. Instead, use <xsl:choose> within the rule. Be careful when using the preceding[-sibling] or following[-sibling] axes. This often indicates an algorithm with n-squared performance. Don't sort the same node-set more than once. If necessary, save it as a result tree fragment and access it using the node-set() extension function. To output the text value of a simple #PCDATA element, use <xsl:value-of> in preference to <xsl:apply-templates>.
  • 16. Commentary On Those Tips. Use <xsl:number> when appropriate; I’m pretty sure that the cases where his comment applies aren’t found that often in the OT. By all means, use xsl:key! This is probably where to find low-hanging fruit in speeding up the built-in stylesheets. We can’t realistically avoid complex patterns in template rules, but it’s worth considering why he gave that advice. Every <xsl:apply-templates/> runs through each child node. For each child node, it has to run the test in the match in every one of the <xsl:template>s. Each match test takes some amount of processing, and it runs for every node, so we’d like to minimize that. If you can move processing to an xsl:choose or a moded template, then you only need to run those tests on a smaller subset of nodes.
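The trade-off in the last point can be sketched as follows. This is a minimal illustration, not code from the toolkit: XSLT 1.0 is assumed, and the output element names (warning-box, tip-box, note-box) are hypothetical. Instead of three templates whose match patterns each repeat the class test plus a @type predicate, one template matches the note and branches inside:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- One simple pattern: the class test appears in one template rule, not three. -->
  <xsl:template match="*[contains(@class, ' topic/note ')]">
    <!-- The @type tests run only for nodes already known to be notes. -->
    <xsl:choose>
      <xsl:when test="@type = 'warning'">
        <warning-box><xsl:apply-templates/></warning-box>
      </xsl:when>
      <xsl:when test="@type = 'tip'">
        <tip-box><xsl:apply-templates/></tip-box>
      </xsl:when>
      <xsl:otherwise>
        <!-- Hypothetical fallback for every other note type. -->
        <note-box><xsl:apply-templates/></note-box>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Whether this is measurably faster depends on the XSLT processor and the data, so it is worth timing before and after.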
  • 17. What is an XSLT Key? Somewhere on the top level of the stylesheet, you can use something like: <xsl:key name="mapTopics" match="//opentopic:map//*" use="@id" /> Then, later in your stylesheets, you can look up items with that key: select="key('mapTopics', $id)…" This lets you do the search once, instead of searching through opentopic:map elements many times. Note that this is part of the code that had a 40% speedup in generating the TOC in a large book (mentioned on the next slide), even though <xsl:key name="mapTopics" match="/*/opentopic:map//*" use="@id" /> would have been much more efficient.
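Put together, a key declaration and lookup might look like the sketch below. Several things here are assumptions rather than toolkit code: the opentopic namespace URI should be checked against your own stylesheets, the merged file is assumed to hold the topics as children of its root element, the map entries are assumed to carry @id values matching the topic ids, and the toc and entry output elements and the toc-entry mode are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:opentopic="http://www.idiominc.com/opentopic"
    exclude-result-prefixes="opentopic">

  <!-- Build the index once: every element under the merged map, keyed by @id. -->
  <xsl:key name="mapTopics" match="/*/opentopic:map//*" use="@id"/>

  <!-- Assumption: top-level topics are children of the root element. -->
  <xsl:template match="/">
    <toc>
      <xsl:apply-templates select="/*/*[contains(@class, ' topic/topic ')]"
                           mode="toc-entry"/>
    </toc>
  </xsl:template>

  <xsl:template match="*" mode="toc-entry">
    <!-- Indexed lookup. The slow equivalent would be
         /*/opentopic:map//*[@id = current()/@id],
         which rescans the map for every topic. -->
    <xsl:variable name="mapEntry" select="key('mapTopics', @id)[1]"/>
    <entry id="{@id}" class="{$mapEntry/@class}"/>
    <!-- Nested topics are handled the same way. -->
    <xsl:apply-templates select="*[contains(@class, ' topic/topic ')]"
                         mode="toc-entry"/>
  </xsl:template>

</xsl:stylesheet>
```

The more specific match pattern limits which nodes the processor has to consider when it builds the index, which is the point of the slide's closing remark.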
  • 18. More On Slow XSLT. Consider what’s inside a loop. Example: suppose you have a template, and the template defines a variable: <xsl:variable name="topicrefs" select="//*[contains(@class, ' map/topicref ')]"/> (This isn’t a good idea to start with because of //.) This variable will have the same value every time, so why not construct it only once? Move it out of the template and make it a global variable. One customer sped up TOC generation by around 40% on a huge book this way.
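A minimal sketch of that move, assuming XSLT 1.0. The topic-summary output element and its refs attribute are hypothetical and exist only to show the variable being reused:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level (global) variable: the // scan is not repeated
       each time the template below is instantiated. -->
  <xsl:variable name="topicrefs"
                select="//*[contains(@class, ' map/topicref ')]"/>

  <xsl:template match="*[contains(@class, ' topic/topic ')]">
    <xsl:variable name="id" select="@id"/>
    <!-- Reuses $topicrefs instead of re-running the search on every call. -->
    <topic-summary id="{$id}" refs="{count($topicrefs[@id = $id])}"/>
    <!-- Recurse into nested topics. -->
    <xsl:apply-templates select="*[contains(@class, ' topic/topic ')]"/>
  </xsl:template>

  <!-- Suppress stray text so only the hypothetical summary elements are output. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

The predicate [@id = $id] still walks the saved node-set on each call; where that lookup dominates, an xsl:key on @id is the usual next step.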
  • 19. PDF Stylesheet Development Tips. Not a general performance issue, but a timesaver for stylesheet developers who, like us, need to repeatedly tweak a stylesheet and test the tweak when each test is slow. First, try directly editing the topic.fo file and viewing the result before you change the stylesheet, so you won’t have to run the OT at all. Second, you can configure the toolkit to have another Ant “target”: simply run your DITA once, and after that, let the toolkit start the PDF stylesheets from the files in the temp directory, skipping the earlier processing. Contact us for more information; we don’t have a nicely packaged version of this yet, but we can give you the pieces.
  • 20. Questions? Any questions? Be in touch! Aryeh Sanders, aryehs@suite-sol.com