The mature XSLT developer has an inner seeing about how a stylesheet works that can seem almost mystical to an outsider. But demystification is possible using an XSLT visualizer, which makes the structure of a transformation visible. Because of its functional nature, XSLT is particularly well-suited to software visualization: an XSLT transformation can be represented and viewed as a static dataset. A subset of XSLT visualization (using a “trace-enabled” stylesheet to generate representations of transformation relationships) was used in a client project to empower non-programming staff to predict, understand, and manipulate content enrichment rules. We would like to generalize these case-specific techniques into a general-purpose tool for XSLT. The challenges include scalability (memory usage), deciding what to visualize and what to leave out, avoiding noise for the user, and choosing whether to store annotations externally or within the result document.
The Mystical Principles of XSLT: Enlightenment through Software Visualization
1. Evan Lenz, President
August 2, 2016
2. Grokking XSLT
Easy to learn just enough to be dangerous
− Lots of terrible code in the wild
Understanding template rules is elusive
− Not what procedural programmers expect
− Lots of explanations that are partial, imprecise, or ambiguous
XSLT really isn’t that difficult a language
− But only if you grok template rules
A lot that’s invisible
5. By analogy to OO
From XSLT 1.0 Pocket Reference:
− http://lenzconsulting.com/how-xslt-works/#applying_template_rules
• “In object-oriented (OO) terms, xsl:apply-templates is like a function that
iterates over a list of objects (nodes) and, for each object, calls the same
polymorphic function. Each template rule in your stylesheet defines a different
implementation of that single polymorphic function. Which implementation is
chosen depends on the runtime characteristics of the object (node). Loosely
speaking, you define all the potential bindings by associating a “type” (pattern)
with each implementation (template rule).”
− http://lenzconsulting.com/how-xslt-works/#modes
• “Furthering the OO analogy, a mode name identifies which polymorphic
function to execute repeatedly in a call to xsl:apply-templates. When
mode="foo" is set, foo acts as the name of a polymorphic function, and each
template rule with mode="foo" defines an implementation of the foo ‘function.’”
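The OO analogy quoted above can be sketched in Python. This is only an illustration: the names Node, RULES, template, and apply_templates are made up here, and picking the first registered match is a simplification (real XSLT resolves conflicts by import precedence and priority, and falls back to built-in rules).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)


# RULES[mode] plays the role of one polymorphic function; each entry pairs a
# "type" (a pattern, here a predicate) with an implementation (a template rule).
RULES = {}


def template(mode, pattern):
    """Register an implementation of the 'function' named by mode."""
    def register(impl):
        RULES.setdefault(mode, []).append((pattern, impl))
        return impl
    return register


def is_animal(kind):
    return lambda n: n.name == "animal" and n.attrs.get("type") == kind


@template("describe", is_animal("cat"))
def describe_cat(node):
    return "a cat"


@template("describe", is_animal("dog"))
def describe_dog(node):
    return "a dog"


@template("sound", is_animal("cat"))
def cat_sound(node):
    return "meow"


@template("sound", is_animal("dog"))
def dog_sound(node):
    return "woof"


def apply_templates(nodes, mode):
    """Iterate over the nodes and call the same polymorphic function on each."""
    results = []
    for node in nodes:
        for pattern, impl in RULES.get(mode, []):
            if pattern(node):  # implementation chosen by runtime characteristics
                results.append(impl(node))
                break
    return results


animals = [Node("animal", {"type": "cat"}), Node("animal", {"type": "dog"})]
print(apply_templates(animals, "describe"))
print(apply_templates(animals, "sound"))
```

The same node list yields different results depending only on the mode, which is the sense in which a mode names the polymorphic function.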
6. By analogy to other instructions
This:
<xsl:apply-templates select="animal"/>
…
<xsl:template match="animal[@type eq 'cat']">…</xsl:template>
<xsl:template match="animal[@type eq 'dog']">…</xsl:template>
is (more or less) equivalent to this:
<xsl:for-each select="animal">
<xsl:choose>
<xsl:when test="@type eq 'cat'">…</xsl:when>
<xsl:when test="@type eq 'dog'">…</xsl:when>
</xsl:choose>
</xsl:for-each>
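One reason the equivalence is only "more or less" can be sketched in Python. The data shape and function names below are hypothetical. The sketch assumes an animal element that contains only text, so that XSLT's built-in rules would copy that text through when no template rule matches, whereas an xsl:choose without xsl:otherwise produces nothing.

```python
def via_for_each(animals):
    """Mimics <xsl:for-each> wrapping <xsl:choose> with two <xsl:when> branches."""
    out = []
    for animal in animals:
        if animal["type"] == "cat":
            out.append("cat rule")
        elif animal["type"] == "dog":
            out.append("dog rule")
        # No <xsl:otherwise>: any other animal contributes nothing.
    return out


def via_apply_templates(animals):
    """Mimics <xsl:apply-templates> with two template rules."""
    rules = [
        (lambda a: a["type"] == "cat", lambda a: "cat rule"),
        (lambda a: a["type"] == "dog", lambda a: "dog rule"),
    ]
    out = []
    for animal in animals:
        for matches, body in rules:
            if matches(animal):
                out.append(body(animal))
                break
        else:
            # Stand-in for the built-in rules: for an element holding only
            # text, the net effect is that the text is copied to the result.
            out.append(animal["text"])
    return out


animals = [
    {"type": "cat", "text": "Tom"},
    {"type": "dog", "text": "Rex"},
    {"type": "bird", "text": "Tweety"},
]
print(via_for_each(animals))
print(via_apply_templates(animals))
```

For cats and dogs the two forms agree; they differ only for the node that neither rule anticipated.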
7. De-mystifying XSLT wizardry
Mysticism:
− “a theory postulating the possibility of direct and intuitive
acquisition of ineffable knowledge or power”
In relation to XSLT:
− Belief in the possibility that there is a way to make XSLT’s
behavior more directly and intuitively understandable
• By making implicit behavior explicit and the abstract concrete, through an
interactive visualization
8. Elusiveness of the task
Alternating between faith and doubt that what I’m
imagining is possible or useful
Grand schemes that collapse into themselves
Making abstract ideas concrete
− Shapes in my mind converted to shapes on the screen
Disillusionment
− Finding out that certain ideas weren’t well-formed
− The crucible of incarnation
The feeling that there’s nothing really “there”
9. Like a debugger?
Debugging is like the scientific method
− The debugger is the instrument of measurement
− Start with a hypothesis
− Set:
• breakpoints
• watch variables
− One step at a time, careful not to “step over” when we want to
“step into”—otherwise we’ve missed our chance and have to start
all over (because we can’t go backwards in time)
10. Not like a debugger
Software visualization for XSLT:
− A holistic experience, rather than a narrow line of inquiry
− A space in which to freely explore, in any direction
− Not bound by our previous steps
− No need to start with a question
− Just start playing around and see what we will see
11. Not like a debugger
Sri Aurobindo on gnostic knowledge:
− “For while the reason [like a debugger] proceeds from moment to
moment of time and loses and acquires and again loses and
again acquires, the gnosis [or visualization tool] dominates time
in a one view and perpetual power and links past, present and
future in their indivisible connections, in a single continuous map
of knowledge, side by side.” (The Synthesis of Yoga, p. 464)
12. Escaping the bounds of time
A transformation is fundamentally:
− a data set
− not a process
XSLT’s functional nature
− More like an abstract sculpture, showing relationships
− Less like a movie
Time is just one way to traverse the relationships:
− we can go forward and backward in time (result document order)
− we can step outside of time
− we can traverse the relationships using a different dimension
14. Transformation as microcosm
“The Big Bang”
− the initial context node
“The Engine of Creation”
− template rules
“Emergence/Manifestation”
− the result tree
15. Scope and method
Scope:
− Template rules (and maybe xsl:for-each)
• XSLT’s core processing model
− XPath, though foundational, is out of scope
Method:
− Explode a transformation into all its parts
− Put them back together
16. Fundamental unit: the focus
As defined in the XSLT Recommendation, consists of:
− the context item (.),
− the context position (position()), and
− the context size (last()).
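As a rough model, the three parts of the focus can be written as a small Python record. The Focus class and the foci helper are illustrative names, not part of any XSLT API.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Sequence


@dataclass(frozen=True)
class Focus:
    context_item: Any      # what "." refers to
    context_position: int  # what position() returns (1-based)
    context_size: int      # what last() returns


def foci(selected: Sequence[Any]) -> Iterator[Focus]:
    """Yield one focus per item, as when a list of nodes is processed in turn."""
    size = len(selected)
    for position, item in enumerate(selected, start=1):
        yield Focus(item, position, size)


for focus in foci(["heading", "para", "para"]):
    print(focus)
```

Every item in the list being processed gets its own focus; only the position differs between them, while the size is shared.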
17. An expanded definition of focus
We will define “focus” as:
− the particular instance of a focus
− i.e. each instantiation of a focus-changed sequence constructor
• body of <xsl:template>, body of <xsl:for-each>, etc.
Then we can consider the shallow result chunk implied by
a focus to be an intrinsic property of that focus
− 1:1 relationship
18. For example
Given a single instantiation of this rule:
<xsl:template match="heading">
<title>
<xsl:value-of select="."/>
</title>
</xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151"
context-position="1"
context-size="1"
rule-id="nfa49863d62353482"
invocation-id="bc08a736-13f7-e97b-a272">
<title>This is the title</title>
</trace:focus>
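A record like the one above could be assembled with Python's standard ElementTree module. This is a sketch only: the namespace URI bound to the trace prefix is not given in the slides, so "urn:example:trace" is a placeholder, and make_focus is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TRACE_NS = "urn:example:trace"  # placeholder; the real namespace URI is not shown
ET.register_namespace("trace", TRACE_NS)


def make_focus(context_id, position, size, rule_id, invocation_id, chunk):
    """Wrap a shallow result chunk in a trace:focus element."""
    focus = ET.Element(
        f"{{{TRACE_NS}}}focus",
        {
            "context-id": context_id,
            "context-position": str(position),
            "context-size": str(size),
            "rule-id": rule_id,
            "invocation-id": invocation_id,
        },
    )
    focus.append(chunk)
    return focus


title = ET.Element("title")
title.text = "This is the title"
focus = make_focus(
    "n1a833c7b3a005151", 1, 1,
    "nfa49863d62353482", "bc08a736-13f7-e97b-a272",
    title,
)
print(ET.tostring(focus, encoding="unicode"))
```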
19. For example: the parts of a focus
invocation-id
− Cosmic address (GUID) for the current invocation of <xsl:apply-templates/>
context-size
− How many nodes are in the list being processed by that invocation
− In other words: how many foci belong to the current invocation
context-position
− The position of the current focus in that list
context-id
− Cosmic address (generated ID) of the node being processed (a <heading> element)
rule-id
− Cosmic address (generated ID) of the matching template rule in the stylesheet
<title>This is the title</title>
− The shallow result chunk that this focus creates
27. Why it’s called a “shallow” result chunk
It doesn’t include the results of nested invocations:
<trace:focus context-id="n292122aca28a94ca"
             context-position="1"
             context-size="1"
             rule-id="n8ef8c66cac078ff"
             invocation-id="4dd7580b-d44e-a0e4-714d">
  <html>
    <head>
      <trace:invocation id="bc08a736-13f7-e97b-a272"/>
      <!--invocation of: <xsl:apply-templates select="/doc/heading"/>-->
    </head>
    <body>
      <trace:invocation id="6129c19f-2632-e99c-53b5"/>
      <!--invocation of: <xsl:apply-templates select="/doc/para"/>-->
    </body>
  </html>
</trace:focus>
Links to zero or
more other foci
having the given
invocation ID
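Putting the parts back together could look roughly like the following Python sketch: each trace:invocation placeholder is replaced by the chunks of the foci that share its invocation ID, in context-position order. Several things here are assumptions: the namespace URI, the trace:data wrapper element, and the two para foci (the slides show only the root and heading foci). Text nodes between elements and non-element chunks are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TRACE_NS = "urn:example:trace"  # placeholder namespace URI
FOCUS = f"{{{TRACE_NS}}}focus"
INVOCATION = f"{{{TRACE_NS}}}invocation"

# Flat list of foci; the wrapper and the two <p> foci are invented for the demo.
TRACE_DATA = f"""<trace:data xmlns:trace="{TRACE_NS}">
<trace:focus context-id="n292122aca28a94ca" context-position="1" context-size="1" rule-id="n8ef8c66cac078ff" invocation-id="4dd7580b-d44e-a0e4-714d"><html><head><trace:invocation id="bc08a736-13f7-e97b-a272"/></head><body><trace:invocation id="6129c19f-2632-e99c-53b5"/></body></html></trace:focus>
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"><title>This is the title</title></trace:focus>
<trace:focus context-id="para-1" context-position="1" context-size="2" rule-id="para-rule" invocation-id="6129c19f-2632-e99c-53b5"><p>One</p></trace:focus>
<trace:focus context-id="para-2" context-position="2" context-size="2" rule-id="para-rule" invocation-id="6129c19f-2632-e99c-53b5"><p>Two</p></trace:focus>
</trace:data>"""


def index_by_invocation(data):
    """Group foci by invocation ID, ordered by context position."""
    groups = defaultdict(list)
    for focus in data.iter(FOCUS):
        groups[focus.get("invocation-id")].append(focus)
    for group in groups.values():
        group.sort(key=lambda f: int(f.get("context-position")))
    return groups


def expand(element, groups):
    """Copy element, resolving each placeholder into the chunks it links to."""
    copy = ET.Element(element.tag, dict(element.attrib))
    copy.text = element.text
    for child in element:
        if child.tag == INVOCATION:
            for focus in groups.get(child.get("id"), []):  # zero or more foci
                for chunk in focus:
                    copy.append(expand(chunk, groups))
        else:
            copy.append(expand(child, groups))
    return copy


def rebuild(data, root_invocation_id):
    groups = index_by_invocation(data)
    root_focus = groups[root_invocation_id][0]
    return expand(root_focus[0], groups)


result = rebuild(ET.fromstring(TRACE_DATA), "4dd7580b-d44e-a0e4-714d")
print(ET.tostring(result, encoding="unicode"))
```

Because the foci are stored flat and linked only by ID, the same index could be walked in any order, not just result document order.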
30. Mystical principle: To see is to create
Inner seeing results in outer manifestation
− They are not separate; they are two sides of the same coin.
To focus on the input is to create the output
− (as defined by the template rule)
They are inextricably linked, 1-to-1
− As defined here, the focus and the result chunk are intrinsic to
each other.
31. Mystical principle: All is accessible
Each focus is connected to every other focus via
successive invocation-id links
Every entity (node, rule, or focus) has a “cosmic address”
(GUID or generated node ID)
− Unique, globally accessible address within the world of the
transformation
You can begin anywhere in the world and traverse to
anywhere else in the world, including past, present, future
Nothing is lost through the passage of time
32. Mystical principle: Inner upholds outer
The result tree is not only isomorphic to the tree of foci
(the XSLT execution tree)
It is an adornment of that tree
− The result is the part we see
− The inner structure upholds the outer result
The visible world is a decoration of the invisible
− a manifestation of a deeper, unseen reality
33. Mystical principle: There is no separation
Distinctions are only by definition and thus arbitrary
A focus and result chunk are “intrinsic” to each other only
because we are looking at them that way
A focus is “extrinsic” to another focus only because we
defined it that way
Ultimately, there is only an unbroken whole
But:
− Division and distinction are functions we possess
− There is joy in the division and joy in the reunion
− That’s why we do it
35. Bringing this down to earth: a case study
Client project to rewrite an enrichment engine
Technical articles are automatically enriched with search
terms based on their content
− (using a taxonomy of terms and reverse queries in MarkLogic)
There are various business rules about what types of
article should be enriched with what types of terms
The rules need to evolve over time
Most importantly, the rule behavior needs to be
transparent to the analysts and taxonomy managers
36. XSLT pipeline for enrichment rules
For the rewritten engine’s enrichment rules, I decided to
use XSLT as the rules language (surprise, surprise!)
− declarative and powerful
− simplified by using a multi-stage pipeline
− good in conjunction with the modified identity transform
The article passes through unchanged, except that new
metadata (“enrichment”) is inserted into each section
Example stages:
− get-terms
− add-inherited
− remove-subsumed
− remove-excluded
− add-more-terms
− filter-terms
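The shape of such a pipeline can be sketched in Python. Only two of the stage names above are used, and their bodies are invented stand-ins (a vocabulary lookup and an exclusion list), not the engine's actual logic. The point is the structure: each stage passes the article through unchanged except for the terms attached to each section.

```python
from functools import reduce

VOCAB = {"xslt", "xpath", "marklogic"}  # hypothetical taxonomy
EXCLUDED = {"xpath"}                    # hypothetical exclusion list


def map_terms(article, update):
    """Modified identity: copy everything, change only each section's terms."""
    return {
        **article,
        "sections": [{**s, "terms": update(s)} for s in article["sections"]],
    }


def get_terms(article):
    # Stand-in for the real term lookup.
    return map_terms(
        article,
        lambda s: s["terms"] | (VOCAB & set(s["text"].lower().split())),
    )


def remove_excluded(article):
    return map_terms(article, lambda s: s["terms"] - EXCLUDED)


STAGES = [get_terms, remove_excluded]


def run_pipeline(article, stages=STAGES):
    """Feed the output of each stage into the next."""
    return reduce(lambda doc, stage: stage(doc), stages, article)


article = {
    "title": "Tracing",
    "sections": [{"text": "XSLT and XPath in MarkLogic", "terms": set()}],
}
print(run_pipeline(article))
```

Keeping each stage small is what makes the intermediate results easy to inspect one stage at a time.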
37. Distinction between “rules” and “engine”
Entire enrichment engine consists of XSLT template rules
− (and imported XQuery libraries)
But not all template rules are created equal
Some modes are “engine-level”
− not meant to be regularly customized
Some modes are “rules-level”
− direct implementations of the business rules
− meant to be customized to handle different content scenarios
The latter are the ones that need to be made transparent
38. Making the business rules transparent
Modes are grouped by pipeline stage
Each mode has a default behavior
− e.g. mode="terms" by default uses a reverse query to find all
matching terms
− can be overridden to provide a different behavior
• (such as to fix the term for a particular article type, regardless of the content)
The default rules don’t need to be shown
− Only the custom overrides
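A minimal Python sketch of "default behavior plus overrides" follows. The article types, the vocabulary, and the fixed term are all hypothetical, and the default is a stand-in for the reverse query. Each override carries a one-line description so that only the custom rules need to be listed for the analysts.

```python
VOCAB = {"xslt", "xpath", "marklogic"}  # hypothetical taxonomy

# Default behavior per mode (stand-in for the reverse query).
DEFAULTS = {
    "terms": lambda article: sorted(VOCAB & set(article["text"].lower().split())),
}

# Custom overrides per mode: (pattern, rule, description).
OVERRIDES = {
    "terms": [
        (
            lambda a: a["type"] == "glossary",
            lambda a: ["reference"],
            "Glossary articles always get the fixed term 'reference'.",
        ),
    ],
}


def apply_mode(mode, article):
    """Use the first matching override, otherwise the mode's default."""
    for matches, rule, _description in OVERRIDES.get(mode, []):
        if matches(article):
            return rule(article)
    return DEFAULTS[mode](article)


def rules_to_show(mode):
    """Only the custom overrides are surfaced; the default stays hidden."""
    return [description for _m, _r, description in OVERRIDES.get(mode, [])]


print(apply_mode("terms", {"type": "howto", "text": "Using XSLT with MarkLogic"}))
print(apply_mode("terms", {"type": "glossary", "text": "Using XSLT"}))
print(rules_to_show("terms"))
```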
40. Trace-enabling the XSLT
Used XSLT to pre-process (“trace-enable”) the original
engine and rules XSLT
The trace-enabled stylesheet additionally generates:
− inline trace data
• <trace:match-start> and <trace:match-end> markers in the result tree
− out-of-band trace data
• using xdmp:set() in MarkLogic
The “rules-level” modes are annotated in the original
engine code
− so the trace-enabler knows which template rules to augment
− [show example, line 538 of engine.xsl]
− [also show trace data example in oXygen]
41. Other techniques used
Documenting the rules inline, so they get fed straight to
the tracer interface
Storing the trace data for each input document into the
(MarkLogic) database for faster subsequent renders
Automatically invalidating (and thus forcing re-generation
of) the cached trace data whenever a rule or data change
is detected
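The caching idea can be sketched in Python. The slides do not say how a rule or data change is detected, so comparing a content hash is an assumption made here, and an in-memory dictionary stands in for the MarkLogic database. TraceCache and its methods are hypothetical names.

```python
import hashlib


class TraceCache:
    def __init__(self, generate):
        self._generate = generate  # function(document, rules) -> trace data
        self._store = {}           # doc_id -> (fingerprint, trace data)
        self.misses = 0

    @staticmethod
    def fingerprint(document, rules):
        """Hash the inputs that the trace data depends on."""
        digest = hashlib.sha256()
        digest.update(document.encode("utf-8"))
        digest.update(b"\0")
        digest.update(rules.encode("utf-8"))
        return digest.hexdigest()

    def get(self, doc_id, document, rules):
        current = self.fingerprint(document, rules)
        cached = self._store.get(doc_id)
        if cached is not None and cached[0] == current:
            return cached[1]  # fast path: reuse the stored trace data
        # A rule or data change invalidates the entry and forces regeneration.
        self.misses += 1
        trace = self._generate(document, rules)
        self._store[doc_id] = (current, trace)
        return trace


cache = TraceCache(lambda document, rules: f"trace of {document!r} under {rules!r}")
cache.get("doc-1", "<doc/>", "rules v1")
cache.get("doc-1", "<doc/>", "rules v1")  # served from the cache
cache.get("doc-1", "<doc/>", "rules v2")  # rules changed: regenerated
print(cache.misses)
```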
43. Characteristics to preserve
Colored lines to depict relationships
Visually represent XML as XML
Interactive slider for building/un-building the result
44. Difficulties
How to visualize temporary trees?
− i.e. results stored in <xsl:variable>
Modifying template rule results could break the stylesheet
− type errors
− node count dependencies, etc.
45. Solution: Store trace data out-of-band
Via side effects
− Write a document to a database, or
− Update a global variable
Benefits:
− avoid type errors
− avoid unexpected (altered) stylesheet behavior
− shredded trace data may suggest more scalable visualization
implementations
• (e.g. for large source documents)
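The out-of-band approach can be illustrated in Python with a decorator that records trace data as a side effect while returning the rule's result untouched. This is an analogy, not the MarkLogic implementation: TRACE_LOG stands in for the global variable or database, and trace_enabled and heading are hypothetical names.

```python
import functools

TRACE_LOG = []  # out-of-band store, updated purely as a side effect


def trace_enabled(rule_id):
    """Augment a rule so that each instantiation is recorded out-of-band."""
    def decorate(rule):
        @functools.wraps(rule)
        def traced(node, position, size):
            chunk = rule(node, position, size)
            TRACE_LOG.append({
                "rule-id": rule_id,
                "context": node,
                "context-position": position,
                "context-size": size,
                "chunk": chunk,
            })
            return chunk  # the result itself is not modified
        return traced
    return decorate


@trace_enabled("heading-rule")
def heading(node, position, size):
    return f"<title>{node}</title>"


results = [heading("This is the title", 1, 1)]
print(results)
print(TRACE_LOG)
```

Since callers receive exactly what the untraced rule would have returned, code that depends on the type or number of result nodes keeps working.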