SlideShare a Scribd company logo
1 of 40
xproc.xq
Jim Fuller, MarkLogic 2013
Architecture of an xproc processor
http://exslt.org
http://www.xmlprague.cz
http://jim.fuller.name
@xquery
@perl6
Perlmonks
Pilgrim
XSLT UK 2001
Senior engineer
Overview
• Why XProc?
• xproc.xq project
• xproc.xq Architecture
• Points of Interests
• Summary
XProc Overview
Xproc Goals
• The language must be expressed as declarative XML and be rich enough to
address practical interoperability concerns but also beconcise
• The language must allow the inputs, outputs, and other parameters of a
components to be specified with information passed between steps using
XML
• The language must define the basic minimal set of processing options and
associated error reporting options required to achieve interoperability.
• Given a set of components and a set of documents, the language must allow
the order of processing to be specified.
• Agnostic in terms of parallel, serial or streaming processing
• The model should be extensible enough so that applications can define new
processes and make them a component in a pipeline.
• The model could allow iteration and conditional processing which also allow
selection of different components as a function of run-time evaluation.
Make xml pipelines
Xproc Refresher
<p:pipeline version=“1.0” name="main">
<p:count/>
</p:pipeline>
(<root/>,<root/>,<test/>)
<c:result>3</c:result>
Show simple xproc
That’s not quite the whole story …
<p:declare-step version='1.0’ name="main">
<p:input port="source"/>
<p:output port="result”/>
<p:count name=”step1”/>
</p:declare-step>
Really, not at all…
<p:declare-step name="main"
xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
<p:input port="source"/>
<p:output port="result">
<p:pipe step="step1" port="result"/>
</p:output>
<p:count name="step1">
<p:input port="source">
<p:pipe step="main" port="source"/>
</p:input>
</p:count>
</p:declare-step>
Why Xproc ?
• ‘pipes’ are a natural idiom for xml processing
• Flow based versus FSM (draw diagram)
• State is the enemy of dynamic computation
• Complex workflow still possible but YAGNI
• Main control loop
Current news
• http://mesonet.info/
• http://code.google.com/p/daisy-
pipeline/wiki/XProcOverview-
• http://balisage.net/Proceedings/vol8/html/
Williams01/BalisageVol8-Williams01.html
• https://github.com/gimsieke/epubcheck-
xproc
• https://github.com/josteinaj/xprocspec
Current news
W3C XML Processing WG working on Xproc vnext
1. Improve ease of use (syntactic improvements)
2. Increase the scope for working with non XML
content
3. Address known shortcomings in the language
4. Improve relationship with streaming and parallel
processing
Fix params, non xml doc processing, drop
Xpath 1.0, let options+variables contain
fragments, allow AVT
xproc.xq
xproc.xq Project
• Xproc processor built with Xquery 3.0 on MarkLogic
• Github Project - https://github.com/xquery/xproc.xq
• Build/test system
– xray
– run w3c unit test suite
• dist layout
– compact
– extensible
• Xquery entry points
– flags
Architecture
• Parse: consume and parse Xproc pipeline
• Dynamic evaluation: runtime engine
• Serializer: output results
Parsing
Decorated pipeline
demo
Static analysis - unordered
<p:declare-step version='1.0' name="main">
<p:input port="source"/>
<p:output port="result">
<p:pipe step="i1" port="result"/>
</p:output>
<p:identity>
<p:input port="source">
<p:pipe step="main" port="source"/>
</p:input>
</p:identity>
<p:identity name="i3"/>
<p:identity name="i1">
<p:input port="source">
<p:pipe step="i3" port="result"/>
</p:input>
</p:identity>
</p:declare-step>
Static analysis - ordered
<p:declare-step version='1.0' name="main">
<p:input port="source"/>
<p:output port="result">
<p:pipe step="i1" port="result"/>
</p:output>
<p:identity>
<p:input port="source">
<p:pipe step="main" port="source"/>
</p:input>
</p:identity>
<p:identity name="i3"/>
<p:identity name="i1">
<p:input port="source">
<p:pipe step="i3" port="result"/>
</p:input>
</p:identity>
</p:declare-step>
Runtime
Runtime
TIMECHECK
Pipeline decomposition
Pipeline decomposition
Pipeline decomposition
Serializer
File Layout
XRAY testing
points of interest
Xquery 3.0 to the rescue
• Using a Reducer, such as left-fold(), in combination with
dynamic function calls underpin the heart of xproc.xq
dynamic evaluation engine.
• XQuery 3.0 annotations feature is employed to identify in
the codebase step functions, making it straightforward to
author new steps in pure XQuery.
• The choice of the 'flow' work flow model is a perfect match
for a functional programming language which has functions
as first class citizens. All step inputs and outputs are written
once and never mutated thereafter. Changing state 'in-
place' is destructive and can represent a loss of fidelity.
Steps with XSLT
• ‘XSLT's polymorphism and dynamic dispatch
makes static analysis difficult.’ – Mkay 2009
• Spent many years pipelining XSLT
• XProc dependency on XSLT match patterns
combined with the fact that many of the steps
lent themselves to implementation using XSLT
v2.0,
BYOSR
demo
morefun with a fold engine
• Graph out steps
• Journaling/Logging … etc (inject in ML
properties)
• Architectural side effects = powerful
runtime idiom
Extensibility and reuse
• Create new step libs at xproc level
• Easily create custom xproc libs from
xquery
• Use steps in your own xquery programs as
functions
Create new extension step
1. Add src/steps/ext.xqy
2. Add src/extensions/pipeline-extensions.xml
Invoke xproc step in XQuery
xquery version "3.0";
import module namespace std = "http://xproc.net/xproc/std" at
"/xquery/steps/std.xqy";
declare namespace p="http://www.w3.org/ns/xproc";
declare namespace xproc = "http://xproc.net/xproc";
std:count( (<test><a>test</a></test>, <test/>),
(),
<xproc:options>
<p:with-option name="limit" select="1000"/>
</xproc:options>,
())
summary
Review
• XProc has many favorable characteristics
• XProc getting better
• xproc.xq will get better
The Future
• Support other xquery engines (Saxon …)
• Deeper integration, better compliance
• Analyze performance in database
• CXAN integration
References
[apache-lic-v2] [Apache v2 License] Apache v2.0 License - http://www.apache.org/licenses/LICENSE-2.0.html
[xml-calabash] [XML Calabash] Norm Walsh's XProc processor XML Calabash - http://xmlcalabash.com/
[EXIST] eXist XML Database - http://exist.sourceforge.net
[fsm] [Finite State Machine] Finite State Machine (FSM) entry at wikipedia - https://en.wikipedia.org/wiki/Finite- state_machine
[saxon] [SAXON] Michael Kay's XSLT & XQUERY Processor - http://www.saxonica.com [transform-xq] [transform.xq] John Snelson's transform.xq -
https://github.com/jpcs/transform.xq [MarkLogic] MarkLogic - http://www.marklogic.com
[kay2010] [Kay2010] Kay, Michael. “A Streaming XSLT Processor.” Presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6, 2010. In
Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:10.4242/BalisageVol5.Kay01. -
http://www.balisage.net/Proceedings/vol5/html/Kay01/ BalisageVol5-Kay01.html
[xmlprocwg] [XMLPROCWG] XML Processing Working Group - http://www.w3.org/XML/Processing/
[xml-catalog] [XML-CATALOG] XML catalog - http://en.wikipedia.org/wiki/XML_Catalog
[xproc-spec] [XProc] XProc: An XML Pipeline LanguageW3C Recommendation 11 May 2010 - http:// www.w3.org/TR/xproc
[xproc-use-case-note] [XProc Requirements] XProc XML Processing Model Requirements W3C Working Group Note 05 April 2004 - http://www.w3.org/TR/2004/NOTE-
proc-model-req-20040405/
29xproc.xq - Architecture of an XProc processor
[xproc-member-submission-xpl] [XPL] XML Pipeline Language (XPL) Version 1.0 (Draft) W3C Member Submission 11 April 2005 - http://www.w3.org/Submission/xpl/
[xproc-use-cases] [XProc Use Cases] XProc Use Cases - http://www.w3.org/TR/xproc-requirements/
[xprov-vnext-requirements] [XProc vnext] XProc vnext language requirements - http://www.w3.org/XML/XProc/ docs/langreq-v2.html
[known-impl] [XProc Implementations] Known Implementations - http://xproc.org/implementations/
[XPROC-PARALLELISM] [XPROC- PARALLELISM] XProc specification H. Sequential Steps, parallelism, and side-effects - http://www.w3.org/TR/xproc/#parallelism
[XPROC-TEST-SUITE] [XPROC-TEST-SUITE] XProc Test Suite - http://tests.xproc.org/
[xray] [XRAY] - Rob Whitby's XRay - https://github.com/robwhitby/xray
[xproc-system-properties] [XPROC-SYSTEM-PROPERTIES] XProc system properties - http://www.w3.org/TR/ xproc/#f.system-property
[xslt-2] [XSLT 2.0] XSL Transformations (XSLT) Version 2.0. Michael Kay, editor. W3C Recommendation. 23 January 2007. - http://www.w3.org/TR/xslt20/
[xproc.xq] [xproc.xq project] xproc.xq github - https://github.com/xquery/xproc.xq
[zergaoui2009] [Zergaoui2009] Mohamed Zergaoui. Memory management in streaming: Buffering, lookahead, or none. Which to choose? Int Symp on Processing XML
Efficiently. 10 Aug 2009, Montreal, Canada. Balisage Series on Markup Technologies, vol. 4 (2009). doi: 10.4242/BalisageVol4.Zergaoui02. - http://
www.balisage.net/Proceedings/vol4/html/Zergaoui02/BalisageVol4-Zergaoui02.html

More Related Content

Viewers also liked

XML Prague 2005 EXSLT
XML Prague 2005 EXSLTXML Prague 2005 EXSLT
XML Prague 2005 EXSLTjimfuller2009
 
Family Learning Centers @ Sjpl Final
Family Learning Centers @ Sjpl FinalFamily Learning Centers @ Sjpl Final
Family Learning Centers @ Sjpl FinalMTominaga
 
Scary story 13
Scary story 13Scary story 13
Scary story 13ferr2
 
Book club master slides
Book club master slidesBook club master slides
Book club master slidesjseher
 
Book club master slides
Book club master slidesBook club master slides
Book club master slidesjseher
 
Smart Presentation
Smart PresentationSmart Presentation
Smart Presentationjfmahon
 

Viewers also liked (7)

XML Prague 2005 EXSLT
XML Prague 2005 EXSLTXML Prague 2005 EXSLT
XML Prague 2005 EXSLT
 
Family Learning Centers @ Sjpl Final
Family Learning Centers @ Sjpl FinalFamily Learning Centers @ Sjpl Final
Family Learning Centers @ Sjpl Final
 
Scary story 13
Scary story 13Scary story 13
Scary story 13
 
Book club master slides
Book club master slidesBook club master slides
Book club master slides
 
Book club master slides
Book club master slidesBook club master slides
Book club master slides
 
Smart Presentation
Smart PresentationSmart Presentation
Smart Presentation
 
Introduction To Blackboard
Introduction To BlackboardIntroduction To Blackboard
Introduction To Blackboard
 

Similar to XML London 2013 - Architecture of xproc.xq an XProc processor

XML Amsterdam 2012 Keynote
XML Amsterdam 2012 KeynoteXML Amsterdam 2012 Keynote
XML Amsterdam 2012 Keynotejimfuller2009
 
Balisage - EXPath Packaging
Balisage - EXPath PackagingBalisage - EXPath Packaging
Balisage - EXPath PackagingFlorent Georges
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaMax Alexejev
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...C4Media
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Timothy McPhillips
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
XForms workshop slides
XForms workshop slidesXForms workshop slides
XForms workshop slidesewg118
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceAntonio García-Domínguez
 
Collector Web Services
Collector Web ServicesCollector Web Services
Collector Web Servicespublisyst
 
Application Integration with XProc
Application Integration with XProcApplication Integration with XProc
Application Integration with XProcVojtech Toman
 
Balisage - EXPath - A practical introduction
Balisage - EXPath - A practical introductionBalisage - EXPath - A practical introduction
Balisage - EXPath - A practical introductionFlorent Georges
 
EXPath: the packaging system and the webapp framework
EXPath: the packaging system and the webapp frameworkEXPath: the packaging system and the webapp framework
EXPath: the packaging system and the webapp frameworkFlorent Georges
 
20090925 HTML5の過去、現在、未来
20090925 HTML5の過去、現在、未来20090925 HTML5の過去、現在、未来
20090925 HTML5の過去、現在、未来Takeo Kunishima
 
201511 - Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...
201511 -  Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...201511 -  Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...
201511 - Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...Symphony Software Foundation
 
Extending XForms with Server-Side Functionality
Extending XForms with Server-Side FunctionalityExtending XForms with Server-Side Functionality
Extending XForms with Server-Side FunctionalityMarkku Laine
 

Similar to XML London 2013 - Architecture of xproc.xq an XProc processor (20)

XML Pipelines
XML PipelinesXML Pipelines
XML Pipelines
 
XML Amsterdam 2012 Keynote
XML Amsterdam 2012 KeynoteXML Amsterdam 2012 Keynote
XML Amsterdam 2012 Keynote
 
Balisage - EXPath Packaging
Balisage - EXPath PackagingBalisage - EXPath Packaging
Balisage - EXPath Packaging
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and Scala
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Linq To XML Overview
Linq To XML OverviewLinq To XML Overview
Linq To XML Overview
 
XForms workshop slides
XForms workshop slidesXForms workshop slides
XForms workshop slides
 
XML Converters
XML ConvertersXML Converters
XML Converters
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
Collector Web Services
Collector Web ServicesCollector Web Services
Collector Web Services
 
Application Integration with XProc
Application Integration with XProcApplication Integration with XProc
Application Integration with XProc
 
Balisage - EXPath - A practical introduction
Balisage - EXPath - A practical introductionBalisage - EXPath - A practical introduction
Balisage - EXPath - A practical introduction
 
EXPath: the packaging system and the webapp framework
EXPath: the packaging system and the webapp frameworkEXPath: the packaging system and the webapp framework
EXPath: the packaging system and the webapp framework
 
20090925 HTML5の過去、現在、未来
20090925 HTML5の過去、現在、未来20090925 HTML5の過去、現在、未来
20090925 HTML5の過去、現在、未来
 
Java one2013
Java one2013Java one2013
Java one2013
 
201511 - Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...
201511 -  Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...201511 -  Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...
201511 - Alfresco Day - Platform Update and Roadmap - Gabriele Columbro - Bo...
 
Extending XForms with Server-Side Functionality
Extending XForms with Server-Side FunctionalityExtending XForms with Server-Side Functionality
Extending XForms with Server-Side Functionality
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

XML London 2013 - Architecture of xproc.xq an XProc processor

  • 1. xproc.xq Jim Fuller, MarkLogic 2013 Architecture of an xproc processor
  • 3. Overview • Why XProc? • xproc.xq project • xproc.xq Architecture • Points of Interests • Summary
  • 5. Xproc Goals • The language must be expressed as declarative XML and be rich enough to address practical interoperability concerns but also beconcise • The language must allow the inputs, outputs, and other parameters of a components to be specified with information passed between steps using XML • The language must define the basic minimal set of processing options and associated error reporting options required to achieve interoperability. • Given a set of components and a set of documents, the language must allow the order of processing to be specified. • Agnostic in terms of parallel, serial or streaming processing • The model should be extensible enough so that applications can define new processes and make them a component in a pipeline. • The model could allow iteration and conditional processing which also allow selection of different components as a function of run-time evaluation. Make xml pipelines
  • 6. Xproc Refresher <p:pipeline version=“1.0” name="main"> <p:count/> </p:pipeline> (<root/>,<root/>,<test/>) <c:result>3</c:result>
  • 8. That’s not quite the whole story … <p:declare-step version='1.0’ name="main"> <p:input port="source"/> <p:output port="result”/> <p:count name=”step1”/> </p:declare-step>
  • 9. Really, not at all… <p:declare-step name="main" xmlns:p="http://www.w3.org/ns/xproc" version="1.0"> <p:input port="source"/> <p:output port="result"> <p:pipe step="step1" port="result"/> </p:output> <p:count name="step1"> <p:input port="source"> <p:pipe step="main" port="source"/> </p:input> </p:count> </p:declare-step>
  • 10. Why Xproc ? • ‘pipes’ are a natural idiom for xml processing • Flow based versus FSM (draw diagram) • State is the enemy of dynamic computation • Complex workflow still possible but YAGNI • Main control loop
  • 11. Current news • http://mesonet.info/ • http://code.google.com/p/daisy- pipeline/wiki/XProcOverview- • http://balisage.net/Proceedings/vol8/html/ Williams01/BalisageVol8-Williams01.html • https://github.com/gimsieke/epubcheck- xproc • https://github.com/josteinaj/xprocspec
  • 12. Current news W3C XML Processing WG working on Xproc vnext 1. Improve ease of use (syntactic improvements) 2. Increase the scope for working with non XML content 3. Address known shortcomings in the language 4. Improve relationship with streaming and parallel processing Fix params, non xml doc processing, drop Xpath 1.0, let options+variables contain fragments, allow AVT
  • 14. xproc.xq Project • Xproc processor built with Xquery 3.0 on MarkLogic • Github Project - https://github.com/xquery/xproc.xq • Build/test system – xray – run w3c unit test suite • dist layout – compact – extensible • Xquery entry points – flags
  • 15. Architecture • Parse: consume and parse Xproc pipeline • Dynamic evaluation: runtime engine • Serializer: output results
  • 18. Static analysis - unordered <p:declare-step version='1.0' name="main"> <p:input port="source"/> <p:output port="result"> <p:pipe step="i1" port="result"/> </p:output> <p:identity> <p:input port="source"> <p:pipe step="main" port="source"/> </p:input> </p:identity> <p:identity name="i3"/> <p:identity name="i1"> <p:input port="source"> <p:pipe step="i3" port="result"/> </p:input> </p:identity> </p:declare-step>
  • 19. Static analysis - ordered <p:declare-step version='1.0' name="main"> <p:input port="source"/> <p:output port="result"> <p:pipe step="i1" port="result"/> </p:output> <p:identity> <p:input port="source"> <p:pipe step="main" port="source"/> </p:input> </p:identity> <p:identity name="i3"/> <p:identity name="i1"> <p:input port="source"> <p:pipe step="i3" port="result"/> </p:input> </p:identity> </p:declare-step>
  • 30. Xquery 3.0 to the rescue • Using a Reducer, such as left-fold(), in combination with dynamic function calls underpin the heart of xproc.xq dynamic evaluation engine. • XQuery 3.0 annotations feature is employed to identify in the codebase step functions, making it straightforward to author new steps in pure XQuery. • The choice of the 'flow' work flow model is a perfect match for a functional programming language which has functions as first class citizens. All step inputs and outputs are written once and never mutated thereafter. Changing state 'in- place' is destructive and can represent a loss of fidelity.
  • 31. Steps with XSLT • ‘XSLT's polymorphism and dynamic dispatch makes static analysis difficult.’ – Mkay 2009 • Spent many years pipelining XSLT • XProc dependency on XSLT match patterns combined with the fact that many of the steps lent themselves to implementation using XSLT v2.0,
  • 33. morefun with a fold engine • Graph out steps • Journaling/Logging … etc (inject in ML properties) • Architectural side effects = powerful runtime idiom
  • 34. Extensibility and reuse • Create new step libs at xproc level • Easily create custom xproc libs from xquery • Use steps in your own xquery programs as functions
  • 35. Create new extension step 1. Add src/steps/ext.xqy 2. Add src/extensions/pipeline-extensions.xml
  • 36. Invoke xproc step in XQuery xquery version "3.0"; import module namespace std = "http://xproc.net/xproc/std" at "/xquery/steps/std.xqy"; declare namespace p="http://www.w3.org/ns/xproc"; declare namespace xproc = "http://xproc.net/xproc"; std:count( (<test><a>test</a></test>, <test/>), (), <xproc:options> <p:with-option name="limit" select="1000"/> </xproc:options>, ())
  • 38. Review • XProc has many favorable characteristics • XProc getting better • xproc.xq will get better
  • 39. The Future • Support other xquery engines (Saxon …) • Deeper integration, better compliance • Analyze performance in database • CXAN integration
  • 40. References [apache-lic-v2] [Apache v2 License] Apache v2.0 License - http://www.apache.org/licenses/LICENSE-2.0.html [xml-calabash] [XML Calabash] Norm Walsh's XProc processor XML Calabash - http://xmlcalabash.com/ [EXIST] eXist XML Database - http://exist.sourceforge.net [fsm] [Finite State Machine] Finite State Machine (FSM) entry at wikipedia - https://en.wikipedia.org/wiki/Finite- state_machine [saxon] [SAXON] Michael Kay's XSLT & XQUERY Processor - http://www.saxonica.com [transform-xq] [transform.xq] John Snelson's transform.xq - https://github.com/jpcs/transform.xq [MarkLogic] MarkLogic - http://www.marklogic.com [kay2010] [Kay2010] Kay, Michael. “A Streaming XSLT Processor.” Presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6, 2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:10.4242/BalisageVol5.Kay01. - http://www.balisage.net/Proceedings/vol5/html/Kay01/ BalisageVol5-Kay01.html [xmlprocwg] [XMLPROCWG] XML Processing Working Group - http://www.w3.org/XML/Processing/ [xml-catalog] [XML-CATALOG] XML catalog - http://en.wikipedia.org/wiki/XML_Catalog [xproc-spec] [XProc] XProc: An XML Pipeline LanguageW3C Recommendation 11 May 2010 - http:// www.w3.org/TR/xproc [xproc-use-case-note] [XProc Requirements] XProc XML Processing Model Requirements W3C Working Group Note 05 April 2004 - http://www.w3.org/TR/2004/NOTE- proc-model-req-20040405/ 29xproc.xq - Architecture of an XProc processor [xproc-member-submission-xpl] [XPL] XML Pipeline Language (XPL) Version 1.0 (Draft) W3C Member Submission 11 April 2005 - http://www.w3.org/Submission/xpl/ [xproc-use-cases] [XProc Use Cases] XProc Use Cases - http://www.w3.org/TR/xproc-requirements/ [xprov-vnext-requirements] [XProc vnext] XProc vnext language requirements - http://www.w3.org/XML/XProc/ docs/langreq-v2.html [known-impl] [XProc Implementations] Known Implementations - http://xproc.org/implementations/ [XPROC-PARALLELISM] [XPROC- PARALLELISM] XProc specification H. Sequential Steps, parallelism, and side-effects - http://www.w3.org/TR/xproc/#parallelism [XPROC-TEST-SUITE] [XPROC-TEST-SUITE] XProc Test Suite - http://tests.xproc.org/ [xray] [XRAY] - Rob Whitby's XRay - https://github.com/robwhitby/xray [xproc-system-properties] [XPROC-SYSTEM-PROPERTIES] XProc system properties - http://www.w3.org/TR/ xproc/#f.system-property [xslt-2] [XSLT 2.0] XSL Transformations (XSLT) Version 2.0. Michael Kay, editor. W3C Recommendation. 23 January 2007. - http://www.w3.org/TR/xslt20/ [xproc.xq] [xproc.xq project] xproc.xq github - https://github.com/xquery/xproc.xq [zergaoui2009] [Zergaoui2009] Mohamed Zergaoui. Memory management in streaming: Buffering, lookahead, or none. Which to choose? Int Symp on Processing XML Efficiently. 10 Aug 2009, Montreal, Canada. Balisage Series on Markup Technologies, vol. 4 (2009). doi: 10.4242/BalisageVol4.Zergaoui02. - http:// www.balisage.net/Proceedings/vol4/html/Zergaoui02/BalisageVol4-Zergaoui02.html

Editor's Notes

  1. static analysis- consume and parse the XProc pipeline, generating a runnable representation of said pipelinedynamic evaluation- engine that dynamically evaluates the runnable pipeline representation serialization- output interim results for further processing or final results
  2. Michael Kay noted in a paper ‘you push, ill pull’ that XSLT polymorphism and dynamic dispatch make analysis difficult
  3. Bring your own steprunner
  4. In a lot of data systems normalisation happens on data entry, Map reduce algorithm defers this cost at mapping time …