2. XML and Other Markup Languages: SGML (1973), HTML (1989), XML (1996). "XML has several favorable attributes that distinguish it from other competing technologies. Programmers find XML easy to learn because it is human-readable. The downside, however, is that an XML document needs to be parsed for it to become machine-readable." Ref: "XML on a Chip?", a specially prepared document for Sun Microsystems by XimpleWare (6/9/2003).
5. Typical XML Processing: Input XML -> Parsing -> Semantic Analysis -> Output XML. Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
6. Typical XML Processing: Input XML -> Parsing -> Access -> Modification -> Serialization -> Output XML. Diagram label: Semantic Analysis.
7. Typical XML Processing: Input XML -> Parsing -> Access -> Modification -> Serialization -> Output XML. Diagram labels: Semantic Analysis; Performance Bottleneck.
8. Typical XML Processing: Input XML -> Parsing -> Access -> Modification -> Serialization -> Output XML. Diagram labels: Semantic Analysis; Performance Bottleneck; Performance affected by parsing models.
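The four stages named on this slide (parsing, access, modification, serialization) can be sketched with Python's standard `xml.dom.minidom` module. This is only an illustration of the stage boundaries, not the setup measured in the referenced paper; the `order`/`item` document and the `seen` attribute are made up for the example.

```python
from xml.dom import minidom


def process(xml_text: str) -> str:
    # Stage 1, parsing: turn the input text into an in-memory tree.
    doc = minidom.parseString(xml_text)
    # Stage 2, access: locate the nodes of interest.
    items = doc.getElementsByTagName("item")
    # Stage 3, modification: change the tree in place.
    for item in items:
        item.setAttribute("seen", "true")
    # Stage 4, serialization: write the tree back out as XML text.
    return doc.documentElement.toxml()


result = process("<order><item>pen</item><item>ink</item></order>")
print(result)
```

Every later stage here works on the tree that the parsing stage built, which is one way to see why the choice of parsing model can affect the cost of access and modification.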
9. Steps in Parsing: Character Conversion, Lexical Analysis (FSM), Syntactic Analysis (PDA). Data flow: Bit Sequence (3C 61 3E) -> Character Sequence ('<' 'a' '>') -> Token Sequence ('<a>' 'X' '</a>') -> Data Representation (tree, event, integer array). Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
10. Steps in Parsing: Character Conversion, Lexical Analysis (FSM), Syntactic Analysis (PDA). Data flow: Bit Sequence (3C 61 3E) -> Character Sequence ('<' 'a' '>') -> Token Sequence ('<a>' 'X' '</a>') -> Data Representation (tree, event, integer array). Diagram label: Invariant among different parsing models.
11. Steps in Parsing: Character Conversion, Lexical Analysis (FSM), Syntactic Analysis (PDA). Data flow: Bit Sequence (3C 61 3E) -> Character Sequence ('<' 'a' '>') -> Token Sequence ('<a>' 'X' '</a>') -> Data Representation (tree, event, integer array). Character Conversion and Lexical Analysis: invariant among different parsing models. Syntactic Analysis and the resulting Data Representation: parsing model dependent (different among different parsing models).
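The three steps can be walked through on the slide's own example, the bytes for `<a>X</a>`. The following is a minimal sketch under simplifying assumptions: a regular expression stands in for the finite state machine, an explicit stack stands in for the pushdown automaton, and comments, CDATA, processing instructions, and entities are ignored. The function names are invented for this example.

```python
import re


def convert_characters(raw: bytes, encoding: str = "utf-8") -> str:
    # Step 1, character conversion: bit sequence -> character sequence.
    return raw.decode(encoding)


# Step 2, lexical analysis: a regular expression plays the role of the FSM.
TOKEN_RE = re.compile(r"</[^<>]+>|<[^<>/][^<>]*>|[^<>]+")


def tokenize(text: str) -> list:
    # Character sequence -> token sequence (end tags, start tags, text).
    return TOKEN_RE.findall(text)


def analyze(tokens: list) -> list:
    # Step 3, syntactic analysis: a stack checks that tags nest correctly.
    # The data representation produced here is a list of events; a tree or
    # an integer array would be built at this same step in other models.
    stack, events = [], []
    for tok in tokens:
        if tok.startswith("</"):
            name = tok[2:-1].strip()
            if not stack or stack.pop() != name:
                raise ValueError(f"unexpected end tag {tok}")
            events.append(("end", name))
        elif tok.startswith("<"):
            body = tok[1:-1]
            name = body.rstrip("/").split()[0]
            events.append(("start", name))
            if body.endswith("/"):
                events.append(("end", name))  # empty element such as <b/>
            else:
                stack.append(name)
        else:
            events.append(("text", tok))
    if stack:
        raise ValueError(f"unclosed element {stack[-1]}")
    return events


raw = bytes([0x3C, 0x61, 0x3E, 0x58, 0x3C, 0x2F, 0x61, 0x3E])
text = convert_characters(raw)
tokens = tokenize(text)
events = analyze(tokens)
print(text, tokens, events)
```

Only `analyze` would need to change to produce a different data representation, which matches the slide's point that the first two steps are shared across parsing models.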
12. XML Processing: DOM & SAX or StAX. Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
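The difference between the three models is easiest to see on one task done three ways. The sketch below uses Python's standard library: `xml.dom.minidom` for the tree (DOM) model, `xml.sax` for the push (SAX) model, and `xml.dom.pulldom` as a stand-in for the pull model, since StAX itself is a Java API. The `books` document and the helper names are made up for the example.

```python
import xml.sax
from xml.dom import minidom, pulldom

DOC = "<books><book>A</book><book>B</book></books>"


def node_text(node) -> str:
    # Concatenate the text children of a DOM element.
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def titles_dom(text: str) -> list:
    # Tree model: the whole document is built in memory, then queried.
    doc = minidom.parseString(text)
    return [node_text(n) for n in doc.getElementsByTagName("book")]


class TitleHandler(xml.sax.ContentHandler):
    # Push model: the parser calls back into this handler as it reads.
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None

    def startElement(self, name, attrs):
        if name == "book":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buf))
            self._buf = None


def titles_sax(text: str) -> list:
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_pull(text: str) -> list:
    # Pull model: the application asks for the next event when it wants one.
    titles = []
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            events.expandNode(node)  # build only this small subtree
            titles.append(node_text(node))
    return titles


print(titles_dom(DOC), titles_sax(DOC), titles_pull(DOC))
```

All three return the same titles; what differs is who drives the loop and how much of the document is held in memory at once.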
18. VTD: Inside a VTD Record. Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
19. XML Processing: VTD. Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
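The idea of a token record stored as an integer, with the document bytes left untouched, can be sketched as follows. The slide text does not give the record layout, so the field positions, field widths, and token type codes below are assumptions chosen for illustration; consult the VTD-XML documentation for the real format.

```python
# Hypothetical layout of one 64-bit record (an assumption of this sketch):
# token type in the top 4 bits, then depth (8 bits), length (20 bits),
# 2 unused bits, and the starting offset in the low 30 bits.
OFFSET_SHIFT, OFFSET_BITS = 0, 30
LENGTH_SHIFT, LENGTH_BITS = 32, 20
DEPTH_SHIFT, DEPTH_BITS = 52, 8
TYPE_SHIFT, TYPE_BITS = 60, 4

START_TAG, CHAR_DATA = 0, 5  # hypothetical token type codes


def pack_record(token_type: int, depth: int, offset: int, length: int) -> int:
    fields = ((token_type, TYPE_BITS), (depth, DEPTH_BITS),
              (offset, OFFSET_BITS), (length, LENGTH_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return ((token_type << TYPE_SHIFT) | (depth << DEPTH_SHIFT)
            | (length << LENGTH_SHIFT) | (offset << OFFSET_SHIFT))


def unpack_record(record: int) -> tuple:
    def field(shift: int, bits: int) -> int:
        return (record >> shift) & ((1 << bits) - 1)

    return (field(TYPE_SHIFT, TYPE_BITS), field(DEPTH_SHIFT, DEPTH_BITS),
            field(OFFSET_SHIFT, OFFSET_BITS), field(LENGTH_SHIFT, LENGTH_BITS))


def token_text(document: bytes, record: int) -> bytes:
    # The token is never copied out at parse time; it is sliced on demand.
    _, _, offset, length = unpack_record(record)
    return document[offset:offset + length]


document = b"<a>X</a>"
records = [
    pack_record(START_TAG, 0, 1, 1),  # element name "a" at byte offset 1
    pack_record(CHAR_DATA, 0, 3, 1),  # text "X" at byte offset 3
]
print([token_text(document, r) for r in records])
```

Because each record is a fixed-size integer, the whole parsed representation is an integer array, the third data representation listed on the "Steps in Parsing" slides.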
21. VTD-XML: Resolving child elements using the Location Cache. Image: http://vtd-xml.sourceforge.net/technical/2.html
30. VTD on Android Platform. Ref: M V Uttam Tej, Dhanaraj Cheelu, M. Rajasekhara Babu, P Venkata Krishna (SCSE, VIT University, Vellore, Tamil Nadu), "Analyzing XML Parsers Performance for Android Platform".
31. Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
32. Comparisons (contd.) Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
33. Comparisons (contd.) Ref: Tak Cheung Lam and Jianxun Jason Ding (Cisco Systems), Jyh-Charn Liu (Texas A&M University), "XML Document Parsing: Operational and Performance Characteristics".
35. Parallel Approach to XML Parsing. Ref: Wei Lu, Kenneth Chiu, Yinfei Pan, "A Parallel Approach to XML Parsing".
36. Parallel Approach to XML Parsing (cont.) Ref: Wei Lu, Kenneth Chiu, Yinfei Pan, "A Parallel Approach to XML Parsing".
37. Limitations of PXP: "First, the skeleton requires extra memory that is proportional to the number of node in the DOM tree. Further, the partitioning scheme based on subtrees can cause load imbalance on processing cores for XML documents with irregular or deep tree structures (e.g., TREEBANK with parts-of-speech tagging [29]). This scheme severely limits the granularity of parallelism that can be achieved, and thus cannot scale with increasing core count." Ref: Section 2.2, "Prior Work on Parallel XML Parsing", in Bhavik Shah and Praveen R. Rao (University of Missouri-Kansas City), Bongki Moon (University of Arizona), Mohan Rajagopalan (Intel Research Labs), "A Data Parallel Algorithm for XML DOM Parsing".
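The subtree-based partitioning that the quotation criticizes can be sketched roughly as follows. This is not the PXP implementation: the regular-expression preparse is a crude stand-in for the skeleton, the helper names are invented, and the input is assumed to have no prolog, comments, CDATA, or `>` inside attribute values. Python threads also share one interpreter lock, so the sketch shows the partitioning rather than a real speedup.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

TAG_RE = re.compile(r"<(/?)([^<>\s/]+)[^<>]*?(/?)>")


def skeleton(text: str) -> list:
    # Preparse: record (start, end) offsets of each child subtree of the root.
    spans, depth, start = [], 0, None
    for m in TAG_RE.finditer(text):
        closing, empty = m.group(1), m.group(3)
        if closing:
            depth -= 1
            if depth == 1:  # just closed a direct child of the root
                spans.append((start, m.end()))
        elif empty:
            if depth == 1:  # empty element directly under the root
                spans.append((m.start(), m.end()))
        else:
            if depth == 1:  # opening a direct child of the root
                start = m.start()
            depth += 1
    return spans


def parse_parallel(text: str, workers: int = 4) -> list:
    # Each top-level subtree becomes one independent parsing task.
    chunks = [text[s:e] for s, e in skeleton(text)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ET.fromstring, chunks))


doc = "<r><a><b/>x</a><c>y</c><d/></r>"
spans = skeleton(doc)
chunk_sizes = [end - start for start, end in spans]
subtrees = parse_parallel(doc)
print(chunk_sizes, [t.tag for t in subtrees])
```

The chunk sizes make the load-imbalance concern concrete: when one subtree holds most of the document, one task does most of the work no matter how many workers are available.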
38. ParDOM. Ref: Bhavik Shah and Praveen R. Rao (University of Missouri-Kansas City), Bongki Moon (University of Arizona), Mohan Rajagopalan (Intel Research Labs), "A Data Parallel Algorithm for XML DOM Parsing".
39. ParDOM (contd.) Ref: Bhavik Shah and Praveen R. Rao (University of Missouri-Kansas City), Bongki Moon (University of Arizona), Mohan Rajagopalan (Intel Research Labs), "A Data Parallel Algorithm for XML DOM Parsing".
40. ParDOM (contd.) Ref: Bhavik Shah and Praveen R. Rao (University of Missouri-Kansas City), Bongki Moon (University of Arizona), Mohan Rajagopalan (Intel Research Labs), "A Data Parallel Algorithm for XML DOM Parsing".