Design Concepts for XML Applications That Will Perform
1. Oracle XML Database: Design Concepts for XML Applications That Will Perform! Marco Gralike, AMIS, 2009
2. Introductions: Started as DBA with Oracle 7 on Windows NT 3.1 (1994). Experienced with Oracle 7.x / 8.x / 9.x / 10.x and 11.1. Oracle 11g beta tester for Oracle XMLDB. Active Oracle OTN XMLDB Forum member. Oracle ACE Award for XMLDB community contributions. OakTable Network member.
3. Or a short story: “Why XML on Disk can be faster than XML in Memory…”
4. Disclaimer: The following are “Rules of Thumb”. Bear in mind: every environment has its own unique criteria and needs regarding business requirements, architecture, etc. (“Maintainability”, “Extendibility”), so pay attention to “Choice”, “Design”, “Testing” and “Performance”.
6. Customer Case. Initial state: no performance; 12,000 “Cases” / night (4-hour window); 4 hours are not enough anymore; the “XML” part “looks like it takes too long”; original database system version 8.1.x. Future wishes: the need to be able to handle 120,000 “Cases” / night; in the near future, hardware/OS moves from OpenVMS to HP-UX.
7. An overview (diagram labels): Memory / DOM; CLOB; Oracle Advanced Queue; XMLType; BLOB; Process Checks; Validation XML Schema (Java); Store in ETL Tables; Oracle Workflow; Shred Elements via XMLDOM.
12. Feeding data to the database. Why BLOB? XML data & PDF data. Why CLOB? Conversion needed for XML handling. Why XMLType? Needed to check XML element content; XML validation (well-formedness). (Diagram labels: Memory / DOM; CLOB; Oracle Advanced Queue; XMLType; BLOB.)
13. Impedance Mismatch. Different data models: XPath models an XML document as a tree, while most general-purpose programming languages have no native data type for a tree. Different programming paradigms: XSLT is a functional language, while Java is object-oriented and Perl is procedural. Effect/costs: unnecessary CPU and memory overhead; a lot of expensive type and encoding conversions.
14. The General Rule! If you deal with XML, handle it via XML(DB). So if it is relational, do it the relational way… If it is XML, use XQuery, or others like XPath, etc. If you mix worlds, be careful regarding: information loss (PK/FK in XML?); whitespace; NULL or whitespace?; impedance mismatch.
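The information-loss warning on this slide can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` purely for illustration (the deck itself is about Oracle XML DB), and the element names `case` and `note` are hypothetical. It shows three distinct XML states, a missing element, an empty element and a whitespace-only element, that a naive mapping to a relational NULL collapses into one.

```python
import xml.etree.ElementTree as ET


def describe(xml_text, tag="note"):
    """Classify how a child element appears in the XML document."""
    node = ET.fromstring(xml_text).find(tag)
    if node is None:
        return "missing"       # element is not present at all
    if node.text is None:
        return "empty"         # <note/> or <note></note>
    if node.text.strip() == "":
        return "whitespace"    # <note>   </note>
    return "value"


def naive_to_column(xml_text, tag="note"):
    """Naive relational mapping (an assumption for this sketch):
    anything without visible text is stored as NULL (None)."""
    text = ET.fromstring(xml_text).findtext(tag)
    if text is None or text.strip() == "":
        return None
    return text


DOCS = ["<case/>", "<case><note/></case>", "<case><note>  </note></case>"]
```

All three documents are different in XML terms, yet each maps to the same NULL column value, so the original document cannot be reconstructed from the relational copy.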
17. Validation on content and structure. XML Schema: validation on XML structure. PL/SQL wrapper with Java XML parser. (Diagram labels: Memory / DOM; Validation XML Schema (Java based); XMLType; Shred Elements via XMLDOM; Process Checks.)
19. XML Parsers. Often DOM or Infoset based. CPU intensive. Memory intensive: parsing, serializing and tree traversals happen in memory. Often handle XML tree traversals via only ONE method. Not aware of whether the XML content is structured, semi-structured or unstructured. Not very “smart” / “content aware” in how they handle XML based on its tree and/or its data content.
20. XML Schema Registration Advantages. XML Schema will be parsed only once. XML Schema will be cached in memory: no additional parsing; no additional validation. XML document structure is known, therefore: no parsing is needed when loaded from disk into memory; XML Object (XOB) structures can be applied; memory footprint is much smaller than with a DOM structure; the specific nodes that are needed can now be handled efficiently in memory.
21. XML Schema based - Query Rewrite. (Diagram: a bookstore tree with book and whitepaper branches; nodes title, author, chapter, id, paragraph and content; datatype labels String, Float, CHAR, VARCHAR2 (20), NUMBER (15) and CLOB.)
22. XMLType – Not just a “Datatype”. Checked on XML well-formedness: one root element; begin & end tags. If there is an XML Schema reference: XOB methods will be used if an XML Schema is available; DOM methods will be used if XML Schema information is not available.
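The two well-formedness checks named on this slide (one root element, matching begin and end tags) can be sketched with a generic parser. This is only an illustration using Python's standard library, not how XMLType performs the check internally; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for problems such as a second root element
        # or a begin tag without its matching end tag.
        return False
```

For example, `is_well_formed("<a><b/></a>")` is accepted, while a document with two root elements (`"<a/><b/>"`) or a mismatched tag (`"<a><b></a>"`) is rejected.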
24. Keep XML small! Do not use / enforce pretty print if not needed. Avoid namespace reference “overkill”: the most used namespace is leading; use short namespace references. Make XML data as “sparse” as possible, e.g. <employee name="Marco"/> instead of <employee><name>Marco</name></employee>. XML data partitioning. Binary XML if possible.
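A minimal sketch of the size argument, using the slide's own employee example. The pretty-printed variant and the helper names are assumptions added for illustration; sizes are measured as UTF-8 bytes.

```python
import xml.etree.ElementTree as ET

# The two forms from the slide, plus a pretty-printed variant (assumed layout).
ELEMENT_FORM = "<employee><name>Marco</name></employee>"
ATTRIBUTE_FORM = '<employee name="Marco"/>'
PRETTY_FORM = "<employee>\n  <name>Marco</name>\n</employee>\n"


def size(xml_text):
    """Size of the serialized document in bytes (UTF-8)."""
    return len(xml_text.encode("utf-8"))


def employee_name(xml_text):
    """Read the name from either the attribute or the child element."""
    root = ET.fromstring(xml_text)
    return root.get("name") or root.findtext("name")
```

All three documents carry the same value, but the attribute form is the smallest and the pretty-printed form the largest. A difference of a few bytes per document adds up when many thousands of documents are stored and parsed every night.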
25. XML Design. Avoid cyclic references in XML Schemata. For ease of maintenance: xdb:annotations. Is DOM validation, fidelity needed? CPU: XML parsing and XML Schema validation “overhead”? Index maintenance overhead, if implemented via disk.
29. Think in “3D” or in “Driving Table” terms. maxOccurs="unbounded". Give me the <title> and <content> where <content> contains… (Diagram: nodes numbered 1 to 6 along X, Y and Z axes; x n rows.)
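The “driving table” idea can be sketched by treating each repeating element as one row. The example below is an in-memory analogy in Python, not Oracle code; the document layout (`bookstore/book/chapter/content`) and the function name are assumptions loosely based on the bookstore example in this deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; <chapter> repeats (maxOccurs="unbounded").
BOOKSTORE = """
<bookstore>
  <book>
    <title>XML on Disk</title>
    <chapter><content>query rewrite basics</content></chapter>
    <chapter><content>index design</content></chapter>
  </book>
  <book>
    <title>XML in Memory</title>
    <chapter><content>DOM traversal</content></chapter>
  </book>
</bookstore>
"""


def title_content_rows(xml_text, needle):
    """Return one (title, content) row per <content> that contains needle."""
    rows = []
    for book in ET.fromstring(xml_text).findall("book"):
        title = book.findtext("title")
        # Each repeating <content> contributes its own row for this book.
        for content in book.findall("chapter/content"):
            if needle in (content.text or ""):
                rows.append((title, content.text))
    return rows
```

One book with n matching `<content>` elements yields n rows, which is why an unbounded repeating element multiplies the work: the element you filter on decides how many rows drive the rest of the query.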
33. Effect of //. In memory, 10,000 cases: ORA-31186 Document contains too many nodes. maxOccurs=unbounded; maxLength, totalDigits, etc. Increasing volume – XMLType CLOB. ORA-31186: Document contains too many nodes. Cause: Unable to load the document because it has exceeded the maximum allocated number of DOM nodes. Action: Reduce the size of the document.
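Why `//` is expensive in memory can be sketched by counting visited nodes. The code below is an analogy built on Python's standard library and a hypothetical generated document; it does not reproduce how Oracle evaluates XPath. A descendant search has to look at every node, while an explicit path only walks the steps it names.

```python
import xml.etree.ElementTree as ET


def build_store(n_books, contents_per_chapter=5):
    """Build a hypothetical bookstore tree in memory."""
    root = ET.Element("bookstore")
    for i in range(n_books):
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = f"Title {i}"
        chapter = ET.SubElement(book, "chapter")
        for _ in range(contents_per_chapter):
            ET.SubElement(chapter, "content").text = "text"
    return root


def titles_descendant(root):
    """Emulates //title: every node in the tree is inspected."""
    titles, visited = [], 0
    for node in root.iter():
        visited += 1
        if node.tag == "title":
            titles.append(node.text)
    return titles, visited


def titles_explicit(root):
    """Emulates /bookstore/book/title: only root, books and their children."""
    titles, visited = [], 1  # the root itself
    for book in root:
        visited += 1
        for child in book:
            visited += 1
            if child.tag == "title":
                titles.append(child.text)
    return titles, visited
```

Both functions return the same titles, but the descendant variant touches every `<content>` node as well, and that gap widens as documents get deeper and repeating elements grow.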
35. A Solution based on XMLType O.R.: rewrite on disk / XOB (relational). (Diagram labels: CLOB; Oracle Advanced Queue; BLOB; Store in ETL Tables; Oracle Workflow; Validation Against XML Schema; Checks; XMLType Table (O.R.).)
36. Driving Access on CONTENT (11gR1, on Disk). (Diagram: the bookstore tree with book and whitepaper branches and nodes title, author, chapter, id, paragraph and content, annotated with BTree indexes, a secondary Oracle Text index, a function-based index (XPath), an unstructured XMLIndex, and structured content.)
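Index-driven access on content can be sketched with a small stand-in. The example below uses SQLite from Python's standard library instead of Oracle, and the table, index and sample data are all hypothetical. It shreds the XML into a relational table once, then answers a content question through a B-tree index rather than by re-parsing the document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data.
DOC = """
<bookstore>
  <book id="1"><title>XML on Disk</title><author>Marco</author></book>
  <book id="2"><title>Tuning SQL</title><author>Anne</author></book>
  <book id="3"><title>XML Indexing</title><author>Marco</author></book>
</bookstore>
"""

QUERY = "SELECT title FROM book WHERE author = ? ORDER BY id"


def shred(conn, xml_text):
    """Store the XML content relationally and index the author column."""
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    conn.execute("CREATE INDEX book_author_idx ON book (author)")
    rows = [
        (int(b.get("id")), b.findtext("title"), b.findtext("author"))
        for b in ET.fromstring(xml_text).findall("book")
    ]
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)


def titles_by_author(conn, author):
    """Answer the content question with plain SQL."""
    return [row[0] for row in conn.execute(QUERY, (author,))]


def plan_details(conn, author):
    """Return the optimizer's plan text for the same query."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (author,))
    return [row[3] for row in cursor]
```

A typical use is `conn = sqlite3.connect(":memory:")`, then `shred(conn, DOC)` and `titles_by_author(conn, "Marco")`. The plan text from `plan_details` shows whether the optimizer chose the index, which is the kind of choice a single in-memory parser method cannot make.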
37. Cost Based Optimizer Advantages. Can be influenced via: statistics; indexes; XML Schema registration (XOB); encoding in Binary XML storage; SQL rewrite of XPath, XQuery; partitioning.
39. So why can DISK outperform MEMORY? XML Schema validation based on a registered XML Schema. Query rewrite possible. Based on plain “old” SQL/database methods: optimized CPU handling; optimized memory handling (if needed). Multiple optimized solutions possible via the Optimizer instead of one XML parser method. Specific parts of XML can be handled / be driven via specific indexing or content. Full-blown validation can be avoided.
41. Be aware of what you are doing! Avoid unneeded (full) XML Schema validation: during insert; generating XML. Avoid impedance mismatch (Java, XML, Java, XML, Relational, XML, Java). “All In One Go Objective”. Avoid intermediate XML fragments. // XMLEXISTS. Use indexes. xdb:maintainDOM="false".
42. XML Data Handling and Design. Handle XML smart. Keep XML small. Restrict XML where possible. Be precise! maxOccurs, maxLength. Provide Oracle with extra / precise information (XSD). Register XML Schema, if possible…
53. References. XMLDB Developer's Guide: http://www.oracle.com/pls/db112/homepage ; The XMLDB Forum: http://forums.oracle.com/forums/forum.jspa?forumID=34 ; XML DB FAQ Thread: http://forums.oracle.com/forums/thread.jspa?threadID=410714 ; Blog: http://technology.amis.nl/blog and http://blog.gralike.com