Creating Structure in Unstructured Data
What is possible, today…?



Marco Gralike
“Big Data” = XML ?
Challenges are!
Ahum, the problems are!
WikiPedia
• One string of XML data with
  structured and unstructured
  data sections
• Language: English
• Size: 42.15 GB
• Pages: 12,961,997
• Date: 21 Dec 2012
Adventures into
the unknown…?
Setup
• VirtualBox VM
  – OEL 5U8 (64)
  – 8 GB RAM
• LaCie Little Big Disk
  – RAID 0
  – Thunderbolt
• Database
  – SGA    4GB
  – PGA    2GB
My new LaCie LBD is really fast - 
Defeat?! - 1,000,000 pages only
Status of Technology used
XML - Where are we…?




Gartner
Achieved…?
On the Horizon!
• JSONiq
• Zorba
Building (streaming) Bridges
Oracle XML DB
      • NO cost option
      • C (native / embedded kernel)
      • (XQuery) Standards
      • Code maintained by Oracle
XQuery

[Diagram: Oracle XML DB XQuery processing layers]
• XMLType Abstraction
• DB XQuery: XQuery Rewrite, Pushdown
• Procedural XQuery: XVM (use “no query rewrite”)
• Evaluation: Relational Access Methods, SQL Execution, Streaming XPath Evaluation, XMLIndex, DOM Tree Model
• Storage: Object-Relational on Relational Storage; Binary XML on Secure Files

Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
So what are we talking about?
WikiPedia
• Structured & Unstructured
  bits and pieces
• A lot of “unbounded”
  elements
• Not a lot of restrictions
• The bit with value is in
  element “text”
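A minimal sketch of how a dump shaped like this could be read one page at a time instead of as one giant string. The sample document is a simplified, hypothetical stand-in for the real dump (no namespace, two pages), and `iter_pages` is an illustrative helper, not something from the talk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the Wikipedia dump.
SAMPLE = b"""<mediawiki>
  <page><title>Oracle</title><revision><text>Oracle is a database.</text></revision></page>
  <page><title>XML</title><revision><text>XML is a markup language.</text></revision></page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, text) for every <page> element in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title", default="")
            text = elem.findtext("revision/text", default="")
            yield title, text
            # Drop the page's children so memory use stays roughly constant.
            elem.clear()


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, "->", text)
```

The structured part (the title) and the unstructured part (the text) come out together, which is the split the rest of the deck tries to exploit.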
How do we get this Structured?
Strings = small & defined (12c?)

   Ename  pointer += 100;
<string1/><string2/><string3/>
Flexible, Humans
No Design Patterns
<small/><verybigggr/><bigger/>
<verybigggr>
       <empno>1</empno><ename>Marco</ename>
       <empno>2</empno>
</verybigggr>




 <small/><verybigggr/><bigger/>
We need options!
“XMLType” Container
• In Memory (document)
• CLOB (document)
• Object Relational (data)
• Binary XML (data)
XMLType: In Memory (document)
• XOB
• XML Schema
XMLType: Binary XML Securefile (document/content)
• Post Parse
• LOB Index
XMLType: Object Relational (content)
• Fully Shredded
• Indexes
Something else to Realize !
“What is the fastest way to get this stuff in the database…?”
“…it depends…”
“So what is the fastest way to get XML in the database…?”
“…it depends…”
“So what is the fastest way to get XML in the database… and useful in my case…?”
Garbage IN – Garbage OUT
WikiPedia
•   SQL*Loader
•   Parallel or Direct
•   Securefile LOB Column
•   2.5 hours

And no (performant) way
to get the details out…
a.k.a “completely useless”
WikiPedia
•   SQL*Loader
•   Parallel or Direct
•   Securefile Binary XML
•   …2.5 hours ???
XML Parsing




• SAX   - Simple API for XML
• DOM   - Document Object Model
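A small sketch of the difference between the two parsing styles, using Python's standard library rather than the Oracle parsers. The document reuses the `<verybigggr>` example from earlier in the deck; `EmpnoCounter` is a hypothetical handler written for this illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = "<verybigggr><empno>1</empno><ename>Marco</ename><empno>2</empno></verybigggr>"


class EmpnoCounter(xml.sax.ContentHandler):
    """SAX: react to events as they stream past; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "empno":
            self.count += 1


handler = EmpnoCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_count = handler.count

# DOM: build the whole document tree in memory first, then navigate it.
dom = minidom.parseString(DOC)
dom_count = len(dom.getElementsByTagName("empno"))

print(sax_count, dom_count)
```

Both give the same answer here; the trade-off only shows at size, where the DOM tree has to fit in memory and the SAX handler does not.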
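[Chart: insert performance (vertical axis, “fast” at the top) against select performance (horizontal axis, “fast” at the right). From fastest insert to fastest select: CLOB; XMLType CLOB, (domain) indexes; XMLType Binary XML; XMLType Object Relational.]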
XML Partitioning
• Object Relational Partitioning
  – Equi-Partitioning since version Oracle 11.1.0.7.0
• Binary XML Partitioning
  – Range, List, Hash
• Local partitioned XMLIndex
  – LOCAL keyword in XMLIndex create syntax
• Partition Key on virtual column (Binary XML)
• Partition Key on column (Object Relational)
XMLType: Binary XML Securefile (document/content)
• Post Parse
• LOB Index
Driving access on CONTENT
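[Diagram: example document tree and the index types that can drive access to it]
• bookstore
  – book: title, author, author, chapter
  – whitepaper: title, author, id, paragraph
• Index types shown: BTree Index, Function based Index (XPath), Unstructured XMLIndex, Structured XMLIndex, Oracle XML Text Index
• Labels: content, structured content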
Structured Data
Structured XMLIndex (SXI)
• CONTENT TABLE(s)
• Based on XMLTABLE syntax
• XMLTable construct can be nested:
  – VIRTUAL column alias
• Can be maintained manually
• Secondary indexes possible
[Diagram: Structured XMLIndex f(x), Content Tables]
Describe CONTENT TABLE




• A “regular” heap table with columns…
• Ideal for secondary indexes, if needed.
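As an analogy only (this is SQLite from Python, not Oracle XML DB), the idea of a content table can be sketched as: project fixed paths out of the XML into an ordinary table, then add a secondary index on it. The `<employees>` document, the name “Scott”, the table name `emp_content` and the index name are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document with a regular, repeating structure.
DOC = """<employees>
  <emp><empno>1</empno><ename>Marco</ename></emp>
  <emp><empno>2</empno><ename>Scott</ename></emp>
</employees>"""

# Project the structured parts of the XML into plain rows.
rows = [
    (int(emp.findtext("empno")), emp.findtext("ename"))
    for emp in ET.fromstring(DOC).iter("emp")
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp_content (empno INTEGER, ename TEXT)")
con.executemany("INSERT INTO emp_content VALUES (?, ?)", rows)

# Because it is a regular table, a secondary index is an ordinary index.
con.execute("CREATE INDEX emp_content_ename ON emp_content (ename)")

found = con.execute(
    "SELECT empno FROM emp_content WHERE ename = ?", ("Marco",)
).fetchall()
print(found)
```

In the database the content table is maintained by the index itself; here the rows are loaded by hand purely to show the shape of the result.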
CONTENT TABLE(s)
[Diagram: Structured XMLIndex f(x), Content Tables]
Semi-Structured Data
Unstructured XMLIndex (UXI)
• PATH TABLE
• Use Path Subsetting
   – Full Blown XMLIndex can be BIG
• Token Tables (XDB.X$......)
   – Query re-write on Tokens
   – Fuzzy Searches, //
   – Optimizer Statistics
• Can be maintained manually
   – Recorded in Pending Table
• Secondary indexes possible
[Diagram: Unstructured XMLIndex f(x), Path Table]
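A rough sketch of the path table idea, assuming a drastically simplified layout of one (path, value) row per text node; the real path table holds more than this. The document follows the bookstore example from the earlier diagram, with made-up values, and `path_rows` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Bookstore shape taken from the deck's diagram; the values are invented.
DOC = (
    "<bookstore>"
    "<book><title>A</title><author>X</author></book>"
    "<whitepaper><title>B</title></whitepaper>"
    "</bookstore>"
)


def path_rows(elem, prefix=""):
    """Yield one (path, value) row for every element that carries text."""
    path = f"{prefix}/{elem.tag}"
    text = (elem.text or "").strip()
    if text:
        yield path, text
    for child in elem:
        yield from path_rows(child, path)


table = list(path_rows(ET.fromstring(DOC)))

# A "//title" style lookup becomes a filter on the path column.
titles = [value for path, value in table if path.endswith("/title")]

# Path subsetting: index only the paths that will actually be queried.
subset = [row for row in table if row[0].startswith("/bookstore/book/")]

print(table)
print(titles)
```

Every text node becomes a row, which is why an index over all paths of a large document can get big and why subsetting matters.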
Describe PATH TABLE
What’s hidden…
PATH TABLE
[Diagram: Unstructured XMLIndex f(x), Path Table]
Binary XML – No Index
Binary XML + XMLIndex (SXI)
Binary XML + XMLIndex + Sec.Ind.
Binary XML + XMLIndex + Sec.Ind.
Un-Structured Data
XML Full Text Index
• Based on Oracle Text Index, XQuery Full Text
• XML Namespace Aware
• XML Semantic aware full text search
  – Full-Text Selection Expression – contains text
  – Logical Full Text Operator – ftor, ftand, ftMildNot
  – Context Aware full text search
Balanced Design
• Inserts, Updates & Deletes
  – XML Future Changes
  – Index Maintenance
• Selects
  – In Memory
  – Via Indexes
• XML Validation
  – Strict, Lazy
  – Client Side Possibilities
[Diagram labels: In Memory, On Disk]
Reward
• Optimal performance
• Outperforming XML
• Proper design will give
  performance increase over
  XML handling…


…proper design is still key…
References
Oracle XML DB
  – http://www.oracle.com/pls/db112/homepage
XML DB FAQ Thread
  – http://forums.oracle.com/forums/thread.jspa?threadID=410714
Personal Blog
  – http://www.xmldb.nl
  – http://technology.amis.nl
References
Daniela Florescu, Oracle Corporation
  Advances in XML and XQuery
Sam Idicula, Oracle XML DB Development Team
  Binary XML Storage and Query Processing in Oracle
Jinyu Wang, Scott Brewton
  Making XML Technology Easier to Use
Joel Spolsky - Joel on Software
  Back to Basics
References
Oracle XML DB Main page material
• Oracle XML DB : Best Practices to Get Optimal
  Performance out of XML Queries (PDF)
• Oracle XML DB : Choosing the Best XMLType
  Storage Option for Your Use Case (PDF)
• A Request for Comments for the Oracle Binary
  XML Format

Más contenido relacionado

La actualidad más candente

XFILES, The APEX 4 version - The truth is in there
XFILES, The APEX 4 version - The truth is in thereXFILES, The APEX 4 version - The truth is in there
XFILES, The APEX 4 version - The truth is in thereMarco Gralike
 
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...Marco Gralike
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1Marco Gralike
 
Design Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will PerformDesign Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will PerformMarco Gralike
 
Oracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesOracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesMarco Gralike
 
OakTable World 2015 - Using XMLType content with the Oracle In-Memory Column...
OakTable World 2015  - Using XMLType content with the Oracle In-Memory Column...OakTable World 2015  - Using XMLType content with the Oracle In-Memory Column...
OakTable World 2015 - Using XMLType content with the Oracle In-Memory Column...Marco Gralike
 
XML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDBXML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDBMarco Gralike
 
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex DatatypesUKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex DatatypesMarco Gralike
 
Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...
Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...
Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...Marco Gralike
 
Ordina Oracle Open World
Ordina Oracle Open WorldOrdina Oracle Open World
Ordina Oracle Open WorldMarco Gralike
 
Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Marco Gralike
 
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...Marco Gralike
 
UKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the DatabaseUKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the DatabaseMarco Gralike
 
Jdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And EnhancementsJdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And Enhancementsscacharya
 
Database Programming
Database ProgrammingDatabase Programming
Database ProgrammingHenry Osborne
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Scott Leberknight
 

La actualidad más candente (20)

XFILES, The APEX 4 version - The truth is in there
XFILES, The APEX 4 version - The truth is in thereXFILES, The APEX 4 version - The truth is in there
XFILES, The APEX 4 version - The truth is in there
 
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
Oracle Developer Day, 20 October 2009, Oracle De Meern, Holland: Oracle Datab...
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
 
Design Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will PerformDesign Concepts For Xml Applications That Will Perform
Design Concepts For Xml Applications That Will Perform
 
Oracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New FeaturesOracle Database 11g Release 2 - XMLDB New Features
Oracle Database 11g Release 2 - XMLDB New Features
 
OakTable World 2015 - Using XMLType content with the Oracle In-Memory Column...
OakTable World 2015  - Using XMLType content with the Oracle In-Memory Column...OakTable World 2015  - Using XMLType content with the Oracle In-Memory Column...
OakTable World 2015 - Using XMLType content with the Oracle In-Memory Column...
 
XML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDBXML In The Real World - Use Cases For Oracle XMLDB
XML In The Real World - Use Cases For Oracle XMLDB
 
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex DatatypesUKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
UKOUG Tech14 - Using Database In-Memory Column Store with Complex Datatypes
 
Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...
Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...
Real World Experience With Oracle Xml Database 11g An Oracle Ace’s Perspectiv...
 
Ordina Oracle Open World
Ordina Oracle Open WorldOrdina Oracle Open World
Ordina Oracle Open World
 
Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2
 
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
XMLDB Building Blocks And Best Practices - Oracle Open World 2008 - Marco Gra...
 
UKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the DatabaseUKOUG Tech14 - Getting Started With JSON in the Database
UKOUG Tech14 - Getting Started With JSON in the Database
 
Jdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And EnhancementsJdbc 4.0 New Features And Enhancements
Jdbc 4.0 New Features And Enhancements
 
Xml parsers
Xml parsersXml parsers
Xml parsers
 
Xml processors
Xml processorsXml processors
Xml processors
 
Database Programming
Database ProgrammingDatabase Programming
Database Programming
 
Java XML Parsing
Java XML ParsingJava XML Parsing
Java XML Parsing
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
 
Spring data jpa
Spring data jpaSpring data jpa
Spring data jpa
 

Destacado

Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Peter Wren-Hilton
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016George Roth
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityGreat Wide Open
 
Lecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data WarehouseLecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data Warehousephanleson
 
The Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataThe Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataHealth Catalyst
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BIMonaheng Diaho
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarDatameer
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataSeth Grimes
 
Using Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data ManagementUsing Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data ManagementDataWorks Summit
 

Destacado (9)

Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
Mining Unstructured Data:Practical Applications, from the Strata O'Reilly Mak...
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
 
Dealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to InfinityDealing with Unstructured Data: Scaling to Infinity
Dealing with Unstructured Data: Scaling to Infinity
 
Lecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data WarehouseLecture 11 Unstructured Data and the Data Warehouse
Lecture 11 Unstructured Data and the Data Warehouse
 
The Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the DataThe Analytic System: Finding Patterns in the Data
The Analytic System: Finding Patterns in the Data
 
Unstructured Data in BI
Unstructured Data in BIUnstructured Data in BI
Unstructured Data in BI
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop Webinar
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
 
Using Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data ManagementUsing Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data Management
 

Similar a Hotsos 2013 - Creating Structure in Unstructured Data

Expertezed 2012 Webcast - XML DB Use Cases
Expertezed 2012 Webcast - XML DB Use CasesExpertezed 2012 Webcast - XML DB Use Cases
Expertezed 2012 Webcast - XML DB Use CasesMarco Gralike
 
SQLPASS AD501-M XQuery MRys
SQLPASS AD501-M XQuery MRysSQLPASS AD501-M XQuery MRys
SQLPASS AD501-M XQuery MRysMichael Rys
 
Making your data work harder than you do
Making your data work harder than you doMaking your data work harder than you do
Making your data work harder than you doSusan Jane Williams
 
Extbase object to xml mapping
Extbase object to xml mappingExtbase object to xml mapping
Extbase object to xml mappingThomas Maroschik
 
Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...
Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...
Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...InSync2011
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyRobert Viseur
 
Easy Data Object Relational Mapping Tool
Easy Data Object Relational Mapping ToolEasy Data Object Relational Mapping Tool
Easy Data Object Relational Mapping ToolHasitha Guruge
 
OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...
OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...
OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...Dr.-Ing. Thomas Hartmann
 
XML-Extensible Markup Language
XML-Extensible Markup Language XML-Extensible Markup Language
XML-Extensible Markup Language Ann Joseph
 
Tech 802: Data, Databases & XML
Tech 802: Data, Databases & XMLTech 802: Data, Databases & XML
Tech 802: Data, Databases & XMLsomisguided
 
Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...
Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...
Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...Dr.-Ing. Thomas Hartmann
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopDmitry Kan
 

Similar a Hotsos 2013 - Creating Structure in Unstructured Data (20)

Expertezed 2012 Webcast - XML DB Use Cases
Expertezed 2012 Webcast - XML DB Use CasesExpertezed 2012 Webcast - XML DB Use Cases
Expertezed 2012 Webcast - XML DB Use Cases
 
Xml databases
Xml databasesXml databases
Xml databases
 
SQLPASS AD501-M XQuery MRys
SQLPASS AD501-M XQuery MRysSQLPASS AD501-M XQuery MRys
SQLPASS AD501-M XQuery MRys
 
Catmandu / LibreCat Project
Catmandu / LibreCat ProjectCatmandu / LibreCat Project
Catmandu / LibreCat Project
 
XML Technologies
XML TechnologiesXML Technologies
XML Technologies
 
Agile xml
Agile xmlAgile xml
Agile xml
 
Making your data work harder than you do
Making your data work harder than you doMaking your data work harder than you do
Making your data work harder than you do
 
Extbase object to xml mapping
Extbase object to xml mappingExtbase object to xml mapping
Extbase object to xml mapping
 
Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...
Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...
Developer & Fusion Middleware 1 | Mark Drake | An introduction to Oracle XML ...
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
Easy Data Object Relational Mapping Tool
Easy Data Object Relational Mapping ToolEasy Data Object Relational Mapping Tool
Easy Data Object Relational Mapping Tool
 
OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...
OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...
OCAS @ ISWC 2011 - Generic Multilevel Approach Designing Domain Ontologies Ba...
 
XML-Extensible Markup Language
XML-Extensible Markup Language XML-Extensible Markup Language
XML-Extensible Markup Language
 
XML
XMLXML
XML
 
Tech 802: Data, Databases & XML
Tech 802: Data, Databases & XMLTech 802: Data, Databases & XML
Tech 802: Data, Databases & XML
 
XML
XMLXML
XML
 
XMl
XMlXMl
XMl
 
Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...
Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...
Workshop on Semantic Statistics - Generic Multilevel Approach Designing Domai...
 
Unit iv xml dom
Unit iv xml domUnit iv xml dom
Unit iv xml dom
 
NoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache HadoopNoSQL, Apache SOLR and Apache Hadoop
NoSQL, Apache SOLR and Apache Hadoop
 

Más de Marco Gralike

UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxMarco Gralike
 
eProseed Oracle Open World 2016 debrief - Oracle Management Cloud
eProseed Oracle Open World 2016 debrief - Oracle Management CloudeProseed Oracle Open World 2016 debrief - Oracle Management Cloud
eProseed Oracle Open World 2016 debrief - Oracle Management CloudMarco Gralike
 
eProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 Database
eProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 DatabaseeProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 Database
eProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 DatabaseMarco Gralike
 
Oracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory DatabaseOracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory DatabaseMarco Gralike
 
UKOUG Tech15 - Going Full Circle - Building a native JSON Database API
UKOUG Tech15 - Going Full Circle - Building a native JSON Database APIUKOUG Tech15 - Going Full Circle - Building a native JSON Database API
UKOUG Tech15 - Going Full Circle - Building a native JSON Database APIMarco Gralike
 
An introduction into Oracle VM V3.x
An introduction into Oracle VM V3.xAn introduction into Oracle VM V3.x
An introduction into Oracle VM V3.xMarco Gralike
 
An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3
An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3
An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3Marco Gralike
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)Marco Gralike
 
Flexibiliteit & Snel Schakelen
Flexibiliteit & Snel SchakelenFlexibiliteit & Snel Schakelen
Flexibiliteit & Snel SchakelenMarco Gralike
 
BGOUG 2012 - Drag & drop and other stuff - Using your database as a file server
BGOUG 2012 - Drag & drop and other stuff - Using your database as a file serverBGOUG 2012 - Drag & drop and other stuff - Using your database as a file server
BGOUG 2012 - Drag & drop and other stuff - Using your database as a file serverMarco Gralike
 

Más de Marco Gralike (11)

UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
 
eProseed Oracle Open World 2016 debrief - Oracle Management Cloud
eProseed Oracle Open World 2016 debrief - Oracle Management CloudeProseed Oracle Open World 2016 debrief - Oracle Management Cloud
eProseed Oracle Open World 2016 debrief - Oracle Management Cloud
 
eProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 Database
eProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 DatabaseeProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 Database
eProseed Oracle Open World 2016 debrief - Oracle 12.2.0.1 Database
 
Oracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory DatabaseOracle Database - JSON and the In-Memory Database
Oracle Database - JSON and the In-Memory Database
 
UKOUG Tech15 - Going Full Circle - Building a native JSON Database API
UKOUG Tech15 - Going Full Circle - Building a native JSON Database APIUKOUG Tech15 - Going Full Circle - Building a native JSON Database API
UKOUG Tech15 - Going Full Circle - Building a native JSON Database API
 
An introduction into Oracle VM V3.x
An introduction into Oracle VM V3.xAn introduction into Oracle VM V3.x
An introduction into Oracle VM V3.x
 
An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3
An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3
An introduction into Oracle Enterprise Manager Cloud Control 12c Release 3
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)
 
Flexibiliteit & Snel Schakelen
Flexibiliteit & Snel SchakelenFlexibiliteit & Snel Schakelen
Flexibiliteit & Snel Schakelen
 
BGOUG 2012 - Drag & drop and other stuff - Using your database as a file server
BGOUG 2012 - Drag & drop and other stuff - Using your database as a file serverBGOUG 2012 - Drag & drop and other stuff - Using your database as a file server
BGOUG 2012 - Drag & drop and other stuff - Using your database as a file server
 
Amis ACE
Amis ACEAmis ACE
Amis ACE
 

Último

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Hotsos 2013 - Creating Structure in Unstructured Data

  • 1. Creating Structure in Unstructured Data: What is possible, today…? (Marco Gralike)
  • 7. Wikipedia • One string of XML data with structured and unstructured data sections • Language: English • Size: 42.15 GB • Pages: 12,961,997 • Date: 21 Dec 2012
  • 9. Setup • VirtualBox VM (OEL 5U8, 64-bit; 8 GB RAM) • LaCie Little Big Disk (RAID 0; Thunderbolt) • Database (SGA 4 GB; PGA 2 GB)
  • 10. My new LaCie LBD is really fast
  • 11. Defeat?! - 1,000,000 pages only
  • 13. XML - Where are we…? Gartner
  • 15. On the Horizon! • JSONiq • Zorba
  • 17. Oracle XML DB • NO cost option • C (native / embedded kernel) • (XQuery) Standards • Code maintained by Oracle
  • 18. XQuery [Diagram: XQuery processing layers in Oracle XML DB. Top: XMLType abstraction, split into DB XQuery and procedural XQuery. Processing: XQuery rewrite, pushdown, XVM (use “no query rewrite”). Evaluation: relational access methods and SQL execution, streaming XPath evaluation, XMLIndex, DOM tree model. Storage: object-relational on relational storage, binary XML on SecureFiles.] Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
  • 19. So what are we talking about?
  • 27. Wikipedia • Structured & unstructured bits and pieces • A lot of “unbounded” elements • Not a lot of restrictions • The bit with value is in the element “text”
  • 28. How do we get this Structured?
  • 31. Strings = small & defined (12c?) Ename → pointer += 100;
  • 35. <verybigggr> <empno>1</empno><ename>Marco</ename> <empno>2</empno> </verybigggr> <small/><verybigggr/><bigger/>
  • 41. “XMLType” Container • In Memory (document) • CLOB (document) • Object Relational (data) • Binary XML (data)
  • 42. XMLType In Memory (document) • XOB • XML Schema
  • 43. XMLType Binary XML, SecureFile (document/content) • Post Parse • LOB Index
  • 44. XMLType Object Relational (content) • Fully Shredded • Indexes
  • 45. Something else to Realize !
  • 46. “What is the fastest way to get this stuff in the database…?”
  • 48. “So what is the fastest way to get XML in the database…?”
  • 50. “So what is the fastest way to get XML in the database… … and useful in my case…?”
  • 51. Garbage IN – Garbage OUT
  • 52. Wikipedia • SQL*Loader • Parallel or Direct • SecureFile LOB column • 2.5 hours. And there is no (performant) way to get the details out, a.k.a. “completely useless”
  • 53. Wikipedia • SQL*Loader • Parallel or Direct • SecureFile Binary XML • …2.5 hours???
  • 54. XML Parsing • SAX - Simple API for XML • DOM - Document Object Model
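The difference between the two parsing models on slide 54 can be sketched in a few lines of Python. This is only an illustration with the standard library, not what Oracle XML DB does internally; the sample document and the element names `page`, `title` and `text` are assumptions made for the example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical miniature of a page dump; element names are assumptions.
SAMPLE = (
    "<mediawiki>"
    "<page><title>Alpha</title><text>first body</text></page>"
    "<page><title>Beta</title><text>second body</text></page>"
    "</mediawiki>"
)


class TitleHandler(xml.sax.ContentHandler):
    """SAX: receives events one at a time and keeps only what it needs."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_with_dom(xml_text):
    # DOM: the whole document becomes an in-memory tree before any access.
    document = minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("title")]


sax_titles = titles_with_sax(SAMPLE)
dom_titles = titles_with_dom(SAMPLE)
print(sax_titles, dom_titles)
```

Both functions return the same titles. The SAX handler never holds more than one title in memory, which is why an event-based approach is the usual choice for a single multi-gigabyte document, while the DOM version is more convenient when random access to the tree is needed.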
  • 55. [Diagram: storage options on a scale from fast insert performance to fast select performance: CLOB • XMLType CLOB with (domain) indexes • XMLType Binary XML • XMLType Object Relational]
  • 57. XML Partitioning • Object Relational Partitioning – Equi-Partitioning since version Oracle 11.1.0.7.0 • Binary XML Partitioning – Range, List, Hash • Local partitioned XMLIndex – LOCAL keyword in XMLIndex create syntax • Partition Key on virtual column (Binary XML) • Partition Key on column (Object Relational)
  • 58. XMLType Binary XML, SecureFile (document/content) • Post Parse • LOB Index
  • 59. Driving access on CONTENT [Diagram: a bookstore document tree (book, whitepaper, title, author, chapter, id, paragraph) annotated with the index types that can drive access: B-tree index, function-based index (XPath), structured XMLIndex (structured content), unstructured XMLIndex (content), Oracle Text index]
  • 61. Structured XMLIndex (SXI) • CONTENT TABLE(s) • Based on XMLTABLE syntax • XMLTable construct can be nested: – VIRTUAL column alias • Can be maintained manually • Secondary indexes possible
  • 62. Describe CONTENT TABLE • A “regular” heap table with columns… • Ideal for secondary indexes, if needed.
  • 63. CONTENT TABLE(s) [Diagram: structured XMLIndex f(x) populating content tables]
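The idea behind the content table on slides 61 to 63 (project the structured parts of each document into a regular table, then index that table) can be mimicked outside the database. The sketch below is a Python analogy using `sqlite3` and `ElementTree`, not Oracle syntax; the table name `content_tab`, the index name and the sample documents are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical documents; element names are assumptions for illustration.
DOCS = {
    1: "<page><id>10</id><title>Alpha</title><text>free text A</text></page>",
    2: "<page><id>20</id><title>Beta</title><text>free text B</text></page>",
}


def project(doc_id, xml_text):
    """Project the structured parts of one document to a flat row,
    similar in spirit to an XMLTABLE ... COLUMNS clause."""
    root = ET.fromstring(xml_text)
    return (doc_id, int(root.findtext("id")), root.findtext("title"))


conn = sqlite3.connect(":memory:")
# The "content table": a regular table with one column per projected value.
conn.execute("CREATE TABLE content_tab (doc_id INTEGER, page_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO content_tab VALUES (?, ?, ?)",
    [project(doc_id, text) for doc_id, text in DOCS.items()],
)
# A secondary index on a projected column, as one would add on a heap table.
conn.execute("CREATE INDEX content_title_ix ON content_tab (title)")

rows = conn.execute(
    "SELECT doc_id, page_id FROM content_tab WHERE title = ?", ("Beta",)
).fetchall()
print(rows)
```

The query touches only the flat table and its index; the unstructured `text` element is never parsed at query time, which is the point of separating structured from unstructured content.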
  • 65. Unstructured XMLIndex (UXI) • PATH TABLE • Use path subsetting – Full-blown XMLIndex can be BIG • Token tables (XDB.X$......) – Query rewrite on tokens – Fuzzy searches, // – Optimizer statistics • Can be maintained manually – Recorded in pending table • Secondary indexes possible
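Why a full-blown path table can get big, and what path subsetting buys, can be shown with a simplified Python analogy. The real path table stores more than a path and a value per node; the function names and the `include` parameter below are hypothetical, and the sample document reuses the element names from slide 59.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<bookstore>"
    "<book><title>XML DB</title><author>Ann</author></book>"
    "<whitepaper><title>Indexing</title><author>Bob</author></whitepaper>"
    "</bookstore>"
)


def path_rows(element, prefix=""):
    """Yield (path, value) for every element that carries text."""
    path = f"{prefix}/{element.tag}"
    text = (element.text or "").strip()
    if text:
        yield path, text
    for child in element:
        yield from path_rows(child, path)


def build_path_table(xml_text, include=None):
    """Build path rows; `include` is a hypothetical path-subsetting filter
    that keeps only paths starting with one of the given prefixes."""
    rows = list(path_rows(ET.fromstring(xml_text)))
    if include is not None:
        rows = [row for row in rows if row[0].startswith(tuple(include))]
    return rows


full = build_path_table(DOC)
subset = build_path_table(DOC, include=["/bookstore/book/"])
print(len(full), subset)
```

Every text-carrying node produces a row in the full table, so its size grows with the document; restricting the index to the paths that queries actually use keeps only a fraction of those rows.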
  • 69. Binary XML – No Index
  • 70. Binary XML + XMLIndex (SXI)
  • 71. Binary XML + XMLIndex + Sec.Ind.
  • 72. Binary XML + XMLIndex + Sec.Ind.
  • 74. XML Full Text Index • Based on Oracle Text Index, XQuery Full Text • XML namespace aware • XML semantic-aware full-text search – Full-Text Selection Expression – contains text – Logical full-text operators – ftor, ftand, ftMildNot – Context-aware full-text search
  • 85. Balanced Design • Inserts, Updates & Deletes – XML Future Changes – Index Maintenance In Memory On Disk • Selects – In Memory – Via Indexes • XML Validation – Strict, Lazy – Client Side Possibilities
  • 86. Reward • Optimal performance • Outperforming XML • Proper design will give a performance increase over XML handling… …proper design is still key…
  • 88. References • Oracle XML DB – http://www.oracle.com/pls/db112/homepage • XML DB FAQ Thread – http://forums.oracle.com/forums/thread.jspa?threadID=410714 • Personal Blog – http://www.xmldb.nl – http://technology.amis.nl
  • 89. References • Daniela Florescu, Oracle Corporation: Advances in XML and XQuery • Sam Idicula, Oracle XML DB Development Team: Binary XML Storage and Query Processing in Oracle • Jinyu Wang, Scott Brewton: Making XML Technology Easier to Use • Joel Spolsky, Joel on Software: Back to Basics
  • 90. References Oracle XML DB Main page material • Oracle XML DB : Best Practices to Get Optimal Performance out of XML Queries (PDF) • Oracle XML DB : Choosing the Best XMLType Storage Option for Your Use Case (PDF) • A Request for Comments for the Oracle Binary XML Format

Editor's notes

  1. See also OOW 2010, S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text – Nipun Agarwal, Oracle