XML Schema Computations: Schema
    Compatibility Testing and Subschema Extraction

           Thomas Y.T. LEE and David W.L. Cheung

                  Department of Computer Science
                    The University of Hong Kong


                      October 28, 2010
                        CIKM 2010
                      Toronto, Canada




1
Outline



    Introduction and motivation


    Formal models for XML data and schemas


    Schema computational algorithms


    Experiments and conclusions




2
Data interoperability on web services
    For two web services to interoperate, the XML schema at the
    receiving end must accept every XML message that the sending
    end can produce.
        That is, the sending schema must be a subschema of the
        receiving schema.

    [Figure: Web Service A (Schema A) and Web Service B (Schema B), each with its set of XML instances; the two instance sets are related by set inclusion.]


4
W3C XML Schema and data standards

    1. W3C XML Schema (XSD) is the most popular schema
       language for defining data standards.
    2. For a new version of an XSD to be backward-compatible
       with the old version, the new version must be a
       superschema of the old version.
           The new schema must accept every instance of the old
           schema.
    3. However, a typical e-commerce standard XSD contains
       thousands of types and elements, which makes manual
       verification of compatibility impractical.
    4. When an XSD is too large, how can we extract a smaller
       subschema that is just large enough for a specific
       application to process?



5
Schema compatibility problems



    1. Given two XSDs, how can we verify that they are equivalent
       or that one is a subschema of the other?
    2. Given XSD A, how can we extract a smaller subschema B of A
       so that B recognizes only a subset of the elements
       recognized by A?
    3. In this research, we have developed formal models for
       XML data and schemas, as well as algorithms to solve
       these problems.




6
Data Tree (DT) to model XML data


    A DT is a tree where edges represent elements and nodes
    represent their contents.

    XML document:

      <Quote>
       <Line>
        <Desc>hPhone</Desc>
        <Price>499.9</Price>
       </Line>
       <Line>
        <Desc>iMat</Desc>
        <Price>999.9</Price>
       </Line>
      </Quote>

    Corresponding DT (node:content, edges labeled with element names):

      n0:ε
       +--<Quote>-- n1:ε
            +--<Line>-- n2:ε
            |    +--<Desc>--- n4:"hPhone"
            |    +--<Price>-- n5:"499.9"
            +--<Line>-- n3:ε
                 +--<Desc>--- n6:"iMat"
                 +--<Price>-- n7:"999.9"

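The mapping from an XML document to a DT can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `build_data_tree` and the (values, edges) representation are assumptions, nodes are numbered breadth-first to match the figure, and the empty string stands in for ε.

```python
from collections import deque
import xml.etree.ElementTree as ET

QUOTE_XML = """<Quote>
 <Line><Desc>hPhone</Desc><Price>499.9</Price></Line>
 <Line><Desc>iMat</Desc><Price>999.9</Price></Line>
</Quote>"""

def build_data_tree(xml_text):
    """Return (values, edges) for the data tree of an XML document.

    values[i] is the content of node n_i ('' stands for the empty value);
    edges is a list of (parent index, element name, child index).
    """
    root = ET.fromstring(xml_text)
    values = [""]                 # n0: virtual root above the document element
    edges = []
    queue = deque([(0, root)])    # breadth-first, so numbering matches the slide
    while queue:
        parent, elem = queue.popleft()
        child = len(values)
        # Elements with child elements carry no value in this sketch.
        values.append("" if len(elem) else (elem.text or "").strip())
        edges.append((parent, elem.tag, child))
        for sub in elem:
            queue.append((child, sub))
    return values, edges

values, edges = build_data_tree(QUOTE_XML)
for parent, label, child in edges:
    print(f"n{parent} --<{label}>--> n{child}:{values[child]!r}")
```

Each element becomes one labeled edge, and each leaf element's text becomes the content of the node the edge points to.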



8
Schema Automaton (SA) to model XML schemas

    1. An SA is a deterministic finite automaton (DFA) in which each
       state is associated with a regular expression (RE) and a set of
       values called its value domain (VDom).
    2. The DFA, called the vertical language (VLang), defines how the
       symbols are arranged along the paths from the root to the
       leaves.
       2.1 Each state represents an XSD data type and each symbol
           represents an element name.
    3. The RE of a state, called its horizontal language (HLang),
       defines how child elements can be arranged under an XSD
       data type, i.e., its content model.
    4. The value domain defines the set of all possible values an
       element can contain.



9
Example SA

    VLang transitions:
      q0 --<Quote>-->   q1      q0 --<Order>--> q2
      q1 --<Line>-->    q3      q2 --<Line>-->  q4
      q3 --<Desc>-->    q5      q3 --<Price>--> q6
      q4 --<Product>--> q7      q4 --<Qty>-->   q8
      q7 --<Desc>-->    q5      q7 --<Price>--> q6

      q     HLang(q)           VDom(q)
      q0    <Quote>|<Order>    { }
      q1    <Line>+            { }
      q2    <Line>+            { }
      q3    <Desc><Price>      { }
      q4    <Product><Qty>     { }
      q5    { }                STRINGS
      q6    { }                DECIMALS
      q7    <Desc><Price>      { }
      q8    { }                INTEGERS

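To make the roles of VLang, HLang and VDom concrete, the example SA can be encoded as plain Python dictionaries and used to validate a document. This is a minimal sketch under stated assumptions, not the authors' implementation: content models are written as Python regexes over space-terminated child names, states shown with VDom { } are assumed to accept no text, the lexical checks for DECIMALS and INTEGERS are simplified, and the names `DELTA`, `HLANG`, `VDOM`, `check` and `accepts` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# VLang: (state, element name) -> next state.
DELTA = {
    ("q0", "Quote"): "q1", ("q0", "Order"): "q2",
    ("q1", "Line"): "q3", ("q2", "Line"): "q4",
    ("q3", "Desc"): "q5", ("q3", "Price"): "q6",
    ("q4", "Product"): "q7", ("q4", "Qty"): "q8",
    ("q7", "Desc"): "q5", ("q7", "Price"): "q6",
}

# HLang: regex over the sequence of child names, each name followed by a space.
HLANG = {
    "q0": r"(Quote |Order )", "q1": r"(Line )+", "q2": r"(Line )+",
    "q3": r"Desc Price ", "q4": r"Product Qty ", "q7": r"Desc Price ",
    "q5": r"", "q6": r"", "q8": r"",
}

def no_text(s):
    return s == ""

def any_string(s):
    return True

def is_decimal(s):
    return re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)", s) is not None

def is_integer(s):
    return re.fullmatch(r"[+-]?\d+", s) is not None

# VDom: predicate on the (stripped) text of an element.
VDOM = {
    "q0": no_text, "q1": no_text, "q2": no_text, "q3": no_text,
    "q4": no_text, "q7": no_text,
    "q5": any_string, "q6": is_decimal, "q8": is_integer,
}

def check(state, children, text):
    """Check one node: child sequence against HLang, text against VDom, then recurse."""
    word = "".join(child.tag + " " for child in children)
    if re.fullmatch(HLANG[state], word) is None:
        return False
    if not VDOM[state](text.strip()):
        return False
    for child in children:
        nxt = DELTA.get((state, child.tag))
        if nxt is None or not check(nxt, list(child), child.text or ""):
            return False
    return True

def accepts(xml_text):
    """True if the document is an instance of the example SA (start state q0)."""
    root = ET.fromstring(xml_text)
    return check("q0", [root], "")

print(accepts("<Quote><Line><Desc>hPhone</Desc><Price>499.9</Price></Line></Quote>"))
```

The start state q0 is checked against a virtual root whose only child is the document element, mirroring node n0 of the DT.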



10
Schema compatibility testing

     1. There are two tests: schema equivalence testing and subschema
        testing.
     2. Both involve schema minimization.
        2.1 All useless states (data types) are removed first. A useless
            state is an inaccessible state or a state which does not
            recognize any element with a finite number of descendants.
        2.2 The process is like DFA minimization, but the HLang and
            VDom of each state are considered when deciding whether
            two states can be merged.
     3. We have proved that two SAs (XSDs) are equivalent iff their
        minimized forms have isomorphic VLang DFAs and all
        corresponding HLangs and VDoms are equivalent.
     4. We have developed an algorithm to verify whether an SA is a
        subschema of another SA.



12
Useless states

     [Figure: SA transition diagram over states q0 to q9, with edges labeled A, B and C.]

      q     HLang(q)     VDom(q)
      q0    A{2,5}BC?    STRINGS
      q1    C*           STRINGS
      q2    { }          INTEGERS
      q3    A*           STRINGS
      q4    B+           STRINGS
      q5    C            STRINGS
      q6    A+B*         INTEGERS
      q7    A?           STRINGS
      q8    B*           STRINGS
      q9    { }          DECIMALS

     1. q7 and q8 are inaccessible.
     2. q5 and q6 are irrational because every element they recognize
        must have infinitely many descendants.
     3. q9 is useless because it is blocked by irrational states.
     4. q4 is useless because it must lead to an irrational state.
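Useless-state removal can be sketched as two fixpoint passes: first find the states that recognize at least one finite element, then keep only those reachable from the start state through such states. The code below is an illustration, not the authors' algorithm. Content models are small tuple-based ASTs (an assumed encoding), VDoms are ignored, and the transitions in `DELTA` are an assumption chosen to be consistent with the HLang table and the four notes, since the diagram's edges did not survive extraction. A{2,5} is spelled out as two required and three optional A's.

```python
def sym(name):
    return ("sym", name)

EPS = ("seq",)  # the empty child sequence
A, B, C = sym("A"), sym("B"), sym("C")

def has_word(r, allowed):
    """True if content model r has at least one word using only symbols in `allowed`."""
    kind = r[0]
    if kind == "sym":
        return r[1] in allowed
    if kind == "seq":
        return all(has_word(p, allowed) for p in r[1:])
    if kind == "alt":
        return any(has_word(p, allowed) for p in r[1:])
    if kind in ("star", "opt"):
        return True                      # the empty sequence always qualifies
    if kind == "plus":
        return has_word(r[1], allowed)
    raise ValueError(kind)

def useful_states(start, delta, hlang):
    """States that are accessible and recognize some element with finitely many descendants."""
    productive, changed = set(), True
    while changed:                       # least fixpoint: grow from the leaves upward
        changed = False
        for q in hlang:
            if q in productive:
                continue
            allowed = {a for (p, a), t in delta.items() if p == q and t in productive}
            if has_word(hlang[q], allowed):
                productive.add(q)
                changed = True
    if start not in productive:
        return set()
    seen, stack = {start}, [start]
    while stack:                         # reachability through productive states only
        q = stack.pop()
        for (p, a), t in delta.items():
            if p == q and t in productive and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

# Assumed transitions (the slide's diagram is not recoverable).
DELTA = {
    ("q0", "A"): "q1", ("q0", "B"): "q2", ("q0", "C"): "q4",
    ("q1", "C"): "q3", ("q3", "A"): "q2",
    ("q4", "B"): "q5", ("q5", "C"): "q6",
    ("q6", "A"): "q5", ("q6", "B"): "q9",
    ("q7", "A"): "q8", ("q8", "B"): "q7",
}
HLANG = {
    "q0": ("seq", A, A, ("opt", A), ("opt", A), ("opt", A), B, ("opt", C)),
    "q1": ("star", C), "q2": EPS, "q3": ("star", A), "q4": ("plus", B),
    "q5": C, "q6": ("seq", ("plus", A), ("star", B)),
    "q7": ("opt", A), "q8": ("star", B), "q9": EPS,
}

USEFUL = useful_states("q0", DELTA, HLANG)
print(sorted(USEFUL))
```

With these assumed transitions the surviving states are q0 to q3, in line with the four notes: q5 and q6 each require a child that leads to the other, q4 requires a child leading to q5, q9 is reachable only through q6, and q7 and q8 are never reached from q0.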


13
Schema minimization and equivalence
Schema A

    VLang transitions:
      q0 --<Quote>-->   q1      q0 --<Order>--> q2
      q1 --<Line>-->    q3      q2 --<Line>-->  q4
      q3 --<Desc>-->    q5      q3 --<Price>--> q6
      q4 --<Product>--> q7      q4 --<Qty>-->   q8
      q7 --<Desc>-->    q5      q7 --<Price>--> q6

      q     HLang(q)         VDom(q)
      q0    Quote | Order    { }
      q1    Line +           { }
      q2    Line +           { }
      q3    Desc Price       { }
      q4    Product Qty      { }
      q5    { }              STRS
      q6    { }              DECS
      q7    Desc Price       { }
      q8    { }              INTS

         1. q3 and q7 can be merged into q9.
         2. The two SAs are equivalent.

Schema B

    VLang transitions:
      q0 --<Quote>-->   q1      q0 --<Order>--> q2
      q1 --<Line>-->    q9      q2 --<Line>-->  q4
      q9 --<Desc>-->    q5      q9 --<Price>--> q6
      q4 --<Product>--> q9      q4 --<Qty>-->   q8

      q     HLang(q)         VDom(q)
      q0    Quote | Order    { }
      q1    Line +           { }
      q2    Line +           { }
      q9    Desc Price       { }
      q4    Product Qty      { }
      q5    { }              STRS
      q6    { }              DECS
      q8    { }              INTS
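The merging step can be sketched as Moore-style partition refinement in which the initial partition groups states by their (HLang, VDom) pair rather than by accepting versus non-accepting. This is a simplified illustration, not the authors' algorithm: it compares HLangs and VDoms as literal strings instead of testing language equivalence, it assumes useless states have already been removed, and the function name `minimize` is hypothetical.

```python
def minimize(states, delta, hlang, vdom):
    """Partition `states` into classes of mergeable states (list of frozensets)."""
    # Initial labels: states can only merge if HLang and VDom agree.
    label = {q: (hlang[q], vdom[q]) for q in states}
    while True:
        ids = {}
        block = {q: ids.setdefault(label[q], len(ids)) for q in states}
        # Refine: a state's new label is its block plus the blocks it moves to.
        refined = {
            q: (block[q],
                tuple(sorted((a, block[t]) for (p, a), t in delta.items() if p == q)))
            for q in states
        }
        if len(set(refined.values())) == len(ids):
            break                        # no block was split: partition is stable
        label = refined
    classes = {}
    for q in states:
        classes.setdefault(block[q], set()).add(q)
    return [frozenset(c) for c in classes.values()]

# Schema A from the slide.
STATES = ["q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8"]
DELTA = {
    ("q0", "Quote"): "q1", ("q0", "Order"): "q2",
    ("q1", "Line"): "q3", ("q2", "Line"): "q4",
    ("q3", "Desc"): "q5", ("q3", "Price"): "q6",
    ("q4", "Product"): "q7", ("q4", "Qty"): "q8",
    ("q7", "Desc"): "q5", ("q7", "Price"): "q6",
}
HLANG = {
    "q0": "Quote | Order", "q1": "Line +", "q2": "Line +",
    "q3": "Desc Price", "q4": "Product Qty", "q7": "Desc Price",
    "q5": "", "q6": "", "q8": "",
}
VDOM = {
    "q0": "", "q1": "", "q2": "", "q3": "", "q4": "", "q7": "",
    "q5": "STRS", "q6": "DECS", "q8": "INTS",
}

CLASSES = minimize(STATES, DELTA, HLANG, VDOM)
for cls in CLASSES:
    print(sorted(cls))
```

On Schema A this merges only q3 and q7, which corresponds to state q9 of Schema B. q1 and q2 share the same HLang and VDom but stay apart because their Line transitions lead to different classes.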


  14
Subschema testing
Schema A

    VLang transitions:
      q0 --<Quote>-->   q1      q0 --<Order>--> q2
      q1 --<Line>-->    q9      q2 --<Line>-->  q4
      q9 --<Desc>-->    q5      q9 --<Price>--> q6
      q4 --<Product>--> q9      q4 --<Qty>-->   q8

      q     HLang(q)         VDom(q)
      q0    Quote | Order    { }
      q1    Line +           { }
      q2    Line +           { }
      q9    Desc Price       { }
      q4    Product Qty      { }
      q5    { }              STRS
      q6    { }              DECS
      q8    { }              INTS

B is a subschema of A.
 1. HLang(q0_B) ⊆ HLang(q0_A) and VDom(q0_B) = VDom(q0_A).
 2. HLang(q6_B) = HLang(q6_A) and VDom(q6_B) ⊆ VDom(q6_A).
 3. HLang(qi_B) = HLang(qi_A) and VDom(qi_B) = VDom(qi_A), for i = 1, 5, 9.

Schema B

    VLang transitions:
      q0 --<Quote>--> q1
      q1 --<Line>-->  q9
      q9 --<Desc>-->  q5      q9 --<Price>--> q6

      q     HLang(q)      VDom(q)
      q0    Quote         { }
      q1    Line +        { }
      q9    Desc Price    { }
      q5    { }           STRS
      q6    { }           INTS
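The pairwise comparison above suggests a lockstep walk over the two automata: starting from both start states, follow the same element name in B and in A, and at every state pair check HLang and VDom inclusion. The sketch below is not the authors' algorithm and is not a proof procedure. HLang inclusion is only checked by brute force on child sequences of up to `max_len` names, the ordering INTS ⊆ DECS ⊆ STRS is an assumed lexical ordering, and all names (`hlang_subset`, `is_subschema`, `SCHEMA_A`, `SCHEMA_B`) are hypothetical.

```python
import re
from itertools import product

# Assumed ordering of value domains; "NONE" stands for the { } entries.
VDOM_SUPERSETS = {
    "NONE": {"NONE"},
    "INTS": {"INTS", "DECS", "STRS"},
    "DECS": {"DECS", "STRS"},
    "STRS": {"STRS"},
}

def hlang_subset(re_b, re_a, alphabet, max_len=4):
    """Bounded check: every word of re_b with up to max_len names also matches re_a."""
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            text = "".join(name + " " for name in word)
            if re.fullmatch(re_b, text) and not re.fullmatch(re_a, text):
                return False
    return True

def is_subschema(b, a, alphabet):
    """Walk B and A in lockstep from q0 and compare every reachable state pair."""
    seen, stack = set(), [("q0", "q0")]
    while stack:
        qb, qa = stack.pop()
        if (qb, qa) in seen:
            continue
        seen.add((qb, qa))
        if a["vdom"][qa] not in VDOM_SUPERSETS[b["vdom"][qb]]:
            return False
        if not hlang_subset(b["hlang"][qb], a["hlang"][qa], alphabet):
            return False
        for (p, name), tb in b["delta"].items():
            if p == qb:
                ta = a["delta"].get((qa, name))
                if ta is None:           # assumes B has no unused transitions
                    return False
                stack.append((tb, ta))
    return True

ALPHABET = ["Quote", "Order", "Line", "Desc", "Price", "Product", "Qty"]
SCHEMA_A = {
    "delta": {("q0", "Quote"): "q1", ("q0", "Order"): "q2",
              ("q1", "Line"): "q9", ("q2", "Line"): "q4",
              ("q9", "Desc"): "q5", ("q9", "Price"): "q6",
              ("q4", "Product"): "q9", ("q4", "Qty"): "q8"},
    "hlang": {"q0": r"(Quote |Order )", "q1": r"(Line )+", "q2": r"(Line )+",
              "q9": r"Desc Price ", "q4": r"Product Qty ",
              "q5": r"", "q6": r"", "q8": r""},
    "vdom": {"q0": "NONE", "q1": "NONE", "q2": "NONE", "q9": "NONE",
             "q4": "NONE", "q5": "STRS", "q6": "DECS", "q8": "INTS"},
}
SCHEMA_B = {
    "delta": {("q0", "Quote"): "q1", ("q1", "Line"): "q9",
              ("q9", "Desc"): "q5", ("q9", "Price"): "q6"},
    "hlang": {"q0": r"Quote ", "q1": r"(Line )+", "q9": r"Desc Price ",
              "q5": r"", "q6": r""},
    "vdom": {"q0": "NONE", "q1": "NONE", "q9": "NONE",
             "q5": "STRS", "q6": "INTS"},
}

print(is_subschema(SCHEMA_B, SCHEMA_A, ALPHABET))
print(is_subschema(SCHEMA_A, SCHEMA_B, ALPHABET))
```

A real implementation would replace the bounded enumeration with an exact regular-expression inclusion test, which is where the cost of the problem lies.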



  15
Subschema extraction

     We have developed the subschema extraction algorithm:
         Given SA (XSD) A and a set of symbols (element names) Z,
         compute an SA which accepts all instances (XML documents)
         of A except those containing some symbols not in Z.

     Example SA:

       VLang transitions:
         q0 --<Quote>-->   q1      q0 --<Order>--> q7
         q1 --<Line>-->    q2      q7 --<Line>-->  q3
         q2 --<Desc>-->    q4      q2 --<Price>--> q5
         q3 --<Product>--> q2      q3 --<Qty>-->   q6

         q     HLang(q)           VDom(q)
         q0    <Quote>|<Order>    { }
         q1    <Line>+            { }
         q7    <Line>+            { }
         q2    <Desc><Price>      { }
         q3    <Product><Qty>     { }
         q4    { }                STRINGS
         q5    { }                DECIMALS
         q6    { }                INTEGERS

         Z = {<Quote>, <Line>, <Desc>, <Price>, <Order>, <Qty>}, where <Product> is
         excluded.
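One way to realize this definition is to restrict every content model to the symbols in Z and then discard the states that can no longer recognize any finite element or can no longer be reached. The sketch below follows that idea on the example SA. It is an illustration under assumptions, not the authors' algorithm: content models are tuple-based ASTs (an assumed encoding), VDoms are left out because extraction does not change them here, and `restrict`, `symbols` and `extract` are hypothetical names.

```python
EPS = ("seq",)  # the empty child sequence

def sym(name):
    return ("sym", name)

def restrict(r, allowed):
    """Restrict content model r to words over `allowed`; None means no word is left."""
    kind = r[0]
    if kind == "sym":
        return r if r[1] in allowed else None
    if kind == "seq":
        parts = [restrict(p, allowed) for p in r[1:]]
        return None if any(p is None for p in parts) else ("seq", *parts)
    if kind == "alt":
        parts = [p for p in (restrict(x, allowed) for x in r[1:]) if p is not None]
        if not parts:
            return None
        return parts[0] if len(parts) == 1 else ("alt", *parts)
    if kind not in ("plus", "star", "opt"):
        raise ValueError(kind)
    inner = restrict(r[1], allowed)
    if kind == "plus":
        return None if inner is None else ("plus", inner)
    return EPS if inner is None else (kind, inner)

def symbols(r):
    """Element names that occur in content model r."""
    if r[0] == "sym":
        return {r[1]}
    return set().union(*(symbols(p) for p in r[1:]))

def extract(start, delta, hlang, z):
    """Return (delta, hlang) of the subschema over symbol set z, or None if it is empty."""
    def allowed(q, live):
        return {a for (p, a), t in delta.items() if p == q and a in z and t in live}

    live, changed = set(), True
    while changed:                       # states that still recognize a finite element
        changed = False
        for q in hlang:
            if q not in live and restrict(hlang[q], allowed(q, live)) is not None:
                live.add(q)
                changed = True
    if start not in live:
        return None
    new_delta, new_hlang, stack = {}, {}, [start]
    while stack:                         # keep only what is reachable from the start
        q = stack.pop()
        if q in new_hlang:
            continue
        new_hlang[q] = restrict(hlang[q], allowed(q, live))
        for a in symbols(new_hlang[q]):
            new_delta[(q, a)] = delta[(q, a)]
            stack.append(delta[(q, a)])
    return new_delta, new_hlang

DELTA = {
    ("q0", "Quote"): "q1", ("q0", "Order"): "q7",
    ("q1", "Line"): "q2", ("q7", "Line"): "q3",
    ("q2", "Desc"): "q4", ("q2", "Price"): "q5",
    ("q3", "Product"): "q2", ("q3", "Qty"): "q6",
}
HLANG = {
    "q0": ("alt", sym("Quote"), sym("Order")),
    "q1": ("plus", sym("Line")), "q7": ("plus", sym("Line")),
    "q2": ("seq", sym("Desc"), sym("Price")),
    "q3": ("seq", sym("Product"), sym("Qty")),
    "q4": EPS, "q5": EPS, "q6": EPS,
}
Z = {"Quote", "Line", "Desc", "Price", "Order", "Qty"}

SUB_DELTA, SUB_HLANG = extract("q0", DELTA, HLANG, Z)
print(sorted(SUB_HLANG))
```

In this sketch, excluding <Product> leaves q3 without any valid child sequence, which in turn makes q7 and the <Order> branch of q0 unusable, so only the quote part of the schema (q0, q1, q2, q4, q5) remains even though <Order> and <Qty> are in Z.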



16
xCBL compatibility testing experiment

     1. Data sets: XML Common Business Library (xCBL)

         XSD         file size   no. of files   data types   element names   doc. types
         xCBL 3.0    1.8MB       413            1,290        3,728           42
         xCBL 3.5    2.0MB       496            1,476        4,473           51

     2. The subschema testing program has disproved the claim on
        xCBL.org:
        "The only modifications allowed to xCBL 3.0 documents were the
        additions of new optional elements and additions to code lists; to
        maintain interoperability between the two versions. An xCBL 3.0
        instance of a document is also a valid instance in xCBL 3.5."
     3. xCBL 3.5 is not a superschema of xCBL 3.0.
     4. The experiment took only 272ms when the quick RE test
        was applied.
            Machine: Q6600@2.40GHz, 4GB RAM, Linux OS


18
Schema size reduction by subschema extraction
     1. The subschema extraction program was run to extract
        different subschemas from xCBL. Each subschema
        recognizes a different element subset for a specific
        application, e.g., order, invoice, etc.
     2. The schema size was reduced to 6–32% of the original size.
     3. The time required by XMLBeans to compile a subschema was
        reduced to 34–50% of the time originally required.
     4. The time to extract such a subschema was only 2–3s.
     [Figure: Subschema extraction from xCBL 3.5. For the original schema and the invoice, order, quote, auction and catalog subschemas, the chart plots #element names, #types and #element declarations (left axis: number, 0 to 5000) and XMLBeans compilation time (right axis: time in seconds, 0 to 35).]

19
Conclusions
     1. We have developed:
            formal models for XML and XSD, and
            algorithms for schema equivalence and subschema testing,
            and subschema extraction.
     2. The underlying problems are PSPACE-complete because they
        require comparisons of regular expressions.
            We have developed a heuristic (quick RE test) to make these
            algorithms run fast on very large schemas.
     3. Our experiments:
            have shown that xCBL 3.5 is in fact not backward-compatible
            with xCBL 3.0, and
            have extracted small subschemas from xCBL for different
            instance subsets, which substantially reduces processing
            time on these subschemas.
     4. These models can be extended for other applications:
            web service adaptor for legacy systems (text to XML
            transformation), and
            schema inferrer from XML instances.
20

 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

XML Schema Computations: Schema Compatibility Testing and Subschema Extraction

  • 1. XML Schema Computations: Schema Compatibility Testing and Subschema Extraction. Thomas Y.T. LEE and David W.L. Cheung, Department of Computer Science, The University of Hong Kong. October 28, 2010, CIKM 2010, Toronto, Canada.
  • 2. Outline: Introduction and motivation; Formal models for XML data and schemas; Schema computational algorithms; Experiments and conclusions.
  • 4. Data interoperability on web services
    For two web services to be interoperable, the XML schema on the message-receiving end must accept every XML message that the sending end can produce. In other words, the sending schema must be a subschema of the receiving schema.
    [Figure: the XML instances of Schema A (Web Service A) are a subset (⊆) of the XML instances of Schema B (Web Service B).]
  • 5. W3C XML Schema and data standards
    1. W3C XML Schema (XSD) is the most popular schema language for defining data standards.
    2. For a new version of an XSD to be backward-compatible with the old version, the new version must be a superschema of the old one: the new schema must accept every instance of the old schema.
    3. However, a typical e-commerce standard XSD contains thousands of types and elements, which makes manual verification of compatibility practically infeasible.
    4. When an XSD is too large, how can we extract a smaller subschema that is just large enough for a specific application to process?
  • 6. Schema compatibility problems
    1. Given two XSDs, how can we verify that they are equivalent, or that one is a subschema of the other?
    2. Given an XSD A, how can we extract a smaller subschema B of A such that B recognizes only a subset of the elements recognized by A?
    3. In this research, we have developed formal models for XML data and schemas, as well as algorithms to solve these problems.
  • 8. Data Tree (DT) to model XML data
    A DT is a tree where edges represent elements and nodes represent their contents.
    Example XML document:
      <Quote>
        <Line>
          <Desc>hPhone</Desc>
          <Price>499.9</Price>
        </Line>
        <Line>
          <Desc>iMat</Desc>
          <Price>999.9</Price>
        </Line>
      </Quote>
    Corresponding DT (each node is shown as name: content; each edge is labeled with an element name):
      n0: ε
        <Quote> → n1: ε
          <Line> → n2: ε
            <Desc> → n4: "hPhone"
            <Price> → n5: "499.9"
          <Line> → n3: ε
            <Desc> → n6: "iMat"
            <Price> → n7: "999.9"
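The mapping from an XML document to a DT can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the authors' implementation; the `Node` class and the `to_data_tree` function are names invented here, and attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data-tree node: a text value plus ordered, labeled edges to children."""
    value: str = ""  # "" plays the role of the empty content ε
    edges: list = field(default_factory=list)  # (element_name, child Node) pairs


def to_data_tree(xml_text: str) -> Node:
    """Build a tree whose edges are elements and whose nodes are their contents."""
    def build(elem: ET.Element) -> Node:
        children = list(elem)
        # Leaf elements keep their text; elements with children get ε.
        node = Node(value="" if children else (elem.text or "").strip())
        for child in children:
            node.edges.append((child.tag, build(child)))
        return node

    root = Node()  # n0: the node above the document element
    doc = ET.fromstring(xml_text)
    root.edges.append((doc.tag, build(doc)))
    return root


QUOTE = """<Quote>
  <Line><Desc>hPhone</Desc><Price>499.9</Price></Line>
  <Line><Desc>iMat</Desc><Price>999.9</Price></Line>
</Quote>"""

tree = to_data_tree(QUOTE)
```

Here `tree` corresponds to n0, its single `Quote` edge leads to n1, and the two `Line` edges below n1 lead to n2 and n3.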
  • 9. Schema Automaton (SA) to model XML schemas
    1. An SA is a deterministic finite automaton (DFA) in which each state is associated with a regular expression (RE) and a set of values called the value domain (VDom).
    2. The DFA, called the vertical language (VLang), defines how symbols are arranged along the paths from the root to the leaves.
       2.1 Each state represents an XSD data type and each symbol represents an element name.
    3. The RE of a state, called the horizontal language (HLang), defines how child elements can be arranged under an XSD data type, i.e., its content model.
    4. The value domain defines the set of all possible values that an element can contain.
  • 10. Example SA
    [Figure: state diagram of the example SA with states q0 to q8 and edges labeled <Quote>, <Order>, <Line>, <Desc>, <Price>, <Product>, and <Qty>.]
    q    HLang(q)           VDom(q)
    q0   <Quote>|<Order>    { }
    q1   <Line>+            { }
    q2   <Line>+            { }
    q3   <Desc><Price>      { }
    q4   <Product><Qty>     { }
    q5   { }                STRINGS
    q6   { }                DECIMALS
    q7   <Desc><Price>      { }
    q8   { }                INTEGERS
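To make the three ingredients of an SA concrete, the following Python sketch checks a data tree against the example above. Several things are assumptions of the sketch rather than facts from the slides: the transition table `DELTA` is read off the flattened diagram, a VDom shown as "{ }" is treated as "only empty content", HLangs are encoded as Python regular expressions over element names that are each followed by a space, and a data tree is a plain `(value, [(name, child), ...])` tuple.

```python
import re

# Vertical language: (state, element name) -> state. Assumed from the diagram.
DELTA = {
    ("q0", "Quote"): "q1", ("q0", "Order"): "q2",
    ("q1", "Line"): "q3", ("q2", "Line"): "q4",
    ("q3", "Desc"): "q5", ("q3", "Price"): "q6",
    ("q4", "Product"): "q7", ("q4", "Qty"): "q8",
    ("q7", "Desc"): "q5", ("q7", "Price"): "q6",
}

# Horizontal language: a regex over the sequence of child names, where each
# name is followed by one space (so "Line Line " is two <Line> children).
HLANG = {
    "q0": r"Quote |Order ", "q1": r"(Line )+", "q2": r"(Line )+",
    "q3": r"Desc Price ", "q4": r"Product Qty ", "q7": r"Desc Price ",
    "q5": r"", "q6": r"", "q8": r"",
}

# Value domains as predicates on the node content.
def only_empty(v): return v == ""
def any_string(v): return True
def decimal(v): return re.fullmatch(r"[+-]?\d+(\.\d+)?", v) is not None
def integer(v): return re.fullmatch(r"[+-]?\d+", v) is not None

VDOM = {q: only_empty for q in ("q0", "q1", "q2", "q3", "q4", "q7")}
VDOM.update(q5=any_string, q6=decimal, q8=integer)


def accepts(state, node):
    """True if the subtree `node` is valid for the data type `state`."""
    value, edges = node
    if not VDOM[state](value):
        return False
    word = "".join(name + " " for name, _ in edges)
    if re.fullmatch(HLANG[state], word) is None:
        return False
    return all(
        (state, name) in DELTA and accepts(DELTA[(state, name)], child)
        for name, child in edges
    )


def line(desc, price):
    return ("", [("Desc", (desc, [])), ("Price", (price, []))])

good = ("", [("Quote", ("", [("Line", line("hPhone", "499.9")),
                            ("Line", line("iMat", "999.9"))]))])
bad_price = ("", [("Quote", ("", [("Line", line("hPhone", "cheap"))]))])
bad_order = ("", [("Order", ("", [("Line", line("hPhone", "499.9"))]))])
```

With these definitions, `accepts("q0", good)` holds, while `bad_price` fails the VDom check at q6 and `bad_order` fails the HLang check at q4, which expects `<Product><Qty>`.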
  • 12. Schema compatibility testing
    1. Two tests are considered: schema equivalence testing and subschema testing.
    2. Both involve schema minimization.
       2.1 All useless states (data types) are removed first. A useless state is an inaccessible state or a state that does not recognize any element with a finite number of descendants.
       2.2 The process resembles DFA minimization, except that the HLang and VDom of each state are also considered when deciding whether two states can be merged.
    3. We have proved that two SAs (XSDs) are equivalent iff their minimized forms have isomorphic VLang DFAs and all corresponding HLangs and VDoms are equivalent.
    4. We have developed an algorithm to verify whether an SA is a subschema of another SA.
  • 13. Useless states
    [Figure: state diagram over states q0 to q9 with edges labeled A, B, and C.]
    q    HLang(q)     VDom(q)
    q0   A{2,5}BC?    STRINGS
    q1   C*           STRINGS
    q2   { }          INTEGERS
    q3   A*           STRINGS
    q4   B+           STRINGS
    q5   C            STRINGS
    q6   A+B*         INTEGERS
    q7   A?           STRINGS
    q8   B*           STRINGS
    q9   { }          DECIMALS
    1. q7 and q8 are inaccessible.
    2. q5 and q6 are irrational because they generate infinite children.
    3. q9 is useless because it is blocked by irrational states.
    4. q4 is useless because it must lead to an irrational state.
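One way to read the definition of useless states is as two passes: a least-fixpoint computation of the states that can produce a finite subtree, followed by a reachability search that only walks through such states. The Python sketch below follows that reading. It is not the authors' algorithm, and because the slide's diagram did not survive, it uses a small hypothetical automaton (`s0` to `s4`) rather than q0 to q9. HLangs are written as a tiny regex AST invented for this sketch.

```python
# HLang AST: ("eps",), ("sym", a), ("seq", r1, ...), ("alt", r1, ...),
# ("star", r), ("plus", r), ("opt", r)

def has_word(r, allowed):
    """True if the language of r contains a word using only `allowed` symbols."""
    kind = r[0]
    if kind == "eps":
        return True
    if kind == "sym":
        return r[1] in allowed
    if kind == "seq":
        return all(has_word(x, allowed) for x in r[1:])
    if kind == "alt":
        return any(has_word(x, allowed) for x in r[1:])
    if kind in ("star", "opt"):
        return True  # the empty word is always available
    if kind == "plus":
        return has_word(r[1], allowed)
    raise ValueError(kind)


def useful_states(start, delta, hlang):
    """States that are productive and reachable through productive states."""
    productive = set()
    changed = True
    while changed:  # least fixpoint
        changed = False
        for q in hlang:
            if q in productive:
                continue
            ok = {a for (p, a), t in delta.items() if p == q and t in productive}
            if has_word(hlang[q], ok):
                productive.add(q)
                changed = True
    if start not in productive:
        return set()
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for (p, a), t in delta.items():
            if p == q and t in productive and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


# Hypothetical example: s2 always needs another s2 child (never finite),
# s3 is reachable only through s2, and s4 has no incoming edge.
delta = {("s0", "A"): "s1", ("s0", "B"): "s2",
         ("s2", "C"): "s2", ("s2", "D"): "s3"}
hlang = {
    "s0": ("seq", ("sym", "A"), ("opt", ("sym", "B"))),
    "s1": ("eps",),
    "s2": ("seq", ("sym", "C"), ("opt", ("sym", "D"))),
    "s3": ("eps",),
    "s4": ("eps",),
}
useful = useful_states("s0", delta, hlang)
```

In this example only `s0` and `s1` survive: `s2` plays the role of an irrational state, `s3` is blocked by it, and `s4` is inaccessible.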
  • 14. Schema minimization and equivalence
    Schema A
    [Figure: state diagram of Schema A with states q0 to q8 and edges labeled <Quote>, <Order>, <Line>, <Desc>, <Price>, <Product>, and <Qty>.]
    q    HLang(q)         VDom(q)
    q0   Quote | Order    { }
    q1   Line +           { }
    q2   Line +           { }
    q3   Desc Price       { }
    q4   Product Qty      { }
    q5   { }              STRS
    q6   { }              DECS
    q7   Desc Price       { }
    q8   { }              INTS
    1. q3 and q7 can be merged into q9.
    2. The two SAs are equivalent.
    Schema B
    [Figure: state diagram of Schema B with states q0, q1, q2, q4, q5, q6, q8, and q9 and the same edge labels.]
    q    HLang(q)         VDom(q)
    q0   Quote | Order    { }
    q1   Line +           { }
    q2   Line +           { }
    q9   Desc Price       { }
    q4   Product Qty      { }
    q5   { }              STRS
    q6   { }              DECS
    q8   { }              INTS
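The merge of q3 and q7 can be reproduced with a partition-refinement loop in the style of DFA minimization. The sketch below is a simplification made for illustration: HLangs and VDoms are compared as plain strings, whereas the talk requires language equivalence of the regular expressions, and the transition table is an assumption read off the flattened diagram of Schema A. Useless states are assumed to have been removed already.

```python
def number(keys):
    """Map each state to a small integer identifying its signature."""
    ids = {}
    return {q: ids.setdefault(k, len(ids)) for q, k in keys.items()}


def mergeable_groups(states, delta, hlang, vdom):
    """Return groups of states that can be merged, as sorted lists."""
    symbols = sorted({a for (_, a) in delta})
    # Start from "same HLang and same VDom" (syntactic comparison here).
    block = number({q: (hlang[q], vdom[q]) for q in states})
    while True:
        # Split blocks whose members reach different blocks on some symbol.
        refined = number({
            q: (block[q],
                tuple(block.get(delta.get((q, a))) for a in symbols))
            for q in states
        })
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    groups = {}
    for q in states:
        groups.setdefault(block[q], set()).add(q)
    return sorted(sorted(g) for g in groups.values())


# Schema A; transitions are an assumption of this sketch.
delta = {
    ("q0", "Quote"): "q1", ("q0", "Order"): "q2",
    ("q1", "Line"): "q3", ("q2", "Line"): "q4",
    ("q3", "Desc"): "q5", ("q3", "Price"): "q6",
    ("q4", "Product"): "q7", ("q4", "Qty"): "q8",
    ("q7", "Desc"): "q5", ("q7", "Price"): "q6",
}
hlang = {"q0": "Quote|Order", "q1": "Line+", "q2": "Line+",
         "q3": "Desc Price", "q4": "Product Qty", "q5": "", "q6": "",
         "q7": "Desc Price", "q8": ""}
vdom = {"q0": "{}", "q1": "{}", "q2": "{}", "q3": "{}", "q4": "{}",
        "q5": "STRS", "q6": "DECS", "q7": "{}", "q8": "INTS"}

groups = mergeable_groups(sorted(hlang), delta, hlang, vdom)
```

On this input q3 and q7 end up in the same group. q1 and q2 stay apart even though both have HLang `Line +`, because their `Line` transitions lead to states with different content models.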
  • 15. Subschema testing
    Schema A
    [Figure: state diagram of Schema A with states q0, q1, q2, q4, q5, q6, q8, and q9.]
    q    HLang(q)         VDom(q)
    q0   Quote | Order    { }
    q1   Line +           { }
    q2   Line +           { }
    q9   Desc Price       { }
    q4   Product Qty      { }
    q5   { }              STRS
    q6   { }              DECS
    q8   { }              INTS
    Schema B
    [Figure: state diagram of Schema B with states q0, q1, q9, q5, and q6.]
    q    HLang(q)         VDom(q)
    q0   Quote            { }
    q1   Line +           { }
    q9   Desc Price       { }
    q5   { }              STRS
    q6   { }              INTS
    B is a subschema of A:
    1. HLang(q0^B) ⊆ HLang(q0^A) and VDom(q0^B) = VDom(q0^A).
    2. HLang(q6^B) = HLang(q6^A) and VDom(q6^B) ⊆ VDom(q6^A).
    3. HLang(qi^B) = HLang(qi^A) and VDom(qi^B) = VDom(qi^A), for i = 1, 5, 9.
  • 16. Subschema extraction
    We have developed the subschema extraction algorithm: given an SA (XSD) A and a set of symbols (element names) Z, compute an SA that accepts all instances (XML documents) of A except those containing some symbol not in Z.
    [Figure: state diagram of the example SA with states q0 to q7 and edges labeled <Quote>, <Order>, <Line>, <Desc>, <Price>, <Product>, and <Qty>.]
    q    HLang(q)           VDom(q)
    q0   <Quote>|<Order>    { }
    q1   <Line>+            { }
    q7   <Line>+            { }
    q2   <Desc><Price>      { }
    q3   <Product><Qty>     { }
    q4   { }                STRINGS
    q5   { }                DECIMALS
    q6   { }                INTEGERS
    Z = {<Quote>, <Line>, <Desc>, <Price>, <Order>, <Qty>}, where <Product> is excluded.
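The effect of excluding `<Product>` can be traced with a short Python sketch. It restricts each HLang to the kept symbols, drops states that can no longer produce a finite subtree, and keeps what is still reachable from the start state. This is an illustration of the problem statement, not the authors' extraction algorithm; the regex AST, the `restrict` and `extract` functions, and the transition table (read off the flattened diagram) are all assumptions of the sketch.

```python
EMPTY = ("empty",)  # the empty language (no word at all)

# HLang AST: ("eps",), ("sym", a), ("seq", r1, ...), ("alt", r1, ...),
# ("star", r), ("plus", r), ("opt", r)

def restrict(r, allowed):
    """AST for the words of r that use only `allowed` symbols."""
    kind = r[0]
    if kind == "eps":
        return r
    if kind == "sym":
        return r if r[1] in allowed else EMPTY
    if kind == "seq":
        parts = [restrict(x, allowed) for x in r[1:]]
        return EMPTY if EMPTY in parts else ("seq", *parts)
    if kind == "alt":
        parts = [p for p in (restrict(x, allowed) for x in r[1:]) if p != EMPTY]
        if not parts:
            return EMPTY
        return parts[0] if len(parts) == 1 else ("alt", *parts)
    if kind in ("star", "opt", "plus"):
        inner = restrict(r[1], allowed)
        if inner != EMPTY:
            return (kind, inner)
        return EMPTY if kind == "plus" else ("eps",)
    raise ValueError(kind)


def extract(start, delta, hlang, keep):
    """Return (delta, hlang) of the subschema that uses only symbols in `keep`."""
    productive = set()

    def allowed(q):
        return {a for (p, a), t in delta.items()
                if p == q and a in keep and t in productive}

    changed = True
    while changed:  # least fixpoint of "can still produce a finite subtree"
        changed = False
        for q in hlang:
            if q not in productive and restrict(hlang[q], allowed(q)) != EMPTY:
                productive.add(q)
                changed = True
    if start not in productive:
        return {}, {}
    new_delta, new_hlang, todo = {}, {}, [start]
    while todo:  # keep only what is reachable from the start state
        q = todo.pop()
        if q in new_hlang:
            continue
        new_hlang[q] = restrict(hlang[q], allowed(q))
        for a in allowed(q):
            new_delta[(q, a)] = delta[(q, a)]
            todo.append(delta[(q, a)])
    return new_delta, new_hlang


def sym(a): return ("sym", a)

delta = {("q0", "Quote"): "q1", ("q0", "Order"): "q7",
         ("q1", "Line"): "q2", ("q7", "Line"): "q3",
         ("q2", "Desc"): "q4", ("q2", "Price"): "q5",
         ("q3", "Product"): "q2", ("q3", "Qty"): "q6"}
hlang = {"q0": ("alt", sym("Quote"), sym("Order")),
         "q1": ("plus", sym("Line")), "q2": ("seq", sym("Desc"), sym("Price")),
         "q3": ("seq", sym("Product"), sym("Qty")),
         "q4": ("eps",), "q5": ("eps",), "q6": ("eps",),
         "q7": ("plus", sym("Line"))}
Z = {"Quote", "Line", "Desc", "Price", "Order", "Qty"}

sub_delta, sub_hlang = extract("q0", delta, hlang, Z)
```

Under these assumptions every `<Order>` document needs a `<Product>` child, so the extracted subschema keeps only the `<Quote>` branch: q3, q6, and q7 disappear and the HLang of q0 shrinks to `<Quote>`.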
  • 18. xCBL compatibility testing experiment
    1. Data sets: XML Common Business Library (xCBL)
       XSD        file size   no. of files   data types   element names   doc. types
       xCBL 3.0   1.8MB       413            1,290        3,728           42
       xCBL 3.5   2.0MB       496            1,476        4,473           51
    2. The subschema testing program has disproved the claim on xCBL.org: "The only modifications allowed to xCBL 3.0 documents were the additions of new optional elements and additions to code lists; to maintain interoperability between the two versions. An xCBL 3.0 instance of a document is also a valid instance in xCBL 3.5."
    3. xCBL 3.5 is not a superschema of xCBL 3.0.
    4. The experiment took only 272ms when the quick RE test was applied.
    Machine: Q6600@2.40GHz, 4GB RAM, Linux OS
  • 19. Schema size reduction by subschema extraction
    1. The subschema extraction program was run to extract different subschemas from xCBL. Each subschema recognizes a different element subset for a specific application, e.g., order, invoice, etc.
    2. The schema size was reduced to 6–32% of the original size.
    3. The time required by XMLBeans to compile a subschema was reduced to 34–50% of the time originally required.
    4. The time to extract such a subschema was only 2–3s.
    [Figure: Subschema extraction from xCBL 3.5. Numbers of element names, types, and element declarations (left axis) and XMLBeans compilation time in seconds (right axis) for the original schema and the invoice, order, quote, auction, and catalog subschemas.]
  • 20. Conclusions
    1. We have developed formal models for XML and XSD, and algorithms for schema equivalence testing, subschema testing, and subschema extraction.
    2. These problems are PSPACE-complete because they involve comparisons of regular expressions. We have developed a heuristic (the quick RE test) that makes the algorithms run fast on very large schemas.
    3. Our experiments have shown that xCBL 3.5 is in fact not backward-compatible with xCBL 3.0, and have extracted small subschemas from xCBL for different instance subsets, which substantially reduces the processing time on these subschemas.
    4. These models can be extended to other applications: a web service adaptor for legacy systems (text-to-XML transformation) and a schema inferrer from XML instances.