Datalog and its Extensions
                 for Semantic Web Databases


Georg Gottlob¹       Giorgio Orsi¹       Andreas Pieris¹       Mantas Šimkus²


             ¹ Department of Computer Science, University of Oxford, UK
      ² Institute of Information Systems, Vienna University of Technology, Austria




Reasoning Web Summer School, Vienna, Austria, September 3 - 8, 2012
Structure of the Lecture



            • Relational Databases and Datalog

            • Complexity of Datalog

            • Datalog and Ontological Reasoning

            • Datalog±
Relational Databases

Predominant technology
for data storage and processing


"On the fly" example:

    Flight    origin   destination   airline
              VIE      LHR           BA
              LHR      EDI           BA
              LGW      GLA           U2
              LCA      VIE           OS

    Airport   code     city
              VIE      Vienna
              LHR      London
              LGW      London
              LCA      Larnaca
              GLA      Glasgow
              EDI      Edinburgh

    [Figure: map showing Edinburgh, Glasgow, London, Larnaca and Vienna]
Relational Databases (Terminology)

    Relations:

        Flight    origin   destination   airline
                  VIE      LHR           BA
                  LHR      EDI           BA
                  LGW      GLA           U2
                  LCA      VIE           OS

        Airport   code     city
                  VIE      Vienna
                  LHR      London
                  LGW      London
                  LCA      Larnaca
                  GLA      Glasgow
                  EDI      Edinburgh

    Tuples: the rows of a relation

    Constants (from a domain): VIE, LHR, …    BA, U2, OS    Vienna, London, …

    Relational atoms: Flight(LHR,EDI,BA)    Airport(LGW,London)
Querying: Relational Algebra

                                  List all the airlines

    (Flight and Airport as in the example database above)

                                  Π_airline(Flight) = {BA, U2, OS}
Querying: Relational Algebra

                    List the codes of the airports in London

    (Flight and Airport as in the example database above)

                    Π_code(σ_{city = "London"}(Airport)) = {LHR, LGW}
Querying: Relational Algebra

           Which airlines fly directly from London to Glasgow?

  Step 1: select the airports in London and in Glasgow

      T1 ← σ_{city = "London"}(Airport)        T2 ← σ_{city = "Glasgow"}(Airport)

      T1   code   city                          T2   code   city
           LHR    London                             GLA    Glasgow
           LGW    London

  Step 2: join Flight with T1 on the origin

      T3 ← Flight ⋈_{origin = code} T1

      T3   origin   destination   airline   code   city
           LHR      EDI           BA        LHR    London
           LGW      GLA           U2        LGW    London

  Step 3: join T3 with T2 on the destination

      T4 ← T3 ⋈_{destination = code} T2

      T4   origin   destination   airline   code   city     code   city
           LGW      GLA           U2        LGW    London   GLA    Glasgow

  Step 4: project on the airline

      Π_airline(T4) = {U2}
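
The evaluation above can be replayed in a few lines of Python. This is only a sketch: relations are lists of dictionaries, and the helper names `select`, `project` and `join` (as well as the `o_`/`d_` prefixes used to keep the duplicated code/city columns apart) are illustrative choices, not part of the lecture.

```python
# Toy relational algebra over lists of dicts (helper names are illustrative).
FLIGHT = [
    {"origin": "VIE", "destination": "LHR", "airline": "BA"},
    {"origin": "LHR", "destination": "EDI", "airline": "BA"},
    {"origin": "LGW", "destination": "GLA", "airline": "U2"},
    {"origin": "LCA", "destination": "VIE", "airline": "OS"},
]
AIRPORT = [
    {"code": "VIE", "city": "Vienna"},
    {"code": "LHR", "city": "London"},
    {"code": "LGW", "city": "London"},
    {"code": "LCA", "city": "Larnaca"},
    {"code": "GLA", "city": "Glasgow"},
    {"code": "EDI", "city": "Edinburgh"},
]

def select(rel, pred):
    """Selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(rel, attr):
    """Projection on one attribute; duplicates vanish because we return a set."""
    return {row[attr] for row in rel}

def join(left, right, left_attr, right_attr, prefix):
    """Join on left_attr = right_attr; right-hand attributes get a prefix."""
    out = []
    for l in left:
        for r in right:
            if l[left_attr] == r[right_attr]:
                merged = dict(l)
                merged.update({prefix + k: v for k, v in r.items()})
                out.append(merged)
    return out

t1 = select(AIRPORT, lambda r: r["city"] == "London")
t2 = select(AIRPORT, lambda r: r["city"] == "Glasgow")
t3 = join(FLIGHT, t1, "origin", "code", "o_")
t4 = join(t3, t2, "destination", "code", "d_")
airlines = project(t4, "airline")
print(airlines)
```

The intermediate lists `t1` to `t4` correspond to the tables T1 to T4 of the slides.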
Querying: FOL and SQL Representation

                                    List all the airlines

    (Flight and Airport as in the example database above)

       FOL Representation                              SQL Representation

           ∃X ∃Y Flight(X,Y,Z)                         SELECT airline
           (Z is free)                                 FROM   Flight
Querying: FOL and SQL Representation

                     List the codes of the airports in London

    (Flight and Airport as in the example database above)

       FOL Representation                        SQL Representation

           Airport(X,London)                     SELECT code
                                                 FROM   Airport
                                                 WHERE  city = 'London'
Querying: FOL and SQL Representation

           Which airlines fly directly from London to Glasgow?

    (Flight and Airport as in the example database above)

       FOL Representation

           ∃X ∃Y  Airport(X,London) ∧ Airport(Y,Glasgow) ∧ Flight(X,Y,Z)
           (Z is free)

       SQL Representation

           SELECT F.airline
           FROM   Flight AS F,
                  Airport AS A1,
                  Airport AS A2
           WHERE  A1.code = F.origin AND
                  A2.code = F.destination AND
                  A1.city = 'London' AND
                  A2.city = 'Glasgow'
What we Cannot Ask?

                    Is Glasgow reachable from Vienna?

  [Figure: route map over Edinburgh, Glasgow, London, Larnaca and Vienna]

  Looks like this FOL query can do the job…

  ∃X ∃Y ∃Z ∃W ∃V  Airport(X,Vienna) ∧ Airport(Y,Glasgow) ∧ Flight(X,Z,W) ∧ Flight(Z,Y,V)

  On the map above the answer is YES.

  [Figure: the same map with an additional stop labelled "Intermediate"]

  With the additional intermediate stop the same query answers NO: it only
  finds routes with exactly one stop, and no fixed FOL query covers routes of
  arbitrary length.

  What we would like to ask:

• List all the pairs of airports (A1,A2) such that A2 is reachable from A1

• Is there a pair (A1,A2) such that A1 is in Vienna and A2 is in Glasgow?
What we Cannot Ask?


     Schema:   Flight(origin, destination, airline)       Airport(code, city)

• List all the pairs of airports (A1,A2) such that A2 is reachable from A1

                       Reachable(X,Y) ← Flight(X,Y,Z)
                       Reachable(X,W) ← Flight(X,Y,Z), Reachable(Y,W)

• Is there a pair (A1,A2) such that A1 is in Vienna and A2 is in Glasgow?

       Ans() ← Airport(X,Vienna), Airport(Y,Glasgow), Reachable(X,Y)

  The second Reachable rule needs RECURSION: RA, FOL, SQL not enough

  DATALOG = Select-Project-Join + Recursion
Datalog at First Glance

 Transitive closure of a graph:

           TrClosure(X,Y) ← Graph(X,Y)

           TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y)

         [Figure: the path graph A → B → C → D]

         Graph              TrClosure
         A   B              A   B
         B   C              A   C
         C   D              A   D
                            B   C
                            B   D
                            C   D
Datalog at First Glance

• Semantics - a mapping from instances over body-relations to instances
              over head-relations

                  Graph              TrClosure
                  A   B              A   B
                  B   C              A   C
                  C   D              A   D
                                     B   C
                                     B   D
                                     C   D

• Equivalent approaches for defining the semantics:

  • Model-Theoretic - logical sentences asserting a property of the result

  • Fixpoint - solution of a fixpoint equation

  • Proof-Theoretic - obtaining proofs of facts
Datalog



                   • Formal Syntax

                   • Fixpoint Semantics

                   • Complexity Results




          Note: For details on the model-theoretic and proof-theoretic semantics see:
                     - Foundations of Databases by Abiteboul, Hull and Vianu
                     - Logic Programming and Databases by Ceri, Gottlob and Tanca
Syntax of Datalog

 A Datalog rule is an expression

                    R0(X0) ← R1(X1),…,Rn(Xn)

                     head           body

         • n ≥ 0 (n = 0 means an empty body)

         • R0,…,Rn are relation symbols (or predicates)

         • X0,…,Xn are tuples of terms (constants or variables)

         • Safe - each variable occurring in the head must occur in the body
Syntax of Datalog


 • Datalog program P - a (finite) set of Datalog rules

 • Extensional predicate - does not occur in the head of a rule of P

 • Intensional predicate - occurs in the head of some rule of P

 • edb(P) - extensional predicates of P
                                              not necessary
 • idb(P) - intensional predicates of P

 • sch(P) - the set of predicates edb(P) ∪ idb(P)
Example: Recursive Program P

                      Is Glasgow reachable from Vienna?

     Schema:   Flight(origin, destination, airline)       Airport(code, city)

     Reachable(X,Y) ← Flight(X,Y,Z)
     Reachable(X,W) ← Flight(X,Y,Z), Reachable(Y,W)
     Ans() ← Airport(X,Vienna), Airport(Y,Glasgow), Reachable(X,Y)

                      dom(P) = {Vienna, Glasgow}

                      sch(P) = {Flight, Airport, Reachable, Ans}

                      edb(P) = {Flight, Airport}

                      idb(P) = {Reachable, Ans}
Fixpoint Semantics

• Relies on the immediate consequence operator T_P

• Given a database D and a Datalog program P, an atom R(c1,…,cn)
  is an immediate consequence for D and P if:

   - R(c1,…,cn) ∈ D, or

   - there exist a rule R0(X0) ← R1(X1),…,Rn(Xn) in P and a homomorphism h such that

       {h(R1(X1)),…,h(Rn(Xn))} ⊆ D and h(R0(X0)) = R(c1,…,cn)

• T_P - mapping from databases for sch(P) to databases for sch(P)

• T_P(D) = { immediate consequences for D and P }
Fixpoint Semantics


 • A crucial fact - for each P and D for edb(P):

         T_P has a minimum fixpoint containing D

                  I is a fixpoint of T_P if T_P(I) = I
                                       Note: For the proof see, e.g., Theorem 12.3.2
                                       in Foundations of Databases

 • The semantics of P on D, denoted P(D), is this minimum fixpoint

 • How do we compute it?
Fixpoint Semantics



  • T_{P,0}(I) = I    and      T_{P,i+1}(I) = T_P(T_{P,i}(I))

  • Compute T_{P,ω}(I) = ⋃_{i ≥ 0} T_{P,i}(I)
Fixpoint Semantics: Example

• Let P be the program which computes the transitive closure of a graph:

           TrClosure(X,Y) ← Graph(X,Y)

           TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y)

• Consider the input database D = {Graph(A,B), Graph(B,C), Graph(C,D)}
  (the path graph A → B → C → D)

• T_{P,1}(D) = D ∪ {TrClosure(A,B), TrClosure(B,C), TrClosure(C,D)}

• T_{P,2}(D) = T_{P,1}(D) ∪ {TrClosure(A,C), TrClosure(B,D)}

• T_{P,3}(D) = T_{P,2}(D) ∪ {TrClosure(A,D)}

• T_{P,4}(D) = T_{P,3}(D)

• Thus, T_{P,ω}(D) = T_{P,3}(D)
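
The stages of this example can be reproduced with a naive bottom-up loop. The sketch below hard-codes the two TrClosure rules instead of interpreting arbitrary programs; atoms are encoded as tuples (predicate, arg1, arg2), and the names `t_p` and `stages` are illustrative.

```python
# Naive bottom-up evaluation of the transitive-closure program.
D = {("Graph", "A", "B"), ("Graph", "B", "C"), ("Graph", "C", "D")}

def t_p(db):
    """One application of the immediate consequence operator T_P."""
    out = set(db)  # every atom already in db is an immediate consequence
    for (p1, x, z) in db:
        if p1 != "Graph":
            continue
        # TrClosure(X,Y) <- Graph(X,Y)
        out.add(("TrClosure", x, z))
        # TrClosure(X,Y) <- Graph(X,Z), TrClosure(Z,Y)
        for (p2, z2, y) in db:
            if p2 == "TrClosure" and z2 == z:
                out.add(("TrClosure", x, y))
    return out

stages = [D]                   # stages[i] plays the role of T_{P,i}(D)
while True:
    nxt = t_p(stages[-1])
    if nxt == stages[-1]:      # fixpoint reached
        break
    stages.append(nxt)

closure = sorted((x, y) for (p, x, y) in stages[-1] if p == "TrClosure")
print([len(s) for s in stages])   # number of atoms per stage
print(closure)
```

The loop stops after three productive applications of `t_p`, matching T_{P,4}(D) = T_{P,3}(D) in the example.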
Fixpoint Semantics

  • T_{P,ω}(D) is the minimum fixpoint of T_P containing D

  • T_{P,ω}(D) is the ⊆-minimal model of P containing D


                                       Note: For the proof see, e.g., Theorem 12.3.4
                                       in Foundations of Databases
Model-Theoretic Approach

 Transitive closure of a graph:

            TrClosure(X,Y) ← Graph(X,Y)

            TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y)

 Graph              TrClosure              M
 A   B              A   B                  A   A
 B   C              A   C                  A   B
 C   D              A   D                  A   C
                    B   C                  A   D
                    B   D                  …
                    C   D                  D   D

                    ⊆-minimal model        not ⊆-minimal model
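
A small check makes the difference between the two candidate relations concrete. The sketch assumes that M on the slide stands for the set of all pairs of nodes (the slide only shows its first and last rows); the function name `is_model` is illustrative.

```python
from itertools import product

GRAPH = {("A", "B"), ("B", "C"), ("C", "D")}
NODES = ["A", "B", "C", "D"]

def is_model(graph, tc):
    """True iff tc satisfies both TrClosure rules with respect to graph."""
    # TrClosure(X,Y) <- Graph(X,Y)
    rule1 = all((x, y) in tc for (x, y) in graph)
    # TrClosure(X,Y) <- Graph(X,Z), TrClosure(Z,Y)
    rule2 = all((x, y) in tc
                for (x, z) in graph
                for (z2, y) in tc if z2 == z)
    return rule1 and rule2

MINIMAL = {("A", "B"), ("A", "C"), ("A", "D"),
           ("B", "C"), ("B", "D"), ("C", "D")}
M = set(product(NODES, repeat=2))   # assumption: M contains every pair

print(is_model(GRAPH, MINIMAL), is_model(GRAPH, M), MINIMAL < M)
# Dropping any single pair from MINIMAL violates one of the rules.
print(all(not is_model(GRAPH, MINIMAL - {p}) for p in MINIMAL))
```

Both relations satisfy the rules, but M strictly contains the TrClosure relation, so M is a model that is not ⊆-minimal.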
Complexity of Datalog


   • Fact inference problem:

            - Input: program P, database D for edb(P), atom α

            - Question: P ∪ D ⊨ α, or, equivalently, α ∈ P(D)?

   • Data complexity - P fixed, D part of the input
     [Vardi, STOC 1982]

   • Combined complexity - both P and D part of the input
     [Vardi, STOC 1982]
Data Complexity of Datalog


Theorem: Datalog is PTIME-complete in data complexity


Proof (in PTIME):

• consider a program P and a database D

• |P(D)| ≤ |sch(P)| · |dom(P) ∪ dom(D)|^maxarity

      dom(P) - constants in P
      dom(D) - constants in D
      |dom(P) ∪ dom(D)|^maxarity - maximum number of tuples using constants of dom(P) ∪ dom(D)

• maxarity is a constant ⇒ P(D) can be constructed in PTIME
Data Complexity of Datalog


Theorem: Datalog is PTIME-complete in data complexity


Proof (PTIME-hard): reduction from fact inference for propositional LP(2)
Data Complexity of Datalog


• Propositional LP(2) - set of rules R0 ← R1,R2

• Fact inference for propositional LP(2):

        - Input: propositional LP(2) program P, propositional atom Q

        - Question: P ⊨ Q?
PTIME-hardness of LP(2)
Theorem: LP(2) is PTIME-hard

 Proof: Logspace reduction from Monotone Circuit Value Problem

   Circuit                          Encoding of the circuit as LP(2) program P

   g6 = g4 ∨ g5                     g6 ← g4
                                    g6 ← g5
   g4 = g1 ∧ g2                     g4 ← g1, g2
   g5 = g2 ∨ g3                     g5 ← g2
                                    g5 ← g3
   g1 = 1                           g1 ←
   g2 = 0
   g3 = 1                           g3 ←

 Does the circuit evaluate to true?          Circuit evaluates to true iff P ⊨ g6
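
The encoding can be tried out directly. The following sketch uses an assumed dictionary representation of the circuit and two illustrative helpers: `encode` produces the rules (an OR gate yields one rule per input, an AND gate one rule with both inputs, a true input a fact) and `least_model` forward-chains until nothing new is derived.

```python
# Monotone circuit from the slide (the dict layout is an assumption).
CIRCUIT = {
    "g1": ("input", True),
    "g2": ("input", False),
    "g3": ("input", True),
    "g4": ("and", "g1", "g2"),
    "g5": ("or", "g2", "g3"),
    "g6": ("or", "g4", "g5"),
}

def encode(circuit):
    """Translate the circuit into propositional rules (head, body)."""
    rules = []
    for gate, spec in circuit.items():
        kind = spec[0]
        if kind == "input":
            if spec[1]:                      # only true inputs become facts
                rules.append((gate, ()))
        elif kind == "and":
            rules.append((gate, (spec[1], spec[2])))
        else:                                # "or": one rule per input
            rules.append((gate, (spec[1],)))
            rules.append((gate, (spec[2],)))
    return rules

def least_model(rules):
    """Forward chaining: fire rules until no new atom is derived."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

program = encode(CIRCUIT)
model = least_model(program)
print(sorted(model), "g6" in model)
```

Here g6 is derived through g3 and g5, so the circuit evaluates to true.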
Data Complexity of Datalog


• Fact inference for propositional LP(2) is PTIME-hard - we can also simulate a PTIME Turing machine
  see, e.g., [Dantsin, Eiter, Gottlob & Voronkov, ACM Computing Surveys 2001]
Data Complexity of Datalog

Theorem: Datalog is PTIME-complete in data complexity

Proof (PTIME-hard): reduction from fact inference for propositional LP(2)

Encode the program P in the database D:

• For each fact R0 ← of P, add to D the atom True(R0)
• For each rule R0 ← R1,R2 of P, add to D the atom S(R0,R1,R2)

• Construct the fixed Datalog program P_DAT (a meta-interpreter for LP(2)):

            T(X) ← True(X)

            T(Z) ← T(X), T(Y), S(Z,X,Y)

• P ⊨ Q iff T(Q) ∈ P_DAT(D)
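
A sketch of this reduction in Python, run on the circuit program from the LP(2) hardness proof. One assumption goes beyond the slides: a rule with a single body atom R0 ← R1 is padded to S(R0,R1,R1) so that every rule fits the ternary relation S. The names `to_database` and `p_dat` are illustrative.

```python
# Propositional program as (head, body) pairs.
LP2 = [
    ("g6", ("g4",)), ("g6", ("g5",)),
    ("g4", ("g1", "g2")),
    ("g5", ("g2",)), ("g5", ("g3",)),
    ("g1", ()), ("g3", ()),
]

def to_database(program):
    """Encode the program as the relations True and S of the database D."""
    true_facts, s_facts = set(), set()
    for head, body in program:
        if len(body) == 0:
            true_facts.add(head)                      # True(R0)
        elif len(body) == 1:
            s_facts.add((head, body[0], body[0]))     # padding (assumption)
        else:
            s_facts.add((head, body[0], body[1]))     # S(R0,R1,R2)
    return true_facts, s_facts

def p_dat(true_facts, s_facts):
    """Evaluate the fixed program: T(X) <- True(X); T(Z) <- T(X),T(Y),S(Z,X,Y)."""
    t = set(true_facts)
    while True:
        new = {z for (z, x, y) in s_facts if x in t and y in t} - t
        if not new:
            return t
        t |= new

true_facts, s_facts = to_database(LP2)
t_relation = p_dat(true_facts, s_facts)
print(sorted(t_relation))
```

Only the database changes with the input program; the two rules evaluated by `p_dat` stay fixed, which is what data complexity requires.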
Combined Complexity of Datalog


 Theorem: Datalog is EXPTIME-complete in combined complexity


 Proof (in EXPTIME):

        • consider a program P and a database D

        • |P(D)| ≤ |sch(P)| · |dom(P) ∪ dom(D)|^maxarity

              |dom(P) ∪ dom(D)|^maxarity - maximum number of tuples using constants of dom(P) ∪ dom(D)

        • P(D) can be constructed in EXPTIME
Combined Complexity of Datalog


 Theorem: Datalog is EXPTIME-complete in combined complexity


 Proof (EXPTIME-hard):




           by simulating an EXPTIME Turing machine
Deterministic Turing Machine (DTM)

           M = (S, Σ, ⊔, δ, s0, s_acc)

           S - states          Σ - tape symbols          ⊔ - blank symbol
           s0 - initial state                            s_acc - accepting state

           δ : S∖{s_acc} × Σ → S × Σ × {-1,0,1}

  δ(s1,a) = (s2,b,1)

        IF at some time instant τ the machine is in state s1, the cursor
                 points to cell κ, and this cell contains a
        THEN at instant τ+1 the machine is in state s2, cell κ contains b,
                 and the cursor points to cell κ+1
EXPTIME-hardness of Datalog




 The goal: encode the EXPTIME computation of a DTM M on input string I
 with a Datalog program P, a database D, and an atom α such that
 α ∈ P(D) iff M accepts I in at most N = 2^m steps, where m = |I|^k
The Relational Schema

Time points and tape positions from 0 to N-1 are encoded using m-ary tuples over
{0,1} (recall that N = 2^m) such that 0 = (0,…,0), 1 = (0,…,1), …, N-1 = (1,…,1)

           • Symbol[a](T,C) - at time instant T, cell C contains a

           • Cursor(T,C) - at time instant T, cursor points to cell C

           • State[s](T) - at time instant T, the machine is in state s

           • Accept(T) - at time instant T, the machine accepts

   where T = T1,…,Tm and C = C1,…,Cm
Initialization Rules

• Assume that I = a1…an
• Assume that we have the relations First_m, Succ_m and ≺_m (will be defined later)

                  Symbol[a_i](T,i) ← First_m(T)          (i: the m-bit encoding of the number i)

                  Cursor(T,T) ← First_m(T)

                  State[s0](T) ← First_m(T)

                  Symbol[⊔](T,C) ← First_m(T), ≺_m(n,C)  (n: the m-bit encoding of the number n)
Transition Rules


                        δ(s1,a) = (s2,b,1)

Symbol[b](T′,C) ← State[s1](T), Symbol[a](T,C), Cursor(T,C), Succ_m(T,T′)

Cursor(T′,C′) ← State[s1](T), Symbol[a](T,C), Cursor(T,C), Succ_m(T,T′),
                Succ_m(C,C′)

State[s2](T′) ← State[s1](T), Symbol[a](T,C), Cursor(T,C), Succ_m(T,T′)
Inertia Rules




Cells which are not changed during the transition keep their old values




Symbol[a](T′,C) ← Symbol[a](T,C), Cursor(T,C′), ≺_m(C,C′), Succ_m(T,T′)

Symbol[a](T′,C) ← Symbol[a](T,C), Cursor(T,C′), ≺_m(C′,C), Succ_m(T,T′)
Accepting Rule




       Once we reach the accepting state we accept




                 Accept ← State[s_acc](T)
Defining First_m, Succ_m and ≺_m

              We assume that D = {First_1(0), Last_1(1), Succ_1(0,1)}

Inductive definition of First_{i+1}, Last_{i+1} and Succ_{i+1} (X and Y are i-ary tuples):

     Succ_{i+1}(Z,X,Z,Y) ← Succ_i(X,Y)                          for each Z ∈ {0,1}

     Succ_{i+1}(Z,X,W,Y) ← Succ_1(Z,W), Last_i(X), First_i(Y)

     First_{i+1}(Z,X) ← First_1(Z), First_i(X)

     Last_{i+1}(Z,X) ← Last_1(Z), Last_i(X)

Definition of ≺_m:

     ≺_m(X,Y) ← Succ_m(X,Y)

     ≺_m(X,Y) ← Succ_m(X,Z), ≺_m(Z,Y)
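
To see that these rules really define a binary counter, one can materialise the relations for a small m. The sketch below assumes the base facts First_1(0), Last_1(1) and Succ_1(0,1), represents numbers as bit tuples with the most significant bit first, and uses the illustrative function names `build`, `less_than` and `to_int`.

```python
def build(m):
    """Materialise First_m, Last_m and Succ_m over m-bit tuples."""
    first = {1: {(0,)}}
    last = {1: {(1,)}}
    succ = {1: {((0,), (1,))}}
    for i in range(1, m):
        # Succ_{i+1}(Z,X,Z,Y) <- Succ_i(X,Y), for each Z in {0,1}
        s = {((z,) + x, (z,) + y) for z in (0, 1) for (x, y) in succ[i]}
        # Succ_{i+1}(Z,X,W,Y) <- Succ_1(Z,W), Last_i(X), First_i(Y)
        s |= {(z + x, w + y)
              for (z, w) in succ[1] for x in last[i] for y in first[i]}
        succ[i + 1] = s
        # First_{i+1}(Z,X) <- First_1(Z), First_i(X)
        first[i + 1] = {z + x for z in first[1] for x in first[i]}
        # Last_{i+1}(Z,X) <- Last_1(Z), Last_i(X)
        last[i + 1] = {z + x for z in last[1] for x in last[i]}
    return first[m], last[m], succ[m]

def less_than(succ):
    """The order relation as the transitive closure of Succ_m."""
    lt = set(succ)
    while True:
        new = {(x, y) for (x, z) in succ for (z2, y) in lt if z == z2} - lt
        if not new:
            return lt
        lt |= new

def to_int(bits):
    """Read a bit tuple as a binary number."""
    return int("".join(str(b) for b in bits), 2)

first3, last3, succ3 = build(3)
lt3 = less_than(succ3)
print(sorted((to_int(x), to_int(y)) for (x, y) in succ3))
```

The relations themselves have exponential size in m; the point of the construction is that the rules defining them are only polynomially many.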
Concluding EXPTIME-hardness of Datalog




  • Several rules but polynomially many ⇒ feasible in PTIME

  • Accept ∈ P(D) iff M accepts I in at most N steps

  • Can be formally shown by induction on the time steps
Datalog as an Ontology Language

• Ontology languages are usually based on description logics (prev. lecture)

• Much is possible with Datalog

            DL Axiom                             Datalog Rule

      Parent ⊓ Male ⊑ Father             Father(X) ← Parent(X), Male(X)

   MetalDevice ⊑ ∀hasPart.Metal          Metal(Y) ← MetalDevice(X), hasPart(X,Y)

       brotherOf ⊑ relativeOf            relativeOf(X,Y) ← brotherOf(X,Y)

        parentOf inv childOf             childOf(Y,X) ← parentOf(X,Y)

         trans(ancestorOf)               ancOf(X,Z) ← ancOf(X,Y), ancOf(Y,Z)

   SeniorEmp × Emp ⊑ moreThan            moreThan(X,Y) ← SeniorEmp(X), Emp(Y)
Datalog as an Ontology Language

• Ontology languages are usually based on description logics (prev. lecture)

• Much is not possible with Datalog
  [Patel-Schneider & Horrocks, Journal of Web Semantics 2007]

              DL Axiom                                           ?

       Employee ⊑ ∃reportsTo                     ∃Y reportsTo(X,Y) ← Employee(X)

           funct(reportsTo)                      Y = Z ← reportsTo(X,Y), reportsTo(X,Z)

      Employee disj Customer                     ⊥ ← Employee(X), Customer(X)
Datalog±

  • Extend Datalog by allowing in the head:

          - Existential quantification (∃)
          - Equality atoms (=)                     Datalog[∃,=,⊥]
          - Constant false (⊥)

  • As we shall see, Datalog[∃] is undecidable

  • Datalog[∃,=,⊥] is syntactically restricted → Datalog±
Datalog Extensions


          • Formal Syntax of Datalog[∃]

          • Fixpoint Semantics of Datalog[∃]

          • Undecidability of Datalog[∃]

          • Guardedness, Linearity and Stickiness

          • Additional Features (=,⊥)
Syntax of Datalog[∃]

 A Datalog[∃] rule is an expression

                ∃Y R0(X0,Y) ← R1(X1),…,Rn(Xn)

                (head: ∃Y R0(X0,Y); body: R1(X1),…,Rn(Xn))

          • n ≥ 0 (the body may be empty)

          • R0,…,Rn are relation symbols (or predicates)

          • X0,…,Xn are tuples of terms (constants or variables)

          • Y is a tuple of variables (disjoint from X0 ∪ … ∪ Xn)
Syntax of Datalog[∃]

  • Datalog[∃] program P - a (finite) set of Datalog[∃] rules

  • sch(P) - the set of predicates occurring in P

  • Note: sch(P) is no longer partitioned into idb(P) and edb(P) - why?

             DL Ontology              Datalog[∃] Program P

          Person ⊑ ∃hasFather      ∃Y hasFather(X,Y) ← Person(X)
         ∃hasFather⁻ ⊑ Person      Person(Y) ← hasFather(X,Y)

     All predicates of sch(P) appear in the body and in the head
Fixpoint Semantics

 Analogous to the fixpoint semantics of Datalog - chase procedure

 Input: Database D, Datalog[∃] program P
 Output: Instance for sch(P) that satisfies P

   D = {Person(John)}

   ∃Y hasFather(X,Y) ← Person(X)         Person(Y) ← hasFather(X,Y)

   chase(D,P) = D ∪ {hasFather(John,z1), Person(z1), hasFather(z1,z2), …}
Fixpoint Semantics

• Chase rule - the building block of the chase procedure

• A rule ρ = ∃Y R0(X0,Y) ← R1(X1),…,Rn(Xn) is applicable to an instance I if:
   - there exists a homomorphism h such that {h(R1(X1)),…,h(Rn(Xn))} ⊆ I
   - but there is no μ ⊇ h such that μ(R0(X0,Y)) ∈ I

   Example with ρ = ∃Y S(X,Y) ← R(X) and h = {X → a}:

   Instance I              Is ρ applicable?
   {S(a,b), R(a)}          no: μ = {X → a, Y → b} extends h and μ(S(X,Y)) ∈ I
   {S(b,a), R(a)}          yes: no extension of h maps S(X,Y) into I (×)
Fixpoint Semantics

• Let J = I ∪ {μ(R0(X0,Y))}, where μ ⊇ h and μ(Yi) is a fresh value not in I

• The result of applying ρ to I is J, denoted I →(ρ,h) J - chase step
Fixpoint Semantics

• A finite chase of D w.r.t. P is a finite sequence

      D →(ρ1,h1) I1 →(ρ2,h2) I2 →(ρ3,h3) … →(ρm,hm) Im

  and chase(D,P) is defined as the instance Im

• An infinite chase of D w.r.t. P is an infinite sequence

      D →(ρ1,h1) I1 →(ρ2,h2) I2 →(ρ3,h3) … →(ρm,hm) Im → …

  and chase(D,P) is defined as the instance ∪_{j ≥ 0} Ij (with I0 = D)

• The semantics of P on D, denoted P(D), is defined as chase(D,P)
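
The chase step and the chase sequence defined above can be sketched in a few lines. This is only an illustration under stated assumptions: atoms are `(predicate, args)` pairs, variables are capitalised strings, fresh values are named `z1`, `z2`, …, and the parameter `max_steps` is a hypothetical cut-off introduced here because the chase may be infinite.

```python
from itertools import count

def is_var(term):
    # Assumed convention: variables start with an upper-case letter.
    return term[0].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom maps onto fact; return None if impossible."""
    pred, args = atom
    if pred != fact[0] or len(args) != len(fact[1]):
        return None
    s = dict(subst)
    for a, f in zip(args, fact[1]):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s

def homs(atoms, inst, subst):
    """Yield all extensions of subst that map every atom into inst."""
    if not atoms:
        yield subst
        return
    for fact in inst:
        s = match(atoms[0], fact, subst)
        if s is not None:
            yield from homs(atoms[1:], inst, s)

def chase(database, rules, max_steps):
    """Run at most max_steps chase steps; return (instance, terminated)."""
    inst = set(database)
    fresh = count(1)
    for _ in range(max_steps):
        step = None
        for head, body in rules:
            for h in homs(body, sorted(inst), {}):
                # Applicable only if no extension of h already yields the head.
                if next(homs([head], inst, h), None) is None:
                    step = (head, h)
                    break
            if step:
                break
        if step is None:
            return inst, True            # no rule applicable: finite chase
        head, h = step
        mu = dict(h)
        for a in head[1]:
            if is_var(a) and a not in mu:
                mu[a] = f"z{next(fresh)}"  # fresh value for an existential variable
        inst.add((head[0], tuple(mu.get(a, a) for a in head[1])))
    return inst, False                   # cut off: the chase may be infinite

# The Person/hasFather program from the earlier slide.
rules = [
    (("hasFather", ("X", "Y")), [("Person", ("X",))]),
    (("Person", ("Y",)), [("hasFather", ("X", "Y"))]),
]
prefix, finished = chase({("Person", ("john",))}, rules, max_steps=3)
```

With three steps the sketch produces the prefix `hasFather(john,z1)`, `Person(z1)`, `hasFather(z1,z2)` and reports that the chase has not terminated.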
Chase: A Universal Model


   [Figure: C = chase(D,P) contains D; homomorphisms h1 and h2 map C into two models I1 and I2, with images h1(C) and h2(C)]

     ∀I (I is a model of D and P ⇒ chase(D,P) →hom I)

             Implicit in [Fagin, Kolaitis, Miller & Popa, Theoretical Computer Science 2005]
Chase: Uniqueness Property

• In general, the chase is not unique - it depends on the order of rule application

      D = {R(a)}        ρ1 = ∃Y S(Y) ← R(X)               ρ2 = S(X) ← R(X)

                   Solution1 = {R(a), S(z), S(a)}   ρ1 then ρ2
                   Solution2 = {R(a), S(a)}         ρ2 then ρ1

• Unique up to homomorphic equivalence

      [Figure: chase results C1, C2, C3 linked by homomorphisms h12, h21 between C1 and C2, and h23, h32 between C2 and C3]
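
Homomorphic equivalence of the two solutions above can be checked by searching for a homomorphism in each direction. A minimal sketch, assuming that fresh values are exactly the terms whose name starts with `z` (a naming convention chosen here for illustration) and that constants must be mapped to themselves:

```python
def is_null(term):
    # Assumed convention: fresh values (nulls) are named z, z1, z2, ...
    return term.startswith("z")

def extend(h, args, target_args):
    """Extend h so that args maps onto target_args; None if impossible."""
    g = dict(h)
    for a, t in zip(args, target_args):
        if is_null(a):
            if g.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return g

def find_hom(src, dst, h=None):
    """Return a mapping of nulls sending every fact of src into dst, or None."""
    h = {} if h is None else h
    if not src:
        return h
    (pred, args), rest = src[0], src[1:]
    for dpred, dargs in dst:
        if dpred == pred and len(dargs) == len(args):
            g = extend(h, args, dargs)
            if g is not None:
                result = find_hom(rest, dst, g)
                if result is not None:
                    return result
    return None

solution1 = [("R", ("a",)), ("S", ("z",)), ("S", ("a",))]   # ρ1 then ρ2
solution2 = [("R", ("a",)), ("S", ("a",))]                  # ρ2 then ρ1
h12 = find_hom(solution1, solution2)
h21 = find_hom(solution2, solution1)
```

Here `h12` maps the fresh value `z` to `a`, and `h21` is the identity (an empty mapping, since Solution2 contains no fresh values), so the two results are homomorphically equivalent.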
Chase: The Challenge of Infinity


  • In general, the chase is infinite

                 D = {R(a,b)}     ∃Z R(Y,Z) ← R(X,Y)

             Solution = {R(a,b), R(b,z1), R(z1,z2), R(z2,z3), …}

  • For plain Datalog, the fixpoint semantics provides an algorithm

  • The situation changes dramatically for Datalog[∃] - undecidable
Undecidability of Datalog[∃]

 Theorem: Datalog[∃] is undecidable

  Proof: by simulating a deterministic Turing machine with an empty tape
Build an Infinite Grid

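   [Figure: infinite grid starting at node c, with horizontal edges H and vertical edges V; the i-th horizontal line represents the i-th configuration of the machine]

   Start(c) fixes the starting point

                      Node(X) ← Start(X)
                      Initial(X) ← Start(X)
                      ∃Y H(X,Y) ← Node(X)
                      Node(Y) ← H(X,Y)
                      ∃Y V(X,Y) ← Node(X)
                      Node(Y) ← V(X,Y)
                      V(Y,W) ← H(X,Y), H(Z,W), V(X,Z)

   [Figure: one grid cell with X, Y in the upper row and Z, W in the lower row, illustrating the last rule]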
Initialization Rules


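   [Figure: first row of the grid; the cursor is in state s0 on the first cell and every cell holds the blank symbol ⊔]

                           Initial(Y) ← Initial(X), H(X,Y)

                       Cursor[s0](X) ← Start(X)

                       Symbol[⊔](X) ← Initial(X)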
Transition Rules


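   [Figure: the cursor in state s1 reads a; in the next row the cell holds b and the cursor, now in state s2, is one cell to the right]

                               δ(s1,a) = (s2,b,1)

  Cursor[s2](Z) ← Cursor[s1](X), Symbol[a](X), V(X,Y), H(Y,Z)

  Symbol[b](Y) ← Cursor[s1](X), Symbol[a](X), V(X,Y)

      Mark(X) ← Cursor[s1](X), Symbol[a](X)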
Inertia Rules

   [Figure: cells a, b before the cursor (MarkLeft) and cells c, d after the cursor (MarkRight) are copied unchanged to the next row]

      MarkRight(Y) ← Mark(X), H(X,Y)

      MarkRight(Y) ← MarkRight(X), H(X,Y)

      Symbol[a](Y) ← MarkRight(X), Symbol[a](X), V(X,Y)

       We need similar rules for the cells before the cursor
Accepting Rule


        Once we reach the accepting state we accept

                 Accept ← Cursor[sacc](X)

             Accept ∈ P(D) iff the DTM accepts
Undecidability of Datalog[∃]

 Theorem: Datalog[∃] is undecidable

  Proof: by simulating a deterministic Turing machine with an empty tape

                                … syntactic restrictions are needed!
Guarded Datalog[∃]

• Inspired by the guarded fragment of first-order logic
  [Andréka, van Benthem & Németi, Journal of Philosophical Logic 1998]

• There exists a body-atom that contains all the body-variables - guard

     Manager(X) ← Employee(X), supervisorOf(X,Y), Manager(Y)

• Chase has finite treewidth ⇒ Guarded Datalog[∃] is decidable
  [Calì, Gottlob & Kifer, KR 2008]
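
The guardedness condition is purely syntactic and easy to test. A minimal sketch for rules with a non-empty body, assuming atoms are `(predicate, args)` pairs and variables are capitalised strings (both conventions are chosen here for illustration):

```python
def is_var(term):
    # Assumed convention: variables start with an upper-case letter.
    return term[0].isupper()

def body_vars(body):
    """All variables occurring in the body atoms."""
    return {a for _, args in body for a in args if is_var(a)}

def guards(body):
    """Body atoms that contain every body variable."""
    bv = body_vars(body)
    return [atom for atom in body if bv <= set(atom[1])]

def is_guarded(rule):
    _, body = rule
    return bool(guards(body))

# The example rule above: supervisorOf(X,Y) is the guard.
manager = (("Manager", ("X",)),
           [("Employee", ("X",)), ("supervisorOf", ("X", "Y")), ("Manager", ("Y",))])
# The last grid rule of the undecidability proof has no guard.
grid = (("V", ("Y", "W")),
        [("H", ("X", "Y")), ("H", ("Z", "W")), ("V", ("X", "Z"))])
```

The grid rule joins three atoms, none of which contains all of X, Y, Z and W, which is why the Turing machine simulation falls outside the guarded fragment.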
Treewidth of the Chase

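• Tree decomposition - a mapping of a graph into a tree

   [Figure: a graph on the vertices A, B, C, D, E, F, G, H and a tree decomposition with nodes ABC, BCE, CDE, BEG, BFG, EGH; treewidth = 2]

• Treewidth - the maximum number of graph vertices mapped to a tree node, minus 1 (minimised over all tree decompositions)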
Treewidth of the Chase


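 • An instance I can be represented as a graph - Gaifman graph

         Instance: R(a,b,c), S(c,d), T(c,d,e)

         [Figure: Gaifman graph on the vertices a, b, c, d, e; c is adjacent to every other vertex, a is adjacent to b, and d is adjacent to e]

 • Treewidth of I is defined as the treewidth of its Gaifman graph

 • Chase has finite treewidth ⇒ it is a tree-like structure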
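
The Gaifman graph connects two terms whenever they occur together in some atom. A short sketch that builds it for the instance above, assuming atoms are `(predicate, args)` pairs and edges are stored as ordered pairs (an encoding chosen here for illustration):

```python
from itertools import combinations

def gaifman_graph(instance):
    """Return (nodes, edges); an edge links two terms that share an atom."""
    nodes, edges = set(), set()
    for _, args in instance:
        nodes.update(args)
        for u, v in combinations(sorted(set(args)), 2):
            edges.add((u, v))
    return nodes, edges

instance = [("R", ("a", "b", "c")), ("S", ("c", "d")), ("T", ("c", "d", "e"))]
nodes, edges = gaifman_graph(instance)
```

For this instance the two tree nodes {a,b,c} and {c,d,e} already form a tree decomposition, so the treewidth is 2.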
Guarded Datalog[∃]

• Inspired by the guarded fragment of first-order logic
  [Andréka, van Benthem & Németi, Journal of Philosophical Logic 1998]

• There exists a body-atom that contains all the body-variables - guard

     Manager(X) ← Employee(X), supervisorOf(X,Y), Manager(Y)

• Chase has finite treewidth ⇒ Guarded Datalog[∃] is decidable
  [Calì, Gottlob & Kifer, KR 2008]

• What about the complexity of Guarded Datalog[∃]? - now we consider ontological query answering (previous lecture)
Complexity of Guarded Datalog[∃]

• Ontological query answering problem:

        - Input: program P (the TBox), database D for sch(P) (the ABox), conjunctive query Q

        - Question: P ∪ D ⊨ Q, or, equivalently, P(D) ⊨ Q?

• Data complexity - P and Q fixed, D part of the input

• Combined complexity - everything part of the input
Ontological Query Answering: Example

  D = {Person(John)}

  P:   ∃Y hasFather(X,Y) ← Person(X)
       Person(Y) ← hasFather(X,Y)

  P(D) = {Person(John), hasFather(John,z), Person(z), …}

           Q1 ← hasFather(John,X), Person(X)        (P(D) ⊨ Q1)

           Q2 ← hasFather(X,John)                   (P(D) ⊭ Q2)
Data Complexity of Guarded Datalog[∃]

Theorem: Guarded Datalog[∃] is PTIME-complete in data complexity

Proof (in PTIME):

       • Guarded Datalog[∃] enjoys the bounded guard-depth property

       • Construct in PTIME the finite part C of the guarded chase forest

       • Evaluate the given query over C

                           [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
Bounded Guard-Depth Property



 • Chase graph

   D = {R(a,b), S(b)}

   P = { ρ1 = ∃Z R(Z,X) ← R(X,Y), S(Y),
         ρ2 = S(X) ← R(X,Y) }

   Atoms of the chase graph, level by level:

                                          R(a,b)     S(b)
                                          R(z1,a)    S(a)
                                          R(z2,z1)   S(z1)
                                          R(z3,z2)   S(z2)
Bounded Guard-Depth Property


 • Guarded chase forest - the restriction of the chase graph to guards and their children

    Chase graph                 Guarded chase forest       Depth

    R(a,b)    S(b)              R(a,b)     S(b)            0
    R(z1,a)   S(a)              R(z1,a)    S(a)            1
    R(z2,z1)  S(z1)             R(z2,z1)   S(z1)           2
    R(z3,z2)  S(z2)             R(z3,z2)   S(z2)           3
Bounded Guard-Depth Property

   [Figure: the guarded chase forest of D w.r.t. P; the query Q maps into its finite initial part C, whose depth is constant w.r.t. D]

               P(D) ⊨ Q   ⇒   C ⊨ Q
Data Complexity of Datalog

Theorem: Datalog[] is PTIME-complete in data complexity

Proof (PTIME-hard): reduction from fact inference for propositional LP(2)

¡ For each R0  add in D the atom True(R0)                   encode program
¡ For each R0  R1,R2 add in D the atom S(R0,R1,R2)          P in database D


¡ Construct the fixed Guarded Datalog[] program PDAT:


            T(X)  True(X)
                                              meta-interpreter for LP(2)
            T(Z)  T(X),T(Y),S(Z,X,Y)


¡ P ² Q iff T(Q) 2 PDAT(D)
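
The reduction can be made concrete as follows. This sketch encodes a propositional program into the database D and then evaluates the two fixed rules of PDAT bottom-up. The tuple encoding and the example program over the atoms `p`, `q`, `r`, `s`, `u` are assumptions made for illustration.

```python
def encode(program):
    """program: list of (head, body) pairs, body being () or a pair of atoms."""
    db = set()
    for head, body in program:
        if not body:
            db.add(("True", head))                 # fact R0 ←
        else:
            db.add(("S", head, body[0], body[1]))  # rule R0 ← R1,R2
    return db

def p_dat(db):
    """Evaluate PDAT on db and return the set of all x with T(x) derived."""
    t = {f[1] for f in db if f[0] == "True"}       # T(X) ← True(X)
    changed = True
    while changed:
        changed = False
        for f in db:
            # T(Z) ← T(X), T(Y), S(Z,X,Y)
            if f[0] == "S" and f[2] in t and f[3] in t and f[1] not in t:
                t.add(f[1])
                changed = True
    return t

# Hypothetical LP(2) program: p ←   q ←   r ← p,q   s ← r,u
program = [("p", ()), ("q", ()), ("r", ("p", "q")), ("s", ("r", "u"))]
derived = p_dat(encode(program))
```

The program PDAT never changes; only the database depends on the propositional program, which is exactly what a data-complexity lower bound requires.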
Combined Complexity of Guarded Datalog[∃]

Theorem: Guarded Datalog[∃] is 2EXPTIME-complete in combined complexity

Proof:

         • Upper bound: alternating EXPSPACE algorithm

         • Lower bound: by simulating an alternating EXPSPACE TM

                                                 [Calì, Gottlob & Kifer, KR 2008]
DLs vs Guarded Datalog[∃]

ELH: Popular DL (for biological applications) with PTIME data complexity
[Baader, IJCAI 2003 and Rosati, DL 2007]

  ELH TBox        Datalog[∃] Representation

  A ⊑ B           B(X) ← A(X)
  A ⊓ B ⊑ C       C(X) ← A(X), B(X)
  ∃R.A ⊑ B        B(X) ← R(X,Y), A(Y)
  A ⊑ ∃R.B        ∃Y Aux(X,Y) ← A(X)     R(X,Y) ← Aux(X,Y)     B(Y) ← Aux(X,Y)
  R ⊑ S           S(X,Y) ← R(X,Y)
DLs vs Guarded Datalog[∃]

DL-Lite: Popular family of DLs with AC0 data complexity (OWL 2 QL)
[Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, Journal of Automated Reasoning 2007]

               DL-LiteR TBox            Datalog[∃] Representation

                    A ⊑ B                         B(X) ← A(X)
                   A ⊑ ∃R                      ∃Y R(X,Y) ← A(X)
                   ∃R ⊑ A                        A(X) ← R(X,Y)
                    R ⊑ S                         S(X,Y) ← R(X,Y)

   Note: Disjointness assertions are harmless - will be treated differently
Linear Datalog[∃]

  • Goal: Lightweight Datalog[∃] fragment which is highly tractable

  • There exists only one atom in the body

                              ∃Y hasFather(X,Y) ← Person(X)

  • Enjoys first-order rewritability
    [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
First-Order Rewritability


   [Figure: the query Q and the program P are compiled into a first-order query QP; QP is translated into an SQL query Q*, which is evaluated over the database D]

                          ∀D (P(D) ⊨ Q ⇔ D ⊨ Q*)

     [Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, Journal of Automated Reasoning 2007]
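
To illustrate the idea of compiling the program into the query, here is a sketch for a deliberately tiny special case. It is not the general rewriting algorithm of the cited work. The assumptions are: the query is atomic, Q(X) ← B(X), and only four DL-LiteR-style rule shapes occur, encoded as hypothetical tuples `(kind, body_predicate, head_predicate)`.

```python
# Assumed rule encoding (kind, body_predicate, head_predicate):
#   ("sub",  A, B): B(X) ← A(X)          ("dom",  R, A): A(X) ← R(X,Y)
#   ("ex",   A, R): ∃Y R(X,Y) ← A(X)     ("role", R, S): S(X,Y) ← R(X,Y)
def rewrite(concept, rules):
    """Disjuncts of the rewritten query for Q(X) ← concept(X).

    ("c", A) stands for A(X) and ("d", R) stands for ∃Y R(X,Y).
    """
    todo, seen = [("c", concept)], set()
    while todo:
        atom = todo.pop()
        if atom in seen:
            continue
        seen.add(atom)
        kind, name = atom
        for rkind, body_pred, head_pred in rules:
            if head_pred != name:
                continue
            if kind == "c" and rkind == "sub":
                todo.append(("c", body_pred))
            elif kind == "c" and rkind == "dom":
                todo.append(("d", body_pred))
            elif kind == "d" and rkind == "ex":
                todo.append(("c", body_pred))
            elif kind == "d" and rkind == "role":
                todo.append(("d", body_pred))
    return seen

def evaluate(disjuncts, database):
    """Evaluate the union of the disjuncts directly over the database."""
    answers = set()
    for kind, name in disjuncts:
        arity = 1 if kind == "c" else 2
        for pred, args in database:
            if pred == name and len(args) == arity:
                answers.add(args[0])
    return answers

rules = [("sub", "A", "B"), ("dom", "R", "A"), ("role", "T", "R"), ("ex", "C", "T")]
database = {("T", ("a", "b")), ("C", ("c",)), ("B", ("d",))}
disjuncts = rewrite("B", rules)
answers = evaluate(disjuncts, database)
```

The rewriting step looks only at the rules, and the evaluation step looks only at D, which mirrors the compilation and evaluation phases in the figure.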
Bounded Derivation-Depth Property (BDDP)

   [Figure: the chase graph of D w.r.t. P (not the guarded chase forest); the query Q maps into its finite initial part C, whose depth is constant w.r.t. D]

                P(D) ⊨ Q    ⇒    C ⊨ Q
         Sufficient condition for first-order rewritability
Linear Datalog[∃]

  • Goal: Lightweight Datalog[∃] fragment which is highly tractable

  • There exists only one atom in the body

                              ∃Y hasFather(X,Y) ← Person(X)

  • Enjoys first-order rewritability
    [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]

  • What about the complexity of Linear Datalog[∃]?
Data Complexity of Linear Datalog[∃]

Theorem: Linear Datalog[∃] is in AC0 in data complexity

Proof:

         • We exploit first-order rewritability of Linear Datalog[∃]

         • Construct the first-order rewritten query QFO (in constant time, since P and Q are fixed)

         • Evaluate QFO over the given database (in AC0)
           [Vardi, PODS 1995]

                                [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
Combined Complexity of Linear Datalog[∃]

Theorem: Linear Datalog[∃] is PSPACE-complete in combined complexity

Proof:

         • Upper bound: exploit the BDDP
           implicit in [Johnson & Klug, JCSS 1984]

         • Lower bound: by simulating a PSPACE Turing machine
PSPACE-hardness of Linear Datalog[∃]
• Assume that the tape alphabet is {0,1,⊔}
• Suppose that M halts on I = a1…am using n = m^k cells, for k > 0

• Initial configuration (the database D)

      Config(sinit, a1,…,am, ⊔,…,⊔, 1,0,…,0)

      with n-m blanks ⊔, followed by a 1 and n-1 zeros

• Transition rule - δ(s1,a) = (s2,b,1)

      Config(s2, X1,…,Xi-1, b, Xi+1,…,Xn, 0,…,0, 1, 0,…,0) ← Config(s1, X1,…,Xi-1, a, Xi+1,…,Xn, 0,…,0, 1, 0,…,0)

      body: i-1 zeros, then 1, then n-i zeros
      head: i zeros, then 1, then n-i-1 zeros

• Accepting rule
                      Accept ← Config(sacc, X1,…,Xn, Y1,…,Yn)
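
The transition rule above is a schema with one instance per cursor position i. The following sketch generates these instances as strings for a right-moving transition; the function name `transition_rules` and the textual rule format are assumptions made for illustration.

```python
def transition_rules(s1, a, s2, b, n):
    """Linear rules for δ(s1,a) = (s2,b,1) on a tape of n cells."""
    rules = []
    for i in range(1, n):                      # cursor moves from cell i to i+1
        tape = [f"X{j}" for j in range(1, n + 1)]
        body_tape = tape[:i - 1] + [a] + tape[i:]
        head_tape = tape[:i - 1] + [b] + tape[i:]
        body_pos = ["0"] * (i - 1) + ["1"] + ["0"] * (n - i)
        head_pos = ["0"] * i + ["1"] + ["0"] * (n - i - 1)
        body = "Config(" + ",".join([s1] + body_tape + body_pos) + ")"
        head = "Config(" + ",".join([s2] + head_tape + head_pos) + ")"
        rules.append(f"{head} ← {body}")
    return rules

generated = transition_rules("s1", "a", "s2", "b", 3)
```

Every generated rule has a single body atom and no existential variable, so the whole simulation stays within Linear Datalog[∃], and only n-1 rules are needed per transition.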
Datalog[∃,=,⊥]: Overview

   [Figure: diagram of the fragments. Guarded (PTIME-complete) contains Linear (in AC0) and ELH; DL-LiteR lies inside Linear]
But…



   • What about joins in rule bodies?

         ∃E Employee(E,D,P,A) ← Runs(D,P), Area(P,A)

   • What about the DL assertion concept product?

          biggerThan(E,M) ← Elephant(E), Mouse(M)
Sticky Datalog[∃]

    ∃E Employee(E,D,P,A) ← Runs(D,P), Area(P,A)

     biggerThan(E,M) ← Elephant(E), Mouse(M)
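Sticky Datalog[∃]

   Sticky:

      ∃W T(X,Y,W) ← R(X,Y), P(Y,Z)
      ∃W S(Y,W) ← T(X,Y,Z)

   Not sticky (×): the join variable Y of the first rule is lost in the head of the second rule

      ∃W T(X,Y,W) ← R(X,Y), P(Y,Z)
      ∃W S(X,W) ← T(X,Y,Z)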
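
The slides do not spell out the test behind these two examples. The sketch below assumes the variable-marking procedure from the sticky Datalog± literature (Calì, Gottlob & Pieris): first mark every body variable that does not occur in the head, then propagate marks backwards through head positions, and finally reject the program if some marked variable occurs more than once in a body. Atoms are `(predicate, args)` pairs and variables are capitalised strings, both conventions chosen here for illustration.

```python
def is_var(term):
    # Assumed convention: variables start with an upper-case letter.
    return term[0].isupper()

def is_sticky(rules):
    """rules: list of (head_atom, body_atoms) with single-atom heads."""
    marked = [set() for _ in rules]            # marked body variables per rule
    # Initial marking: body variables that do not occur in the head.
    for k, (head, body) in enumerate(rules):
        for _, args in body:
            marked[k] |= {v for v in args if is_var(v) and v not in head[1]}
    changed = True
    while changed:
        changed = False
        # Positions (predicate, index) where a marked variable occurs in a body.
        positions = {(p, i)
                     for k, (_, body) in enumerate(rules)
                     for p, args in body
                     for i, v in enumerate(args) if v in marked[k]}
        # Propagation: a body variable sitting at such a position in a head.
        for k, (head, body) in enumerate(rules):
            bvars = {v for _, args in body for v in args if is_var(v)}
            for i, v in enumerate(head[1]):
                if (head[0], i) in positions and v in bvars and v not in marked[k]:
                    marked[k].add(v)
                    changed = True
    # Sticky iff no marked variable occurs more than once in a body.
    for k, (_, body) in enumerate(rules):
        occurrences = [v for _, args in body for v in args]
        if any(occurrences.count(v) > 1 for v in marked[k]):
            return False
    return True

first = (("T", ("X", "Y", "W")), [("R", ("X", "Y")), ("P", ("Y", "Z"))])
sticky_program = [first, (("S", ("Y", "W")), [("T", ("X", "Y", "Z"))])]
non_sticky_program = [first, (("S", ("X", "W")), [("T", ("X", "Y", "Z"))])]
```

In the second program the variable Y of `T(X,Y,Z)` is dropped by the rule for S, the mark propagates to Y in the first rule, and Y occurs there twice, so the check fails.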
Data Complexity of Sticky Datalog[∃]

Theorem: Sticky Datalog[∃] is in AC0 in data complexity

Proof:

         • Sticky Datalog[∃] enjoys the BDDP

         • Thus, Sticky Datalog[∃] is first-order rewritable ⇒ in AC0

                                                 [Calì, Gottlob & Pieris, VLDB 2010]
Combined Complexity of Sticky Datalog[∃]

Theorem: Sticky Datalog[∃] is EXPTIME-complete in combined complexity

Proof:

 • EXPTIME-membership: construct a proof of the query by applying an APSPACE procedure

 • EXPTIME-hardness: fact inference for lossless Datalog programs over {0,1} is EXPTIME-hard

          S(X,Y,Z) ← P(X,Y), P(Y,Z), R(Z)
                                             [Calì, Gottlob & Pieris, VLDB 2010]
Datalog as an Ontology Language

• Ontology languages are based on description logics (previous lecture)

• Much is not possible with Datalog
  [Patel-Schneider & Horrocks, Journal of Web Semantics 2007]

            DL Axiom                                            ?

     Employee ⊑ ∃reportsTo                     ∃Y reportsTo(X,Y) ← Employee(X)
         funct(reportsTo)                    Y = Z ← reportsTo(X,Y), reportsTo(X,Z)
    Employee disj Customer                       ⊥ ← Employee(X), Customer(X)
Equality Atom

                       Xi = Xj ← R1(X1),…,Rn(Xn)

 • Linear Datalog[∃,=] is already undecidable
   implicit in [Chandra & Vardi, SIAM Journal on Computing 1985]

 • Separability: given P = P∃ ∪ P=,

                     ∀D ∀Q: D ⊨ P=   ⇒   (P(D) ⊨ Q   iff   P∃(D) ⊨ Q)

   [Calì, Lembo & Rosati, PODS 2003]

 • Non-conflicting Datalog[∃,=]: sufficient condition for separability
   see, e.g., [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
Truth Constant False


                    ⊥ ← R1(X1),…,Rn(Xn)

 • Preliminary check without adding complexity - given P = P∃ ∪ P⊥

                           P(D) ⊨ Q
                               ⇕
        P∃(D) ⊨ Q    OR      P∃(D) ⊨ Qρ, for some ρ ∈ P⊥

                                                 where Qρ ← body(ρ)
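Non-Conflicting Datalog[∃,=,⊥]: Overview

   [Figure: diagram of the fragments. Guarded (PTIME-complete) contains Linear (in AC0) and ELH; Sticky (in AC0) overlaps Linear; DL-LiteR is captured by both Linear and Sticky]

   Example rules shown in the figure:

             ∃W S(X,W) ← R(X,Y,Y)   (linear, not sticky)
             ∃W S(Y,W) ← R(X,Y), P(Y,Z)   (sticky, not guarded)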
Further Reading (Partial List)
Guarded Datalog[∃] (and extensions)
-   Andrea Calì, Georg Gottlob, and Michael Kifer. Taming the infinite chase: Query answering under expressive
    relational constraints. In Proceedings of KR, pages 70-80, 2008.
-   Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query
    answering over ontologies. J. Web Sem., 14:57-83, 2012.
-   Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the
    complexity lines for generalized guarded existential rules. In Proceedings of IJCAI, pages 712-717, 2011.

Sticky Datalog[∃] (and extensions)
-   Andrea Calì, Georg Gottlob, Andreas Pieris. Advanced processing for ontological queries. PVLDB 3(1):
    554-565, 2010.
-   Andrea Calì, Georg Gottlob, Andreas Pieris. Query answering under non-guarded rules in Datalog+/-. In
    Proceedings of RR, pages 1-17, 2010.
-   Georg Gottlob, Giorgio Orsi, Andreas Pieris. Ontological queries: Rewriting and optimization. In Proceedings
    of ICDE, pages 2-13, 2011.

Weakly-acyclic Datalog[∃] (and extensions)
-   Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, Lucian Popa. Data exchange: Semantics and query answering.
    Theor. Comput. Sci., 336(1): 89-124, 2005.
-   Alin Deutsch, Alan Nash, Jeffrey B. Remmel. The chase revisited. In Proceedings of PODS, pages 149-158, 2008.
-   Bruno Marnette. Generalized schema-mappings: From termination to tractability. In Proceedings of PODS, pages
    13-22, 2009.

Shy Datalog[∃]
-   Nicola Leone, Marco Manna, Giorgio Terracina, Pierfrancesco Veltri. Efficiently computable Datalog∃ Programs.
    In Proceedings of KR, 2012.

Más contenido relacionado

Destacado

De an-tttn-olap-slide
De an-tttn-olap-slideDe an-tttn-olap-slide
De an-tttn-olap-slideMan El
 
Data_Warehouse
Data_WarehouseData_Warehouse
Data_WarehouseThang Luu
 
Ch 12 O O D B Dvlpt
Ch 12  O O  D B  DvlptCh 12  O O  D B  Dvlpt
Ch 12 O O D B Dvlptguest8fdbdd
 
Tài liệu data warehouse vietsub
Tài liệu data warehouse  vietsubTài liệu data warehouse  vietsub
Tài liệu data warehouse vietsubhoangdat1361
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013Juan Sequeda
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 

Destacado (10)

Csdl hdt
Csdl hdtCsdl hdt
Csdl hdt
 
De an-tttn-olap-slide
De an-tttn-olap-slideDe an-tttn-olap-slide
De an-tttn-olap-slide
 
Data_Warehouse
Data_WarehouseData_Warehouse
Data_Warehouse
 
Ch 12 O O D B Dvlpt
Ch 12  O O  D B  DvlptCh 12  O O  D B  Dvlpt
Ch 12 O O D B Dvlpt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Tài liệu data warehouse vietsub
Tài liệu data warehouse  vietsubTài liệu data warehouse  vietsub
Tài liệu data warehouse vietsub
 
Fuzzy logic &_inference_system
Fuzzy logic &_inference_systemFuzzy logic &_inference_system
Fuzzy logic &_inference_system
 
OLAP
OLAPOLAP
OLAP
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 

Más de Giorgio Orsi

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseGiorgio Orsi
 
Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)Giorgio Orsi
 
SAE: Structured Aspect Extraction
SAE: Structured Aspect ExtractionSAE: Structured Aspect Extraction
SAE: Structured Aspect ExtractionGiorgio Orsi
 
wadar_poster_final
wadar_poster_finalwadar_poster_final
wadar_poster_finalGiorgio Orsi
 
Query Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological DatabasesQuery Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological DatabasesGiorgio Orsi
 
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014Giorgio Orsi
 
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description LogicsHeuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description LogicsGiorgio Orsi
 
AMBER WWW 2012 Poster
AMBER WWW 2012 PosterAMBER WWW 2012 Poster
AMBER WWW 2012 PosterGiorgio Orsi
 
AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)Giorgio Orsi
 
Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...Giorgio Orsi
 
The Diadem Ontology
The Diadem OntologyThe Diadem Ontology
The Diadem OntologyGiorgio Orsi
 
AMBER presentation
AMBER presentationAMBER presentation
AMBER presentationGiorgio Orsi
 

Más de Giorgio Orsi (20)

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)
 
SAE: Structured Aspect Extraction
SAE: Structured Aspect ExtractionSAE: Structured Aspect Extraction
SAE: Structured Aspect Extraction
 
diadem-vldb-2015
diadem-vldb-2015diadem-vldb-2015
diadem-vldb-2015
 
wadar_poster_final
wadar_poster_finalwadar_poster_final
wadar_poster_final
 
Query Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological DatabasesQuery Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological Databases
 
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
 
Perv a ds-rr13
Perv a ds-rr13Perv a ds-rr13
Perv a ds-rr13
 
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description LogicsHeuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
 
AMBER WWW 2012 Poster
AMBER WWW 2012 PosterAMBER WWW 2012 Poster
AMBER WWW 2012 Poster
 
AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)
 
Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...
 
Table Recognition
Table RecognitionTable Recognition
Table Recognition
 
The Diadem Ontology
The Diadem OntologyThe Diadem Ontology
The Diadem Ontology
 
Diadem 1.0
Diadem 1.0Diadem 1.0
Diadem 1.0
 
Oxpath vldb
Oxpath vldbOxpath vldb
Oxpath vldb
 
Gottlob ICDE 2011
Gottlob ICDE 2011Gottlob ICDE 2011
Gottlob ICDE 2011
 
OPAL Presentation
OPAL PresentationOPAL Presentation
OPAL Presentation
 
AMBER presentation
AMBER presentationAMBER presentation
AMBER presentation
 
Orsi PersDB11
Orsi PersDB11Orsi PersDB11
Orsi PersDB11
 

Datalog and its Extensions for Semantic Web Databases

  • 1. Datalog and its Extensions for Semantic Web Databases Georg Gottlob1 Giorgio Orsi1 Andreas Pieris1 Mantas Šimkus2 1Department of Computer Science, University of Oxford, UK 2Institute of Information Systems, Vienna University of Technology, Austria Reasoning Web Summer School, Vienna, Austria, September 3 - 8, 2012
  • 2. Structure of the Lecture ¡ Relational Databases and Datalog ¡ Complexity of Datalog ¡ Datalog and Ontological Reasoning ¡ Datalog§
  • 4. Relational Databases Predominant technology for data storage and processing Flight origin destination airline “On the fly” example: VIE LHR BA LHR EDI BA LGW GLA U2 LCA VIE OS Edinburgh Glasgow Airport code city VIE Vienna London LHR London LGW London Larnaca LCA Larnaca Vienna GLA Glasgow EDI Edinburgh
  • 5. Relational Databases (Terminology) Flight origin destination airline VIE LHR BA LHR EDI BA LGW GLA U2 LCA VIE OS Airport code city VIE Vienna Constants LHR London (from a domain) LGW London LCA Larnaca VIE, LHR, … GLA Glasgow BA, U2, OS EDI Edinburgh Vienna, London, …
  • 6. Relational Databases (Terminology) Flight origin destination airline VIE LHR BA Relations LHR EDI BA LGW GLA U2 LCA VIE OS Airport code city VIE Vienna Constants LHR London (from a domain) LGW London LCA Larnaca VIE, LHR, … GLA Glasgow BA, U2, OS EDI Edinburgh Vienna, London, …
  • 7. Relational Databases (Terminology) Flight origin destination airline VIE LHR BA Relations LHR EDI BA LGW GLA U2 LCA VIE OS Tuples Airport code city VIE Vienna Constants LHR London (from a domain) LGW London Relational atoms LCA Larnaca VIE, LHR, … GLA Glasgow Flight(LHR,EDI,BA) BA, U2, OS EDI Edinburgh Vienna, London, … Airport(LGW,London)
  • 8. Querying: Relational Algebra. List all the airlines. Airport(code, city): (VIE, Vienna), (LHR, London), (LGW, London), (LCA, Larnaca), (GLA, Glasgow), (EDI, Edinburgh). Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). Query: Π_airline(Flight). Answer: {BA, U2, OS}
  • 9. Querying: Relational Algebra. List the codes of the airports in London. Airport(code, city): (VIE, Vienna), (LHR, London), (LGW, London), (LCA, Larnaca), (GLA, Glasgow), (EDI, Edinburgh). Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). Query: Π_code(σ_{city = “London”}(Airport)). Answer: {LHR, LGW}
  • 10. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? Airport(code, city): (VIE, Vienna), (LHR, London), (LGW, London), (LCA, Larnaca), (GLA, Glasgow), (EDI, Edinburgh). Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). T1 ← σ_{city = “London”}(Airport); T2 ← σ_{city = “Glasgow”}(Airport)
  • 11. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). T1(code, city): (LHR, London), (LGW, London). T2(code, city): (GLA, Glasgow). T1 ← σ_{city = “London”}(Airport); T2 ← σ_{city = “Glasgow”}(Airport)
  • 12. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). T1(code, city): (LHR, London), (LGW, London). T2(code, city): (GLA, Glasgow). T3 ← Flight ⋈_{origin = code} T1
  • 13. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? T3(origin, destination, airline, code, city): (LHR, EDI, BA, LHR, London), (LGW, GLA, U2, LGW, London). T2(code, city): (GLA, Glasgow). T3 ← Flight ⋈_{origin = code} T1
  • 14. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? T3(origin, destination, airline, code, city): (LHR, EDI, BA, LHR, London), (LGW, GLA, U2, LGW, London). T2(code, city): (GLA, Glasgow). T4 ← T3 ⋈_{destination = code} T2
  • 15. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? T4(origin, destination, airline, code, city, code, city): (LGW, GLA, U2, LGW, London, GLA, Glasgow). T4 ← T3 ⋈_{destination = code} T2
  • 16. Querying: Relational Algebra. Which airlines fly directly from London to Glasgow? T4(origin, destination, airline, code, city, code, city): (LGW, GLA, U2, LGW, London, GLA, Glasgow). Query: Π_airline(T4)
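
The relational-algebra plan of slides 10-16 can be traced with a few lines of Python. This is only an illustrative sketch: the helper names `select`, `join` and `project`, and the `o_`/`d_` prefixes used to keep the two copies of `code` and `city` apart, are assumptions made here and do not come from the slides. The data is copied from the Flight and Airport tables.

```python
# Relations as lists of dicts (data from the Flight and Airport tables).
FLIGHT = [
    {"origin": "VIE", "destination": "LHR", "airline": "BA"},
    {"origin": "LHR", "destination": "EDI", "airline": "BA"},
    {"origin": "LGW", "destination": "GLA", "airline": "U2"},
    {"origin": "LCA", "destination": "VIE", "airline": "OS"},
]
AIRPORT = [
    {"code": "VIE", "city": "Vienna"},
    {"code": "LHR", "city": "London"},
    {"code": "LGW", "city": "London"},
    {"code": "LCA", "city": "Larnaca"},
    {"code": "GLA", "city": "Glasgow"},
    {"code": "EDI", "city": "Edinburgh"},
]


def select(rel, pred):
    """Selection (sigma): keep the tuples satisfying pred."""
    return [t for t in rel if pred(t)]


def join(left, right, left_attr, right_attr, prefix):
    """Equi-join; attributes of `right` are renamed with `prefix` (hypothetical convention)."""
    return [
        {**l, **{prefix + k: v for k, v in r.items()}}
        for l in left
        for r in right
        if l[left_attr] == r[right_attr]
    ]


def project(rel, attrs):
    """Projection (Pi): a set of tuples, so duplicates disappear."""
    return {tuple(t[a] for a in attrs) for t in rel}


t1 = select(AIRPORT, lambda t: t["city"] == "London")
t2 = select(AIRPORT, lambda t: t["city"] == "Glasgow")
t3 = join(FLIGHT, t1, "origin", "code", "o_")
t4 = join(t3, t2, "destination", "code", "d_")
answer = project(t4, ["airline"])
print(answer)
```

On the example data, `t3` keeps the two flights leaving London and `t4` keeps the single LGW to GLA flight.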
  • 17. Querying: FOL and SQL Representation. List all the airlines. Airport(code, city): (VIE, Vienna), (LHR, London), (LGW, London), (LCA, Larnaca), (GLA, Glasgow), (EDI, Edinburgh). Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). FOL Representation: ∃X∃Y Flight(X,Y,Z) (Z is free). SQL Representation: SELECT airline FROM Flight
  • 18. Querying: FOL and SQL Representation. List the codes of the airports in London. Airport(code, city): (VIE, Vienna), (LHR, London), (LGW, London), (LCA, Larnaca), (GLA, Glasgow), (EDI, Edinburgh). Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). FOL Representation: Airport(X,London). SQL Representation: SELECT code FROM Airport WHERE city = 'London'
  • 19. Querying: FOL and SQL Representation. Which airlines fly directly from London to Glasgow? Airport(code, city): (VIE, Vienna), (LHR, London), (LGW, London), (LCA, Larnaca), (GLA, Glasgow), (EDI, Edinburgh). Flight(origin, destination, airline): (VIE, LHR, BA), (LHR, EDI, BA), (LGW, GLA, U2), (LCA, VIE, OS). FOL Representation: ∃X∃Y Airport(X,London) ∧ Airport(Y,Glasgow) ∧ Flight(X,Y,Z) (Z is free). SQL Representation: SELECT F.airline FROM Flight AS F, Airport AS A1, Airport AS A2 WHERE A1.code = F.origin AND A2.code = F.destination AND A1.city = 'London' AND A2.city = 'Glasgow'
  • 20. What we Cannot Ask? Is Glasgow reachable from Vienna? [Figure: map of Edinburgh, Glasgow, London, Larnaca and Vienna] Looks like this FOL query can do the job… ∃X∃Y∃Z∃W∃V Airport(X,Vienna) ∧ Airport(Y,Glasgow) ∧ Flight(X,Z,W) ∧ Flight(Z,Y,V). Answer: YES
  • 21. What we Cannot Ask? Is Glasgow reachable from Vienna? [Figure: map of Edinburgh, Glasgow, London, Larnaca and Vienna, with an airport labelled Intermediate] Looks like this FOL query can do the job… ∃X∃Y∃Z∃W∃V Airport(X,Vienna) ∧ Airport(Y,Glasgow) ∧ Flight(X,Z,W) ∧ Flight(Z,Y,V). Answer: NO
  • 22. What we Cannot Ask? Is Glasgow reachable from Vienna? [Figure: map of Edinburgh, Glasgow, London, Larnaca and Vienna, with an airport labelled Intermediate] (1) List all the pairs of airports (A1,A2) such that A2 is reachable from A1. (2) Is there a pair (A1,A2) such that A1 is in Vienna and A2 is in Glasgow?
  • 23. What we Cannot Ask? Schema: Flight(origin, destination, airline), Airport(code, city). (1) List all the pairs of airports (A1,A2) such that A2 is reachable from A1: Reachable(X,Y) ← Flight(X,Y,Z); Reachable(X,W) ← Flight(X,Y,Z), Reachable(Y,W). (2) Is there a pair (A1,A2) such that A1 is in Vienna and A2 is in Glasgow? Ans() ← Airport(X,Vienna), Airport(Y,Glasgow), Reachable(X,Y)
  • 24. What we Cannot Ask? Schema: Flight(origin, destination, airline), Airport(code, city). List all the pairs of airports (A1,A2) such that A2 is reachable from A1: Reachable(X,Y) ← Flight(X,Y,Z); Reachable(X,W) ← Flight(X,Y,Z), Reachable(Y,W). RECURSION: RA, FOL, SQL not enough
  • 25. What we Cannot Ask? Schema: Flight(origin, destination, airline), Airport(code, city). List all the pairs of airports (A1,A2) such that A2 is reachable from A1: Reachable(X,Y) ← Flight(X,Y,Z); Reachable(X,W) ← Flight(X,Y,Z), Reachable(Y,W). DATALOG: Select-Project-Join + Recursion
  • 26. Datalog at First Glance. Transitive closure of a graph: TrClosure(X,Y) ← Graph(X,Y); TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y). [Figure: path graph A, B, C, D]
  • 27. Datalog at First Glance. Transitive closure of a graph: TrClosure(X,Y) ← Graph(X,Y); TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y). Graph: (A,B), (B,C), (C,D). TrClosure: (A,B), (A,C), (A,D), (B,C), (B,D), (C,D). [Figure: path graph A, B, C, D]
  • 28. Datalog at First Glance. Semantics - a mapping from instances over body-relations to instances over head-relations. Graph: (A,B), (B,C), (C,D). TrClosure: (A,B), (A,C), (A,D), (B,C), (B,D), (C,D). Equivalent approaches for defining the semantics: Model-Theoretic - logical sentences asserting a property of the result; Fixpoint - solution of a fixpoint equation; Proof-Theoretic - obtaining proofs of facts
  • 29. Datalog: Formal Syntax; Fixpoint Semantics; Complexity Results. Note: For details on the model-theoretic and proof-theoretic semantics see: Foundations of Databases by Abiteboul, Hull and Vianu; Logic Programming and Databases by Ceri, Gottlob and Tanca
  • 30. Syntax of Datalog. A Datalog rule is an expression R0(X0) ← R1(X1),…,Rn(Xn), where R0(X0) is the head and R1(X1),…,Rn(Xn) is the body. n ≥ 0 (n = 0 gives an empty body); R0,…,Rn are relation symbols (or predicates); X0,…,Xn are tuples of terms (constants or variables); Safe - each variable occurring in the head must occur in the body
  • 31. Syntax of Datalog. Datalog program P - a (finite) set of Datalog rules. Extensional predicate - does not occur in the head of a rule of P. Intensional predicate - occurs in the head of some rule of P. edb(P) - extensional predicates of P. idb(P) - intensional predicates of P. sch(P) - the set of predicates edb(P) ∪ idb(P)
  • 32. Syntax of Datalog. Datalog program P - a (finite) set of Datalog rules. Extensional predicate - does not occur in the head of a rule of P. Intensional predicate - occurs in the head of some rule of P. edb(P) - extensional predicates of P (slide annotation: not necessary). idb(P) - intensional predicates of P. sch(P) - the set of predicates edb(P) ∪ idb(P)
  • 33. Example: Recursive Program P. Is Glasgow reachable from Vienna? Schema: Flight(origin, destination, airline), Airport(code, city). Reachable(X,Y) ← Flight(X,Y,Z); Reachable(X,W) ← Flight(X,Y,Z), Reachable(Y,W); Ans() ← Airport(X,Vienna), Airport(Y,Glasgow), Reachable(X,Y). dom(P) = {Vienna, Glasgow}; sch(P) = {Flight, Airport, Reachable, Ans}; edb(P) = {Flight, Airport}; idb(P) = {Reachable, Ans}
  • 34. Fixpoint Semantics. Relies on the immediate consequence operator T_P. Given a database D and a Datalog program P, an atom R(c1,…,cn) is an immediate consequence for D and P if: R(c1,…,cn) ∈ D, or there exist a rule R0(X0) ← R1(X1),…,Rn(Xn) in P and a homomorphism h such that {h(R1(X1)),…,h(Rn(Xn))} ⊆ D and h(R0(X0)) = R(c1,…,cn). T_P - mapping from databases for sch(P) to databases for sch(P). T_P(D) = { immediate consequences for D and P }
  • 35. Fixpoint Semantics. A crucial fact - for each P and D for edb(P): T_P has a minimum fixpoint containing D (I is a fixpoint of T_P if T_P(I) = I). Note: For the proof see, e.g., Theorem 12.3.2 in Foundations of Databases. The semantics of P on D, denoted P(D), is this minimum fixpoint. How do we compute it?
  • 36. Fixpoint Semantics. T_{P,0}(I) = I and T_{P,i+1}(I) = T_P(T_{P,i}(I)). Compute T_{P,ω}(I) = ∪_{i ≥ 0} T_{P,i}(I)
  • 37. Fixpoint Semantics: Example. Let P be the program which computes the transitive closure of a graph: TrClosure(X,Y) ← Graph(X,Y); TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y). Consider the input database D = {Graph(A,B), Graph(B,C), Graph(C,D)}. T_{P,1}(D) = D ∪ {TrClosure(A,B), TrClosure(B,C), TrClosure(C,D)}. T_{P,2}(D) = T_{P,1}(D) ∪ {TrClosure(A,C), TrClosure(B,D)}. T_{P,3}(D) = T_{P,2}(D) ∪ {TrClosure(A,D)}. T_{P,4}(D) = T_{P,3}(D). Thus, T_{P,ω}(D) = T_{P,3}(D). [Figure: path graph A, B, C, D]
  • 38. Fixpoint Semantics. T_{P,0}(I) = I and T_{P,i+1}(I) = T_P(T_{P,i}(I)). Compute T_{P,ω}(I) = ∪_{i ≥ 0} T_{P,i}(I). T_{P,ω}(D) is the minimum fixpoint of T_P containing D. T_{P,ω}(D) is the ⊆-minimal model of P containing D. Note: For the proof see, e.g., Theorem 12.3.4 in Foundations of Databases
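
The iteration of the immediate consequence operator can be sketched as a naive evaluation loop. The encoding below is an assumption made for illustration only: atoms are tuples whose first component is the predicate name, variables are strings starting with `?`, and the helper names (`matches`, `substitute`, `stages`) are hypothetical. It is not an efficient Datalog engine.

```python
def is_var(term):
    return term.startswith("?")


def matches(atoms, instance, h):
    """Yield every homomorphism extending h that maps all atoms into instance."""
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if h2.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, h2)


def substitute(atom, h):
    return (atom[0],) + tuple(h.get(t, t) for t in atom[1:])


def immediate_consequences(rules, instance):
    """One application of T_P: the instance itself plus all one-step derivations."""
    derived = set(instance)
    for head, body in rules:
        for h in matches(body, instance, {}):
            derived.add(substitute(head, h))
    return derived


def stages(rules, database):
    """Return [T_{P,0}(D), T_{P,1}(D), ...] up to the first stage that repeats."""
    current = frozenset(database)
    result = [current]
    while True:
        nxt = frozenset(immediate_consequences(rules, current))
        if nxt == current:
            return result
        result.append(nxt)
        current = nxt


RULES = [
    (("TrClosure", "?X", "?Y"), (("Graph", "?X", "?Y"),)),
    (("TrClosure", "?X", "?Y"), (("Graph", "?X", "?Z"), ("TrClosure", "?Z", "?Y"))),
]
DATABASE = {("Graph", "A", "B"), ("Graph", "B", "C"), ("Graph", "C", "D")}

stage_list = stages(RULES, DATABASE)
for i, stage in enumerate(stage_list):
    print(i, sorted(stage))
```

On the transitive-closure example the loop stops after stage 3, matching T_{P,4}(D) = T_{P,3}(D) on slide 37.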
  • 39. Model-Theoretic Approach. Transitive closure of a graph: TrClosure(X,Y) ← Graph(X,Y); TrClosure(X,Y) ← Graph(X,Z), TrClosure(Z,Y). Graph: (A,B), (B,C), (C,D). TrClosure: (A,B), (A,C), (A,D), (B,C), (B,D), (C,D) - the ⊆-minimal model. M (rows as transcribed): (A,A), (A,B), (A,C), (A,D), (D,D) - not a ⊆-minimal model
  • 40. Complexity of Datalog. Fact inference problem: Input: program P, database D for edb(P), atom α. Question: P ∪ D ⊨ α, or, equivalently, α ∈ P(D)? Data complexity - P fixed, D part of the input [Vardi, STOC 1982]. Combined complexity - both P and D part of the input [Vardi, STOC 1982]
  • 41. Data Complexity of Datalog. Theorem: Datalog is PTIME-complete in data complexity. Proof (in PTIME): consider a program P and a database D; |P(D)| ≤ |sch(P)| · (|dom(P) ∪ dom(D)|)^maxarity, the maximum number of tuples using constants of dom(P) ∪ dom(D), where dom(P) - constants in P and dom(D) - constants in D; maxarity is a constant ⇒ P(D) can be constructed in PTIME
  • 42. Data Complexity of Datalog Theorem: Datalog is PTIME-complete in data complexity Proof (PTIME-hard): reduction from fact inference for propositional LP(2)
  • 43. Data Complexity of Datalog. Propositional LP(2) - set of rules R0 ← R1,R2. Fact inference for propositional LP(2): Input: propositional LP(2) program P, propositional atom Q. Question: P ⊨ Q?
  • 44. PTIME-hardness of LP(2). Theorem: LP(2) is PTIME-hard. Proof: Logspace reduction from Monotone Circuit Value Problem. [Figure: circuit with output gate g6 = g4 ∨ g5, gates g4 = g1 ∧ g2 and g5 = g2 ∨ g3, inputs g1 = 1, g2 = 0, g3 = 1] Does the circuit evaluate to true?
  • 45. PTIME-hardness of LP(2). Theorem: LP(2) is PTIME-hard. Proof: Logspace reduction from Monotone Circuit Value Problem. [Figure: circuit with output gate g6 = g4 ∨ g5, gates g4 = g1 ∧ g2 and g5 = g2 ∨ g3, inputs g1 = 1, g2 = 0, g3 = 1] Encoding of the circuit as LP(2) program P: g6 ← g4; g6 ← g5; g4 ← g1, g2; g5 ← g2; g5 ← g3; g1 ←; g3 ←. Does the circuit evaluate to true? Circuit evaluates to true iff P ⊨ g6
  • 46. Data Complexity of Datalog. Propositional LP(2) - set of rules R0 ← R1,R2. Fact inference for propositional LP(2): Input: propositional LP(2) program P, propositional atom Q. Question: P ⊨ Q? PTIME-hard - we can also simulate a PTIME Turing machine, see, e.g., [Dantsin, Eiter, Gottlob & Voronkov, ACM Computing Surveys 2001]
  • 47. Data Complexity of Datalog. Theorem: Datalog is PTIME-complete in data complexity. Proof (PTIME-hard): reduction from fact inference for propositional LP(2). Encode program P in database D: for each rule R0 ← add to D the atom True(R0); for each rule R0 ← R1,R2 add to D the atom S(R0,R1,R2). Construct the fixed Datalog program P_DAT (meta-interpreter for LP(2)): T(X) ← True(X); T(Z) ← T(X), T(Y), S(Z,X,Y). P ⊨ Q iff T(Q) ∈ P_DAT(D)
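
The reduction on slides 45-47 can be replayed on the example circuit. The sketch below is an illustration under assumptions: single-atom bodies such as g6 ← g4 are padded to g6 ← g4,g4 so that every non-fact rule fits the S(R0,R1,R2) encoding (the slides do not spell this step out), and `encode` and `meta_interpreter` are hypothetical names.

```python
# LP(2) program for the circuit: (head, body) with at most two body atoms.
PROGRAM = [
    ("g6", ("g4",)),
    ("g6", ("g5",)),
    ("g4", ("g1", "g2")),
    ("g5", ("g2",)),
    ("g5", ("g3",)),
    ("g1", ()),
    ("g3", ()),
]


def encode(program):
    """Encode the program as a database: True(R0) and S(R0, R1, R2) facts."""
    true_facts, s_facts = set(), set()
    for head, body in program:
        if len(body) == 0:
            true_facts.add(head)
        elif len(body) == 1:
            # Assumption: pad a one-atom body by repeating the atom.
            s_facts.add((head, body[0], body[0]))
        else:
            s_facts.add((head, body[0], body[1]))
    return true_facts, s_facts


def meta_interpreter(true_facts, s_facts):
    """Fixed program P_DAT: T(X) <- True(X); T(Z) <- T(X), T(Y), S(Z, X, Y)."""
    t = set(true_facts)
    changed = True
    while changed:
        changed = False
        for z, x, y in s_facts:
            if x in t and y in t and z not in t:
                t.add(z)
                changed = True
    return t


true_facts, s_facts = encode(PROGRAM)
derived = meta_interpreter(true_facts, s_facts)
print("g6" in derived)
```

The program never changes with the circuit; only the database does, which is exactly what data complexity measures.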
  • 48. Combined Complexity of Datalog. Theorem: Datalog is EXPTIME-complete in combined complexity. Proof (in EXPTIME): consider a program P and a database D; |P(D)| ≤ |sch(P)| · (|dom(P) ∪ dom(D)|)^maxarity, the maximum number of tuples using constants of dom(P) ∪ dom(D); maxarity is now part of the input, so this bound is exponential ⇒ P(D) can be constructed in EXPTIME
  • 49. Combined Complexity of Datalog Theorem: Datalog is EXPTIME-complete in combined complexity Proof (EXPTIME-hard): by simulating an EXPTIME Turing machine
  • 50. Deterministic Turing Machine (DTM). M = (S, Σ, ⊔, δ, s0, s_acc): S - states; Σ - tape symbols; ⊔ - blank symbol; δ: (S \ {s_acc}) × Σ → S × Σ × {-1,0,1} - transition function; s0 - initial state; s_acc - accepting state
  • 51. Deterministic Turing Machine (DTM). M = (S, Σ, ⊔, δ, s0, s_acc): S - states; Σ - tape symbols; ⊔ - blank symbol; δ: (S \ {s_acc}) × Σ → S × Σ × {-1,0,1} - transition function; s0 - initial state; s_acc - accepting state. Example: δ(s1,a) = (s2,b,1). IF at some time instant τ the machine is in state s1, the cursor points to cell κ, and this cell contains a, THEN at instant τ+1 the machine is in state s2, cell κ contains b, and the cursor points to cell κ+1
  • 52. EXPTIME-hardness of Datalog. The goal: encode the EXPTIME computation of a DTM M on input string I with a Datalog program P, a database D, and an atom α such that α ∈ P(D) iff M accepts I in at most N = 2^m steps, where m = |I|^k
  • 53. The Relational Schema. Time points and tape positions from 0 to N-1 are encoded using m-ary tuples from {0,1}^m (recall that N = 2^m) such that 0 = (0,…,0), 1 = (0,…,1), …, N-1 = (1,…,1). Symbol[a](T,C) - at time instant T, cell C contains a. Cursor(T,C) - at time instant T, cursor points to cell C. State[s](T) - at time instant T, the machine is in state s. Accept(T) - at time instant T, the machine accepts. Here T = T1,…,Tm and C = C1,…,Cm
  • 54. Initialization Rules. Assume that I = a1…an. Assume that we have the relations First_m, Succ_m and ≺_m (will be defined later). Symbol[a_i](T,i) ← First_m(T), where i stands for the encoding of the number i; Cursor(T,T) ← First_m(T); State[s0](T) ← First_m(T); Symbol[⊔](T,C) ← First_m(T), ≺_m(n,C), where n stands for the encoding of the number n
  • 55. Transition Rules. δ(s1,a) = (s2,b,1): Symbol[b](T′,C) ← State[s1](T), Symbol[a](T,C), Cursor(T,C), Succ_m(T,T′); Cursor(T′,C′) ← State[s1](T), Symbol[a](T,C), Cursor(T,C), Succ_m(T,T′), Succ_m(C,C′); State[s2](T′) ← State[s1](T), Symbol[a](T,C), Cursor(T,C), Succ_m(T,T′)
  • 56. Inertia Rules. Cells which are not changed during the transition keep their old values: Symbol[a](T′,C) ← Symbol[a](T,C), Cursor(T,C′), ≺_m(C,C′), Succ_m(T,T′); Symbol[a](T′,C) ← Symbol[a](T,C), Cursor(T,C′), ≺_m(C′,C), Succ_m(T,T′)
  • 57. Accepting Rule. Once we reach the accepting state we accept: Accept ← State[s_acc](T)
  • 58. Defining First_m, Succ_m and ≺_m. We assume that D = {First_1(0), Last_1(1), Succ_1(0,1)}. Inductive definition of First_{i+1} and Succ_{i+1}: Succ_{i+1}(Z,X,Z,Y) ← Succ_i(X,Y), for Z ∈ {0,1}; Succ_{i+1}(Z,X,W,Y) ← Succ_1(Z,W), Last_i(X), First_i(Y); First_{i+1}(Z,X) ← First_1(Z), First_i(X); Last_{i+1}(Z,X) ← Last_1(Z), Last_i(X). Definition of ≺_m: ≺_m(X,Y) ← Succ_m(X,Y); ≺_m(X,Y) ← Succ_m(X,Z), ≺_m(Z,Y)
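
The inductive rules for First_i, Last_i and Succ_i can be checked by building the relations bottom-up and comparing them with ordinary binary counting. This is a sketch under assumptions: bit vectors are Python tuples, the function names (`build`, `less_than`, `as_int`) are hypothetical, and First_1(0) is taken as the base fact.

```python
def build(m):
    """Build First_m, Last_m and Succ_m over m-bit tuples from the base facts."""
    first = {1: {(0,)}}
    last = {1: {(1,)}}
    succ = {1: {((0,), (1,))}}
    for i in range(1, m):
        # First_{i+1}(Z,X) <- First_1(Z), First_i(X)
        first[i + 1] = {z + x for z in first[1] for x in first[i]}
        # Last_{i+1}(Z,X) <- Last_1(Z), Last_i(X)
        last[i + 1] = {z + x for z in last[1] for x in last[i]}
        # Succ_{i+1}(Z,X,Z,Y) <- Succ_i(X,Y), for Z in {0,1}
        same_prefix = {((z,) + x, (z,) + y) for z in (0, 1) for x, y in succ[i]}
        # Succ_{i+1}(Z,X,W,Y) <- Succ_1(Z,W), Last_i(X), First_i(Y)
        carry = {(z + x, w + y) for z, w in succ[1] for x in last[i] for y in first[i]}
        succ[i + 1] = same_prefix | carry
    return first[m], last[m], succ[m]


def less_than(succ):
    """The order relation as the transitive closure of Succ_m."""
    less = set(succ)
    while True:
        new = {(x, y) for x, z in succ for z2, y in less if z == z2} - less
        if not new:
            return less
        less |= new


def as_int(bits):
    return int("".join(str(b) for b in bits), 2)


first3, last3, succ3 = build(3)
less3 = less_than(succ3)
print(sorted((as_int(x), as_int(y)) for x, y in succ3))
```

With m bits the rules produce the 2^m - 1 successor pairs of an m-bit counter, although only polynomially many rules are needed.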
  • 59. Concluding EXPTIME-hardness of Datalog. Several rules but polynomially many ⇒ feasible in PTIME. Accept ∈ P(D) iff M accepts I in at most N steps. Can be formally shown by induction on the time steps
  • 60. Datalog as an Ontology Language. Ontology languages are usually based on description logics (prev. lecture). Much is possible with Datalog. DL Axiom | Datalog Rule: Parent ⊓ Male ⊑ Father | Father(X) ← Parent(X), Male(X); MetalDevice ⊑ ∀hasPart.Metal | Metal(Y) ← MetalDevice(X), hasPart(X,Y); brotherOf ⊑ relativeOf | relativeOf(X,Y) ← brotherOf(X,Y); parentOf inv childOf | childOf(Y,X) ← parentOf(X,Y); trans(ancestorOf) | ancOf(X,Z) ← ancOf(X,Y), ancOf(Y,Z); SeniorEmp × Emp ⊑ moreThan | moreThan(X,Y) ← SeniorEmp(X), Emp(Y)
  • 61. Datalog as an Ontology Language. Ontology languages are usually based on description logics (prev. lecture). Much is not possible with Datalog [Patel-Schneider & Horrocks, Journal of Web Semantics 2007]. DL Axiom | ?: Employee ⊑ ∃reportsTo | ∃Y reportsTo(X,Y) ← Employee(X); funct(reportsTo) | Y = Z ← reportsTo(X,Y), reportsTo(X,Z); Employee disj Customer | ⊥ ← Employee(X), Customer(X)
  • 62. Datalog±. Extend Datalog by allowing in the head: existential quantification (∃), equality atoms (=), constant false (⊥); this gives Datalog[∃,=,⊥]. As we shall see, Datalog[∃] is undecidable. Datalog[∃,=,⊥] is syntactically restricted → Datalog±
  • 63. Datalog Extensions. Formal Syntax of Datalog[∃]; Fixpoint Semantics of Datalog[∃]; Undecidability of Datalog[∃]; Guardedness, Linearity and Stickiness; Additional Features (=,⊥)
  • 64. Syntax of Datalog[∃]. A Datalog[∃] rule is an expression ∃Y R0(X0,Y) ← R1(X1),…,Rn(Xn), where ∃Y R0(X0,Y) is the head and R1(X1),…,Rn(Xn) is the body. n ≥ 0 (n = 0 gives an empty body); R0,…,Rn are relation symbols (or predicates); X0,…,Xn are tuples of terms (constants or variables); Y is a tuple of variables (disjoint from X0 ∪ … ∪ Xn)
  • 65. Syntax of Datalog[∃]. Datalog[∃] program P - a (finite) set of Datalog[∃] rules. sch(P) - the set of predicates occurring in P. Note: sch(P) is no longer partitioned into idb(P) and edb(P) - why?
  • 66. Syntax of Datalog[∃]. Datalog[∃] program P - a (finite) set of Datalog[∃] rules. sch(P) - the set of predicates occurring in P. Note: sch(P) is no longer partitioned into idb(P) and edb(P) - why? DL Ontology | Datalog[∃] Program P: Person ⊑ ∃hasFather | ∃Y hasFather(X,Y) ← Person(X); ∃hasFather⁻ ⊑ Person | Person(Y) ← hasFather(X,Y). All predicates of sch(P) appear in the body and in the head
  • 67. Fixpoint Semantics. Analogous to the fixpoint semantics of Datalog - chase procedure. Input: Database D, Datalog[∃] program P. Output: Instance for sch(P) that satisfies P. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). chase(D,P) = D ∪ ?
  • 68. Fixpoint Semantics. Analogous to the fixpoint semantics of Datalog - chase procedure. Input: Database D, Datalog[∃] program P. Output: Instance for sch(P) that satisfies P. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). chase(D,P) = D ∪ {hasFather(John,z1)
  • 69. Fixpoint Semantics. Analogous to the fixpoint semantics of Datalog - chase procedure. Input: Database D, Datalog[∃] program P. Output: Instance for sch(P) that satisfies P. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). chase(D,P) = D ∪ {hasFather(John,z1), Person(z1)
  • 70. Fixpoint Semantics. Analogous to the fixpoint semantics of Datalog - chase procedure. Input: Database D, Datalog[∃] program P. Output: Instance for sch(P) that satisfies P. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). chase(D,P) = D ∪ {hasFather(John,z1), Person(z1), hasFather(z1,z2)
  • 71. Fixpoint Semantics. Analogous to the fixpoint semantics of Datalog - chase procedure. Input: Database D, Datalog[∃] program P. Output: Instance for sch(P) that satisfies P. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). chase(D,P) = D ∪ {hasFather(John,z1), Person(z1), hasFather(z1,z2), …
  • 72. Fixpoint Semantics. Chase rule - the building block of the chase procedure. A rule ρ = ∃Y R0(X0,Y) ← R1(X1),…,Rn(Xn) is applicable to instance I if: there exists a homomorphism h such that {h(R1(X1)),…,h(Rn(Xn))} ⊆ I, but there is no μ ⊇ h such that μ(R0(X0,Y)) ∈ I. Example with the rule ∃Y S(X,Y) ← R(X): for I = {S(a,b), R(a)} and h = {X → a}, the extension μ = {X → a, Y → b} exists, so the rule is not applicable (✗); for I = {S(b,a), R(a)} and h = {X → a}, no such μ exists, so the rule is applicable
  • 73. Fixpoint Semantics. Chase rule - the building block of the chase procedure. A rule ρ = ∃Y R0(X0,Y) ← R1(X1),…,Rn(Xn) is applicable to instance I if: there exists a homomorphism h such that {h(R1(X1)),…,h(Rn(Xn))} ⊆ I, but there is no μ ⊇ h such that μ(R0(X0,Y)) ∈ I. Let J = I ∪ {μ(R0(X0,Y))}, where μ ⊇ h and each μ(Yi) is a fresh value not in I. The result of applying ρ to I is J, denoted I →(ρ,h) J - chase step
  • 74. Fixpoint Semantics. A finite chase of D w.r.t. P is a finite sequence D →(ρ1,h1) I1 →(ρ2,h2) I2 →(ρ3,h3) … →(ρm,hm) Im, and chase(D,P) is defined as the instance Im. An infinite chase of D w.r.t. P is an infinite sequence D →(ρ1,h1) I1 →(ρ2,h2) I2 →(ρ3,h3) … →(ρm,hm) Im → …, and chase(D,P) is defined as the instance ∪_{j ≥ 0} Ij (with I0 = D). The semantics of P on D, denoted P(D), is defined as chase(D,P)
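
The chase step and the chase sequence can be sketched in a few lines. Everything about the encoding is an assumption made for illustration: atoms are tuples, variables start with `?`, a rule is a triple (head, body, existential variables), fresh values are named z1, z2, … as on the slides, and a step bound `max_steps` is added because the chase may be infinite. The first applicable rule in list order is always chosen.

```python
import itertools


def is_var(term):
    return term.startswith("?")


def matches(atoms, instance, h):
    """Yield every homomorphism extending h that maps all atoms into instance."""
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2 = dict(h)
        ok = True
        for term, value in zip(first[1:], fact[1:]):
            if is_var(term):
                if h2.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, instance, h2)


def substitute(atom, h):
    return (atom[0],) + tuple(h.get(t, t) for t in atom[1:])


def chase(database, rules, max_steps):
    """Run at most max_steps chase steps; return (instance, terminated)."""
    instance = set(database)
    fresh = (f"z{i}" for i in itertools.count(1))
    for _ in range(max_steps):
        snapshot = frozenset(instance)
        # A rule is applicable via h if the body maps into the instance
        # and no extension of h maps the head into the instance.
        trigger = next(
            (
                (head, h, exist)
                for head, body, exist in rules
                for h in matches(body, snapshot, {})
                if next(matches((head,), snapshot, h), None) is None
            ),
            None,
        )
        if trigger is None:
            return instance, True
        head, h, exist = trigger
        mu = {**h, **{y: next(fresh) for y in exist}}
        instance.add(substitute(head, mu))
    return instance, False


P = [
    (("hasFather", "?X", "?Y"), (("Person", "?X"),), ("?Y",)),
    (("Person", "?Y"), (("hasFather", "?X", "?Y"),), ()),
]
D = {("Person", "John")}

instance, terminated = chase(D, P, 3)
print(sorted(instance), terminated)
```

With a bound of three steps the sketch reproduces the prefix shown on slide 70; without a bound this particular chase would not terminate.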
  • 75. Chase: A Universal Model. [Figure: C = chase(D,P) is mapped by homomorphisms h1, h2, … into models I1, I2, … of D and P] ∀I (I is a model of D and P ⇒ there is a homomorphism from chase(D,P) to I). Implicit in [Fagin, Kolaitis, Miller & Popa, Theoretical Computer Science 2005]
  • 76. Chase: Uniqueness Property. In general the chase is not unique - it depends on the order of rule application. D = {R(a)}; ρ1 = ∃Y S(Y) ← R(X); ρ2 = S(X) ← R(X). Solution1 = {R(a), S(z), S(a)} (ρ1 then ρ2); Solution2 = {R(a), S(a)} (ρ2 then ρ1). Unique up to homomorphic equivalence [Figure: chase results C1, C2, C3 with homomorphisms h12, h21, h23, h32 between them]
  • 77. Chase: The Challenge of Infinity. In general the chase is infinite. D = {R(a,b)}; ∃Z R(Y,Z) ← R(X,Y). Solution = {R(a,b), R(b,z1), R(z1,z2), R(z2,z3), …}. For plain Datalog, the fixpoint semantics provides an algorithm. The situation changes dramatically for Datalog[∃] - undecidable
  • 78. Undecidability of Datalog[∃]. Theorem: Datalog[∃] is undecidable. Proof: by simulating a deterministic Turing machine with an empty tape
  • 79. Build an Infinite Grid. [Figure: grid starting at c with horizontal edges H and vertical edges V; nodes X, Y above Z, W] The i-th horizontal line represents the i-th configuration of the machine. Start(c) fixes the starting point. Node(X) ← Start(X); Initial(X) ← Start(X); ∃Y H(X,Y) ← Node(X); Node(Y) ← H(X,Y); ∃Y V(X,Y) ← Node(X); Node(Y) ← V(X,Y); V(Y,W) ← H(X,Y), H(Z,W), V(X,Z)
  • 80. Initialization Rules. [Figure: first row of the grid holding blanks ⊔ ⊔ ⊔, with state s0 at the first cell] Initial(Y) ← Initial(X), H(X,Y); Cursor[s0](X) ← Start(X); Symbol[⊔](X) ← Initial(X)
  • 81. Transition Rules. δ(s1,a) = (s2,b,1) [Figure: cell with a and state s1; in the next row, b and state s2 one cell to the right] Cursor[s2](Z) ← Cursor[s1](X), Symbol[a](X), V(X,Y), H(Y,Z); Symbol[b](Y) ← Cursor[s1](X), Symbol[a](X), V(X,Y); Mark(X) ← Cursor[s1](X), Symbol[a](X)
  • 82. Inertia Rules. [Figure: two consecutive rows a b c d; cells before the cursor are MarkLeft, cells after it are MarkRight] MarkRight(Y) ← Mark(X), H(X,Y); MarkRight(Y) ← MarkRight(X), H(X,Y); Symbol[a](Y) ← MarkRight(X), Symbol[a](X), V(X,Y). We need similar rules for the cells before the cursor
  • 83. Accepting Rule. Once we reach the accepting state we accept: Accept ← Cursor[s_acc](X). Accept ∈ P(D) iff the DTM accepts
  • 84. Undecidability of Datalog[∃]. Theorem: Datalog[∃] is undecidable. Proof: by simulating a deterministic Turing machine with an empty tape … syntactic restrictions are needed!
  • 85. Guarded Datalog[∃]. Inspired by the guarded fragment of first-order logic [Andréka, van Benthem & Németi, Journal of Philosophical Logic 1998]. There exists a body-atom that contains all the body-variables - guard. Example: Manager(X) ← Employee(X), supervisorOf(X,Y), Manager(Y), with guard supervisorOf(X,Y). Chase has finite treewidth ⇒ Guarded Datalog[∃] is decidable [Calì, Gottlob & Kifer, KR 2008]
  • 86. Treewidth of the Chase. Tree decomposition - a mapping of a graph into a tree. [Figure: graph on vertices A, B, C, D, E, F, G, H and a tree decomposition with nodes ABC, BCE, CDE, BEG, BFG, EGH; Treewidth = 2] Treewidth - the maximum number of graph vertices mapped to a single tree node, minus 1
  • 87. Treewidth of the Chase. An instance I can be represented as a graph - Gaifman graph. [Figure: instance {R(a,b,c), S(c,d), T(c,d,e)} and its Gaifman graph on vertices a, b, c, d, e] Treewidth of I is defined as the treewidth of its Gaifman graph. Chase has finite treewidth ⇒ it is a tree-like structure
  • 88. Guarded Datalog[∃]. Inspired by the guarded fragment of first-order logic [Andréka, van Benthem & Németi, Journal of Philosophical Logic 1998]. There exists a body-atom that contains all the body-variables - guard. Example: Manager(X) ← Employee(X), supervisorOf(X,Y), Manager(Y), with guard supervisorOf(X,Y). Chase has finite treewidth ⇒ Guarded Datalog[∃] is decidable [Calì, Gottlob & Kifer, KR 2008]. What about the complexity of Guarded Datalog[∃]? - now we consider ontological query answering (previous lecture)
  • 89. Complexity of Guarded Datalog[∃]. Ontological query answering problem: Input: program P (TBox), database D for sch(P) (ABox), conjunctive query Q. Question: P ∪ D ⊨ Q, or, equivalently, P(D) ⊨ Q? Data complexity - P and Q fixed, D part of the input. Combined complexity - everything part of the input
  • 90. Ontological Query Answering: Example. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). P(D): … Father(z,John), Person(z) …
  • 91. Ontological Query Answering: Example. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). P(D): … Father(z,John), Person(z) … Q1 ← Father(X,John), Person(X)
  • 92. Ontological Query Answering: Example. D = {Person(John)}; P: ∃Y hasFather(X,Y) ← Person(X); Person(Y) ← hasFather(X,Y). P(D): … Father(z,John), Person(z) … Q1 ← Father(X,John), Person(X); Q2 ← Father(John,X)
  • 93. Data Complexity of Guarded Datalog[∃]. Theorem: Guarded Datalog[∃] is PTIME-complete in data complexity. Proof (in PTIME): Guarded Datalog[∃] enjoys the bounded guard-depth property; construct in PTIME the finite part C of the guarded chase forest; evaluate the given query over C [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
  • 94. Bounded Guard-Depth Property. Chase graph. D = {R(a,b), S(b)}; P = {ρ1, ρ2} with ρ1 = ∃Z R(Z,X) ← R(X,Y), S(Y) and ρ2 = S(X) ← R(X,Y). [Figure: chase graph with rows R(a,b), S(b); R(z1,b), S(a); R(z2,z1), S(z1); R(z3,z2), S(z2)]
  • 95. Bounded Guard-Depth Property. Guarded chase forest - restriction of the chase graph to guards and their children. [Figure: chase graph and guarded chase forest, both with rows R(a,b), S(b) at level 0; R(z1,b), S(a) at level 1; R(z2,z1), S(z1) at level 2; R(z3,z2), S(z2) at level 3]
  • 96. Bounded Guard-Depth Property. [Figure: database D, query Q, and the part C of the guarded chase forest of D w.r.t. P up to a depth that is constant w.r.t. D] P(D) ⊨ Q ⇒ C ⊨ Q
  • 97. Data Complexity of Guarded Datalog[∃]. Theorem: Guarded Datalog[∃] is PTIME-complete in data complexity. Proof (in PTIME): Guarded Datalog[∃] enjoys the bounded guard-depth property; construct in PTIME the finite part C of the guarded chase forest; evaluate the given query over C [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
  • 98. Data Complexity of Datalog. Theorem: Guarded Datalog[∃] is PTIME-complete in data complexity. Proof (PTIME-hard): reduction from fact inference for propositional LP(2). Encode program P in database D: for each rule R0 ← add to D the atom True(R0); for each rule R0 ← R1,R2 add to D the atom S(R0,R1,R2). Construct the fixed Guarded Datalog[∃] program P_DAT (meta-interpreter for LP(2)): T(X) ← True(X); T(Z) ← T(X), T(Y), S(Z,X,Y). P ⊨ Q iff T(Q) ∈ P_DAT(D)
  • 99. Combined Complexity of Guarded Datalog[∃]. Theorem: Guarded Datalog[∃] is 2EXPTIME-complete in combined complexity. Proof: Upper bound: alternating EXPSPACE algorithm. Lower bound: by simulating an alternating EXPSPACE TM [Calì, Gottlob & Kifer, KR 2008]
  • 100. DLs vs Guarded Datalog[∃]. ELH: Popular DL (for biological applications) with PTIME data complexity [Baader, IJCAI 2003 and Rosati, DL 2007]. ELH TBox | Datalog[∃] Representation: A ⊑ B | B(X) ← A(X); A ⊓ B ⊑ C | C(X) ← A(X), B(X); ∃R.A ⊑ B | B(X) ← R(X,Y), A(Y); A ⊑ ∃R.B | ∃Y Aux(X,Y) ← A(X), R(X,Y) ← Aux(X,Y), B(Y) ← Aux(X,Y); R ⊑ S | S(X,Y) ← R(X,Y)
  • 101. DLs vs Guarded Datalog[∃]. DL-Lite: Popular family of DLs with AC0 data complexity (OWL 2 QL) [Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, Journal of Automated Reasoning 2007]. DL-Lite_R TBox | Datalog[∃] Representation: A ⊑ B | B(X) ← A(X); A ⊑ ∃R | ∃Y R(X,Y) ← A(X); ∃R ⊑ A | A(X) ← R(X,Y); R ⊑ S | S(X,Y) ← R(X,Y). Note: Disjointness assertions are harmless - will be treated differently
  • 102. Linear Datalog[∃]. Goal: Lightweight Datalog[∃] fragment which is highly tractable. There exists only one atom in the body, e.g. ∃Y hasFather(X,Y) ← Person(X). Enjoys first-order rewritability [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
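
Both guardedness (some body atom contains all body variables) and linearity (exactly one body atom) are purely syntactic, so they are easy to test. The sketch below assumes a hypothetical rule encoding as a pair (head, body) of tuple atoms with `?`-prefixed variables, and assumes non-empty bodies; existential head variables play no role in either check.

```python
def variables(atom):
    """Variables of an atom; terms starting with '?' are variables (assumed convention)."""
    return {t for t in atom[1:] if t.startswith("?")}


def is_linear(rule):
    """Linear: exactly one atom in the body."""
    _head, body = rule
    return len(body) == 1


def is_guarded(rule):
    """Guarded: some body atom (the guard) contains every body variable."""
    _head, body = rule
    body_vars = set().union(*(variables(a) for a in body))
    return any(variables(a) >= body_vars for a in body)


manager = (
    ("Manager", "?X"),
    (("Employee", "?X"), ("supervisorOf", "?X", "?Y"), ("Manager", "?Y")),
)
has_father = (("hasFather", "?X", "?Y"), (("Person", "?X"),))
employee = (
    ("Employee", "?E", "?D", "?P", "?A"),
    (("Runs", "?D", "?P"), ("Area", "?P", "?A")),
)

for name, rule in [("manager", manager), ("has_father", has_father), ("employee", employee)]:
    print(name, "guarded:", is_guarded(rule), "linear:", is_linear(rule))
```

Every linear rule passes the guardedness test, since its single body atom trivially contains all body variables.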
  • 103. First-Order Rewritability. [Figure: compilation of Q and P into a first-order query Q_P, translation of Q_P into an SQL query Q*, evaluation of Q* over D] ∀D (P(D) ⊨ Q ⇔ D ⊨ Q*) [Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, Journal of Automated Reasoning 2007]
  • 104. Bounded Derivation-Depth Property (BDDP). [Figure: database D, query Q, and the part C of the chase graph of D w.r.t. P (not the guarded chase forest) up to a depth that is constant w.r.t. D] P(D) ⊨ Q ⇒ C ⊨ Q
  • 105. Bounded Derivation-Depth Property (BDDP). [Figure: database D, query Q, and the part C of the chase graph of D w.r.t. P (not the guarded chase forest) up to a depth that is constant w.r.t. D] Sufficient condition for first-order rewritability
  • 106. Linear Datalog[∃]. Goal: Lightweight Datalog[∃] fragment which is highly tractable. There exists only one atom in the body, e.g. ∃Y hasFather(X,Y) ← Person(X). Enjoys first-order rewritability [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]. What about the complexity of Linear Datalog[∃]?
  • 107. Data Complexity of Linear Datalog[∃]. Theorem: Linear Datalog[∃] is in AC0 in data complexity. Proof: we exploit first-order rewritability of Linear Datalog[∃]; construct the first-order rewritten query Q_FO (in constant time); evaluate Q_FO over the given database (in AC0) [Vardi, PODS 1995] [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
  • 108. Combined Complexity of Linear Datalog[∃]. Theorem: Linear Datalog[∃] is PSPACE-complete in combined complexity. Proof: Upper bound: exploit the BDDP, implicit in [Johnson & Klug, JCSS 1984] [Figure: D, Q and a bounded part of the chase]. Lower bound: by simulating a PSPACE Turing machine
  • 109. PSPACE-hardness of Linear Datalog[∃]. Assume that the tape alphabet is {0,1,⊔}. Suppose that M halts on I = a1…am using n = m^k cells, for k > 0. Initial configuration (the database D): Config(s_init,a1,…,am,⊔,…,⊔,1,0,…,0), with n-m blanks and n-1 zeros. Transition rule for δ(s1,a) = (s2,b,1): Config(s2,X1,…,X_{i-1},b,X_{i+1},…,Xn,0,…,0,1,0,…,0) ← Config(s1,X1,…,X_{i-1},a,X_{i+1},…,Xn,0,…,0,1,0,…,0), where the cursor bit 1 is preceded by i-1 zeros and followed by n-i zeros in the body, and preceded by i zeros and followed by n-i-1 zeros in the head. Accepting rule: Accept ← Config(s_acc,X1,…,Xn,Y1,…,Yn)
  • 110. Datalog[∃,=,⊥]: Overview. [Figure: Guarded (PTIME-complete), with ELH; Linear (in AC0), with DL-Lite_R]
  • 111. But… What about joins in rule bodies? ∃E Employee(E,D,P,A) ← Runs(D,P), Area(P,A). What about the DL assertion concept product? biggerThan(E,M) ← Elephant(E), Mouse(M)
  • 112. Sticky Datalog[∃]. ∃E Employee(E,D,P,A) ← Runs(D,P), Area(P,A); biggerThan(E,M) ← Elephant(E), Mouse(M)
  • 113. Sticky Datalog[∃]. ∃W T(X,Y,W) ← R(X,Y), P(Y,Z); ∃W S(Y,W) ← T(X,Y,Z)
  • 114. Sticky Datalog[∃]. Sticky: ∃W T(X,Y,W) ← R(X,Y), P(Y,Z); ∃W S(Y,W) ← T(X,Y,Z). Not sticky (✗): ∃W T(X,Y,W) ← R(X,Y), P(Y,Z); ∃W S(X,W) ← T(X,Y,Z)
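
The slides show sticky and non-sticky programs but do not spell out the test. The sketch below implements the variable-marking procedure as it is usually stated for sticky sets of rules, which is an assumption here rather than something taken from the slides: first mark every body variable that does not occur in the head of its rule; then, whenever a marked variable occurs in some body at a predicate position, mark in every rule the body variables that occur in the head at that same position; the program is sticky if no marked variable occurs more than once in a rule body. Rule encoding and names are hypothetical.

```python
def variables(atom):
    return {t for t in atom[1:] if t.startswith("?")}


def is_sticky(rules):
    """rules: list of (head, body); atoms are tuples, variables start with '?'."""
    marked = set()  # pairs (rule index, variable)
    # Initial marking: body variables that are missing from the head.
    for i, (head, body) in enumerate(rules):
        head_vars = variables(head)
        for atom in body:
            for v in variables(atom):
                if v not in head_vars:
                    marked.add((i, v))
    # Propagation through head positions until nothing changes.
    changed = True
    while changed:
        changed = False
        positions = {
            (atom[0], k)
            for i, (_head, body) in enumerate(rules)
            for atom in body
            for k, t in enumerate(atom[1:])
            if (i, t) in marked
        }
        for i, (head, body) in enumerate(rules):
            body_vars = set().union(*(variables(a) for a in body))
            for k, t in enumerate(head[1:]):
                if (head[0], k) in positions and t in body_vars and (i, t) not in marked:
                    marked.add((i, t))
                    changed = True
    # Sticky iff no marked variable occurs twice in the body of its rule.
    for i, (_head, body) in enumerate(rules):
        occurrences = [t for atom in body for t in atom[1:]]
        if any(occurrences.count(v) > 1 for j, v in marked if j == i):
            return False
    return True


first_rule = (("T", "?X", "?Y", "?W"), (("R", "?X", "?Y"), ("P", "?Y", "?Z")))
sticky_program = [first_rule, (("S", "?Y", "?W"), (("T", "?X", "?Y", "?Z"),))]
non_sticky_program = [first_rule, (("S", "?X", "?W"), (("T", "?X", "?Y", "?Z"),))]

print(is_sticky(sticky_program), is_sticky(non_sticky_program))
```

In the second program the variable Y of the T-atom is dropped by the second rule, the mark propagates back to the join variable Y of the first rule, and the test fails, which agrees with the cross on slide 114.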
  • 115. Data Complexity of Sticky Datalog[∃]. Theorem: Sticky Datalog[∃] is in AC0 in data complexity. Proof: Sticky Datalog[∃] enjoys the BDDP [Figure: D, Q and a bounded part of the chase]. Thus, Sticky Datalog[∃] is first-order rewritable ⇒ in AC0 [Calì, Gottlob & Pieris, VLDB 2010]
  • 116. Combined Complexity of Sticky Datalog[∃]. Theorem: Sticky Datalog[∃] is EXPTIME-complete in combined complexity. Proof: EXPTIME-membership: construct a proof of the query by applying an APSPACE procedure [Figure: D and Q]. EXPTIME-hardness: fact inference for lossless Datalog programs over {0,1} is EXPTIME-hard; example rule: S(X,Y,Z) ← P(X,Y), P(Y,Z), R(Z) [Calì, Gottlob & Pieris, VLDB 2010]
  • 117. Datalog as an Ontology Language. Ontology languages are based on description logics (previous lecture). Much is not possible with Datalog [Patel-Schneider & Horrocks, Journal of Web Semantics 2007]. DL Axiom | ?: Employee ⊑ ∃reportsTo | ∃Y reportsTo(X,Y) ← Employee(X); funct(reportsTo) | Y = Z ← reportsTo(X,Y), reportsTo(X,Z); Employee disj Customer | ⊥ ← Employee(X), Customer(X)
  • 118. Equality Atom. Xi = Xj ← R1(X1),…,Rn(Xn). Linear Datalog[∃,=] is already undecidable, implicit in [Chandra & Vardi, SIAM Journal on Computing 1985]. Separability: given P = P_∃ ∪ P_=, ∀D∀Q: D ⊨ P_= ⇒ (P(D) ⊨ Q iff P_∃(D) ⊨ Q) [Calì, Lembo & Rosati, PODS 2003]. Non-conflicting Datalog[∃,=]: sufficient condition for separability, see, e.g., [Calì, Gottlob & Lukasiewicz, Journal of Web Semantics 2012]
  • 119. Truth Constant False. ⊥ ← R1(X1),…,Rn(Xn). Preliminary check without adding complexity - given P = P_∃ ∪ P_⊥: P(D) ⊨ Q ⇔ P_∃(D) ⊨ Q OR P_∃(D) ⊨ Q_ρ, for some ρ ∈ P_⊥, where Q_ρ ← body(ρ)
  • 120. Non-Conflicting Datalog[∃,=,⊥]: Overview. [Figure: Guarded (PTIME-complete), with ELH; Linear (in AC0), with DL-Lite_R; Sticky (in AC0)] Example rules shown: ∃W S(X,W) ← R(X,Y,Y); ∃W S(Y,W) ← R(X,Y), P(Y,Z)
  • 121. Further Reading (Partial List). Guarded Datalog[∃] (and extensions): - Andrea Calì, Georg Gottlob, and Michael Kifer. Taming the infinite chase: Query answering under expressive relational constraints. In Proceedings of KR, pages 70-80, 2008. - Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem., 14:57-83, 2012. - Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the complexity lines for generalized guarded existential rules. In Proceedings of IJCAI, pages 712-717, 2011. Sticky Datalog[∃] (and extensions): - Andrea Calì, Georg Gottlob, and Andreas Pieris. Advanced processing for ontological queries. PVLDB 3(1): 554-565, 2010. - Andrea Calì, Georg Gottlob, and Andreas Pieris. Query answering under non-guarded rules in Datalog+/-. In Proceedings of RR, pages 1-17, 2010. - Georg Gottlob, Giorgio Orsi, and Andreas Pieris. Ontological queries: Rewriting and optimization. In Proceedings of ICDE, pages 2-13, 2011. Weakly-acyclic Datalog[∃] (and extensions): - Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: Semantics and query answering. Theor. Comput. Sci., 336(1): 89-124, 2005. - Alin Deutsch, Alan Nash, and Jeffrey B. Remmel. The chase revisited. In Proceedings of PODS, pages 149-158, 2008. - Bruno Marnette. Generalized schema-mappings: From termination to tractability. In Proceedings of PODS, pages 13-22, 2009. Shy Datalog[∃]: - Nicola Leone, Marco Manna, Giorgio Terracina, and Pierfrancesco Veltri. Efficiently computable Datalog∃ programs. In Proceedings of KR, 2012.