SlideShare a Scribd company logo
1 of 168
Download to read offline
Web Queries
From a Web of Data to a Semantic Web


               François Bry, Tim Furche, Klara Weiand
     based on work in the European project KiWi “Knowledge in a Wiki”



                                                                        1
2
3
keyword search




ranked collection of documents

                             4
?




    5
?

but what does Web querying do?




                                 5
1   100   200   300   400   500




Share value: 27.40$



                                                        6
1   100   200   300   400   500




Share value: 27.40$

  Automatically sell at < 28.00$   Action!

                                                                     6
?




    7
?


if you believe that I have some
bank stocks for you ...


                                  7
or

     query: value of specific element



         Share value: 27.40$




                                       8
or

               query: value of specific element



                   Share value: 27.40$

                 rule: automatically sell at < 28.00$


     Action!


                                                        8
Web Search                      Web Queries
          Scale                    “Web”, TB/PBs               a few “documents”, GBs
                                                             for basic selection lang. high,
     Parallelizability             very high (NC)
                                                                   otherwise very low

                              independent documents,         trees (XML) or graphs (RDF),
          Data
                                   heterogenous                      homogenous

                            (almost) everyone, many casual
        Used by                                                       few experts
                                         users
Expertise Level/Knowledge
                                         low                           very high
         required
     Expressiveness                   very low                         very high
   Result presentation          Ranking, clustering...               programmed
        Matching                often fuzzy or vague             only precise answers
                              Documents (or summaries
      Return value                                              parts of trees or graphs
                                       thereof)
      Actionability                   very low                         very high

                                                                                               9
web search


Expressive Power




                                 Ease of use
                   web queries

                                  In a nutshell...
                                                 10
Web Queries
Search + Query = Easy + Automation
 Goals:
  make web querying accessible to casual users
   e.g., to enable data aggregation/mashups by users
  allow precise queries over vastly heterogeneous data
  precise queries and rules critical to the (Social) Semantic Web
   unless they are accessible no widespread adoption

 Classification of approaches to combining Web search & query
  enhance search: 	 add data extraction / object search
  enhance querying: 	add keyword search / information retrieval
  keyword-based QLs for structured data: grounds-up redesign


                                                                    11
Search + Query
        Approach 1: Enhance Search
 “Peek” into web documents
 Extract data items / Web objects from web documents
  to provide more fine-grained answers

 Examples
  Google (Squared)
  Google Rich Snippets
  Yahoo! Search Monkey




                                                       12
Search + Query
        Approach 1: Enhance Search
 “Peek” into web documents
 Extract data items / Web objects from web documents
  to provide more fine-grained answers

 Examples
  Google (Squared)
  Google Rich Snippets
  Yahoo! Search Monkey




                                                       12
Search + Query
     Approach 2: Enhance Querying
 Enhance existing (XML) web query languages
  by adding information retrieval functionality
   ranking, scoring, fuzzy matching

 Examples
  XQuery and XPath Full Text 1.0, W3C Cand. Recommendation




                                                             13
Search + Query
     Approach 2: Enhance Querying
     Enhance existing (XML) web query languages
      by adding information retrieval functionality
        ranking, scoring, fuzzy matching

     Examples
      XQuery and XPath Full Text 1.0, W3C Cand. Recommendation


1   doc('bib.xml')/bib/book[title ftcontains 'programming']



2   for
    where
               $b score $s in doc('bib.xml')/bib/book
               $b ftcontains 'web' ftand 'query'
    order by   $s descending
    return     <title> {$b//title} </title>

                                                                 13
Search + Query
      Approach 3: Keyword Queries
 Keyword query for structured Web (XML and RDF) data
  apply web search paradigm to querying tree and graph data
  operates in the same setting as e.g. XQuery and SPARQL
   few or single document, no Web scale
  but: easier handling of very heterogeneous data
  querying of semi-structured data through easy-to-use interface

 Ultimate goal:
  Easy usage combined with enough power
  
 
   
   
  → to automate data processing tasks




                                                                   14
Search + Query
      Approach 3: Keyword Queries
 Keyword query for structured Web (XML and RDF) data
  apply web search paradigm to querying tree and graph data
  operates in the same setting as e.g. XQuery and SPARQL
   few or single document, no Web scale                K. Weiand, T. Furche, and F. Bry.
                                                          Quo vadis, web queries?
  but: easier handling of very heterogeneous data           In Web4Web, 2008.
  querying of semi-structured data through easy-to-use interface

 Ultimate goal:
  Easy usage combined with enough power
  
 
   
   
  → to automate data processing tasks


                             Focus of this tutorial
                                                                                 14
Web Queries
                                              Overview
1. Summary of web query research in the 00s
   1.1. XML	     	   	   	   	   	   	   	
   1.2. RDF

2. Keyword query languages
   2.1. Motivation
   2.2. Classification
   2.3. Issues

3. KWQL
4. Discussion and Outlook


                                                         15
16
Part 1
Web Queries
   Summary of XML & RDF
  Query Language Research
                        17
1.1
XML
Tree Data & Tree Queries
                 Data–XPath–XQuery



                                 18
nted         of that element. The following listing shows a small XML
. Fi-        fragment that illustrates elements and element nesting:
s an         <bib xmlns:dc="http://purl.org/dc/elements/1.1/">
 n of    2    <article journal="Computer Journal" id="12">
 -E).            <dc:title>...Semantic Web...</dc:title>
how      4       <year>2005</year>
                 <authors>
m to
         6         <author>
  the                <first>John</first> <last>Doe</last> </author>
Web      8         <author>
ches                 <first>Mary</first> <last>Smith</last> </author>
most    10      </authors>
              </article>
ased
        12    <article journal="Web Journal">
  the            <dc:title>...Web...</dc:title>
arch    14       <year>2003</year>
                 <authors>
        16         <author>
                     <first>Peter</first> <last>Jones</last> </author>
        18         <author>
                     <first>Sue</first> <last>Robinson</last> </author>
 tion   20      </authors>


                                                                                               XML
              </article>
eral.
        22   </bib>
lica-
 ML,            In addition, we can observe attributes (name, value (sometimes graph), mostly uniform edges
                                                           ordered tree pairs
 and         associated with start tags) that are essentially like elements
rop-         but may only contain character data, no other nested
                                                                                                           19
122]         attributes or elements. Also, by definition, element order is
 tion        significant, attribute order is not. For instance
ery languages
 ble interfaces            we can write other elements or character data as children
 implemented               of that element. The following listing shows a small XML
ion VI-D). Fi-             fragment that illustrates elements and element nesting:
 ries not as an            <bib xmlns:dc="http://purl.org/dc/elements/1.1/">
 r extension of        2    <article journal="Computer Journal" id="12">
Section VI-E).                 <dc:title>...Semantic Web...</dc:title>
 mary of how           4       <year>2005</year>
                               <authors>
d RDF aim to
                       6         <author>
                                                                                                                                        !"#
ther with the                      <first>John</first> <last>Doe</last> </author>
aditional Web          8         <author>
                                                                                                                                        $%$
g approaches                       <first>Mary</first> <last>Smith</last> </author>
, are the most        10      </authors>
                            </article>
eyword-based
                      12    <article journal="Web Journal">
 nges, and the                 <dc:title>...Web...</dc:title>
of Web search         14       <year>2003</year>
                               <authors>
                      16         <author>                                                                      !&#                                         !"-#.
 ND   RDF                          <first>Peter</first> <last>Jones</last> </author>
                      18         <author>                                                                    '()%*+,                                      '()%*+,
                                   <first>Sue</first> <last>Robinson</last> </author>
 epresentation        20      </authors>
                            </article>
ata in general.
                      22   </bib>
er of applica-
kup (XHTML,                   In addition, we can observe attributes (name, value pairs
G 7 [141]) and             associated with start tags) that are essentially like elements
(Apple’s prop-                                                                !-#
                           but may only contain character data, no other nested                !3#         !5#.               !"&#               !"3#     !"5#       !"/#.               !&-#
nd XSLT [122]              attributes or elements. Also, by definition, element order is
 r serialization           significant, attribute order is not. For instance
                                                                             )%)+,            4,'(       '0)12(6            720(8'+              )%)+,    4,'(      '0)12(6            720(8'+
 rmats such as
                             <author><last>Doe</last><first>John</first></author>

describing the             represents different information than the author element
rText Markup               in lines 6–9, but
 e on the web,               <article id="12" journal="Computer
are fixed. XML                     Journal">...</article>!!!"#$%&'()*"                                 !/#.        !9#.                                               !":#      !&=#.
  by specifying                                                                             4556                           7.%89($2"-.92'&:   !!!+$,!!!   455>                           +$,"-.92'&:
                           represents the same element+$,"!!! item as lines 2–15.
                                                         information
                             Figure 1 gives a graphical representation of the XML doc-
                                                                                                     '0)12(      '0)12(                                             '0)12(    '0)12(
 ation in XML
                       ument that is referenced in preceding illustrations. When
 et [68] which
                       represented as a graph, an XML document without links is
ML document.
                       a labeled tree where each node in the tree corresponds to
  parts, closely
                       an element and its type. Edges connect nodes and their
                       children, that is, elements and the elements nested in
el, we provide
ates from the
                       them, elements and their content and elements and their               !:#       !<#        !"=#    !""#                              !"<#      !"9#.   !&"#        !&&#
                       attributes. Since the visual distinction between the parent-
ved and thus
ped. This view
                       child relationship can be made without edge labels and                ;(6)     +'6)        ;(6)    +'6)                              ;(6)      +'6)    ;(6)        +'6)
                       since attributes are not addressed or receive no special
uch as Xcerpt
                       treatment in the research presented in this text, edges will
  follow XPath
                       not be labeled in the following figures.
                          Elements, attributes, and character data are XML’s most
 XML is a syn-
                       common information types. In addition, XML documents
ems are called
                       may also contain comments, processing instructions (name-
end tags, both                                                                               -./'     0.$        1&23     #%)(/                           ;$($2     -.'$<     #9$      =.,)'<.'


                                                                                                                                                                                     XML
                       value pair with specific semantics that can be placed any-
r>...</author>
                       where an element can be placed), document level informa-
 place of ‘. . . ’,
                       tion (such as the XML or the document type declarations),
                       entities, and notations, which are essentially just other kinds
                       of information containers.
                                                                                                      Fig. 1.      Visual representation of sample XML document
                                                                                                                                  ordered tree (sometimes graph), mostly uniform edges

               On top of these information types, two additional fa-                                                                    B. Resource Description Framework (RDF)
            cilities relevant to the information content of XML docu-                                                                                                                              19
                                                                                                                                         As the second preeminent data format on the S
            ments are introduced by subsequent specifications: Names-
                                                                                                                                        Web, the Resource Description Format (RDF) [109
            paces [33] and Base URIs [140]. Namespaces allow the
?


what’s new about trees? didn’t we
try that before?


                                    20
r
                    ce sto
               an
                                                 self
                                                 parent

                                                 child

                                                 preceding-sibling

                                                 following-sibling




                                       fol
     ing




                                           lo
                                          win
       d
    ce




                                             g
pre




           descendant                                                XPath
            axis for navigation/selection in tree, context, horizontal axes for order



                                                                                    21
XPath: Navigation in Trees
                                        Intro in 5 Points
  Data: rooted, ordered, unranked, finite trees (no ID/IDREF resolution)
  Paths are sequences of steps (axis & test on properties of node)
   adorned with existential predicates (in []) to obtain tree queries
   some more advanced features: value joins, aggregation, …
  Answers are sets of nodes from the input document
  /child::html/descendant::h1[not(preceding::h1	
  =	
  
  “Introduction”)]/child::p[attribute::class=”abstract”]
  no variables, no construction, no grouping/ordering




                                                                          22
axis Nodes (n)             = {(n  : R axis (n, n  )}
                                                                                are conc
 λ Nodes (n)                = {(n  : Labλ (n  )}                            widely su
 node() Nodes (n)           = Nodes(T )                                       tension o
 axis::nt[qual] Nodes (n)= {n  : n  ∈  axis Nodes ∧ n  ∈  nt Nodes ∧   in XQue
                              qual Bool (n  )}                               in XQue
 step/path Nodes (n)        = {n  : n  ∈  step Nodes (n)∧                which ca
                                n  ∈  path Nodes (n  )}                    Indeed,
 path1 ∪ path2 Nodes (n) =  path1 Nodes (n) ∪  path2 Nodes (n)            normaliz
                                                                                      h) X
 path Bool (n)              =  path Nodes (n)  
                                                                                sion of X
                                                                                use cases
 path1 ∧ path2 Bool (n)     =  path1 Bool (n) ∧  path2 Bool (n)
                                                                                not only
 path1 ∨ path2 Bool (n)     =  path1 Bool (n) ∨  path2 Bool (n)           The mos
 ¬path Bool (n)             = ¬  path Bool (n)                                1) Sequ
 lab() = λ Bool (n)         = Labλ (n)                                              expr
 path1 = path2 Bool (n)     = ∃n  , n  : n  ∈  path1 Nodes (n)∧               sequ
                                n  ∈  path2 Nodes (n)∧  (n  , n  )            from
                                                                                   23 trast

                                    TABLE I                                           of n
XPath: Navigation in Trees
                                        A Success Story
  Commercial success: One of the most successful QLs
   widely implemented and used, several W3C standards
  Research success:
   designed with little concern for formal “beauty”
   but with few restrictions: turns out it hit a formal sweet spot
  Expressiveness: monadic datalog (datalog with only unary
  intensional predicates, i.e., answer predicates)
   also: two-variable first-order logic (FO2)
  Polynomial complexity, linear for navigational XPath



                                                                     24
XPath: Navigation in Trees
                          A Success Story (cont.)
  Completeness: falls short of being first-order complete
   for first-order completeness a “conditional axis” (UNTIL operator) needed
   e.g., all a nodes reachable by a path of only b nodes
   but: partially justified by results on elimination of reverse axes
    reverse axis: parent, ancestor, preceeding, …
    eliminating these axes possible in navigational XPath
      though in few cases at exponential cost or resulting in
      introduction of expensive node-identity joins
      we can safely ignore them in most cases
    in conditional XPath this elimination is not possible




                                                                              25
XPath: Navigation in Trees
                          A Success Story (cont.)
  Completeness: falls short of being first-order complete
   for first-order completeness a “conditional axis” (UNTIL operator) needed XPath, the
                                                            M. Marx. Conditional
                                                                          First Order Complete XPath
   e.g., all a nodes reachable by a path of only b nodes                      Dialect. PODS 2004.
   but: partially justified by results on elimination of reverse axes
    reverse axis: parent, ancestor, preceeding, …
    eliminating these axes possible in navigational XPath       D. Olteanu, H. Meuss, T. Furche, and F. Bry.
                                                                        XPath: Looking Forward.
      though in few cases at exponential cost or resulting in      XMLDM @ EDBT, LNCS 2490, 2002.
      introduction of expensive node-identity joins
      we can safely ignore them in most cases
    in conditional XPath this elimination is not possible             C. Ley and M. Benedikt. How big
                                                                        must complete XML query
                                                                        languages be? ICDT 2009.



                                                                                                   25
Following (x, y ) :⇔ x pre y ∧ x post y

                                         From these axes, all others can be defined in first-order logic.
                                         pre- and post-orders are sufficient to represent the full tree

XPath: Navigation in Trees               structure.



                         A Success Story (cont.)
  Evaluation:
   naïve decomposition:        Structural Joins
    sequence of joins, descendant with closure over child relation
                                       Computing descendants by a single theta-join on the
   better: structural joins for   descendant (single lookup using tree labeling)
                                       representation relation.
    e.g., pre/post encoding                                                  R   pre   post lab
                                                       a:1:7                      1     7    a
    fits nicely with relational storage
                                                                                  2     3    b
                                             b:2:3             a:5:6              3     1    a
                                                                                  4     2    c
                                                                                  5     6    a
                                         a:3:1   c:4:2     b:6:4   d:7:5
                                                                                  6     4    b
                                                                                  7     5    d

   twig joins: stack-based, holistic, not easily adapted to relational DBS
                                Example
                                    CREATE VIEW descendant AS
                                    SELECT r1.pre, r2.pre FROM R r1, R r2 WHERE r1.pre  r2.pre
                                    AND r2.post  r1.post;
                                                                                                    26
                                         Such joins are called structural joins [Al-Khalifa et al., 2002].
XPath: Navigation in Trees
              A Success Story (cont.)
       1     simple, fairly easy to learn



       2     formal “sweet” spot



       3     efficient evaluation

                                            27
XPath: Navigation in Trees
              A Success Story (cont.)
       1     simple, fairly easy to learn


                                            M. Benedikt and C. Koch.

       2     formal “sweet” spot            XPath Leashed. In ACM
                                            Computing Surveys, 2007




       3     efficient evaluation

                                                              27
?


that’s it? everyone happy?



                             28
let	
  $auction	
  :=	
  doc(’auction.xml’)	
  
for	
  $o	
  in	
  $auction/open_auctions/open_auction	
  
return	
  
corrupt	
  id=’{	
  $o/@id	
  }’	
  
{	
  if	
  (sum($o/(initial	
  |	
  bidder/increase))	
  =	
  $o/current)	
  
then	
  text	
  {	
  ’no’	
  }	
  
else	
  $auction//people/person[@id	
  =	
  $o//@person]/name	
  }	
  
/corrupt	
  




                                                                                29
XQuery: Graph Queries  Construction
                                     From XPath to Hell
  Adds an enormous set of features to XPath
   price: highly complex language, Turing-complete, very hard to implement
   here: just some highlights

  Variables: XPath plus variables → no longer FO2
   “an article that has the conference’s chair as author”
   not any more polynomial, but NP-/PSPACE-complete
  Construction: harmless if only at end of query
   but: composition (i.e., querying of data constructed in the same query)
    “find all papers written by the author with the most papers”
    NEXPTIME-complete or in EXPSPACE (can not be captured by datalog)


                                                                             30
XQuery: Graph Queries  Construction
                                   From XPath to Hell
  Recursion: programmable
   with composition → Turing complete
   i.e., like Prolog or datalog with value invention
  Sequences: rather than sets
   without composition little effect
   with composition: drastically harder optimization as iteration semantics of
   a query must be precisely observed




                                                                                 31
s
 type [α] (Haskell), Array (Ruby) or 0                                                              Q3
                                                                                                    iter pos   item
         for y in properly reflect this
bleT  (Linq). To   2001 to 2008               loop(s0 )        loop(s1 ) πiter:outer, Operator     1 1 ’WD/CR/PR’          Seman
         return
 herently unordered relational database s1      iter             iter
                                                                                 pos:pos1 ,
                                                                                 item
                                                                                                     1 2 ’WD/CR/PR’
                                                                                                      . .        .
                                                                                                      . .        .
          if ( y lt 2007)
  we embed order in the data and use              1 ’REC’          1            pos1 :sort,posπa:b. .         .           project
                                             ···else               .                             σa  1 8      ’REC’          select r
          then ’WD/CR/PR’ elseon the
bles with columns pos|item (shown   ’REC’    iter pos item         .           ✶
                                              7 1 ’REC’            .       iter=inner            ×                           Cartes
 present such sequences. Note that the        8 1 ’REC’              .
 s need not be dense and not even be of
                                                                   8∪                            ✶a=b , a=b
                                                                                                 .                           equi-jo
               (a) Query Q3 .
ordered domain will do. In specific cases   (b) Associated loop tables.
                                                           @item:’REC’    @item:’WD/CR/PR’ ∪, ∪,                            (disjoin
 may already reflected by the items xi                      @pos:1         @pos:1
                                                                                                 δ                           elimina
                                                                                                    ···then
 f a sequence of encoded nodes resulting                   πiter          πiter
                                                                                                 @a:c pos ’WD/···’
                                                                                                    iter       item
                                                                                                                             attach
on step evaluation)—column itemand its W3C Recommendation
        Figure 3:                   may                    σitem          σitem1
                                                                                                 #a  1 1 ’WD/CR/PR’
                                                                                                     2 1 ’WD/CR/PR’
                                                                                                                             attach
        track maturity level (Query Q3 ). Annotations ¬s0,1
 e of pos.                                                                                       a:b1 ,..,bn  .
                                                                                                      . .
                                                                                                      . .        .           attach
                                                                                        item1 :item . .        .
                                                        y lt 2007                                         6 1 ’WD/CR/PR’    attach
Query denote semantics is principally
      dynamic iteration scopes.
                                                                                                         #a:b1 ,..,bn /c
                                                      iter pos item       πiter,pos,item:item2
                                                                                                        πa:b1 ,..,bn      attach
                                                       1 1 true
  as the core language construct: any                  2 1 true
                                                   . . .                  item2 :item1 ,item          outer:iter,
 sidered (Linq)                                    . . .
                         Enumerable.Range(2001,8).Select(
         to be iteratively evaluated in the                                                              agga:b/c
                                                                                                         inner,
                                                                                                         sort:pos            attach
                                                   . . .                  ✶
most enclosing for loop. The top level of ? ’WD/CR/PR’ : ’REC’ )
                            y = y  2007          8 1 false        iter1 =iter

 ssumed (Links)
          to be wrapped inside the scope                     π :iter,
                         for (y - [2001,2002,. . .,2008])iter11 :item          @item:2007              Table 1: Excerpt of
                                                              item
gle-iteration loop for _ in (0) return e
                            [if (y  2007) then WD/CR/PR else REC]
 occur free in e (i.e., the choice of 0 is
                                                y            @pos:1             @pos:1                  (with agg ∈ {count
                                                                                                           2007
                                              iter pos item                                                 iter pos item
         (Ruby)
ndamental idea behind(2001..2008).collect { 2001
                          loop lifting is to   1 1           πiter:inner,       πiter:inner                  1 1 2007
                                                              item                                           2 1 2007
                            |y|a y  2007 ? ’WD/CR/PR’ : ’REC’ }
ode that consumes and emits “fully un-
                                               2 1 2002
                                                . . .                                                        .   .    .
                                                . . .                                                        .
                                                                                                             .   .
                                                                                                                 .    .
                                                                                                                      .
                                                . . .
 esentation of e’s value. if y unrolling then WD/CR/PR else REC | #inner
         (Haskell) [        Here,  2007       8 1 2008                                                     gramming languag
                                                                                                            8 1 2007
 le that                    y - [2001..2008] ]                                 2001 to 2008
                                                                                   iter pos item            set-oriented algebr
                                                                                    1 1 2001
 ble with schema iter|pos|item holds the if y  2007 else ’REC’
         (Python) [ ’WD/CR/PR’                                                      1 2 2002
                                                                                    .     .    .            work for database-
                           for y in range(2001,2008) ]
 ues that e assumes during its iterative                                            .     .    .
                                                                                    .
                                                                                    1
                                                                                          .    .
                                                                                          8 2008
                                                                                                            not stumble if pro
                                                                                                                        32
                                                                                                            stances [13].
       Figure 4: A sample of iterative constructs found in
able has key iter, pos since e may yield
                   ’s companion languages (paraphrases of Q3 ). code to evaluatecode.
s in each distinct iteration—only if e’s
                                           Figure 5: Loop-lifted algebraic
                                                                           Algebraic
                                                                                     Q3
                                                                                                                                 W
s
 type [α] (Haskell), Array (Ruby) or 0                                                              Q3
                                                                                                    iter pos   item
         for y in properly reflect this
bleT  (Linq). To   2001 to 2008               loop(s0 )        loop(s1 ) πiter:outer, Operator     1 1 ’WD/CR/PR’             Seman
         return
 herently unordered relational database s1      iter             iter
                                                                                 pos:pos1 ,
                                                                                 item
                                                                                                     1 2 ’WD/CR/PR’
                                                                                                      . .        .
                                                                                                      . .        .
          if ( y lt 2007)
  we embed order in the data and use              1 ’REC’          1            pos1 :sort,posπa:b. .         .              project
                                             ···else               .                             σa  1 8      ’REC’             select r
          then ’WD/CR/PR’ elseon the
bles with columns pos|item (shown   ’REC’    iter pos item         .           ✶
                                              7 1 ’REC’            .       iter=inner            ×                              Cartes
 present such sequences. Note that the        8 1 ’REC’              .
 s need not be dense and not even be of
                                                                   8∪                            ✶a=b , a=b
                                                                                                 .                              equi-jo
               (a) Query Q3 .
ordered domain will do. In specific cases   (b) Associated loop tables.
                                                           @item:’REC’    @item:’WD/CR/PR’ ∪, ∪,                               (disjoin
 may already reflected by the items xi                      @pos:1         @pos:1
                                                                                                 δ                              elimina
                                                                                                    ···then
 f a sequence of encoded nodes resulting                   πiter          πiter
                                                                                                 @a:c pos ’WD/···’
                                                                                                    iter       item
                                                                                                                                attach
on step evaluation)—column itemand its W3C Recommendation
        Figure 3:                   may                    σitem          σitem1
                                                                                                 #a  1 1 ’WD/CR/PR’
                                                                                                     2 1 ’WD/CR/PR’
                                                                                                                                attach
        track maturity level (Query Q3 ). Annotations ¬s0,1
 e of pos.                                                                                       a:b1 ,..,bn  .
                                                                                                      . .
                                                                                                      . .        .              attach
                                                                                                      . .
                                                                                           item1 :item         .
                                                          y lt 2007                                          6 1 ’WD/CR/PR’    attach
Query denote semantics is principally
      dynamic iteration scopes.
                                                                                                            #a:b1 ,..,bn /c
                                                        iter pos item        πiter,pos,item:item2
                                                                                                           πa:b1 ,..,bn      attach
                                                         1 1 true
  as the core language construct: any                    2 1 true
                                                      . . .                  item2 :item1 ,item          outer:iter,
 sidered (Linq)                                       . . .
                          Enumerable.Range(2001,8).Select(
          to be iteratively evaluated in the                                                                agga:b/c
                                                                                                            inner,
                                                                                                            sort:pos            attach
                                                      . . .                  ✶
most enclosing for loop. The top level of ? ’WD/CR/PR’ : ’REC’ )
                               y = y  2007          8 1 false        iter1 =iter

 ssumed (Links)
          to be wrapped inside the scope                        π :iter,
                          for (y - [2001,2002,. . .,2008])iter11 :item            @item:2007              Table 1: Excerpt of
                                                                 item
gle-iteration loop for _ in (0) return e
                               [if (y  2007) then WD/CR/PR else REC]
 occur free in e (i.e., the choice of 0 is
                                                   y            @pos:1             @pos:1                  (with agg ∈ {count
                                                                                                              2007
                                                 iter pos item                                                 iter pos item
         (Ruby)
ndamental idea behind(2001..2008).collect { 2001
                             loop lifting is to   1 1           πiter:inner,       πiter:inner                  1 1 2007
                                                                 item                                           2 1 2007
                               |y|a y  2007 ? ’WD/CR/PR’ : ’REC’ }
ode that consumes and emits “fully un-
                                                  2 1 2002
                                                   . . .                                                        .   .    .
                                                   . . .                                                        .
                                                                                                                .   .
                                                                                                                    .    .
                                                                                                                         .
                                                   . . .
 esentation of e’s value. if y unrolling then WD/CR/PR else REC | #inner
         (Haskell) [           Here,  2007       8 1 2008                                                     gramming languag
                                                                                                               8 1 2007
 le that P. Boncz, T. Grust, M. van Keulen, S.                                     2001 to 2008
                               y - [2001..2008] ]                                                             set-oriented algebr
          Manegold, J. Rittinger, J. Teubner.                                         iter pos item
                                                                                       1 1 2001
        MonetDB/XQuery: ’WD/CR/PR’
         (Python) [ A Fast XQuery
 ble with schema iter|pos|item holds the if y  2007 else               ’REC’          1 2 2002                work for database-
                                                                                       .     .    .
             Processor Poweredy in range(2001,2008) ]
                             for by a
 ues that e assumes during its iterative                                               .     .    .
                                                                                       .     .    .            not stumble if pro
         Relational Engine, SIGMOD 2006.                                               1     8 2008
                                                                                                                           32
                                                                                                               stances [13].
        Figure 4: A sample of iterative constructs found in
able has key iter, pos since e may yield
                   ’s companion languages (paraphrases of Q3 ). code to evaluatecode.
s in each distinct iteration—only if e’s
                                           Figure 5: Loop-lifted algebraic
                                                                           Algebraic
                                                                                     Q3
                                                                                                                                    W
XQuery: Graph Queries  Construction
                     From XPath to Hell
  1    significantly harder than XPath


       composition  recursion
 2     expensive operations


       yet significant steps toward
 3     fast implementations

                                          33
XQuery: Graph Queries  Construction
                     From XPath to Hell
  1    significantly harder than XPath


       composition  recursion           M. Benedikt and C. Koch.

 2     expensive operations
                                        Interpreting Tree-to-Tree
                                          Queries. ICALP 2006.




       yet significant steps toward
 3     fast implementations

                                                                    33
XQuery: Graph Queries  Construction
                     From XPath to Hell
  1    significantly harder than XPath


       composition  recursion            M. Benedikt and C. Koch.

 2     expensive operations
                                         Interpreting Tree-to-Tree
                                           Queries. ICALP 2006.




                                          P. Boncz, T. Grust, et al.
       yet significant steps toward
 3     fast implementations
                                     MonetDB/XQuery: A Fast XQuery
                                         Processor Powered by a
                                     Relational Engine, SIGMOD 2006.



                                                                     33
XQuery: Graph Queries  Construction
                     From XPath to Hell
  1    significantly harder than XPath


       composition  recursion            M. Benedikt and C. Koch.

 2     expensive operations
                                         Interpreting Tree-to-Tree
                                           Queries. ICALP 2006.




                                          P. Boncz, T. Grust, et al.
       yet significant steps toward
 3     fast implementations
                                     MonetDB/XQuery: A Fast XQuery
                                         Processor Powered by a
                                     Relational Engine, SIGMOD 2006.



                                                                     33
34
1.2
RDF
Graph Data for the Web
                     35
Legend
                                          Other                                      Smith
      Class        Literal
                                         Resource


                                                                  Person

                                                                              last                        Mary

                                                                       type
                                                                                     first
                           Computer
         11                                          Bag
                            Journal
                                                                                               Person

                 number           name
    Journal                                           type        _2
                                                                                            type
                 type
                                                                       _1                          first           John

      2005
                                    isPartOf    author
                 year                                                                              last


                                         smith2005         type               Article                       Doe


                                                                                                                         RDF
                          title
   ...Semantic
      Web...

                        unordered, unranked, unrooted finite graph, edge and node labeled, different node kinds



                                                                                                                           36
?


what’s new about that? isn’t that just
a fancy repr. of a ternary relation?


                                         37
RDF: Semantics for the Web?
           What’s new about RDF?
 data model of RDF very similar to relational
  but: “unique” identifiers in form of URIs that can be shared on the Web
   uniqueness only thanks to careful assignment practice (unlike GUIDs)
   human-readable (unlike GUIDs)
  but: existential information in form of blank nodes
   like named null values in SQL, also known as Codd tables
  but: implied information due to RDF/S semantics (and, thus, entailment)
   e.g., class hierarchy and typing, domain and range of properties, axiomatic triples

 notion of redundancy-free or lean graph/answer




                                                                                         38
bib:hasPart.5                                                queries are rang
                                                                 (CONSTRUCT or S
    CONSTRUCT { ?j bib:hasPart ?a }
                                                                 (WHERE clause) o
2   WHERE { ?a rdf:type bib:Article AND ?a bib:isPartOf ?j
            AND ?j bib:name ‘Computer Journal’ }                 expressions (in c
                                                                 FILTER expressio
The query illustrates SPARQLs fundamental query con-             condition must
struct: a pattern (s, p, o) for RDF triples (whose components    first limitation i
are usually thought of as subject, predicate, object). Any       allowed in stand
RDF triple is also a triple pattern, but triple patterns allow   priori and rewritt
variables for each component. Furthermore, SPARQL also           (as FILTER expre
allows literals in subject position, anticipating the same       which, in turn,
change also in RDF itself. We use the variant syntax for         boolean value” i
SPARQL discussed in [161] to ease the definition of syntax           Finally, we a
and semantics of the language. For instance, standard            CONSTRUCT clause
SPARQL, uses . instead of AND for triple conjunction. We         variables occurri
consider two forms of SPARQL queries, viz. SELECT queries        literals, and all v
that return list of variable bindings and CONSTRUCT queries      only ever bound
that return new RDF graphs. Triple patterns contained in a       The first conditio
CONSTRUCT clause (or “template”) are instantiated with the       adding appropria
variable bindings provided by the evaluation of the triple       query body.
pattern in the WHERE clause. We omit named graphs and               Following [16
assume that all queries are on the single input graph. An        queries based
(s, p, o) D
             Subst                      = {θ : dom(θ) = Vars((s, p, o)) ∧ t θ ∈ D}
 pattern1 AND pattern2 D
                         Subst          =  pattern1 D   pattern2 D
                                                      Subst           Subst
 pattern1 UNION pattern2 D
                           Subst        =  pattern1 D ∪  pattern2 D
                                                      Subst           Subst
 pattern1 MINUS pattern2 D
                           Subst        =  pattern1 D   pattern2 D
                                                      Subst           Subst
 pattern1 OPT pattern2 D
                         Subst          =  pattern1 D   pattern2 D
                                                      Subst           Subst
 pattern FILTER condition D
                            Subst       = {θ ∈  pattern D : Vars(condition) ⊂ dom(θ)
                                                          Subst
                                                             ∧  condition D (θ)}
                                                                            Bool

 condition1 ∧ condition2 D (θ)
                           Bool         =  condition1 D (θ) ∧  condition2 D (θ)
                                                        Bool                  Bool
 condition1 ∨ condition2 D (θ)
                           Bool         =  condition1 D (θ) ∨  condition2 D (θ)
                                                        Bool                  Bool
 ¬condition D (θ)
              Bool                      = ¬  condition D (θ)
                                                         Bool
 BOUND(?v) D (θ)
             Bool                       = vθ  nil
 isLITERAL(?v) D (θ)
                 Bool                   = vθ ∈ L
 isIRI(?v) D (θ)
             Bool                       = vθ ∈ I
 isBLANK(?v) D (θ)
               Bool                     = vθ ∈ B
 ?v = literal D (θ)
                Bool                    = vθ = literal
 ?u = ?v D (θ)
           Bool                         = uθ = vθ ∧ uθ  nil

 triple D (θ)
          Graph                         = tripleθ if ∀v ∈ Vars(triple) : vθ  nil,  otherwise

 template1 AND template2 D (θ) =  template1 D (θ) ∪  template2 D (θ)
                           Graph                Graph                Graph
                                            
 CONSTRUCT t WHERE p D                =                   std(t ) D (θ)
                                            θ∈ P D
                                                   Subst
                                                                      Graph

 SELECT V WHERE p D                   = πV ( P D )
                                                   Subst
                                           TABLE V
                                    S EMANTICS FOR SPARQL
RDF: Semantics for the Web?
                      Simple RDF QL?
  SPARQL: Simple Protocol and RDF Query Language
  really simple?
   same expressiveness as full relational algebra, PSPACE-complete
   but: no composition, no order → simpler than full SQL or XQuery
  really an RDF query language?
   blank node construction only limited (no quantifier alternation)
   talks vaguely about extension to entailment regimes
    but no support in SPARQL as defined
   no support for lean answers (justified partially by high computational cost)



                                                                                 40
RDF: Semantics for the Web?
        SPARQL: Simple RDF QL?
 Complexity:
  SPARQL with only AND, FILTER, and UNION: NP-complete
   i.e., no computationally interesting subset of full relational SPJU queries
  full SPARQL: PSPACE-complete
   again no computationally interesting subset of full first-order queries
   why? due to negation “hidden” in OPTIONAL
     reduction from 3SAT using isBound to encode negation

 lacks completeness w.r.t. RDF transformations:
  relational algebra: can express all PSPACE-transformations on relations
  SPARQL: fails at the same for RDF



                                                                                 41
RDF: Semantics for the Web?
        SPARQL: Simple RDF QL?
                                                                          J. Perez, M. Arenas, and
 Complexity:                                                            C. Gutierrez. Semantics and
                                                                         Complexity of SPARQL.
  SPARQL with only AND, FILTER, and UNION: NP-complete                           ISWC 2006
   i.e., no computationally interesting subset of full relational SPJU queries
  full SPARQL: PSPACE-complete
   again no computationally interesting subset of full first-order queries
   why? due to negation “hidden” in OPTIONAL
     reduction from 3SAT using isBound to encode negation

 lacks completeness w.r.t. RDF transformations:
  relational algebra: can express all PSPACE-transformations on relations
  SPARQL: fails at the same for RDF



                                                                                            41
Select all persons and return one blank node
connected to all these persons using “member”

                               Mary



                John        member        Joe


                   member      member




Not expressible in SPARQL                          SPARQL
                                        limitations of blank node construction
Why not? no quantifier alternation
Select all persons and return one blank node
connected to all these persons using “member”

                               Mary



                John        member              Joe


                   member      member T. Furche, C. Ley, B. Linse, B. Marnette, and O.
                                   F. Bry,
                                         Poppe. SPARQLog: SPARQL with Rules and
                                          Quantification. In: Semantic Web Information
                                         Management: A Model-based Perspective, 2009.




Not expressible in SPARQL                                SPARQL
                                              limitations of blank node construction
Why not? no quantifier alternation
RDF: Semantics for the Web?
                       SPARQL: Future
  doesn’t hit a sweet spot (like XPath)
   at least from a formal, database perspective
    contributions from database community very rare

  yet a significant improvement over most previous RDF QLs
  already forms basis for future QL research on RDF
   rule extensions for RDF
    “From SPARQL to Rules” (Polleres, WWW 2007),
    Networked Graphs (Schenk et al, WWW 2008)
    no or limited blank node support
    but: with rules, restricted quantification as in SPARQL (∃∀) as expressive as
    unrestricted quantifier alternation


                                                                                   43
RDF: Semantics for the Web?
                       SPARQL: Future
  doesn’t hit a sweet spot (like XPath)
   at least from a formal, database perspective
    contributions from database community very rare

  yet a significant improvement over most previous RDF QLs
  already forms basis for future QL research on RDF
   rule extensions for RDF
    “From SPARQL to Rules” (Polleres, WWW 2007),
    Networked Graphs (Schenk et al, WWW 2008)
    no or limited blank node support
    but: with rules, restricted quantification as in SPARQL (∃∀) as expressive as
    unrestricted quantifier alternation F. Bry, T. Furche, C. Ley, B. Linse, B. Marnette, and O.
                                             Poppe. SPARQLog: SPARQL with Rules and
                                              Quantification. In: Semantic Web Information
                                             Management: A Model-based Perspective, 2009.         43
RDF: Semantics for the Web?
                  SPARQL: Summary
 1   syntax for first-order queries on RDF



     foundation for research but little
 2   innovation in the language itself


 3   lackluster support for RDF specifics

                                            44
RDF: Semantics for the Web?
                  SPARQL: Summary
 1   syntax for first-order queries on RDF


                                            J. Perez, M. Arenas, and
     foundation for research but little   C. Gutierrez. Semantics and
 2   innovation in the language itself
                                           Complexity of SPARQL.
                                                   ISWC 2006




 3   lackluster support for RDF specifics

                                                              44
45
Part 2
Keyword Queries

   Classification  main issues
                             46
47
Queries as Keywords
                Main Characteristics
  Queries are (mostly) unstructured bags of words
  Used in general purpose web search engines
  but also elsewhere (Amazon, Facebook,...)

  Implicit conjunctive semantics (with limitations)
  Often combined with IR techniques
  ranking, fuzzy matching

  Research focuses on
  application to semi-structured data
  general structured data (Web objects  tables, relations)
                                                              48
Queries as Keywords
Why Keyword Qs for Structured Data
 Success in other areas
 Users are already familiarized with the paradigm
 Allow casual users to query structured data without
 having deep knowledge of
  the query language
  the structure  schema of the data

 Enable querying of heterogeneous data



                                                       49
Queries as Keywords
      Classification of Keyword QLs
 Data type
  XML
  RDF

 Implementation
  stand-alone systems
  translation to conventional query language
  keyword-enhanced query languages




                                               50
Queries as Keywords
      Classification of Keyword QLs
 Complexity of atomic queries
  keyword-only
  label-keyword
  keyword-enhanced

 Querying of elements
  Values
  Node labels
  Edge labels


                                     51
Queries as Keywords
                        XML Keyword QLs
                   Stand-alone   Enhancement

  Keyword-only         9              0


 Label-keyword         2              0


Keyword-enhanced       --             3

                                               52
Queries as Keywords
                                         XML Keyword QLs
K. Weiand, T. Furche, and F. Bry.
   Quo vadis, web queries?
     In Web4Web, 2008.



                                    Stand-alone   Enhancement

              Keyword-only              9              0


             Label-keyword              2              0


         Keyword-enhanced               --             3

                                                                52
Queries as Keywords
                        RDF Keyword QLs
                   Stand-alone   Translation

  Keyword-only         2             2


 Label-keyword         0             1


Keyword-enhanced       --            --

                                               53
Queries as Keywords
                                         RDF Keyword QLs
K. Weiand, T. Furche, and F. Bry.
   Quo vadis, web queries?
     In Web4Web, 2008.



                                    Stand-alone   Translation

              Keyword-only              2             2


             Label-keyword              0             1


         Keyword-enhanced               --            --

                                                                53
Queries as Keywords
     XML Keyword QLs More Popular
 Time
  first XML keyword query languages are as old as RDF

 Familiarity with XML
 Complexity of RDF
  Graph-shaped
  Labeled Edges
  Blank nodes




                                                       54
Queries as Keywords
     XML Keyword QLs More Popular
 Time
  first XML keyword query languages are as old as RDF

 Familiarity with XML
 Complexity of RDF
  Graph-shaped
  Labeled Edges
  Blank nodes




                                                       54
Queries as Keywords
                                  Focus Issues
1. Grouping keyword matches
2. Determining answer representations
3. Expressive power
4. Ranking
5. Limitations




                                                 55
2.1
 Computing
Query Answers
 Turning keyword matches into
                    answers
                            56
Computing Query Answers
                     Problem Setting
  Query K yields match lists L1 ... L|K|
  Answer sets S1 … Sm are constructed from L
   may contain either
    only one match per match list (m = |K|) or
    several (m ≥ |K|)

  Data-centric vs. document-centric XML




                                                 57
Computing Query Answers
 K
={Smith, web},       Problem Setting
 L1
 ={11}, 
L 2
 ={3,23},
 S1
 ={11,3},
S 2
 ={11,23}                                             (1)
                                                                        bib



                                            (2)                                              (13)
                                          article                                           article



                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                       58
Computing Query Answers
                     Problem Setting
  Entire XML document
  usually too big to serve as a good query answer

  Matched nodes alone
  usually not informative

  Meaningful results
  a smaller unit has to be found




                                                    59
Computing Query Answers
                                         Approaches
1. Determine entities based on the schema
   only keyword matches within one entity or
   apply LCA at entity-level
   done manually or using schema partitioning with
   cardinality as criterion
    Lowest Entity Node, Minimal Information Unit, XSeek...

2. Determine entities by connecting keyword matches
   Answer entity:
    contains (at least) one match for each keyword

                                                             60
Computing Query Answers
   Connecting Keyword Matches
 Classic concept: Lowest Common Ancestor (LCA)
  Use LCA to find the maximally specific concept that is
  common to the keyword matches

 LCA: Lowest node that is ancestor to all elements in
 an answer set




                                                         61
Computing Query Answers
   Lowest Common Ancestor (LCA)
                                                LCA(S1={3,11}) = 2
                                                                        (1)
                                                                        bib



                                            (2)                                              (13)
                                          article                                           article



                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                           62
Computing Query Answers
   Lowest Common Ancestor (LCA)
                       LCA(S2={11,23}) = 1 — False positive
                                                                         (1)
                                                                         bib



                                             (2)                                              (13)
                                           article                                           article



                (3)         (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title       year           authors           journal                 title     year     authors            journal



                                    (6)            (9)                                                  (17)      (20)
...Semantic Web...      2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                  author         author                                                author    author



                          (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                         first      last              first     last                             first     last      first         last



                        John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                            63
2.1.1
Improving LCA
Approaches to improve LCA for
      computing answer entities
                              64
Computing Query Answers
   Lowest Common Ancestor (LCA)
 Many approaches to improve over LCA
  SLCA, MLCA, Interconnection Semantics, VLCA, Amoeba
  Join...

 Filter LCAs to avoid false positives
 Result is subset of LCA result




                                                        65
Improving LCA
             Overview of Approaches
1. Interconnection relationship
2. XKSearch: Smallest LCA
3. Meaningful LCA (MLCA)
4. Amoeba Join
5. (Valuable LCA (VLCA), Compact LCA, CVLCA)
6. (Relaxed Tightest Fragment (RTF))
7. XRank: Exclusive LCA


                                               66
Improving LCA
        Interconnection Relationship
  Idea: Two different nodes with the same label
  correspond to different entities of the same type.
  Two nodes are interconnected if,
  on the shortest path between them,
  every node label occurs only once

  Interconnection in answer sets:
  star-related: a node in Si interconnected with all nodes in Si
  all-pairs related: all nodes in Si pair-wise interconnected


                                                                   67
Improving LCA
        Interconnection Relationship
  Idea: Two different nodes with the same label
  correspond to different entities of the same type.
  Two nodes are interconnected if,
  on the shortest path between them,
  every node label occurs only once

  Interconnection in answer sets:
  star-related: a node in Si interconnected with all nodes in Si
  all-pairs related: all nodes in Si pair-wise interconnected
                                           S. Cohen, J. Mamou, Y. Kanza, and Y.
                                               Sagiv. XSearch: A Semantic
                                           Search Engine for XML. VLDB 2003
                                                                                  67
Improving LCA
        Interconnection Relationship

                                                                        (1)
                                                                        bib



                                            (2)                                              (13)
                                          article                                           article



                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                           68
Improving LCA
        Interconnection Relationship
                                                                        (1)
                                                                        bib




                                            (2)                                              (13)
                                          article                                           article




                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                           69
Improving LCA
        Interconnection Relationship
  For |K| = 2, star-related ≡ all-pairs related
  Consider S1={3, 8, 11}




                                                  70
Improving LCA
        Interconnection Relationship
                                  3 and 8 are interconnected
                                                                        (1)
                                                                        bib



                                            (2)                                              (13)
                                          article                                           article



                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                           71
Improving LCA
        Interconnection Relationship
                                                     ...so are 3 and 11
                                                                        (1)
                                                                        bib



                                            (2)                                              (13)
                                          article                                           article



                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                           72
Improving LCA
        Interconnection Relationship
                                            ...but 8 and 11 are not
                                                                           (1)
                                                                           bib



                                               (2)                                                (13)
                                             article                                             article



             (3)             (4)               (5)             (12)                     (14)      (15)       (16)               (23)
            title           year             authors         journal                    title     year     authors            journal




                                     (6)           (9)                                                      (17)      (20)
...Semantic Web...   2005                                         Computer Journal   ...XML...   2003                             Web Journal
                                   author        author                                                    author    author




                         (7)          (8)       (10)       (11)                                    (18)     (19)      (21)         (22)
                        first         last       first       last                                    first     last      first         last



                        John         Doe       Mary       Smith                                  Peter     Jones     Sue         Robinson




                                                                                                                                                73
Improving LCA
        Interconnection Relationship
  S1={3, 8, 11} is
  star-related but
  not all-pairs related

  False negative when using all-pairs relatedness
  Consider S1={3,7,9}




                                                    74
Improving LCA
        Interconnection Relationship
                                                                       (1)
                                                                       bib



                                         (2)                                                (11)
                                       article                                             article




              (3)       (4)             (5)                 (10)                  (12)      (13)           (14)                (19)
             name      year           authors             journal                 title     year         authors             journal




                                (6)            (8)                                                         (15)     (17)
...Semantic Web...   2005                                   Computer Journal   ...XML...   2003                                  Web Journal
                              author         author                                                       author   author




                                (7)                 (9)                                                   (16)        (18)
                               name                name                                                  name        name




                            John Doe             Mary Smith                                          Peter Jones    Sue Robinson




                                                                                                                                               75
Improving LCA
        Interconnection Relationship
                                                                       (1)
                                                                       bib



                                         (2)                                                (11)
                                       article                                             article




              (3)       (4)             (5)                 (10)                  (12)      (13)           (14)                (19)
             name      year           authors             journal                 title     year         authors             journal




                                (6)            (8)                                                         (15)     (17)
...Semantic Web...   2005                                   Computer Journal   ...XML...   2003                                  Web Journal
                              author         author                                                       author   author




                                (7)                 (9)                                                   (16)        (18)
                               name                name                                                  name        name




                            John Doe             Mary Smith                                          Peter Jones    Sue Robinson




                                                                                                                                               76
Improving LCA
         Interconnection Relationship
                                                                          (1)
                                                                          bib



                                          (2)                                                  (11)
                                        article                                               article



               (3)      (4)              (5)                 (10)                    (12)      (13)           (14)                (19)
              name     year            authors             journal                   title     year         authors             journal



                                 (6)            (8)                                                           (15)     (17)
...Semantic Web...   2005                                      Computer Journal   ...XML...   2003                                  Web Journal
                               author         author                                                         author   author




                               (7)                   (9)                                                     (16)        (18)
                              name                  name                                                    name        name




                            John Doe              Mary Smith                                            Peter Jones    Sue Robinson




                                                                                                                                                  77
Improving LCA
        Interconnection Relationship
  S1={3,7,9} is a false negative in both measures
  False positives for both when node labels are
  different but refer to similar concepts
  e.g. “article” vs. “book” vs. “text”

  False negatives when node labels are identical but
  refer to different concepts
  e.g. the name of a person vs. the name of a journal




                                                        78
Improving LCA
     XKSearch: Smallest LCA (SLCA)
 Idea: Enhance LCA with a minimality constraint
 SLCA nodes are LCAs which
  do not have LCA nodes among their descendants

 Similar to XRank/ELCA but stricter




                                                  79
Improving LCA
     XKSearch: Smallest LCA (SLCA)
 Idea: Enhance LCA with a minimality constraint
 SLCA nodes are LCAs which
  do not have LCA nodes among their descendants

 Similar to XRank/ELCA but stricter




                                      Y. Xu and Y. Papakonstantinou. Efficient
                                      Keyword Search for Smallest LCAs in
                                          XML Databases. SIGMOD 2005



                                                                           79
Improving LCA
     XKSearch: Smallest LCA (SLCA)
                                                                        (1)
                                                                        bib



                                            (2)                                              (13)
                                          article                                           article



                (3)        (4)             (5)               (12)                  (14)      (15)       (16)               (23)
               title      year           authors           journal                 title     year     authors            journal



                                   (6)            (9)                                                  (17)      (20)
...Semantic Web...     2005                                  Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                author    author



                         (7)       (8)              (10)     (11)                             (18)     (19)      (21)         (22)
                        first      last              first     last                             first     last      first         last



                       John       Doe           Mary       Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                           80
Improving LCA
     XKSearch: Smallest LCA (SLCA)
                                                            No false positive

                                                                                                              (1)
                                                                                                              bib



                                            (2)                                                     (13)                                               (1035)
                                                                                                                     ...
                                          article                                                  article                                             article



                (3)        (4)             (5)                 (12)               (14)     (15)            (16)                  (23)                        (1038)
                                                                                                                                                 ...                         ...
               title      year           authors             journal              title    year          authors               journal                      authors



                                   (6)            (9)                                                     (17)              (20)                                 (1039)
...Semantic Web...     2005                                 Computer Journal   ...XML...   2003                                          Web Journal
                                 author         author                                                   author            author                                author



                         (7)       (8)              (10)    (11)                            (18)              (19)          (21)          (22)              (1040)        (1041)
                        first      last              first    last                            first              last          first          last               first          last



                       John       Doe           Mary       Smith                           Peter             Jones          Sue          Robinson            Tom          Smith




                                                                                                                                                                                   81
Improving LCA
     XKSearch: Smallest LCA (SLCA)
                                                           But of course...
                                                                         (1)
                                                                         bib



                                            (2)                                               (13)
                                          article                                            article



                (3)        (4)             (5)                (12)                  (14)      (15)       (16)               (23)
               title      year           authors            journal                 title     year     authors            journal



                                   (6)            (9)                                                   (17)      (20)
...Semantic Web...     2005                                   Computer Journal   ...XML...   2003                             Web Journal
                                 author         author                                                 author    author



                         (7)       (8)              (10)      (11)                             (18)     (19)      (21)         (22)
                        first      last              first      last                             first     last      first         last



                       John       Doe           Mary        Smith                            Peter     Jones     Sue         Robinson




                                                                                                                                            82
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web
Web Queries: From a Web of Data to a Semantic Web

More Related Content

What's hot

Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training CypherMax De Marzi
 
Performance of graph query languages
Performance of graph query languagesPerformance of graph query languages
Performance of graph query languagesAthiq Ahamed
 
MySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP PerspectiveMySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP PerspectiveTim Juravich
 
Xcap tutorial
Xcap tutorialXcap tutorial
Xcap tutorialwanglixue
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jTobias Lindaaker
 
쉽게 이해하는 LOD
쉽게 이해하는 LOD쉽게 이해하는 LOD
쉽게 이해하는 LODMyungjin Lee
 
JSON-LD update DC 2017
JSON-LD update DC 2017JSON-LD update DC 2017
JSON-LD update DC 2017Gregg Kellogg
 
Query-Load aware partitioning of RDF data
Query-Load aware partitioning of RDF dataQuery-Load aware partitioning of RDF data
Query-Load aware partitioning of RDF dataLuis Galárraga
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
 
CSC 8101 Non Relational Databases
CSC 8101 Non Relational DatabasesCSC 8101 Non Relational Databases
CSC 8101 Non Relational Databasessjwoodman
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteNeo4j
 
Getting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jGetting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jSuroor Wijdan
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4jM. David Allen
 
GoodRelations Tutorial Part 4
GoodRelations Tutorial Part 4GoodRelations Tutorial Part 4
GoodRelations Tutorial Part 4guestecacad2
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - ImportNeo4j
 
NCompass Live: Linked Data and Libraries: What? Why? How?
NCompass Live: Linked Data and Libraries: What? Why? How?NCompass Live: Linked Data and Libraries: What? Why? How?
NCompass Live: Linked Data and Libraries: What? Why? How?Nebraska Library Commission
 
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)Emil Eifrem
 

What's hot (20)

Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training Cypher
 
ETL into Neo4j
ETL into Neo4jETL into Neo4j
ETL into Neo4j
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Performance of graph query languages
Performance of graph query languagesPerformance of graph query languages
Performance of graph query languages
 
MySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP PerspectiveMySQL & NoSQL from a PHP Perspective
MySQL & NoSQL from a PHP Perspective
 
Xcap tutorial
Xcap tutorialXcap tutorial
Xcap tutorial
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4j
 
쉽게 이해하는 LOD
쉽게 이해하는 LOD쉽게 이해하는 LOD
쉽게 이해하는 LOD
 
JSON-LD update DC 2017
JSON-LD update DC 2017JSON-LD update DC 2017
JSON-LD update DC 2017
 
Query-Load aware partitioning of RDF data
Query-Load aware partitioning of RDF dataQuery-Load aware partitioning of RDF data
Query-Load aware partitioning of RDF data
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
CSC 8101 Non Relational Databases
CSC 8101 Non Relational DatabasesCSC 8101 Non Relational Databases
CSC 8101 Non Relational Databases
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
Intro to Neo4j - Nicole White
 
Getting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jGetting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4j
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4j
 
GoodRelations Tutorial Part 4
GoodRelations Tutorial Part 4GoodRelations Tutorial Part 4
GoodRelations Tutorial Part 4
 
RDFa Tutorial
RDFa TutorialRDFa Tutorial
RDFa Tutorial
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - Import
 
NCompass Live: Linked Data and Libraries: What? Why? How?
NCompass Live: Linked Data and Libraries: What? Why? How?NCompass Live: Linked Data and Libraries: What? Why? How?
NCompass Live: Linked Data and Libraries: What? Why? How?
 
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
 

Similar to Web Queries: From a Web of Data to a Semantic Web

CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialLeeFeigenbaum
 
RDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itRDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itJose Luis Lopez Pino
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overviewAmit Sheth
 
Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012Peter Mika
 
NetIKX Semantic Search Presentation
NetIKX Semantic Search PresentationNetIKX Semantic Search Presentation
NetIKX Semantic Search Presentationurvics
 
Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012 Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012 Thanh Tran
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherObjectRocket
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBJustin Smestad
 
Kellogg XML Holland Speech
Kellogg XML Holland SpeechKellogg XML Holland Speech
Kellogg XML Holland SpeechDave Kellogg
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic WebRoberto García
 
Linked Open Data and Digital Curation (Islandora)
Linked Open Data and Digital Curation (Islandora)Linked Open Data and Digital Curation (Islandora)
Linked Open Data and Digital Curation (Islandora)Hong (Jenny) Jing
 
Approaches to mobile site development
Approaches to mobile site developmentApproaches to mobile site development
Approaches to mobile site developmentErik Mitchell
 
ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2Martin Hepp
 
GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2guestecacad2
 
Elastic Web Mining
Elastic Web MiningElastic Web Mining
Elastic Web MiningKen Krugler
 
Web search engines and search technology
Web search engines and search technologyWeb search engines and search technology
Web search engines and search technologyStefanos Anastasiadis
 

Similar to Web Queries: From a Web of Data to a Semantic Web (20)

CSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web TutorialCSHALS 2010 W3C Semanic Web Tutorial
CSHALS 2010 W3C Semanic Web Tutorial
 
Semtech2006
Semtech2006Semtech2006
Semtech2006
 
Metadata is back!
Metadata is back!Metadata is back!
Metadata is back!
 
RDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itRDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use it
 
DataHub
DataHubDataHub
DataHub
 
Semantic Web: introduction & overview
Semantic Web: introduction & overviewSemantic Web: introduction & overview
Semantic Web: introduction & overview
 
Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012Semantic Search tutorial at SemTech 2012
Semantic Search tutorial at SemTech 2012
 
NetIKX Semantic Search Presentation
NetIKX Semantic Search PresentationNetIKX Semantic Search Presentation
NetIKX Semantic Search Presentation
 
Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012 Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Kellogg XML Holland Speech
Kellogg XML Holland SpeechKellogg XML Holland Speech
Kellogg XML Holland Speech
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Linked Open Data and Digital Curation (Islandora)
Linked Open Data and Digital Curation (Islandora)Linked Open Data and Digital Curation (Islandora)
Linked Open Data and Digital Curation (Islandora)
 
Approaches to mobile site development
Approaches to mobile site developmentApproaches to mobile site development
Approaches to mobile site development
 
ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2ISWC GoodRelations Tutorial Part 2
ISWC GoodRelations Tutorial Part 2
 
GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2GoodRelations Tutorial Part 2
GoodRelations Tutorial Part 2
 
Elastic Web Mining
Elastic Web MiningElastic Web Mining
Elastic Web Mining
 
Mythbusters
MythbustersMythbusters
Mythbusters
 
Web search engines and search technology
Web search engines and search technologyWeb search engines and search technology
Web search engines and search technology
 

Recently uploaded

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 

Recently uploaded (20)

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 

Web Queries: From a Web of Data to a Semantic Web

  • 1. Web Queries From a Web of Data to a Semantic Web François Bry, Tim Furche, Klara Weiand based on work in the European project KiWi “Knowledge in a Wiki” 1
  • 2. 2
  • 3. 3
  • 5. ? 5
  • 6. ? but what does Web querying do? 5
  • 7. 1 100 200 300 400 500 Share value: 27.40$ 6
  • 8. 1 100 200 300 400 500 Share value: 27.40$ Automatically sell at < 28.00$ Action! 6
  • 9. ? 7
  • 10. ? if you believe that I have some bank stocks for you ... 7
  • 11. or query: value of specific element Share value: 27.40$ 8
  • 12. or query: value of specific element Share value: 27.40$ rule: automatically sell at < 28.00$ Action! 8
  • 13. Web Search Web Queries Scale “Web”, TB/PBs a few “documents”, GBs for basic selection lang. high, Parallelizability very high (NC) otherwise very low independent documents, trees (XML) or graphs (RDF), Data heterogenous homogenous (almost) everyone, many casual Used by few experts users Expertise Level/Knowledge low very high required Expressiveness very low very high Result presentation Ranking, clustering... programmed Matching often fuzzy or vague only precise answers Documents (or summaries Return value parts of trees or graphs thereof) Actionability very low very high 9
  • 14. web search Expressive Power Ease of use web queries In a nutshell... 10
  • 15. Web Queries Search + Query = Easy + Automation Goals: make web querying accessible to casual users e.g., to enable data aggregation/mashups by users allow precise queries over vastly heterogeneous data precise queries and rules critical to the (Social) Semantic Web unless they are accessible no widespread adoption Classification of approaches to combining Web search & query enhance search: add data extraction / object search enhance querying: add keyword search / information retrieval keyword-based QLs for structured data: grounds-up redesign 11
  • 16. Search + Query Approach 1: Enhance Search “Peek” into web documents Extract data items / Web objects from web documents to provide more fine-grained answers Examples Google (Squared) Google Rich Snippets Yahoo! Search Monkey 12
  • 17. Search + Query Approach 1: Enhance Search “Peek” into web documents Extract data items / Web objects from web documents to provide more fine-grained answers Examples Google (Squared) Google Rich Snippets Yahoo! Search Monkey 12
  • 18. Search + Query Approach 2: Enhance Querying Enhance existing (XML) web query languages by adding information retrieval functionality ranking, scoring, fuzzy matching Examples XQuery and XPath Full Text 1.0, W3C Cand. Recommendation 13
  • 19. Search + Query Approach 2: Enhance Querying Enhance existing (XML) web query languages by adding information retrieval functionality ranking, scoring, fuzzy matching Examples XQuery and XPath Full Text 1.0, W3C Cand. Recommendation 1 doc('bib.xml')/bib/book[title ftcontains 'programming'] 2 for where $b score $s in doc('bib.xml')/bib/book $b ftcontains 'web' ftand 'query' order by $s descending return <title> {$b//title} </title> 13
  • 20. Search + Query Approach 3: Keyword Queries Keyword query for structured Web (XML and RDF) data apply web search paradigm to querying tree and graph data operates in the same setting as e.g. XQuery and SPARQL few or single document, no Web scale but: easier handling of very heterogeneous data querying of semi-structured data through easy-to-use interface Ultimate goal: Easy usage combined with enough power → to automate data processing tasks 14
  • 21. Search + Query Approach 3: Keyword Queries Keyword query for structured Web (XML and RDF) data apply web search paradigm to querying tree and graph data operates in the same setting as e.g. XQuery and SPARQL few or single document, no Web scale K. Weiand, T. Furche, and F. Bry. Quo vadis, web queries? but: easier handling of very heterogeneous data In Web4Web, 2008. querying of semi-structured data through easy-to-use interface Ultimate goal: Easy usage combined with enough power → to automate data processing tasks Focus of this tutorial 14
  • 22. Web Queries Overview 1. Summary of web query research in the 00s 1.1. XML 1.2. RDF 2. Keyword query languages 2.1. Motivation 2.2. Classification 2.3. Issues 3. KWQL 4. Discussion and Outlook 15
  • 23. 16
  • 24. Part 1 Web Queries Summary of XML & RDF Query Language Research 17
  • 25. 1.1 XML Tree Data & Tree Queries Data–XPath–XQuery 18
  • 26. nted of that element. The following listing shows a small XML . Fi- fragment that illustrates elements and element nesting: s an <bib xmlns:dc="http://purl.org/dc/elements/1.1/"> n of 2 <article journal="Computer Journal" id="12"> -E). <dc:title>...Semantic Web...</dc:title> how 4 <year>2005</year> <authors> m to 6 <author> the <first>John</first> <last>Doe</last> </author> Web 8 <author> ches <first>Mary</first> <last>Smith</last> </author> most 10 </authors> </article> ased 12 <article journal="Web Journal"> the <dc:title>...Web...</dc:title> arch 14 <year>2003</year> <authors> 16 <author> <first>Peter</first> <last>Jones</last> </author> 18 <author> <first>Sue</first> <last>Robinson</last> </author> tion 20 </authors> XML </article> eral. 22 </bib> lica- ML, In addition, we can observe attributes (name, value (sometimes graph), mostly uniform edges ordered tree pairs and associated with start tags) that are essentially like elements rop- but may only contain character data, no other nested 19 122] attributes or elements. Also, by definition, element order is tion significant, attribute order is not. For instance
  • 27. ery languages ble interfaces we can write other elements or character data as children implemented of that element. The following listing shows a small XML ion VI-D). Fi- fragment that illustrates elements and element nesting: ries not as an <bib xmlns:dc="http://purl.org/dc/elements/1.1/"> r extension of 2 <article journal="Computer Journal" id="12"> Section VI-E). <dc:title>...Semantic Web...</dc:title> mary of how 4 <year>2005</year> <authors> d RDF aim to 6 <author> !"# ther with the <first>John</first> <last>Doe</last> </author> aditional Web 8 <author> $%$ g approaches <first>Mary</first> <last>Smith</last> </author> , are the most 10 </authors> </article> eyword-based 12 <article journal="Web Journal"> nges, and the <dc:title>...Web...</dc:title> of Web search 14 <year>2003</year> <authors> 16 <author> !&# !"-#. ND RDF <first>Peter</first> <last>Jones</last> </author> 18 <author> '()%*+, '()%*+, <first>Sue</first> <last>Robinson</last> </author> epresentation 20 </authors> </article> ata in general. 22 </bib> er of applica- kup (XHTML, In addition, we can observe attributes (name, value pairs G 7 [141]) and associated with start tags) that are essentially like elements (Apple’s prop- !-# but may only contain character data, no other nested !3# !5#. !"&# !"3# !"5# !"/#. !&-# nd XSLT [122] attributes or elements. Also, by definition, element order is r serialization significant, attribute order is not. For instance )%)+, 4,'( '0)12(6 720(8'+ )%)+, 4,'( '0)12(6 720(8'+ rmats such as <author><last>Doe</last><first>John</first></author> describing the represents different information than the author element rText Markup in lines 6–9, but e on the web, <article id="12" journal="Computer are fixed. XML Journal">...</article>!!!"#$%&'()*" !/#. !9#. !":# !&=#. by specifying 4556 7.%89($2"-.92'&: !!!+$,!!! 455> +$,"-.92'&: represents the same element+$,"!!! item as lines 2–15. information Figure 1 gives a graphical representation of the XML doc- '0)12( '0)12( '0)12( '0)12( ation in XML ument that is referenced in preceding illustrations. When et [68] which represented as a graph, an XML document without links is ML document. a labeled tree where each node in the tree corresponds to parts, closely an element and its type. Edges connect nodes and their children, that is, elements and the elements nested in el, we provide ates from the them, elements and their content and elements and their !:# !<# !"=# !""# !"<# !"9#. !&"# !&&# attributes. Since the visual distinction between the parent- ved and thus ped. This view child relationship can be made without edge labels and ;(6) +'6) ;(6) +'6) ;(6) +'6) ;(6) +'6) since attributes are not addressed or receive no special uch as Xcerpt treatment in the research presented in this text, edges will follow XPath not be labeled in the following figures. Elements, attributes, and character data are XML’s most XML is a syn- common information types. In addition, XML documents ems are called may also contain comments, processing instructions (name- end tags, both -./' 0.$ 1&23 #%)(/ ;$($2 -.'$< #9$ =.,)'<.' XML value pair with specific semantics that can be placed any- r>...</author> where an element can be placed), document level informa- place of ‘. . . ’, tion (such as the XML or the document type declarations), entities, and notations, which are essentially just other kinds of information containers. Fig. 1. Visual representation of sample XML document ordered tree (sometimes graph), mostly uniform edges On top of these information types, two additional fa- B. Resource Description Framework (RDF) cilities relevant to the information content of XML docu- 19 As the second preeminent data format on the S ments are introduced by subsequent specifications: Names- Web, the Resource Description Format (RDF) [109 paces [33] and Base URIs [140]. Namespaces allow the
  • 28. ? what’s new about trees? didn’t we try that before? 20
  • 29. r ce sto an self parent child preceding-sibling following-sibling fol ing lo win d ce g pre descendant XPath axis for navigation/selection in tree, context, horizontal axes for order 21
  • 30. XPath: Navigation in Trees Intro in 5 Points Data: rooted, ordered, unranked, finite trees (no ID/IDREF resolution) Paths are sequences of steps (axis & test on properties of node) adorned with existential predicates (in []) to obtain tree queries some more advanced features: value joins, aggregation, … Answers are sets of nodes from the input document /child::html/descendant::h1[not(preceding::h1  =   “Introduction”)]/child::p[attribute::class=”abstract”] no variables, no construction, no grouping/ordering 22
  • 31. axis Nodes (n) = {(n : R axis (n, n )} are conc λ Nodes (n) = {(n : Labλ (n )} widely su node() Nodes (n) = Nodes(T ) tension o axis::nt[qual] Nodes (n)= {n : n ∈ axis Nodes ∧ n ∈ nt Nodes ∧ in XQue qual Bool (n )} in XQue step/path Nodes (n) = {n : n ∈ step Nodes (n)∧ which ca n ∈ path Nodes (n )} Indeed, path1 ∪ path2 Nodes (n) = path1 Nodes (n) ∪ path2 Nodes (n) normaliz h) X path Bool (n) = path Nodes (n) sion of X use cases path1 ∧ path2 Bool (n) = path1 Bool (n) ∧ path2 Bool (n) not only path1 ∨ path2 Bool (n) = path1 Bool (n) ∨ path2 Bool (n) The mos ¬path Bool (n) = ¬ path Bool (n) 1) Sequ lab() = λ Bool (n) = Labλ (n) expr path1 = path2 Bool (n) = ∃n , n : n ∈ path1 Nodes (n)∧ sequ n ∈ path2 Nodes (n)∧ (n , n ) from 23 trast TABLE I of n
  • 32. XPath: Navigation in Trees A Success Story Commercial success: One of the most successful QLs widely implemented and used, several W3C standards Research success: designed with little concern for formal “beauty” but with few restrictions: turns out it hit a formal sweet spot Expressiveness: monadic datalog (datalog with only unary intensional predicates, i.e., answer predicates) also: two-variable first-order logic (FO2) Polynomial complexity, linear for navigational XPath 24
  • 33. XPath: Navigation in Trees A Success Story (cont.) Completeness: falls short of being first-order complete for first-order completeness a “conditional axis” (UNTIL operator) needed e.g., all a nodes reachable by a path of only b nodes but: partially justified by results on elimination of reverse axes reverse axis: parent, ancestor, preceeding, … eliminating these axes possible in navigational XPath though in few cases at exponential cost or resulting in introduction of expensive node-identity joins we can safely ignore them in most cases in conditional XPath this elimination is not possible 25
  • 34. XPath: Navigation in Trees A Success Story (cont.) Completeness: falls short of being first-order complete for first-order completeness a “conditional axis” (UNTIL operator) needed XPath, the M. Marx. Conditional First Order Complete XPath e.g., all a nodes reachable by a path of only b nodes Dialect. PODS 2004. but: partially justified by results on elimination of reverse axes reverse axis: parent, ancestor, preceeding, … eliminating these axes possible in navigational XPath D. Olteanu, H. Meuss, T. Furche, and F. Bry. XPath: Looking Forward. though in few cases at exponential cost or resulting in XMLDM @ EDBT, LNCS 2490, 2002. introduction of expensive node-identity joins we can safely ignore them in most cases in conditional XPath this elimination is not possible C. Ley and M. Benedikt. How big must complete XML query languages be? ICDT 2009. 25
  • 35. Following (x, y ) :⇔ x pre y ∧ x post y From these axes, all others can be defined in first-order logic. pre- and post-orders are sufficient to represent the full tree XPath: Navigation in Trees structure. A Success Story (cont.) Evaluation: naïve decomposition: Structural Joins sequence of joins, descendant with closure over child relation Computing descendants by a single theta-join on the better: structural joins for descendant (single lookup using tree labeling) representation relation. e.g., pre/post encoding R pre post lab a:1:7 1 7 a fits nicely with relational storage 2 3 b b:2:3 a:5:6 3 1 a 4 2 c 5 6 a a:3:1 c:4:2 b:6:4 d:7:5 6 4 b 7 5 d twig joins: stack-based, holistic, not easily adapted to relational DBS Example CREATE VIEW descendant AS SELECT r1.pre, r2.pre FROM R r1, R r2 WHERE r1.pre r2.pre AND r2.post r1.post; 26 Such joins are called structural joins [Al-Khalifa et al., 2002].
  • 36. XPath: Navigation in Trees A Success Story (cont.) 1 simple, fairly easy to learn 2 formal “sweet” spot 3 efficient evaluation 27
  • 37. XPath: Navigation in Trees A Success Story (cont.) 1 simple, fairly easy to learn M. Benedikt and C. Koch. 2 formal “sweet” spot XPath Leashed. In ACM Computing Surveys, 2007 3 efficient evaluation 27
  • 39. let  $auction  :=  doc(’auction.xml’)   for  $o  in  $auction/open_auctions/open_auction   return   corrupt  id=’{  $o/@id  }’   {  if  (sum($o/(initial  |  bidder/increase))  =  $o/current)   then  text  {  ’no’  }   else  $auction//people/person[@id  =  $o//@person]/name  }   /corrupt   29
  • 40. XQuery: Graph Queries Construction From XPath to Hell Adds an enormous set of features to XPath price: highly complex language, Turing-complete, very hard to implement here: just some highlights Variables: XPath plus variables → no longer FO2 “an article that has the conference’s chair as author” not any more polynomial, but NP-/PSPACE-complete Construction: harmless if only at end of query but: composition (i.e., querying of data constructed in the same query) “find all papers written by the author with the most papers” NEXPTIME-complete or in EXPSPACE (can not be captured by datalog) 30
  • 41. XQuery: Graph Queries Construction From XPath to Hell Recursion: programmable with composition → Turing complete i.e., like Prolog or datalog with value invention Sequences: rather than sets without composition little effect with composition: drastically harder optimization as iteration semantics of a query must be precisely observed 31
  • 42. s type [α] (Haskell), Array (Ruby) or 0 Q3 iter pos item for y in properly reflect this bleT (Linq). To 2001 to 2008 loop(s0 ) loop(s1 ) πiter:outer, Operator 1 1 ’WD/CR/PR’ Seman return herently unordered relational database s1 iter iter pos:pos1 , item 1 2 ’WD/CR/PR’ . . . . . . if ( y lt 2007) we embed order in the data and use 1 ’REC’ 1 pos1 :sort,posπa:b. . . project ···else . σa 1 8 ’REC’ select r then ’WD/CR/PR’ elseon the bles with columns pos|item (shown ’REC’ iter pos item . ✶ 7 1 ’REC’ . iter=inner × Cartes present such sequences. Note that the 8 1 ’REC’ . s need not be dense and not even be of 8∪ ✶a=b , a=b . equi-jo (a) Query Q3 . ordered domain will do. In specific cases (b) Associated loop tables. @item:’REC’ @item:’WD/CR/PR’ ∪, ∪, (disjoin may already reflected by the items xi @pos:1 @pos:1 δ elimina ···then f a sequence of encoded nodes resulting πiter πiter @a:c pos ’WD/···’ iter item attach on step evaluation)—column itemand its W3C Recommendation Figure 3: may σitem σitem1 #a 1 1 ’WD/CR/PR’ 2 1 ’WD/CR/PR’ attach track maturity level (Query Q3 ). Annotations ¬s0,1 e of pos. a:b1 ,..,bn . . . . . . attach item1 :item . . . y lt 2007 6 1 ’WD/CR/PR’ attach Query denote semantics is principally dynamic iteration scopes. #a:b1 ,..,bn /c iter pos item πiter,pos,item:item2 πa:b1 ,..,bn attach 1 1 true as the core language construct: any 2 1 true . . . item2 :item1 ,item outer:iter, sidered (Linq) . . . Enumerable.Range(2001,8).Select( to be iteratively evaluated in the agga:b/c inner, sort:pos attach . . . ✶ most enclosing for loop. The top level of ? ’WD/CR/PR’ : ’REC’ ) y = y 2007 8 1 false iter1 =iter ssumed (Links) to be wrapped inside the scope π :iter, for (y - [2001,2002,. . .,2008])iter11 :item @item:2007 Table 1: Excerpt of item gle-iteration loop for _ in (0) return e [if (y 2007) then WD/CR/PR else REC] occur free in e (i.e., the choice of 0 is y @pos:1 @pos:1 (with agg ∈ {count 2007 iter pos item iter pos item (Ruby) ndamental idea behind(2001..2008).collect { 2001 loop lifting is to 1 1 πiter:inner, πiter:inner 1 1 2007 item 2 1 2007 |y|a y 2007 ? ’WD/CR/PR’ : ’REC’ } ode that consumes and emits “fully un- 2 1 2002 . . . . . . . . . . . . . . . . . . esentation of e’s value. if y unrolling then WD/CR/PR else REC | #inner (Haskell) [ Here, 2007 8 1 2008 gramming languag 8 1 2007 le that y - [2001..2008] ] 2001 to 2008 iter pos item set-oriented algebr 1 1 2001 ble with schema iter|pos|item holds the if y 2007 else ’REC’ (Python) [ ’WD/CR/PR’ 1 2 2002 . . . work for database- for y in range(2001,2008) ] ues that e assumes during its iterative . . . . 1 . . 8 2008 not stumble if pro 32 stances [13]. Figure 4: A sample of iterative constructs found in able has key iter, pos since e may yield ’s companion languages (paraphrases of Q3 ). code to evaluatecode. s in each distinct iteration—only if e’s Figure 5: Loop-lifted algebraic Algebraic Q3 W
  • 43. s type [α] (Haskell), Array (Ruby) or 0 Q3 iter pos item for y in properly reflect this bleT (Linq). To 2001 to 2008 loop(s0 ) loop(s1 ) πiter:outer, Operator 1 1 ’WD/CR/PR’ Seman return herently unordered relational database s1 iter iter pos:pos1 , item 1 2 ’WD/CR/PR’ . . . . . . if ( y lt 2007) we embed order in the data and use 1 ’REC’ 1 pos1 :sort,posπa:b. . . project ···else . σa 1 8 ’REC’ select r then ’WD/CR/PR’ elseon the bles with columns pos|item (shown ’REC’ iter pos item . ✶ 7 1 ’REC’ . iter=inner × Cartes present such sequences. Note that the 8 1 ’REC’ . s need not be dense and not even be of 8∪ ✶a=b , a=b . equi-jo (a) Query Q3 . ordered domain will do. In specific cases (b) Associated loop tables. @item:’REC’ @item:’WD/CR/PR’ ∪, ∪, (disjoin may already reflected by the items xi @pos:1 @pos:1 δ elimina ···then f a sequence of encoded nodes resulting πiter πiter @a:c pos ’WD/···’ iter item attach on step evaluation)—column itemand its W3C Recommendation Figure 3: may σitem σitem1 #a 1 1 ’WD/CR/PR’ 2 1 ’WD/CR/PR’ attach track maturity level (Query Q3 ). Annotations ¬s0,1 e of pos. a:b1 ,..,bn . . . . . . attach . . item1 :item . y lt 2007 6 1 ’WD/CR/PR’ attach Query denote semantics is principally dynamic iteration scopes. #a:b1 ,..,bn /c iter pos item πiter,pos,item:item2 πa:b1 ,..,bn attach 1 1 true as the core language construct: any 2 1 true . . . item2 :item1 ,item outer:iter, sidered (Linq) . . . Enumerable.Range(2001,8).Select( to be iteratively evaluated in the agga:b/c inner, sort:pos attach . . . ✶ most enclosing for loop. The top level of ? ’WD/CR/PR’ : ’REC’ ) y = y 2007 8 1 false iter1 =iter ssumed (Links) to be wrapped inside the scope π :iter, for (y - [2001,2002,. . .,2008])iter11 :item @item:2007 Table 1: Excerpt of item gle-iteration loop for _ in (0) return e [if (y 2007) then WD/CR/PR else REC] occur free in e (i.e., the choice of 0 is y @pos:1 @pos:1 (with agg ∈ {count 2007 iter pos item iter pos item (Ruby) ndamental idea behind(2001..2008).collect { 2001 loop lifting is to 1 1 πiter:inner, πiter:inner 1 1 2007 item 2 1 2007 |y|a y 2007 ? ’WD/CR/PR’ : ’REC’ } ode that consumes and emits “fully un- 2 1 2002 . . . . . . . . . . . . . . . . . . esentation of e’s value. if y unrolling then WD/CR/PR else REC | #inner (Haskell) [ Here, 2007 8 1 2008 gramming languag 8 1 2007 le that P. Boncz, T. Grust, M. van Keulen, S. 2001 to 2008 y - [2001..2008] ] set-oriented algebr Manegold, J. Rittinger, J. Teubner. iter pos item 1 1 2001 MonetDB/XQuery: ’WD/CR/PR’ (Python) [ A Fast XQuery ble with schema iter|pos|item holds the if y 2007 else ’REC’ 1 2 2002 work for database- . . . Processor Poweredy in range(2001,2008) ] for by a ues that e assumes during its iterative . . . . . . not stumble if pro Relational Engine, SIGMOD 2006. 1 8 2008 32 stances [13]. Figure 4: A sample of iterative constructs found in able has key iter, pos since e may yield ’s companion languages (paraphrases of Q3 ). code to evaluatecode. s in each distinct iteration—only if e’s Figure 5: Loop-lifted algebraic Algebraic Q3 W
  • 44. XQuery: Graph Queries Construction From XPath to Hell 1 significantly harder than XPath composition recursion 2 expensive operations yet significant steps toward 3 fast implementations 33
  • 45. XQuery: Graph Queries Construction From XPath to Hell 1 significantly harder than XPath composition recursion M. Benedikt and C. Koch. 2 expensive operations Interpreting Tree-to-Tree Queries. ICALP 2006. yet significant steps toward 3 fast implementations 33
  • 46. XQuery: Graph Queries Construction From XPath to Hell 1 significantly harder than XPath composition recursion M. Benedikt and C. Koch. 2 expensive operations Interpreting Tree-to-Tree Queries. ICALP 2006. P. Boncz, T. Grust, et al. yet significant steps toward 3 fast implementations MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine, SIGMOD 2006. 33
  • 47. XQuery: Graph Queries Construction From XPath to Hell 1 significantly harder than XPath composition recursion M. Benedikt and C. Koch. 2 expensive operations Interpreting Tree-to-Tree Queries. ICALP 2006. P. Boncz, T. Grust, et al. yet significant steps toward 3 fast implementations MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine, SIGMOD 2006. 33
  • 48. 34
  • 50. Legend Other Smith Class Literal Resource Person last Mary type first Computer 11 Bag Journal Person number name Journal type _2 type type _1 first John 2005 isPartOf author year last smith2005 type Article Doe RDF title ...Semantic Web... unordered, unranked, unrooted finite graph, edge and node labeled, different node kinds 36
  • 51. ? what’s new about that? isn’t that just a fancy repr. of a ternary relation? 37
  • 52. RDF: Semantics for the Web? What’s new about RDF? data model of RDF very similar to relational but: “unique” identifiers in form of URIs that can be shared on the Web uniqueness only thanks to careful assignment practice (unlike GUIDs) human-readable (unlike GUIDs) but: existential information in form of blank nodes like named null values in SQL, also known as Codd tables but: implied information due to RDF/S semantics (and, thus, entailment) e.g., class hierarchy and typing, domain and range of properties, axiomatic triples notion of redundancy-free or lean graph/answer 38
  • 53. bib:hasPart.5 queries are rang (CONSTRUCT or S CONSTRUCT { ?j bib:hasPart ?a } (WHERE clause) o 2 WHERE { ?a rdf:type bib:Article AND ?a bib:isPartOf ?j AND ?j bib:name ‘Computer Journal’ } expressions (in c FILTER expressio The query illustrates SPARQLs fundamental query con- condition must struct: a pattern (s, p, o) for RDF triples (whose components first limitation i are usually thought of as subject, predicate, object). Any allowed in stand RDF triple is also a triple pattern, but triple patterns allow priori and rewritt variables for each component. Furthermore, SPARQL also (as FILTER expre allows literals in subject position, anticipating the same which, in turn, change also in RDF itself. We use the variant syntax for boolean value” i SPARQL discussed in [161] to ease the definition of syntax Finally, we a and semantics of the language. For instance, standard CONSTRUCT clause SPARQL, uses . instead of AND for triple conjunction. We variables occurri consider two forms of SPARQL queries, viz. SELECT queries literals, and all v that return list of variable bindings and CONSTRUCT queries only ever bound that return new RDF graphs. Triple patterns contained in a The first conditio CONSTRUCT clause (or “template”) are instantiated with the adding appropria variable bindings provided by the evaluation of the triple query body. pattern in the WHERE clause. We omit named graphs and Following [16 assume that all queries are on the single input graph. An queries based
  • 54. (s, p, o) D Subst = {θ : dom(θ) = Vars((s, p, o)) ∧ t θ ∈ D} pattern1 AND pattern2 D Subst = pattern1 D pattern2 D Subst Subst pattern1 UNION pattern2 D Subst = pattern1 D ∪ pattern2 D Subst Subst pattern1 MINUS pattern2 D Subst = pattern1 D pattern2 D Subst Subst pattern1 OPT pattern2 D Subst = pattern1 D pattern2 D Subst Subst pattern FILTER condition D Subst = {θ ∈ pattern D : Vars(condition) ⊂ dom(θ) Subst ∧ condition D (θ)} Bool condition1 ∧ condition2 D (θ) Bool = condition1 D (θ) ∧ condition2 D (θ) Bool Bool condition1 ∨ condition2 D (θ) Bool = condition1 D (θ) ∨ condition2 D (θ) Bool Bool ¬condition D (θ) Bool = ¬ condition D (θ) Bool BOUND(?v) D (θ) Bool = vθ nil isLITERAL(?v) D (θ) Bool = vθ ∈ L isIRI(?v) D (θ) Bool = vθ ∈ I isBLANK(?v) D (θ) Bool = vθ ∈ B ?v = literal D (θ) Bool = vθ = literal ?u = ?v D (θ) Bool = uθ = vθ ∧ uθ nil triple D (θ) Graph = tripleθ if ∀v ∈ Vars(triple) : vθ nil, otherwise template1 AND template2 D (θ) = template1 D (θ) ∪ template2 D (θ) Graph Graph Graph CONSTRUCT t WHERE p D = std(t ) D (θ) θ∈ P D Subst Graph SELECT V WHERE p D = πV ( P D ) Subst TABLE V S EMANTICS FOR SPARQL
  • 55. RDF: Semantics for the Web? Simple RDF QL? SPARQL: Simple Protocol and RDF Query Language really simple? same expressiveness as full relational algebra, PSPACE-complete but: no composition, no order → simpler than full SQL or XQuery really an RDF query language? blank node construction only limited (no quantifier alternation) talks vaguely about extension to entailment regimes but no support in SPARQL as defined no support for lean answers (justified partially by high computational cost) 40
  • 56. RDF: Semantics for the Web? SPARQL: Simple RDF QL? Complexity: SPARQL with only AND, FILTER, and UNION: NP-complete i.e., no computationally interesting subset of full relational SPJU queries full SPARQL: PSPACE-complete again no computationally interesting subset of full first-order queries why? due to negation “hidden” in OPTIONAL reduction from 3SAT using isBound to encode negation lacks completeness w.r.t. RDF transformations: relational algebra: can express all PSPACE-transformations on relations SPARQL: fails at the same for RDF 41
  • 57. RDF: Semantics for the Web? SPARQL: Simple RDF QL? J. Perez, M. Arenas, and Complexity: C. Gutierrez. Semantics and Complexity of SPARQL. SPARQL with only AND, FILTER, and UNION: NP-complete ISWC 2006 i.e., no computationally interesting subset of full relational SPJU queries full SPARQL: PSPACE-complete again no computationally interesting subset of full first-order queries why? due to negation “hidden” in OPTIONAL reduction from 3SAT using isBound to encode negation lacks completeness w.r.t. RDF transformations: relational algebra: can express all PSPACE-transformations on relations SPARQL: fails at the same for RDF 41
  • 58. Select all persons and return one blank node connected to all these persons using “member” Mary John member Joe member member Not expressible in SPARQL SPARQL limitations of blank node construction Why not? no quantifier alternation
  • 59. Select all persons and return one blank node connected to all these persons using “member” Mary John member Joe member member T. Furche, C. Ley, B. Linse, B. Marnette, and O. F. Bry, Poppe. SPARQLog: SPARQL with Rules and Quantification. In: Semantic Web Information Management: A Model-based Perspective, 2009. Not expressible in SPARQL SPARQL limitations of blank node construction Why not? no quantifier alternation
  • 60. RDF: Semantics for the Web? SPARQL: Future doesn’t hit a sweet spot (like XPath) at least from a formal, database perspective contributions from database community very rare yet a significant improvement over most previous RDF QLs already forms basis for future QL research on RDF rule extensions for RDF “From SPARQL to Rules” (Polleres, WWW 2007), Networked Graphs (Schenk et al, WWW 2008) no or limited blank node support but: with rules, restricted quantification as in SPARQL (∃∀) as expressive as unrestricted quantifier alternation 43
  • 61. RDF: Semantics for the Web? SPARQL: Future doesn’t hit a sweet spot (like XPath) at least from a formal, database perspective contributions from database community very rare yet a significant improvement over most previous RDF QLs already forms basis for future QL research on RDF rule extensions for RDF “From SPARQL to Rules” (Polleres, WWW 2007), Networked Graphs (Schenk et al, WWW 2008) no or limited blank node support but: with rules, restricted quantification as in SPARQL (∃∀) as expressive as unrestricted quantifier alternation F. Bry, T. Furche, C. Ley, B. Linse, B. Marnette, and O. Poppe. SPARQLog: SPARQL with Rules and Quantification. In: Semantic Web Information Management: A Model-based Perspective, 2009. 43
  • 62. RDF: Semantics for the Web? SPARQL: Summary 1 syntax for first-order queries on RDF foundation for research but little 2 innovation in the language itself 3 lackluster support for RDF specifics 44
  • 63. RDF: Semantics for the Web? SPARQL: Summary 1 syntax for first-order queries on RDF J. Perez, M. Arenas, and foundation for research but little C. Gutierrez. Semantics and 2 innovation in the language itself Complexity of SPARQL. ISWC 2006 3 lackluster support for RDF specifics 44
  • 64. 45
  • 65. Part 2 Keyword Queries Classification main issues 46
  • 66. 47
  • 67. Queries as Keywords Main Characteristics Queries are (mostly) unstructured bags of words Used in general purpose web search engines but also elsewhere (Amazon, Facebook,...) Implicit conjunctive semantics (with limitations) Often combined with IR techniques ranking, fuzzy matching Research focuses on application to semi-structured data general structured data (Web objects tables, relations) 48
  • 68. Queries as Keywords Why Keyword Qs for Structured Data Success in other areas Users are already familiarized with the paradigm Allow casual users to query structured data without having deep knowledge of the query language the structure schema of the data Enable querying of heterogeneous data 49
  • 69. Queries as Keywords Classification of Keyword QLs Data type XML RDF Implementation stand-alone systems translation to conventional query language keyword-enhanced query languages 50
  • 70. Queries as Keywords Classification of Keyword QLs Complexity of atomic queries keyword-only label-keyword keyword-enhanced Querying of elements Values Node labels Edge labels 51
  • 71. Queries as Keywords XML Keyword QLs Stand-alone Enhancement Keyword-only 9 0 Label-keyword 2 0 Keyword-enhanced -- 3 52
  • 72. Queries as Keywords XML Keyword QLs K. Weiand, T. Furche, and F. Bry. Quo vadis, web queries? In Web4Web, 2008. Stand-alone Enhancement Keyword-only 9 0 Label-keyword 2 0 Keyword-enhanced -- 3 52
  • 73. Queries as Keywords RDF Keyword QLs Stand-alone Translation Keyword-only 2 2 Label-keyword 0 1 Keyword-enhanced -- -- 53
  • 74. Queries as Keywords RDF Keyword QLs K. Weiand, T. Furche, and F. Bry. Quo vadis, web queries? In Web4Web, 2008. Stand-alone Translation Keyword-only 2 2 Label-keyword 0 1 Keyword-enhanced -- -- 53
  • 75. Queries as Keywords XML Keyword QLs More Popular Time first XML keyword query languages are as old as RDF Familiarity with XML Complexity of RDF Graph-shaped Labeled Edges Blank nodes 54
  • 76. Queries as Keywords XML Keyword QLs More Popular Time first XML keyword query languages are as old as RDF Familiarity with XML Complexity of RDF Graph-shaped Labeled Edges Blank nodes 54
  • 77. Queries as Keywords Focus Issues 1. Grouping keyword matches 2. Determining answer representations 3. Expressive power 4. Ranking 5. Limitations 55
  • 78. 2.1 Computing Query Answers Turning keyword matches into answers 56
  • 79. Computing Query Answers Problem Setting Query K yields match lists L1 ... L|K| Answer sets S1 … Sm are constructed from L may contain either only one match per match list (m = |K|) or several (m ≥ |K|) Data-centric vs. document-centric XML 57
  • 80. Computing Query Answers K ={Smith, web}, Problem Setting L1 ={11}, L 2 ={3,23}, S1 ={11,3}, S 2 ={11,23} (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 58
  • 81. Computing Query Answers Problem Setting Entire XML document usually too big to serve as a good query answer Matched nodes alone usually not informative Meaningful results a smaller unit has to be found 59
  • 82. Computing Query Answers Approaches 1. Determine entities based on the schema only keyword matches within one entity or apply LCA at entity-level done manually or using schema partitioning with cardinality as criterion Lowest Entity Node, Minimal Information Unit, XSeek... 2. Determine entities by connecting keyword matches Answer entity: contains (at least) one match for each keyword 60
  • 83. Computing Query Answers Connecting Keyword Matches Classic concept: Lowest Common Ancestor (LCA) Use LCA to find the maximally specific concept that is common to the keyword matches LCA: Lowest node that is ancestor to all elements in an answer set 61
  • 84. Computing Query Answers Lowest Common Ancestor (LCA) LCA(S1={3,11}) = 2 (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 62
  • 85. Computing Query Answers Lowest Common Ancestor (LCA) LCA(S2={11,23}) = 1 — False positive (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 63
  • 86. 2.1.1 Improving LCA Approaches to improve LCA for computing answer entities 64
  • 87. Computing Query Answers Lowest Common Ancestor (LCA) Many approaches to improve over LCA SLCA, MLCA, Interconnection Semantics, VLCA, Amoeba Join... Filter LCAs to avoid false positives Result is subset of LCA result 65
  • 88. Improving LCA Overview of Approaches 1. Interconnection relationship 2. XKSearch: Smallest LCA 3. Meaningful LCA (MLCA) 4. Amoeba Join 5. (Valuable LCA (VLCA), Compact LCA, CVLCA) 6. (Relaxed Tightest Fragment (RTF)) 7. XRank: Exclusive LCA 66
  • 89. Improving LCA Interconnection Relationship Idea: Two different nodes with the same label correspond to different entities of the same type. Two nodes are interconnected if, on the shortest path between them, every node label occurs only once Interconnection in answer sets: star-related: a node in Si interconnected with all nodes in Si all-pairs related: all nodes in Si pair-wise interconnected 67
  • 90. Improving LCA Interconnection Relationship Idea: Two different nodes with the same label correspond to different entities of the same type. Two nodes are interconnected if, on the shortest path between them, every node label occurs only once Interconnection in answer sets: star-related: a node in Si interconnected with all nodes in Si all-pairs related: all nodes in Si pair-wise interconnected S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSearch: A Semantic Search Engine for XML. VLDB 2003 67
  • 91. Improving LCA Interconnection Relationship (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 68
  • 92. Improving LCA Interconnection Relationship (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 69
  • 93. Improving LCA Interconnection Relationship For |K| = 2, star-related ≡ all-pairs related Consider S1={3, 8, 11} 70
  • 94. Improving LCA Interconnection Relationship 3 and 8 are interconnected (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 71
  • 95. Improving LCA Interconnection Relationship ...so are 3 and 11 (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 72
  • 96. Improving LCA Interconnection Relationship ...but 8 and 11 are not (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 73
  • 97. Improving LCA Interconnection Relationship S1={3, 8, 11} is star-related but not all-pairs related False negative when using all-pairs relatedness Consider S1={3,7,9} 74
  • 98. Improving LCA Interconnection Relationship (1) bib (2) (11) article article (3) (4) (5) (10) (12) (13) (14) (19) name year authors journal title year authors journal (6) (8) (15) (17) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (9) (16) (18) name name name name John Doe Mary Smith Peter Jones Sue Robinson 75
  • 99. Improving LCA Interconnection Relationship (1) bib (2) (11) article article (3) (4) (5) (10) (12) (13) (14) (19) name year authors journal title year authors journal (6) (8) (15) (17) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (9) (16) (18) name name name name John Doe Mary Smith Peter Jones Sue Robinson 76
  • 100. Improving LCA Interconnection Relationship (1) bib (2) (11) article article (3) (4) (5) (10) (12) (13) (14) (19) name year authors journal title year authors journal (6) (8) (15) (17) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (9) (16) (18) name name name name John Doe Mary Smith Peter Jones Sue Robinson 77
  • 101. Improving LCA Interconnection Relationship S1={3,7,9} is a false negative in both measures False positives for both when node labels are different but refer to similar concepts e.g. “article” vs. “book” vs. “text” False negatives when node labels are identical but refer to different concepts e.g. the name of a person vs. the name of a journal 78
  • 102. Improving LCA XKSearch: Smallest LCA (SLCA) Idea: Enhance LCA with a minimality constraint SLCA nodes are LCAs which do not have LCA nodes among their descendants Similar to XRank/ELCA but stricter 79
  • 103. Improving LCA XKSearch: Smallest LCA (SLCA) Idea: Enhance LCA with a minimality constraint SLCA nodes are LCAs which do not have LCA nodes among their descendants Similar to XRank/ELCA but stricter Y. Xu and Y. Papakonstantinou. Efficient Keyword Search for Smallest LCAs in XML Databases. SIGMOD 2005 79
  • 104. Improving LCA XKSearch: Smallest LCA (SLCA) (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 80
  • 105. Improving LCA XKSearch: Smallest LCA (SLCA) No false positive (1) bib (2) (13) (1035) ... article article article (3) (4) (5) (12) (14) (15) (16) (23) (1038) ... ... title year authors journal title year authors journal authors (6) (9) (17) (20) (1039) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author author (7) (8) (10) (11) (18) (19) (21) (22) (1040) (1041) first last first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson Tom Smith 81
  • 106. Improving LCA XKSearch: Smallest LCA (SLCA) But of course... (1) bib (2) (13) article article (3) (4) (5) (12) (14) (15) (16) (23) title year authors journal title year authors journal (6) (9) (17) (20) ...Semantic Web... 2005 Computer Journal ...XML... 2003 Web Journal author author author author (7) (8) (10) (11) (18) (19) (21) (22) first last first last first last first last John Doe Mary Smith Peter Jones Sue Robinson 82

Editor's Notes

  1. Web search as Google or Yahoo!
  2. discovering information among the vast amount available on the Web, operate on (nearly) all the Web operate on all kinds of Web documents at the same time
  3. very simple interface (usually just a bag of words), many casual users are familiar limited to &amp;#xFB01;ltering &amp; ranking relevant documents (for human consumption) search request are often rather vaguely related to the search intent and, at best, a ranking can be provided highly parallel (and thus scalable) evaluation of Web searches requires a human in the loop to ultimately judge the relevance of a search result
  4. that&amp;#x2019;s search, but what about &amp;#x201C;querying&amp;#x201D;
  5. allow automated processing, aggregation, and deduction of data
  6. allow automated processing, aggregation, and deduction of data
  7. allow automated processing, aggregation, and deduction of data
  8. really? If you believe that I have some bank stocks
  9. database-style queries on Web (mostly XML or RDF) data operate on Web data formats such as XML and RDF, the presumptive foundation for the Semantic Web (a small set of ) documents, operate on small fraction of the Web&amp;#x2BC;s data require knowledge of the language itself as well as of the data to be queried require signi&amp;#xFB01;cant training to be employed effectively, comparable to a programming task but: allow automated processing, aggregation, and deduction of data precise selection of data items in Web documents scale not better than traditional SQL database technology, and thus are clearly unable to process signi&amp;#xFB01;cant subsets of all Web data
  10. database-style queries on Web (mostly XML or RDF) data operate on Web data formats such as XML and RDF, the presumptive foundation for the Semantic Web (a small set of ) documents, operate on small fraction of the Web&amp;#x2BC;s data require knowledge of the language itself as well as of the data to be queried require signi&amp;#xFB01;cant training to be employed effectively, comparable to a programming task but: allow automated processing, aggregation, and deduction of data precise selection of data items in Web documents scale not better than traditional SQL database technology, and thus are clearly unable to process signi&amp;#xFB01;cant subsets of all Web data
  11. database-style queries on Web (mostly XML or RDF) data operate on Web data formats such as XML and RDF, the presumptive foundation for the Semantic Web (a small set of ) documents, operate on small fraction of the Web&amp;#x2BC;s data require knowledge of the language itself as well as of the data to be queried require signi&amp;#xFB01;cant training to be employed effectively, comparable to a programming task but: allow automated processing, aggregation, and deduction of data precise selection of data items in Web documents scale not better than traditional SQL database technology, and thus are clearly unable to process signi&amp;#xFB01;cant subsets of all Web data
  12. sort by score
  13. hierarchical databases have been there before? -&gt; what&amp;#x2019;s new? declarative access/query languages
  14. XPath axes and their efficient evaluation already are and will remain one of the chief outcomes of XML reasearch
  15. trotz hoher Funktionalit&amp;#xE4;t, einfache aber m&amp;#xE4;chtig &amp; = combiniert mit
  16. example of composition is still an easy one, hard (but too long for this talk) are construction of intermediary trees
  17. Dec 2008/Jan 2009 spike in XQuery news: release of new XQuery 1.1 working draft?
  18. Dec 2008/Jan 2009 spike in XQuery news: release of new XQuery 1.1 working draft?
  19. Dec 2008/Jan 2009 spike in XQuery news: release of new XQuery 1.1 working draft?
  20. @TODO: &amp;#x201C;query/results&amp;#x201D; ?