Index Structures and Top-k Joins for Native Keyword
Search Databases
Günter Ladwig, Thanh Tran
Conference on Information and Knowledge Management (CIKM2011)


Institute of Applied Informatics and Formal Description Methods (AIFB)




KIT – University of the State of Baden-Württemberg and
National Large-scale Research Center of the Helmholtz Association        www.kit.edu
Contents

       Introduction:
               Native keyword search
               Contributions
       Index Structures
               d-length 2-Hop Cover
               Path indexes
       Keyword Query Processing
               Integrated Query Plan
               Operator Ranking
       Evaluation
       Conclusion


2    October 25th, 2011   CIKM 2011, Glasgow   Institute of Applied Informatics and Formal Description Methods (AIFB)
Keyword Search on Graph-Structured Data
       [Figure: data graph with nodes labeled “john”, “2009”, “acme”, “steve”, “mary”, “alice”]
       Example queries: “steve 2009”, “john steve alice”



       Keyword queries over structured data
       Approaches
               Query translation (based on schema exploration)
               Native keyword search (based on data graph exploration)


Native Keyword Search
       [Figure: data graph with keyword-labeled nodes and example result subgraphs for the queries “steve 2009” and “john steve alice”]


       Match keywords to elements of the data graphs
       Find structures connecting these elements (Steiner graphs)
               More expensive than query translation approaches
       Preprocess data to reduce online effort

Native Keyword Search: EASE

       Indexes at the level of r-maximal subgraphs
               Given a keyword query, find relevant subgraphs using the index
               Explore subgraphs to construct Steiner graphs
               [Figure: overlapping subgraphs of the data graph; for the query “steve 2009”, the relevant subgraphs are explored to obtain results]

               High redundancy

               Requires special operations: exploration, pruning

Native Keyword Search using Top-k Joins

       Fine-grained indexing at the level of paths
               [Figure: indexed paths between keyword elements; for the query “steve 2009”, joins combine them into the results “steve” “john” “2009” and “steve” “mary” “2009”]


       More pruning, less redundancy: less storage required
       Enables use of database query processing concepts
               Data access and top-k joins
               Keyword search is now a “traditional” query processing problem


Contributions

       We propose a new processing strategy for the keyword
       search problem based on the standard database operations
       data access and join
       For efficient data access, we extend the 2-hop cover to
       precompute and materialize neighborhoods of data
       elements, indexing the data at the level of paths
       Keyword search requires considering a large number
       of query plans: a push-based top-k join procedure ranks
       query plans during processing




INDEX STRUCTURES


d-length 2-Hop Cover

       Compact representation of connections in a graph
               Used to find paths between two nodes
               Extension of 2-Hop Cover to store only paths of length d or less
       2-Hop Cover labels all nodes u with neighborhood NBu
               If two nodes u,v are connected via paths of length d or less then
                       NBu ∩ NBv ≠ ∅
               All paths of length d or less between center nodes u and v are of
               the form
                       u ⇝ w ⇝ v, with w ∈ NBu ∩ NBv
               w is called a hop node
       Construction prunes redundant entries from
       neighborhoods to reduce size of the cover
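
The labeling idea can be sketched in a few lines. This is a minimal illustration, assuming the trivial (unpruned) cover, an unweighted undirected graph, and hop distance as the score; the toy graph and the helper names `neighborhood` and `hop_nodes` are hypothetical and not taken from the talk.

```python
from collections import deque

def neighborhood(graph, center, radius):
    """Return {node: hop distance} for all nodes within `radius` hops of `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def hop_nodes(nb_u, nb_v, d):
    """Intersect two neighborhoods; keep hop nodes w with dist(u,w) + dist(w,v) <= d."""
    return sorted(
        (nb_u[w] + nb_v[w], w)
        for w in nb_u.keys() & nb_v.keys()
        if nb_u[w] + nb_v[w] <= d
    )

# Hypothetical toy graph (adjacency lists, undirected).
graph = {
    "p4": ["o1", "p3"],
    "o1": ["p4", "p2", "l1"],
    "p3": ["p4", "p2"],
    "p2": ["o1", "p3"],
    "l1": ["o1"],
}
d = 2
nb_p4 = neighborhood(graph, "p4", d)
nb_p2 = neighborhood(graph, "p2", d)
hops = hop_nodes(nb_p4, nb_p2, d)  # (combined length, hop node) pairs
```

In this unpruned form the end points themselves also show up as hop nodes (the path from p4 to p2 "through" p4); the construction step on the slide is what would remove such redundant entries.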


Finding Paths Using Joins

        To find paths between two nodes u and v
                Retrieve neighborhoods NBu and NBv
                Intersect NBu and NBv to obtain all hop nodes
                Reconstruct paths between u and v through hop nodes

        [Figure: neighborhoods of two center nodes that intersect in shared hop nodes]



        Intersection is performed as rank join
        Rank join requires input to be sorted
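
Why sorted input matters can be seen in a small rank join sketch. This is a simplified threshold-style rank join written for illustration, not the operator from the paper. It assumes each input is a list of (hop node, score) pairs sorted by ascending score, that a hop node appears at most once per input, that the combined score is the sum, and that lower scores are better; `rank_join` and the sample entries are hypothetical.

```python
import heapq

def rank_join(left, right, k):
    """Return the k best (combined score, key) pairs without reading more input than needed."""
    seen_l, seen_r = {}, {}
    top_l = last_l = top_r = last_r = None
    candidates, results = [], []  # candidates is a min-heap of joined pairs
    i = j = 0
    while len(results) < k and (i < len(left) or j < len(right)):
        # Alternate between the inputs while both still have entries.
        take_left = j >= len(right) or (i < len(left) and i <= j)
        if take_left:
            key, score = left[i]
            i += 1
            seen_l[key] = score
            top_l = score if top_l is None else top_l
            last_l = score
            if key in seen_r:
                heapq.heappush(candidates, (score + seen_r[key], key))
        else:
            key, score = right[j]
            j += 1
            seen_r[key] = score
            top_r = score if top_r is None else top_r
            last_r = score
            if key in seen_l:
                heapq.heappush(candidates, (seen_l[key] + score, key))
        if not candidates:
            continue
        # Because inputs are sorted, any result that still needs an unread
        # entry cannot score better than these bounds.
        bound_l = last_l + top_r if i < len(left) else float("inf")
        bound_r = top_l + last_r if j < len(right) else float("inf")
        threshold = min(bound_l, bound_r)
        while candidates and len(results) < k and candidates[0][0] <= threshold:
            results.append(heapq.heappop(candidates))
    return results

nb_u = [("w1", 1.0), ("w2", 2.0), ("w3", 2.0)]
nb_v = [("w3", 1.0), ("w1", 3.0), ("w4", 3.0)]
top1 = rank_join(nb_u, nb_v, 1)
top2 = rank_join(nb_u, nb_v, 2)
```

With k = 1 the join stops before the last entry of `nb_v` is read, which is the early termination that unsorted input would not allow.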
Index Storage

        Pruned neighborhoods are stored as path entries
        Path entry (w,s) for each hop node w in NBu

        Path entry index maps each node to its path entries (sorted)
        Path index
                Stores paths for all center nodes and their path entries
                Used to reconstruct paths

        Node    Path Entries
        u1      (w1, 1.0), (w2, 2.0), (w3, 2.0)
        u2      (w5, 1.0)
        …
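
The two indexes can be pictured as plain dictionaries. The sketch below is an in-memory stand-in, assuming scores are path lengths and that entries are sorted by ascending score; the function name, the dictionary layout, and the sample paths are assumptions for illustration, not the storage format used by the authors.

```python
def build_path_entry_index(neighborhoods):
    """Map each center node to its path entries (hop node, score), sorted by score."""
    return {
        center: sorted(entries.items(), key=lambda e: (e[1], e[0]))
        for center, entries in neighborhoods.items()
    }

# Pruned neighborhoods as {center: {hop node: score}}, using the values from the slide.
neighborhoods = {
    "u1": {"w3": 2.0, "w1": 1.0, "w2": 2.0},
    "u2": {"w5": 1.0},
}
path_entry_index = build_path_entry_index(neighborhoods)

# Path index: (center, hop node) -> concrete paths, used only for reconstruction.
# The paths listed here are made up for the example.
path_index = {
    ("u1", "w1"): [["u1", "w1"]],
    ("u1", "w2"): [["u1", "w1", "w2"]],
}
paths_u1_w2 = path_index.get(("u1", "w2"), [])
```

Keeping the sorted entries separate from the full paths means a join only touches the small (hop node, score) lists and looks up concrete paths for the results it actually returns.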




KEYWORD QUERY PROCESSING

Keyword Query Processing

        Use joins to find connections between matching elements
        for all keywords
        Base inputs: keyword neighborhood for each keyword
                Union of matching elements’ neighborhoods
        Process
                Data access to retrieve keyword
                neighborhoods
                Joins to connect keyword matching
                elements

        [Figure: query plan with base inputs steve, john, alice]
        Are all possible plans valid?
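
The base input of a plan, the keyword neighborhood, can be sketched as a union over the path entries of all matching elements. The index layouts and the function name below are assumptions made for this example; each entry keeps its center node so that a later join still knows which matching element it came from.

```python
def keyword_neighborhood(keyword, keyword_index, path_entry_index):
    """Union the path entries of all elements matching `keyword`.

    Returns (score, hop node, center) tuples sorted by ascending score.
    """
    entries = [
        (score, hop, center)
        for center in keyword_index.get(keyword, ())
        for hop, score in path_entry_index.get(center, ())
    ]
    entries.sort()
    return entries

# Hypothetical indexes: keyword -> matching elements, element -> path entries.
keyword_index = {"steve": ["p1", "p3"]}
path_entry_index = {
    "p1": [("o1", 1.0), ("p2", 2.0)],
    "p3": [("o1", 1.0), ("p4", 1.0)],
}
nb_steve = keyword_neighborhood("steve", keyword_index, path_entry_index)
```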



Query Plans

        [Figure: data graph with “john”, “steve”, “alice” and d=2; the plan joining alice, john, steve in this order produces no results]


        Join order matters
                No single join order delivers all results (some might even be empty)
                We do not know in advance which orders deliver results
        Consider all possible join orders

Integrated Query Plan

        Join operators in all query plans:
        Query plans for different join orders overlap
                Share as many operators as possible
        Join operators with sharing:




                      |K|         N’(K)          N(|K|, K)
                       2          2              1
                       3          12             6
                       4          72             24
                       5          480            100
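
The N’(K) column can be reproduced under one assumed reading: every permutation of the keywords is a separate left-deep plan, and each plan contributes |K| - 1 join operators. That reading is an assumption of this sketch, chosen because it matches the table; the formula for the shared count N(|K|, K) is not reproduced in the slide text and is not attempted here.

```python
from itertools import permutations

def left_deep_operators(keywords):
    """List the join operators of all left-deep plans, one plan per join order.

    Each operator is (inputs joined so far, next keyword input).
    """
    operators = []
    for order in permutations(keywords):
        for i in range(1, len(order)):
            operators.append((order[:i], order[i]))
    return operators

# Operator counts without sharing for |K| = 2..5.
counts = [len(left_deep_operators(range(n))) for n in range(2, 6)]
```

Many of the listed operators join the same inputs in a different plan, which is the overlap that operator sharing exploits.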

Top-k Keyword-Join Processing

        High number of operators
        Terminate early after computing top-k instead of all results
                Rank join operators
                Top-k union operator
        Integrated Query Plan is a composition of many sub-plans
        Some sub-plans might produce no results
                Pull-based operators will block until result can be produced
                Use push-based operators: execution driven by inputs instead of
                results
        Some sub-plans might produce results earlier than others
                Rank not only results, but also rank operators



Operator Ranking

        Prefer operators that have “promising” results
        Global score of rank join operator, based on current results
        and upper bounds for subsequent join operations
                R: intermediate results
                NBK: keyword neighborhoods not yet covered
                Global score defined as

                Join operators have a global score when they have results ready
        Only the operator with the highest global score can push
        results to subsequent operators
        Otherwise, lower level data access operators are activated
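
The scheduling rule in the last two bullets can be sketched as follows. The definition of the global score is not reproduced in the slide text, so the scoring function is passed in as a parameter; `stand_in_global_score`, the `remaining_bound` field, and the operator names are placeholders invented for this example and should not be read as the authors' formula.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class JoinOperator:
    name: str
    ready: List[Tuple[float, str]] = field(default_factory=list)  # buffered (score, result)
    remaining_bound: float = 0.0  # hypothetical bound for neighborhoods not yet covered

def stand_in_global_score(op: JoinOperator) -> float:
    # Placeholder only: best buffered score plus the assumed bound.
    return max(score for score, _ in op.ready) + op.remaining_bound

def next_action(operators, global_score=stand_in_global_score):
    """Pick the operator allowed to push, or fall back to data access."""
    with_results = [op for op in operators if op.ready]
    if not with_results:
        # No operator has a global score yet: read more input instead.
        return ("activate_data_access", None)
    best = max(with_results, key=global_score)
    return ("push", best.name)

ops = [
    JoinOperator("steve+john", ready=[(0.5, "r1")], remaining_bound=0.2),
    JoinOperator("john+alice", ready=[(0.4, "r2")], remaining_bound=0.5),
    JoinOperator("steve+alice"),
]
action = next_action(ops)
idle_action = next_action([JoinOperator("steve+alice")])
```

Operators without buffered results never block the others here, which is the point of driving execution by inputs rather than by result requests.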


EVALUATION


Evaluation

        Four approaches
                EASE: indexing at the level of graphs
                KJ: keyword join approach
                KJU: keyword join approach without operator ranking
        Datasets
                BTC: 10M triples
                DBLP1/5/10: 1M, 5M, 10M triples (from SP2Bench)
        9 keyword queries for each dataset

        Reduction of index storage size
                50% (DBLP1) – 79% (DBLP10)



Results




        KJ, KJU outperform EASE
        Operator ranking is beneficial

Results




        Benefit of operator ranking more pronounced for larger
        queries as these need more join operators

Conclusion

        Native keyword search based on data access and join
        d-length 2-Hop Cover
                Index at the level of paths, instead of graphs
        Top-k Keyword Join
                Exploration transformed into series of join operators
                Operator ranking
        Reduces storage requirement and increases performance




Thank you for your attention! Questions?



         Günter Ladwig, guenter.ladwig@kit.edu


BACKUP SLIDES


Introduction

        Keyword search on graph-structured data (RDF)
        Query Translation
                Translate keywords into structured query using schema knowledge
        Native Keyword Search
                No translation
                Match keywords to elements of the data graphs
                Find structures connecting these elements (Steiner graphs)
                More expensive than query translation approaches


        Preprocess data and create special indexes
                Reduces search space during online query processing
                Requires offline preprocessing and storage


Example

        Query: “alice malta peter”
        [Figure: data graph with nodes p1 to p4, o1, o2, l1, the labels Alice, Peter, Mary, Richard, ABC Corp, Malta, and the edges knows, worksAt, locatedIn]

        Match keyword elements
        Find connections between keyword elements




Problem Definition

        Given a graph GE=(NE,ER)
        Find Steiner graphs connecting keyword elements




Scoring

        Assumption: more compact Steiner graphs are more
        relevant
        Scoring function
                GS: Steiner graph
                P: set of paths connecting its keyword elements



        Other functions possible, but not part of this work




Approaches

        Bidirectional Search
                Explore graph from keyword elements to find connections
                Does not scale well
        EASE
                Indexes neighborhood graphs to restrict search space for
                exploration
        Our approach
                Use database operations: data access and join
                Transform graph exploration into a series of join operations
                Improves storage requirements and performance




d-Length 2-Hop Cover

        Preliminaries




        Compact representation of connections in a graph
                Used to find paths between two nodes in a graph




Construction

        The trivial d-length 2-hop cover is the set of all
        d-neighborhoods of GE, but it contains redundancies
        Finding a minimal 2-hop cover is NP-hard (Minimum Set
        Cover)
        Approximation algorithm
                Select a “best” node covering a large number of paths
                Use its neighborhood to prune redundant paths from all other
                neighborhoods
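
The flavor of the approximation can be illustrated with the classic greedy heuristic for Minimum Set Cover. This is the textbook heuristic, shown as an analogy under the assumption that each candidate hop node "covers" a set of node pairs; it is not the construction algorithm from the paper, and the toy data is hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Repeatedly pick the candidate that covers the most still-uncovered items.

    candidates: {name: set of items it covers}. Returns names in pick order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda n: len(candidates[n] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # the remaining items cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen

# Items are node pairs whose connecting paths must stay recoverable.
pairs = {("p4", "p2"), ("p4", "l1"), ("p2", "l1"), ("p4", "p3")}
covers = {
    "o1": {("p4", "p2"), ("p4", "l1"), ("p2", "l1")},
    "p3": {("p4", "p2"), ("p4", "p3")},
    "p4": {("p4", "p3")},
}
picked = greedy_set_cover(pairs, covers)
```

Each pick plays the role of the “best” node on the slide: once it is chosen, whatever it already covers no longer needs to be stored elsewhere.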




Example: Pruning
        [Figure: pruning example with d=2; neighborhoods of center nodes over p1 to p4, o1, o2, l1, l2 with the edges knows, worksAt, locatedIn; redundant paths through a hop node are pruned]


        Pruned paths between two nodes can be reconstructed by
        intersecting their neighborhoods
        Store each pruned neighborhood as a list of path entries


Neighborhood Join

        [Figure: neighborhoods of the center nodes p4 and p2 over the nodes l1, o1, p3, o3, l2, with hop nodes and center nodes marked]

      Result: Keyword Graphs
              p4   o1   p2    (stands for all paths of length d between p4 and p2 through o1)
              p4   p3   p2
              ...

Graph Join

        Expand keyword graphs to keyword graph neighborhoods

     Keyword Graph
             p4   o1   p2
     Keyword Graph Neighborhood
             p4   o1   p2   o3
             p4   o1   p2   l2
             l1   p4   o1   p2
             ...

        Graph Join: joins keyword graph neighborhood with
        keyword neighborhood


Integrated Query Plan

        Number of join operators without operator sharing

        Number of join operators with operator sharing




                                                        |K|              N’(K)                  N(|K|, K)
                                                          2              2                      1
                                                          3              12                     6
                                                          4              72                     24
                                                          5              480                    100



Más contenido relacionado

Destacado

поляризация диэлектриков
поляризация диэлектриковполяризация диэлектриков
поляризация диэлектриков
AndronovaAnna
 
Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012 Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012
Thanh Tran
 
Lifecycle support in architectures for ontology-based information systems - iswc
Lifecycle support in architectures for ontology-based information systems - iswcLifecycle support in architectures for ontology-based information systems - iswc
Lifecycle support in architectures for ontology-based information systems - iswc
Thanh Tran
 

Destacado (10)

KOIOS: Utilizing Semantic Search for Easy-Access and Visualization of Structu...
KOIOS: Utilizing Semantic Search for Easy-Access and Visualization of Structu...KOIOS: Utilizing Semantic Search for Easy-Access and Visualization of Structu...
KOIOS: Utilizing Semantic Search for Easy-Access and Visualization of Structu...
 
Graphinder semantic search
Graphinder semantic searchGraphinder semantic search
Graphinder semantic search
 
Searching Linked Data
Searching Linked DataSearching Linked Data
Searching Linked Data
 
поляризация диэлектриков
поляризация диэлектриковполяризация диэлектриков
поляризация диэлектриков
 
Query Processing Using Structure Index for RDF Data on the Web
Query Processing Using Structure Index for RDF Data on the WebQuery Processing Using Structure Index for RDF Data on the Web
Query Processing Using Structure Index for RDF Data on the Web
 
Semantic Web Search - Searching Documents and Semantic Data on the Web
Semantic Web Search - Searching Documents and Semantic Data on the WebSemantic Web Search - Searching Documents and Semantic Data on the Web
Semantic Web Search - Searching Documents and Semantic Data on the Web
 
Big data search
Big data search Big data search
Big data search
 
Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012 Semantic Search Tutorial at SemTech 2012
Semantic Search Tutorial at SemTech 2012
 
Гастро-тур в Италию
Гастро-тур в ИталиюГастро-тур в Италию
Гастро-тур в Италию
 
Lifecycle support in architectures for ontology-based information systems - iswc
Lifecycle support in architectures for ontology-based information systems - iswcLifecycle support in architectures for ontology-based information systems - iswc
Lifecycle support in architectures for ontology-based information systems - iswc
 

Index Structures and Top-k Joins for Native Keyword Search Databases

  • 1. Index Structures and Top-k Joins for Native Keyword Search Databases Günter Ladwig, Thanh Tran Conference on Information and Knowledge Management (CIKM2011) Institute of Applied Informatics and Formal Description Methods (AIFB) KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association www.kit.edu
  • 2. Contents Introduction: Native keyword search Contributions Index Structures d-length 2-Hop Cover Path indexes Keyword Query Processing Integrated Query Plan Operator Ranking Evaluation Conclusion 2 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 3. Keyword Search on Graph-Structured Data “john” “2009” “acme” Queries “steve” “mary” “steve 2009” “john steve alice” “2009” “2009” “alice” Keyword queries over structured data Approaches Query translation (based on schema exploration) Native keyword search (based on data graph exploration) 3 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 4. Native Keyword Search “john” Queries “2009” “steve 2009” “acme” “john steve alice” “john” “2009” “steve” “mary” “steve” “2009” “mary” “2009” “steve” “alice” “2009” Match keywords to elements of the data graphs Find structures connecting these elements (Steiner graphs) More expensive than query translation approaches Preprocess data to reduce online effort 4 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 5. Native Keyword Search: EASE Indexes at the level of r-maximal subgraphs Given keyword query find relevant subgraphs using index Explore subgraphs to construct Steiner graphs “john” “john” “2009” “2009” “john” “acme” Query “steve 2009” “mary” “steve” “steve” “mary” “steve” Exploration “steve” “2009” “2009” “mary” “alice” “2009” “alice” High redundancy “2009” Requires special operations: exploration, pruning 5 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 6. Native Keyword Search using Top-k Joins Fine-grained indexing at the level of paths “john” “john” “steve” “2009” “steve” “mary” Query “steve” “john” “2009” “steve 2009” Joins “mary” “2009” “steve” “steve” “mary” “2009” More pruning, less redundancy: less storage required Enables use of database query processing concepts Data access and top-k joins Keyword search is now a “traditional” query processing problem 6 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 7. Contributions We propose a new processing strategy for the keyword search problem based on standard database operations data access and join For efficient data access we extend the 2-hop cover to pre- compute and materialize neighborhoods of data elements, indexing the data at the level of paths Keyword search requires consideration of a large number of query plans: push-based top-k join procedure ranks query plans during processing 7 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 8. INDEX STRUCTURES 8 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 9. d-length 2-Hop Cover Compact representation of connections in a graph Used to find paths between two nodes Extension of 2-Hop Cover to store only paths of length d or less 2-Hop Cover labels all nodes u with neighborhood NBu If two nodes u,v are connected via paths of length d or less then All paths of length d or less between center nodes u and v are of the form w is called a hop node Construction prunes redundant entries from neighborhoods to reduce size of the cover 9 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 10. Finding Paths Using Joins To find paths between two nodes u and v Retrieve neighborhoods NBu and NBv Intersect NBuand NBv to obtain all hop nodes Reconstruct paths between u and v through hop nodes “steve” “steve” hop node “2009” “2009” “mary” “john” “mary” center node “alice” “acme” Intersection is performed as rank join Rank join requires input to be sorted 10 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 11. Index Storage Pruned neighborhoods are stored as path entries Path entry (w,s) for each hop node w in NBu Path entry index maps nodes to its Node Path Entries path entries (sorted) (w1, 1.0) u1 (w2, 2.0) (w3, 2.0) Path index u2 (w5, 1.0) Stores paths for all center nodes and … their path entries Used to reconstruct paths 11 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 12. KEYWORD QUERY PROCESSING 12 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 13. Keyword Query Processing Use joins to find connections between matching elements for all keywords Base inputs: keyword neighborhood for each keyword Union of matching elements’ neighborhoods Process Data access to retrieve keyword neighborhoods Joins to connect keyword matching elements steve john alice Are all possible plans valid? 13 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 14. Query Plans “john” No results! d=2 “steve” alice john steve “alice” Join order matters No single join order delivers all results (some might even be empty) We do not know in advance which orders deliver results Consider all possible join orders 14 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 15. Integrated Query Plan Join operators in all query plans: Query plans for different join orders overlap Share as many operators as possible Join operators with sharing: |K| N’(K) N(|K|, K) 2 2 1 3 12 6 4 72 24 5 480 100 15 October 25th, 2011 CIKM 2011, Glasgow Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 16. Top-k Keyword-Join Processing. High number of operators: terminate early after computing the top-k instead of all results (rank join operators, top-k union operator). The Integrated Query Plan is a composition of many sub-plans. Some sub-plans might produce no results, and pull-based operators will block until a result can be produced, so use push-based operators: execution is driven by inputs instead of results. Some sub-plans might produce results earlier than others: rank not only results, but also operators.
  • 17. Operator Ranking. Prefer operators that have “promising” results. Each rank join operator has a global score, based on its current results and upper bounds for subsequent join operations; it is defined over R (the intermediate results) and NBK (the keyword neighborhoods not yet covered). Join operators have a global score only when they have results ready. Only the operator with the highest global score can push results to subsequent operators; otherwise, lower-level data access operators are activated.
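The scheduling rule can be sketched in Python as follows. The global score formula itself is not reproduced here, so the sketch treats each buffered result's global score as a given number; the class, field, and action names are hypothetical and not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class JoinOperator:
    name: str
    # Hypothetical: precomputed global scores of results ready to be pushed.
    ready_scores: List[float] = field(default_factory=list)

    def global_score(self) -> Optional[float]:
        """An operator has a global score only when it has results ready."""
        return max(self.ready_scores) if self.ready_scores else None


def next_action(operators: List[JoinOperator]) -> Tuple[str, Optional[str]]:
    """Pick the operator allowed to push, or fall back to data access."""
    scored = [(op.global_score(), op.name) for op in operators
              if op.global_score() is not None]
    if not scored:
        # No join operator has results ready: activate lower-level data access.
        return ("activate_data_access", None)
    _, name = max(scored)
    return ("push", name)


ops = [
    JoinOperator("steve-john", [0.4]),
    JoinOperator("john-alice", [0.7, 0.2]),
    JoinOperator("steve-alice"),
]
print(next_action(ops))
```

A real scheduler would recompute scores after every push, since each push changes the intermediate results and the remaining upper bounds.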
  • 18. EVALUATION
  • 19. Evaluation. Four approaches. EASE: indexing at the level of graphs. KJ: keyword join approach. KJU: keyword join approach without operator ranking. Datasets: BTC (10M triples); DBLP1/5/10 (1M, 5M, 10M triples, from SP2Bench). 9 keyword queries for each dataset. Reduction of index storage size: 50% (DBLP1) to 79% (DBLP10).
  • 20. Results. KJ and KJU outperform EASE. Operator ranking is beneficial.
  • 21. Results. The benefit of operator ranking is more pronounced for larger queries, as these need more join operators.
  • 22. Conclusion. Native keyword search based on data access and join. d-length 2-Hop Cover: index at the level of paths, instead of graphs. Top-k Keyword Join: exploration transformed into a series of join operators; operator ranking. Reduces storage requirements and increases performance.
  • 23. Thank you for your attention! Questions? Günter Ladwig, guenter.ladwig@kit.edu
  • 24. BACKUP SLIDES
  • 25. Introduction. Keyword search on graph-structured data (RDF). Query translation: translate keywords into a structured query using schema knowledge. Native keyword search: no translation; match keywords to elements of the data graph and find structures connecting these elements (Steiner graphs); more expensive than query translation approaches. Preprocess data and create special indexes: reduces the search space during online query processing, but requires offline preprocessing and storage.
  • 26. Example. Query: “alice malta peter”. [Figure: data graph with location nodes labeled Malta, organization nodes o1 and o2 labeled ABC Corp, and person nodes p1 to p4 (named Alice, Richard, Peter, Mary), connected by locatedIn, worksAt, and knows edges.] Match keyword elements. Find connections between keyword elements.
  • 27. Problem Definition. Given a graph GE=(NE,ER), find Steiner graphs connecting keyword elements.
  • 28. Scoring. Assumption: more compact Steiner graphs are more relevant. The scoring function is defined over GS (a Steiner graph) and P (the set of paths connecting its keyword elements). Other functions are possible, but not part of this work.
  • 29. Approaches. Bidirectional Search: explore the graph from keyword elements to find connections; does not scale well. EASE: indexes neighborhood graphs to restrict the search space for exploration. Our approach: use database operations (data access and join); transform graph exploration into a series of join operations; improves storage requirements and performance.
  • 30. d-Length 2-Hop Cover. Preliminaries. Compact representation of connections in a graph. Used to find paths between two nodes in a graph.
  • 31. Construction. The trivial d-length 2-hop cover is the set of all d-neighborhoods of GE, but it contains redundancies. Finding a minimal 2-hop cover is NP-hard (Minimum Set Cover). Approximation algorithm: select a “best” node covering a large number of paths; use its neighborhood to prune redundant paths from all other neighborhoods.
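The select-then-prune idea resembles the classic greedy heuristic for set cover, which the following Python sketch applies to a small set of paths. It is an illustration under stated assumptions, not the authors' algorithm: paths are given explicitly as node tuples, a node "covers" every path it lies on, and ties are broken by node name. The function name and sample paths are hypothetical.

```python
def greedy_hop_selection(paths):
    """Greedy set-cover heuristic: repeatedly pick the node on most paths.

    paths: iterable of node tuples.
    Returns a dict mapping each chosen node to the set of paths it covers;
    paths covered by an earlier choice are pruned from later ones.
    """
    uncovered = set(paths)
    cover = {}
    while uncovered:
        counts = {}
        for path in uncovered:
            for node in path:
                counts[node] = counts.get(node, 0) + 1
        # "Best" node: most uncovered paths, ties broken alphabetically.
        best = min(counts, key=lambda n: (-counts[n], n))
        covered = {p for p in uncovered if best in p}
        cover[best] = covered
        uncovered -= covered  # prune paths now represented by `best`
    return cover


paths = [
    ("p4", "o1", "p2"),
    ("p4", "o1", "p1"),
    ("p2", "o1", "p1"),
    ("p2", "p3"),
]
cover = greedy_hop_selection(paths)
print(list(cover))
```

In this example o1 is selected first because it lies on three of the four paths, leaving only the direct p2 to p3 path for a second node.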
  • 32. Example: Pruning. [Figure: pruning with d=2 over persons p1 to p4, organizations o1 and o2, and locations l1 and l2, connected by knows, worksAt, and locatedIn edges; center node and hop node labeled.] Pruned paths between two nodes can be reconstructed by intersecting their neighborhoods. Store each pruned neighborhood as a list of path entries.
  • 33. Neighborhood Join. [Figure: neighborhoods of center nodes joined on shared hop nodes.] Result: keyword graphs, for example p4 o1 p2, which stands for all paths of length d between p4 and p2 through o1; p4 p3 p2; ...
  • 34. Graph Join. Expand keyword graphs to keyword graph neighborhoods. [Figure: keyword graph p4 o1 p2 expanded into its keyword graph neighborhood, with entries for o3, l2, l1, ...] Graph Join: joins a keyword graph neighborhood with a keyword neighborhood.
  • 35. Integrated Query Plan. Number of join operators without operator sharing, N’(K), versus with operator sharing, N(|K|, K): |K| = 2: 2 vs. 1; |K| = 3: 12 vs. 6; |K| = 4: 72 vs. 24; |K| = 5: 480 vs. 100.