Rya: Accumulo Indexing Strategies for Searching Semantic Networks
Dr. Caleb Meier, Puja Valiyil, David Lotts, Aaron Mihalik,
Dr. Adina Crainiceanu
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited.
ONR Case Number 43-2117-16 EXIM APPROVED Parsons #459 8 OCT 16
Acknowledgements
• The work presented herein was funded by the Office of Naval Research (ONR) and the National Geospatial-Intelligence Agency (NGA) under contract # N00014-12-C-0365
• This presentation was sponsored by Parsons
• This work is the collective effort of:
  - Parsons’ Rya Team: Puja Valiyil, Aaron Mihalik, Caleb Meier, David Lotts, Jennifer Brown
  - Rya Founders: Roshan Punnoose, Adina Crainiceanu, and David Rapp
Agenda
• Rya Background
• Materialized Views in Rya
• Entity-Centric Index
Rya and RDF
Background
• Apache Rya (incubating): Resource Description Framework (RDF) triplestore built on top of Accumulo or MongoDB
• RDF: W3C standard for representing linked/graph data
  - Represents data as statements (assertions) about resources
  - Serialized as triples in {subject, predicate, object} form
  - Example:
    {Caleb, worksAt, Parsons}
    {Caleb, livesIn, Virginia}
[Figure: graph in which the node Caleb is linked to Parsons by a worksAt edge and to Virginia by a livesIn edge]
SPARQL
Background
• RDF queries are described using SPARQL
  - SPARQL Protocol and RDF Query Language
• SQL-like syntax for finding triples matching specific patterns
  - Look for subgraphs that match triple statement patterns
  - Joins are performed when there are variables common to two or more statement patterns
• Example:
SELECT ?people WHERE {
  ?people <worksAt> <Parsons>.
  ?people <livesIn> <Virginia>.
}
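The way shared variables turn statement patterns into joins can be illustrated with a minimal Python sketch. It is not Rya or RDF4J code: the in-memory triple set, the Greta rows, and the helper names `match_pattern` and `evaluate` are assumptions made for illustration only.

```python
# Toy in-memory triple set; the Greta rows are illustrative additions.
TRIPLES = {
    ("Caleb", "worksAt", "Parsons"),
    ("Caleb", "livesIn", "Virginia"),
    ("Greta", "worksAt", "Parsons"),
    ("Greta", "livesIn", "Maryland"),
}


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable bound by an earlier pattern must keep its value:
            # this is what makes a shared variable act as a join key.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None  # constant does not match
    return new_binding


def evaluate(patterns, triples):
    """Evaluate a conjunction of statement patterns against the triples."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


query = [("?people", "worksAt", "Parsons"), ("?people", "livesIn", "Virginia")]
results = evaluate(query, TRIPLES)
print(results)
```

Only Caleb satisfies both patterns, because `?people` has to take the same value in each of them.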
Rya Architecture
Background
• RDF4J* interface for interacting with RDF data stored on Accumulo
  - RDF4J: open source Java framework for storing and querying RDF data
  - RDF4J provides several interfaces/abstractions central for interacting with an RDF datastore
  - SAIL interface for interacting with the underlying persisted RDF model
  - SAIL: Storage And Inference Layer
[Figure labels: SPARQL; Rya and RDF4J; Rya QueryPlanner (query processing in SAIL layer); Accumulo (data storage layer)]
*The RDF4J.org project was previously named “Open RDF”, then “Sesame”.
Storage: Triple Table Index
Accumulo Composite Index and Table Design
• 3 tables
  - SPO: subject, predicate, object
  - POS: predicate, object, subject
  - OSP: object, subject, predicate
• Store triples in the Row ID of the table
• Store graph name (context) in the Column Family
• Advantages:
  - Native lexicographical sorting of row keys -> fast range queries
  - All patterns can be translated into a scan of one of these tables
Key for SPO Table
Row ID | Column Family | Column Qualifier | Column Visibility | Timestamp
subject, predicate, object, type | graph name | (not used) | visibility | timestamp
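The claim that every statement pattern maps to a scan of one of the three tables can be sketched as follows. This is a hedged Python illustration, not Rya's implementation: the function name `choose_table` and the `\x00` delimiter between row components are assumptions.

```python
DELIM = "\x00"  # assumed separator between row components


def choose_table(subject=None, predicate=None, obj=None):
    """Pick the table whose key order puts the bound terms first.

    Returns (table name, row prefix to scan). Unbound terms are None.
    """
    if subject is not None and obj is not None and predicate is None:
        # Subject and object bound, predicate free: OSP gives the longest prefix.
        table, terms = "OSP", [obj, subject]
    elif subject is not None:
        table, terms = "SPO", [subject, predicate, obj]
    elif predicate is not None:
        table, terms = "POS", [predicate, obj, subject]
    elif obj is not None:
        table, terms = "OSP", [obj, subject, predicate]
    else:
        table, terms = "SPO", []  # nothing bound: full table scan

    # The scan prefix is the leading run of bound terms in key order.
    prefix = []
    for term in terms:
        if term is None:
            break
        prefix.append(term)
    return table, DELIM.join(prefix)


print(choose_table(predicate="worksAt"))
print(choose_table(subject="Greta", predicate="livesIn"))
print(choose_table(subject="Greta", obj="Virginia"))
```

Because rows are sorted lexicographically, each returned prefix corresponds to one contiguous range scan.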
Anatomy of a Rya SPARQL Query
Query Planning
SELECT ?x ?y WHERE {
  ?x <worksAt> ?y.
  ?x <livesIn> <Virginia>.
  ?x <commuteMethod> <bike>.
}
Step 1: POS table, scan range for worksAt (?x worksAt ?y)
  Predicate, Object, Subject
  …
  studiesAt, Georgetown, Joe
  talksTo, Joe, Bob
  worksAt, Netflix, Bob
  worksAt, Parsons, Greta
  worksAt, PlayStation, John
Step 2: For each ?x, SPO table lookup (?x livesIn Virginia)
  Subject, Predicate, Object
  …
  Bob, livesIn, California
  …
  Greta, livesIn, Virginia
  …
  John, livesIn, Virginia
  …
Step 3: For each ?x, SPO table lookup (?x commuteMethod bike)
  Subject, Predicate, Object
  …
  Greta, commuteMethod, bike
  …
  John, commuteMethod, Bus
  …
Costly Joins of Accumulo Scans
Query Evaluation
• Out of the box, Rya SPARQL query evaluation requires joining Accumulo scan results client side
• Joins are performed using a hybrid of nested loop and hash joins
  - Range prefixes for scans are determined by results of the previous scan
    - Sorted nested loop
  - Results of a scan are joined with any values not used to form the range prefix
    - Hash joins
• BatchScanner boosts performance
  - Evaluates results in batches
  - Client-side join evaluation is still a bottleneck
  - Especially for queries with:
    - Large intermediate join results
    - Large number of joins
Indexing Strategies for Dealing With Costly Joins
Query Evaluation
• Materialized Views in Rya: index frequently issued queries or frequent subgraphs
  1. Pre-compute a portion of the query
  2. Store SPARQL describing the query along with pre-computed values in Accumulo
  3. Normalize query variables to match stored SPARQL variables during query execution
• Entity-Centric Index: add new indices to eliminate the need for hash joins
  1. Apply Document Partitioned Indexing to graph data
     - Design tables so that all properties for each entity appear on a single tablet
  2. Find entities with intersecting properties
     - Use a variation of an Intersecting Iterator
     - Perform merge joins on the server
• Additional indexing strategies could eliminate the need for client-side joins
  - Clear trade-off between query performance and memory footprint
  - Mitigate “data plume”
Determining which Queries to Cache
Materialized Views in Rya
• Determine which query results to cache given that capacity is limited
• Caching strategies
  - Preference: manually submit queries to cache
  - Recent use
  - Complexity: number of joins, estimates of intermediate results
  - Relevance
    - Relevance to data (frequent subgraphs)
    - Relevance to users (query logs)
• A comprehensive strategy should use all of the criteria above
• Terms: Materialized Views and Precomputed Joins
  - A Materialized View is a cache of query results
  - Precomputed Joins in Rya are Materialized Views for SPARQL queries consisting of joins and filters
Query Planning: Anatomy of a Rya SPARQL Query using a Precomputed Join
Materialized Views in Rya
• During query planning, statement patterns in a query are grouped according to common variables and common constants
• Groups can be precomputed and cached, trading increased storage for faster response times
Query:
SELECT ?x ?y WHERE { ?y :livesIn :DC. ?y :talksTo ?x.
  ?x :livesIn :Arlington. ?x :commutesBy :Bike. }
Pre-Computed Join Index table:
SELECT ?a ?b WHERE { ?a :livesIn :DC. ?a :talksTo ?b. }
  ?a=Joe, ?b=Caleb
  ?a=Mike, ?b=Dave
  …
  ?a=Rob, ?b=Aaron
[Figure: query plan built from Join nodes over the statement patterns ?y :livesIn :DC, ?y :talksTo ?x, ?x :livesIn :Arlington, and ?x :commutesBy :Bike, with a Pre-Computed Join Index Node backed by the table above]
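One way to decide whether a stored precomputed join applies to a query is to look for a renaming of the stored variables that embeds its statement patterns in the query. The Python sketch below shows that idea with a small backtracking search; it is an assumption-laden illustration (the function `match_pcj` is hypothetical) and not the matching procedure Rya actually uses.

```python
def is_var(term):
    return term.startswith("?")


def match_pcj(pcj_patterns, query_patterns):
    """Return a mapping from PCJ variables to query variables that embeds
    every PCJ pattern in the query, or None if no such mapping exists."""

    def extend(mapping, pcj_pat, query_pat):
        new_mapping = dict(mapping)
        for p, q in zip(pcj_pat, query_pat):
            if is_var(p):
                if not is_var(q):
                    return None  # a stored variable may not absorb a constant
                if new_mapping.setdefault(p, q) != q:
                    return None  # inconsistent renaming
            elif p != q:
                return None      # constants must agree exactly
        # Two PCJ variables must not collapse onto one query variable.
        if len(set(new_mapping.values())) != len(new_mapping):
            return None
        return new_mapping

    def search(index, mapping, used):
        if index == len(pcj_patterns):
            return mapping
        for i, query_pat in enumerate(query_patterns):
            if i in used:
                continue
            candidate = extend(mapping, pcj_patterns[index], query_pat)
            if candidate is not None:
                found = search(index + 1, candidate, used | {i})
                if found is not None:
                    return found
        return None

    return search(0, {}, frozenset())


query = [
    ("?y", ":livesIn", ":DC"),
    ("?y", ":talksTo", "?x"),
    ("?x", ":livesIn", ":Arlington"),
    ("?x", ":commutesBy", ":Bike"),
]
pcj = [("?a", ":livesIn", ":DC"), ("?a", ":talksTo", "?b")]
mapping = match_pcj(pcj, query)
print(mapping)
```

With the mapping in hand, the two matched statement patterns can be replaced by a lookup in the precomputed table, leaving only the remaining patterns to be joined at query time.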
Batch Updates
Strategies for Maintaining Materialized Views
• Batch update
  - Re-compute precomputed join tables periodically using MapReduce
  - Benefit:
    - Minimizes data plume
  - Drawbacks:
    - High likelihood of stale data
    - Query plans that use precomputed joins may return inaccurate results
• Incrementally update tables as triples are ingested
  - Use an observer framework to update intermediate results as new triples are ingested
  - Benefits:
    - No staleness in query results
    - Query plans that use precomputed joins are more accurate
    - Significantly less latency for updates
  - Drawbacks:
    - Data plume
    - Intermediate query results have to be stored to incrementally update results
    - Observer framework increases the complexity of the system
Incremental Transactions Using Apache Fluo (incubating)
Fluo Background
• Address the maintenance problem by incrementally updating cached results using Fluo
  - Fluo was created for Accumulo based on the Percolator paper¹ by Google Inc.
  - Fluo provides additional features that Accumulo does not:
    - Multi-row transactions prevent write-write conflicts
    - Observer framework (next slide)
• Use cases
  - Maintain large scale computation using a series of small transaction updates
  - Join existing large data cache with new data
    - Formerly done by periodic batch processing jobs recreating the data cache
1. Daniel Peng, Frank Dabek. USENIX. 2010. Large-scale Incremental Processing Using Distributed Transactions and Notifications. http://research.google.com/pubs/pub36726.html
Creating an Application using the Observer Framework
Fluo Background
• Overview of the Fluo Observer Framework
  - Observers monitor a given Fluo table column
  - When the observed column is updated, a notification is triggered which tells the observer to perform a transaction
  - The transaction is specified by the implementation of the observer’s process method
    - Takes in a transaction object, row and column
    - Uses the data to perform an action such as writing to another column in the table
• Perform complex incremental updates by chaining observers
  - Decompose the problem into a collection of interacting observers
  - Observers can write notifications that trigger other observers
Formulating a SPARQL Query as a Chain of Observers
Maintaining Precomputed Joins
SPARQL Query
SELECT ?patron ?employee ?business
WHERE {
  ?patron <http://talksTo> ?employee.
  ?employee <http://worksAt> ?business
}
Streamed statements feed two Statement Pattern Observers, whose outputs feed a Join Observer:
• Statement Pattern Observer: ?patron <http://talksTo> ?employee
  - Output: {?patron, ?employee}
  - Pairs of people who talk to each other
• Statement Pattern Observer: ?employee <http://worksAt> ?business
  - Output: {?employee, ?business}
  - Where people work
• Join Observer
  - Output: {?patron, ?employee, ?business}
  - People who talk to employees and where that employee works
Incrementally Creating Results Using Query Observers (1 of 2)
Maintaining Precomputed Joins
Statements streamed to both Statement Pattern Observers:
  {Alice, talksTo, Bob}, {Charlie, talksTo, David}, {Bob, worksAt, CoffeeShop}, {Eve, worksAt, PizzaPlace}
• Statement Pattern Observer: ?patron <http://talksTo> ?employee
  - Output: {?patron, ?employee}
  - Results: {patron=Alice, employee=Bob}, {patron=Charlie, employee=David}
• Statement Pattern Observer: ?employee <http://worksAt> ?business
  - Output: {?employee, ?business}
  - Results: {employee=Bob, business=CoffeeShop}, {employee=Eve, business=PizzaPlace}
• Join Observer
  - Output: {?patron, ?employee, ?business}
  - Results: {patron=Alice, employee=Bob, business=CoffeeShop}
Incrementally Creating Results Using Query Observers (2 of 2)
Maintaining Precomputed Joins
New statement streamed to both Statement Pattern Observers:
  {David, worksAt, CoffeeShop}
• Statement Pattern Observer: ?patron <http://talksTo> ?employee
  - Output: {?patron, ?employee}
  - Results: {patron=Alice, employee=Bob}, {patron=Charlie, employee=David}
• Statement Pattern Observer: ?employee <http://worksAt> ?business
  - Output: {?employee, ?business}
  - Results: {employee=Bob, business=CoffeeShop}, {employee=Eve, business=PizzaPlace}, {employee=David, business=CoffeeShop}
• Join Observer
  - Output: {?patron, ?employee, ?business}
  - Results: {patron=Alice, employee=Bob, business=CoffeeShop}, {patron=Charlie, employee=David, business=CoffeeShop}
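The two-step example can be replayed with a small plain-Python simulation of chained observers. It deliberately does not use the Fluo API: the classes `StatementPatternObserver` and `JoinObserver`, the callback wiring, and the in-memory lists standing in for Fluo columns and the PCJ table are all assumptions made for illustration.

```python
class JoinObserver:
    """Joins {patron, employee} with {employee, business} on employee."""

    def __init__(self):
        self.talks_to = []  # intermediate results of the talksTo pattern
        self.works_at = []  # intermediate results of the worksAt pattern
        self.results = []   # stand-in for the exported PCJ table

    def on_talks_to(self, binding):
        self.talks_to.append(binding)
        for other in self.works_at:
            if other["employee"] == binding["employee"]:
                self.results.append({**binding, **other})

    def on_works_at(self, binding):
        self.works_at.append(binding)
        for other in self.talks_to:
            if other["employee"] == binding["employee"]:
                self.results.append({**other, **binding})


class StatementPatternObserver:
    """Turns matching triples into bindings and notifies the next observer."""

    def __init__(self, predicate, subject_var, object_var, notify):
        self.predicate = predicate
        self.subject_var = subject_var
        self.object_var = object_var
        self.notify = notify

    def process(self, triple):
        subject, predicate, obj = triple
        if predicate == self.predicate:
            self.notify({self.subject_var: subject, self.object_var: obj})


join = JoinObserver()
observers = [
    StatementPatternObserver("talksTo", "patron", "employee", join.on_talks_to),
    StatementPatternObserver("worksAt", "employee", "business", join.on_works_at),
]


def ingest(triple):
    """Stream one statement to every statement pattern observer."""
    for observer in observers:
        observer.process(triple)


for triple in [
    ("Alice", "talksTo", "Bob"),
    ("Charlie", "talksTo", "David"),
    ("Bob", "worksAt", "CoffeeShop"),
    ("Eve", "worksAt", "PizzaPlace"),
]:
    ingest(triple)
after_step_1 = list(join.results)

ingest(("David", "worksAt", "CoffeeShop"))
after_step_2 = list(join.results)
print(after_step_1)
print(after_step_2)
```

Only the new statement is processed in the second step; the earlier join result is kept rather than recomputed, at the cost of storing the intermediate bindings.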
Overview of Rya Fluo Application
Maintaining Precomputed Joins
• Adding statements to or deleting statements from the repository requires updating the precomputed join index table
• This requires updates to intermediate results within the Fluo table
[Figure labels: Rya Client (Insert Triples); Rya Precomputed Join (PCJ) App on Fluo with Triple Observer, Statement Pattern Observer, Filter Observer, Join Observer, and Query Result Observer; PCJ Index Table in Accumulo]
System Overview
Maintaining Precomputed Joins
[Figure labels: PCJ Client; Rya Client; Statement Stream; Fluo Incremental PCJ App; Accumulo holding the Rya Core Tables (SPO, POS, OSP), the Fluo App Table, and the Rya PCJ Tables. Flow labels: Inserts Statements; Inserts Historic SP Matches; Processes; Exports Results. Steps are numbered 1 to 3 in the original figure.]
Registering a New Query
Maintaining Precomputed Joins
1. Register Query
2. Scan for historic Statement Pattern matches
3. Insert Statement Pattern matches
4. Compute Results
5. Export Results to PCJ Index
[Figure labels: PCJ Client; Core Rya Tables; Fluo App; PCJ Index]
Streaming While Registering Query
Maintaining Precomputed Joins
1. Register Query
2. Scan for historic Statement Pattern matches
3. Insert Statement Pattern matches
4. Compute Results
A. Write new Statement
B. Write new Statement
[Figure labels: PCJ Client; Statement Stream; Core Rya Tables; Fluo App]
Benchmark results for Rya with No Precomputed Joins and No Optimizations
Materialized Views in Rya
• The results on this slide and the next were obtained using proprietary queries and data
  - Q1 is the most complex and consists of 14 joins, 6 of which are left joins
  - Q2 is obtained from Q1 by removing two left joins and two joins
  - Q3-Q5 decrease in complexity as well and are obtained using a similar process
  - Q5 is similar to Q1, with all left joins replaced by joins
• 4192 results were obtained by querying a Rya instance with 500,000 triples installed on Parsons’ internal cluster
  - 8 worker nodes, each with 2 x 6 Core Xeon E5-2440 (2.4GHz) processors and 48 GB RAM
• The table below presents average query time over 10 iterations with standard deviation:
Q  | Rya with one exact PCJ (s) | Rya with no PCJ (s)
Q1 | 1.284 ± 0.047 | 516.774 ± 6.265
Q2 | 0.851 ± 0.042 | 345.606 ± 5.991
Q3 | 0.598 ± 0.026 | 180.663 ± 3.354
Q4 | 0.368 ± 0.026 | 63.588 ± 1.527
Q5 | 1.334 ± 0.074 | 97.101 ± 1.765
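For readers who want the relative improvement rather than raw timings, the short Python sketch below divides the mean time without a PCJ by the mean time with one exact PCJ, using only the means from the table. The variable names are illustrative, and the standard deviations are ignored.

```python
# (mean seconds with one exact PCJ, mean seconds with no PCJ), from the table
timings = {
    "Q1": (1.284, 516.774),
    "Q2": (0.851, 345.606),
    "Q3": (0.598, 180.663),
    "Q4": (0.368, 63.588),
    "Q5": (1.334, 97.101),
}

# Speedup factor = time without PCJ / time with PCJ
speedups = {q: no_pcj / with_pcj for q, (with_pcj, no_pcj) in timings.items()}

for q, factor in speedups.items():
    print(f"{q}: about {factor:.0f}x faster with an exact PCJ")
```

The ratios range from roughly 70x for Q5 to roughly 400x for Q1 and Q2.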
Facebook’s Unicorn Paper: Motivating Example
Entity-Centric Index
How do I find all documents containing “dog” and “bark”?
1. View docs and terms as a graph, with edges drawn from docs to the terms they contain
2. Efficiently represent the graph as a collection of adjacency lists
The problem is reduced to finding the intersection of lists:
Adjacency lists of dog and bark
  dog:  doc1 doc2 doc3 doc4 doc5 doc6
  bark: doc4 doc5 doc6 doc7 doc8
  dog and bark: doc4 doc5 doc6
Facebook’s Unicorn Paper: Distributing the Problem
Entity-Centric Index
What if the adjacency lists are really large? The terms “dog” and “bark” could appear in lots of documents!
• Distribute the problem by partitioning adjacency lists of documents across servers
  - Involves some type of sharding
• Each server finds intersections of smaller lists:
Full adjacency lists
  dog:  1 2 3 4 5 6
  bark: 4 5 … 7 8 …
ShardID = (doc num) % 3
  Server 1, ShardID = 0: dog: 3 6 ...   bark: 6 ...
  Server 2, ShardID = 1: dog: 1 4 ...   bark: 4 7 ...
  Server 3, ShardID = 2: dog: 2 5 ...   bark: 5 8 ...
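The sharding scheme can be sketched in a few lines of Python. The posting lists below contain only the document numbers that are legible on the slides (the elided "…" entries are left out), and the function names are assumptions for illustration.

```python
from collections import defaultdict

NUM_SHARDS = 3

# Document numbers visible on the slides; elided entries are omitted.
postings = {
    "dog": [1, 2, 3, 4, 5, 6],
    "bark": [4, 5, 6, 7, 8],
}


def shard_postings(postings, num_shards):
    """Split every adjacency list by ShardID = doc num % num_shards."""
    shards = defaultdict(lambda: defaultdict(list))
    for term, docs in postings.items():
        for doc in docs:
            shards[doc % num_shards][term].append(doc)
    return shards


def intersect_on_shard(term_lists, terms):
    """Each server intersects only its own, smaller lists."""
    sets = [set(term_lists.get(term, [])) for term in terms]
    return sorted(set.intersection(*sets))


shards = shard_postings(postings, NUM_SHARDS)
per_shard = {
    shard_id: intersect_on_shard(shards[shard_id], ["dog", "bark"])
    for shard_id in range(NUM_SHARDS)
}
all_hits = sorted(doc for hits in per_shard.values() for doc in hits)
print(per_shard)
print(all_hits)
```

Because a document always lands on the same shard for every term, the union of the per-shard intersections equals the intersection of the full lists.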
Unicorn Applied to Accumulo
Entity-Centric Index
• The Unicorn framework outlines the basis for a distributed document partitioned index
• Accumulo has a framework¹ in place for creating this index
  - Uses IndexedDocIterator, which is an extension of an IntersectingIterator
• Uses the following key design:
Accumulo Key
Row: doc shard | Column Family (CF): term | Column Qualifier (CQ): document id
1. Accumulo: Application Development, Table Design, and Best Practices, Cordova A., Rinaldi B., Wall M., O’Reilly 2015
Unicorn Implemented in Accumulo using Intersecting Iterators
Entity-Centric Index
Elements in adjacency lists of “bark” and “dog” stored in Accumulo in a Document Partitioned Index
• RowID = shardID (doc num % 3)
• Column Family = term (bark or dog)
• Column Qualifier = adjacency element (document number)
Server 1
RowID | ColF | ColQ
0 | bark | 6
0 | dog | 3
0 | dog | 6
Server 2
RowID | ColF | ColQ
1 | bark | 4
1 | bark | 7
1 | dog | 1
1 | dog | 4
Server 3
RowID | ColF | ColQ
2 | bark | 5
2 | bark | 8
2 | dog | 2
2 | dog | 5
Using this index, “entity-centric queries” can be evaluated entirely on the server
• On each server,
  - iter1 scans “bark”
  - iter2 scans “dog”
  - Iterators intersect when colQ1 = colQ2, then return the result
Query (Q): documents that contain dog and bark
Results (R): Server 1 returns 6, Server 2 returns 4, Server 3 returns 5
Unicorn Applied to RDF
Entity-Centric Index
• Adjacency lists capture the edge label as well as the connection
• This SPARQL query asks for all people who own a dog, drive an SUV, and work for the U.S. Government:
SELECT ?person WHERE { ?person <hasPet> <dog> .
  ?person <drives> <SUV> .
  <USGovt> <employs> ?person . }
Adjacency lists of SUV, USGovt, and dog
  hasPet, obj/dog:       pers1 pers2 pers4
  employs, subj/USGovt:  pers2 pers4
  drives, obj/SUV:       pers2 pers3 pers4
[Figure: graph of pers1 to pers4 connected to dog by hasPet edges, to SUV by drives edges, and to USGovt by employs edges]
Key Design in Accumulo
Entity-Centric Index
• For each triple (subj, pred, obj, graph-name), insert the following two Accumulo keys in the same Entity-Centric Index table:
Accumulo Key
  Row: <subject> | CF: <predicate> | CQ: <graphName>\x00object\x00<object>
Accumulo Key
  Row: <object> | CF: <predicate> | CQ: <graphName>\x00subject\x00<subject>
• The triple (uri:John, uri:worksAt, uri:Parsons, graph context: parsonsEmployees) is added as the following two rows:
  Row: uri:John, CF: uri:worksAt, CQ: parsonsEmployees\x00object\x00uri:Parsons
  Row: uri:Parsons, CF: uri:worksAt, CQ: parsonsEmployees\x00subject\x00uri:John
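The key layout can be written down as a small Python helper. This is a sketch rather than Rya's serialization code: the `Key` tuple and the function `entity_centric_keys` are hypothetical names, and it assumes the delimiter shown on the slide denotes the null byte `\x00`.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    cf: str  # column family
    cq: str  # column qualifier


DELIM = "\x00"  # assumed to be the delimiter shown in the key design


def entity_centric_keys(subject, predicate, obj, graph_name):
    """Build the two index entries for one triple: one keyed by the subject
    and one keyed by the object, so either end can act as the entity."""
    return [
        Key(subject, predicate, DELIM.join([graph_name, "object", obj])),
        Key(obj, predicate, DELIM.join([graph_name, "subject", subject])),
    ]


keys = entity_centric_keys("uri:John", "uri:worksAt", "uri:Parsons", "parsonsEmployees")
for key in keys:
    print(repr(key))
```

Writing both entries doubles the storage for each triple, which is the price of being able to treat either the subject or the object as the row being intersected.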
Query Evaluation: Merge Joins and the Reduction of Network Traffic
Entity-Centric Index
SELECT ?person WHERE {
  ?person <hasPet> <dog>.
  ?person <drives> <SUV>.
  <USGovt> <employs> ?person.}
Using this index, “entity-centric queries” can be evaluated entirely on the server
• iter1 scans col: employs colQ: subject USGovt,
• iter2 scans col: drives colQ: object SUV,
• iter3 scans col: hasPet colQ: object dog
• Iterators intersect when rowID 1 = rowID 2 = rowID 3
Server 1
RowID | ColF | ColQ
dog | hasPet | S ….
pers1 | hasPet | O dog
pers2 | employs | S USGovt
pers2 | drives | O bicycle
pers2 | drives | O SUV
pers2 | hasPet | O dog
Server 2
RowID | ColF | ColQ
pers3 | drives | O SUV
pers4 | employs | S USGovt
pers4 | drives | O SUV
pers4 | hasPet | O dog
pers5 | drives | O SUV
SUV | drives | S ….
USGovt | employs | O pers2
USGovt | employs | O pers4
Query (Q) results (R): Server 1 returns pers2, Server 2 returns pers4
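The per-server intersection on row IDs can be imitated in Python as a merge join over sorted lists. The sketch is not Accumulo iterator code: `rows_matching`, `merge_intersect`, and `evaluate_on_server` are hypothetical names, the S/O markers are spelled out as "subject"/"object", and the two rows whose qualifier is elided in the tables are omitted.

```python
# (row ID, column family, role, value) entries from the two server tables.
SERVER_1 = [
    ("pers1", "hasPet", "object", "dog"),
    ("pers2", "employs", "subject", "USGovt"),
    ("pers2", "drives", "object", "bicycle"),
    ("pers2", "drives", "object", "SUV"),
    ("pers2", "hasPet", "object", "dog"),
]
SERVER_2 = [
    ("pers3", "drives", "object", "SUV"),
    ("pers4", "employs", "subject", "USGovt"),
    ("pers4", "drives", "object", "SUV"),
    ("pers4", "hasPet", "object", "dog"),
    ("pers5", "drives", "object", "SUV"),
    ("USGovt", "employs", "object", "pers2"),
    ("USGovt", "employs", "object", "pers4"),
]

# One condition per iterator (iter1, iter2, iter3).
CONDITIONS = [
    ("employs", "subject", "USGovt"),
    ("drives", "object", "SUV"),
    ("hasPet", "object", "dog"),
]


def rows_matching(entries, condition):
    """Sorted row IDs that carry the given (cf, role, value) entry."""
    return sorted({row for row, *rest in entries if tuple(rest) == condition})


def merge_intersect(sorted_lists):
    """Intersect sorted lists by advancing whichever cursors are behind."""
    positions = [0] * len(sorted_lists)
    out = []
    while all(pos < len(lst) for pos, lst in zip(positions, sorted_lists)):
        current = [lst[pos] for pos, lst in zip(positions, sorted_lists)]
        highest = max(current)
        if all(value == highest for value in current):
            out.append(highest)  # all iterators agree on this row ID
            positions = [pos + 1 for pos in positions]
        else:
            positions = [
                pos + 1 if value < highest else pos
                for pos, value in zip(positions, current)
            ]
    return out


def evaluate_on_server(entries):
    return merge_intersect([rows_matching(entries, c) for c in CONDITIONS])


server_1_hits = evaluate_on_server(SERVER_1)
server_2_hits = evaluate_on_server(SERVER_2)
print(server_1_hits, server_2_hits)
```

Only the matching row IDs leave each server, which is where the reduction in network traffic comes from.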
Which Queries Can We Evaluate?
Entity-Centric Index
• Generalize the Document Partitioned Index to accommodate a broad range of SPARQL queries
• Solve as many Entity-Centric queries server side as possible
  - Entity-Centric means all statement patterns share a common variable or constant
Entity with Properties (?x linked to C, B, and D):
select ?x
where {
  ?x aa C
  ?x bb B
  ?x cc D
}
Properties for an Entity (A linked to ?x, ?y, and ?z):
select ?x ?y ?z
where {
  A aa ?x
  A bb ?y
  A cc ?z
}
“Friends of Friends” (B linked to ?x, which is linked to ?y and ?z):
select ?x ?y ?z
where {
  B aa ?x
  ?x bb ?y
  ?x cc ?z
}
Query Planning: Anatomy of a Rya SPARQL Query using the Entity-Centric Index
Entity-Centric Index
• During query planning, group statement patterns according to common variables and common constants
• Groups which have the highest “priority” are consolidated into an Entity-Centric Index node
Statement patterns:
  ?x livesIn Arlington
  ?y talksTo ?x
  ?y livesIn D.C.
  ?x commutesBy Bike
Entity-Centric Index
  …
  Joe, livesIn, D.C.
  Joe, talksTo, Rob
  …
  Rob, commutesBy, Bike
  Rob, livesIn, Arlington
  …
[Figure labels: query plan with Join nodes and two Entity-Centric Index Nodes, numbered 1 and 2]
Entity Centric Benchmarking Results
Entity-Centric Index
• Results were obtained by running 13 queries (see Appendix) against the Lehigh University Benchmark (LUBM) data set consisting of 33.34 million triples
• The Entity Centric Index table was split into 19 tablets distributed across 8 servers
  - All predicates found in LUBM data were set as locality groups for the Entity Centric Index table
• Queries were issued using a BatchScanner with 15 threads
Query | Entity Total Time (s) | Rya Total Time (s) | Results Returned
LUBMStar Q1 | 23.6 | 624.702 | 1024789
LUBMStar Q2 | 0.3724 | 0.732 | 7
LUBMStar Q3 | 0.545 | 1.221 | 499
LUBMStar Q4 | 4.37 | 379.239 | 180002
LUBMStar Q5 | 1.475 | 6.072 | 40665
LUBMStar Q6 | 0.222 | 6.613 | 5003
LUBMStar Q7 | 11 | 0.3258 | 3
LUBMStar Q8 | 7.2 | 0.267 | 1
LUBMStar Q9 | 12.763 | 0.748 | 8
LUBMStar Q10 | 34.934 | 1929.984 | 1259374
LUBMStar Q11 | 0.0412 | 0.284 | 3
LUBMStar Q12 | 0.0358 | 0.311 | 2
LUBMStar Q13 | 0.0291 | 0.137 | 30
Next Steps for Precomputed Joins
Future Research
• Next steps:
  - Implement strategies to determine PCJs dynamically
    - Calculate frequent subgraphs
    - Develop a query logging framework
    - Perform semantic analysis of queries to determine common components to cache
  - Streamline query planning with respect to PCJs
    - Query planning time increases as the number of PCJs increases
    - Explore strategies for pruning the PCJ query plan search space to quickly determine efficient PCJ combinations for query plans
    - Index PCJs using underlying query components so that PCJs can be efficiently discovered using the matching subquery
Next Steps for Entity Centric Index and Server Side Join Evaluation
Future Research
• Next steps for Entity Centric Index
  - Explore ways to improve join performance of Entity Centric Query Nodes
  - Add capability to explicitly define Entity Types for the index
    - Entities are implicitly defined as nodes containing a specified combination of properties
    - Explicitly register entities with the index and allow users to query by type
    - Specify entities using OWL (Web Ontology Language) class and property combinations
    - Leverage additional structure using more targeted queries involving identifying features for the given entity type
• Future research in server side join evaluation
  - Utilize Spark GraphX or Spark DataFrames to create a distributed query evaluation framework for Rya
  - Joins performed on Rya Resilient Distributed Datasets (RDDs) in a remote SparkContext on the server
Questions?
Appendix
• Useful Links
• Entity Centric LUBM Star Queries
Useful Links
SPARQL
  - Standard -- http://www.w3.org/TR/rdf-sparql-query/
  - Tutorial -- http://jena.apache.org/tutorials/sparql.html
RDF
  - Primer -- http://www.w3.org/TR/rdf11-primer/
Unicorn
  - Paper -- Michael Curtiss, et al., Unicorn: A System for Searching the Social Graph, Facebook Inc. https://research.facebook.com/publications/unicorn-a-system-for-searching-the-social-graph/
Apache Rya (Incubating)
  - Home -- http://rya.apache.org/ Home page for Apache Rya (Incubating)
  - Rya Office Hours -- Biweekly phone conference. Updates, issues, upcoming features. Upcoming announcements with dial-in numbers are sent on the dev mailing list
  - Mailing List -- dev@rya.incubator.apache.org is for usage questions, help, and people who want to contribute code to Rya
  - Javadoc OpenRDF=Sesame=RDF4J -- http://archive.rdf4j.org/javadoc/sesame-2.7.16/
  - Tutorial for RDF4J -- http://rdf4j.org/doc/programming-with-rdf4j/
  - Paper -- Punnoose R., Crainiceanu A., Rapp D. 2012. Rya: a scalable RDF triple store for the clouds. Proceedings of the 1st International Workshop on Cloud Intelligence. http://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
  - Paper -- Roshan Punnoose, Adina Crainiceanu, David Rapp. SPARQL in the Clouds Using Rya. Information Systems Journal (2013). http://www.usna.edu/Users/cs/adina/research/Rya_ISjournal2013.pdf
Entity Centric LUBM Star Queries (1 of 5)
The Entity Centric index was tested by issuing the following queries against the LUBM data set.
LUBM Star Q1 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4
WHERE
{
  ?X ub:doctoralDegreeFrom ?Y2 .
  ?X ub:undergraduateDegreeFrom ?Y4 .
  ?Y1 ub:advisor ?X .
  ?X ub:emailAddress ?Y3 .
}
LUBM Star Q2 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1
WHERE
{
  ?X ub:doctoralDegreeFrom <http://www.University104.edu> .
  ?X ub:headOf ?Y1 .
}
LUBM Star Q3 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1
WHERE
{
  ?X ub:doctoralDegreeFrom <http://www.University104.edu> .
  ?X ub:teacherOf ?Y1 .
}
Entity Centric LUBM Star Queries (2 of 5)
LUBM Star Q4 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y2 ?Y3 ?Y4
WHERE
{
  ?X ub:doctoralDegreeFrom ?Y2 .
  ?X ub:undergraduateDegreeFrom ?Y4 .
  ?Y1 ub:advisor ?X .
  ?X ub:emailAddress ?Y3 .
}
LUBM Star Q5 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2
WHERE
{
  ?X ub:headOf ?Y2 .
  ?Y1 ub:advisor ?X .
}
LUBM Star Q6 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2
WHERE
{
  ?X ub:headOf ?Y1 .
  ?X ub:doctoralDegreeFrom ?Y2 .
}
Entity Centric LUBM Star Queries (3 of 5)
LUBM Star Q7 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2
WHERE
{
  <http://www.Department0.University0.edu/UndergraduateStudent106> ub:takesCourse ?Y1 .
  ?Y2 ub:teacherOf ?Y1 .
}
LUBM Star Q8 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y
WHERE
{
  <http://www.Department0.University114.edu/UndergraduateStudent168> ub:memberOf ?X .
  ?X ub:subOrganizationOf ?Y .
}
LUBM Star Q9 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4
WHERE
{
  ?X ub:takesCourse <http://www.Department0.University101.edu/GraduateCourse31> .
  ?X ub:undergraduateDegreeFrom ?Y1 .
  ?X ub:emailAddress ?Y2 .
  ?X ub:memberOf ?Y3 .
}
Entity Centric LUBM Star Queries (4 of 5)
LUBM Star Q10 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4
WHERE
{
  ?X ub:takesCourse ?Y4 .
  ?X ub:undergraduateDegreeFrom ?Y1 .
  ?X ub:emailAddress ?Y2 .
  ?X ub:memberOf ?Y3 .
}
LUBM Star Q11 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4
WHERE
{
  <http://www.Department9.University150.edu/GraduateStudent72> ub:takesCourse ?Y4 .
  <http://www.Department9.University150.edu/GraduateStudent72> ub:undergraduateDegreeFrom ?Y1 .
  <http://www.Department9.University150.edu/GraduateStudent72> ub:emailAddress ?Y2 .
  <http://www.Department9.University150.edu/GraduateStudent72> ub:memberOf ?Y3 .
}
LUBM Star Q12 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4
WHERE
{
  <http://www.Department17.University156.edu/GraduateStudent21> ub:takesCourse ?Y4 .
  <http://www.Department17.University156.edu/GraduateStudent21> ub:undergraduateDegreeFrom ?Y1 .
  <http://www.Department17.University156.edu/GraduateStudent21> ub:emailAddress ?Y2 .
  <http://www.Department17.University156.edu/GraduateStudent21> ub:memberOf ?Y3 .
}
Entity Centric LUBM Star Queries (5 of 5)
LUBM Star Q13 =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?Y1 ?Y2
WHERE
{
  ?Y1 ub:takesCourse <http://www.Department0.University0.edu/Course0> .
  ?Y2 ub:teacherOf <http://www.Department0.University0.edu/Course0> .
}

Using SparkR to Scale Data Science Applications in Production. Lessons from t...Spark Summit
 

Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Networks

  • 1. Rya: Accumulo Indexing Strategies for Searching Semantic Networks. Dr. Caleb Meier, Puja Valiyil, David Lotts, Aaron Mihalik, Dr. Adina Crainiceanu. DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. ONR Case Number 43-2117-16. EXIM APPROVED Parsons #459, 8 OCT 16.
  • 2. Acknowledgements
      • The work presented herein was funded by the Office of Naval Research (ONR) and the National Geospatial-Intelligence Agency (NGA) under contract # N00014-12-C-0365
      • This presentation was sponsored by Parsons
      • This work is the collective effort of:
          - Parsons’ Rya Team: Puja Valiyil, Aaron Mihalik, Caleb Meier, David Lotts, Jennifer Brown
          - Rya Founders: Roshan Punnoose, Adina Crainiceanu, and David Rapp
  • 3. Agenda
      • Rya Background
      • Materialized Views in Rya
      • Entity-Centric Index
  • 4. Agenda (current section: Rya Background)
      • Rya Background
      • Materialized Views in Rya
      • Entity-Centric Index
  • 5. Rya and RDF (Background)
      • Apache Rya (incubating): Resource Description Framework (RDF) triplestore built on top of Accumulo or MongoDB
      • RDF: W3C standard for representing linked/graph data
          - Represents data as statements (assertions) about resources
          - Serialized as triples in {subject, predicate, object} form
          - Example: {Caleb, worksAt, Parsons} and {Caleb, livesIn, Virginia}
      [Figure: graph in which node Caleb is linked to Parsons by a worksAt edge and to Virginia by a livesIn edge]
  • 6. SPARQL (Background)
      • RDF queries are described using SPARQL (SPARQL Protocol and RDF Query Language)
      • SQL-like syntax for finding triples matching specific patterns
          - Looks for subgraphs that match triple statement patterns
          - Joins are performed when there are variables common to two or more statement patterns
      • Example query:
          SELECT ?people WHERE {
            ?people <worksAt> <Parsons>.
            ?people <livesIn> <Virginia>.
          }
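The way shared variables turn two statement patterns into a join can be sketched with a toy in-memory matcher. This is only an illustration, not Rya or RDF4J code: the `match` and `evaluate` helpers and the two "Dana" triples are assumptions made up for the example.

```python
# Illustrative only: a toy matcher for statement patterns over in-memory triples.
TRIPLES = [
    ("Caleb", "worksAt", "Parsons"),
    ("Caleb", "livesIn", "Virginia"),
    ("Dana", "worksAt", "Parsons"),    # hypothetical extra data
    ("Dana", "livesIn", "Maryland"),   # hypothetical extra data
]

def match(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, else None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def evaluate(patterns, triples):
    """Evaluate patterns left to right; a shared variable acts as a join key."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

QUERY = [("?people", "worksAt", "Parsons"), ("?people", "livesIn", "Virginia")]
results = evaluate(QUERY, TRIPLES)
print([r["?people"] for r in results])
```

Because `?people` appears in both patterns, only subjects that satisfy both survive, which is the join the slide describes.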
  • 7. Rya Architecture (Background)
      • RDF4J* interface for interacting with RDF data stored on Accumulo
          - RDF4J: open source Java framework for storing and querying RDF data
          - RDF4J provides several interfaces/abstractions central to interacting with an RDF datastore
          - SAIL (Storage And Inference Layer): interface for interacting with the underlying persisted RDF model
      [Figure: architecture diagram with the labels SPARQL, Rya and RDF4J, Rya QueryPlanner, Accumulo, "Query processing in SAIL layer" and "Data storage layer"]
      *The RDF4J.org project was previously named “Open RDF”, then “Sesame”.
  • 8. Storage: Triple Table Index (Accumulo Composite Index and Table Design)
      • 3 tables
          - SPO: subject, predicate, object
          - POS: predicate, object, subject
          - OSP: object, subject, predicate
      • Store triples in the Row ID of the table
      • Store graph name (context) in the Column Family
      • Advantages:
          - Native lexicographical sorting of row keys gives fast range queries
          - All patterns can be translated into a scan of one of these tables
      • Key for SPO table:
          - Row ID: subject, predicate, object, type
          - Column Family: graph name
          - Column Qualifier: (not used)
          - Column Visibility: visibility
          - Timestamp: timestamp
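The claim that every pattern maps to a scan of one of the three tables can be checked with a small sketch. The function name `choose_index` and the tuple representation of a row prefix are assumptions for illustration; Rya's actual row encoding and table-selection code are not shown in the slides.

```python
# Illustrative only: pick the table whose key order puts all bound terms first.
TABLES = {"SPO": "spo", "POS": "pos", "OSP": "osp"}

def choose_index(s=None, p=None, o=None):
    """Return (table, prefix) for a triple pattern; None marks a variable."""
    terms = {"s": s, "p": p, "o": o}
    bound = sum(value is not None for value in terms.values())
    for table, order in TABLES.items():
        prefix = []
        for key in order:
            if terms[key] is None:
                break
            prefix.append(terms[key])
        # Usable only if every bound term is part of the leading prefix.
        if len(prefix) == bound:
            return table, tuple(prefix)
    raise AssertionError("unreachable: one rotation always fits")

print(choose_index(p="worksAt"))               # predicate only
print(choose_index(s="Caleb", o="Parsons"))    # subject and object
```

The three key orders are rotations of one another, so any combination of bound terms forms a leading prefix in at least one table, and that prefix becomes the range to scan.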
  • 9. Anatomy of a Rya SPARQL Query (Query Planning)
      • Query:
          SELECT ?x ?y WHERE {
            ?x <worksAt> ?y.
            ?x <livesIn> <Virginia>.
            ?x <commuteMethod> <bike>.
          }
      • Step 1: POS table, scan range for worksAt (pattern: ?x worksAt ?y)
          Predicate, Object, Subject
          …
          studiesAt, Joe, Georgetown
          talksTo, Joe, Bob
          worksAt, Netflix, Bob
          worksAt, Parsons, Greta
          worksAt, PlayStation, John
      • Step 2: for each ?x, SPO table lookup (pattern: ?x livesIn Virginia)
          Subject, Predicate, Object
          …
          Bob, livesIn, California
          …
          Greta, livesIn, Virginia
          …
          John, livesIn, Virginia
          …
      • Step 3: for each ?x, SPO table lookup (pattern: ?x commuteMethod bike)
          Subject, Predicate, Object
          …
          Greta, commuteMethod, bike
          …
          John, commuteMethod, Bus
          …
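The three-step plan can be sketched with sorted Python lists standing in for the Accumulo tables. This is a minimal sketch under stated assumptions: `prefix_scan` is a hypothetical helper that imitates a range scan with `bisect`, and the data set contains only the worksAt, livesIn and commuteMethod rows shown on the slide.

```python
from bisect import bisect_left

# Illustrative only: sorted lists of tuples stand in for the SPO and POS tables.
TRIPLES = [
    ("Bob", "worksAt", "Netflix"), ("Greta", "worksAt", "Parsons"),
    ("John", "worksAt", "PlayStation"), ("Bob", "livesIn", "California"),
    ("Greta", "livesIn", "Virginia"), ("John", "livesIn", "Virginia"),
    ("Greta", "commuteMethod", "bike"), ("John", "commuteMethod", "Bus"),
]
SPO = sorted(TRIPLES)
POS = sorted((p, o, s) for s, p, o in TRIPLES)

def prefix_scan(sorted_rows, prefix):
    """Yield the rows of a sorted table that start with `prefix`."""
    i = bisect_left(sorted_rows, prefix)
    while i < len(sorted_rows) and sorted_rows[i][:len(prefix)] == prefix:
        yield sorted_rows[i]
        i += 1

results = []
# Step 1: scan the POS table over the range for worksAt, binding ?y and ?x.
for _, y, x in prefix_scan(POS, ("worksAt",)):
    # Step 2: for each ?x, look up (?x, livesIn, Virginia) in the SPO table.
    if not any(prefix_scan(SPO, (x, "livesIn", "Virginia"))):
        continue
    # Step 3: for each remaining ?x, look up (?x, commuteMethod, bike).
    if not any(prefix_scan(SPO, (x, "commuteMethod", "bike"))):
        continue
    results.append((x, y))

print(results)
```

Each result of the first scan becomes the range prefix of the next lookup, so the cost grows with the number of intermediate bindings.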
  • 10. Costly Joins of Accumulo Scans (Query Evaluation)
      • Out of the box, Rya SPARQL query evaluation requires joining Accumulo scan results client side
      • Joins are performed using a hybrid of nested loop and hash joins
          - Sorted nested loop: range prefixes for scans are determined by the results of the previous scan
          - Hash joins: results of a scan are joined with any values not used to form the range prefix
      • BatchScanner boosts performance
          - Evaluates results in batches
          - Client-side join evaluation is still a bottleneck, especially for queries with large intermediate join results or a large number of joins
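For readers unfamiliar with the hash-join half of that hybrid, the following is a generic client-side hash join over variable bindings. It is a textbook sketch, not Rya's implementation; the `hash_join` name, the dictionary representation of a binding and the sample rows are assumptions, and every row on one side is assumed to bind the same variables.

```python
from collections import defaultdict

def hash_join(left, right):
    """Join two lists of bindings (dicts) on the variables they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the left side by the values of the shared variables.
    table = defaultdict(list)
    for row in left:
        table[tuple(row[v] for v in shared)].append(row)
    # Probe phase: look up each right row and merge the matching bindings.
    joined = []
    for row in right:
        for match in table.get(tuple(row[v] for v in shared), []):
            joined.append({**match, **row})
    return joined

works_at = [{"?x": "Greta", "?y": "Parsons"}, {"?x": "John", "?y": "PlayStation"}]
commutes = [{"?x": "Greta", "?m": "bike"}, {"?x": "Bob", "?m": "car"}]
print(hash_join(works_at, commutes))
```

The build side must fit in client memory, which is one reason large intermediate results make client-side joins expensive.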
  • 11. Indexing Strategies for Dealing With Costly Joins (Query Evaluation)
      • Materialized Views in Rya: index frequently issued queries or frequent subgraphs
          1. Pre-compute a portion of the query
          2. Store the SPARQL describing the query along with the pre-computed values in Accumulo
          3. Normalize query variables to match stored SPARQL variables during query execution
      • Entity-Centric Index: add new indices to eliminate the need for hash joins
          1. Apply Document Partitioned Indexing to graph data
              - Design tables so that all properties for each entity appear on a single tablet
          2. Find entities with intersecting properties
              - Use a variation of an Intersecting Iterator
              - Perform merge joins on the server
      • Additional indexing strategies could eliminate the need for client-side joins
          - Clear trade-off between query performance and memory footprint
          - Mitigate “data plume”
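The merge join behind "find entities with intersecting properties" can be sketched as an intersection of two sorted lists. This is the general technique only, not Accumulo's Intersecting Iterator or Rya's variation of it; the function name and the sample entity lists are assumptions, and each list is assumed to be sorted and free of duplicates.

```python
def intersect_sorted(a, b):
    """Merge-join two sorted, duplicate-free lists and return the common items."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1   # advance whichever side is behind
        else:
            j += 1
    return common

# Hypothetical sorted entity lists, one per property value.
lives_in_virginia = ["Greta", "John"]
commutes_by_bike = ["Ana", "Greta", "Zed"]
print(intersect_sorted(lives_in_virginia, commutes_by_bike))
```

Because both inputs are already sorted, the intersection needs a single pass and no hash table, which is what makes it practical to run on the server.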
  • 12. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 11 Agenda • Rya Background •Materialized Views in Rya • Entity-Centric Index
  • 13. Rya: Accumulo Indexing Strategies for Searching Semantic Networks • Determine which query results to cache given that capacity is limited • Caching Strategies  Preference: manually submit queries to cache  Recent use  Complexity: number of joins, estimates of intermediate results  Relevance  Relevance to data (frequent subgraphs)  Relevance to users (query logs) • A comprehensive strategy should use all of the criteria above • Terms: Materialized Views and Precomputed Joins  A Materialized View is a cache of query results  Precomputed Joins in Rya are Materialized Views for SPARQL queries consisting of joins and filters 12 Determining which Queries to Cache Materialized Views in Rya
  • 14. Rya: Accumulo Indexing Strategies for Searching Semantic Networks • During query planning, statement patterns in a query are grouped according to common variables and common constants • Groups can be precomputed and cached, trading faster response times for increased storage 13 Query Planning: Anatomy of a Rya SPARQL Query using a Precomputed Join Materialized Views in Rya ?x :livesIn :Arlington ?y :talksTo ?x ?y :livesIn :DC. ?x :commutesBy :Bike Pre-Computed Join Index table Select ?a, ?b where { ?a :livesIn :DC . ?a :talksTo ?b .} ?a=Joe, ?b=Caleb ?a=Mike, ?b=Dave … ?a=Rob, ?b=Aaron Pre- Computed Join Index Node Join Join Join select ?x ?y where { ?y :livesIn :DC. ?y :talksTo ?x. ?x :livesIn :Arlington. ?x :commutesBy :Bike.}
  • 15. Rya: Accumulo Indexing Strategies for Searching Semantic Networks • Batch Update  Re-compute precomputed join tables periodically using MapReduce  Benefit:  Minimizes data plume  Drawback:  Large possibility of stale data  Query plans that use precomputed joins may contain inaccurate results • Incrementally update tables as triples are ingested  Use some sort of observer framework to update intermediate results as new triples are ingested  Benefit:  No staleness in query results  Query plans that use precomputed joins are more accurate  Significantly less latency for updates  Drawbacks:  Data plume  Intermediate query results have to be stored to incrementally update results  Observer framework increases the complexity of the system 14 Batch Updates Strategies for Maintaining Materialized Views
  • 16. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 15 Incremental Transactions Using Apache Fluo (incubating) Fluo Background • Address maintenance problem by incrementally updating cached results using Fluo  Fluo was created for Accumulo based on the Percolator paper1 by Google Inc.  Fluo provides additional features that Accumulo does not:  Multi-row transactions prevent write-write conflicts  Observer framework (next slide) • Use cases  Maintain large scale computation using series of small transaction updates  Join existing large data cache with new data  Formerly done by periodic batch processing jobs recreating the data cache 1. Daniel Peng, Frank Dabek. USENIX. 2010. Large-scale Incremental Processing Using Distributed Transactions and Notifications. http://research.google.com/pubs/pub36726.html
  • 17. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 16 Creating an Application using the Observer Framework Fluo Background • Overview of the Fluo Observer Framework  Observers monitor a given Fluo table column  When the observed column is updated, a notification is triggered which tells the observer to perform a transaction  The transaction is specified by the implementation of the observer’s process method  Takes in a transaction object, row and column  Uses the data to perform an action such as writing to another column in the table • Perform complex incremental updates by Chaining Observers  Decompose problem into a collection of interacting observers  Observers can write notifications that trigger other observers
  • 18. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 17 Join Observer: Output: {?patron, ?employee, ?business} Statement Pattern Observer: ?patron <http://talksTo> ?employee Output: {?patron, ?employee} Statement Pattern Observer: ?employee <http://worksAt> ?business Output: {?employee, ?business} People who talk to employees and where that employee works. Pairs of people who talk to each other Where people work Streamed Statements Streamed Statements SPARQL Query SELECT ?patron ?employee ?business WHERE { ?patron <http://talksTo> ?employee. ?employee <http://worksAt> ?business } Formulating a SPARQL Query as a Chain of Observers Maintaining Precomputed Joins
  • 19. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 18 Join Observer: Output: {?patron, ?employee, ?business} Statement Pattern Observer: ?patron <http://talksTo> ?employee Output: {?patron, ?employee} Statement Pattern Observer: ?employee <http://worksAt> ?business Output: {?employee, ?business} {patron=Alice, employee=Bob, business=CoffeeShop} {patron=Alice, employee=Bob}, {patron=Charlie, employee=David} {employee=Bob, business=CoffeeShop}, {employee=Eve, business=PizzaPlace} {Alice, talksTo, Bob}, {Charlie, talksTo, David}, {Bob, worksAt, CoffeeShop}, {Eve, worksAt, PizzaPlace} {Alice, talksTo, Bob}, {Charlie, talksTo, David}, {Bob, worksAt, CoffeeShop}, {Eve, worksAt, PizzaPlace} Incrementally Creating Results Using Query Observers (1 of 2) Maintaining Precomputed Joins
  • 20. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 19 Join Observer: Output: {?patron, ?employee, ?business} Statement Pattern Observer: ?patron <http://talksTo> ?employee Output: {?patron, ?employee} Statement Pattern Observer: ?employee <http://worksAt> ?business Output: {?employee, ?business} {patron=Alice, employee=Bob, business=CoffeeShop}, {patron=Charlie, employee=David, business=CoffeeShop} {patron=Alice, employee=Bob}, {patron=Charlie, employee=David} {employee=Bob, business=CoffeeShop}, {employee=Eve, business=PizzaPlace}, {employee=David, business=CoffeeShop} {David, worksAt, CoffeeShop} {David, worksAt, CoffeeShop} Incrementally Creating Results Using Query Observers (2 of 2) Maintaining Precomputed Joins
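The two slides above can be replayed with a small in-memory sketch. It assumes nothing about the Fluo API: `JoinObserver` and `triple_observer` are hypothetical names, and plain dictionaries take the place of the Fluo table that holds intermediate results.

```python
class JoinObserver:
    """Joins {?patron, ?employee} with {?employee, ?business} incrementally."""

    def __init__(self):
        # Intermediate results kept per join key, so a new binding only has
        # to be matched against the other side instead of recomputing all.
        self.patrons_by_employee = {}
        self.businesses_by_employee = {}
        self.results = set()

    def on_talks_to(self, patron, employee):
        self.patrons_by_employee.setdefault(employee, set()).add(patron)
        for business in self.businesses_by_employee.get(employee, ()):
            self.results.add((patron, employee, business))

    def on_works_at(self, employee, business):
        self.businesses_by_employee.setdefault(employee, set()).add(business)
        for patron in self.patrons_by_employee.get(employee, ()):
            self.results.add((patron, employee, business))


def triple_observer(join, triple):
    """Route a streamed statement to the matching statement pattern."""
    subject, predicate, obj = triple
    if predicate == "talksTo":
        join.on_talks_to(subject, obj)
    elif predicate == "worksAt":
        join.on_works_at(subject, obj)


join = JoinObserver()
for statement in [
    ("Alice", "talksTo", "Bob"),
    ("Charlie", "talksTo", "David"),
    ("Bob", "worksAt", "CoffeeShop"),
    ("Eve", "worksAt", "PizzaPlace"),
]:
    triple_observer(join, statement)
first_batch = set(join.results)  # state after slide (1 of 2)

triple_observer(join, ("David", "worksAt", "CoffeeShop"))
second_batch = set(join.results)  # state after slide (2 of 2)
```

The sketch also shows the storage cost noted earlier in the deck: both sides of the join have to be retained so that a later statement can still find its partners.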
  • 21. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Adding or deleting statements to the repository requires updating the precomputed join index table This requires updates to intermediate results within the Fluo Table 20 Overview of Rya Fluo Application Maintaining Precomputed Joins Triple Observer Join Observer Filter Observer Statement Pattern Observer Query Result Observer Fluo Rya Precomputed Join (PCJ) App Rya Client Insert Triples Accumulo PCJ Index Table
  • 22. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 21 System Overview Maintaining Precomputed Joins PCJ Client Statement Stream Rya Client Rya Core Tables (SPO, POS, OSP) Rya PCJ Table Fluo App Table Accumulo Fluo Incremental PCJ App Processes ExportsResults Inserts Statements Inserts Historic SP Matches Rya PCJ Table Rya PCJ Table Rya PCJ Table 1 2 3
  • 23. Rya: Accumulo Indexing Strategies for Searching Semantic Networks PCJ Client PCJ Index Core Rya Tables Fluo App Registering a New Query Maintaining Precomputed Joins 1. Register Query 2. Scan for historic Statement Pattern matches 4. Compute Results 5. Export Results to PCJ Index 22 3. Insert Statement Pattern matches
  • 24. Rya: Accumulo Indexing Strategies for Searching Semantic Networks PCJ Client Statement Stream Core Rya Tables Fluo App Streaming While Registering Query Maintaining Precomputed Joins 2. Scan for historic Statement Pattern matches 3. Insert Statement Pattern matches 4. Compute Results 1. Register Query A. Write new Statement B. Write new Statement 23
  • 25. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Benchmark results for Rya with No Precomputed Joins and No Optimizations (Materialized Views in Rya) • The results on this slide and the next were obtained using proprietary queries and data ◦ Q1 is the most complex and consists of 14 joins, 6 of which are left joins ◦ Q2 is obtained from Q1 by removing two left joins and two joins ◦ Q3-Q5 decrease in complexity as well and are obtained using a similar process ◦ Q5 is similar to Q1, with all left joins replaced by joins • 4192 results were obtained by querying a Rya Instance with 500,000 triples installed on Parsons’ internal cluster ◦ 8 worker nodes, each with 2 x 6 Core Xeon E5-2440 (2.4GHz) Processors and 48 GB RAM • The table below presents the average query time over 10 iterations, with standard deviation:
      Q     Rya with one exact PCJ (s)   Rya with no PCJ (s)
      Q1    1.284 ± 0.047                516.774 ± 6.265
      Q2    0.851 ± 0.042                345.606 ± 5.991
      Q3    0.598 ± 0.026                180.663 ± 3.354
      Q4    0.368 ± 0.026                63.588 ± 1.527
      Q5    1.334 ± 0.074                97.101 ± 1.765
  • 26. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 25 Agenda • Rya Background • Materialized Views in Rya •Entity-Centric Index
  • 27. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 26 Facebook’s Unicorn Paper: Motivating Example Entity-Centric Index The problem is reduced to finding the intersection of lists: How do I find all documents containing “dog” and “bark”? 1. View docs and terms as a graph, with edges drawn from docs to the terms they contain 2. Efficiently represent graph as a collection of adjacency lists bark doc4 doc5dog doc6 doc1 doc4 doc2 doc5 doc7 dog bark doc3 doc8 doc6 dog bark doc1 doc2 doc3 doc4 doc5 doc6 doc4 doc5 doc6 doc7 doc8 Adjacency lists of dog and bark
  • 28. Rya: Accumulo Indexing Strategies for Searching Semantic Networks What if the adjacency lists are really large? The terms “dog” and “barks” could appear in lots of documents! • Distribute the problem by partitioning adjacency lists of documents across servers  Involves some type of sharding • Each server finds intersections of smaller lists: 27 Facebook’s Unicorn Paper: Distributing the Problem Entity-Centric Index dog bark dog bark 1 2 3 4 5 6 4 5 … 7 8 … Server 1 ShardID = 0 Server 2 ShardID = 1 ShardID = (doc num)%3 Server 3 ShardID = 2 3 6 ... 6 ... 1 4 ... 4 7 ... 2 5 ... 5 8 ... dog dog bark bark
  • 29. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 28 Unicorn Applied to Accumulo Entity-Centric Index Accumulo Key Row: doc shard Column CF: term CQ: document id • Unicorn Framework outlines the basis for a distributed document partitioned index • Accumulo has a framework1 in place for creating this index  Uses IndexedDocIterator which is an extension of an IntersectingIterator • Uses the following key design: 1. Accumulo: Application Development, Table Design, and Best Practices, Cordova A., Rinaldi B., Wall M., O’Reilly 2015
  • 30. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Unicorn Implemented in Accumulo using Intersecting Iterators (Entity-Centric Index) • Elements in the adjacency lists of “bark” and “dog” are stored in Accumulo in a Document Partitioned Index ◦ RowID = shardID (doc num % 3) ◦ Column Family = term (bark or dog) ◦ Column Qualifier = adjacency element (document number) • Using this index, “entity-centric queries” can be evaluated entirely on the server ◦ On each server, iter1 scans “bark” and iter2 scans “dog” ◦ The iterators intersect when colQ1 = colQ2, then return the result • Documents that contain dog and bark: 6 (Server 1), 4 (Server 2), 5 (Server 3) • Index contents per server:
      Server   RowID   ColF   ColQ
      1        0       bark   6
      1        0       dog    3
      1        0       dog    6
      2        1       bark   4
      2        1       bark   7
      2        1       dog    1
      2        1       dog    4
      3        2       bark   5
      3        2       bark   8
      3        2       dog    2
      3        2       dog    5
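As a rough illustration of what each server does, the sketch below shards the “dog” and “bark” adjacency lists from the earlier Unicorn slides by `doc num % 3` and intersects the two sorted lists inside each shard with a two-pointer merge. It is plain Python, not the Accumulo iterator API; `build_index` and `intersect_sorted` are hypothetical helper names.

```python
NUM_SHARDS = 3

# Adjacency lists from the Unicorn motivating example.
POSTINGS = {
    "dog": [1, 2, 3, 4, 5, 6],
    "bark": [4, 5, 6, 7, 8],
}


def build_index(postings, num_shards):
    """Group each term's documents by shard id (doc num % num_shards)."""
    index = {shard: {term: [] for term in postings} for shard in range(num_shards)}
    for term, docs in postings.items():
        for doc in sorted(docs):
            index[doc % num_shards][term].append(doc)
    return index


def intersect_sorted(left, right):
    """Two-pointer merge intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


index = build_index(POSTINGS, NUM_SHARDS)
# Each shard intersects only its own, much shorter, lists.
per_shard = {
    shard: intersect_sorted(terms["bark"], terms["dog"])
    for shard, terms in index.items()
}
```

Because both lists within a shard are already sorted, the intersection needs a single pass and no hash table, which is what allows the join to stay on the server.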
  • 31. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 30 Unicorn Applied to RDF Entity-Centric Index • Adjacency Lists capture the edge label as well as the connection pers1 pers2 pers3 dog SUV pers4 hasPet obj/dog employs subj/USGovt pers1 pers2 pers4 pers2 pers4 Adjacency lists of SUV and USGovt and do • This SPARQL query asks for all people who own a dog, drive a SUV, and work for the U.S. Government: SELECT ?person WHERE { ?person <hasPet> <dog> . ?person <drives> <SUV> . <USGovt> <employs> ?person . } drives hasPet employs USGovt employs drives drives drives obj/SUV pers2 pers3 pers4
  • 32. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Key Design in Accumulo (Entity-Centric Index) • For each triple (subj, pred, obj, graph-name), insert the following two Accumulo keys in the same Entity-Centric Index table:
      Row: <subject>, CF: <predicate>, CQ: <graphName>\x00object\x00<object>
      Row: <object>, CF: <predicate>, CQ: <graphName>\x00subject\x00<subject>
    • The triple (uri:John, uri:worksAt, uri:Parsons, graph context: parsonsEmployees) is added as the following two rows:
      Row: uri:John, CF: uri:worksAt, CQ: parsonsEmployees\x00object\x00uri:Parsons
      Row: uri:Parsons, CF: uri:worksAt, CQ: parsonsEmployees\x00subject\x00uri:John
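The key layout on this slide can be expressed as a small helper. This is a sketch, not Rya's serialization code: it assumes the `x00` in the slide text stands for a null-byte delimiter, and `entity_centric_keys` is a hypothetical function name.

```python
DELIM = "\x00"  # assumption: the slide's "x00" denotes a null-byte separator


def entity_centric_keys(subject, predicate, obj, graph):
    """Return the two (row, column family, column qualifier) keys for a triple."""
    return [
        # Indexed under the subject: the qualifier says the other end is the object.
        (subject, predicate, DELIM.join([graph, "object", obj])),
        # Indexed under the object: the qualifier says the other end is the subject.
        (obj, predicate, DELIM.join([graph, "subject", subject])),
    ]


keys = entity_centric_keys("uri:John", "uri:worksAt", "uri:Parsons", "parsonsEmployees")
```

Writing every triple twice is what places all properties of an entity, whether it appears as subject or as object, under a single row.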
  • 33. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Query Evaluation: Merge Joins and the Reduction of Network Traffic (Entity-Centric Index) SELECT ?person WHERE { ?person <hasPet> <dog>. ?person <drives> <SUV>. <USGovt> <employs> ?person. } • Using this index, “entity-centric queries” can be evaluated entirely on the server ◦ iter1 scans col: employs, colQ: subject USGovt ◦ iter2 scans col: drives, colQ: object SUV ◦ iter3 scans col: hasPet, colQ: object dog • Iterators intersect when rowID 1 = rowID 2 = rowID 3 • Results returned: pers2, pers4 • Index contents per server:
      Server   RowID    ColF      ColQ
      1        dog      hasPet    S ….
      1        pers1    hasPet    O dog
      1        pers2    employs   S USGovt
      1        pers2    drives    O bicycle
      1        pers2    drives    O SUV
      1        pers2    hasPet    O dog
      2        pers3    drives    O SUV
      2        pers4    employs   S USGovt
      2        pers4    drives    O SUV
      2        pers4    hasPet    O dog
      2        pers5    drives    O SUV
      2        SUV      drives    S ….
      2        USGovt   employs   O pers2
      2        USGovt   employs   O pers4
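The row-level intersection on this slide can be imitated with a short sketch. It walks a sorted list of keys one row at a time and keeps the rows that contain all three required column entries. The table is the slide's sample data with the elided rows left out; the tuple layout and the name `intersect_rows` are assumptions made for illustration, not Accumulo or Rya APIs.

```python
from itertools import groupby

# (row, column family, role of the other end, value of the other end)
TABLE = sorted([
    ("pers1", "hasPet", "object", "dog"),
    ("pers2", "employs", "subject", "USGovt"),
    ("pers2", "drives", "object", "bicycle"),
    ("pers2", "drives", "object", "SUV"),
    ("pers2", "hasPet", "object", "dog"),
    ("pers3", "drives", "object", "SUV"),
    ("pers4", "employs", "subject", "USGovt"),
    ("pers4", "drives", "object", "SUV"),
    ("pers4", "hasPet", "object", "dog"),
    ("pers5", "drives", "object", "SUV"),
    ("USGovt", "employs", "object", "pers2"),
    ("USGovt", "employs", "object", "pers4"),
])

# One entry per iterator on the slide (iter1, iter2, iter3).
REQUIRED = {
    ("employs", "subject", "USGovt"),
    ("drives", "object", "SUV"),
    ("hasPet", "object", "dog"),
}


def intersect_rows(table, required):
    """Return the row ids whose columns include every required entry."""
    hits = []
    for row_id, entries in groupby(table, key=lambda key: key[0]):
        columns = {(cf, role, value) for _, cf, role, value in entries}
        if required <= columns:
            hits.append(row_id)
    return hits


people = intersect_rows(TABLE, REQUIRED)
```

Only the matching row ids would need to leave the server, which is the reduction in network traffic that the slide title refers to.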
  • 34. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 33 Which Queries Can We Evaluate? Entity-Centric Index • Generalize Document Partitioned Index to accommodate a broad range of SPARQL queries • Solve as many Entity-Centric queries server side as possible  Entity-Centric means all statement patterns share a common variable or constant 33 select ?x ?y ?z where{ A aa ?x A bb ?y A cc ?z } select ?x where{ ?x aa C ?x bb B ?x cc D } select ?x ?y ?z where{ B aa ?x ?x bb ?y ?x cc ?z } B C D ?x Entity with Properties ?x ?y ?z A Properties for an Entity B ?y ?z ?x “Friends of Friends” aa bb cc aa bb cc aa bb cc
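The definition on this slide, that all statement patterns share a common variable or constant, can be checked mechanically. The sketch below assumes the shared term must appear in the subject or object position of every pattern, and `common_terms` is a hypothetical helper rather than part of Rya's query planner.

```python
def common_terms(patterns):
    """Return the terms that occur as subject or object in every pattern."""
    shared = None
    for subject, _predicate, obj in patterns:
        terms = {subject, obj}
        shared = terms if shared is None else shared & terms
    return shared or set()


# The three query shapes from the slide.
properties_for_entity = [("A", "aa", "?x"), ("A", "bb", "?y"), ("A", "cc", "?z")]
entity_with_properties = [("?x", "aa", "C"), ("?x", "bb", "B"), ("?x", "cc", "D")]
friends_of_friends = [("B", "aa", "?x"), ("?x", "bb", "?y"), ("?x", "cc", "?z")]

# A query that is not entity-centric as a whole: no term occurs in all patterns.
mixed = [
    ("?x", "livesIn", "Arlington"),
    ("?y", "talksTo", "?x"),
    ("?y", "livesIn", "D.C."),
    ("?x", "commutesBy", "Bike"),
]
```

A query like `mixed` is not entity-centric as a whole, but it can still be split into groups that are, for example the patterns that mention `?x` and the patterns that mention `?y`.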
  • 35. Rya: Accumulo Indexing Strategies for Searching Semantic Networks • During query planning, group statement patterns according to common variables and common constants • Groups which have the highest “priority” are consolidated into an Entity-Centric Index node 34 Query Planning: Anatomy of a Rya SPARQL Query using the Entity-Centric Index Entity-Centric Index ?x livesIn Arlington ?y talksTo ?x ?y livesIn D.C. ?x commutesBy Bike Entity-Centric Index … Joe, livesIn, D.C. Joe, talksTo, Rob … Rob, commutesBy, Bike Rob, livesIn, Arlington … Entity- Centric Index Node Entity-Centric Index Node 1 2 Join Join Join
  • 36. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Entity Centric Benchmarking Results (Entity-Centric Index) • Results were obtained by running 13 queries (see Appendix) against the Lehigh University Benchmark (LUBM) data set consisting of 33.34 million triples • The Entity Centric Index table was split into 19 tablets distributed across 8 servers ◦ All predicates found in LUBM data were set as locality groups for the Entity Centric Index table • Queries were issued using a BatchScanner with 15 threads
      Query           Entity Total Time (s)   Rya Total Time (s)   Results Ret.
      LUBMStar Q1     23.6                    624.702              1024789
      LUBMStar Q2     0.3724                  0.732                7
      LUBMStar Q3     0.545                   1.221                499
      LUBMStar Q4     4.37                    379.239              180002
      LUBMStar Q5     1.475                   6.072                40665
      LUBMStar Q6     0.222                   6.613                5003
      LUBMStar Q7     11                      0.3258               3
      LUBMStar Q8     7.2                     0.267                1
      LUBMStar Q9     12.763                  0.748                8
      LUBMStar Q10    34.934                  1929.984             1259374
      LUBMStar Q11    0.0412                  0.284                3
      LUBMStar Q12    0.0358                  0.311                2
      LUBMStar Q13    0.0291                  0.137                30
  • 37. Rya: Accumulo Indexing Strategies for Searching Semantic Networks • Next Steps:  Implement strategies to determine PCJs dynamically  Calculate frequent subgraphs  Develop query logging framework  Perform semantic analysis of queries to determine common components to cache  Streamline query planning with respect to PCJs  Query planning time increases as number of PCJs increases  Explore strategies for pruning PCJ query plan search space to quickly determine efficient PCJ combinations for query plans  Index PCJs using underlying query components so that PCJs can be efficiently discovered using the matching subquery 36 Next Steps for Precomputed Joins Future Research
  • 38. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Next Steps for Entity Centric Index and Server Side Join Evaluation (Future Research) • Next Steps for Entity Centric Index ◦ Explore ways to improve join performance of Entity Centric Query Nodes ◦ Add capability to explicitly define Entity Types for Index ◦ Entities implicitly defined as nodes containing specified combination of properties ◦ Explicitly register entities with index and allow users to query by type ◦ Specify entities using OWL (Web Ontology Language) class and property combinations ◦ Leverage additional structure using more targeted queries involving identifying features for the given entity type • Future Research in Server Side Join Evaluation ◦ Utilize Spark GraphX or Spark DataFrames to create a distributed query evaluation framework for Rya ◦ Joins performed on Rya Resilient Distributed Datasets (RDDs) in remote SparkContext on Server
  • 39. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 38 Questions?
  • 40. Rya: Accumulo Indexing Strategies for Searching Semantic Networks 39 • Useful Links • Entity Centric LUBM Star Queries Appendix
  • 41. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Useful Links SPARQL  Standard -- http://www.w3.org/TR/rdf-sparql-query/  Tutorial -- http://jena.apache.org/tutorials/sparql.html RDF  Primer -- http://www.w3.org/TR/rdf11-primer/ Unicorn  Paper -- Michael Curtiss, et al., Unicorn: A System for Searching the Social Graph, Facebook Inc. https://research.facebook.com/publications/unicorn-a-system-for-searching-the-social- graph/ Apache Rya (Incubating)  Home -- http://rya.apache.org/ Home page for Apache Rya (Incubating)  Rya Office Hours -- Biweekly phone conference. Updates, issues, upcoming features. Up-coming announcements with dial-in numbers are sent on the dev mailing list  Mailing List -- dev@rya.incubator.apache.org is for usage questions, help, and people who want to contribute code to Rya. subscribe, unsubscribe, archives  Javadoc OpenRDF=Sesame=RDF4J -- http://archive.rdf4j.org/javadoc/sesame-2.7.16/  Tutorial for RDF4J -- http://rdf4j.org/doc/programming-with-rdf4j/  Paper -- Punnoose R., Crainiceanu A., Rapp D. 2012. Rya: a scalable RDF triple store for the clouds. Proceedings of the 1st International Workshop on Cloud Intelligence. http://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf  Paper -- Roshan Punnoose, Adina Crainiceanu, David Rapp. SPARQL in the Clouds Using Rya. Information Systems Journal (2013). http://www.usna.edu/Users/cs/adina/research/Rya_ISjournal2013.pdf 40
  • 42. Rya: Accumulo Indexing Strategies for Searching Semantic Networks Entity Centric LUBM Star Queries (1 of 5) The Entity Centric index was tested by issuing the following queries against the LUBM data set. LUBM Star Q1 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4 WHERE { ?X ub:doctoralDegreeFrom ?Y2 ?X ub:undergraduateDegreeFrom ?Y4 ?Y1 ub:advisor ?X ?X ub:emailAddress ?Y3 } LUBM Star Q2 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 WHERE { ?X ub:doctoralDegreeFrom <http://www.University104.edu> ?X ub:headOf ?Y1 } LUBM Star Q3 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 WHERE { ?X ub:doctoralDegreeFrom <http://www.University104.edu> ?X ub:teacherOf ?Y1 } 41
  • 43. Rya: Accumulo Indexing Strategies for Searching Semantic Networks LUBM Star Q4 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y2 ?Y3 ?Y4 WHERE { ?X ub:doctoralDegreeFrom ?Y2 ?X ub:undergraduateDegreeFrom ?Y4 ?Y1 ub:advisor ?X ?X ub:emailAddress ?Y3 } LUBM Star Q5 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 WHERE { ?X ub:headOf ?Y2 ?Y1 ub:advisor ?X } LUBM Star Q6 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 WHERE { ?X ub:headOf ?Y1 ?X ub:doctoralDegreeFrom ?Y2 } 42 Entity Centric LUBM Star Queries (2 of 5)
  • 44. Rya: Accumulo Indexing Strategies for Searching Semantic Networks LUBM Star Q7 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 WHERE { <http://www.Department0.University0.edu/UndergraduateStudent106> ub:takesCourse ?Y1 ?Y2 ub:teacherOf ?Y1 } LUBM Star Q8 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y WHERE { <http://www.Department0.University114.edu/UndergraduateStudent168> ub:memberOf ?X ?X ub:subOrganizationOf ?Y } LUBM Star Q9 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4 WHERE { ?X ub:takesCourse <http://www.Department0.University101.edu/GraduateCourse31> ?X ub:undergraduateDegreeFrom ?Y1 ?X ub:emailAddress ?Y2 ?X ub:memberOf ?Y3 } 43 Entity Centric LUBM Star Queries (3 of 5)
  • 45. Rya: Accumulo Indexing Strategies for Searching Semantic Networks LUBM Star Q10 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4 WHERE { ?X ub:takesCourse ?Y4 ?X ub:undergraduateDegreeFrom ?Y1 ?X ub:emailAddress ?Y2 ?X ub:memberOf ?Y3 } LUBM Star Q11 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4 WHERE { <http://www.Department9.University150.edu/GraduateStudent72> ub:takesCourse ?Y4 <http://www.Department9.University150.edu/GraduateStudent72> ub:undergraduateDegreeFrom ?Y1 <http://www.Department9.University150.edu/GraduateStudent72> ub:emailAddress ?Y2 <http://www.Department9.University150.edu/GraduateStudent72> ub:memberOf ?Y3 } LUBM Star Q12 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?X ?Y1 ?Y2 ?Y3 ?Y4 WHERE { <http://www.Department17.University156.edu/GraduateStudent21> ub:takesCourse ?Y4 <http://www.Department17.University156.edu/GraduateStudent21> ub:undergraduateDegreeFrom ?Y1 <http://www.Department17.University156.edu/GraduateStudent21> ub:emailAddress ?Y2 <http://www.Department17.University156.edu/GraduateStudent21> ub:memberOf ?Y3 } Entity Centric LUBM Star Queries (4 of 5)
  • 46. Rya: Accumulo Indexing Strategies for Searching Semantic Networks LUBM Star Q13 = PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX ub:<http://swat.cse.lehigh.edu/onto/univ-bench.owl#> SELECT ?Y1 ?Y2 WHERE { ?Y1 ub:takesCourse <http://www.Department0.University0.edu/Course0> ?Y2 ub:teacherOf <http://www.Department0.University0.edu/Course0> } Entity Centric LUBM Star Queries (5 of 5)

Editor's notes

  1. During query planning, statement patterns in a query are grouped according to common variables and common constants. Groups that match the statement patterns of a cached precomputed join index table can then be replaced by the results stored in that table, trading increased storage for faster response times.
  2. Proof that we can stream while constructing new queries: You can stream new statements into the Core Rya Tables and the Fluo App while you are registering a new query without missing any Statement Pattern matches, as long as you finish writing the query to Fluo before you start scanning for historic matches AND you write new statements to Rya before you write them to Fluo.
     There are 4 atoms of work related to this process:
       WF: Write Statement to Fluo
       WR: Write Statement to Rya
       Q: Commit Query to Fluo
       S: Scan Rya for historic results
     A statement can go unseen in two ways:
       WF … Q: the Triple Observer will miss the Statement for the SP.
       S … WR: the historic scan will miss the Statement when scanning for the SP.
     A statement is lost only when both cases are true. We can avoid that state by following these ordering rules:
       Q must come before S
       WR must come before WF
     These rules leave us with the following operation orderings, none of which miss any statements:
       Q, S, WR, WF
       Q, WR, S, WF
       Q, WR, WF, S
       WR, WF, Q, S
       WR, Q, WF, S
       WR, Q, S, WF
     By contrast, the ordering WF, Q, S, WR breaks both rules and results in a missed statement.
  3. Using the Entity-Centric Index, lists of “bark” and “dog” are stored in Accumulo in a Document Partitioned Index
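The ordering argument in note 2 is small enough to check by brute force. The sketch below enumerates all 24 orderings of the four atoms of work, keeps those that satisfy the two rules, and tests each for the condition under which a statement is lost (written to Fluo before the query is committed, and written to Rya only after the historic scan). The function names are illustrative only.

```python
from itertools import permutations

ATOMS = ("WF", "WR", "Q", "S")


def positions(order):
    """Map each atom of work to its index in the ordering."""
    return {atom: index for index, atom in enumerate(order)}


def follows_rules(order):
    """Q must come before S, and WR must come before WF."""
    pos = positions(order)
    return pos["Q"] < pos["S"] and pos["WR"] < pos["WF"]


def misses_statement(order):
    """A statement is lost only if both Fluo and the historic scan miss it."""
    pos = positions(order)
    fluo_misses = pos["WF"] < pos["Q"]   # written to Fluo before the query exists
    scan_misses = pos["S"] < pos["WR"]   # scanned Rya before the statement arrived
    return fluo_misses and scan_misses


allowed = [order for order in permutations(ATOMS) if follows_rules(order)]
lost = [order for order in permutations(ATOMS) if misses_statement(order)]
```

Under these definitions `allowed` contains the six orderings listed in the note, and none of them appears in `lost`: losing a statement would need WF before Q and S before WR, which together with the two rules would form a cycle.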