An early look at the
LDBC Social Network Benchmark’s
Business Intelligence workload
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton, Marcus Paradies,
Moritz Kaufmann, Orri Erling, Peter Boncz, Vlad Haprian, János Benjamin Antal
GRADES-NDA @ SIGMOD
Houston, TX
2
Linked Data Benchmark Council
LDBC is a non-profit organization
dedicated to establishing benchmarks,
benchmark practices and benchmark
results for graph data management software.
LDBC’s Social Network Benchmark is
an industrial and academic initiative,
formed by principal actors in the field of
graph-like data management.
3
LDBC timeline
2012 2013 2014 2015 2016 2017 2018
30+ papers in total,
including G-CORE
(presented on Thursday)
4
Graph processing landscape
Interactive: local queries
BI: global queries
Graphalytics: computations
5
Graph processing landscape
Interactive: local queries
Example: “Recently liked by friends”
MATCH
(u:User {id: $uID})-[:FRIEND]-(f:User)-[l:LIKES]->(p:Post)
RETURN f, p
ORDER BY l.timestamp DESC
LIMIT 10
frequent updates, limited data
6
Graph processing landscape
BI: global queries
Example: “One-sided friendships”
MATCH (u1:User)-[:FRIEND]-(u2:User)-[l:LIKES]->(p:Post),
(u1)-[:AUTHOR_OF]->(p)
WITH u1, u2, count(l) AS likes
WHERE likes > 10
AND NOT (u1)-[:LIKES]->(:Post)<-[:AUTHOR_OF]-(u2)
RETURN u1, u2
lots of data, infrequent updates
7
Graph processing landscape
Graphalytics: computations
Example: “Find the most central individuals.”
BFS: breadth-first search
LCC: local clustering coefficient
PR: PageRank
SSSP: single-source shortest path
CDLP: community detection by label propagation
WCC: weakly connected components
all data, no updates
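Two of the kernels named on this slide, BFS and WCC, can be sketched in a few lines. This is only an illustration over a hypothetical in-memory adjacency dictionary (`adj` maps a node to its out-neighbours); it is not the Graphalytics reference implementation.

```python
from collections import deque


def bfs_levels(adj, source):
    """Return the hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def weakly_connected_components(adj):
    """Group nodes into components, ignoring edge direction."""
    undirected = {}
    for u, nbs in adj.items():
        undirected.setdefault(u, set())
        for v in nbs:
            undirected[u].add(v)
            undirected.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in undirected:
        if start in seen:
            continue
        # Every node reachable in the undirected view is in the same component.
        component = set(bfs_levels(undirected, start))
        seen |= component
        components.append(component)
    return components
```

Both functions read the whole graph and never modify it, which matches the "all data, no updates" profile of this workload.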
8
Graph processing landscape
Interactive: local queries; limited data, lots of updates
BI: global queries; lots of data, few updates
Graphalytics: computations; all data, no updates
9
LDBC benchmarks at a glance
[Figure: the Interactive, Business Intelligence and Graphalytics workloads plotted by amount of data accessed and expected execution time]
10
LDBC benchmarks at a glance
[Figure: the Interactive, Business Intelligence and Graphalytics workloads plotted by amount of data accessed and expected execution time, with a Social Network Benchmark label]
11
Social network graph
DATAGEN:
• Generate realistic graphs
• Multiple scale factors (SFs)
Nodes:
• Collection attributes
• Type inheritance
Edges:
• With attributes
• Edges between similar nodes
• Network of Persons
• Reply tree of Posts/Comments
13
Detailed query specifications
14 design starts here
15
Choke points
• Choke point = a challenging aspect of query processing, a well-chosen difficulty
• Allows systematic benchmark design
Peter Boncz, Thomas Neumann, Orri Erling,
TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark,
TPCTC 2013
16
Q5: Top posters in a country
1. Find the top 100 Forums by members in a given Country.
2. For each member of the top 100 Forums, count their Posts in the top 100 Forums.
CP-2.1 Rich join order optimization
Run times by join order (Sparksee, SF10):
• Forum to Country (join order 1 2 3 4): 57 s
• Country to Forum (join order 4 3 2 1): 0.3 s
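A toy illustration of why the join order matters for a query like Q5. The data layout (a membership list plus two lookup dictionaries) and both function names are hypothetical. The sketch only counts how many membership records each order touches and says nothing about how Sparksee evaluates the query.

```python
def forum_to_country(memberships, person_country, country):
    """Scan every (forum, person) membership and filter by country last."""
    touched = 0
    counts = {}
    for forum, person in memberships:
        touched += 1  # every membership record is visited
        if person_country[person] == country:
            counts[forum] = counts.get(forum, 0) + 1
    return counts, touched


def country_to_forum(forums_of_person, persons_in_country, country):
    """Start from the persons of one country, then expand to their forums."""
    touched = 0
    counts = {}
    for person in persons_in_country.get(country, ()):
        for forum in forums_of_person.get(person, ()):
            touched += 1  # only memberships of matching persons are visited
            counts[forum] = counts.get(forum, 0) + 1
    return counts, touched
```

Both orders return the same member counts per forum, but the second touches only the memberships of persons in the selected country.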
17
Choke points: optimizations
• “Top-k pushdown” optimization
• New in LDBC (not covered in TPC-H choke points)
18
Q22: International dialog
For each p1-p2 pair, calculate score and get top pair (w/ tie-break)
+ 4 if p1 replied to p2
+ 1 if p2 replied to p1
+15 if p1 knows p2
+10 if p1 liked p2’s msg
+ 1 if p2 liked p1’s msg
= max. 31 in total
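The scoring rule above can be written down directly. In this minimal sketch the dictionary keys are hypothetical names for the five conditions, and `facts` is assumed to be a dictionary of booleans computed elsewhere.

```python
# Weights copied from the slide; the key names are illustrative only.
WEIGHTS = {
    "p1_replied_to_p2": 4,
    "p2_replied_to_p1": 1,
    "p1_knows_p2": 15,
    "p1_liked_p2_msg": 10,
    "p2_liked_p1_msg": 1,
}
MAX_SCORE = sum(WEIGHTS.values())


def pair_score(facts):
    """Sum the weights of all conditions that hold for a p1-p2 pair."""
    return sum(w for name, w in WEIGHTS.items() if facts.get(name, False))
```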
19
Q22: International dialog
Avoiding full Cartesian product with Top-k pushdown:
Example #1:
• There are already k pairs with the maximum score (31).
• A pair that cannot possibly reach the maximum → prune
Example #2:
• There are already k pairs with at least 20 points.
• A pair fails the condition worth 15 points, so it can reach at most 16 → prune
+ 4 if p1 replied
+ 1 if p2 replied
+15 if p1 knows p2
+10 if p1 liked
+ 1 if p2 liked
= max. 31 in total
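The pruning idea in the two examples can be sketched as an upper-bound check against the current k-th best score. This is one possible formulation and not how any particular engine implements top-k pushdown. `checks` is a hypothetical list of predicates, one per weight, ordered by descending weight. A pair is pruned only when its bound is strictly below the k-th best score, so ties remain available for tie-breaking.

```python
import heapq

WEIGHTS = [15, 10, 4, 1, 1]  # the slide's weights, sorted descending


def top_k_scores(pairs, checks, k):
    """Return (top-k scores in descending order, number of predicate evaluations)."""
    heap = []  # min-heap holding the k best scores seen so far
    evaluations = 0
    for pair in pairs:
        score = 0
        remaining = sum(WEIGHTS)  # points this pair could still collect
        pruned = False
        for weight, check in zip(WEIGHTS, checks):
            # Upper bound = score so far + everything still obtainable.
            if len(heap) == k and score + remaining < heap[0]:
                pruned = True
                break
            evaluations += 1
            if check(pair):
                score += weight
            remaining -= weight
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, score)
        elif score > heap[0]:
            heapq.heapreplace(heap, score)
    return sorted(heap, reverse=True), evaluations
```

Checking the heaviest conditions first makes the bound drop quickly: once the 15-point condition fails, at most 16 points remain reachable.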
20
Q16: Experts in social circle
• CP-1.3 [QOPT] Top-k pushdown
• CP-7.1 [QEXE] Path pattern reuse
• CP-7.2 [QOPT] Cardinality estimation of transitive paths
• CP-7.3 [QEXE] Execution of a transitive step
Sparksee run times on SF10:
• Baseline: 29 s
• Top-k: 27 s
• Top-k + path pattern reuse: 15 s
21
Language choke points
New choke points to cover language features.
• CP-8.1: Complex patterns
• CP-8.2: Complex aggregations
• CP-8.3: Ranking-style queries
• CP-8.4: Query composition
• CP-8.5: Dates and times
• CP-8.6: Handling paths
22
Language choke points
Q22: select top pair for each city1
“LIMIT 1” not sufficient
PostgreSQL: rank()
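A global "LIMIT 1" returns one row overall, whereas this query needs the best row within each group. A minimal sketch of that per-group selection follows. The row layout `(city1, p1, p2, score)` and the tie-break rule (smaller person ids win) are assumptions made for illustration.

```python
def top_pair_per_city(rows):
    """rows: iterable of (city1, p1, p2, score). Returns {city1: (p1, p2, score)}."""
    best = {}
    for city, p1, p2, score in rows:
        # Sort key: higher score first, then smaller ids as an assumed tie-break.
        key = (-score, p1, p2)
        if city not in best or key < best[city][0]:
            best[city] = (key, (p1, p2, score))
    return {city: value for city, (_, value) in best.items()}
```

In SQL the same effect is typically obtained by ranking rows within a partition and keeping rank 1.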
23
Language choke points
Q5: top 100 forums
(Important feature of G-CORE.)
24
Language choke points
Q1: aggregate for each month
“Datetime” features:
• SQL: supported
• SPARQL: supported
• Cypher: recently added
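Aggregating per month needs the query language to extract year and month from a timestamp. A minimal sketch of that step, assuming creation dates arrive as ISO 8601 strings (the function name and input format are illustrative):

```python
from collections import Counter
from datetime import datetime


def messages_per_month(creation_dates):
    """Count records per (year, month), sorted chronologically."""
    counts = Counter()
    for ts in creation_dates:
        dt = datetime.fromisoformat(ts)
        counts[(dt.year, dt.month)] += 1
    return sorted(counts.items())
```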
26
Q25: Weighted interaction paths
1. Given two Persons, get all shortest paths on “knows” edges.
2. For each path, for each edge on the path, calculate a weight.
3. For each path, sum the edge weights.
4. Return the paths, ordered by total weight (descending).
(Q25 covers 15 choke points, including all language-related ones; its SQL implementation is about 2,500 characters long.)
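The four steps can be sketched over an in-memory graph. This is a simplified illustration: `adj` is a hypothetical adjacency dictionary for the “knows” edges, and `edge_weight` is a placeholder callback standing in for the additional pattern matching that Q25 performs on each edge.

```python
from collections import deque


def all_shortest_paths(adj, source, target):
    """Enumerate every shortest path from source to target (BFS with predecessor lists)."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                preds[nb] = [node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:
                preds[nb].append(node)  # another equally short way to reach nb

    if target not in dist:
        return []

    def build(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(target)


def weighted_paths(adj, source, target, edge_weight):
    """Steps 2-4: weigh each edge, sum per path, order by total weight descending."""
    scored = []
    for path in all_shortest_paths(adj, source, target):
        total = sum(edge_weight(a, b) for a, b in zip(path, path[1:]))
        scored.append((total, path))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored
```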
27
CP-8.6 Handling paths
1. Path unwinding: “higher-order” queries
Q25: weights based on additional pattern matching on path elements.
2. Matching semantics for paths
• Homomorphism-based (walks)
• Isomorphism-based
  ◦ No-repeated-anything
  ◦ No-repeated-edge semantics (trails)
  ◦ No-repeated-node semantics (simple paths)
Q16: “social circle” – persons connected by edge-unique paths of [x, y] hops
3. Regular path queries (RPQs)
R. Angles et al.,
Foundations of Modern Query Languages for Graph Databases,
ACM Computing Surveys, 2017
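The difference between walks, trails and simple paths shows up on a graph as small as a triangle. The sketch below enumerates bounded-length paths from a start node under each semantics. It treats edges as undirected and identifies them by list position, both of which are simplifying assumptions.

```python
def enumerate_paths(edges, start, max_hops, semantics):
    """semantics: 'walk' (anything repeats), 'trail' (no repeated edge),
    or 'simple' (no repeated node). Returns node sequences of 1..max_hops hops."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))  # undirected edge

    results = []

    def extend(nodes, used_edges):
        if len(nodes) > 1:
            results.append(tuple(nodes))
        if len(nodes) - 1 == max_hops:
            return
        for nb, idx in adj.get(nodes[-1], ()):
            if semantics == "trail" and idx in used_edges:
                continue
            if semantics == "simple" and nb in nodes:
                continue
            extend(nodes + [nb], used_edges | {idx})

    extend([start], frozenset())
    return results
```

On a triangle with up to 3 hops this yields 14 walks, 6 trails and 4 simple paths, so the chosen semantics changes both the result and the amount of work.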
28
CP-8.6 Handling paths
29
Implementing the BI workload
High-level process
1. Generate data set
2. Implement loader
3. Implement driver adapter
4. Implement queries and validate
Very time-consuming process, but…
• after 2 validated tools – still bugs in both implementations
• after 3 validated tools – still ambiguities in the spec
Validation
1. Generate validation set
2. Cross-validate for multiple SFs
3. Failure → fix issues and go to 2
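The cross-validation step amounts to comparing, query by query, the results produced by each implementation. A minimal sketch follows. The input layout (`{tool: {query_id: rows}}`) and the function name are hypothetical; a real harness would also need to normalise row order and floating-point formatting.

```python
def cross_validate(results_by_tool):
    """Return {query_id: {tool: rows}} for every query on which the tools disagree."""
    tools = sorted(results_by_tool)
    queries = sorted(set().union(*(results_by_tool[t] for t in tools)))
    mismatches = {}
    for query in queries:
        # A tool that has not implemented the query contributes None.
        answers = {t: results_by_tool[t].get(query) for t in tools}
        if len({repr(rows) for rows in answers.values()}) > 1:
            mismatches[query] = answers
    return mismatches
```

An empty result means all tools agree on the validation set. Any entry is a candidate bug in one of the implementations or an ambiguity in the specification.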
30
Implementing the BI workload
Cross-validation for implementations
Language     System            Queries
Cypher       Neo4j             25/25
SQL          PostgreSQL        25/25
Imperative   Sparksee          25/25
SPARQL       Stardog           24/25
PGQL         Oracle Labs PGX   10/25
In progress: Spark SQL
31
Roadmap
1. Help industry adoption → get more benchmark results
2. Define updates on the graph.
Necessitates complex dependency handling (SIGMOD’15 paper), and
raises many design choices:
• Affected types: which nodes/edges? what distribution?
• Nature of changes: append-only vs. insert and delete
• Granularity: nodes/edges vs. attributes
• Frequency of changes: streaming vs. batch
3. Publish as a conference paper
32
Acknowledgements
Gábor Szárnyas was partially supported by NSERC RGPIN-04573-16 (Canada) and
the MTA-BME Lendület Cyber-Physical Systems Research Group (Hungary).
DAMA-UPC research was supported by the grant TIN2017-89244-R from MINECO
(Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856
(MACDA) from AGAUR (Generalitat de Catalunya).
Sparsity thanks the EU H2020 for funding the Uniserver project (ICT-04-2015-688540).
MTA-BME Lendület
Cyber-Physical Systems Research Group
Department of Measurement
and Information Systems
Department of Electrical and
Computer Engineering


An early look at the LDBC Social Network Benchmark's Business Intelligence workload

  • 1. An early look at the LDBC Social Network Benchmark’s Business Intelligence workload Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton, Marcus Paradies, Moritz Kaufmann, Orri Erling, Peter Boncz, Vlad Haprian, János Benjamin Antal GRADES-NDA @ SIGMOD Houston, TX
  • 2. Linked Data Benchmark Council: LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. LDBC’s Social Network Benchmark is an industrial and academic initiative, formed by principal actors in the field of graph-like data management.
  • 3. LDBC timeline: 2012-2018. 30+ papers in total, including G-CORE (presented on Thursday)
  • 5. Graph processing landscape: Interactive (local queries), BI (global queries), Graphalytics (computations). Interactive: frequent upd., limited data. Example: “Recently liked by friends” MATCH (u:User {id: $uID})-[:FRIEND]-(f:User)-[l:LIKES]->(p:Post) RETURN f, p ORDER BY l.timestamp DESC LIMIT 10
  • 6. Graph processing landscape: Interactive (local queries), BI (global queries), Graphalytics (computations). Interactive: frequent upd., limited data. BI: lots of data, infreq. upd. Example: “One-sided friendships” MATCH (u1:User)-[:FRIEND]-(u2:User)-[l:LIKES]->(p:Post), (u1)-[:AUTHOR_OF]->(p) WITH u1, u2, count(l) AS likes WHERE likes > 10 AND NOT (u1)-[:LIKES]->(:Post)<-[:AUTHOR_OF]-(u2) RETURN u1, u2
  • 7. Graph processing landscape: Interactive (local queries), BI (global queries), Graphalytics (computations). Interactive: frequent upd., limited data. BI: lots of data, infreq. upd. Graphalytics: all data, no upd. Example: “Find the most central individuals.” BFS: breadth-first search; LCC: local clustering coefficient; PR: PageRank; SSSP: single-source shortest path; CDLP: community detection by label propagation; WCC: weakly connected components
  • 8. Graph processing landscape: Interactive (local queries): lots of upd., limited data. BI (global queries): lots of data, few upd. Graphalytics (computations): all data, no upd.
  • 9. LDBC benchmarks at a glance: Interactive, Business Intelligence, Graphalytics. Axes: amount of data accessed, expected execution time
  • 10. LDBC benchmarks at a glance: Interactive, Business Intelligence, Graphalytics. Axes: amount of data accessed, expected execution time. Label: Social Network Benchmark
  • 11. Social network graph. DATAGEN: • Generate realistic graphs • Multiple scale factors (SFs) Nodes: • Collection attributes • Type inheritance Edges: • With attributes • Edges between similar nodes • Network of Persons • Reply tree of Posts/Comments
  • 12. LDBC benchmarks at a glance: Interactive, Business Intelligence, Graphalytics. Axes: amount of data accessed, expected execution time
  • 15. Choke points • A choke point is a challenging aspect of query processing, a well-chosen difficulty • Allows systematic benchmark design. Reference: Peter Boncz, Thomas Neumann, Orri Erling, TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark, TPCTC 2013
  • 16. Q5: Top posters in a country. 1. Find the top 100 Forums by members in a given Country. 2. For each member of the top 100 Forums, count their Posts in the top 100 Forums. CP-2.1 Rich join order optimization. Join orders Forum to Country vs. Country to Forum: 57 s vs. 0.3 s (Sparksee, SF10)
  • 17. Choke points: optimizations • “Top-k pushdown” optimization • New in LDBC (not covered in TPC-H choke points)
  • 18. Q22: International dialog. For each p1-p2 pair, calculate score and get top pair (w/ tie-break): +4 if p1 replied to p2, +1 if p2 replied to p1, +15 if p1 knows p2, +10 if p1 liked p2’s msg, +1 if p2 liked p1’s msg = max. 31 in total
  • 19. Q22: International dialog. Avoiding full Cartesian product with Top-k pushdown: Example #1: • There are k pairs with maximum points (31). • A pair cannot possibly achieve max. points → prune Example #2: • There are k pairs with at least 20 points. • A pair fails the condition for 15 points → prune. Scoring: +4 if p1 replied, +1 if p2 replied, +15 if p1 knows p2, +10 if p1 liked, +1 if p2 liked = max. 31 in total
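The pruning idea on slide 19 can be sketched in Python. This is a minimal illustration under stated assumptions, not the benchmark's reference implementation: the component names in `WEIGHTS`, the `checks` predicates and the function `top_k_pairs` are hypothetical, `k` is assumed to be at least 1, and ties are broken by comparing the pair identifiers.

```python
import heapq

# Score components of Q22 (slide 18), tried in descending weight order.
# The component names are hypothetical labels, not taken from the benchmark spec.
WEIGHTS = [("knows", 15), ("p1_liked", 10), ("p1_replied", 4),
           ("p2_replied", 1), ("p2_liked", 1)]


def top_k_pairs(pairs, checks, k):
    """Return the k best (score, pair) tuples and the number of checks executed.

    pairs:  iterable of comparable pair identifiers, e.g. (p1, p2) tuples
    checks: dict mapping a component name to a predicate pair -> bool
    """
    heap = []        # min-heap of (score, pair); heap[0] is the weakest kept pair
    evaluated = 0    # number of component checks actually executed
    for pair in pairs:
        score = 0
        remaining = sum(w for _, w in WEIGHTS)  # points still reachable
        pruned = False
        for name, weight in WEIGHTS:
            # Prune: even the best case cannot reach the current k-th score.
            if len(heap) == k and score + remaining < heap[0][0]:
                pruned = True
                break
            evaluated += 1
            if checks[name](pair):
                score += weight
            remaining -= weight
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (score, pair))
        elif (score, pair) > heap[0]:
            heapq.heapreplace(heap, (score, pair))
    return sorted(heap, reverse=True), evaluated
```

Once a pair with 31 points is known and k = 1, a pair that fails the 15-point check can reach at most 16 points, so its remaining four checks are skipped.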
  • 20. Q16: Experts in social circle • CP-1.3 [QOPT] Top-k pushdown • CP-7.1 [QEXE] Path pattern reuse • CP-7.2 [QOPT] Cardinality estimation of transitive paths • CP-7.3 [QEXE] Execution of a transitive step. Sparksee run times on SF10: Baseline 29 s; Top-k 27 s; Top-k + Path pattern reuse 15 s
  • 21. Language choke points: new choke points to cover language features. • CP-8.1: Complex patterns • CP-8.2: Complex aggregations • CP-8.3: Ranking-style queries • CP-8.4: Query composition • CP-8.5: Dates and times • CP-8.6: Handling paths
  • 22. Language choke points (same list as slide 21). Q22: select top pair for each city1; “LIMIT 1” is not sufficient. PostgreSQL: rank()
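Why a single “LIMIT 1” falls short can be illustrated with a small Python sketch: the query needs one winning row per city1, which is what a rank() window partitioned by city1 provides in SQL. The row layout `(city1, p1, p2, score)`, the function name `top_pair_per_city` and the tie-break rule (higher score first, then smaller p1, then smaller p2) are assumptions made for this example.

```python
def top_pair_per_city(rows):
    """Keep the best-ranked (p1, p2, score) row for every city1.

    rows: iterable of (city1, p1, p2, score) tuples (hypothetical layout).
    """
    best = {}
    for city1, p1, p2, score in rows:
        # Smaller key means better rank: high score first, then the tie-break.
        key = (-score, p1, p2)
        if city1 not in best or key < best[city1][0]:
            best[city1] = (key, (p1, p2, score))
    return {city: row for city, (_, row) in best.items()}
```

A global top-1 would return only one row in total, whereas this returns one row per group.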
  • 23. Language choke points (same list as slide 21). Q5: top 100 forums (Important feature of G-CORE.)
  • 24. Language choke points (same list as slide 21). Q1: aggregate for each month. “Datetime” features: • SQL • SPARQL • Cypher: recently added
  • 25. Language choke points (same list as slide 21)
  • 26. Q25: Weighted interaction paths. 1. Given two Persons, get all shortest paths on “knows” edges. 2. For each path, for each edge on the path, calculate a weight. 3. For each path, sum the weights. 4. Return paths, ordered by weights (desc). (Q25 covers 15 CPs, incl. all language-related ones; its SQL impl. is ~2500 chars)
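The four steps of Q25 can be sketched in Python as follows. This is only an illustration: the adjacency-dict graph representation, the function names and the `edge_weight` callable are hypothetical, and the actual weight in the benchmark comes from additional pattern matching that is not reproduced here.

```python
from collections import deque


def all_shortest_paths(adj, src, dst):
    """Enumerate every shortest path from src to dst in an unweighted graph.

    adj: dict mapping a node to an iterable of neighbours ("knows" edges).
    """
    dist = {src: 0}
    preds = {src: []}  # predecessors of each node on shortest paths from src
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break  # all predecessors of dst have been recorded by now
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if dst not in dist:
        return []

    def build(v):
        if v == src:
            return [[src]]
        return [path + [v] for u in preds[v] for path in build(u)]

    return build(dst)


def weighted_paths(adj, src, dst, edge_weight):
    """Steps 2-4: weigh every edge, sum per path, order by weight descending."""
    scored = [
        (sum(edge_weight(a, b) for a, b in zip(path, path[1:])), path)
        for path in all_shortest_paths(adj, src, dst)
    ]
    scored.sort(key=lambda wp: (-wp[0], wp[1]))
    return scored
```

The breadth-first search records every predecessor that lies on a shortest path, so the paths are reconstructed without enumerating longer walks.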
  • 27. CP-8.6 Handling paths. 1. Path unwinding: “higher-order” queries. Q25: weights based on additional pattern matching on path elements. 2. Matching semantics for paths • Homomorphism-based (walks) • Isomorphism-based: no-repeated-anything, no-repeated-edge semantics (trails), no-repeated-node semantics (simple paths). Q16: “social circle”: persons connected by edge-unique paths of [x, y] hops. 3. Regular path queries (RPQs). Reference: R. Angles et al., Foundations of Modern Query Languages for Graph Databases, ACM Computing Surveys, 2017
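The difference between the matching semantics can be made concrete with a short Python sketch that enumerates paths of [x, y] hops from a start node. The function `paths` and its parameters are hypothetical, and the graph is assumed to be undirected with no parallel edges.

```python
def paths(adj, start, min_hops, max_hops, semantics):
    """Yield node lists reachable from start within [min_hops, max_hops] hops.

    semantics: "walk"   (homomorphism-based, anything may repeat),
               "trail"  (no repeated edge),
               "simple" (no repeated node).
    """
    def extend(path, used_edges):
        hops = len(path) - 1
        if hops >= min_hops:
            yield list(path)
        if hops == max_hops:
            return
        u = path[-1]
        for v in adj.get(u, ()):
            edge = frozenset((u, v))  # undirected edge identity
            if semantics == "trail" and edge in used_edges:
                continue
            if semantics == "simple" and v in path:
                continue
            yield from extend(path + [v], used_edges | {edge})

    yield from extend([start], frozenset())
```

On a triangle a-b-c with 1 to 3 hops from a, this yields 14 walks, 6 trails and 4 simple paths; the closed path a-b-c-a is a trail but not a simple path.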
  • 29. Implementing the BI workload. High-level process: 1. Generate data set 2. Implement loader 3. Implement driver adapter 4. Implement queries and validate. Very time-consuming process, but… • after 2 validated tools: still bugs in both implementations • after 3 validated tools: still ambiguities in the spec. Validation: 1. Generate validation set 2. Cross-validate for multiple SFs 3. Failure → fix issues and go to 2
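A cross-validation step of this kind could look like the following Python sketch. It is an assumption-laden illustration rather than the LDBC driver's logic: the function `cross_validate` and the nested-dict input format are hypothetical, and results are compared by their `repr`, which presumes each tool returns rows in a deterministic order.

```python
def cross_validate(results_by_tool):
    """Report, per query, which tools disagree with each other.

    results_by_tool: dict tool name -> dict query name -> list of result rows.
    Returns a dict query name -> sorted list of tool groups; a query is listed
    only if the tools do not all return the same result.
    """
    mismatches = {}
    queries = set().union(*(r.keys() for r in results_by_tool.values()))
    for query in sorted(queries):
        groups = {}
        for tool, results in results_by_tool.items():
            # Tools with identical results for this query share one group.
            groups.setdefault(repr(results.get(query)), []).append(tool)
        if len(groups) > 1:
            mismatches[query] = sorted(sorted(g) for g in groups.values())
    return mismatches
```

An empty return value means all tools agree on every query for the given scale factor; otherwise the groups show which implementations to inspect before re-running the validation.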
  • 30. Implementing the BI workload. Cross-validation for implementations: Cypher (Neo4j): 25/25; SQL (PostgreSQL): 25/25; Imperative (Sparksee): 25/25; SPARQL (Stardog): 24/25; PGQL (Oracle Labs PGX): 10/25. In progress: Spark SQL
  • 31. Roadmap 1. Help industry adoption → get more benchmark results 2. Define updates on the graph. Necessitates complex dependency handling (SIGMOD’15 paper), and raises many design choices: • Affected types: which nodes/edges? what distribution? • Nature of changes: append-only vs. insert and delete • Granularity: nodes/edges vs. attributes • Frequency of changes: streaming vs. batch 3. Publish as a conference paper
  • 32. Acknowledgements: Gábor Szárnyas was partially supported by NSERC RGPIN-04573-16 (Canada) and the MTA-BME Lendület Cyber-Physical Systems Research Group (Hungary). DAMA-UPC research was supported by the grant TIN2017-89244-R from MINECO (Ministerio de Economia, Industria y Competitividad) and the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya). Sparsity thanks the EU H2020 for funding the Uniserver project (ICT-04-2015-688540). MTA-BME Lendület Cyber-Physical Systems Research Group; Department of Measurement and Information Systems; Department of Electrical and Computer Engineering