Alexandru Iosup, Full Professor, Vrije Universiteit Amsterdam (VU Amsterdam)
Angela Bonifati, Full Professor of Computer Science, Université de Lyon
Hannes Voigt, Software Engineer, Neo4j
The Future is Big Graphs: A Community View on Graph Processing Systems
1. A panel discussion with
Angela Bonifati, Alexandru Iosup, and Hannes Voigt
The Future Is Big Graphs:
A Community View on Graph
Processing Systems
2. Graphs are everywhere!
Graphs provide a universal and simple
blueprint for how to look at the world and
make sense of it.
Tech-driving applications
= data science + multi-hop relationships
Everyone* uses graphs!
* not yet :-(
3. Graph analytics at scale
Multi-hop analysis faces a combinatorial scaling problem: every step deeper into the
graph multiplies the number of choices and cases to consider.
Dealing with this technical challenge is not the typical business interest of a user.
Which challenges are ahead of us to ready graph processing systems for the future?
Challenges to overcome: Abstractions, Ecosystems, Performance
[Source: https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Maze.svg]
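The combinatorial growth described on this slide can be illustrated with a minimal sketch. It assumes a toy tree in which every node has exactly three children; the helper names `complete_tree` and `nodes_within_hops` are hypothetical and not taken from the talk or the article. Real graphs have cycles and skewed degrees, so the growth is less regular, but the multiplicative effect per hop is the same.

```python
def complete_tree(branching, depth):
    """Build a toy tree (assumption) where every inner node has `branching` children."""
    adjacency = {}
    next_id = 1
    level = [0]
    for _ in range(depth):
        new_level = []
        for parent in level:
            children = list(range(next_id, next_id + branching))
            next_id += branching
            adjacency[parent] = children
            new_level.extend(children)
        level = new_level
    return adjacency


def nodes_within_hops(adjacency, start, max_hops):
    """Count how many nodes are first reached at each hop, using breadth-first search."""
    seen = {start}
    frontier = [start]
    counts = [1]  # hop 0: only the start node
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        counts.append(len(next_frontier))
        frontier = next_frontier
    return counts


tree = complete_tree(branching=3, depth=4)
hop_counts = nodes_within_hops(tree, start=0, max_hops=4)
print(hop_counts)  # [1, 3, 9, 27, 81]
```

With a branching factor of 3, each additional hop triples the number of newly reached nodes, which is why deep multi-hop queries become expensive so quickly.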
4. Challenges require expertise of many different fields
● Computer systems
● Data management theory
● Data management systems
● Data analytics
● Visualization
● Computer human interaction
● ...
[collaborate by ArmOkay from the Noun Project]
5. Community View on Graph Processing Systems
● Dagstuhl Seminar 19491
● Big Graph Processing Systems
● December 1–6, 2019
● CACM article summarizes
the findings of the seminar
https://cacm.acm.org/magazines/2021/9/255040-the-future-is-big-graphs/
6. Angela Bonifati
● Professor at Lyon 1 University and
the Liris CNRS lab in France
● Leads the Database group at
Lyon 1
● Adjunct Professor at the
University of Waterloo in Canada
● Co-authored several publications in
major venues of the data
management field, two books and
an invited paper in ACM Sigmod
Record 2018
● Program Chair of ACM Sigmod
2022 and the Chair of the EDBT
Executive Committee.
Hannes Voigt
● Software engineer at Neo4j
● PhD in computer science from
TU Dresden, Germany
● Works on graph query language
design and standardization
● Chairs the Expert Group on
GQL in the US standardization
organization INCITS and is a
US national expert in ISO/IEC
JTC1 SC32 Working Group 3
Alexandru Iosup
● Full professor at Vrije
Universiteit Amsterdam
● Chair of the Massivizing
Computer Systems research
group at the VU and of the
SPEC-RG Cloud group
● His work in distributed systems
and ecosystems has received
prestigious recognition, including
membership in the (Young)
Royal Academy of Arts and
Sciences of the Netherlands and
the 2016 Netherlands ICT
Researcher of the Year award.
Sherif Sakr
was a professor at the Institute of Computer Science at the University of Tartu, Estonia.
He passed away on March 25, 2020, at the age of 40.
7. Abstractions
● Abstractions are widely used in computer science
and are fundamental for graph-oriented data to
incorporate the user’s conceptual and
domain-specific perspectives
● Several graph data models exist (RDF,
property graphs, and their variants, etc.)
● The choice of the data model depends on the final
application
● Instead of choosing one data model, we
propose to leverage a lattice of data models
that are interoperable through mappings
● The lattice seamlessly characterizes
the expressive power of the various data
models
● Graph operations and graph queries could
benefit from logical formalisms so as to create
a “universal” data model
● Graph algebras need to be defined
alongside the standard query language in
order to respond to the needs of graph
processing systems in the near future
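The idea of data models made interoperable through mappings can be sketched as follows. This is only an illustration under simplifying assumptions: the function `property_graph_to_triples` and the dictionary layout are hypothetical, the output is a list of plain subject-predicate-object tuples rather than real RDF, and edge properties are deliberately left out. That omission hints at why the models differ in expressive power: a plain triple has no slot for properties on an edge.

```python
def property_graph_to_triples(nodes, edges):
    """Map a tiny property graph to subject-predicate-object triples.

    nodes: dict of node id -> dict of properties (hypothetical layout)
    edges: list of (source, label, target) tuples, without edge properties
    """
    triples = []
    # Each node property becomes one triple about that node.
    for node_id, props in nodes.items():
        for key, value in sorted(props.items()):
            triples.append((node_id, key, value))
    # Each labeled edge becomes one triple linking two nodes.
    for source, label, target in edges:
        triples.append((source, label, target))
    return triples


nodes = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}
edges = [("alice", "knows", "bob")]
triples = property_graph_to_triples(nodes, edges)
for triple in triples:
    print(triple)
```

A mapping in the other direction would have to decide which triples become node properties and which become edges, which is one reason a family of mappings between models is more practical than a single fixed model.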
8. Ecosystems
Graph processing ecosystems are complex, coupling many diverse systems
● Characterize complex workflows, combining heterogeneous queries and
algorithms, managing and processing diverse datasets
● Develop standards as common technical foundation, thereby increasing the
mobility of applications, tooling, developers, users, and stakeholders
● Define a reference architecture for big graph processing and the discussion
around the design, development, and deployment of cloud computing solutions
● Beyond scale-up vs. scale-out: Automatically decide how to run a workload, on
what kind of heterogeneous infrastructure, to meet given SLAs
● Cope with dynamic and complex graph data streams beyond simple filtering,
projections, and aggregation
9. Performance
● Invent simple yet general, faithful
performance metrics (better: just one)
● From algorithm to pipeline to
(remote) workflows
● Create tractable benchmarks that produce
credible and reproducible results
● Understand result sensitivity to HPAD,
and (link to ML) predict performance and
port the predictions across platforms
● Tame hardware and software complexity →
system-under-test performance is difficult
to isolate, and performance variability is
difficult to tame
● Processes to curate, audit, and evolve
benchmarks → possibly through LDBC
● Scale workload generators
● Cover combined operations, with temporal,
spatial, and streaming aspects
● Explore the trade-off of specialization vs.
portability and interoperability
● Explore extreme specialization
● Explore standardizing graph libraries and
query languages
● Explore integrating graph processing into
cross-domain tools, e.g., ML, sim, DB
● A memex for big graph processing
10. Discussion on Other Important Topics (Q&A)
Q: Pre-relational models represented data as graphs, too. Remember the CODASYL network
model? Is this all a big step back?
Q: So far, data science has blossomed quite well without tackling challenges of “abstractions” and
“ecosystems” and such. A number of Python scripts do the job just fine, don’t they?
Q: The article seems to favor the views of some specific tech companies. Are there solutions that do
not require solving these challenges?
11. The Future of Graph Processing Systems
If we want to leverage the power of graphs in breadth, mere hacking won’t cut it!
Expecting everyone to master all the technical details of scaling graph analytics won’t
either.
Vision: Have a common graph analytics application ready in 1 hour, including:
● Deployment in a specific ecosystem,
● Deployment from the laptop + digital notebook,
● The performance engineering that allows it to operate at scale
12. A panel discussion with
Angela Bonifati, Alexandru Iosup, and Hannes Voigt
The Future Is Big Graphs:
A Community View on Graph
Processing Systems
https://cacm.acm.org/magazines/2021/9/255040-the-future-is-big-graphs/