SlideShare una empresa de Scribd logo
1 de 48
Descargar para leer sin conexión
CIP2017-06-18
Multiple Graphs
Presentation at oCIG 8 on April 11, 2018
Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j)
History
● Started working on multiple graphs early 2017
● Parallel work in LDBC query language task force
● Built first non-prototype proposal for oCIM 3 in Nancy (Nov 2017)
● Lots of great feedback but outcome incomplete
● Inspired by Nancy we started to reconstruct the MG proposal
○ Starting from first principles (data model, execution model, language semantics)
○ Incrementally add orthogonal concern
○ Aim for simple, minimal design
○ Reduce scope of work
● Today: Primer for core of new proposal draft
Outline
● Multiple property graphs model
● Query processing model (Catalog and working graph)
● Working with multiple graphs (Matching, updating, returning graphs)
● Graph operations (Graph projection and graph union)
● Updating the catalog
● Examples
● Discussion (Status, timeline, next steps)
Why multiple graphs?
- Combining and transforming graphs from multiple sources
- Versioning, snapshotting, computing difference graphs
- Graph views for access control
- Provisioning applications with tailored graph views
- Shaping and integrating heterogeneous graph data
- Roll-up and drill-down at different levels of detail
Graph
Management
Graph
Modeling
4
Multiple Property Graph Model
Cypher today: (Single) property graph model
Graph Database System
(e.g. a cluster)
The (single) Graph
Application Server
Client 1
Client 2
Client 3
6
Cypher 10: Multiple property graphs model
Graph Database System
(e.g. a cluster)
Multiple Graphs
Application Server
Client 1
Client 2
Client 3
7
Classic Property Graph Model
● A single graph
● Labeled nodes with properties
● Typed, directed relationships with properties
8
Multiple Property Graphs Model
● Multiple graphs
● Labeled nodes with properties and typed, directed
relationships with properties existing in a single graph
● The nodes of a relationship R must be part
of the same graph as R; i.e. each graph is independent
● The graph(entity) function returns the graph id of the entity
○ An entity (a node or a relationship) in Cypher 10 needs to track its graph
(to differentiate whether or not two entities are from the same graph)
○ graph(entity) is a value that uniquely identifies a graph within the
implementation
○ This is a two-way process: a graph will also be aware of the entities it contains
● The id(entity) function returns the (entity) id of entity in graph(entity)
○ id(entity) is a value that uniquely identifies an entity within its graph and types
(i.e. nodes and relationships do not share the id space)
○ The format of ids is implementation specific
○ This does not change between Cypher 9 and Cypher 10
Graph and entity ids
9
Id validity and comparability
10
Validity of ids describes the duration for which an id is not assigned to another object.
Validity of graph and entity ids are only guaranteed for the duration of a query!
Validity of entity ids is only guaranteed from the matching of an entity to the consumption of a
query result. In other words: for the duration of the query.
Validity, entity ids are guaranteed to not be re-used per graph for the duration of the query.
These are the minimal lifetime guarantees mandated by Cypher 10.
Implementations may provide extended validity guarantees as they see fit.
Comparability of ids
Graph and entity ids are suggested to be comparable values,
i.e. can be compared with <, <=, =, >=, >
Query Processing Model
Query processor
12
Query Processor = Query Processing Service (e.g. a single server or a cluster)
G1 G2
G3
Catalog
Graph catalog
● A graph catalog maps graph names to graphs
● A qualified graph name is an (optionally dotted) identifier
● A graph name may be contextual; i.e. only work inside a specific
○ server
○ database
○ ...
● There are no further restrictions placed on graphs; e.g. graphs may be
○ mutable or immutable
○ exist only for a limited time (session, transaction, ...)
○ visible to all or only some of the users of the system
○ ...
13
Syntax for qualified graph names
● Qualified graph name syntax follows function name syntax
● Parts
○ graph namespace (optional)
○ graph name
● Syntax
○ foo.bar.baz
○ `foo`.`bar` `foo.bar`
○ `foo.bar`.`baz.grok` (namespace foo.bar, graph baz.grok)
○ Number of periods can be 0 to *
○ Namespace and name separation is implementation-dependent
○ Last part is always a suffix of the graph name
14
1. A clause is always operating within the context of a working graph
a. by reading from it
b. by updating it
2. During the course of executing a query
a. the working graph may be assigned by name
(i.e. assigned to a graph from the catalog)
b. the working graph may be assigned by graph projection
(i.e. assigned by creating a new working graph)
c. the working graph may be returned as a query result
d. the graph id of the current working graph may be obtained using the graph() function
Working graph
15
Initial working graph
● Determined by Query Processor
● This allows for implementation freedom
○ This includes having no initial working graph
○ Queries that need an initial working graph fail if no working graph has been set
● Example
○ In Neo4j: Upon establishing a session via Bolt, an initial working graph to be used by
subsequent queries is established (perhaps per user etc.)
○ In CAPS: Cypher queries are always executed explicitly against a graph which implicitly
becomes the initial working graph
16
Working graph operations
● Assign working graph by name (from catalog lookup)
○ Access multiple graphs within one query
● Assign working graph by graph projection (next section)
○ Consolidate data from one or more data sets into one graph
● Return (working) graph from query
○ Construct temporary graph via multiple update queries
(e.g. during interactive workflow)
○ Return (export) graphs (e.g. system graph)
17
Working with multiple graphs
// Set the working graph to the graph foo in the catalog for reading
FROM foo
// Set the working graph to the graph foo in the catalog for updating
UPDATE foo
// Change between reading and updating
FROM GRAPH
UPDATE GRAPH
In any case,
● Does not consume the current cardinality
● Retains all variable bindings
Select working graph
19
Example: Reading from multiple graphs
Which friends to invite to my next dinner party?
[1] FROM social_graph
[2] MATCH (a:Person {ssn: $ssn})-[:FRIEND]-(friend)
[3] FROM salary_graph
[4] MATCH (f:Entity {ssn: friend.ssn})<-[:INCOME]-(event)
[5] WHERE $startDate < event.date < $endDate
[6] RETURN friend.name, sum(event.amount) as incomes
20
Matching with multiple graphs
FROM persons
MATCH (a:Person)
FROM cars
MATCH (b:Car)-[:OWNED_BY]->(a) // Does NOT match
Updating multiple graphs
FROM persons
MATCH (a:Person)
UPDATE cars
CREATE (b:Car)-[:OWNED_BY]->(a) // Error
Copy patterns
How to copy data between graphs?
FROM persons
MATCH (a:Person)
UPDATE cars
CREATE (b:Car)-[:OWNED_BY]->(x COPY OF a) // This works
● COPY OF node Copies labels, properties
● COPY OF rel Copies relationship type, properties (ignores start, end!)
Also usable in matching: MATCH (COPY OF a) ...
22
Returning a graph
● Cypher 10 queries may return a graph
● Three variations of consuming a graph returned by a query
a. Deliver graph to client
b. Pass graph to another query; e.g. within a subquery (not covered in these slides)
c. Store graph in the catalog; i.e. registering a returned graph under a new name
● Delivering a result graph from the query to the client is always by value
a. All nodes with their labels and properties
b. All relationships with their relationship type, start node, end node, and their properties
c. Result graph is completely independent from graphs managed by the query processor
23
Syntax for returning a graph
// Returns the working graph
...
RETURN GRAPH
// Removing the current cardinality while keeping the working graph
...
WITH GRAPH
...
// Return a graph from the catalog
FROM graphName
RETURN GRAPH
24
Graph operations
Graph operations
● Graph construction
● Common graph union
● Disjoint graph union
Graph construction
Graph construction dynamically constructs a new working graph
● for querying, storing in the catalog, later updating
● using entities from other graphs (this is called cloning)
Simple example
MATCH (a)-[:KNOWS {from: "Berlin"}]->(b)
CONSTRUCT
CLONE a AS x, b AS y
CREATE (x)-[:MET_IN_BERLIN]->(y)
RETURN GRAPH
Cloning nodes
Take an original entity a and create a representative clone c in the constructed graph
MATCH (a)
CONSTRUCT
CLONE a AS c
// properties(a) = properties(c) AND labels(a) = labels(c)
// but: a <> c because graph(a) <> graph(c)
...
Cloning the same original multiple times still only creates a single clone
MATCH (root)-[:PARENT_OF*]->(child)
CONSTRUCT
CLONE root, child // CLONE can shadow variables like WITH
MERGE (root)-[:ANCESTOR_OF]->(child)
...
Cloning relationships
Cloning a relationship r implicitly clones startNode(r) and endNode(r)
Cloning is powerful enough to reconstruct whole subgraphs!
FROM car_owners
MATCH (a:Person {city: "Berlin"})-[r:OWNERS]->(c:Car)
CONSTRUCT
CLONE r
// cars and their owners from berlin
RETURN GRAPH
Clone paths and complex values
● Cloning a path p clones all nodes(p) and rels(p)
● More generally, cloning a value clones all contained entities!
MATCH (a:Person), (b:Person)
MATCH p = (a)-[:KNOWS*]-(b)
CONSTRUCT
CLONE a, b, p
CREATE (a)<-[:START]-(x:Path {steps: size(p)})-[:END]->(b)
RETURN GRAPH
Cloning graphs
Clone all entities from given graphs into a new graph
CONSTRUCT ON social_graph
Clone all entities from working graph into new graph
CONSTRUCT ON GRAPH
Cloning a graph and a contained entity still only creates a single clone for the entity
MATCH (a) WITH * LIMIT 1
CONSTRUCT ON GRAPH
CLONE a AS x
// creates only one x for the given a in the whole graph
// (re-using clones created by CONSTRUCT ON GRAPH)
Finding clones
Sometimes it may be necessary to find clones instead of creating them
MATCH (x)-[:OWNS]->(y)
CONSTRUCT
CLONE x AS c
CREATE (x)-[:OWNS]->(COPY OF y)
...
WITH x, count(*) AS num_clones // lost x here!
MATCH (CLONE OF x) // found the clone of x again
...
General form of CONSTRUCT
CONSTRUCT
[ON GRAPH]
[ON graph1, graph2, ...]
[CLONE expr [AS alias], variable, ... | * ]
<updating clause 1>
<updating clause 2>
...
WITH ... | WITH GRAPH | RETURN ... | RETURN GRAPH
Provenance tracking
So far, we've only looked at one step of graph construction. What about?
[1] MATCH (a) // single node a
[2] CONSTRUCT
[3] CLONE a AS c // c <> a, it is a clone of a
[4] WITH a, c
[5] CONSTRUCT
[6] CLONE a AS x, c AS y // x <> a is a clone of a, y <> c is a clone of c
[7] ...
But x = y (i.e.they are the same clone). This is called provenance tracking.
It's useful for updatable views and graph union.
Updatable views
CONSTRUCT
...
// build the view
...
WITH GRAPH
UPDATE GRAPH
...
// update the view
...
Updates during graph construction
General rule: Maintain logical consistency regarding provenance tracking
CREATE Create in the constructed graph
MERGE Merge in the constructed graph
SET|REMOVE Not allowed on clones (for now)
[DETACH] DELETE Delete from the constructed graph
Updates after graph construction
General rule: Only allow updates that can be propagated safely
CREATE Not allowed
MERGE
Not allowed
SET|REMOVE Update base data
[DETACH] DELETE Not allowed
Common graph union
Clones of same original node are cloned, other entities are copied
CONSTRUCT
...
RETURN GRAPH
UNION
CONSTRUCT
...
RETURN GRAPH
Disjoint graph union
All entities (including clones) are copied (i.e. this breaks provenance tracking)
CONSTRUCT
...
RETURN GRAPH
UNION ALL
CONSTRUCT
...
RETURN GRAPH
Updating the catalog
CREATE GRAPH foo // Create new graph 'foo'
DELETE GRAPH foo // Delete graph 'foo'
COPY foo TO bar // Copy graph 'foo' with schema
RENAME foo TO bar // Rename graph 'foo' to 'bar'
TRUNCATE foo // Remove data but keep schema in 'foo'
Catalog side-effects
41
// create a new base graph in the catalog
CREATE GRAPH myGraph
// copy content of graph as new graph into the catalog
CREATE GRAPH bar { <some query that returns a graph> }
These are always top-level clauses; i.e. they may not be nested
Create new base graph (snapshot) in catalog
42
Copy data from one base graph to another
We can accomplish this by using DML and hand crafting the queries
MATCH (a)-[r]->(b)
UPDATE foo
CREATE ...
MERGE ...
SET ...
43
Schema updates
Need to select the graph
UPDATE foo // Different keyword?
CREATE CONSTRAINT ...
CREATE INDEX ...
...
44
Catalog function
catalog() Current working graph name in catalog or NULL
catalog(graph(e)) Name of graph of e in catalog or NULL
Summary
Status
● Working on two CIPs
○ Multiple graphs CIP (This talk, also: larger examples)
○ Query composition CIP
○ Features: views, nested subqueries, equivalence, more graph operations
● Future work: More elaborate updatable views, graph aggregation
● Goal is to finalize draft of multiple graphs CIP for next oCIM
● Design heavily influenced by feedback from CAPS project and other
implementation concerns
Feedback
Please feedback by commenting on
● slides or
● CIP or
● openCypher implementers slack
Thanks!

Más contenido relacionado

La actualidad más candente

Answers+of+C+sample+exam.docx
Answers+of+C+sample+exam.docxAnswers+of+C+sample+exam.docx
Answers+of+C+sample+exam.docx
ismailaboshatra
 
Performing calculations using java script
Performing calculations using java scriptPerforming calculations using java script
Performing calculations using java script
Jesus Obenita Jr.
 

La actualidad más candente (20)

GATE Computer Science Solved Paper 2004
GATE Computer Science Solved Paper 2004GATE Computer Science Solved Paper 2004
GATE Computer Science Solved Paper 2004
 
2 R Tutorial Programming
2 R Tutorial Programming2 R Tutorial Programming
2 R Tutorial Programming
 
The dag representation of basic blocks
The dag representation of basic blocksThe dag representation of basic blocks
The dag representation of basic blocks
 
Data structure manual
Data structure manualData structure manual
Data structure manual
 
Oops recap
Oops recapOops recap
Oops recap
 
R Data Visualization Tutorial: Bar Plots
R Data Visualization Tutorial: Bar PlotsR Data Visualization Tutorial: Bar Plots
R Data Visualization Tutorial: Bar Plots
 
CS8391 Data Structures Part B Questions Anna University
CS8391 Data Structures Part B Questions Anna UniversityCS8391 Data Structures Part B Questions Anna University
CS8391 Data Structures Part B Questions Anna University
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLAB
 
Answers+of+C+sample+exam.docx
Answers+of+C+sample+exam.docxAnswers+of+C+sample+exam.docx
Answers+of+C+sample+exam.docx
 
Essence of the iterator pattern
Essence of the iterator patternEssence of the iterator pattern
Essence of the iterator pattern
 
Performing calculations using java script
Performing calculations using java scriptPerforming calculations using java script
Performing calculations using java script
 
N-Queens Combinatorial Problem - Polyglot FP for fun and profit - Haskell and...
N-Queens Combinatorial Problem - Polyglot FP for fun and profit - Haskell and...N-Queens Combinatorial Problem - Polyglot FP for fun and profit - Haskell and...
N-Queens Combinatorial Problem - Polyglot FP for fun and profit - Haskell and...
 
statistical computation using R- an intro..
statistical computation using R- an intro..statistical computation using R- an intro..
statistical computation using R- an intro..
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: Introduction
 
Data Visualization With R
Data Visualization With RData Visualization With R
Data Visualization With R
 
R Markdown Tutorial For Beginners
R Markdown Tutorial For BeginnersR Markdown Tutorial For Beginners
R Markdown Tutorial For Beginners
 
A DSTL to bridge concrete and abstract syntax
A DSTL to bridge concrete and abstract syntaxA DSTL to bridge concrete and abstract syntax
A DSTL to bridge concrete and abstract syntax
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Graphics in C++
Graphics in C++Graphics in C++
Graphics in C++
 
8. R Graphics with R
8. R Graphics with R8. R Graphics with R
8. R Graphics with R
 

Similar a Multiple graphs in openCypher

breaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdf
breaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdfbreaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdf
breaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdf
VishalKumarJha10
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
J Singh
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit
 

Similar a Multiple graphs in openCypher (20)

Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talk
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Graph db - Pramati Technologies [Meetup]
Graph db - Pramati Technologies [Meetup]Graph db - Pramati Technologies [Meetup]
Graph db - Pramati Technologies [Meetup]
 
Graph computation
Graph computationGraph computation
Graph computation
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
breaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdf
breaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdfbreaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdf
breaking_dependencies_the_solid_principles__klaus_iglberger__cppcon_2020.pdf
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
Code-first GraphQL Server Development with Prisma
Code-first  GraphQL Server Development with PrismaCode-first  GraphQL Server Development with Prisma
Code-first GraphQL Server Development with Prisma
 
GraphQL Meetup Bangkok 3.0
GraphQL Meetup Bangkok 3.0GraphQL Meetup Bangkok 3.0
GraphQL Meetup Bangkok 3.0
 
Engineering Student MuleSoft Meetup#6 - Basic Understanding of DataWeave With...
Engineering Student MuleSoft Meetup#6 - Basic Understanding of DataWeave With...Engineering Student MuleSoft Meetup#6 - Basic Understanding of DataWeave With...
Engineering Student MuleSoft Meetup#6 - Basic Understanding of DataWeave With...
 
openGL basics for sample program (1).ppt
openGL basics for sample program (1).pptopenGL basics for sample program (1).ppt
openGL basics for sample program (1).ppt
 
openGL basics for sample program.ppt
openGL basics for sample program.pptopenGL basics for sample program.ppt
openGL basics for sample program.ppt
 
Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)Groovy On Trading Desk (2010)
Groovy On Trading Desk (2010)
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 

Más de openCypher

Más de openCypher (20)

Learning Timed Automata with Cypher
Learning Timed Automata with CypherLearning Timed Automata with Cypher
Learning Timed Automata with Cypher
 
Incremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher QueriesIncremental View Maintenance for openCypher Queries
Incremental View Maintenance for openCypher Queries
 
Formal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updatesFormal semantics for Cypher queries and updates
Formal semantics for Cypher queries and updates
 
Cypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semanticsCypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semantics
 
Micro-Servicing Linked Data
Micro-Servicing Linked DataMicro-Servicing Linked Data
Micro-Servicing Linked Data
 
Graph abstraction
Graph abstractionGraph abstraction
Graph abstraction
 
Cypher for Gremlin
Cypher for GremlinCypher for Gremlin
Cypher for Gremlin
 
Comparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypherComparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and Cypher
 
Eighth openCypher Implementers Group Meeting: Status Update
Eighth openCypher Implementers Group Meeting: Status UpdateEighth openCypher Implementers Group Meeting: Status Update
Eighth openCypher Implementers Group Meeting: Status Update
 
Cypher for Gremlin
Cypher for GremlinCypher for Gremlin
Cypher for Gremlin
 
Supporting dates and times in Cypher
Supporting dates and times in CypherSupporting dates and times in Cypher
Supporting dates and times in Cypher
 
Seventh openCypher Implementers Group Meeting: Status Update
Seventh openCypher Implementers Group Meeting: Status UpdateSeventh openCypher Implementers Group Meeting: Status Update
Seventh openCypher Implementers Group Meeting: Status Update
 
Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...
 
Property Graphs with Time
Property Graphs with TimeProperty Graphs with Time
Property Graphs with Time
 
Cypher.PL: Executable Specification of Cypher written in Prolog
Cypher.PL: Executable Specification of Cypher written in PrologCypher.PL: Executable Specification of Cypher written in Prolog
Cypher.PL: Executable Specification of Cypher written in Prolog
 
Use case: processing multiple graphs
Use case: processing multiple graphsUse case: processing multiple graphs
Use case: processing multiple graphs
 
openCypher Technology Compatibility Kit (TCK)
openCypher Technology Compatibility Kit (TCK)openCypher Technology Compatibility Kit (TCK)
openCypher Technology Compatibility Kit (TCK)
 
Cypher Editor in the Web
Cypher Editor in the WebCypher Editor in the Web
Cypher Editor in the Web
 
The inGraph project and incremental evaluation of Cypher queries
The inGraph project and incremental evaluation of Cypher queriesThe inGraph project and incremental evaluation of Cypher queries
The inGraph project and incremental evaluation of Cypher queries
 
Formal Specification of Cypher
Formal Specification of CypherFormal Specification of Cypher
Formal Specification of Cypher
 

Último

Último (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Multiple graphs in openCypher

  • 1. CIP2017-06-18 Multiple Graphs Presentation at oCIG 8 on April 11, 2018 Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j)
  • 2. History ● Started working on multiple graphs early 2017 ● Parallel work in LDBC query language task force ● Built first non-prototype proposal for oCIM 3 in Nancy (Nov 2017) ● Lots of great feedback but outcome incomplete ● Inspired by Nancy we started to reconstruct the MG proposal ○ Starting from first principles (data model, execution model, language semantics) ○ Incrementally add orthogonal concern ○ Aim for simple, minimal design ○ Reduce scope of work ● Today: Primer for core of new proposal draft
  • 3. Outline ● Multiple property graphs model ● Query processing model (Catalog and working graph) ● Working with multiple graphs (Matching, updating, returning graphs) ● Graph operations (Graph projection and graph union) ● Updating the catalog ● Examples ● Discussion (Status, timeline, next steps)
  • 4. Why multiple graphs? - Combining and transforming graphs from multiple sources - Versioning, snapshotting, computing difference graphs - Graph views for access control - Provisioning applications with tailored graph views - Shaping and integrating heterogeneous graph data - Roll-up and drill-down at different levels of detail Graph Management Graph Modeling 4
  • 6. Cypher today: (Single) property graph model Graph Database System (e.g. a cluster) The (single) Graph Application Server Client 1 Client 2 Client 3 6
  • 7. Cypher 10: Multiple property graphs model Graph Database System (e.g. a cluster) Multiple Graphs Application Server Client 1 Client 2 Client 3 7
  • 8. Classic Property Graph Model ● A single graph ● Labeled nodes with properties ● Typed, directed relationships with properties 8 Multiple Property Graphs Model ● Multiple graphs ● Labeled nodes with properties and typed, directed relationships with properties existing in a single graph ● The nodes of a relationship R must be part of the same graph as R; i.e. each graph is independent
  • 9. ● The graph(entity) function returns the graph id of the entity ○ An entity (a node or a relationship) in Cypher 10 needs to track its graph (to differentiate whether or not two entities are from the same graph) ○ graph(entity) is a value that uniquely identifies a graph within the implementation ○ This is a two-way process: a graph will also be aware of the entities it contains ● The id(entity) function returns the (entity) id of entity in graph(entity) ○ id(entity) is a value that uniquely identifies an entity within its graph and types (i.e. nodes and relationships do not share the id space) ○ The format of ids is implementation specific ○ This does not change between Cypher 9 and Cypher 10 Graph and entity ids 9
  • 10. Id validity and comparability 10 Validity of ids describes the duration for which an id is not assigned to another object. Validity of graph and entity ids are only guaranteed for the duration of a query! Validity of entity ids is only guaranteed from the matching of an entity to the consumption of a query result. In other words: for the duration of the query. Validity, entity ids are guaranteed to not be re-used per graph for the duration of the query. These are the minimal lifetime guarantees mandated by Cypher 10. Implementations may provide extended validity guarantees as they see fit. Comparability of ids Graph and entity ids are suggested to be comparable values, i.e. can be compared with <, <=, =, >=, >
  • 12. Query processor 12 Query Processor = Query Processing Service (e.g. a single server or a cluster) G1 G2 G3 Catalog
  • 13. Graph catalog ● A graph catalog maps graph names to graphs ● A qualified graph name is an (optionally dotted) identifier ● A graph name may be contextual; i.e. only work inside a specific ○ server ○ database ○ ... ● There are no further restrictions placed on graphs; e.g. graphs may be ○ mutable or immutable ○ exist only for a limited time (session, transaction, ...) ○ visible to all or only some of the users of the system ○ ... 13
  • 14. Syntax for qualified graph names ● Qualified graph name syntax follows function name syntax ● Parts ○ graph namespace (optional) ○ graph name ● Syntax ○ foo.bar.baz ○ `foo`.`bar` `foo.bar` ○ `foo.bar`.`baz.grok` (namespace foo.bar, graph baz.grok) ○ Number of periods can be 0 to * ○ Namespace and name separation is implementation-dependent ○ Last part is always a suffix of the graph name 14
  • 15. 1. A clause is always operating within the context of a working graph a. by reading from it b. by updating it 2. During the course of executing a query a. the working graph may be assigned by name (i.e. assigned to a graph from the catalog) b. the working graph may be assigned by graph projection (i.e. assigned by creating a new working graph) c. the working graph may be returned as a query result d. the graph id of the current working graph may be obtained using the graph() function Working graph 15
  • 16. Initial working graph ● Determined by Query Processor ● This allows for implementation freedom ○ This includes having no initial working graph ○ Queries that need an initial working graph fail if no working graph has been set ● Example ○ In Neo4j: Upon establishing a session via Bolt, an initial working graph to be used by subsequent queries is established (perhaps per user etc.) ○ In CAPS: Cypher queries are always executed explicitly against a graph which implicitly becomes the initial working graph 16
  • 17. Working graph operations ● Assign working graph by name (from catalog lookup) ○ Access multiple graphs within one query ● Assign working graph by graph projection (next section) ○ Consolidate data from one or more data sets into one graph ● Return (working) graph from query ○ Construct temporary graph via multiple update queries (e.g. during interactive workflow) ○ Return (export) graphs (e.g. system graph) 17
  • 19. // Set the working graph to the graph foo in the catalog for reading FROM foo // Set the working graph to the graph foo in the catalog for updating UPDATE foo // Change between reading and updating FROM GRAPH UPDATE GRAPH In any case, ● Does not consume the current cardinality ● Retains all variable bindings Select working graph 19
  • 20. Example: Reading from multiple graphs Which friends to invite to my next dinner party? [1] FROM social_graph [2] MATCH (a:Person {ssn: $ssn})-[:FRIEND]-(friend) [3] FROM salary_graph [4] MATCH (f:Entity {ssn: friend.ssn})<-[:INCOME]-(event) [5] WHERE $startDate < event.date < $endDate [6] RETURN friend.name, sum(event.amount) as incomes 20
  • 21. Matching with multiple graphs FROM persons MATCH (a:Person) FROM cars MATCH (b:Car)-[:OWNED_BY]->(a) // Does NOT match Updating multiple graphs FROM persons MATCH (a:Person) UPDATE cars CREATE (b:Car)-[:OWNED_BY]->(a) // Error
  • 22. Copy patterns How to copy data between graphs? FROM persons MATCH (a:Person) UPDATE cars CREATE (b:Car)-[:OWNED_BY]->(x COPY OF a) // This works ● COPY OF node Copies labels, properties ● COPY OF rel Copies relationship type, properties (ignores start, end!) Also usable in matching: MATCH (COPY OF a) ... 22
  • 23. Returning a graph ● Cypher 10 queries may return a graph ● Three variations of consuming a graph returned by a query a. Deliver graph to client b. Pass graph to another query; e.g. within a subquery (not covered in these slides) c. Store graph in the catalog; i.e. registering a returned graph under a new name ● Delivering a result graph from the query to the client is always by value a. All nodes with their labels and properties b. All relationships with their relationship type, start node, end node, and their properties c. Result graph is completely independent from graphs managed by the query processor 23
  • 24. Syntax for returning a graph // Returns the working graph ... RETURN GRAPH // Removing the current cardinality while keeping the working graph ... WITH GRAPH ... // Return a graph from the catalog FROM graphName RETURN GRAPH 24
  • 26. Graph operations ● Graph construction ● Common graph union ● Disjoint graph union
  • 27. Graph construction Graph construction dynamically constructs a new working graph ● for querying, storing in the catalog, later updating ● using entities from other graphs (this is called cloning) Simple example MATCH (a)-[:KNOWS {from: "Berlin"}]->(b) CONSTRUCT CLONE a AS x, b AS y CREATE (x)-[:MET_IN_BERLIN]->(y) RETURN GRAPH
  • 28. Cloning nodes Take an original entity a and create a representative clone c in the constructed graph MATCH (a) CONSTRUCT CLONE a AS c // properties(a) = properties(c) AND labels(a) = labels(c) // but: a <> c because graph(a) <> graph(c) ... Cloning the same original multiple times still only creates a single clone MATCH (root)-[:PARENT_OF*]->(child) CONSTRUCT CLONE root, child // CLONE can shadow variables like WITH MERGE (root)-[:ANCESTOR_OF]->(child) ...
  • 29. Cloning relationships Cloning a relationship r implicitly clones startNode(r) and endNode(r) Cloning is powerful enough to reconstruct whole subgraphs! FROM car_owners MATCH (a:Person {city: "Berlin"})-[r:OWNERS]->(c:Car) CONSTRUCT CLONE r // cars and their owners from berlin RETURN GRAPH
  • 30. Clone paths and complex values ● Cloning a path p clones all nodes(p) and rels(p) ● More generally, cloning a value clones all contained entities! MATCH (a:Person), (b:Person) MATCH p = (a)-[:KNOWS*]-(b) CONSTRUCT CLONE a, b, p CREATE (a)<-[:START]-(x:Path {steps: size(p)})-[:END]->(b) RETURN GRAPH
  • 31. Cloning graphs Clone all entities from given graphs into a new graph CONSTRUCT ON social_graph Clone all entities from working graph into new graph CONSTRUCT ON GRAPH Cloning a graph and a contained entity still only creates a single clone for the entity MATCH (a) WITH * LIMIT 1 CONSTRUCT ON GRAPH CLONE a AS x // creates only one x for the given a in the whole graph // (re-using clones created by CONSTRUCT ON GRAPH)
  • 32. Finding clones Sometimes it may be necessary to find clones instead of creating them MATCH (x)-[:OWNS]->(y) CONSTRUCT CLONE x AS c CREATE (x)-[:OWNS]->(COPY OF y) ... WITH x, count(*) AS num_clones // lost x here! MATCH (CLONE OF x) // found the clone of x again ...
  • 33. General form of CONSTRUCT CONSTRUCT [ON GRAPH] [ON graph1, graph2, ...] [CLONE expr [AS alias], variable, ... | * ] <updating clause 1> <updating clause 2> ... WITH ... | WITH GRAPH | RETURN ... | RETURN GRAPH
  • 34. Provenance tracking So far, we've only looked at one step of graph construction. What about? [1] MATCH (a) // single node a [2] CONSTRUCT [3] CLONE a AS c // c <> a, it is a clone of a [4] WITH a, c [5] CONSTRUCT [6] CLONE a AS x, c AS y // x <> a is a clone of a, y <> c is a clone of c [7] ... But x = y (i.e.they are the same clone). This is called provenance tracking. It's useful for updatable views and graph union.
  • 35. Updatable views CONSTRUCT ... // build the view ... WITH GRAPH UPDATE GRAPH ... // update the view ...
  • 36. Updates during graph construction General rule: Maintain logical consistency regarding provenance tracking CREATE Create in the constructed graph MERGE Merge in the constructed graph SET|REMOVE Not allowed on clones (for now) [DETACH] DELETE Delete from the constructed graph
  • 37. Updates after graph construction General rule: Only allow updates that can be propagated safely CREATE Not allowed MERGE Not allowed SET|REMOVE Update base data [DETACH] DELETE Not allowed
  • 38. Common graph union Clones of same original node are cloned, other entities are copied CONSTRUCT ... RETURN GRAPH UNION CONSTRUCT ... RETURN GRAPH
  • 39. Disjoint graph union All entities (including clones) are copied (i.e. this breaks provenance tracking) CONSTRUCT ... RETURN GRAPH UNION ALL CONSTRUCT ... RETURN GRAPH
  • 41. CREATE GRAPH foo // Create new graph 'foo' DELETE GRAPH foo // Delete graph 'foo' COPY foo TO bar // Copy graph 'foo' with schema RENAME foo TO bar // Rename graph 'foo' to 'bar' TRUNCATE foo // Remove data but keep schema in 'foo' Catalog side-effects 41
  • 42. // create a new base graph in the catalog CREATE GRAPH myGraph // copy content of graph as new graph into the catalog CREATE GRAPH bar { <some query that returns a graph> } These are always top-level clauses; i.e. they may not be nested Create new base graph (snapshot) in catalog 42
  • 43. Copy data from one base graph to another We can accomplish this by using DML and hand crafting the queries MATCH (a)-[r]->(b) UPDATE foo CREATE ... MERGE ... SET ... 43
  • 44. Schema updates Need to select the graph UPDATE foo // Different keyword? CREATE CONSTRAINT ... CREATE INDEX ... ... 44
  • 45. Catalog function catalog() Current working graph name in catalog or NULL catalog(graph(e)) Name of graph of e in catalog or NULL
  • 47. Status ● Working on two CIPs ○ Multiple graphs CIP (This talk, also: larger examples) ○ Query composition CIP ○ Features: views, nested subqueries, equivalence, more graph operations ● Future work: More elaborate updatable views, graph aggregation ● Goal is to finalize draft of multiple graphs CIP for next oCIM ● Design heavily influenced by feedback from CAPS project and other implementation concerns
  • 48. Feedback Please feedback by commenting on ● slides or ● CIP or ● openCypher implementers slack Thanks!