The Neo4j graph database lacked a declarative query language.
We wanted to add a humane query language that is easy to read and understand. It borrows from other languages like SQL and SPARQL but brings its own flavor. Cypher uses ASCII art to describe the graph patterns you are looking for.
We used Scala's parser combinator library, together with functional techniques and lazy evaluation, to implement the Cypher query language.
This talk describes the internals of the Cypher implementation.
7. (Neo4j) -[:IS_A]-> (Graph Database)
[Diagram: Neo4j's features drawn as a graph. Legible labels: Lucene, Index, Sharding, 1 M/s, Master/Slave, High Availability, Traversals, ACID TX, Server, embedded, 34bn Nodes, Heroku, Java, Ruby, JS, Clojure, .net, MySQL, Mongo.]
8. Homework
- so Pay Attention
• Go To http://bit.ly/geekout-cypher
• With your Cypher knowledge
• Add yourself to the graph
• Determine a path from you to me
• Share the console on Twitter by Monday
• Win this AR-Drone
31. Give me: JOINS
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
START user = node(1)
MATCH user -[r:USER_SKILL]-> skill
RETURN skill, r
32. Give me:
Old, Influential Friends
START me = node(...)
MATCH (me)-[f:FRIEND]-(old_friend)
      -[:FRIEND]-(fof)
WHERE ({today}-f.begin) > 365*10
WITH old_friend, collect(fof.name) as names
WHERE length(names) > 100
RETURN old_friend, names
ORDER BY old_friend.name ASC
[Diagram: (me)-[f:FRIEND]-(friend)-[:FRIEND]-(fof)]
33. Give me:
Simple Recommendation
START me = node(...)
MATCH (me)-[r1:RATED]->(thing)
      <-[r2:RATED]-(someone)
      -[r3:RATED]->(cool_thing)
WHERE ABS(r1.stars-r2.stars) <= 2
AND r3.stars > 3
RETURN cool_thing, count(*) AS cnt
ORDER BY cnt DESC LIMIT 10
[Diagram: (me)-[r1:RATED]->(thing)<-[r2:RATED]-(someone)-[r3:RATED]->(cool thing)]
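The recommendation query is a form of collaborative filtering: people who rated a thing close to how I rated it (within 2 stars) vote for the other things they rated above 3 stars, and count(*) tallies the votes. A minimal Scala sketch of the same logic over an in-memory map of ratings; the `Recommend` object and its `(person, thing) -> stars` representation are hypothetical, not Neo4j's API.

```scala
object Recommend {
  // Hypothetical in-memory model: (person, thing) -> stars (at most one rating per pair)
  type Ratings = Map[(String, String), Int]

  def recommend(me: String, ratings: Ratings, limit: Int = 10): List[(String, Int)] = {
    // r1: the things I rated, with my stars
    val mine: Map[String, Int] = ratings.collect { case ((`me`, thing), stars) => thing -> stars }
    // One entry per matched path (me)-[r1]->(thing)<-[r2]-(someone)-[r3]->(cool_thing)
    val hits = for {
      ((someone, thing), s2) <- ratings.toList
      if someone != me
      s1 <- mine.get(thing).toList
      if math.abs(s1 - s2) <= 2                  // WHERE ABS(r1.stars - r2.stars) <= 2
      ((other, coolThing), s3) <- ratings.toList
      if other == someone && coolThing != thing  // r3 must be a different relationship than r2
      if s3 > 3                                  // AND r3.stars > 3
    } yield coolThing
    // RETURN cool_thing, count(*) AS cnt ORDER BY cnt DESC LIMIT 10
    // (ties broken by name for a stable order; the query itself leaves ties unordered)
    hits.groupBy(identity).toList
      .map { case (t, xs) => (t, xs.size) }
      .sortBy { case (t, n) => (-n, t) }
      .take(limit)
  }
}
```

Here "inception" would be recommended twice (by two people who rated "matrix" like me), while a thing is never recommended back via the same rating that linked me to its rater.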
34. Results?
• Tables:
for human brains & tools
• Graphs:
to highlight, visualize,
export & refine queries
35. One Step Back
• Goals / Intent
• Origins
• Decisions
• Implementation
• Future
36. What is Cypher?
• Graph Query Language for Neo4j
• Querying for Humans
37. Goals, Origins & Design
Picking good ideas and having some of our own
39. Something new?
• Existing Neo4j query mechanisms were not
simple enough
• Too verbose (Java API)
• Too prescriptive (Gremlin)
• Other query languages
41. Gremlin?
• DSL for pipes
• imperative
• a single path expression
• loop constructs
• side-effects
• lots of closures
42. SQL?
• Well known and understood
• Unable to express paths
• these are crucial for graph-based
reasoning
• Cumbersome Mutation
• Neo4j is schema/table free
43. SPARQL?
• SPARQL designed for a different data model
• namespaces / URIs
• reified properties as nodes
• SPARQL/RDF mostly in academia
• developers don't get it
• Pattern matching is cool
46. Design Decisions
Closures / Quantifiers
START london = node(1), moscow = node(2)
MATCH path = london -[*]-> moscow
WHERE all(city in nodes(path) where city.capital)
RETURN path
50. Design Decisions
Familiar for SQL users
SQL         Cypher
select      start
from        match
where       where
group by    return
order by    order by
            skip
            limit
51. Design Decisions
Database vs Application
Design goal: a single user interaction should be
expressible as a single query
Queries have enough logic to find the required data,
but not enough to process it
52. Implementation
[Diagram: implementation layers, labelled Parsing, Pattern Matching, Plan, Java Bridge, Scala, and Docs from Execution Tests.]
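In the implementation each execution step is a Pipe that pulls rows from its source pipe, with each row acting as the ExecutionContext (identifier to value). A minimal sketch of that idea using Scala iterators so every stage stays lazy; the pipe names and the `Map[String, Any]` row type are assumptions for illustration, not the actual Cypher classes.

```scala
object Pipes {
  // Hypothetical row type: identifier -> value (the ExecutionContext)
  type Ctx = Map[String, Any]

  trait Pipe { def rows: Iterator[Ctx] }

  // Source: one row per start value, e.g. START n=node(1, 2, 3)
  final case class StartPipe(name: String, values: Seq[Any]) extends Pipe {
    def rows: Iterator[Ctx] = values.iterator.map(v => Map(name -> v))
  }

  // WHERE: keeps only the rows matching the predicate
  final case class FilterPipe(sourcePipe: Pipe, predicate: Ctx => Boolean) extends Pipe {
    def rows: Iterator[Ctx] = sourcePipe.rows.filter(predicate)
  }

  // Adds a computed value to each row under a new identifier
  final case class ExtractPipe(sourcePipe: Pipe, name: String, f: Ctx => Any) extends Pipe {
    def rows: Iterator[Ctx] = sourcePipe.rows.map(ctx => ctx + (name -> f(ctx)))
  }

  // SKIP/LIMIT: pulls only as many rows as it needs from upstream
  final case class SlicePipe(sourcePipe: Pipe, skip: Int, limit: Int) extends Pipe {
    def rows: Iterator[Ctx] = sourcePipe.rows.slice(skip, skip + limit)
  }
}
```

A plan is just nested pipes, read inside out: `SlicePipe(FilterPipe(StartPipe(...), ...), 1, 2)` starts, filters, then paginates.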
57. Problems with Scala?
• Scala versions
• Slow compilation
• Low Collection Write Throughput
• Code gets easily complicated
• Hard to ramp up for other devs
• Big separate library
58. Implementation
Query Parts
• START
• MATCH
• Pattern Matching
• WHERE
• Expressions, Predicates
• RETURN
59. [Diagram: Cypher's query parts drawn as a graph. Nodes: START, MATCH, WHERE, Aggregation, SKIP/LIMIT, CREATE, RELATE, DELETE, Identifier, Pattern, Graph, Result. Relationships: BINDS, BOUND_TO, USED_IN, DESCRIBES, FOUND_IN, FILTERS, BUILDS_UP, PAGINATES, CREATES, ADD_TO, FIX, COMPLETES, RESTRUCTURES, REMOVE_FROM.]
60. START
START dev=node:user(name="Andres")
• lazy source of identifiers bound to nodes
and relationships
• each identifier is an iterable
• spawns one execution per value
• cross product between multiple identifiers
• index lookup or direct lookup
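The START behaviour above (lazy sources, one execution per value, cross product across identifiers) can be sketched with Scala iterators. This assumes a hypothetical `Start.rows` helper in which each source is re-created per outer row, because an `Iterator` can only be consumed once.

```scala
object Start {
  type Ctx = Map[String, Any]

  // Each START item is (identifier, lazily created source of values).
  // Rows are produced on demand as the cross product of all sources.
  def rows(items: List[(String, () => Iterator[Any])]): Iterator[Ctx] =
    items.foldLeft(Iterator.single(Map.empty[String, Any])) { case (acc, (name, source)) =>
      acc.flatMap(ctx => source().map(value => ctx + (name -> value)))
    }
}
```

Because `flatMap` on an iterator is lazy, even an unbounded source only yields the rows that downstream pipes actually pull.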
61. MATCH
MATCH dev-[:WORKED_ON]->project
• describe patterns with ASCII art
• declare identifiers
• paths
• variable-length paths
• graph algorithms
• optional relationships
• expands results with each found subgraph
62. Core Activity:
Pattern Matching
• Match clause with ASCII-art
• derive pattern description
• bound Nodes and Relationships
• finds patterns attached to bound entities
• each found Pattern spawns Subgraph Result
• recursive PM with Backtracking
63. Pattern Matching
• Scala
• incremental search with backtracking
• allows:
• Variable length paths
• Filter during matching
• optional relationships
• powerful but slower
64. Pattern Matching
• Java
• existing graph matching library
• fast but less capable
• integrated with lazy Scala API
• NEW Traversal Framework approach
• for one or two bound nodes
• use Neo4j traversal framework
• pattern description & complexity
determine Pattern Matcher selection
65. Implementation
• Recursive matching with backtracking
START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
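The slides step through matching `x-->y, x-->z, y-->z, z-->a-->b, z-->b` from a bound `x`. A minimal recursive backtracking sketch of that search follows. It is not Cypher's actual matcher: the graph is a hypothetical set of directed `(from, to)` edges, `z-->a-->b` is written as the two edges `z-->a` and `a-->b`, and the only constraint besides the pattern is that each pattern edge uses a distinct graph edge.

```scala
object Matcher {
  type Edge = (Int, Int)          // directed edge: from node id -> to node id
  type Binding = Map[String, Int] // identifier -> node id

  def matchAll(graph: Set[Edge], pattern: List[(String, String)], start: Binding): List[Binding] = {
    def go(todo: List[(String, String)], bound: Binding, used: Set[Edge]): List[Binding] =
      if (todo.isEmpty) List(bound) // every pattern edge matched: one result subgraph
      else {
        // Expand from what is already bound: pick a pattern edge touching a bound identifier
        val next = todo.find { case (f, t) => bound.contains(f) || bound.contains(t) }
        next match {
          case None => Nil // remaining pattern is disconnected from bound nodes (unsupported here)
          case Some(pe @ (f, t)) =>
            val candidates = graph.toList.sorted.filter { case (u, v) =>
              !used((u, v)) && bound.get(f).forall(_ == u) && bound.get(t).forall(_ == v)
            }
            // Try each candidate; a dead end returns Nil, which is the backtracking step
            candidates.flatMap { case e @ (u, v) =>
              go(todo.diff(List(pe)), bound + (f -> u) + (t -> v), used + e)
            }
        }
      }
    go(pattern, start, Set.empty)
  }
}
```

Binding `y` to the wrong neighbour of `x` fails at `y-->z`, the search falls back, and only the consistent assignment survives.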
74. WHERE
WHERE project.name = "Cypher"
• filters results
• single big boolean expression
• needs existing identifiers to work with
• much like SQL
75. Expressions
• expressions compute values
• composite Specification Pattern
• input is ExecutionContext Map
• have name
• declare symbolic dependencies
• self & composite
• directly derived from parser
• probably rewritten in between
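Expressions compute values from the ExecutionContext, carry a name, and declare which identifiers they depend on, so a composite's dependencies are the union of its children's. A minimal Scala sketch in that composite style; the class names (`Identifier`, `Literal`, `Property`, `Add`) and modelling nodes as property maps are assumptions for illustration.

```scala
object Expressions {
  // The ExecutionContext as a map: identifier -> value (nodes modelled as property maps)
  type Ctx = Map[String, Any]

  sealed trait Expression {
    def apply(ctx: Ctx): Any            // compute the value from the current row
    def name: String                    // used as the default column name
    def symbolDependencies: Set[String] // identifiers that must already be bound
  }

  final case class Identifier(id: String) extends Expression {
    def apply(ctx: Ctx): Any = ctx(id)
    def name: String = id
    def symbolDependencies: Set[String] = Set(id)
  }

  final case class Literal(value: Any) extends Expression {
    def apply(ctx: Ctx): Any = value
    def name: String = value.toString
    def symbolDependencies: Set[String] = Set.empty
  }

  final case class Property(entity: Expression, key: String) extends Expression {
    def apply(ctx: Ctx): Any = {
      val props = entity(ctx).asInstanceOf[Map[String, Any]]
      props(key)
    }
    def name: String = s"${entity.name}.$key"
    def symbolDependencies: Set[String] = entity.symbolDependencies
  }

  // Composite: its dependencies are the union of its children's
  final case class Add(left: Expression, right: Expression) extends Expression {
    def apply(ctx: Ctx): Any = (left(ctx), right(ctx)) match {
      case (a: Int, b: Int) => a + b
      case (a, b)           => a.toString + b.toString // string concatenation
    }
    def name: String = s"${left.name} + ${right.name}"
    def symbolDependencies: Set[String] = left.symbolDependencies ++ right.symbolDependencies
  }
}
```

Declared dependencies let a query builder reject an expression whose identifiers are not yet bound before any row is evaluated.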
76. Predicates
• WHERE clause filters results with a single
composed predicate
• boolean expressions
• composable boolean algebra
• Patterns as predicates
• Quantifiers (ALL, NONE, SINGLE)
• Collections as predicates
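Predicates are the boolean side of the same design: a WHERE clause is one composed predicate, and quantifiers such as ALL, NONE and SINGLE evaluate an inner predicate once per collection element, as in `all(city in nodes(path) where city.capital)`. A minimal sketch; the class names (`NoneOf` avoids clashing with Scala's `None`) and the property-map node model are assumptions.

```scala
object Predicates {
  type Ctx = Map[String, Any]

  sealed trait Predicate { def isMatch(ctx: Ctx): Boolean }

  // Composable boolean algebra
  final case class Not(p: Predicate) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = !p.isMatch(ctx)
  }
  final case class And(l: Predicate, r: Predicate) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = l.isMatch(ctx) && r.isMatch(ctx)
  }
  final case class Or(l: Predicate, r: Predicate) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = l.isMatch(ctx) || r.isMatch(ctx)
  }

  // True when identifier `id` is a node (property map) whose `key` is true
  final case class IsTrue(id: String, key: String) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = ctx(id).asInstanceOf[Map[String, Any]].get(key).contains(true)
  }

  // Binds each element of the collection to `variable` and evaluates `inner` lazily
  private def eachElement(ctx: Ctx, collection: String, variable: String, inner: Predicate): Iterator[Boolean] =
    ctx(collection).asInstanceOf[Seq[Any]].iterator.map(x => inner.isMatch(ctx + (variable -> x)))

  final case class All(collection: String, variable: String, inner: Predicate) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = eachElement(ctx, collection, variable, inner).forall(identity)
  }
  final case class NoneOf(collection: String, variable: String, inner: Predicate) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = eachElement(ctx, collection, variable, inner).forall(!_)
  }
  final case class Single(collection: String, variable: String, inner: Predicate) extends Predicate {
    def isMatch(ctx: Ctx): Boolean = eachElement(ctx, collection, variable, inner).count(identity) == 1
  }
}
```

ALL and NONE can stop at the first counterexample; SINGLE has to look at every element.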
77. RETURN
RETURN project, collect(idea), count(fun)
• determines what to return
• column names or aliases with AS
• automatic aggregation
• when aggregation function exists
• all non-aggregated values are grouping
• laziness is killed with aggregation
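With automatic aggregation, every non-aggregated RETURN item becomes part of the grouping key, and all rows must be read before any group is final, which is why aggregation ends laziness. A minimal sketch with a hypothetical `aggregate` helper; its name and signature are illustrative only.

```scala
object Aggregation {
  type Row = Map[String, Any]

  // keys: the non-aggregated RETURN items (the implicit grouping key)
  // aggregates: alias -> function over all rows of one group
  def aggregate(rows: Iterator[Row], keys: List[String],
                aggregates: List[(String, List[Row] => Any)]): List[Row] = {
    val all = rows.toList // laziness ends here: the whole input is read first
    all.groupBy(row => keys.map(row)).toList.map { case (keyValues, group) =>
      keys.zip(keyValues).toMap ++ aggregates.map { case (alias, f) => alias -> f(group) }
    }
  }
}
```

`RETURN project, collect(idea), count(*)` corresponds to `keys = List("project")` with one aggregate per function.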
78. SKIP LIMIT ORDER BY
ORDER BY length(ideas) DESC
SKIP 5 LIMIT 10
• the usual suspects
• laziness is killed with ordering
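Ordering ends laziness for a similar reason: SKIP and LIMIT alone can stop pulling rows early, but ORDER BY must see every row before it can emit the first one. A small sketch that counts how many rows an upstream source produced in each case; the `CountingSource` class is a hypothetical test helper.

```scala
object Paging {
  // Hypothetical upstream source that counts how many rows were pulled from it
  final class CountingSource(size: Int) {
    var pulled: Int = 0
    def rows: Iterator[Int] = Iterator.range(0, size).map { i => pulled += 1; i }
  }

  // SKIP s LIMIT l on a lazy stream: pulls only s + l rows
  def skipLimit(rows: Iterator[Int], skip: Int, limit: Int): List[Int] =
    rows.drop(skip).take(limit).toList

  // ORDER BY value DESC SKIP s LIMIT l: every row must be pulled before sorting
  def orderedSkipLimit(rows: Iterator[Int], skip: Int, limit: Int): List[Int] =
    rows.toList.sortBy(i => -i).slice(skip, skip + limit)
}
```

With 1000 upstream rows, SKIP 5 LIMIT 10 pulls 15 of them, while adding ORDER BY pulls all 1000.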
79. Mutation
• Transaction
• WITH (Separation)
• Idempotence
• DRY
• Mass Data Handling
• Laziness
80. Mutation
• need transactions
• must not disturb reads / traversals
• separation of read and write query parts
• explicit context change and scope: WITH
• implicit split for simple queries
• granularity of mutation aligned with # of
executions
• work with parameters
81. Mutation - Impl.
• Query Builders create UpdateCommands
• UpdatePipe ??
• Transaction Scope for whole Query
• collect statistics
• create new entities
• add identifiers to the ExecutionContext
• need to iterate through to get all updates
executed (no laziness), special Pipe
• have to track already deleted entities
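Because updates must run even if nobody consumes the result, the update step drains its input eagerly, keeps one transaction scope for the whole query, and tracks entities that were already deleted. A minimal sketch of that for DELETE, assuming a hypothetical entity model and `runDelete` helper; the check at the end stands in for rejecting a deleted node that still has relationships.

```scala
import scala.collection.mutable

object Mutation {
  sealed trait Entity
  final case class Node(id: Int) extends Entity
  final case class Rel(id: Int, start: Int, end: Int) extends Entity

  final case class Stats(deletedNodes: Int, deletedRels: Int)

  // DELETE <ids> for every input row; `allRels` stands in for the graph's relationships
  def runDelete(rows: Iterator[Map[String, Entity]], ids: List[String], allRels: Set[Rel]): Stats = {
    val deletedNodes = mutable.Set.empty[Int]
    val deletedRels = mutable.Set.empty[Int]
    // Eager: drain the input so every row's update runs, even if no one reads results
    rows.toList.foreach { row =>
      // A missing identifier models a null from an optional relationship (r?)
      ids.flatMap(id => row.get(id).toList).foreach {
        case Node(id)      => deletedNodes += id // adding twice is a no-op: idempotent deletion
        case Rel(id, _, _) => deletedRels += id
      }
    }
    // End of the query's transaction: no surviving relationship may touch a deleted node
    val dangling = allRels.filter(r => !deletedRels(r.id) && (deletedNodes(r.start) || deletedNodes(r.end)))
    require(dangling.isEmpty, s"cannot delete nodes that still have relationships: $dangling")
    Stats(deletedNodes.size, deletedRels.size)
  }
}
```

The same relationship reached from both of its endpoints is counted once, which is what makes `DELETE n, r` over `n-[r?]-()` safe to run.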
83. [Diagram repeated from slide 59: Cypher's query parts drawn as a graph.]
84. WITH
WITH me, count(friend) as friends
• syntax like RETURN
• separate query parts like a pipe
• declares a new scope with new identifiers
• all other ids will be gone
• useful for HAVING
• spawns a new ExecutionContext
• continue with read or write part
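WITH projects rows into a fresh context: grouping works as in RETURN, only the listed names survive, and a WHERE on the new identifiers plays the role of SQL's HAVING. A minimal sketch for `WITH me, count(friend) AS friends` using hypothetical helpers `withCount` and `having`:

```scala
object WithClause {
  type Ctx = Map[String, Any]

  // WITH <key>, count(<counted>) AS <alias>
  // Groups rows by `key`; the new context holds only `key` and `alias`.
  def withCount(rows: List[Ctx], key: String, counted: String, alias: String): List[Ctx] =
    rows.groupBy(row => row(key)).toList.map { case (k, group) =>
      // count(x) skips rows where x is missing (null)
      Map[String, Any](key -> k, alias -> group.count(row => row.contains(counted)))
    }

  // WHERE after WITH filters on the aggregated value, like SQL's HAVING
  def having(rows: List[Ctx], alias: String, min: Int): List[Ctx] =
    rows.filter(row => row(alias).asInstanceOf[Int] > min)
}
```

Any identifier not listed in WITH (such as a `since` property column) is absent from the new rows.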
85. CREATE
CREATE me = {name: "Michael"}
• create new nodes or relationships
• can work with map params (or Iterables
thereof)
• assign new identifiers
• can create full paths
86. RELATE
RELATE posts-[:POST]->(p {title: ".."})
• FIX the graph
• construct missing relationships and nodes
• need at least one bound node
• match given properties
• try to advance (from multiple sides) and
create missing stuff then iterate
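RELATE fixes the graph: starting from a bound node it reuses whatever already matches and creates only what is missing, so running it twice does not duplicate anything. A minimal one-hop sketch over a hypothetical in-memory `Store`; unlike the clause described above, it does not advance from multiple sides or handle longer paths.

```scala
object Relate {
  final case class Rel(start: Int, relType: String, end: Int)

  // Hypothetical in-memory store: node id -> properties, plus relationships
  final class Store {
    private var nextId = 100
    var props: Map[Int, Map[String, Any]] = Map.empty
    var rels: Set[Rel] = Set.empty

    // RELATE (from)-[:relType]->(p {wanted}) with `from` already bound
    def relate(from: Int, relType: String, wanted: Map[String, Any]): Int = {
      // An existing neighbour matches when all wanted properties are equal
      def matches(end: Int): Boolean =
        wanted.forall { case (k, v) => props.get(end).flatMap(_.get(k)).contains(v) }
      val existing = rels.collectFirst { case Rel(`from`, `relType`, end) if matches(end) => end }
      // Otherwise create the missing node and relationship
      existing.getOrElse {
        val id = nextId
        nextId += 1
        props += (id -> wanted)
        rels += Rel(from, relType, id)
        id
      }
    }
  }
}
```

Calling `relate(1, "POST", Map("title" -> "Hello"))` twice returns the same node id and leaves a single relationship.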
87. DELETE
DELETE n, rel, m.prop
• as expected
• idempotent deletion
• can only delete unconnected nodes
• so delete relationships first
START n=node(*)
MATCH n-[r?]-()
DELETE n,r
88. SET
SET n.name = "Father of " + m.name
• can work with arbitrary expressions
• use coalesce for idempotent defaults
• can mass assign with map-parameters
89. FOREACH
FOREACH (f in new_friends :
  RELATE me-[:FRIEND]->f)
• iterable loop for mutating operations
• saves a lot of repetitive code
• wraps current execution context in a
temporary proxy
96. Live Console
• In Memory GDBs in Web Session
• Set up with mutating Cypher (or Geoff)
• Executes Cypher (also mutating)
• Visualizes Graph & Query Results (d3)
• Multiple Cypher Versions
• Share: short link, tweet, yUML
• Embeddable <iframe>
• Live Console for Docs
97. The Rabbithole
http://console.neo4j.org
This Graph: http://tinyurl.com/7cnvmlq
98. Homework
- Apply your Wits
• Go To http://bit.ly/geekout-cypher
• With your new Cypher knowledge
• Add yourself to the graph
• Determine a path from you to me
• Share the console on Twitter by Monday
• Win this AR-Drone
Want to focus on implementation and design decisions, not so much usage
For an overview of the language & some basic examples
first minute
second minute
3rd minute
There were a number of different ways to query a graph database. This one aims to make querying easy and to produce readable queries.

We looked at alternatives: SPARQL, SQL, Gremlin and others.
Java API
• Object oriented
• Node, Relationship, Path objects
• Imperative
• Verbose
• Traversers, for-loops, Index
• mostly lazy-eval

Gremlin
• DSL for pipes
• imperative
• a single path expression
• loop constructs
• side-effects
• lots of closures

SQL
• Well known and understood
• Unable to express paths
• these are crucial for graph-based reasoning
• Cumbersome Mutation
• Neo4j is schema/table free

SPARQL
• SPARQL designed for a different data model
• namespaces / URIs
• reified properties as nodes
• SPARQL/RDF mostly in academia
• developers don't get it
• Pattern matching is cool
Parser Combinators
• Scala Library - DSL for parsable Patterns
• Composable parsers
• Literals and expressions become AST values
• additional checking and differentiation with match
• esp. for better error messages
• modularized parsers with traits
• Tokens, Expressions, Patterns, Where, ...
• combine traits for parser availability

Query Builders
• input: parsed query AST
• builder for each kind of pipe
• incrementally ask most suitable builder
• to add a fragment to the pipe construction
• or rewrite existing pipes
• error
• when leftover input and incomplete output
• when it cannot advance due to missing dependencies

Combining Pipes
• Each step in the execution is a Pipe
• has a sourcePipe
• is a TraversableLike[Map,Any]
• ExecutionContext holds (mutable) query state
• Pipes declare symbol dependencies
• from their expressions

Lazy Evaluation
• Match clause with ASCII-art
• derive pattern description
• bound Nodes and Relationships
• finds patterns attached to bound entities
• each found Pattern spawns Subgraph Result
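The parsing layer turns query text into AST values by composing small parsers. Cypher used Scala's parser-combinator library for this; to keep the sketch self-contained it hand-rolls a few combinators in the same style (`~` for sequence, `|` for alternatives, `rep` for repetition) and parses only simple MATCH patterns such as `x-->y` and `y-[:KNOWS]->z`. All names here are hypothetical.

```scala
object MiniParser {
  // A tiny hand-rolled combinator core in the style of Scala's library (not the library itself):
  // a parser maps (input, offset) to an optional (result, new offset).
  final case class P[A](run: (String, Int) => Option[(A, Int)]) {
    def map[B](f: A => B): P[B] =
      P[B]((s, i) => run(s, i).map { case (a, j) => (f(a), j) })
    def flatMap[B](f: A => P[B]): P[B] =
      P[B]((s, i) => run(s, i).flatMap { case (a, j) => f(a).run(s, j) })
    def ~[B](that: => P[B]): P[(A, B)] = flatMap(a => that.map(b => (a, b)))
    def |(that: => P[A]): P[A] = P[A]((s, i) => run(s, i).orElse(that.run(s, i)))
    // Zero or more repetitions; stops when the parser fails or consumes nothing
    def rep: P[List[A]] = P[List[A]] { (s, i) =>
      @annotation.tailrec
      def loop(j: Int, acc: List[A]): (List[A], Int) = run(s, j) match {
        case Some((a, k)) if k > j => loop(k, a :: acc)
        case _                     => (acc.reverse, j)
      }
      Some(loop(i, Nil))
    }
  }

  def regex(r: scala.util.matching.Regex): P[String] =
    P[String]((s, i) => r.findPrefixOf(s.substring(i)).map(m => (m, i + m.length)))
  def lit(t: String): P[String] =
    P[String]((s, i) => if (s.startsWith(t, i)) Some((t, i + t.length)) else None)

  lazy val ws: P[String] = regex("""\s*""".r)
  def tok(t: String): P[String] = (ws ~ lit(t)).map(_._2)
  lazy val ident: P[String] = (ws ~ regex("[A-Za-z_][A-Za-z0-9_]*".r)).map(_._2)

  // AST value for one relationship in a MATCH pattern
  final case class RelPattern(from: String, relType: Option[String], to: String)

  // "-->" or "-[:TYPE]->"
  lazy val arrow: P[Option[String]] =
    tok("-->").map(_ => Option.empty[String]) |
      (tok("-[:") ~ ident ~ tok("]->")).map { case ((_, t), _) => Option(t) }

  // a-->b-[:T]->c becomes RelPattern(a, None, b), RelPattern(b, Some(T), c)
  lazy val chain: P[List[RelPattern]] = for {
    first <- ident
    steps <- (arrow ~ ident).rep
  } yield {
    val nodes = first :: steps.map(_._2)
    nodes.zip(nodes.tail).zip(steps.map(_._1)).map { case ((f, t), r) => RelPattern(f, r, t) }
  }

  lazy val pattern: P[List[RelPattern]] =
    for { head <- chain; tail <- (tok(",") ~ chain).rep } yield head ++ tail.flatMap(_._2)

  // Leftover input is an error, reported here as None
  def parse(s: String): Option[List[RelPattern]] =
    (pattern ~ ws).run(s, 0).collect { case ((rels, _), end) if end == s.length => rels }
}
```

Splitting parsers into small named pieces (`ident`, `arrow`, `chain`) mirrors the idea of modular parsers that are combined into the full grammar.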
5th minute
Mutation
• need transactions
• must not disturb reads / traversals
• separation of read and write query parts
• explicit context change and scope: WITH
• implicit split for simple queries
• granularity of mutation aligned with # of executions
• work with parameters

Mutation-Impl
• Query Builders create UpdateCommands
• UpdatePipe ??
• Transaction Scope for whole Query
• collect statistics
• create new entities
• add identifiers to the ExecutionContext
• need to iterate through to get all updates executed (no laziness), special Pipe
• have to track already deleted entities
5th minute
DELETE
• as expected
• idempotent deletion
• can only delete unconnected nodes
• so delete relationships first
• keeps track of deleted nodes
Self Documenting Tests
• Provides
• Title & Description
• Define Sample Graph
• Declare & Execute Query
• Results

• Asserts
• No Syntax Errors
• Multiple Cypher Versions
• No Execution Errors
• Results
• Resulting graph

• Generates
• Graph Rendering Graph-Viz
• Tabular Results
• Ascii-doc for Documentation
• Live-Console Integration
Live Console
• In Memory GDBs in Web Session
• Set up with mutating Cypher (or Geoff)
• Executes Cypher (also mutating)
• Visualizes Graph & Query Results (d3)
• Multiple Cypher Versions
• Share: short link, tweet, yUML
• Embeddable <iframe>
• Live Console for Docs