Cypher to SQL online mapper

GraHPEr: Graph queries
on relational data
Luis Vaquero, Marco Lotz, James Brook, Joan Varvenne,
Suksant Sae Lor, David Subiros, Herry Herry, Brian Monahan
March 2016

“Ma’ Look! Graph
Analytics without
Graphs!!!!”
June 2016

CC: https://www.youtube.com/watch?v=CxKOSAtMC1g

CC by Ole Rinnan.
http://www.vg.no/forbruker/bil-baat-og-motor/bil-og-trafikk/post-it-feberen-brer-seg/a/165769/

Outline
1. Problem
2. Our solution
3. Underlying magic/technology
4. Competition
5. Status and timeline
6. Summary and call to action

The Case for Graph Analytics on Relational Databases
• Lots of data sitting in relational databases (accumulated over the last few decades)
• Some data are simply too bulky to move around
• Consistency / Cascading issues slow down write throughput (key in big data apps)
• Simple graph syntax and semantics to build our queries

Relational Data as Graphs: Problems
1. Raw SQL or stored procedures on relational DBs (“monster SQL queries”)
2. Copy data from its original source to construct a new graph (duplication)
from Pixabay under CC
by Chris Downer under CC
from Pixabay under CC

Graph syntax/semantics
on relational DBs
without duplication

Relational GraHPEr
2 related (but independent) main functionalities:
1. Graph Schema Extractor: Database Schema -> Set of Graph Topologies
2. Query Processor: Graph Topology + Cypher Query -> Equivalent SQL Query

Relational Tables
Movie
Id  Title   Released  Tagline
01  Matrix  1999      Enter the Matrix
Person
Id  Name          Born
01  Keanu Reeves  1964
Acted In
Person_id  Movie_id  Role
01         01        Neo
Directed, Produced (two join tables with the same columns)
person_id  Movie_id
01         02
person_id  Movie_id
01         03

The Equivalent Graph Topology (Gtop)
Nodes:
  Movie (properties: ID, Title, Released, Tagline)
  Person (properties: ID, Name, Born)
Edges:
  Acted in (attributes: Role)
  Produced (attributes: None)
  Directed (attributes: None)
By default, the graph topology is the entity-relationship diagram
Advanced ML enables finding different graphs in the data
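
The default mapping from a relational schema to a graph topology can be sketched roughly as below. This is a minimal Python sketch, not GraHPEr's actual extractor: the `Table` structure, the `extract_topology` function, and the rule "a table with exactly two foreign keys becomes an edge, any other table becomes a node" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    columns: list
    # Hypothetical representation: column name -> referenced table name.
    foreign_keys: dict = field(default_factory=dict)

def extract_topology(tables):
    """Split tables into graph nodes and edges (assumed rule: two FKs => edge)."""
    nodes, edges = {}, []
    for t in tables:
        if len(t.foreign_keys) == 2:
            (_, src), (_, dst) = t.foreign_keys.items()
            # Non-key columns of a join table become edge attributes.
            attrs = [c for c in t.columns if c not in t.foreign_keys]
            edges.append({"type": t.name, "from": src, "to": dst, "attributes": attrs})
        else:
            nodes[t.name] = {"properties": t.columns}
    return nodes, edges

schema = [
    Table("Movie", ["id", "title", "released", "tagline"]),
    Table("Person", ["id", "name", "born"]),
    Table("acted_in", ["person_id", "movie_id", "role"],
          {"person_id": "Person", "movie_id": "Movie"}),
    Table("directed", ["person_id", "movie_id"],
          {"person_id": "Person", "movie_id": "Movie"}),
    Table("produced", ["person_id", "movie_id"],
          {"person_id": "Person", "movie_id": "Movie"}),
]
nodes, edges = extract_topology(schema)
```

With the example schema, `Movie` and `Person` come out as nodes and the three join tables as edges, with `role` as the only edge attribute.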

Query Processor
Parser -> Visitor -> Query Builder
Input: MATCH (m:Movie) RETURN m.title
Output: SELECT m.title FROM Movie AS m

Query Processor
Parser
MATCH (m:Movie) RETURN m.title
SingleQuery(
List(
Match(false,
Pattern(
List(
EveryPath(
NodePattern(
Some(Variable(m))
,List(LabelName(movie))
,None)
)
)
)
,List()
,None)
,Return(false,
ReturnItems(false
,List(
UnaliasedReturnItem(
Property(
Variable(m)
,PropertyKeyName(title)
)
,m.title
)

Query Processor
Parser
Visitor
Query Builder
Query(None,
SingleQuery(
List(
Match(false,
Pattern(
List(
EveryPath(
NodePattern(
Some(Variable(m))
,None)
)
)
)
,List()
,None)
,Return(false,
ReturnItems(false
,List(
Property(
Variable(m)
)
,m.title

Query Processor Visitor
Query(None,
SingleQuery(
List(
Match(false,
Pattern(
List(
EveryPath(
NodePattern(
Some(Variable(m))
,None)
)
)
)
,List()
,None)
,Return(false,
ReturnItems(false
,List(
Property(
Variable(m)
)
,m.title
)
))
,None,None,None)
)
)
)

Match - false

Pattern
Match - false

Pattern
Match - false
NodePattern

Pattern
Match - false
NodePattern – m

Pattern
Match - false
NodePattern – m - movie

Pattern
Match - false
Return

Pattern
Match - false
Return
ReturnItem

Pattern
Match - false
Return
ReturnItem – m

Pattern
Match - false
Return
ReturnItem – m - title
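
The visitor steps traced above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the GraHPEr visitor: the simplified node classes and the `CollectingVisitor` name are hypothetical, and the AST is built by hand instead of coming from a Cypher parser.

```python
from dataclasses import dataclass

# Simplified stand-ins for the parser's AST nodes (hypothetical).
@dataclass
class NodePattern:
    variable: str
    label: str

@dataclass
class Match:
    optional: bool
    pattern: list  # list of NodePattern

@dataclass
class Property:
    variable: str
    key: str

@dataclass
class Return:
    distinct: bool
    items: list  # list of Property

class CollectingVisitor:
    """Walks the clauses and records what a SQL builder would need."""
    def __init__(self):
        self.collected = {}

    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_Match for Match.
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Match(self, node):
        self.collected["match"] = {
            "optional": node.optional,
            "nodes": [self.visit(p) for p in node.pattern],
        }

    def visit_NodePattern(self, node):
        return {"variable": node.variable, "label": node.label}

    def visit_Return(self, node):
        self.collected["return"] = {
            "distinct": node.distinct,
            "items": [self.visit(i) for i in node.items],
        }

    def visit_Property(self, node):
        return {"variable": node.variable, "property": node.key}

# Hand-built AST for: MATCH (m:Movie) RETURN m.title
ast = [Match(False, [NodePattern("m", "movie")]),
       Return(False, [Property("m", "title")])]
visitor = CollectingVisitor()
for clause in ast:
    visitor.visit(clause)
```

After the walk, `visitor.collected` holds the same facts as the trace: a non-optional match on node `m` with label `movie`, and a return item `m.title`.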

Query Processor
Parser
Visitor
Query Builder
Pattern
Match - false
Return

Query Processor SQL Builder
Pattern
Match - false
Return
Template Matcher SQL Templates

Pattern
Match - false
Return

Pattern
Match - false
Return
@*
This is a template for the Return clause of the Cypher language.
*@
@args List returnItems
@args boolean distinct
@for(Map properties: returnItems) {@properties.get("property")
@if(!properties_isLast){,}}
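
What the template's loop does can be approximated in a few lines of Python. This is only a sketch: `render_return` is a hypothetical name, and emitting the `SELECT` / `SELECT DISTINCT` keyword around the joined items is an assumption, since the template excerpt shows only the loop over `returnItems`.

```python
def render_return(return_items, distinct=False):
    """Join the "property" entry of each item with commas, as the template loop does."""
    columns = ", ".join(item["property"] for item in return_items)
    # Assumption: the `distinct` argument selects the SQL keyword.
    keyword = "SELECT DISTINCT" if distinct else "SELECT"
    return f"{keyword} {columns}"

sql = render_return([{"property": "m.title"}])
```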

Pattern
Match - false
SELECT m.title

Pattern
Match - false
Template Matcher
Gtop: "implementationLevel" : {
"implementationNodes": [
{
"synonyms": ["movie"],
"tableName": "Movie",
"id" : [
{
"columnName": "id",
"dataType": "INTEGER"
}
]
}
]
}

SELECT m.title
FROM Movie AS m
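
The label-to-table lookup through the Gtop can be sketched as below. This is a hedged illustration: the JSON is the fragment from the slide, while `table_for_label`, `render_from`, and the `AS <variable>` alias (added so that `m.title` resolves in SQL) are assumptions rather than GraHPEr's actual template matcher.

```python
import json

# Gtop fragment copied from the slide.
GTOP = json.loads("""
{"implementationLevel": {"implementationNodes": [
  {"synonyms": ["movie"], "tableName": "Movie",
   "id": [{"columnName": "id", "dataType": "INTEGER"}]}
]}}
""")

def table_for_label(gtop, label):
    """Return the table whose synonyms contain the (lower-cased) Cypher label."""
    for node in gtop["implementationLevel"]["implementationNodes"]:
        if label.lower() in node["synonyms"]:
            return node["tableName"]
    raise KeyError(f"no table for label {label!r}")

def render_from(gtop, variable, label):
    # Assumption: the Cypher variable is reused as the SQL table alias.
    return f"FROM {table_for_label(gtop, label)} AS {variable}"

from_clause = render_from(GTOP, "m", "Movie")
```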

I/O
Input Cypher Query:
MATCH (p:Person)-[:person_id_acted_in_person_id]->(m:Movie) RETURN p.name, m.title
Expected output SQL:
SELECT p.name, m.title
FROM person AS p
JOIN acted_in ON (acted_in.person_id = p.id)
JOIN movie AS m ON (m.id = acted_in.movie_id)
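
The expected SQL can be checked against the sample rows from the "Relational Tables" slide. The sketch below uses an in-memory SQLite database as a stand-in for the target RDBMS; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the slides; column types are assumed.
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, born INTEGER);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, released INTEGER, tagline TEXT);
CREATE TABLE acted_in (person_id INTEGER, movie_id INTEGER, role TEXT);
INSERT INTO person VALUES (1, 'Keanu Reeves', 1964);
INSERT INTO movie VALUES (1, 'Matrix', 1999, 'Enter the Matrix');
INSERT INTO acted_in VALUES (1, 1, 'Neo');
""")
rows = conn.execute("""
SELECT p.name, m.title
FROM person AS p
JOIN acted_in ON (acted_in.person_id = p.id)
JOIN movie AS m ON (m.id = acted_in.movie_id)
""").fetchall()
```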

It doesn’t stop there!
Input Cypher Query:
MATCH (p:Person)-->(m) RETURN m
Expected output SQL: (see next slide)

Hidden SQL Monsters
SELECT 'movie['||m.id||']' AS m
FROM person AS p
JOIN directed ON (directed.person_id = p.id)
JOIN movie AS m ON (m.id = directed.movie_id)
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS p
JOIN acted_in ON (acted_in.person_id = p.id)
JOIN movie AS m ON (m.id = acted_in.movie_id)
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS p
JOIN produced ON (produced.person_id = p.id)
JOIN movie AS m ON (m.id = produced.movie_id)
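
An untyped relationship (`-->`) has to fan out over every relationship table, which is where the UNION ALL comes from. The sketch below shows one way such a query could be assembled and run; `untyped_match_sql` and the `EDGE_TABLES` list are hypothetical, SQLite stands in for the target database, and the titles of movies 2 and 3 are made-up placeholders.

```python
import sqlite3

# Relationship tables from the example schema; each has person_id and movie_id.
EDGE_TABLES = ["directed", "acted_in", "produced"]

def untyped_match_sql(edge_tables):
    """Build one SELECT per relationship table and glue them with UNION ALL."""
    branches = [
        "SELECT 'movie['||m.id||']' AS m\n"
        "FROM person AS p\n"
        f"JOIN {edge} ON ({edge}.person_id = p.id)\n"
        f"JOIN movie AS m ON (m.id = {edge}.movie_id)"
        for edge in edge_tables
    ]
    return "\nUNION ALL\n".join(branches)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE directed (person_id INTEGER, movie_id INTEGER);
CREATE TABLE acted_in (person_id INTEGER, movie_id INTEGER, role TEXT);
CREATE TABLE produced (person_id INTEGER, movie_id INTEGER);
INSERT INTO person VALUES (1, 'Keanu Reeves');
-- Titles for movies 2 and 3 are placeholders, not from the slides.
INSERT INTO movie VALUES (1, 'Matrix'), (2, 'Movie 2'), (3, 'Movie 3');
INSERT INTO directed VALUES (1, 2);
INSERT INTO acted_in VALUES (1, 1, 'Neo');
INSERT INTO produced VALUES (1, 3);
""")
found = sorted(row[0] for row in conn.execute(untyped_match_sql(EDGE_TABLES)))
```

Each relationship table contributes one branch, so the SQL grows linearly with the number of edge types while the Cypher query stays a single line.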

It doesn’t stop there!
Input Cypher Query:
MATCH (keanu:Person {name: 'Keanu Reeves'})-->(m:Movie {released: '1999'}) RETURN m
Expected output SQL: (see next slide)

Hidden SQL Monster
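SELECT 'movie['||m.id||']' AS m
FROM person AS keanu
JOIN directed ON ( directed.person_id = keanu.id )
JOIN movie AS m ON ( m.id = directed.movie_id )
WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS keanu
JOIN acted_in ON ( acted_in.person_id = keanu.id )
JOIN movie AS m ON ( m.id = acted_in.movie_id )
WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS keanu
JOIN produced ON ( produced.person_id = keanu.id )
JOIN movie AS m ON ( m.id = produced.movie_id )
WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'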
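
The inline property maps in the Cypher pattern become equality predicates in each branch. A minimal sketch of that step follows; `where_clause` is a hypothetical helper, and using `?` placeholders with bound parameters instead of inlined literals is an assumption, not what the slide shows.

```python
def where_clause(filters):
    """filters maps a SQL alias to {column: value}; returns SQL text and bound parameters."""
    predicates, params = [], []
    for alias, props in filters.items():
        for column, value in props.items():
            predicates.append(f"{alias}.{column} = ?")
            params.append(value)
    sql = ("WHERE " + " AND ".join(predicates)) if predicates else ""
    return sql, params

clause, params = where_clause({
    "keanu": {"name": "Keanu Reeves"},
    "m": {"released": "1999"},
})
```

The same clause and parameter list would be appended to every branch of the UNION ALL.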

Vendor Landscape
Market largely dominated by Neo4J [3] -> this is why we chose Cypher as our query language
Me too: JSON databases are jumping into the space by enabling links between docs, with
associated properties (e.g. ArangoDB)
Trend towards multimodal DBs (OrientDB, DataStax)
Consolidation: Experian acquired 4Store (now for internal use only) and DataStax acquired
Aurelius (Titan graph database).
1. https://en.wikipedia.org/wiki/Oracle_Spatial_and_Graph
2. http://www.teradata.com/SQL-GR-Engine
3. http://zion-city.blogspot.co.uk/2012/05/graphdb-market-share.html
* Find a more detailed comparison in the last two backup slides below

Vendor/Research Landscape
[Figure: systems positioned by two properties, “Simple Queries” and “No Data Duplication”. Names in bold blue indicate products; names in black font indicate research.]
Systems shown: Teradata SQL-GR, RapidGrapher, Oracle Spatial&Graph, IBM Graph, Neo4J, GraHPEr, IBM’s SQLGraph, GraphGen, Stanford’s Ringo, Spark’s GraphFrames
Cypher Language Coverage
General Clauses
Cypher clause  Supported  Details
Return         Y
Order by       Y
Limit          Y
With           Y          http://wes.skeweredrook.com/the-mythical-with-neo4js-cypher-query-language/
Skip           N          Can be implemented as a post-processing stage
Union          N          Current GraHPEr syntactic parser to split the query in two
Unwind         N          No support for in-query collection/function handling yet
Using          N          Hints Neo4j to use the “right” index
50% of general clauses implemented
25% are easy to implement with minimum effort based on our current code base
12.5% require us to invest time in in-query collection/function processing
12.5% are Neo4j specific
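
As an illustration of the "post-processing stage" mentioned for Skip, the rows returned by the generated SQL could simply be sliced on the client side. This is a sketch only; `apply_skip` is a hypothetical helper, not part of GraHPEr.

```python
from itertools import islice

def apply_skip(rows, skip):
    """Drop the first `skip` rows of a result set after the SQL query has run."""
    return islice(rows, skip, None)

# Works on any iterable of rows, e.g. a database cursor.
remaining = list(apply_skip(iter([("a",), ("b",), ("c",)]), 1))
```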

Reading Clauses
68% of read clauses implemented
20% are easy to implement with minimum effort based on our current code base
8% require us to invest time in in-query collection/function processing or build a REPL
4% are for use with legacy indices in Neo4j
Cypher clause                        Supported    Details
Match by id                          Y
Match by type                        Y
Match by rel pattern                 Y
Match by multiple types              Y
Match multiple relationships         Y
Match variable length relationships  Y
Match anonymous edges and nodes      Y
Match zero-path length               Y
Where                                Y
Where on property                    Y
Where on label                       Y
Where patterns                       Y            MATCH (n) WHERE (n)-[:KNOWS]-({ name:'Tobias' }) RETURN n
Where range                          Y
Count                                Y
Distinct                             Y
Sum, avg, max, min                   Y
Case                                 Y
Optional match                       N            The Cypher equivalent of the outer join in SQL
Match rels with uncommon chars       N
Where with string matching           N
Where with regexes                   N
Percentile, std                      N            Can be implemented as a post-processing stage
Where on dynamic property            N            Requires a REPL-like utility
Where collection patterns            N (partial)  MATCH (tobias { name: 'Tobias' }),(others) WHERE others.name IN ['Andres', 'Peter'] AND (tobias)<--(others) RETURN others
Start                                N            Deprecated/legacy usage. No plans to support.
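
For the "Percentile, std" row, a post-processing stage could compute the aggregates in the client once the plain column values have been fetched with SQL. The sketch below is an assumption about how that might look: `percentile_cont` is a hypothetical helper using linear interpolation, and the birth years other than 1964 are made-up sample values.

```python
import statistics

def percentile_cont(values, fraction):
    """Linear-interpolation percentile over already-fetched values (0 <= fraction <= 1)."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

# Values as they might come back from "SELECT born FROM person" (sample data).
born = [1964, 1967, 1961, 1960]
median_born = percentile_cont(born, 0.5)
spread = statistics.stdev(born)  # sample standard deviation
```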

Functions
Cypher clause  Supported  Details
ALL/ANY/NONE/SINGLE/EXISTS
SIZE on collection
SIZE on pattern
LENGTH on collection
LENGTH on pattern
TYPE
Id
COALESCE
HEAD/LAST
Timestamp
Startnode / Endnode
Toint / Tofloat
Nodes
Relationships
Labels
Keys
Extract (map)
Filter
Tail
Range
Reduce
Math functions
String functions

No Writing Support

Summary
For: customers with large RDBMS deployments
Who: would like to do some graph analytics (multimodal)
  without migrating massive amounts of data to another platform
GraHPEr
Provides: a read-only, easy-to-install library
  discovery of graphs in relational data
  a graph-style way to query relational data
  no data duplication or cascade effects
  single-system administration
Unlike: solutions that need data to be copied and adapted to a graph format
  or that expose complex / verbose graph functions as stored procedures

GraHPEr: Unique Selling Points
• Storage and transfer time savings
o No data duplication
o No separate system to manage
• Easy to install / minimally intrusive -> multimodal DBs made easy
o Just a read-only library on top of existing DB deployments
• Declarative graph query language (compatible with the market leader, Neo4J [1])
o Tap into large existing communities / reuse current code
1. http://neo4j.com/top-ten-reasons/

Thank you
Luis M. Vaquero
Hewlett Packard Enterprise
Contact: luis.vaquero@hpe.com

Graph Analytics
Not just startups in Silicon Valley: the Lufthansas, the Walmarts, the UBSs, and the AT&Ts too
ScriptHop: a motion picture is a graph of interconnected stakeholders, including producers,
directors, casting agents, cinematographers, actors, and so on.
Determine scripts with characters whose particular attributes (such as minorities) make them likely
to require lots of screen time, which might be excessively costly and time-consuming to produce
ORiGAMI – Oak Ridge Graph Analytics for Medical Innovation
http://www.forbes.com/sites/danwoods/2015/12/29/why-graph-technology-is-ready-for-its-close-up-in-2016

Quick Figures
• 1% market penetration today
• Forrester Research: it will reach over 25 percent of all enterprises by 2017
• Popular tools:
o GraphConnect (Neo4J, SF’15):
 more than 1000 developers
 more than 350 organisations
o 1000000+ downloads
o 124 contributors
o 36500 commits

Performance, Really?
“Relational DBs have 40 years of success behind them”
http://istc-bigdata.org/index.php/benchmarking-graph-databases/
HPE Confidential

Vendor Landscape
Vendor               Date  License      Model       Query Language
Complexible          2012  Commercial   RDF         SPARQL
DataStax             2011  Open Source  Property    Gremlin
FlockDB              2010  Open Source  Property    Java
Franz (AllegroGraph) 2005  Dual         RDF         SPARQL, RDFS++, OWL2-RL, Prolog
Neo4J                2007  Open Source  Property    Cypher, native API, TinkerPop
Objectivity          2011  Commercial   Objects     Java
Oracle               2015  Commercial   Property    Java, Gremlin, Groovy, Python
Orient Tech          2011  Open Source  Property    REST, Gremlin, SPARQL, SQL
Informatica          2015  Commercial   RDF         SPARQL
Ontotext/GraphDB     2000  Commercial   RDF         SPARQL
Teradata SQL-GR      2015  Commercial   Relational  SQL
IBM Graph            2015  Dual         Property    Gremlin
Actian               2014  Commercial   RDF         SPARQL
MarkLogic            2015  Commercial   RDF         SPARQL
ArangoDB                   Commercial   Property    AQL, Blueprints

Graph to SQL
• Plenty of tools converting from SQL to graph languages.
• We want the opposite: Graph to SQL
Feature comparison: SQLGraph (IBM), GraphiQL (MIT), GraHPEr (HPE)
Language
  SQLGraph (IBM): non-side-effecting Gremlin to SQL compilation
  GraphiQL (MIT): Pig-Latin inspired new declarative language compiled into SQL
  GraHPEr (HPE): OpenCypher (with time extensions) compiled to SQL
SQL
  SQLGraph (IBM): exploits recursive/iterative queries
  GraphiQL (MIT): exploits recursive/iterative queries
  GraHPEr (HPE): ANSI92 with Vertica-friendly optimisations
Additional tables
  SQLGraph (IBM): created relational tables (to represent edges and nodes)
  GraphiQL (MIT): separate GraphTables
  GraHPEr (HPE): maximise reuse of existing tables
Integration with pre-existing installations
  SQLGraph (IBM): requires migration
  GraphiQL (MIT): requires migration
  GraHPEr (HPE): no migration
Type of analysis
  SQLGraph (IBM): bulk
  GraphiQL (MIT): bulk
  GraHPEr (HPE): time-based
Benchmarks
  SQLGraph (IBM): large-scale
  GraphiQL (MIT): mid-scale (SNAP data)
  GraHPEr (HPE): TBD (goal is large-scale, but time constraints are key)
