© DataStax, All Rights Reserved. Confidential
How Do *You* Do Graph?
Ben Krug
Technical Support Engineer, DataStax
Who am I?
© 2016 DataStax, All Rights Reserved. 2
A Technical Support Engineer at DataStax.
Previously, Support Engineer at MySQL, then Sun, then Oracle.
Before that, a DBA / sysadmin for banks, utilities, startups, medical and
insurance companies, etc.
Over 25 years in DBMSs, from ISAM and hierarchical to relational,
NoSQL, and graph.
Blogs: formerly oracle2mysql.wordpress.com, now
intertubes.wordpress.com
Disclaimer: Any opinions given are my own!
My topic:
How to look at graphs (the best way?)
● This will be an opinionated discussion!
● Is there a best way?
● We've probably all done a lot of relational - does that help?
Goals:
● Discuss DM theory (compare and contrast) and some FUD
● Give an overview of some tools, in the context of the discussion (focused
on Tinkerpop, Spark, etc, relating to the DSE Graph implementations)
1st, what's a (property) graph?
● A collection of labeled nodes and (directed) edges
● Formally, one example of a definition is:
G = (V, E, λ), where V is a set of vertices, E ⊆ (V × V) is a multi-set of directed binary edges, and
λ : ((V ∪ E) × Σ∗) → (U \ (V ∪ E)) is a partial function that maps an element/string pair to an object in the
universal set U (excluding vertices and edges as allowed property values).*
* The Gremlin Graph Traversal Machine and Language, Marko A. Rodriguez, 2015 Proceedings of the ACM Database
Programming Languages Conference
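To make the definition concrete, here is a minimal Python sketch of G = (V, E, λ). The class and method names (PropertyGraph, add_vertex, set_property, and so on) and the sample vertex names are hypothetical, chosen only for illustration; this is not how TinkerPop or DSE Graph store a graph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Vertex:
    name: str


@dataclass(frozen=True)
class Edge:
    eid: int      # distinguishes parallel edges, so E behaves as a multi-set
    src: Vertex   # tail of the directed edge
    dst: Vertex   # head of the directed edge


Element = Union[Vertex, Edge]


@dataclass
class PropertyGraph:
    V: Set[Vertex] = field(default_factory=set)
    E: List[Edge] = field(default_factory=list)
    # lambda: a partial function from (element, key) to a property value
    lam: Dict[Tuple[Element, str], Any] = field(default_factory=dict)

    def add_vertex(self, name: str) -> Vertex:
        v = Vertex(name)
        self.V.add(v)
        return v

    def add_edge(self, src: Vertex, dst: Vertex) -> Edge:
        if src not in self.V or dst not in self.V:
            raise ValueError("an edge must connect existing vertices")
        e = Edge(len(self.E), src, dst)
        self.E.append(e)
        return e

    def set_property(self, element: Element, key: str, value: Any) -> None:
        # The definition excludes vertices and edges as property values.
        if isinstance(value, (Vertex, Edge)):
            raise TypeError("vertices and edges are not allowed as property values")
        self.lam[(element, key)] = value

    def get_property(self, element: Element, key: str) -> Any:
        # Returns None where lambda is undefined (it is a partial function).
        return self.lam.get((element, key))


g = PropertyGraph()
marko = g.add_vertex("marko")
lop = g.add_vertex("lop")
created = g.add_edge(marko, lop)
g.add_edge(marko, lop)  # a second, parallel edge is allowed
g.set_property(marko, "age", 29)
g.set_property(created, "label", "created")
```

The edge id is what lets two edges between the same pair of vertices coexist, which is the "multi-set" part of the definition.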
By contrast, what's a relational database?
● A collection of rows and columns, organized into tables?
● wikipedia: a digital database based on the relational model of data, as proposed by E. F. Codd in 1970.
● google dictionary: a database structured to recognize relations among stored items of information.
● Formally, one example of a definition is: ?
○ Maybe we could base one on the relational algebra, but it's all very
wordy and difficult to pin down concisely.
● Or, we can say an RDBMS is one that adheres to "Codd's 12 rules",
which might mean that none truly exist (see, e.g., rule 6, the "view updating rule")!
Let's pretend that we know what we mean - basically, tables of rows and
columns, normalized to some degree, with integrity constraints, "etc".
Importantly, it separates logical view from physical storage.
Things we might hear (that I disagree with)
● Graph is an entirely new world, wholly distinct and separate from
relational. Relational is just a ball and chain.
"The data explosion demands new solutions, yet the hoary old RDBMS still rules." (InfoWorld)
● Relational is about a static view of bits of data, not relations. (!)
"Joins are bad, mkay?" (https://oracle2mysql.wordpress.com/2016/02/18/joins-are-bad-mkay/)
● "Native" graph databases must be better than "non-native".
(for a fun rebuttal of this, see
https://www.datastax.com/dev/blog/a-letter-regarding-native-graph-databases)
● Joins are inherently slower than graph traversals.
this one calls for its own slide...
Are joins slower than traversals? It depends...
Is O(k) faster than O(log(n)) when k << n? (k is constant) Not always!
Eg, Friends of Friends - say k=avg # of friends, n=number of people.
(This is a common example given.)
O(k) = O(1) - it must be fast, right? Not necessarily… For example,
"read an entry from a list of length n":
O(1) algorithm:
read the entire disk, and
return the entry asked for.
O(log(n)) algorithm:
use an index to find the entry,
and return it.
This is not entirely facetious! The clock time depends on the
algorithms that store, read, and deserialize the data, and what data
they need to process in order to find the results.
(eg, see An Evaluation of Alternative Physical Graph Data Designs for Processing Interactive Social Networking Actions, Ghandeharizadeh,
Boghrati, and Barahmand, Database Laboratory Technical Report, Computer Science Department, USC, 2014)
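The "read the entire disk" example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the "disk" is a fixed-size list of DISK_SLOTS slots (a made-up capacity that does not depend on n), and the index is a plain sorted list searched by binary search. The point is only that the step count of the O(1) algorithm is a large constant, while the O(log(n)) algorithm does far less work.

```python
DISK_SLOTS = 100_000  # hypothetical fixed disk capacity, independent of n


def constant_time_lookup(disk, key):
    """O(1) with respect to n: always scans every slot of the fixed-size disk."""
    steps = 0
    found = None
    for slot in disk:  # DISK_SLOTS iterations, whatever n is
        steps += 1
        if slot is not None and slot[0] == key:
            found = slot[1]
    return found, steps


def indexed_lookup(sorted_keys, values, key):
    """O(log n): binary search over a sorted index."""
    steps = 0
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_keys) and sorted_keys[lo] == key:
        return values[lo], steps
    return None, steps


n = 1_000
keys = list(range(n))
vals = [f"entry-{k}" for k in keys]

# The n entries occupy the first n slots; the rest of the disk is empty.
disk = [(k, v) for k, v in zip(keys, vals)] + [None] * (DISK_SLOTS - n)

scan_result, scan_steps = constant_time_lookup(disk, 742)
index_result, index_steps = indexed_lookup(keys, vals, 742)
print(scan_steps, index_steps)
```

Both calls return the same entry; only the amount of data each one touches differs, which is what decides the clock time.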
Examples from a graph company's site
This all comes from the first page that came up (paid) when I googled "index free adjacency". Or, it's the first link if you google "graph
databases future".
"Some graph databases use native graph storage that is specifically designed to store and manage
graphs – from bare metal on up. Other graph technologies use relational, columnar or
object-oriented databases as their storage layer. Non-native storage is often slower than a native
approach because all of the graph connections have to be translated into a different data model."
This, we've already discussed - what is "native" graph storage? What guarantees that it's faster?
"Native graph processing (a.k.a. index-free adjacency) is the most efficient means of processing data
in a graph because connected nodes physically point to each other in the database. Non-native
graph processing engines use other means to process Create, Read, Update or Delete (CRUD)
operations that aren’t optimized for handling connected data."
As you scale to multiple nodes, "physically pointing" becomes meaningless. And again, what guarantees that
it's faster? And the last sentence is just FUD: any mature database, RDBMSs included, is heavily optimized
for CRUD. (For a nice rebuttal, focusing on a single-server implementation, see
https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/ .)
Examples from a graph company's site (cont)
"With traditional databases, relationship queries come to a grinding halt as the number and depth of
relationships increase. In contrast, graph database performance stays constant even as your data
grows year over year."
- will traversals stay constant as edges increase? What about queries that span the graph?
- for one example of numbers, see
https://intertubes.wordpress.com/2017/11/28/benchmarketing-neo4j-and-mysql/
"With graph databases, your IT and data architecture teams move at the speed of business because
the structure and schema of a graph data model flex as your solutions and industry change. Your
team doesn’t have to exhaustively model your domain ahead of time (and then exhaustively remodel
and migrate the DB after some exec asks for a change)."
This overstates the impacts of a schema change in a well-designed system.
"your application doesn’t have to infer data connections using things like foreign keys or out-of-band
processing, like MapReduce."
Edges are not so different from foreign keys; is it really so difficult to "infer"? What does "out-of-band" mean?
Now, on to the reality (as I see it)
● For data models, there's often a mapping between graph DBs and
relational DBs
It's hard to formalize, because relational is hard to formalize.
[Picture: a common example of a graph DB]
● How about integrity constraints?
○ attributes must be of the types specified in the schema
○ (nullable) FKs must reference a field in the parent table; an edge must
point to a vertex.
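The parallel between "an edge must point to a vertex" and a foreign-key constraint can be sketched with SQLite from Python. The vertex and edge tables, their columns, and the sample data are hypothetical, picked only to illustrate the mapping; they are not the DSE Graph storage schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled

# One table of vertices, one table of edges whose endpoints are foreign keys.
conn.executescript("""
CREATE TABLE vertex (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    name  TEXT NOT NULL
);
CREATE TABLE edge (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    src   INTEGER NOT NULL REFERENCES vertex(id),
    dst   INTEGER NOT NULL REFERENCES vertex(id)
);
INSERT INTO vertex (id, label, name) VALUES (1, 'person', 'alice');
INSERT INTO vertex (id, label, name) VALUES (2, 'person', 'bob');
""")


def try_add_edge(label, src, dst):
    """Insert an edge; return False if it would point at a missing vertex."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO edge (label, src, dst) VALUES (?, ?, ?)",
                (label, src, dst),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_edge = try_add_edge("knows", 1, 2)      # both endpoints exist
dangling_edge = try_add_edge("knows", 1, 99)  # vertex 99 does not exist
print(valid_edge, dangling_edge)
```

The relational engine rejects the dangling edge for the same reason a graph database would: the constraint is the same, only the vocabulary differs.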
As a result, many ways to view a graph
● A graph view (gremlin)
● A relational view (SparkSQL, dseGraphFrames, Studio (JDBC/ODBC))
● A graphical graph view (Studio)
● Sometimes, the underlying storage view (in DSE, CQL (for l33t h4x0rs))
How about data access methods?
● It's often said that SQL/CQL are declarative
● gremlin offers an imperative access method
● Is declarative for relational, and imperative for graphs?
● It's not so simple...
CQL is declarative; is SQL declarative?
● A lot of SQL is declarative. Declarative is "relational", especially in early SQL.
● SQL:1999 offers CTEs, which have an imperative element.
● Many people use UDFs with SQL/CQL, or embed SQL/CQL in imperative
(procedural) code.
● Most SQL vendors offer extensions, like PL/SQL or T-SQL, with
imperative (procedural) elements.
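As an illustration of the imperative flavor of CTEs, here is a small recursive CTE run through SQLite from Python. The friend table and its rows are made up for the example. The query reads like a loop: start from one person, then repeatedly step to that person's friends, stopping at depth 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend (person TEXT, friend TEXT);
INSERT INTO friend VALUES ('ann', 'bob');
INSERT INTO friend VALUES ('bob', 'cat');
INSERT INTO friend VALUES ('cat', 'dan');
INSERT INTO friend VALUES ('ann', 'eve');
""")

rows = conn.execute("""
WITH RECURSIVE reach(name, depth) AS (
    SELECT 'ann', 0                       -- starting point
    UNION
    SELECT f.friend, r.depth + 1          -- one more hop per iteration
    FROM reach r
    JOIN friend f ON f.person = r.name
    WHERE r.depth < 2                     -- stop after two hops
)
SELECT name, depth FROM reach WHERE depth > 0 ORDER BY depth, name
""").fetchall()

for name, depth in rows:
    print(name, depth)
```

The seed row plus the "repeat until nothing new" step is what gives the query its procedural feel, even though it is still a single SQL statement.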
Even declarative SQL can be imperative-ish
● Consider an equality lookup, joined to a child, joined to a child. (Eg,
Friends of Friends - a "graphy" query.)
● We know this will be implemented in one way - look up the key, go look
up the child, then look at its child (like a traversal).
● Does that make declarative SQL imperative?
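A rough sketch of that point, using SQLite from Python: the same Friends of Friends question asked as a declarative two-level self-join, and as a hand-written traversal over an adjacency list. The friend table, the index name, and the sample data are hypothetical. How SQLite actually executes the join depends on its planner, so the traversal is only an assumption about the likely shape of the plan, not a statement of what the engine does.

```python
import sqlite3
from collections import defaultdict

edges = [
    ("ann", "bob"),
    ("ann", "eve"),
    ("bob", "cat"),
    ("bob", "dan"),
    ("eve", "dan"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (person TEXT, friend TEXT)")
conn.execute("CREATE INDEX friend_person ON friend (person)")
conn.executemany("INSERT INTO friend VALUES (?, ?)", edges)

# Declarative: say what we want, a lookup joined to a child.
sql_fof = {
    row[0]
    for row in conn.execute(
        """
        SELECT f2.friend
        FROM friend f1
        JOIN friend f2 ON f2.person = f1.friend
        WHERE f1.person = ?
        """,
        ("ann",),
    )
}

# Imperative: look up the key, then follow each friend to its friends.
adjacency = defaultdict(list)
for person, friend in edges:
    adjacency[person].append(friend)


def friends_of_friends(start):
    result = set()
    for friend in adjacency[start]:
        for fof in adjacency[friend]:
            result.add(fof)
    return result


traversal_fof = friends_of_friends("ann")
print(sql_fof == traversal_fof)
```

Both forms return the same set; the difference is who writes down the steps, the query author or the optimizer.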
gremlin is declarative and imperative
● An example: managers of "gremlin's" collaborators:
declarative:
g.V().match(
    as("a").has("name","gremlin"),
    as("a").out("created").as("b"),
    as("b").in("created").as("c"),
    as("c").in("manages").as("d"),
    where("a",neq("c"))).
  select("d").
  groupCount().by("name")
imperative:
g.V().has("name","gremlin").as("a").
  out("created").in("created").
    where(neq("a")).
  in("manages").
  groupCount().by("name")
taken from https://tinkerpop.apache.org/gremlin.html
comparing: SQL and gremlin
There's a nice site comparing SQL and gremlin for a graph database with a
schema: sql2gremlin.com. Some examples:
SQL:
SELECT DISTINCT LEN(CategoryName)
FROM Categories
gremlin:
g.V().hasLabel("category").values("name").
  map {it.get().length()}.dedup()
SQL:
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice >= 5 AND UnitPrice < 10
gremlin:
g.V().has("product", "unitPrice", between(5f,10f)).
  valueMap("name", "unitPrice")
SQL:
SELECT Products.ProductName
FROM Products
INNER JOIN Categories
ON Categories.CategoryID = Products.CategoryID
WHERE Categories.CategoryName = 'Beverages'
gremlin:
g.V().has("name","Beverages").in("inCategory").
  values("name")
SparkSQL is declarative and imperative
● The data model currently exposed is a bit hard to work with and very
unnormalized: one vertices table and one edges table
● eg (using Studio), to find killrvideo movies Daniel Day-Lewis acted in:
select movie.title
from killrvideo_vertices actor
join killrvideo_edges acts_in
join killrvideo_vertices movie
on actor.name = 'Daniel Day-Lewis'
and acts_in.dst = actor.id
and acts_in.src = movie.id;
● It is also imperative, because you can use Spark functionality
"relational" - DseGraphFrames
● DseGraphFrames are an extension of Spark GraphFrames
○ GraphFrames are built on DataFrames. (DataFrames add a bit of
relational ease to Spark.)
● Some simple examples (from our docs and databricks docs):
g.E().groupCount().by(T.label)
g.V().has("age", P.gt(30)).show
g.V().hasLabel("person").drop().iterate()
● DseGraphFrames support GraphX (and other) Spark libraries
○ GraphX has functions for PageRank, connected components,
shortest path, etc
You can also use CQL to peek under the hood
● However, this is generally not so useful
cassandra@cqlsh:killrvideo> select * from person_p where name = 'Daniel Day-Lewis' allow filtering;
community_id | member_id | ~~property_key_id | ~~property_id | name | personId | ~~vertex_exists
----------------------+----------------+----------------------------+----------------------------------------------------------+-------------------------+--------------+----------------------
1916035712 | 76 | 32771 | 00000000-0000-8003-0000-000000000000 | Daniel Day-Lewis | null | null
…
cassandra@cqlsh:killrvideo> select * from person_e where community_id = 1916035712 and member_id = 76;
community_id | member_id | ~~edge_label_id | ~~adjacent_vertex_id | ~~adjacent_label_id | ~~edge_id | ~~edge_exists | ~~simple_edge_id
---------------------+-----------------+-------------------------+-------------------------------------------+------------------------------+--------------------------------------------------------+-----------------------------+------------------
1916035712 | 76 | 65571 | 0x1121e2800000000000000205 | 4 | a018fe3c-b87f-11e8-9882-f533cf6b1c0d | True | null
...
This is an implementation (storage) detail, not a logical view!
My vision - all-in-one
● Seamlessly conceptualize and access your graph in many ways
○ declarative, imperative, graphy, "relational" - avoid FUD!
● We're going in that direction!
● Use the best tools for the job at hand.
How does DSE integrate all this and help?
● DSE Graph implements Tinkerpop, gremlin, DseGraphFrames, and more
(eg, Solr integration).
● gremlin gives you powerful imperative and declarative methods, for
traversals and graph analyses.
● SparkSQL/DseGraphFrames do too, with the power of Spark.
● gremlin language variants (GLVs) allow you to program in your favorite
(supported) language.
● Solr can be integrated, which adds a declarative element to narrow
queries.
● gremlin OLAP uses SparkSQL to speed complex queries.
● DSE Studio offers graphical views and gremlin and SparkSQL access.
How Do *You* Do Graph?
Ben Krug
Technical Support Engineer, DataStax
ben.krug@datastax.com

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Último (20)

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

How do You Graph

  • 1. © DataStax, All Rights Reserved. Confidential How Do *You* Do Graph? Ben Krug Technical Support Engineer, DataStax 1
  • 2. Who am I? © 2016 DataStax, All Rights Reserved. 2 A Technical Support Engineer at DataStax. Previously, Support Engineer at MySQL, then Sun, then Oracle. Before that, a DBA / sysadmin for banks, utilities, startups, medical and insurance companies, etc. Over 25 years in DBMSs, from ISAM and hierarchical to relational, NoSQL, and graph. Blogs: formerly oracle2mysql.wordpress.com, now intertubes.wordpress.com Disclaimer: Any opinions given are my own!
  • 3. My topic: How to look at graphs (the best way?) © 2016 DataStax, All Rights Reserved. 3 ● This will be an opinionated discussion! ● Is there a best way? ● We've probably all done a lot of relational - does that help?
  • 4. Goals: © 2016 DataStax, All Rights Reserved. 4 ● Discuss DM theory (compare and contrast) and some FUD ● Give an overview of some tools, in the context of the discussion (focused on Tinkerpop, Spark, etc, relating to the DSE Graph implementations)
  • 5. 1st, what's a (property) graph? © 2016 DataStax, All Rights Reserved. 5 ● A collection of labeled nodes and (directed) edges ● Formally, one example of a definition is: $G = (V, E, \lambda)$, where $V$ is a set of vertices, $E \subseteq (V \times V)$ is a multi-set of directed binary edges, and $\lambda : ((V \cup E) \times \Sigma^*) \to (U \setminus (V \cup E))$ is a partial function that maps an element/string pair to an object in the universal set $U$ (excluding vertices and edges as allowed property values).* * The Gremlin Graph Traversal Machine and Language, Marko A. Rodriguez, 2015 Proceedings of the ACM Database Programming Languages Conference
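
As a rough illustration of the definition on this slide, here is a minimal Python sketch of a property graph. V is a set, E is a multiset of directed pairs (a `Counter`, so parallel edges are counted), and λ is a partial function stored as a dict keyed by (element, property name). The class name `PropertyGraph`, its methods, and the sample data are hypothetical; they come neither from the talk nor from TinkerPop.

```python
from collections import Counter


class PropertyGraph:
    """Toy model of G = (V, E, lambda); all names are illustrative only."""

    def __init__(self):
        self.vertices = set()    # V: a set of vertex ids
        self.edges = Counter()   # E: a multiset of directed (out, in) pairs
        self.props = {}          # lambda: partial map (element, key) -> value

    def add_vertex(self, v, **properties):
        self.vertices.add(v)
        for key, value in properties.items():
            self.props[(v, key)] = value

    def add_edge(self, out_v, in_v, **properties):
        # An edge may only connect vertices that are already in V.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise ValueError("both endpoints must be vertices in V")
        self.edges[(out_v, in_v)] += 1   # parallel edges are allowed
        # Simplification: parallel edges share one property map in this toy.
        for key, value in properties.items():
            self.props[((out_v, in_v), key)] = value

    def prop(self, element, key):
        # lambda is partial: an undefined (element, key) pair yields None.
        return self.props.get((element, key))


g = PropertyGraph()
g.add_vertex("gremlin", label="person")
g.add_vertex("tinkerpop", label="software")
g.add_edge("gremlin", "tinkerpop", label="created")
g.add_edge("gremlin", "tinkerpop", label="created")  # same pair, counted twice
```

The sketch does not enforce the rule that vertices and edges are excluded as property values; a real implementation would need to.
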
  • 6. By contrast, what's a relational database? © 2016 DataStax, All Rights Reserved. 6 ● A collection of rows and columns, organized into tables? ● wikipedia: a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. ● google dictionary: a database structured to recognize relations among stored items of information. ● Formally, one example of a definition is: ? ○ Maybe we could base one on the relational algebra, but it's all very wordy and difficult to pin down concisely. ● Or, we can say an RDBMS is one that adheres to "Codd's 12 rules", which might mean that none truly exist (see, eg, rule 6, the "view updating rule")!
  • 7. By contrast, what's a relational database? © 2016 DataStax, All Rights Reserved. 7 ● A collection of rows and columns, organized into tables? ● wikipedia: a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. ● google dictionary: a database structured to recognize relations among stored items of information. ● Formally, one example of a definition is: ? ● We can base it on the relational algebra, but it's all very wordy and difficult to pin down concisely. ● Or, we can say an RDBMS is one that adheres to "Codd's 12 rules", which might mean that none truly exist (see, eg, rule 6, the "view updating rule"). Let's pretend that we know what we mean - basically, tables of rows and columns, normalized to some degree, with integrity constraints, "etc". Importantly, it separates logical view from physical storage.
  • 8. Things we might hear (that I disagree with) © 2016 DataStax, All Rights Reserved. 8 ● Graph is an entirely new world, wholly distinct and separate from relational. Relational is just a ball and chain. "The data explosion demands new solutions, yet the hoary old RDBMS still rules." (InfoWorld)
  • 9. Things we might hear (that I disagree with) © 2016 DataStax, All Rights Reserved. 9 ● Graph is an entirely new world, wholly distinct and separate from relational. Relational is just a ball and chain. "The data explosion demands new solutions, yet the hoary old RDBMS still rules." (InfoWorld) ● Relational is about a static view of bits of data, not relations. (!) "Joins are bad, mkay?" (https://oracle2mysql.wordpress.com/2016/02/18/joins-are-bad-mkay/)
  • 10. Things we might hear (that I disagree with) © 2016 DataStax, All Rights Reserved. 10 ● Graph is an entirely new world, wholly distinct and separate from relational. Relational is just a ball and chain. "The data explosion demands new solutions, yet the hoary old RDBMS still rules." (InfoWorld) ● Relational is about a static view of bits of data, not relations. (!) "Joins are bad, mkay?" (https://oracle2mysql.wordpress.com/2016/02/18/joins-are-bad-mkay/) ● "Native" graph databases must be better than "non-native". (for a good - and fun - rebuttal of this, see https://www.datastax.com/dev/blog/a-letter-regarding-native-graph-databases)
  • 11. Things we might hear (that I disagree with) © 2016 DataStax, All Rights Reserved. 11 ● Graph is an entirely new world, wholly distinct and separate from relational. Relational is just a ball and chain. "The data explosion demands new solutions, yet the hoary old RDBMS still rules." (InfoWorld) ● Relational is about a static view of bits of data, not relations. (!) "Joins are bad, mkay?" (https://oracle2mysql.wordpress.com/2016/02/18/joins-are-bad-mkay/) ● "Native" graph databases must be better than "non-native". (for a fun rebuttal of this, see https://www.datastax.com/dev/blog/a-letter-regarding-native-graph-databases) ● Joins are inherently slower than graph traversals. this one calls for its own slide...
  • 12. Are joins slower than traversals? It depends... © 2016 DataStax, All Rights Reserved. 12 Is O(k) faster than O(log(n)) when k << n? (k is constant) Not always! Eg, Friends of Friends - say k=avg # of friends, n=number of people. (This is a common example given.) O(k) = O(1) - it must be fast, right? Not necessarily… For example, "read an entry from a list of length n": This is not entirely facetious! The clock time depends on the algorithms that store, read, and deserialize the data, and what data they need to process in order to find the results. O(1) algorithm: read the entire disk, and return the entry asked for. O(log(n)) algorithm: use an index to find the entry, and return it. (eg, see An Evaluation of Alternative Physical Graph Data Designs for Processing Interactive Social Networking Actions, Ghandeharizadeh, Boghrati, and Barahmand, Database Laboratory Technical Report, Computer Science Department, USC, 2014)
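
To make the point on this slide concrete, the following Python sketch compares two toy cost models: a constant-cost lookup with a large fixed overhead, and an indexed lookup whose cost grows with log(n) but is cheap per step. The function names and the cost figures (`fixed_cost`, `cost_per_level`) are made-up assumptions chosen only to show that big-O alone does not decide which is faster in practice.

```python
import math


def constant_cost_lookup(n, fixed_cost=1_000_000):
    """O(1) in n: the cost ignores n, but the constant is large
    (think "read a big fixed-size region, then return one entry")."""
    return fixed_cost


def indexed_lookup(n, cost_per_level=10):
    """O(log n): the cost grows with n, but each index level is cheap."""
    return cost_per_level * max(1, math.ceil(math.log2(n)))


# Compare the two hypothetical costs for a few list sizes.
for n in (10**3, 10**6, 10**9):
    print(n, constant_cost_lookup(n), indexed_lookup(n))
```

With these assumed constants the O(log(n)) lookup stays far cheaper even at a billion entries; different constants would move the crossover point, which is the reason to measure each implementation rather than argue from the notation.
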
  • 13. Examples from a graph company's site © 2016 DataStax, All Rights Reserved. 13 This all comes from the first page that came up (paid) when I googled "index free adjacency". Or, it's the first link if you google "graph databases future". "Some graph databases use native graph storage that is specifically designed to store and manage graphs – from bare metal on up. Other graph technologies use relational, columnar or object-oriented databases as their storage layer. Non-native storage is often slower than a native approach because all of the graph connections have to be translated into a different data model." This, we've already discussed - what is "native" graph storage? What guarantees that it's faster? "Native graph processing (a.k.a. index-free adjacency) is the most efficient means of processing data in a graph because connected nodes physically point to each other in the database. Non-native graph processing engines use other means to process Create, Read, Update or Delete (CRUD) operations that aren’t optimized for handling connected data." As you scale to multiple nodes, "physically pointing" becomes meaningless. And again, what guarantees that it's faster? And the last sentence is just FUD: any mature database is heavily optimized for CRUD, as are RDBMSs. (For a nice rebuttal, focusing on a single server implementation, see https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/ .)
  • 14. Examples from a graph company's site (cont) © 2016 DataStax, All Rights Reserved. 14 This all comes from the first page that came up (paid) when I googled "index free adjacency". Or, it's the first link if you google "graph databases future". "With traditional databases, relationship queries come to a grinding halt as the number and depth of relationships increase. In contrast, graph database performance stays constant even as your data grows year over year." - will traversals stay constant as edges increase? What about queries that span the graph? - for one example of numbers - https://intertubes.wordpress.com/2017/11/28/benchmarketing-neo4j-and-mysql/ "With graph databases, your IT and data architecture teams move at the speed of business because the structure and schema of a graph data model flex as your solutions and industry change. Your team doesn’t have to exhaustively model your domain ahead of time (and then exhaustively remodel and migrate the DB after some exec asks for a change)." This overstates the impacts of a schema change in a well-designed system. "your application doesn’t have to infer data connections using things like foreign keys or out-of-band processing, like MapReduce." Edges are not so different from foreign keys; is it really so difficult to "infer"? What does "out-of-band" mean?
  • 15. Now, on to the reality (as I see it) © 2016 DataStax, All Rights Reserved. 15
  • 16. Now, on to the reality (as I see it) © 2016 DataStax, All Rights Reserved. 16 ● For data models, there's often a mapping between graph DBs and relational DBs.
  • 17. Now, on to the reality (as I see it) © 2016 DataStax, All Rights Reserved. 17 ● For data models, there's often a mapping between graph DBs and relational DBs It's hard to formalize, because relational is hard to formalize.
  • 18. Now, on to the reality (as I see it) © 2016 DataStax, All Rights Reserved. 18 ● For data models, there's often a mapping between graph DBs and relational DBs It's hard to formalize, because relational is hard to formalize. A picture, with a common example of a graph DB:
  • 19. Now, on to the reality (as I see it) © 2016 DataStax, All Rights Reserved. 19 ● For data models, there's often a mapping between graph DBs and relational DBs It's hard to formalize, because relational is hard to formalize. ● How about integrity constraints? ○ attributes must be of the types specified in the schema ○ (nullable) FKs must reference a field in child table; an edge must point to a vertex.
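
The parallel between "an edge must point to a vertex" and a foreign key constraint can be sketched with Python's built-in `sqlite3` module. The table names (`person`, `knows`), the column names, and the sample rows are hypothetical, and SQLite is only a stand-in for any relational engine; this is not how DSE Graph stores data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    -- Vertex table: one row per person.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Edge table: each end of a 'knows' edge must reference an existing person.
    CREATE TABLE knows (
        out_id INTEGER NOT NULL REFERENCES person(id),
        in_id  INTEGER NOT NULL REFERENCES person(id)
    );
""")
conn.execute("INSERT INTO person VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO knows VALUES (1, 2)")  # valid edge: alice -> bob

try:
    # Person 99 does not exist, so this edge would dangle.
    conn.execute("INSERT INTO knows VALUES (1, 99)")
    dangling_edge_rejected = False
except sqlite3.IntegrityError:
    dangling_edge_rejected = True

edge_count = conn.execute("SELECT COUNT(*) FROM knows").fetchone()[0]
```

Here the relational engine rejects the dangling edge for the same reason a graph database refuses an edge to a nonexistent vertex.
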
  • 20. As a result, many ways to view a graph © 2016 DataStax, All Rights Reserved. 20 ● A graph view (gremlin) ● A relational view (SparkSQL, dseGraphFrames, Studio (JDBC/ODBC)) ● A graphical graph view (Studio) ● sometimes, the underlying storage view (in DSE, CQL (for 7eet h4x0rs))
  • 21. How about data access methods? © 2016 DataStax, All Rights Reserved. 21 ● It's often said that SQL/CQL are declarative ● gremlin offers an imperative access method ● Is declarative for relational, and imperative for graphs? ● It's not so simple...
  • 22. CQL is declarative; is SQL declarative? © 2016 DataStax, All Rights Reserved. 22 ● A lot of SQL is declarative. Declarative is "relational", esp in early SQL. ● SQL '99 offers CTEs, which have an imperative element. ● Many people use UDFs with SQL/CQL, or embed SQL/CQL in imperative (procedural) code. ● Most SQL vendors offer extensions, like PL/SQL or TSQL, with imperative (procedural) elements.
  • 23. Even declarative SQL can be imperative-ish © 2016 DataStax, All Rights Reserved. 23 ● Consider an equality lookup, joined to a child, joined to a child. (Eg, Friends of Friends - a "graphy" query.) ● We know this will be implemented in one way - look up the key, go look up the child, then look at its child (like a traversal). ● Does that make declarative SQL imperative?
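
A small sketch of the Friends of Friends case on this slide, using Python's built-in `sqlite3` module: the declarative join and a hand-written hop-by-hop traversal are given the same data and produce the same answer. The `friend` table, the index name, and the sample friendships are all made up for illustration.

```python
import sqlite3

# Hypothetical directed friendships: (person, friend).
friendships = [
    ("ann", "bob"), ("ann", "cat"),
    ("bob", "dan"), ("cat", "eve"), ("cat", "dan"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (person TEXT, friend TEXT)")
conn.execute("CREATE INDEX friend_by_person ON friend (person)")
conn.executemany("INSERT INTO friend VALUES (?, ?)", friendships)

# Declarative: state what is wanted; the engine chooses the plan.
sql_result = {
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT f2.friend
        FROM friend AS f1
        JOIN friend AS f2 ON f2.person = f1.friend
        WHERE f1.person = ?
        """,
        ("ann",),
    )
}

# Imperative: spell out the hops, like a traversal.
adjacency = {}
for person, friend in friendships:
    adjacency.setdefault(person, []).append(friend)

traversal_result = set()
for friend in adjacency.get("ann", []):        # first hop
    for fof in adjacency.get(friend, []):      # second hop
        traversal_result.add(fof)
```

Both `sql_result` and `traversal_result` hold the same set of names; the join is declarative in form, yet the natural way to execute it is the same key-lookup-per-hop pattern the loop writes out by hand.
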
  • 24. gremlin is declarative and imperative © 2016 DataStax, All Rights Reserved. 24 ● An example: managers of "gremlin's" collaborators: declarative: g.V().match( as("a").has("name","gremlin"), as("a").out("created").as("b"), as("b").in("created").as("c"), as("c").in("manages").as("d"), where("a",neq("c"))). select("d"). groupCount().by("name") imperative: g.V().has("name","gremlin").as("a"). out("created").in("created"). where(neq("a")). in("manages"). groupCount().by("name") taken from https://tinkerpop.apache.org/gremlin.html
  • 25. comparing: SQL and gremlin © 2016 DataStax, All Rights Reserved. 25 There's a nice site comparing SQL and gremlin for a graph database with a schema: sql2gremlin.com. Some examples, each SQL query followed by its gremlin equivalent:
      SQL: SELECT DISTINCT LEN(CategoryName) FROM Categories
      gremlin: g.V().hasLabel("category").values("name").map {it.get().length()}.dedup()
      SQL: SELECT ProductName, UnitPrice FROM Products WHERE UnitPrice >= 5 AND UnitPrice < 10
      gremlin: g.V().has("product", "unitPrice", between(5f,10f)).valueMap("name", "unitPrice")
      SQL: SELECT Products.ProductName FROM Products INNER JOIN Categories ON Categories.CategoryID = Products.CategoryID WHERE Categories.CategoryName = 'Beverages'
      gremlin: g.V().has("name","Beverages").in("inCategory").values("name")
  • 26. SparkSQL is declarative and imperative © 2016 DataStax, All Rights Reserved. 26 ● The data model currently exposed is a bit hard, very unnormalized - vertices table and edges table ● eg (using Studio), to find killrvideo movies Daniel Day-Lewis acted in: select movie.title from killrvideo_vertices actor join killrvideo_edges acts_in join killrvideo_vertices movie on actor.name = 'Daniel Day-Lewis' and acts_in.dst = actor.id and acts_in.src = movie.id; ● imperative, because you can use spark functionality
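
The "one vertices table, one edges table" layout described on this slide can be tried out locally with Python's built-in `sqlite3` module standing in for SparkSQL. Everything below is an assumption for illustration: the table names (`vertices`, `edges`), the `label`, `name`, and `title` columns, and the sample rows are hypothetical and do not reproduce the actual DSE or killrvideo schema. Only the `src`/`dst` column names and the shape of the join follow the query on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- All vertices share one wide table; unused properties are NULL.
    CREATE TABLE vertices (id TEXT PRIMARY KEY, label TEXT, name TEXT, title TEXT);
    -- All edges share one table; src and dst are vertex ids.
    CREATE TABLE edges (src TEXT, dst TEXT, label TEXT);

    INSERT INTO vertices VALUES
        ('p1', 'person', 'Daniel Day-Lewis', NULL),
        ('p2', 'person', 'Some Other Actor', NULL),
        ('m1', 'movie', NULL, 'Movie A'),
        ('m2', 'movie', NULL, 'Movie B'),
        ('m3', 'movie', NULL, 'Movie C');
    -- As in the slide's query, the edge runs from the movie (src) to the person (dst).
    INSERT INTO edges VALUES
        ('m1', 'p1', 'actor'),
        ('m2', 'p1', 'actor'),
        ('m3', 'p2', 'actor');
""")

# Self-join the vertices table through the edges table.
titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT movie.title
        FROM vertices AS actor
        JOIN edges AS acts_in ON acts_in.dst = actor.id
        JOIN vertices AS movie ON acts_in.src = movie.id
        WHERE actor.name = 'Daniel Day-Lewis'
        ORDER BY movie.title
        """
    )
]
```

The wide, mostly-NULL vertices table is what makes this model feel unnormalized: every vertex label shares one set of columns.
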
  • 27. "relational" - DseGraphFrames ● DseGraphFrames are an extension of Spark GraphFrames ○ GraphFrames are built on DataFrames. (DataFrames add a bit of relational ease to Spark.) ● Some simple examples (from our docs and databricks docs): g.E().groupCount().by(T.label) g.V().has("age", P.gt(30)).show g.V().hasLabel("person").drop().iterate() ● DseGraphFrames support GraphX (and other) Spark libraries ○ GraphX has functions for PageRank, connected components, shortest path, etc 27
  • 28. You can also use CQL to peek under the hood © 2016 DataStax, All Rights Reserved. 28 ● However, this is generally not so useful
      cassandra@cqlsh:killrvideo> select * from person_p where name = 'Daniel Day-Lewis' allow filtering;
       community_id | member_id | ~~property_key_id | ~~property_id                        | name             | personId | ~~vertex_exists
      --------------+-----------+-------------------+--------------------------------------+------------------+----------+-----------------
         1916035712 |        76 |             32771 | 00000000-0000-8003-0000-000000000000 | Daniel Day-Lewis |     null |            null
      …
      cassandra@cqlsh:killrvideo> select * from person_e where community_id = 1916035712 and member_id = 76;
       community_id | member_id | ~~edge_label_id | ~~adjacent_vertex_id       | ~~adjacent_label_id | ~~edge_id                            | ~~edge_exists | ~~simple_edge_id
      --------------+-----------+-----------------+----------------------------+---------------------+--------------------------------------+---------------+------------------
         1916035712 |        76 |           65571 | 0x1121e2800000000000000205 |                   4 | a018fe3c-b87f-11e8-9882-f533cf6b1c0d |          True |             null
      ...
      This is an implementation, storage detail, not a logical view!
  • 29. My vision - all-in-one ● Seamlessly conceptualize and access your graph in many ways ○ declarative, imperative, graphy, "relational" - avoid FUD! ● We're going in that direction! ● Use the best tools for the job at hand. 29
  • 30. How does DSE integrate all this and help? ● DSE Graph implements Tinkerpop, gremlin, DseGraphFrames, and more (eg, Solr integration). ● gremlin gives you powerful imperative and declarative methods, for traversals and graph analyses. ● SparkSQL/DseGraphFrames do too, with the power of Spark. ● gremlin language variants (GLVs) allow you to program in your favorite (supported) language. ● Solr can be integrated, which adds a declarative element to narrow queries. ● gremlin OLAP uses SparkSQL to speed complex queries. ● DSE Studio offers graphical views and gremlin and SparkSQL access. 30
  • 31. © DataStax, All Rights Reserved. Confidential How Do *You* Do Graph? Ben Krug Technical Support Engineer, DataStax ben.krug@datastax.com 31

Editor's notes

  1. interest in theory and history of data models do we need to throw out the baby with the bath water? how many have some experience with, or knowledge of tinkerpop (especially gremlin)?
  2. It's a bit philosophical. I want to convince you that graph and relational are not as different as you think they are - they are different, but not *as* different
  3. not experienced in semantic or knowledge graphs, they may be less amenable to the coming ideas In mathematics, a multiset (aka bag or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements.
  4. also, I think: relational algebra would be more about access methods than the data model rule 6: All views that are theoretically updatable are also updatable by the system.
  5. Note: this means that I'm calling Cassandra w/CQL "relational". It's a very loose use of the word. Now, I want to clear the air a bit about "relational vs graph". If we're going to consider graph DBs, we need to honestly look at their features and uses (and advantages), not just say "relational bad, graph good".
  6. the quote is just an example I found on a first-try google search. Not the worst!
  7. native vs non-native, and Marko's paper - "native storage of a graph" - how do you serialize a graph (my problem for this talk - how to serialize all these related ideas!)
  8. also (more importantly?), both can expand exponentially, as in Denise's talk, as you repeat.
  9. in reality, look at each implementation, and its performance on the kinds of accesses you'll need to do.
  10. schema changes also depend on the implementation - graphs can have schemas, and this was a question after Denise's talk.
  11. In particular, for property graphs, that have some type of schema.
  12. note - this is not the way we, or SparkSQL, map the graph to tables! The idea is, they are not separate universes.
  13. FKs - otoh, there's no system-enforced "on delete cascade" Where do these similarities lead us?...
  14. what kind of languages should we use? Is there a "best"? one note: declarative are nice for optimizers, imperative are nice if you know best; gremlin > SQL, but relational languages were designed to be declarative in order to enforce and protect data integrity, not by lack of imagination.
  15. CQL, Cassandra(NoSQL), relational - "just when I thought I was out, they pull me back in!"
  16. the point is not that it's truly imperative, but that imperative traversals and declarative ones, in themselves, may not be so different, as far as "imperative vs declarative" go. next, gremlin
  17. this is a bit like the "imperative" SQL traversal query mentioned - both do the same thing, may well take the same steps. next SparkSQL
  18. couldn't do gremlin2sql w/all of gremlin. The site uses T-SQL, so possibly you could do gremlin2TSQL? The point is to compare, and also, if you're used to relational, this can help you get started with gremlin queries. (Or, if you are, also see https://academy.datastax.com/content/gremlin-traversals - google "gremlin recipes datastax")
  19. tradeoffs - knowledge/semantic graph example (no schema) - BUT, you get the power of Spark for analyzing pieces of the graph Studio is a great and easy tool to do a lot of these things … discuss its options, schema views (hidden tables in CQL view), etc
  20. I say "relational" in that this is a declarative element in Spark, as if you're dealing with a vertices table and an edges table. Can also combine graph and non-graph data using SparkSQL, Spark, DseGraphFrames, DataFrames; useful for loading data into a graph
  21. nodes get _p tables, edges get _e tables. Note, this is like x$ tables in Oracle - not documented, subject to change, etc
  22. bicycle and car example - one day, trying to adjust seat height (wrong spot), then also disconnecting battery (engine light)