4. The Solution?
> -- First degree
> SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title
FROM cast WHERE actor_name='Kevin Bacon')
> -- Second degree
> SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title
FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN
(SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')))
> -- Third degree
> SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title
FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN
(SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (SELECT
actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM
cast WHERE actor_name='Kevin Bacon')))))
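The nesting above can also be driven from application code, one degree per round trip, stopping as soon as the target actor shows up. The following is only a sketch of that idea in Python with an in-memory SQLite database. The table layout mirrors the queries above, but the sample rows, the `build_demo_db` helper and the `degrees_of_separation` function are made up for illustration. The table name is double-quoted because `cast` is also an SQL keyword.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory cast table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "cast" (movie_title TEXT, actor_name TEXT)')
    conn.executemany(
        'INSERT INTO "cast" VALUES (?, ?)',
        [
            ("Movie A", "Kevin Bacon"), ("Movie A", "Alice"),
            ("Movie B", "Alice"), ("Movie B", "Bob"),
            ("Movie C", "Bob"), ("Movie C", "Carol"),
            ("Movie D", "Dave"),  # not connected to anyone else
        ],
    )
    return conn


def degrees_of_separation(conn, start, target, max_degree=6):
    """Expand the set of reachable actors one degree at a time.

    Each pass runs the same co-star query as the slide, seeded with every
    actor reached so far. Returns the degree at which the target first
    appears, or None if the target is never reached.
    """
    if start == target:
        return 0
    reached = {start}
    for degree in range(1, max_degree + 1):
        placeholders = ",".join("?" * len(reached))
        rows = conn.execute(
            'SELECT DISTINCT actor_name FROM "cast" WHERE movie_title IN '
            '(SELECT movie_title FROM "cast" '
            f"WHERE actor_name IN ({placeholders}))",
            tuple(reached),
        ).fetchall()
        expanded = {row[0] for row in rows}
        if target in expanded:
            return degree
        if expanded <= reached:  # nothing new was found, so give up
            return None
        reached |= expanded
    return None
```

Every extra degree still costs another query over a growing actor list, which is the performance problem the nested version has as well.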
10. Warning: Computer Science Ahead
A graph is an ordered pair G = (V, E)
where V is a set of vertices and
E is a set of edges,
which are pairs of vertices in V.
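As a rough illustration of the definition, a graph can be held as two plain Python sets. The vertex names and the `neighbors` helper below are hypothetical, and the edges are treated as unordered pairs, which is one of the two options the definition allows.

```python
# V is a set of vertices; E is a set of pairs of vertices taken from V.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c")}

# Every endpoint of every edge must be a vertex in V.
assert all(x in V and y in V for x, y in E)


def neighbors(vertex, edges):
    """Return the vertices sharing an edge with `vertex` (edges are undirected here)."""
    linked = set()
    for x, y in edges:
        if x == vertex:
            linked.add(y)
        elif y == vertex:
            linked.add(x)
    return linked
```

Here `neighbors("b", E)` gives both `"a"` and `"c"`, while `"d"` is a vertex with no edges at all.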
16. Back to Bacon
START s=node:actors(name="Keanu Reeves"),
e=node:actors(name="Kevin Bacon")
MATCH p = shortestPath( s-[*]-e )
RETURN p, length(p)
http://tinyurl.com/c65d99w
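For intuition about what a shortest-path query has to do, here is a minimal breadth-first search in Python. It is not how Neo4j implements `shortestPath`, only a sketch of the general technique. The actor and movie names are placeholders, and `build_graph` and `shortest_path` are names invented for this example.

```python
from collections import deque


def build_graph(edges):
    """Turn (node, node) pairs into an undirected adjacency map."""
    graph = {}
    for left, right in edges:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def shortest_path(graph, start, end):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == end:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in previous:
                continue
            previous[nxt] = node
            if nxt == end:
                # Walk the back-pointers to rebuild the path.
                path = [end]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical data: actors and movies are both nodes, joined when an actor is in a movie.
GRAPH = build_graph([
    ("Actor A", "Movie 1"), ("Actor B", "Movie 1"),
    ("Actor B", "Movie 2"), ("Actor C", "Movie 2"),
    ("Actor D", "Movie 3"),
])
```

The path length the query reports corresponds to `len(path) - 1`, the number of relationships crossed.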
17. ACL
• Users can belong to groups
• Groups can belong to groups
• Groups and users have permissions on objects
o read
o write
o denied
19. START u=node:users(name="User 2"),
o=node:objects(name="Home")
MATCH u-[:belongs_to*0..]->g,
g-[:can_read]->o
RETURN g
http://tinyurl.com/dx7onro
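The `belongs_to*0..` pattern means "the user itself, plus any group reachable through zero or more belongs_to hops". A small Python sketch of that traversal follows. The membership and permission data are invented for the example, as are the `memberships` and `readers_for` names.

```python
from collections import deque

# Hypothetical sample data: who belongs to what, and who may read what.
BELONGS_TO = {
    "User 1": ["Group A"],
    "User 2": ["Group B"],
    "Group B": ["Group A"],
}
CAN_READ = {
    "Group A": {"Home"},
}


def memberships(principal):
    """Return the principal plus every group reachable via belongs_to (zero or more hops)."""
    seen = {principal}
    queue = deque([principal])
    while queue:
        current = queue.popleft()
        for group in BELONGS_TO.get(current, []):
            if group not in seen:
                seen.add(group)
                queue.append(group)
    return seen


def readers_for(principal, obj):
    """Return the user or groups through which `principal` may read `obj`."""
    return {g for g in memberships(principal) if obj in CAN_READ.get(g, set())}
```

An empty result means no read permission was found. With this data, User 2 reads Home through Group A, which it only belongs to indirectly.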
20. START u=node:users(name="User 3"),
o=node:objects(name="Users 1 Blog")
MATCH u-[:belongs_to*0..]->g,
g-[:can_read]->o,
u-[d?:denied*]->o
WHERE d is null
RETURN g
http://tinyurl.com/bwtyhvt
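The optional `denied` match acts as a veto: a grant only counts when no denied relationship exists from the user to the object. The sketch below shows that rule in Python, assuming the denial is recorded directly on the user. All data and the `granting_groups` function are hypothetical.

```python
def granting_groups(user, obj, belongs_to, can_read, denied):
    """Return the groups (or the user) granting read on `obj`, unless the user is denied."""
    if obj in denied.get(user, set()):
        return set()  # an explicit denial overrides any inherited grant
    seen, stack = {user}, [user]
    while stack:
        for group in belongs_to.get(stack.pop(), []):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return {g for g in seen if obj in can_read.get(g, set())}


# Hypothetical sample data.
belongs_to = {"User 3": ["Group A"], "User 4": ["Group A"]}
can_read = {"Group A": {"Users 1 Blog"}}
denied = {"User 3": {"Users 1 Blog"}}
```

Both users inherit the grant from Group A, but only User 4 keeps it.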
21. Real Life Example
• Companies have brands, locations, location groups
• Brands have locations, location groups
• Location groups have locations
23. START b=node:brands(name="Brand 1")
MATCH b<-[:HAS*]-c-[:HAS*]->l<-[h?:HAS*]-b
WHERE h IS NULL AND l.type='location'
RETURN l ORDER BY l.name
http://tinyurl.com/cl537w6
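Read in plain terms, the query climbs from the brand to whatever owns it, collects everything under those owners, then drops whatever the brand itself can reach. A Python sketch of the same set logic follows. The hierarchy, the node types and the function names are all made up for the example.

```python
def descendants(node, has):
    """Everything reachable from `node` by following one or more HAS edges."""
    seen, stack = set(), [node]
    while stack:
        for child in has.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def ancestors(node, has):
    """Every node that can reach `node` by following HAS edges."""
    return {parent for parent in has if node in descendants(parent, has)}


def locations_outside_brand(brand, has, node_type):
    """Locations under the brand's owners that are not under the brand itself, by name."""
    under_brand = descendants(brand, has)
    candidates = set()
    for owner in ancestors(brand, has):
        candidates |= descendants(owner, has)
    return sorted(
        n for n in candidates - under_brand if node_type.get(n) == "location"
    )


# Hypothetical sample hierarchy.
HAS = {
    "Company 1": ["Brand 1", "Brand 2", "Location C"],
    "Brand 1": ["Location A", "Group 1"],
    "Group 1": ["Location B"],
    "Brand 2": ["Location D"],
}
NODE_TYPE = {
    "Company 1": "company", "Brand 1": "brand", "Brand 2": "brand",
    "Group 1": "location_group",
    "Location A": "location", "Location B": "location",
    "Location C": "location", "Location D": "location",
}
```

Location B is excluded even though it hangs off a location group, because the group belongs to Brand 1.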
24. Tweet
@chicken_tech
we should be using graph dbs!
25. But Wait...There's More!
• Mutating Cypher (insert, update)
• Indexing (auto, full-text, spatial)
• Batches and Transactions
• Embedded (for JVM) or REST
26. Wherefore art thou, RDB?
• Aggregation
• Ordered data
• Truly tabular data
• Few or clearly defined relationships
* Goal here is to inspire further investigation * Not going to go into nuts & bolts * Docs are amazing!
* graph db usage poll
* Six degrees game * Relational databases can't easily answer certain types of questions * arbitrary path query * the basic unit of social networking
* Each degree adds a join * Increases complexity * Decreases performance * Stop when the actor you're looking for is in the list
* This problem highlights the ugly truth about RDBs
* They weren't designed to handle these types of problems
* RDB relationships join data, but are not data in themselves
* Arbitrary path query
  * RDB does "query", not "path"
  * Certainly not "arbitrary"
* Gather everything in the set that matches these criteria, then tell me if this thing is in the set
* 1st set, no problem
* 2nd set, no problem
* 3rd set not related to 1st
* 4th not related to 2nd
* 5th related to 1st and 4th
* etc.
* Relationships are only available between overlapping sets
* Neo4j * AGPL for community * ACID compliant * High Availability mode * Embedded and REST * Bindings for every language
* graph theory * edges can be ordered or unordered pairs * vocab: - vertex -> node - edge -> relationship
* Tree data-structures * Networks * Maps * vehicles on streets == packets through network * social networking * manufacturing * fraud detection * supply chain
* Make each record a node
* Make every foreign key a relationship
* RDB indexes are usually stored in a tree structure
* Trees are graphs
* Why not use RDBs?
  * The trouble with RDBs is how they are stored in memory and queried
  * Require a translation step from memory blocks to graph structure
  * ORMs hide the problem, but do not solve it
  * Relationships are not first-class citizens
  * Many problem domains map poorly to rows/tables
The zen of graph databases
* Social networking - friends of friends of friends of friends
* Assembly/Manufacturing - 1 widget contains 3 gadgets, each containing 2 gizmos
* Map directions - starting at my house, find a route to the office that goes past the pub
* Multi-tenancy - root node per tenant
  * All queries start at root
  * No overlap between graphs = no accidental data spillage
* Fraud - track transactions back to origination
* Pretty much anything that can be drawn on a whiteboard
* Example: retail system
  * Customer makes Order
  * Store sells Order
  * Order contains Items
  * Supplier supplied Items
  * Customer rates Items
* Did this customer rank supplier X highly?
* Which suppliers sell the highest rated items?
* Does item A get rated higher when ordered with item B?
* All can be answered with RDBs as well
  * Not as elegant
  * Not as performant
* Actors are nodes * Movies are nodes * Relationship: Actor is IN a movie * Compare to degree selection join queries
* all groups user 3 is a member of directly or inherited
* does user 2 have permission to read the home page?
* does user 3 have permission to read user 1's blog?
* Find all locations in company
* For a given brand, find all locations not under that brand
* RDBs are really good at data aggregation
  * Set math, duh
  * A graph DB has to traverse the whole graph in order to do aggregation
* Truly tabular means not a lot of relationships between the data types
* Neo4j guys say: an RDB will tell you the salary of everyone in the room; a graph DB will tell you who will buy you a beer
* Emil Eifrem (Neo Tech CEO) webinar * Check out 54 minute mark