An introduction to Graph Databases and Neo4j, given at the Silicon Valley Java User Group in April 2014. Abstract follows:
Hey, so did you know that graph databases are the fastest growing category in databases today? No? Don’t worry, most don’t. But today we will have as our guest the founder of the popular open source graph database Neo4j, to tell you the back-story.
Graph databases have been popularized by leading social web properties like Facebook and LinkedIn. So it’s only natural that most geeks out there equate graphs with social, but that’s only part of the story. Thousands of companies in a wide range of industries have quietly adopted Neo4j across a broad range of business critical uses. Forrester Research now predicts that by 2017, over 25% of enterprises will be using a graph database. So if you haven’t already encountered one, chances are you might soon!
Neo4j is mostly written in Java (some Scala!), but usable across all of the popular languages. This session will introduce fundamental graph database concepts and then dive into a hands-on code-centric introduction to Neo4j. We will cover the declarative query language Cypher, introduce operational concerns such as clustering and horizontal scalability, and describe popular graph database use cases.
You will leave with a solid understanding of graph database fundamentals and an appreciation for when graph databases are a good fit in the real world.
Heart Disease Classification Report: A Data Analysis Project
Intro to Graph Databases and Neo4j (SV JUG Apr, 2014)
1. Neo Technology, Inc Confidential
Emil Eifrem
emil@neotechnology.com
@emileifrem
#neo4j
Graph All Teh Things!!!11
An Introduction to Graph Databases and Neo4j
SiliconValley JUG, 2014
Thursday, April 17, 14
2. Neo Technology, Inc Confidential
Agenda
• Graphs Are Eating The World
• Wait! What Is A Graph Anyway?
• Use Cases
Thursday, April 17, 14
3. Neo Technology, Inc Confidential
WARNING!
ALL I’M OFFERING IS THE TRUTH
Thursday, April 17, 14
6. Neo Technology, Inc Confidential
Graphs Are Eating
The World
Thursday, April 17, 14
7. Neo Technology, Inc Confidential
Graphs Are Eating The World
Thursday, April 17, 14
8. Neo Technology, Inc Confidential
“Graph analysis is the true killer app for Big Data.”
Graphs Are Eating The World
http://blogs.forrester.com/james_kobielus/11-12-19-the_year_ahead_in_big_data_big_cool_new_stuff_looms_large
Thursday, April 17, 14
9. Neo Technology, Inc Confidential
“[I]t is arguable that graph databases will have a
bigger impact on the database landscape than
Hadoop or its competitors.”
http://www.bloorresearch.com/blog/IM-Blog/2012/5/graph-databases-nosql.html
Graphs Are Eating The World
Thursday, April 17, 14
10. Neo Technology, Inc Confidential
Graphs Are Eating The World
Thursday, April 17, 14
11. Neo Technology, Inc Confidential
Graphs Are Eating The World
http://www.asterdata.com/resources/video-discovery-platform.phpttp://www.zdnet.com/teradata-aster-gets-graph-database-hdfs-compatible-file-store-7000021667/
Thursday, April 17, 14
12. Neo Technology, Inc Confidential
http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
Graphs Are Eating The World
“Forrester estimates that over 25% of enterprises
will be using graph databases by 2017”
Thursday, April 17, 14
13. Neo Technology, Inc Confidential
Graphs Are Eating The World
*Source: http://db-engines.com/en/ranking
Thursday, April 17, 14
14. Neo Technology, Inc Confidential
Graphs Are Eating The World
Finance
Accenture
Thursday, April 17, 14
15. Neo Technology, Inc Confidential
What Is A Graph,
Anyway?
Thursday, April 17, 14
16. Neo Technology, Inc Confidential
Graphs!
Dancing With
Michael Jackson
Eating Brains
Thursday, April 17, 14
17. Neo Technology, Inc Confidential
Dancing With
Michael Jackson
Eating Brains
Not really.
These are Charts!
Thursday, April 17, 14
19. Neo Technology, Inc Confidential
What Is “Graph Search,” Anyway?
Thursday, April 17, 14
20. Neo Technology, Inc Confidential
Cypher
LOVES
A B
Graph PatternsASCII art
MATCH (A) -[:LOVES]-> (B)
WHERE A.name = "A"
RETURN B
Thursday, April 17, 14
21. Neo Technology, Inc Confidential
MATCH (me:Person)-[:IS_FRIEND_OF]->(friend:Person),
(friend)-[:LIKES]->(restaurant),
(restaurant)-[:LOCATED_IN]->(newyork:City),
(restaurant)-[:SERVES]->(sushi:Cuisine)
WHERE me.name = 'Emil' AND newyork.location='New York' AND
sushi.cuisine='Sushi'
RETURN restaurant.name
http://maxdemarzi.com/?s=facebook
Thursday, April 17, 14
23. “Find all direct reports and how many they manage, up to 3 levels down”
Example HR Query (using SQL)
Thursday, April 17, 14
24. *“Find all direct reports and how many they manage, up to 3 levels down”
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.pid AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
(continued from previous page...)
SELECT depth1Reportees.pid AS directReportees,
count(depth2Reportees.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM(
SELECT reportee.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT L2Reportees.pid AS directReportees, count(L2Reportees.directly_manages) AS
count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT L2Reportees.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
)
Example HR Query (using SQL)
Thursday, April 17, 14
25. MATCH
(boss)-‐[:MANAGES*0..3]-‐>(sub),
(sub)-‐[:MANAGES*1..3]-‐>(report)
WHERE
boss.name
=
“John
Doe”
RETURN
sub.name
AS
Subordinate,
count(report)
AS
Total
Same Query in Cypher
*“Find all direct reports and how many they manage, up to 3 levels down”
Thursday, April 17, 14
26. Neo Technology, Inc Confidential
What About Performance?
Database # persons query time
MySQL
Neo4j
Neo4j
1,000
๏a sample social graph
•with ~1,000 persons
๏average 50 friends per person
๏pathExists(a,b) limited to depth 4
๏caches warmed up to eliminate disk I/O
Thursday, April 17, 14
27. Neo Technology, Inc Confidential
What About Performance?
Database # persons query time
MySQL
Neo4j
Neo4j
1,000 2,000 ms
๏a sample social graph
•with ~1,000 persons
๏average 50 friends per person
๏pathExists(a,b) limited to depth 4
๏caches warmed up to eliminate disk I/O
Thursday, April 17, 14
28. Neo Technology, Inc Confidential
Database # persons query time
MySQL
Neo4j
Neo4j
1,000 2,000 ms
1,000 2 ms
๏a sample social graph
•with ~1,000 persons
๏average 50 friends per person
๏pathExists(a,b) limited to depth 4
๏caches warmed up to eliminate disk I/O
What About Performance?
Thursday, April 17, 14
29. Neo Technology, Inc Confidential
Database # persons query time
MySQL
Neo4j
Neo4j
1,000 2,000 ms
1,000 2 ms
1,000,000
๏a sample social graph
•with ~1,000 persons
๏average 50 friends per person
๏pathExists(a,b) limited to depth 4
๏caches warmed up to eliminate disk I/O
What About Performance?
Thursday, April 17, 14
30. Neo Technology, Inc Confidential
Database # persons query time
MySQL
Neo4j
Neo4j
1,000 2,000 ms
1,000 2 ms
1,000,000 2 ms
๏a sample social graph
•with ~1,000 persons
๏average 50 friends per person
๏pathExists(a,b) limited to depth 4
๏caches warmed up to eliminate disk I/O
What About Performance?
Thursday, April 17, 14
31. Neo Technology, Inc Confidential
“Our
Neo4j
solution
is
literally
thousands
of
times
faster
than
the
prior
MySQL
solution,
with
queries
that
require
10-‐100
times
less
code.”
-‐
Volker
Pacher,
Senior
Developer
eBay
What About Performance?
Thursday, April 17, 14
37. Neo Technology, Inc Confidential
Network Management - Impact Analysis
// Server 1 Outage
MATCH (n)<-[:DEPENDS_ON*]-(upstream)
WHERE n.name = "Server 1"
RETURN upstream
Practical Cypher
upstream
{name:"Webserver VM"}
{name:"Public Website"}
Thursday, April 17, 14
38. Neo Technology, Inc Confidential
Network Management - Dependency Analysis
// Public website dependencies
MATCH (n)-[:DEPENDS_ON*]->(downstream)
WHERE n.name = "Public Website"
RETURN downstream
Practical Cypher
downstream
{name:"Database VM"}
{name:"Server 2"}
{name:"SAN"}
{name:"Webserver VM"}
{name:"Server 1"}
Thursday, April 17, 14
39. Neo Technology, Inc Confidential
Network Management - Statistics
// Most depended on component
MATCH (n)<-[:DEPENDS_ON*]-(dependent)
RETURN n,
count(DISTINCT dependent)
AS dependents
ORDER BY dependents DESC
LIMIT 1
Practical Cypher
n dependents
{name:"SAN"} 6
Thursday, April 17, 14
50. Neo Technology, Inc Confidential
• Provider Graph
(e.g. referrals, patient management, research)
• Patient Graph
(support communities, doctor recommendations, clinical trials)
• Bioinformatic Graph
(drug research, genetic screening, bioengineering, etc.)
• Master Data Graph
(biological master data, evolutionary taxonomy,
the access control graph, etc.)
• Treatment Graph
(collaborative medicine, clinical trials, etc.)
5 Graphs of Health Care
Thursday, April 17, 14
51. Neo Technology, Inc Confidential
Graph Databases
The Definitive Book on Graph Databases
Available as a free PDF download at
graphdatabases.com!
Thursday, April 17, 14
52. Neo Technology, Inc Confidential
Brown
Bag
Lunch
By request only!
• you bring 10+ colleagues
• you provide a room with a projector + screen
• we bring a bag lunch
• we introduce Neo4j to your team
in 45 min + 15 min for Q&A
Schedule your Neo4j Intro now!
Thursday, April 17, 14
53. • The premier source for training on Graph
Databases and Neo4j
• Join us for the ‘All You Can Graph Day’ on May 9 in
Palo Alto!
• Sign up at graphacademy.com
Thursday, April 17, 14
54. • Wednesday, Oct. 22 - graphconnect.com
• Only conference focused on graphDBs
and applications powered by graphs
• Register now for $99 Alpha Geek Passes
GRAPHCONNECT
SF 2014
Thursday, April 17, 14
55. • 4/24 - GraphPUB Silicon Valley at Tied House
• 5/1 - Uncovering Invisible Relationships with
a GraphDB at Medium
• 5/6 - GraphPANEL Silicon Valley at AOL
• And more at: meetup.com/graphdb-sf
UPCOMING EVENTS
GRAPHPUB
NEO4J
20 14
Thursday, April 17, 14