This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. Its first-class relationships allow much higher forms of normalization than you would use in a relational database.
Video here: https://vimeo.com/67371996
12. A graph database...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure
“A relational database may tell you the average age of everyone
in this place,
but a graph database will tell you who is most likely to buy you a
beer.”
37. Aggregate Oriented Model
“There is a significant downside - the whole approach works
really well when data access is aligned with the aggregates, but
what if you want to look at the data in a different way? Order
entry naturally stores orders as aggregates, but analyzing
product sales cuts across the aggregate structure. The
advantage of not using an aggregate structure in the database
is that it allows you to slice and dice your data different ways
for different audiences.
This is why aggregate-oriented stores talk so much about map-reduce.”
Martin Fowler
38. Connected Data Model
The connected data model is based on fine-grained elements
that are richly connected; the emphasis is on extracting many
dimensions and attributes as elements.
Connections are cheap and can be used not only for the
domain-level relationships but also for additional structures
that allow efficient access for different use-cases. The
fine-grained model requires an external scope for mutating
operations that ensures Atomicity, Consistency, Isolation and
Durability - ACID, also known as Transactions.
Michael Hunger
54. You traverse the graph
// lookup starting point in an index
START n=node:People(name = 'Andreas')
RETURN n
// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
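The two steps above (look up a starting node, then follow FRIEND relationships twice) can be sketched outside the database as well. The following is a minimal in-memory illustration in Python, not how Neo4j executes the query; the sample friendships and the helper names `build_index` and `friends_of_friends` are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical sample data; only "Andreas" appears on the slide.
FRIENDSHIPS = [
    ("Andreas", "Peter"),
    ("Andreas", "Emil"),
    ("Peter", "Michael"),
    ("Emil", "Michael"),
    ("Emil", "Allison"),
]


def build_index(pairs):
    """Build an undirected adjacency index: name -> set of friends."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def friends_of_friends(index, start):
    """Follow FRIEND twice from the start node and drop the start node itself."""
    result = set()
    for friend in index.get(start, set()):
        for friend2 in index.get(friend, set()):
            if friend2 != start:
                result.add(friend2)
    return result


people = build_index(FRIENDSHIPS)
print(sorted(friends_of_friends(people, "Andreas")))  # ['Allison', 'Michael']
```

Unlike the Cypher query, which returns one row per matching path, this sketch collapses duplicates into a set.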
56. -- relational: join through the user_skill table
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
// graph: follow the relationship from the user node
START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
62. Need to model the relationship
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri, language_code
63. What if the cardinality changes?
Language: language_code, language_name, word_count, country_code
Country: country_code, country_name, flag_uri
64. Or we go many-to-many?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code
65. Or we want to qualify the relationship?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code, primary
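In a graph model, the LanguageCountry join table and its qualifying column can be expressed as a named relationship that carries a property. The Python sketch below is one hedged way to picture this; the `Relationship` class, the `SPEAKS` name applied here, the `languages_of` helper, and the sample values are all assumptions for illustration, not Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    start: str       # key of the start node
    rel_type: str    # relationship name, e.g. "SPEAKS"
    end: str         # key of the end node
    properties: dict = field(default_factory=dict)


# Illustrative nodes, keyed by country_code and language_code.
countries = {"FI": {"country_name": "Finland"}}
languages = {
    "fi": {"language_name": "Finnish"},
    "sv": {"language_name": "Swedish"},
}

# Each join-table row becomes one relationship; the "primary" column
# becomes a property on that relationship (values are illustrative).
relationships = [
    Relationship("FI", "SPEAKS", "fi", {"primary": True}),
    Relationship("FI", "SPEAKS", "sv", {"primary": False}),
]


def languages_of(country_code, primary_only=False):
    """Return names of languages a country SPEAKS, optionally only primary ones."""
    return sorted(
        languages[rel.end]["language_name"]
        for rel in relationships
        if rel.start == country_code
        and rel.rel_type == "SPEAKS"
        and (rel.properties.get("primary", False) or not primary_only)
    )


print(languages_of("FI"))                     # ['Finnish', 'Swedish']
print(languages_of("FI", primary_only=True))  # ['Finnish']
```

Changing the cardinality or qualifying the relationship only adds relationships or properties; the node definitions stay untouched.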
70. What’s different?
๏ Maintaining relationships is left up to the database
๏ Artificial keys disappear or are unnecessary
๏ Relationships get an explicit name
• can be navigated in both directions
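The point about navigating in both directions can be illustrated with a small sketch: a single stored relationship is reachable from either end, without a foreign key on one side. The code below is a simplified Python model under stated assumptions (the triples, the two lookup tables, and the `neighbors` helper are hypothetical); a graph database maintains the equivalent structures internally.

```python
from collections import defaultdict

# (start, type, end) triples; sample data is illustrative.
RELATIONSHIPS = [
    ("Sweden", "SPEAKS", "Swedish"),
    ("Finland", "SPEAKS", "Swedish"),
    ("Finland", "SPEAKS", "Finnish"),
]

outgoing = defaultdict(list)  # node -> [(type, end)]
incoming = defaultdict(list)  # node -> [(type, start)]
for start, rel_type, end in RELATIONSHIPS:
    # Each relationship is registered once per end point.
    outgoing[start].append((rel_type, end))
    incoming[end].append((rel_type, start))


def neighbors(node, rel_type, direction):
    """Return nodes reached over rel_type, following it "out" or "in"."""
    table = outgoing if direction == "out" else incoming
    return sorted(other for t, other in table.get(node, []) if t == rel_type)


print(neighbors("Finland", "SPEAKS", "out"))  # ['Finnish', 'Swedish']
print(neighbors("Swedish", "SPEAKS", "in"))   # ['Finland', 'Sweden']
```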
79. Anti-Pattern: Node represents multiple concepts
Country: name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name
80. Split up in separate concepts
Country: name, flag_uri
Language: name, number_of_words, yes, no
Currency: currency_code, currency_name
(Country)-[:SPEAKS]->(Language)
(Country)-[:USES_CURRENCY]->(Currency)
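A rough sketch of this refactoring, written in Python with plain dictionaries: one overloaded Country record is split into Country, Language and Currency nodes joined by SPEAKS and USES_CURRENCY. The function name `split_country`, the relationship directions, and all sample values are assumptions made for illustration.

```python
# Overloaded node, shaped like the anti-pattern; values are hypothetical.
overloaded_country = {
    "name": "Sweden",
    "flag_uri": "flags/se.png",
    "language_name": "Swedish",
    "number_of_words": 125000,
    "yes_in_language": "ja",
    "no_in_language": "nej",
    "currency_code": "SEK",
    "currency_name": "Swedish krona",
}


def split_country(node):
    """Split one overloaded record into three nodes plus two relationships."""
    country = {"name": node["name"], "flag_uri": node["flag_uri"]}
    language = {
        "name": node["language_name"],
        "number_of_words": node["number_of_words"],
        "yes": node["yes_in_language"],
        "no": node["no_in_language"],
    }
    currency = {
        "currency_code": node["currency_code"],
        "currency_name": node["currency_name"],
    }
    relationships = [
        (country["name"], "SPEAKS", language["name"]),
        (country["name"], "USES_CURRENCY", currency["currency_code"]),
    ]
    return country, language, currency, relationships


country, language, currency, rels = split_country(overloaded_country)
print(rels)
```

Once split, a second country that uses the same currency or language connects to the existing node instead of repeating its properties.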
81. Challenge: Property or Relationship?
๏ Can every property be replaced by a relationship?
• Hint: triple stores. Are they easy to use?
๏ Should all entities with the same property value be
connected?
82. Object Mapping
๏ Similar to how you would map objects to a relational
database, using an ORM such as Hibernate
๏ Generally simpler and easier to reason about
๏ Examples
• Java: Spring Data Neo4j
• Ruby: Active Model
๏ Why Map?
• Do you use mapping because you are scared of SQL?
• Following DDD, could you write your repositories
directly against the graph API?
84. Relationships for querying
๏ like in other databases
• same structure for different use-cases (OLTP and
OLAP) doesn't work
• graph allows: add more structures
๏ Relationships should be the primary means to access
nodes in the database
๏ Traversing relationships is cheap – that’s the whole
design goal of a graph database
๏ Use lookups only to find starting nodes for a query
Data Modeling examples in Manual
93. Evolution: Relationship to Node
Before: (Peter)-[:SENT_EMAIL]-(Michael)
After: (Peter)-[:EMAIL_FROM]-(Email)-[:EMAIL_TO]-(Michael), (Email)-[:EMAIL_CC]-(Emil), (Email)-[:TAGGED]-(Community), . . .
see Hyperedges
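The evolution above can be sketched as follows: instead of one SENT_EMAIL relationship between two people, the email becomes a node of its own, so that CC recipients and tags can attach to it. This is a minimal Python illustration; the triple layout, the relationship directions, the id scheme, and the helpers `send_email` and `connected_to` are assumptions, not part of the original slides.

```python
from itertools import count

_email_ids = count(1)  # hypothetical id generator for Email nodes


def send_email(graph, sender, to, cc=(), tags=()):
    """Record an email as its own node and connect every participant to it."""
    email = f"Email#{next(_email_ids)}"
    graph.append((email, "EMAIL_FROM", sender))
    for person in to:
        graph.append((email, "EMAIL_TO", person))
    for person in cc:
        graph.append((email, "EMAIL_CC", person))
    for tag in tags:
        graph.append((email, "TAGGED", tag))
    return email


def connected_to(graph, email, rel_type):
    """Return the nodes attached to an Email node by one relationship type."""
    return [end for start, t, end in graph if start == email and t == rel_type]


graph = []
email = send_email(graph, "Peter", to=["Michael"], cc=["Emil"], tags=["Community"])
print(connected_to(graph, email, "EMAIL_CC"))  # ['Emil']
```

A single binary relationship could not hold the third participant or the tag; the intermediate node plays the role of a hyperedge.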
94. Combine multiple Domains in a Graph
๏ you start with a single domain
๏ add more connected domains as your system evolves
๏ more domains allow you to ask different queries
๏ one domain “indexes” the other
๏ Example: Facebook Graph Search
• social graph
• location graph
• activity graph
• favorite graph
• ...
95. Notes on the Graph Data Model
๏ Schema free, but constraints
๏ Model your graph with a whiteboard and a wise man
๏ Nodes as main entities but useless without connections
๏ Relationships are first-class citizens in the model and database
๏ Normalize more than in a relational database
๏ Use meaningful relationship types, not generic ones like IS_
๏ Use in-graph structures to allow different access paths
๏ Evolve your graph to your needs, incremental growth
105. [A] ACL from Hell
๏ Customer:
• leading consumer utility company with tons and
tons of users
๏ Goal:
• comprehensive access control administration
for customers
๏ Benefits:
• Flexible and dynamic architecture
• Exceptional performance
• Extensible data model supports new
applications and features
• Low cost
• A reliable access control administration system for
5 million customers, subscriptions and agreements
• Complex dependencies between groups, companies,
individuals, accounts, products, subscriptions, services and
agreements
• Broad and deep graphs (master customers with 1000s of
customers, subscriptions & agreements)
[Diagram: example access-control graph. Nodes: name: Andreas; account: 9758352794; agreement: ultimate; subscription: sports; subscription: local; service: NFL; service: Ravens; group: graphistas; promotion: fall; company: Neo Technology. Relationships: owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with, gets discount on.]
109. [B] Timely Recommendations
๏ Customer:
• a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user
experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendation imperative to attract new
users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough
to support real-time requirements
๏ Upgrade from running a batch job
• initial hour-long batch job
• but then success happened, and it became a day
• then two days
๏ With Neo4j, real-time recommendations
[Diagram: social graph of people connected by "knows" relationships. Nodes (name, job): Andreas, talking; Allison, plumber; Tobias, coding; Peter, building; Emil, plumber; Stephen, DJ; Delia, barking; Tiberius, dancer.]
114. [C] Collaboration on Global Scale
๏ Customer: a worldwide software leader
• highly collaborative end-users
๏ Goal: offer an online platform for global collaboration
• Highly flexible data analysis
• Sub-second results for large, densely-connected data
• User experience - competitive advantage
• Massive amounts of data tied to members, user
groups, member content, etc. all interconnected
• Infer collaborative relationships through user-
generated content
• Worldwide availability: Asia, North America, Europe
135. Really, once you start thinking in graphs it's hard to stop
Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors
What will you build?