Neo4j GraphTour New York: EY Presentation, Michael Moore
1. Roadmap for Enterprise Graph Strategy
Michael Moore, Ph.D.
Executive Director, Enterprise Knowledge Graphs + AI
EY Performance Improvement Advisory
michael.moore4@ey.com
2. The Database Landscape is Changing
Traditional Databases & Data Warehousing: SQL RDBMS
NoSQL Databases: Column, Document, Key Value, Graph
Data Services & Data Processing: Search, Serverless, Streams, In-Memory, Batch MR, Blockchain
3. Scale Out vs. Scale Up
Continued increases in capacity and dropping compute costs are challenging scale-out commodity server assumptions, particularly for database workloads.
[Chart: single-server RAM capacity, 2017-2019, with 12 TB and 24 TB RAM marked for 2019]
4. Change in Popularity Rankings (db-engines.com)
[Chart: popularity trends by database category, with Graph DBs labeled]
*Proprietary method based on general interest, mentions, relevance in social networks, frequency of technical discussions, etc.
5. What is a Graph Database?
A database specifically designed for creating, storing and querying graphs
“We send email to people, so they will visit our website and buy our product”
MATCH (e:Email)-[:SENT_TO]->
(p:Person {fullName: 'Steve Newman'})-[:VISITED]->
(w:Website)<-[:SOLD_ON]-(pr:Product)<-[:PURCHASED]-(p)
RETURN *
[Diagram: semantic, graph and physical representations of the sentence above: Email SENT Person; Person VISITED Website; Product SOLD ON Website; Person PURCHASED Product]
► Graphs store relationships explicitly, so joins are effectively precomputed and multi-hop queries can run much, much faster than the equivalent SQL joins
► Graphs are fast and easy to understand, develop and use
► Graphs integrate well with applications and data sources, making them great for real-time digital workloads
► Graphs surface, unify and mobilize data held in silos and data lakes
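The Cypher pattern above can be illustrated without a database. The following is a minimal Python sketch, assuming a handful of hypothetical nodes keyed as "Label:key" strings. It scans a relationship list, whereas a graph database follows stored relationships directly, so it shows what the pattern matches rather than how Neo4j executes it.

```python
from typing import NamedTuple


class Rel(NamedTuple):
    start: str
    type: str
    end: str


# Hypothetical graph: nodes are identified by "Label:key" strings.
RELS = [
    Rel("Email:promo-42", "SENT_TO", "Person:Steve Newman"),
    Rel("Person:Steve Newman", "VISITED", "Website:shop"),
    Rel("Product:widget", "SOLD_ON", "Website:shop"),
    Rel("Product:gadget", "SOLD_ON", "Website:shop"),
    Rel("Person:Steve Newman", "PURCHASED", "Product:widget"),
]


def outgoing(node, rel_type):
    """Nodes reached by outgoing relationships of one type."""
    return [r.end for r in RELS if r.start == node and r.type == rel_type]


def incoming(node, rel_type):
    """Nodes with a relationship of one type pointing at `node`."""
    return [r.start for r in RELS if r.end == node and r.type == rel_type]


def email_to_purchase(person):
    """Rows of (email, website, product) matching
    (e)-[:SENT_TO]->(p)-[:VISITED]->(w)<-[:SOLD_ON]-(pr)<-[:PURCHASED]-(p)."""
    bought = set(outgoing(person, "PURCHASED"))
    rows = []
    for email in incoming(person, "SENT_TO"):
        for site in outgoing(person, "VISITED"):
            for product in incoming(site, "SOLD_ON"):
                if product in bought:  # closes the loop back to the same person
                    rows.append((email, site, product))
    return rows


print(email_to_purchase("Person:Steve Newman"))
```

The gadget is sold on the same website but was never purchased, so only the widget row survives, just as the closing `<-[:PURCHASED]-(p)` in the query filters it out.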
11. Graph Use Cases
► Customer 360°
► Recommendation Engines
► Marketing Attribution
► Enterprise Search
► Fraud Detection
► Master Data Management
► Supply Chain
► Geolocation & Routing
► Access & Asset Control
► Social Networks
► IT & Network Management
12. Real-Time, Evolving Graph View Across the Business
Graphs Accelerate Enterprise Data Mobilization
Sources: unstructured, legacy, snapshots, conformed & curated, streams
Data Ingestion, Cleansing, Reduction & Pipelining
Mobile & Web Applications: access control, metadata, recos, monitoring
Real-time BI & Scorecards: KPIs, targets, reporting, drill down/across
Data Science: attribution, similarity, fraud, pathing, cliques
Marketing ROI & Digital Experience (CMO)
Data Governance & Data Quality (CDO)
Operations & Risk Management (CFO)
Account Coverage & Customer LTV (CRO)
Product Marketing & Recommendations (CPO)
13. Roadmap for Enterprise Graph Strategy
Phases: Graphy Problem → Localhost POC → Cloud Pilot → Production Build
• Graphy Problem: business need, data sources
• Localhost POC: data modeling, API, example queries
• Cloud Pilot: data snapshot, reference architecture, API suite
• Production Build: hardening, scheduled & stream ETL, live UX
Workstreams: Stakeholder Input, Graph Design, Data Work, APIs / Data Services, Integration / Refinement, Scale / Harden / Run
Key Conversations:
• Problem / Scope: What will the graph solve?
• Validate: What questions can now be answered?
• Connect: Does the data support the graph model and semantics?
• Mobilize: What data does the new experience need?
• Use Cases: What is the feedback from the business on how well the graph solves the use case?
• Deploy: What monitoring, testing and processes need to be put in place to achieve a robust SLA?
Small Team:
• Graph Architect
• Data Engineer
• Full-stack Developer
• Data Scientist
• Report Developer
14. Talk to the business and pick a graphy problem
What is a “Graphy” problem?
• Requires many entities (e.g., many SQL tables, 360° views)
• Involves recursion (e.g., SQL self joins)
• Has complex, potentially colliding, hierarchies (e.g., SQL one-to-many, many-to-many)
• Is based on informatics of the relationships themselves (e.g., shared relationship counts for collaborative filtering, shortest-path segment summations for wayfinding, cost/time minimization for supply chain, money flows for finance)
• Requires mapping, direct or indirect, across data sources (e.g., data lake unification)
• Demands fast query results (e.g., digital applications, search)
• Most importantly, go talk to the business: what analytics would you like to have, or what customer experiences would you like to light up, that you can’t today because of current data limitations?
• What’s the most critical data that you’d like to see connected?
• What would be an example demo (report/analysis/experience) that you’d find compelling?
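Shared relationship counts, the basis of the collaborative filtering mentioned above, reduce to counting neighbors that co-occur through a common node. A minimal Python sketch, assuming a hypothetical in-memory map of people to purchased products; the Cypher in the docstring is an illustrative equivalent that assumes a `name` property on products.

```python
from collections import Counter

# Hypothetical PURCHASED relationships: person -> products bought.
PURCHASES = {
    "ann": {"boots", "socks", "laces"},
    "bob": {"boots", "socks"},
    "cat": {"boots", "hat"},
    "dan": {"hat"},
}


def also_bought(product):
    """Count products that share a buyer with `product`.

    Roughly the Cypher (assuming a `name` property):
    MATCH (:Product {name: $name})<-[:PURCHASED]-(:Person)-[:PURCHASED]->(o:Product)
    WHERE o.name <> $name
    RETURN o.name, count(*) AS shared ORDER BY shared DESC
    """
    counts = Counter()
    for items in PURCHASES.values():
        if product in items:
            counts.update(items - {product})
    return counts.most_common()


print(also_bought("boots"))  # socks first, shared by two buyers
```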
15. Get comfortable with Neo4j: you don’t need to become an expert
• Get hands-on and be fearless! Neo4j is among the easiest graph databases to learn.
• Install Neo4j and the APOC procedures library, then set the following in Manage/Settings:
# APOC plugin configuration
apoc.import.file.enabled=true
apoc.export.file.enabled=true
dbms.security.procedures.unrestricted=*.*
• Go through the Cypher lessons, and learn basic graph modeling and how to load CSV files, either with Cypher’s built-in LOAD CSV or with APOC:
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS row
RETURN row LIMIT 5
CALL apoc.load.csv("file:///movies.csv", {}) YIELD map
RETURN map LIMIT 5
• Any reasonably sized laptop should be able to handle a graph with several million nodes and relationships. You will quickly see some of the significant benefits of connected data.
• For extra credit, go to github.com/neo4j-examples and download starter applications for your favorite languages.
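The LOAD CSV statements need a file that the `file:///` URL can resolve, which by default means a file in the database’s import directory. As a practice aid, here is a minimal Python sketch that writes a small movies.csv with a header row; the columns and rows are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical practice rows; match the columns your LOAD CSV statement uses.
MOVIES = [
    {"title": "The Matrix", "released": 1999},
    {"title": "Top Gun", "released": 1986},
]


def write_movies_csv(path, rows):
    """Write a header line plus one line per row, so that
    LOAD CSV WITH HEADERS exposes each column as a key on `row`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "released"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "movies.csv")
    write_movies_csv(out_path, MOVIES)
    print(f"Wrote {out_path}; copy it into the Neo4j import directory")
```

LOAD CSV returns every field as a string, so numeric columns such as `released` need converting in Cypher, for example with `toInteger(row.released)`.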
16. Design and build your POC Graph
• Start small and simple: limit yourself to 3-4 data sources and shallow extracts. Use your top SQL queries to snapshot a pool of linked transactions.
• Use common-sense, business-friendly naming for your node labels and relationship types. You’ll iterate this model using input from the business, so it should be clear and readable.
• Don’t be afraid of recursion, e.g., (Employee)-[:REPORTS_TO]->(Employee) answers “who is the boss?”
• Don’t get too hung up on whether something should be a node label, property, or relationship. Just keep in mind that node labels define set membership, and that searching along relationships (traversal) is faster than searching on unindexed properties, which requires scanning the graph.
• You can use CALL db.schema() to see the graph schema. We often use http://www.apcjones.com/arrows/# to build illustrative schemas for conversations with business stakeholders.
• Test your graph design by writing some example queries with your business stakeholder:
• Does this look right to you? Is this how you would whiteboard this process? Am I missing any key entities or relationships?
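The REPORTS_TO recursion can be sketched in plain Python by following manager edges upward until someone has no manager. The employee names are hypothetical, and the Cypher in the docstring is an illustrative variable-length equivalent that assumes a `name` property.

```python
# Hypothetical REPORTS_TO relationships: employee -> manager.
REPORTS_TO = {
    "dev1": "lead",
    "dev2": "lead",
    "lead": "director",
    "director": "ceo",
}


def chain_of_command(employee):
    """Follow REPORTS_TO upward until reaching someone with no manager.

    A variable-length Cypher equivalent (assuming a `name` property):
    MATCH (e:Employee {name: $name})-[:REPORTS_TO*]->(boss)
    WHERE NOT (boss)-[:REPORTS_TO]->()
    RETURN boss
    """
    chain = [employee]
    seen = {employee}
    while chain[-1] in REPORTS_TO:
        manager = REPORTS_TO[chain[-1]]
        if manager in seen:  # protect against cycles in dirty source data
            raise ValueError(f"reporting cycle at {manager!r}")
        chain.append(manager)
        seen.add(manager)
    return chain


def who_is_the_boss(employee):
    return chain_of_command(employee)[-1]


print(chain_of_command("dev1"))
```

In SQL the same question needs a recursive CTE or repeated self joins; in the graph it is a single variable-length pattern.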
17. Example Customer 360° Graph Schema
Account
Transactions
Segments
Product
Interactions
Customer 360°
Graph
• Accurately captures full range of
customer touchpoints across
enterprise surface area
• Enables more insightful indirect
spend analytics for products and
services
• Reconciles product usage,
marketing interactions and digital
identity
• Integrates with the execution layer for
AI-driven UX
18. Example Knowledge Graph Schema
for Spend and Supply Chain Analytics
Supplier 360°
Spend Graph
• Accurately captures the sourcing
complexity of products and services
• Enables more insightful indirect
spend analytics for products and
services
• Reconciles line-item detail to top
parent company, across
intermediate entities
• Extensible for audit, fraud detection,
tracking & traceability
• Integrates with data lake, reporting
platforms and transactional
applications
Schema entities: Product, Supply Chain, Service Providers, Procurement, Top Parent, Line Item Detail, Tracking and Traceability, Invoicing
Data fabric composed of nodes and relationships that
connect and mobilize data, using consistent semantics
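Reconciling line-item spend to the top parent company amounts to walking each supplier’s parent chain through any intermediate entities and summing at the root. A minimal Python sketch, assuming a hypothetical child-to-parent map and invoice lines; in the graph the same walk would be a variable-length traversal over a parent relationship whose name is up to your schema.

```python
from collections import defaultdict

# Hypothetical ownership hierarchy: entity -> immediate parent (None = top).
PARENT = {
    "Acme Widgets EU": "Acme Europe Holdings",
    "Acme Europe Holdings": "Acme Corp",
    "Acme Widgets US": "Acme Corp",
    "Acme Corp": None,
    "Solo Supplies": None,
}

# Hypothetical invoice line items: (supplier entity, amount).
LINE_ITEMS = [
    ("Acme Widgets EU", 1200.0),
    ("Acme Widgets US", 800.0),
    ("Solo Supplies", 150.0),
    ("Acme Widgets EU", 50.0),
]


def top_parent(entity):
    """Walk up the hierarchy through any number of intermediate entities."""
    seen = set()
    while PARENT.get(entity) is not None:
        if entity in seen:
            raise ValueError(f"ownership cycle at {entity!r}")
        seen.add(entity)
        entity = PARENT[entity]
    return entity


def spend_by_top_parent(items):
    """Sum line-item amounts at each supplier's top parent."""
    totals = defaultdict(float)
    for supplier, amount in items:
        totals[top_parent(supplier)] += amount
    return dict(totals)


print(spend_by_top_parent(LINE_ITEMS))
```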
19. Example B2B MDM Graph Schema
Product
Core Data Elements
Customer
& Contact
Orders
Master Data
Management
Graph Schema
• Accurately captures data lineage for
core identity components
• Provides a “golden record” derived from
multi-source probabilistic authority
scores
• Relates contacts, customers, orders
and products without loss of fidelity
• Enables detailed whitespace
analysis and next best sales action
• Integrates with data lake and CRM
applications
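The slide does not say how the authority scores are computed, so the following Python sketch takes them as given and applies a simple highest-score survivorship rule per attribute, recording the winning source to preserve lineage. Sources, values and scores are all hypothetical.

```python
# Hypothetical source records for one customer. Each source carries an
# authority score (0-1) per attribute; how scores are derived is out of scope.
RECORDS = [
    {"source": "crm", "name": "ACME Corp.", "phone": "555-0100",
     "scores": {"name": 0.6, "phone": 0.9}},
    {"source": "erp", "name": "Acme Corporation", "phone": "555-0199",
     "scores": {"name": 0.8, "phone": 0.4}},
    {"source": "web", "name": "acme", "phone": None,
     "scores": {"name": 0.2, "phone": 0.0}},
]


def golden_record(records, attributes):
    """Per attribute, keep the non-empty value from the most authoritative
    source, plus that source's name for lineage."""
    golden = {}
    for attr in attributes:
        candidates = [r for r in records if r.get(attr)]
        if not candidates:
            continue
        best = max(candidates, key=lambda r: r["scores"].get(attr, 0.0))
        golden[attr] = {"value": best[attr], "source": best["source"]}
    return golden


print(golden_record(RECORDS, ["name", "phone"]))
```

Because the winning source is kept alongside each value, the golden record stays traceable to the source nodes it was derived from.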
20. Design and build your POC Graph
• Breakthrough queries
• Graph algorithms
• Data unification & mobilization
• Use-case specific (Customer 360, Supply Chain, Fraud, Reco)
• Make a localhost graph->app stack so you understand how
parameterized Cypher & Bolt drivers work
• Use any of the neo4j-examples to jumpstart
• If you don’t want to spend time creating a REST API, check out
GraphQL and the GRAND stack (https://github.com/grand-stack/grand-stack-starter)
• Focus on the business value of the new graph-enabled
analytics:
We can now know this to make better decisions
We can now do this for our customers
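Parameterized Cypher keeps user input out of the query text. A minimal Python sketch, assuming a hypothetical query over the Email/Person/Website/Product example; it only builds the (query, parameters) pair, which the official Python driver accepts as `session.run(query, params)`.

```python
# Hypothetical query over the Email/Person/Website/Product example model.
PURCHASES_BY_PERSON = (
    "MATCH (p:Person {fullName: $fullName})-[:PURCHASED]->(pr:Product) "
    "RETURN pr LIMIT $limit"
)


def purchases_query(full_name, limit=25):
    """Return a (query, parameters) pair.

    User input travels only in the parameter map, never in the Cypher text,
    which avoids Cypher injection and lets Neo4j reuse the cached plan.
    """
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return PURCHASES_BY_PERSON, {"fullName": full_name, "limit": limit}


query, params = purchases_query("Steve Newman")
# With the official driver: session.run(query, params).
# No connection is opened in this sketch.
print(query)
print(params)
```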
21. Pick and build your demo application for your snapshot graph
• Pick a cloud provider or go on-prem
• Use Marketplace images if possible
• Start with a single-instance VM for Neo4j, with RAM roughly 50% of the SQL database size
• Attach external drives so you can scale the server
• Determine your stack architecture
• Understand your data processing requirements
• Install Python (very good for performing batch operations) and the driver: pip install neo4j-driver
• Leverage Neo4j’s high-speed loader
• Determine what cleansing needs to occur
• If you need help, reach out to an SI partner or Neo4j services
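For batch operations from Python, a common pattern is to send rows in chunks and expand them server-side with `UNWIND`. A minimal sketch, assuming a hypothetical Person upsert and a batch size of 10,000 that you would tune for your data; it plans the batches without connecting to Neo4j.

```python
from itertools import islice

# Hypothetical upsert: one round trip loads a whole batch of rows.
UPSERT_PEOPLE = (
    "UNWIND $rows AS row "
    "MERGE (p:Person {id: row.id}) "
    "SET p.fullName = row.fullName"
)


def batches(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_load(rows, size=10_000):
    """Pair the statement with each batch's parameters. A real loader would
    pass each pair to the driver, e.g. session.run(statement, params)."""
    return [(UPSERT_PEOPLE, {"rows": chunk}) for chunk in batches(rows, size)]


people = [{"id": i, "fullName": f"Person {i}"} for i in range(25)]
for statement, params in plan_load(people, size=10):
    print(len(params["rows"]), "rows")
```

Batching trades many small transactions for fewer, larger ones; the right size depends on row width and available heap.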
22. Pick and build your demo application for your snapshot graph
• MVP data domains
• Graph database, app-informed
• Simplest data service
• MVP app experience
• Add new experiences, same data
• Add new data domains
Node.js, .NET, Python, React, Swift, Tableau, etc.
REST, Bolt
Michael’s I-Frame Model for Graph ROI
Accelerate Graph-driven User Experiences
23. Example Graph Architecture (Azure)
[Diagram: graph architecture in an Azure VPC with stages Ingest Batch, Ingest Real-time, Store, Consolidate, Search, Connect & Unify and Mobilize, organized into semantic, analytics and execution layers]
Components: CRM and ERP sources; Syndicated Data and Analytics; Scheduled ETL (Azure Data Factory; automated data loading); Stream ETL (Azure Event Hub; real-time updates, customer events); Azure Cloud Storage (blobs, files, queues, tables); Cloud Data Lake (data staging; elastic repository for raw and unstructured data); elastic SQL repository for curated & conformed data; Data Reduction (Azure Spark; in-memory sessionization and data aggregation); In-Memory Knowledge Graph (customer/contact 360° view, marketing attribution, recommendations); In-Memory Document Store (real-time document search); Data Services APIs (REST); Data Catalog (Azure Data Catalog; data discovery); Data Models (Azure Analysis Services; consistent metrics); AI Sandbox (Azure ML Studio; retention models, deep learning); Reporting (Tableau, PBI; automated reports and dashboards); Adobe Experience Cloud (Audience Manager, Campaign, Target, Experience Manager, Analytics) and Marketo Engage (triggered marketing, consistent experience)
24. Example Graph Architecture (AWS)
[Diagram: graph architecture in an AWS VPC with stages Ingest Batch, Ingest Real-time, Store, Consolidate, Search, Connect & Unify and Mobilize, organized into semantic, analytics and execution layers]
Components: CRM and ERP sources; Syndicated Data and Analytics; Scheduled ETL (AWS Data Pipeline, PDI Kettle; automated data loading); Stream ETL (AWS Kinesis; real-time updates, customer events); AWS Cloud Storage (S3 blobs, files, queues, EBS, tables); Cloud Data Lake (data staging; elastic repository for raw and unstructured data); elastic SQL repository for curated & conformed data; Data Reduction (AWS EMR; sessionization, data aggregation); In-Memory Knowledge Graph (customer/contact 360° view, marketing attribution, recommendations); In-Memory Document Store (real-time document search); Data Services APIs (REST); Data Catalog (AWS Glue); Data Discovery (AWS Athena); Machine Learning (AWS SageMaker; retention models, deep learning); consistent data models; Reporting (Tableau, QuickSight; automated reports and dashboards); Adobe Experience Cloud (Azure) with Audience Manager, Campaign, Target, Experience Manager and Analytics, plus Marketo Engage (triggered marketing, consistent experience)
25. Enterprise Knowledge Graph Development with Neo4j
• Locate and validate data lake tables
• Design test graph schema
• Estimate graph size from nodes, relationships and properties
• Configure Neo4j server to minimize SSD disk contention
• Prepare Hive queries to generate graph-form tables (nodes, relationships)
• Validate key uniqueness, string handling, character types, relationship mappings
• Export graph form tables to gzip csv files
• Iteratively test data loader scripts, file by file
• On successful completion of hydration, apply constraints and indexes, refactor as needed
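The validation and export steps above can be sketched in Python: check key uniqueness and relationship endpoints before writing, then write gzip-compressed CSV with the header conventions used by Neo4j’s bulk importer (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). File names, labels and rows are hypothetical, and in the workflow above the graph-form tables would come from Hive rather than Python lists. Uniqueness is checked within one file here; the importer by default expects ids to be unique across all node files.

```python
import csv
import gzip
import os
import tempfile


def write_nodes(path, label, rows, key="id"):
    """Validate key uniqueness, then write a gzip CSV node file whose header
    follows the bulk-import convention: `<key>:ID`, properties, `:LABEL`.
    Returns the set of node ids for relationship checks."""
    ids = set()
    for row in rows:
        if row[key] in ids:
            raise ValueError(f"duplicate {key}: {row[key]!r}")
        ids.add(row[key])
    props = [c for c in rows[0] if c != key] if rows else []
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([f"{key}:ID", *props, ":LABEL"])
        for row in rows:
            writer.writerow([row[key], *(row[p] for p in props), label])
    return ids


def write_rels(path, rel_type, pairs, node_ids):
    """Check that every endpoint exists, then write a gzip CSV relationship
    file with `:START_ID`, `:END_ID` and `:TYPE` columns."""
    for start, end in pairs:
        if start not in node_ids or end not in node_ids:
            raise ValueError(f"dangling relationship {start!r}->{end!r}")
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID", ":END_ID", ":TYPE"])
        for start, end in pairs:
            writer.writerow([start, end, rel_type])


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    people = [{"id": "p1", "fullName": "Ann"}, {"id": "p2", "fullName": "Bob"}]
    ids = write_nodes(os.path.join(out_dir, "people.csv.gz"), "Person", people)
    write_rels(os.path.join(out_dir, "knows.csv.gz"), "KNOWS", [("p1", "p2")], ids)
    print("wrote files to", out_dir)
```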
Pipeline: Data Lake Tables → (extract) → Graph-form Tables → (extract) → CSV.gz Files → Load Script (high-speed loader) → Data Store
IMPORT DONE in 1h 29m 16s 530ms.
Imported:
458356377 nodes
2176603843 relationships
9064981812 properties
Peak memory usage: 9.46 GB
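The “estimate graph size” step can be sanity-checked against the counts in this import log. This Python sketch assumes the Neo4j 3.x fixed record sizes of 15 bytes per node, 34 bytes per relationship and 41 bytes per property record, and treats every property as its own record, so it overstates property storage while ignoring string/array stores, labels and indexes.

```python
# Assumed Neo4j 3.x fixed-size store records, in bytes.
NODE_RECORD_BYTES = 15
REL_RECORD_BYTES = 34
PROP_RECORD_BYTES = 41


def estimate_store_bytes(nodes, relationships, properties):
    """Rough store size: one record per node and relationship, and one
    property record per property (a real property record can pack several
    small values, so this overstates property storage)."""
    return (nodes * NODE_RECORD_BYTES
            + relationships * REL_RECORD_BYTES
            + properties * PROP_RECORD_BYTES)


# Counts from the import log above.
total = estimate_store_bytes(458_356_377, 2_176_603_843, 9_064_981_812)
print(f"{total:,} bytes, about {total / 1e9:.1f} GB")
```

For these counts the estimate is 452,544,130,609 bytes, dominated by property records, which is why property modeling choices matter as much as node and relationship counts when sizing the server.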
26. VOLVE KNOWLEDGE GRAPH DEMO:
Platform 360°
In-Memory Knowledge Graph
Connects Data Across Silos, Enables
High-Performance, Deep Analytics
In-Memory Document Database
Searches and Mobilizes
Semi-Structured Data (JSON)
Elastic Blob Storage
Hub for Unstructured Data
and Document Originals
North Sea Oil Field Upstream Data
MAERSK INSPIRER 2007-2016
Public Domain Data
https://www.equinor.com/en/how-and-why/digitalisation-in-our-dna/volve-field-data-village-download.html
Example data architecture for leveraging knowledge
graphs to connect and mobilize data across the enterprise
Volve Field Data Set
37,172 Files (5 TB):
• Geophysical Interpretations
• Reservoir Models
• Seismic Data
• Well Picks & Perforations
• Well Technical Data
• Daily Drilling Reports
• Daily Production Reports
• Realtime Drilling Logs
Enterprise Business Intelligence
Syndicate Data & Insights
Across the Organization
27. Polyglot Graph Data Processing
Data Mobilization and Graph Unification: Full Lineage and Auditability
Leveraging fit-for-purpose storage:
• Graph storage for unified many-to-many access to cross-domain data
• Document storage for searchable access to semi-structured data
• Blob storage repository for large, raw and unstructured data
Unstructured: 37,157 blobs (5.5 TB); extract and load Azure Blob URIs.
Semi-Structured: 20,573 JSONs (5 GB); extract XML, convert to JSON, load JSON with Azure Blob URI; Couchbase full text search with pointers to Azure Blob URIs.
Structured: 215K nodes & relationships (1.5 GB); load CSV to graph; extract and load document metadata, named entities, mapped relationships and text summaries for graph analytics & queries.
Output: reports/applications
28. Example Polyglot Discovery Graph Schema
Searchable Pointers to
Unstructured blobs
Text & Metrics from
Semi-Structured
data
Structured Data and Derived Entities
Data Discovery
Graph Schema
• Connects structured, semi-
structured and unstructured data
across polyglot storage
• Accurately handles complex data
and document hierarchies
• Enables full text search in graph or
in document store, directly and via
NLP
• Provides source document access
through blob URLs
• Integrates with data lake, reporting
platforms and transactional
applications
29. Example Neo4j Query: Contractors associated with delays due to fishing (downhole recovery) interruptions
Query path: Contractor, Drilling Report, Drilling Activity, Activity Code (interruption), Activity Detail (fishing), WellBore
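The query follows Contractor to Drilling Report to Drilling Activity, then filters on the interruption activity code and the fishing activity detail. Once those paths are returned, the ranking is a simple group-by. This Python sketch assumes hypothetical contractor names and an hours-per-activity field that the slide does not show.

```python
from collections import Counter

# Hypothetical rows, one per Drilling Activity, as a path query over
# Contractor -> Drilling Report -> Drilling Activity might return them.
# The `hours` field is an assumption; the slide does not show durations.
ACTIVITIES = [
    {"contractor": "Rig Co A", "code": "interruption", "detail": "fishing", "hours": 6.0},
    {"contractor": "Rig Co A", "code": "interruption", "detail": "fishing", "hours": 3.5},
    {"contractor": "Rig Co B", "code": "interruption", "detail": "wait on weather", "hours": 12.0},
    {"contractor": "Rig Co B", "code": "drilling", "detail": "drill", "hours": 10.0},
    {"contractor": "Rig Co C", "code": "interruption", "detail": "fishing", "hours": 2.0},
]


def fishing_delay_hours(activities):
    """Total interruption hours caused by fishing, per contractor, largest first."""
    hours = Counter()
    for activity in activities:
        if activity["code"] == "interruption" and activity["detail"] == "fishing":
            hours[activity["contractor"]] += activity["hours"]
    return hours.most_common()


print(fishing_delay_hours(ACTIVITIES))
```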
33. Neo4j – React Integration with GraphQL (GRAND Stack)
34. Example Unstructured Blob: PDF of Drilling Report, with Blob URI (Pointer to Original Document)
https://fsodnastorage.blob.core.windows.net/volve-pub/Well_technical_data/Daily%20Drilling%20report%20-%20PDF%20Version/15_9_19_A_1997_07_25.pdf?sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2023-12-31T05%3A25%3A35Z&st=2019-05-02T20%3A25%3A35Z&spr=https&sig=CP7XcAK2h3haZDASPdFMgpzlLtfcgxSC8WCUbm3RIQU%3D
35. Neo4j - Power BI Integration with GraphQL
Components: Graph Database; Neo4j GraphQL API with a GraphQL schema (neo4j-graphql-js); M query cURL wrapper; PBI report
1. Client issues GraphQL query
2. GraphQL API sends Cypher query to Neo4j
3. Response data sent to Client
4. Data updated in PBI report
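Step 1, the client issuing a GraphQL query, is an HTTP POST with a JSON body holding `query` and `variables`. A minimal Python sketch that builds (but does not send) such a request; the endpoint URL and the `Contractor` field are hypothetical, since neo4j-graphql-js generates fields from your own type definitions.

```python
import json
import urllib.request

# Hypothetical endpoint and query; neo4j-graphql-js derives query fields
# from your own type definitions, so names will differ.
GRAPHQL_URL = "http://localhost:4000/graphql"

CONTRACTORS_QUERY = """
query Contractors($first: Int) {
  Contractor(first: $first) {
    name
  }
}
"""


def build_graphql_request(query, variables, url=GRAPHQL_URL):
    """Build, but do not send, the HTTP POST a GraphQL client issues:
    a JSON body with `query` and `variables`."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_graphql_request(CONTRACTORS_QUERY, {"first": 10})
# urllib.request.urlopen(req) would send it; the rows come back under "data".
print(req.get_method(), req.full_url)
```

The M query wrapper in Power BI plays the same role: it posts this body and turns the returned `data` object into a table for the report.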
36. Solving Enterprise Data Challenges with Knowledge Graphs and Polyglot Persistence
Technology: Neo4j
• Challenge: How to do real-time, comprehensive analytics using data from multiple sources, including structured, semi-structured and unstructured data
• Approach: Use an in-memory graph database to unify and relate data in a single knowledge fabric with consistent semantics and specific analytical lenses
• Benefits: Consolidated repository for data and relationships; excellent query performance; fast, flexible development; easy application integration; consumes data from lakes, DBs and files
• Value: 360° view of the business and root cause analytics
Technology: Couchbase
• Challenge: How to mobilize semi-structured data locked up in XML formats
• Approach: Use an in-memory document database to store, index and search JSON documents converted from XML
• Benefits: Scalable in-memory repository for large volumes of semi-structured data; full-text search on all document fields; excellent query performance
• Value: Data discovery and mobilization
Technology: Azure
• Challenge: How to manage and mobilize vast quantities of unstructured data
• Approach: Use elastic blob storage to catalog and curate a wide variety of data types, including sensor data, images and PDFs
• Benefits: Cloud data repository; provides secure pointers to data/document originals
• Value: Data audit and traceability
37. Go to Production
• Follow your IT best practices
• Security: assume you’ll be breached
• Deploy a full set of environments: Prod cluster, Stg cluster, Test, Dev
• DevOps: leverage Jenkins and Ansible
• Wrap your solution in test automation
• Do load testing against your APIs to look for
additional optimization opportunities (Gatling)
• Monitor your logs (Splunk, Dynatrace)
• Monitor your common queries, refactor or reindex as
needed, optimize for speed
• Leverage the I-Frame Model to provide more value
39. EY Cross-Sector Graph Experience: MDM, 360°, AML/Fraud, Recommenders
Fortune 100 Tech Company
Use Case: Global B2B Account 360° view and marketing attribution
Approach: Neo4j graph with 500M nodes and 2.2B relationships, representing all known business accounts, contacts and marketing touches. Mastered data from 17 disparate transactional sources in Azure Data Lake. Supported in-graph analytics for marketing attribution and next best action recommendations across global geographies.
Duration: 16 weeks to working graph
Fortune 100 Footwear Company
Use Case: Converged Brick & Mortar + Online Shopper 360° View
Approach: Neo4j graph with 2B nodes and relationships, representing sales transactions for 40M shoppers across 275 physical stores and the e-commerce platform. Algorithmic extraction and profiling from raw XML records in AWS Hadoop, MDM record concordance and in-graph analytics for product associations, store analytics and recommendation services.
Duration: 12 weeks to working graph, ongoing project through 2018
Fortune 500 Cruise Line Company
Use Case: Shipboard and Shoreside Recommendation Engine
Approach: Neo4j graph deployable to shipboard VMware data centers, with streaming updates from a large shoreside Neo4j graph integrating data from Azure Cerebro, Adobe Experience Manager and legacy transactional systems. In-graph analytics, a services API and a recommendation engine for next best activity for passengers, surfaced via a mobile app.
Duration: 12 weeks to working graph, ongoing project through 2018
Fortune 100 Investment Firm
Use Case: Enhanced Anti-Money Laundering and Fraud Detection using Graph+AI
Approach: Neo4j graph of an account 360° view representing activity of 2M accounts over 4 years. MDM and entity extraction for account and party identity elements from an enterprise Oracle system. Network clustering, feature engineering and graph embedding in a TensorFlow deep learning classifier for suspicious activity patterns across accounts and between parties.
Duration: 16 weeks to working graph
Fortune 100 Tech Company
Use Case: B2B Local Marketing Events Recommendation Engine
Approach: Neo4j graph and personalized next best event recommendation engine for B2B field marketers. Reconciles physical and digital event attendees with corporate account structures for 10K accounts and 5M contacts. Entities mastered from transactional data in SQL Server and Azure Data Lake. Microservices APIs support data syndication to martech applications and Power BI reporting.
Duration: 10 weeks to working graph
40. Better Questions
How can I get more business value and
deeper insights from the data I already
have?
How can I get a better understanding of my
customers to create more relevant experiences?
How can I more effectively mobilize and
syndicate the data I’m ingesting?
What is the next best action I can take?
Thank You!
41. Michael Moore, Ph.D.
Executive Director
► Michael Moore is an Executive Director and Practice Lead for Graph + AI
in EY’s Tech Consulting Emerging Technology (ET) Group
► Joined EY in 2017, based in the Seattle, WA office
► Ph.D. University of California, Berkeley
► B.S. & B.A. University of California, Santa Cruz
► Society Consulting – Graph Architect
Schema, ETL & systems design for a high-performance Neo4j graph database encompassing the totality of
Microsoft’s B2B data on Azure VM. Graph database supports multi-touch marketing attribution analytics and multi-
dimensional event-based audience segmentation & recommendations for direct marketing. Provided POC graph
reporting and visualization interfaces. Neo4j Enterprise edition, Python, Node.js, nGraph, JavaScript.
► Microsoft Corporation – General Manager
Management of core BI infrastructure and measurement capabilities supporting Microsoft's global marketing budget
cascade, campaign reporting, pipeline reporting, incentive reporting, ROMI reporting, social and web analytics on
Microsoft.com for the Global Marketing Operations team. Management of complex projects across multiple
subsidiaries, agencies and vendors. Strategic focus on foundational database, digital and social marketing
capabilities including: marketing ROI, customer & channel partner engagement, marketing conversion, sales
pipeline, dynamic personalization, data mining, predictive modeling, behavioral segmentation, privacy governance,
web enablement, tracking & measurement, internal & external data quality, and instrumentation process control.
► Grey San Francisco – VP Analytics
Responsible for ongoing campaign reporting, ROI analysis, creative and placement optimizations for agency clients.
Architected and deployed an enterprise OLAP reporting solution on Oracle RAC / Microstrategy to improve quality
and efficiency of analytics operations. Provided advanced analytical services to clients in retail, tech, banking and
automotive, including consulting, regression modeling and data mining.
Profile Select professional experience
Skills and tool knowledge
► Michael Moore, Ph.D. is an Executive Director in the Advisory Services practice of
Ernst & Young LLP. He is the National practice lead for Enterprise Knowledge
Graphs + AI in EY’s Data and Analytics (DnA) Group.
► Michael has industry and solution experience in customer experience, customer service, e-commerce, ad-serving, web and media analytics, consumer loyalty and churn, marketing optimization, enterprise and partner pipeline, and social media.
► He specializes in graph database architecture, graph-based advanced analytics,
machine learning and recommender systems. Michael is a Certified Neo4j
Professional and has active enterprise graph engagements in financial services,
tech, oil & gas, retail and hospitality sectors.