Graphs (information about the relationships, connections, and topology of data points) are transforming machine learning. We’ll walk through real-world examples of how to transform your tabular data into a graph and how to get started with graph AI. This talk provides an overview of how to incorporate graph-based features into traditional machine learning pipelines and how to create graph embeddings that better describe your graph topology, and it previews approaches to graph-native learning with graph neural networks. We’ll cover relevant real-world case studies in financial crime detection, recommendations, and drug discovery. The talk is intended to introduce graph-based AI to beginners and to help practitioners understand new techniques and applications. Key takeaways: how graph data can improve machine learning, when graphs are relevant to data science applications, what graph-native learning is, and how to get started.
2. Alicia Frame, PhD
Senior Data Scientist, Neo4j
Transforming AI with Graphs:
Real World Examples with Spark & Neo4j
#UnifiedDataAnalytics #SparkAISummit
4. Financial Services Drug Discovery Recommendations
Cybersecurity Predictive Maintenance
Customer Segmentation
Churn Prediction Search/MDM
Graph Data Science Applications
5. Labeled Property Graphs
Nodes
• Can have Labels to classify nodes
• Labels have native indexes
Relationships
• Relate nodes by type and direction
Properties
• Attributes of Nodes & Relationships
• Stored as Name/Value pairs
• Can have indexes and composite indexes
[Figure: example graph with two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”, Latitude: 37.5629900°, Longitude: -122.3255300°), connected by MARRIED TO, LIVES WITH, DRIVES, and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]
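The labeled property graph model described above can be sketched in a few lines of Python. This is a minimal illustration, not Neo4j's storage model: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names, and the choice of who drives or owns the car, the relationship directions, and the placement of the `since` property on DRIVES are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with one or more labels and name/value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed relationship that can carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Nodes and property values taken from the slide's example.
dan = Node({"PERSON"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

# Assumption: directions, and which person drives or owns the car, are illustrative.
relationships = [
    Relationship(dan, "MARRIED_TO", ann),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Follow relationships of one type that start at the given node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


driven_by_dan = outgoing(dan, "DRIVES")
```

Keeping properties on relationships as well as nodes is what lets a query ask not only "who drives this car" but also "since when".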
6. • Current data science models ignore network structure
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
Novel & More Accurate Predictions
with the Data You Already Have
Machine Learning Pipeline
7. “The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.”
“Where do the graphs come from that graph networks operate over?”
8. Building a Graph ML Model
Data
Sources
Native Graph
Platform
Machine
Learning
Aggregate Disparate Data and Cleanse
Unify Graphs and Engineer Features
Build Predictive Models
Parquet JSON
and more…
MLlib
and more…
9. Spark Graph Native Graph
Platform
Machine Learning
Example: Spark & Neo4j Workflow
Graph
Transactions
Graph
Analytics
Cypher 9 in Spark 3.0 to create non-persistent graphs
MLlib to Train Models
Native Graph Algorithms,
Processing, and Storage
10. Spark: Explore Graphs
• Massively scalable
• Powerful data pipelining
• Robust ML Libraries
• Non-persistent, non-native graphs
Neo4j: Build Graph Solutions
• Persistent, dynamic graphs
• Graph native query and algorithm performance
• Constantly growing list of graph algorithms and embeddings
11. The Steps of Graph Data Science
Query Based
Knowledge
Graph
Query Based
Feature
Engineering
Graph
Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Data Science Complexity
Knowledge
Graphs
Graph Feature
Engineering
Graph Native
Learning
Graph Persistence
12. Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Query Based
Feature
Engineering
Graph Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Enterprise Maturity
Data Science Complexity
13. Query based knowledge graphs:
Connecting the Dots at NASA
“Using Neo4j someone from our Orion project found information from the Apollo
project that prevented an issue, saving well over two years of work and one
million dollars of taxpayer funds.”
14. Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Query Based
Feature
Engineering
Enterprise Maturity
Data Science Complexity
15. Query-Based Feature Engineering
Telecom churn prediction
Telecommunication networks are easily represented as graphs
Churn prediction research has found that simple hand-engineered features are highly predictive
• How many calls/texts has an account made?
• How many of their contacts have churned?
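The two hand-engineered churn features on this slide can be computed directly from call records. The sketch below uses plain Python with made-up data; the account names, call counts, and the set of churned accounts are all hypothetical, and contacts are treated as undirected, which is an assumption.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, number_of_calls).
calls = [
    ("alice", "bob", 12),
    ("alice", "carol", 3),
    ("bob", "carol", 7),
    ("dave", "alice", 1),
]
# Hypothetical set of accounts that have already churned.
churned = {"carol", "dave"}

calls_made = defaultdict(int)  # outgoing call volume per account
contacts = defaultdict(set)    # contact list per account (assumed undirected)

for caller, callee, n in calls:
    calls_made[caller] += n
    contacts[caller].add(callee)
    contacts[callee].add(caller)

# One feature row per account, ready to join onto a tabular training set.
features = {
    account: {
        "calls_made": calls_made[account],
        "churned_contacts": len(contacts[account] & churned),
    }
    for account in contacts
}
```

In a graph database the same features would typically be expressed as queries over the call graph; the dictionary version is only meant to show what is being counted.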
17. Spark Graph Native Graph
Platform
Machine Learning
• Merge distributed data
into DataFrames
• Reshape your tables
into graphs
• Explore Cypher queries
• Move to Neo4j to build
expert queries
• Persist your graph
Knowledge Graphs:
Getting Started Example with Spark
• Bring query based
graph features to ML
pipeline
Graph
Transactions
Graph
Analytics
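The "reshape your tables into graphs" step can be illustrated without Spark. The sketch below reads a small, hypothetical transactions table and turns each row into a relationship between two account nodes; the column names, the `SENT_TO` relationship type, and the data are assumptions made for the example.

```python
import csv
import io

# Hypothetical transactions table; in practice this could be a DataFrame or a CSV export.
TABLE = """sender,receiver,amount
acct_1,acct_2,250.0
acct_2,acct_3,75.5
acct_1,acct_3,20.0
"""

nodes = set()
edges = []  # (source, relationship type, target, properties)

for row in csv.DictReader(io.StringIO(TABLE)):
    # Each distinct account becomes a node; each row becomes a relationship.
    nodes.update([row["sender"], row["receiver"]])
    edges.append(
        (row["sender"], "SENT_TO", row["receiver"], {"amount": float(row["amount"])})
    )

# Adjacency list: which accounts each account has sent money to.
adjacency = {node: [] for node in sorted(nodes)}
for source, _rel_type, target, _props in edges:
    adjacency[source].append(target)
```

The key design decision when reshaping a table is which columns become nodes and which become relationship properties; here the amount stays on the relationship rather than on either account.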
18. Steps Forward in Graph Data Science
Query Based
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Enterprise Maturity
Data Science Complexity
19. Feature Engineering is how we combine and process the
data to create new, more meaningful features, such as
clustering or connectivity metrics.
Graph Feature Engineering
Add More Descriptive Features:
- Influence
- Relationships
- Communities
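As one example of an "influence" feature, PageRank can be sketched with plain power iteration. This is a simplified illustration rather than the implementation used by any particular library; the toy graph is hypothetical, and the damping factor of 0.85 and the iteration count are conventional choices assumed here.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}.

    Every link target must also appear as a key of out_links.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node shares its current rank equally among the nodes it links to.
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: a, b, and d all point at c; c points back at a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
most_influential = max(scores, key=scores.get)
```

The resulting score per node can be added as an extra column alongside the existing tabular features.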
20. Graph Feature Categories & Algorithms
• Pathfinding & Search: finds the optimal paths or evaluates route availability and quality
• Centrality / Importance: determines the importance of distinct nodes in the network
• Community Detection: detects group clustering or partition options
• Heuristic Link Prediction: estimates the likelihood of nodes forming a relationship
• Similarity: evaluates how alike nodes are
• Embeddings: learned representations of connectivity or topology
21. Financial Crime: Detecting Fraud
Large financial institutions already have existing pipelines to identify fraud via heuristics and models
Graph based features improve accuracy:
• Connected components to identify disjointed graphs sharing identifiers
• PageRank to measure influence and transaction volumes
• Louvain to identify communities that frequently interact
• Jaccard to measure account similarity based on relationships
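Two of the fraud features named on this slide, connected components and Jaccard similarity, are simple enough to sketch in plain Python. The account and identifier names are hypothetical, and the union-find routine below is a generic textbook version, not the Neo4j implementation.

```python
def connected_components(edges):
    """Union-find: map every node to a representative of its component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return {node: find(node) for node in list(parent)}


def jaccard(neighbors_a, neighbors_b):
    """Size of the shared neighborhood divided by size of the combined one."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


# Hypothetical links between accounts and the identifiers they use.
links = [
    ("acct_1", "phone_555"),
    ("acct_2", "phone_555"),
    ("acct_2", "addr_9"),
    ("acct_3", "addr_42"),
]

component = connected_components(links)
same_ring = component["acct_1"] == component["acct_2"]

neighbors = {}
for account, identifier in links:
    neighbors.setdefault(account, set()).add(identifier)
similarity_1_2 = jaccard(neighbors["acct_1"], neighbors["acct_2"])
```

Accounts that land in the same component share at least one chain of identifiers, and the Jaccard score gives a graded measure of how much two accounts overlap.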
22. Graph Fraud / Anomaly Detection
+142,000 peer-reviewed publications in the last 10 years
23. Spark Graph Native Graph
Platform
Machine Learning
• Merge distributed data
into DataFrames
• Reshape your tables
into graphs
• Explore Cypher queries
and simple algorithms
• Persist your graph
• Create rule based
features
• Run native graph
algorithms and write to
graph or stream
Graph Feature Engineering:
Getting Started Example with Spark
• Bring graph features
to ML pipeline for
training
Graph
Transactions
Graph
Analytics
24. Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
neo4j.com/docs/graph-algorithms/current/
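Several of the link prediction heuristics listed above reduce to simple neighborhood arithmetic. The sketch below uses the standard textbook definitions on a small, hypothetical undirected graph; the function names are chosen for this example and are not the Neo4j procedure names.

```python
import math

# Hypothetical undirected graph as {node: set of neighbors}.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}


def common_neighbors(g, u, v):
    """Number of neighbors shared by u and v."""
    return len(g[u] & g[v])


def total_neighbors(g, u, v):
    """Number of distinct neighbors of u and v combined."""
    return len(g[u] | g[v])


def preferential_attachment(g, u, v):
    """Product of the two degrees: well-connected nodes attract more links."""
    return len(g[u]) * len(g[v])


def adamic_adar(g, u, v):
    """Shared neighbors, each weighted by the inverse log of its degree."""
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v])


def resource_allocation(g, u, v):
    """Shared neighbors, each weighted by the inverse of its degree."""
    return sum(1.0 / len(g[w]) for w in g[u] & g[v])


# Score the candidate link between b and d, which are not yet connected.
scores_b_d = {
    "common_neighbors": common_neighbors(graph, "b", "d"),
    "total_neighbors": total_neighbors(graph, "b", "d"),
    "preferential_attachment": preferential_attachment(graph, "b", "d"),
    "adamic_adar": adamic_adar(graph, "b", "d"),
    "resource_allocation": resource_allocation(graph, "b", "d"),
}
```

Each score can be used as a feature for a pair of nodes, so a downstream classifier decides how much weight each heuristic deserves.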
26. Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Graph Neural
Networks
Query Based
Feature
Engineering
Graph
Embeddings
Enterprise Maturity
Data Science Complexity
27. An embedding transforms a graph into a vector, or set of
vectors, describing the topology, connectivity, or attributes of
the nodes and edges in the graph
Graph Embeddings
• Vertex embeddings: describe connectivity of each node
• Path embeddings: traversals across the graph
• Graph embeddings: encode an entire graph into a single vector
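Many vertex embedding methods, DeepWalk among them, start by sampling random walks over the graph and then feed those walks to a skip-gram model as if they were sentences. The sketch below covers only the walk-sampling step; the graph, the `random_walks` helper, and its parameters are hypothetical, and no embedding vectors are trained here.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)  # seeded so the sampled walks are reproducible
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical undirected graph as {node: list of neighbors}.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = random_walks(graph, walk_length=5, walks_per_node=3)
```

Nodes that co-occur in many walks end up with similar vectors once a skip-gram model is trained on them, which is how the walks capture connectivity.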
30. Spark Graph Native Graph
Platform
Machine Learning
• Merge distributed data into
DataFrames
• Reshape your tables
into graphs
• Explore Cypher queries and
simple algorithms
• Move to Neo4j to build
expert queries
• Write to persist
• Stay tuned for DeepWalk
and DeepGL algorithms
Graph Feature Engineering:
Getting Started Example with Spark
• Bring graph features to
ML pipeline for training
Graph
Transactions
Graph
Analytics
31. Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Query Based
Feature
Engineering
Graph Neural
Networks
Graph
Embeddings
Enterprise Maturity
Data Science Complexity
32. Deep Learning refers to training multi-layer neural
networks using gradient descent
Graph Native Learning
33. Graph Native Learning refers to deep learning models
that take a graph as input, perform computations,
and return a graph
Graph Native Learning
Battaglia et al., 2018
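The "graph in, computation, graph out" idea can be illustrated with a single message-passing step. This is a heavily simplified sketch: the node features are single numbers, the update rule is a fixed average rather than a learned function, and the graph is hypothetical. A real graph network would replace the fixed rule with trainable neural network layers.

```python
def message_passing_step(node_features, edges):
    """Return updated node features after one round of neighbor aggregation."""
    # Collect the features sent along each directed edge to its target node.
    incoming = {node: [] for node in node_features}
    for source, target in edges:
        incoming[target].append(node_features[source])

    updated = {}
    for node, value in node_features.items():
        messages = incoming[node]
        aggregated = sum(messages) / len(messages) if messages else 0.0
        # Assumption: a fixed 50/50 blend stands in for a learned update function.
        updated[node] = 0.5 * value + 0.5 * aggregated
    return updated


# Hypothetical input graph: node features plus directed edges.
features = {"a": 1.0, "b": 0.0, "c": 0.0}
edges = [("a", "b"), ("b", "c"), ("b", "a"), ("c", "b")]

# The output is again a graph: same structure, updated node attributes.
graph_out = {"nodes": message_passing_step(features, edges), "edges": edges}
```

Stacking several such steps lets information travel further across the graph, one hop per step.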
34. Example: electron path prediction
Bradshaw et al., 2019
Graph Native Learning
Given reactants and reagents, what will the
products be?