1. The road to success with graph databases, Graph Data Science, and generative AI
Marco Bessi, PreSale Engineer
Neo4j Inc. All rights reserved 2023
5. Neo4j is a DB
• Causal clustering
• ACID transactions
• High availability
• Binary & HTTP protocol
• Official drivers
6. Neo4j is a GRAPH DB
• Causal clustering
• ACID transactions
• High availability
• Binary & HTTP protocol
• Official drivers
• Native graph DB
• Property graph model
• Schema free
• Index-free adjacency
• Cypher
7. Labeled property graph model components
• Nodes
- Represent objects in the graph
• Relationships
- Relate nodes by type and direction
• Properties
- Name-value pairs that can go on nodes and relationships
- Can have indexes and composite indexes
• Labels
- Group nodes & shape the domain
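The four components above can be sketched as a tiny in-memory model. This is only an illustration of the labeled property graph idea, not Neo4j's storage format or API; the `Node` and `Relationship` classes, the `neighbors` helper, and the sample data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph, grouped by labels and described by properties."""
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection from `start` to `end` with its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical sample data: two labeled nodes joined by one relationship.
alice = Node({"Person"}, {"name": "Alice"})
acme = Node({"Company"}, {"name": "Acme"})
works_at = Relationship(alice, "WORKS_AT", acme, {"since": 2020})


def neighbors(node: Node, rels: list[Relationship], rel_type: str) -> list[Node]:
    """Follow outgoing relationships of one type from `node`."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in neighbors(alice, [works_at], "WORKS_AT")])
```

Direction matters in this sketch: following `WORKS_AT` from `acme` returns nothing, because the relationship points from the person to the company.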
8. Hybrid workload duality
Built for operational and analytical workloads
• Intelligent Operational Systems: Graph Transactions, Storage & Querying
• Better Predictions for Analytics: Graph Analytics, ML, & Data Science
10. GDS evolution
Stages: Knowledge graphs, Graph analytics, Graph feature engineering
Scope: Local Matching, Global Patterns, Graph Representations
• Find the patterns you’re looking for in connected data.
• Use unsupervised machine learning techniques to identify associations, anomalies, and trends.
• Use embeddings to learn the features in your graph that you don’t even know are important yet.
• Train in-graph supervised ML models to predict links, labels and missing data.
11. 65+ Graph Data Science Techniques in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search
Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation
Supervised Machine Learning
• Node Classification
• Link Prediction
Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)
Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
… and more!
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
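To make two of the listed techniques concrete, here is a minimal sketch of Common Neighbors and Jaccard Similarity on a small undirected graph. It is plain Python for illustration and not the Graph Data Science library implementation; the adjacency data and function names are hypothetical.

```python
# Hypothetical undirected graph stored as node -> set of neighbours.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}


def common_neighbors(graph: dict, a: str, b: str) -> int:
    """Heuristic link prediction: number of neighbours shared by a and b."""
    return len(graph[a] & graph[b])


def jaccard(graph: dict, a: str, b: str) -> float:
    """Similarity: shared neighbours divided by all distinct neighbours."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0


print(common_neighbors(graph, "bob", "carol"))
print(jaccard(graph, "alice", "bob"))
```

Both scores depend only on neighbourhood overlap, which is why they are cheap enough to serve as features or heuristics before training a supervised model.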
12. Where is graph data science applied
• Predictive Maintenance
• Churn Prediction
• Fraud Detection
• Life Sciences
• Personalized Recommendations
• Cybersecurity
• Disambiguation & Segmentation
• GenAI and KG
14. In a world of infinite content, knowledge becomes valuable
Denny Vrandečić [WikiData] [Semantic MediaWiki] [Wikifunctions]
15. Generative AI is predicted to unlock $6.6 trillion in economic value
Up to 3.3% productivity improvement annually
16. Generative AI
A branch of artificial intelligence that focuses on creating models and algorithms capable of generating new and original content.
ChatGPT is a well-known example of generative AI.
Large Language Model
A type of generative AI that is trained on vast amounts of content.
Currently seen as the “GenAI for language/text”.
ChatGPT is a Large Language Model.
17. LLMs give us an amazing opportunity to:
• Automate data retrieval tasks
• Expedite reading, understanding, & summarizing
• Improve customer service experiences
• Generate content & code
18. Save time and money. Improve growth and retention.
75% of GenAI value will come from four areas:
• Customer Operations
• Software Engineering
• Marketing & Sales
• R&D
19. But there are challenges:
• Knowledge cut-off
• Can inherit bias through training data
• Reasonable answers, not always accurate
• Lack of enterprise domain knowledge
• Inability to verify or attribute sources
20. In summary
• Learns random sentences from random people
• Talks like a person but doesn’t really understand what it’s saying
• Occasionally speaks absolute nonsense
• Is a cute little bird
21. How can you take advantage of this massive opportunity while overcoming these challenges?
22. LLM + Graphs
I want to use LLMs in my graph for…
23. LLM + Graphs
I want to use LLMs in my graph for…
• Front-end: natural language to Cypher
25. How to help LLMs do better?
• Fine tuning: provide additional training data to better tune GenAI to your use case
• Few-shot learning: provide completed examples (“shots”) to the AI as context in prompts, a.k.a. In-Context Learning
• Grounding: provide the AI with the information to use for generating responses
Neo4j as the data source for Grounding
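Few-shot learning can be sketched as plain prompt assembly: completed question and Cypher pairs are placed in the prompt ahead of the new question. The example below is a minimal illustration for the natural-language-to-Cypher use case; the `Customer`, `Product`, and `BOUGHT` schema, the example pairs, and the `build_prompt` helper are all hypothetical, and no LLM is actually called.

```python
# Hypothetical "shots": completed examples shown to the model as context.
FEW_SHOT_EXAMPLES = [
    (
        "Which products did Alice buy?",
        "MATCH (:Customer {name: 'Alice'})-[:BOUGHT]->(p:Product) RETURN p.name",
    ),
    (
        "How many customers are there?",
        "MATCH (c:Customer) RETURN count(c)",
    ),
]


def build_prompt(question: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble a few-shot prompt that ends where the model should continue."""
    lines = ["Translate the question into a Cypher query."]
    for example_question, cypher in examples:
        lines.append(f"Question: {example_question}")
        lines.append(f"Cypher: {cypher}")
    lines.append(f"Question: {question}")
    lines.append("Cypher:")
    return "\n".join(lines)


prompt = build_prompt("Which customers bought a laptop?")
print(prompt)
```

Unlike fine tuning, nothing about the model changes here; only the prompt carries the extra examples, so the shots can be swapped per request.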
26. Ground LLMs in Neo4j’s Knowledge Graph
Retrieval Augmented Generation (RAG): retrieve data from outside a foundation model and augment your prompts by adding the relevant retrieved data in context
27. Neo4j’s Knowledge Graph
LLM + Neo4j’s Knowledge Graph
• Improve accuracy
• Deploy with confidence
• Unlock innovation
28. Improve Accuracy
Combine the power of LLMs with the stored data of your knowledge graph for more accurate responses and to reduce hallucinations.
29. Accuracy
LLMs can help generate more accurate responses by considering the connections and dependencies within the graph and mapping new links as new data is identified.
• Flexible schema means it is easy to grow your knowledge base whenever new information is available
• Relationships are data that is used to return explicit results
• Vector search adds implicit results showcasing similar responses
30. Knowledge retrieval with Neo4j
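// 1. Embed the question
CALL apoc.ml.openai.embedding([$question], $apiKey)
// 2. Look up the 5 nearest nodes in the 'products' vector index
CALL db.index.vector.queryNodes('products', 5, $embedding)
// 3. Send the system and user messages to the chat model
CALL apoc.ml.openai.chat([{role:'system', content: $system},
  {role:'user', content: $user}], $apiKey)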
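The three calls on this slide form a retrieval pipeline: embed the question, search the vector index, then pass the retrieved data to the chat model. A minimal sketch of that flow follows, with the three steps injected as plain functions so it runs without a database or API key. The `answer_with_grounding` function and the `fake_*` stand-ins are hypothetical; in a real setup each callable would presumably wrap the corresponding procedure.

```python
from __future__ import annotations

from typing import Callable, Sequence


def answer_with_grounding(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[str]],
    chat: Callable[[str, str], str],
    k: int = 5,
) -> str:
    """Embed the question, retrieve k facts, and ask the model with them as context."""
    embedding = embed(question)      # step 1: question -> vector
    facts = search(embedding, k)     # step 2: vector -> nearest stored facts
    system = "Answer using only these facts:\n" + "\n".join(f"- {f}" for f in facts)
    return chat(system, question)    # step 3: grounded prompt -> answer


# Toy stand-ins (assumptions for illustration, not real model or index calls).
def fake_embed(text: str) -> list[float]:
    return [float(len(text))]


def fake_search(embedding: Sequence[float], k: int) -> list[str]:
    return ["Product A costs 10 EUR", "Product B costs 20 EUR"][:k]


def fake_chat(system: str, user: str) -> str:
    return f"{user} -> grounded on {system.count('- ')} facts"


result = answer_with_grounding(
    "What does Product A cost?", fake_embed, fake_search, fake_chat
)
print(result)
```

Keeping the three steps as separate callables makes it easy to replace the embedding model or the index without touching the prompt logic.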
31. Specificity
Add a layer of context over your LLM for accuracy and specificity
Neo4j knowledge graphs supply the LLM with information about your company so answers are specific to your business, giving more context for more accurate responses.
32. Deploy with Confidence
Ensure your grounding partner meets security, scalability, and regulatory requirements so you can deploy enterprise-wide with confidence.
33. Security & Privacy
The LLM retrieves and returns information governed by your enterprise security and access control policies, down to the node level.
• Define policies by role or identity
• Integrates with your identity and access management provider with SSO
• Build constraints on nodes, labels, relationships, properties, specific parts of the graph, and even traversal depth
34. Reliability
Neo4j knowledge graphs scale and are battle-tested to thousands of concurrent users, so you get answers quickly with very fast query speeds.
• Easily scale with autonomous clustering
• Incredibly fast traversals with index-free adjacency
35. Explainability
Verify the enriched responses from your LLM because each piece of information is linked to its sources and origins.
• Represent data sources as nodes
• Map relationships between search results and data source nodes
• Add metadata or annotations
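The explainability idea, where every retrieved fact stays linked to a source node, can be illustrated with a small sketch. The dictionaries below stand in for source nodes and for fact-to-source relationships; the identifiers, titles, annotations, and the `cite` helper are hypothetical sample data, not part of any Neo4j API.

```python
# Hypothetical source nodes, keyed by id, with metadata/annotations as properties.
SOURCES = {
    "doc-1": {"title": "Pricing sheet 2023", "annotation": "reviewed"},
    "doc-2": {"title": "Shipping policy", "annotation": "draft"},
}

# Hypothetical search results, each pointing at the source node it came from.
FACTS = [
    {"text": "Product A costs 10 EUR", "source": "doc-1"},
    {"text": "Product A ships in 3 days", "source": "doc-2"},
]


def cite(facts: list[dict], sources: dict) -> list[str]:
    """Attach the source title and annotation to every retrieved fact."""
    cited = []
    for fact in facts:
        source = sources[fact["source"]]
        cited.append(f"{fact['text']} [source: {source['title']}, {source['annotation']}]")
    return cited


for line in cite(FACTS, SOURCES):
    print(line)
```

Because the link from fact to source is stored as data, a reviewer can trace any statement in the answer back to where it came from.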
36. Unlock Innovation
Adopt innovation and iterate as your LLM implementations evolve with full interoperability across all cloud and data providers.
37. Any cloud, any data provider
[Architecture diagram. Stages: INGEST, STORE, PROCESS, MACHINE LEARNING, BI & VISUALIZATIONS. Sources: Business Applications & Existing Systems; Files (unstructured, structured). Components shown: Apache Kafka, PubSub, Cloud Functions, AWS Lambda, Cloud Storage, DataProc, Graph Database, Graph Data Science (Analytics, Feature Engineering, Data Exploration), TensorFlow, KNIME, Python, Neo4j Bloom.]
38. Stay Grounded with Neo4j
Ground LLMs to improve accuracy and explainability with Neo4j’s enterprise-ready knowledge graph built on a flexible schema that integrates seamlessly with your GenAI tech platform.
• Improve Accuracy
• Deploy with Confidence
• Unlock Innovation
42. Better predictions with data you already have
● Traditional ML ignores network structure because it’s difficult to extract
● Add graph data to existing ML pipelines to increase accuracy, or
● Graphs use relationships to unlock otherwise unattainable predictions
Machine Learning Pipeline
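Adding graph data to an existing ML pipeline can be as simple as appending a graph-derived column to the feature table. The sketch below computes node degree from an edge list and merges it into tabular rows; the rows, edges, and helper name are hypothetical, and degree is only one of many possible graph features.

```python
from collections import defaultdict

# Hypothetical tabular features already used by a traditional ML model.
rows = [
    {"id": "alice", "age": 34},
    {"id": "bob", "age": 51},
    {"id": "carol", "age": 29},
    {"id": "dave", "age": 42},
]

# Hypothetical relationships between the same entities.
edges = [("alice", "bob"), ("alice", "carol")]


def degree_features(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Count how many relationships touch each node (undirected degree)."""
    degree: dict[str, int] = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


degree = degree_features(edges)
# Nodes without any relationship get a degree of 0.
enriched = [{**row, "degree": degree.get(row["id"], 0)} for row in rows]
print(enriched)
```

The downstream model is unchanged; it simply receives one more numeric feature that encodes network structure.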
43. Why do data science in Neo4j (vs. Python/R)?
• Performance: specialized data structures; good concurrency; in-memory optimization and compression; production-ready.
• Ease of use: consistent syntax, including Cypher; comprehensive library (65+ vetted algorithms); persistence, since every stage of the data science process can be stored or not stored.
• Product support: GDS / AuraDS is supported by Neo4j, well documented, with a large team of support engineers and a sizeable user community.
44. Neo4j should be the database for Grounding
Vector DB Limitations
• Similarity ≠ Relevance or Accuracy
• Black-Box (Sub-Symbolic)
• Duplicate & incomplete results
• Missing reference information
• Challenging to answer multi-hop questions
• Difficult for SME to correct
Knowledge Graph Strengths
• Relevancy beyond just similarity
• Transparent symbolic representation
• Condensed information storage
• References between documents calculated before query time
• Enables human correction
Neo4j Differentiators
• LLMs understand Cypher
• Vectors + Cypher
• Index for many data types (numeric, geopoints, dates)
• Fine-grained security and access control
• ACID transactions, high-availability, and scale
• Ecosystem integration
• Available on all clouds
• Graph Data Science for enhanced ML
45. New Neo4j features
From Neo4j v5.11
APOC OpenAI API:
CALL apoc.ml.openai.embedding(['Some Text'], $apiKey, {})
CALL apoc.ml.openai.completion('Question', $apiKey, {config}) yield value;
CALL apoc.ml.query("Question") yield value, query
...
Vector index & search:
• Index nodes on float or double array properties.
• Based on Lucene, which uses Hierarchical Navigable Small-World graphs (HNSW).
• Created and queried using procedures.
48. Creation
CALL db.index.vector.createNodeIndex(
  indexName,          // Name of the index to create (STRING)
  label,              // Label of nodes to index (STRING)
  propertyKey,        // Property key to index (STRING)
  dimensions,         // Dimensionality of vectors to index (INTEGER)
  similarityFunction  // "EUCLIDEAN" or "COSINE" (case-insensitive STRING)
)
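The `similarityFunction` argument decides how two vectors are compared. The sketch below shows one way the two options can be turned into scores between 0 and 1. The normalizations used, 1 / (1 + d²) for Euclidean and (1 + cos) / 2 for cosine, are an assumption based on how Neo4j's documentation describes its scores and should be checked against the version in use. The search itself is a brute-force scan, not the HNSW index, and all names and vectors are hypothetical.

```python
import math


def euclidean_score(a: list[float], b: list[float]) -> float:
    """Map squared Euclidean distance into (0, 1]; identical vectors score 1."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d2)


def cosine_score(a: list[float], b: list[float]) -> float:
    """Map cosine similarity from [-1, 1] into [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return (1.0 + dot / norm) / 2.0


def top_k(query, vectors, k, score=cosine_score):
    """Brute-force nearest neighbours: score every vector, keep the best k."""
    scored = [(name, score(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 2-dimensional embeddings.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
print(top_k([1.0, 0.0], vectors, k=2))
```

Cosine ignores vector length and compares direction only, while Euclidean is sensitive to magnitude, so the right choice depends on how the embeddings were produced.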
50. Writing vectors efficiently
CALL db.create.setVectorProperty(
  node,         // Node to add the property to (NODE)
  propertyKey,  // Property key to write to (STRING)
  vector        // Property value to write (LIST<FLOAT>)
)