SlideShare una empresa de Scribd logo
1 de 42
Descargar para leer sin conexión
03/18/22 Heiko Paulheim 1
What do
Knowledge Graph Embeddings Learn?
includes: some new Adventures in RDF2vec
Heiko Paulheim
University of Mannheim
Heiko Paulheim
03/18/22 Heiko Paulheim 2
Graphs vs. Vectors
• Data Science tools for prediction etc.
– Python, Weka, R, RapidMiner, …
– Algorithms that work on vectors, not graphs
• Bridges built over the past years:
– FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015),
Python KG Extension (2021)
?
03/18/22 Heiko Paulheim 3
Graphs vs. Vectors
• Transformation strategies (aka propositionalization)
– e.g., types: type_horror_movie=true
– e.g., data values: year=2011
– e.g., aggregates: nominations=7
?
03/18/22 Heiko Paulheim 4
Graphs vs. Vectors
• Observations with simple propositionalization strategies
– Even simple features (e.g., add all numbers and types)
can help on many problems
– More sophisticated features often bring additional improvements
• Combinations of relations and individuals
– e.g., movies directed by Steven Spielberg
• Combinations of relations and types
– e.g., movies directed by Oscar-winning directors
• …
– But
• The search space is enormous!
• Generate first, filter later does not scale well
03/18/22 Heiko Paulheim 5
Towards RDF2vec
• Excursion: word embeddings
– word2vec proposed by Mikolov et al. (2013)
– predict a word from its context or vice versa
• Idea: similar words appear in similar contexts, like
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April
1976
– Google was officially founded as a company in January 2006
– usually trained on large text corpora
• projection layer: embedding vectors
03/18/22 Heiko Paulheim 6
From Word Embeddings to Graph Embeddings
• Basic idea:
– extract random walks from an RDF graph:
Mulholland Dr. David Lynch US
– feed walks into word2vec algorithm
• Order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
• Result:
– entity embeddings
– most often outperform other propositionalization techniques
director nationality
Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
03/18/22 Heiko Paulheim 7
The End of Petar’s PhD Journey…
• ...and the beginning of the RDF2vec adventure
03/18/22 Heiko Paulheim 8
Why does RDF2vec Work?
• Example: PCA plot of an excerpt of a cities classification problem
– From cities classification task in the embedding evaluation framework by
Pellegrino et al.
03/18/22 Heiko Paulheim 9
Why does RDF2vec Work?
• In downstream machine learning, we usually want class separation
– to make the life of the classifier as easy as possible
• Class separation means
– Similar entities (i.e., same class) are projected
closely to each other
– Dissimilar entities (i.e., different classes) are projected
far away from each other
03/18/22 Heiko Paulheim 10
Why does RDF2vec Work?
• Observation: close projection of similar entities
– Usage example: content-based recommender system based on k-NN
Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
03/18/22 Heiko Paulheim 11
Embeddings for Link Prediction
• RDF2vec observations
– similar instances form clusters, direction of relation is ~stable
– link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing)
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
03/18/22 Heiko Paulheim 12
Embeddings for Link Prediction
• In RDF2vec, relation preservation is a by-product
• TransE (and its descendants): direct modeling
– Formulates RDF embedding as an optimization problem
– Find mapping of entities and relations to Rn
so that
• across all triples <s,p,o>
Σ ||s+p-o|| is minimized
• try to obtain a smaller error
for existing triples
than for non-existing ones
Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete
Repositories. WI 2016
03/18/22 Heiko Paulheim 13
Link Prediction vs. Node Embedding
• Hypothesis:
– Embeddings for link prediction also cluster similar entities
– Node embeddings can also be used for link prediction
Portisch et al. (2022): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for
Link Prediction - Two Sides of the Same Coin?
03/18/22 Heiko Paulheim 14
Close Projection of Similar Entities
• What does similar mean?
03/18/22 Heiko Paulheim 15
Similarity vs. Relatedness
• Closest 10 entities to Angela Merkel in different vector spaces
Portisch et al. (2022): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for
Link Prediction - Two Sides of the Same Coin?
03/18/22 Heiko Paulheim 16
Back to Class Separation
• What is a class?
– e.g., cities per se
– e.g., cities in the Netherlands
– e.g., cities in the Netherlands above 100k inhabitants
• Or something different, such as
– e.g., everything located in Amsterdam
– e.g., everything Amsterdam is known for
03/18/22 Heiko Paulheim 17
Back to Class Separation
• Observation: there are different kinds of classes:
– Classes of objects of the same category (e.g., cities)
→ those are similar
– Classes of objects of different categories (e.g., buildings, dishes,
organizations, persons)
→ those are related
03/18/22 Heiko Paulheim 18
Intermediate Observation
• In most vector spaces of link prediction embeddings (TransE etc.):
proximity ~ similarity
• In RDF2vec embedding space:
proximity ~ a mix of similarity and relatedness
03/18/22 Heiko Paulheim 19
So… why does RDF2vec Work Then?
• Recap: downstream ML algorithms need class separation
– but RDF2vec groups items by similarity and relatedness
• Why is RDF2vec still so good at classification?
03/18/22 Heiko Paulheim 20
Example
• It depends on the classification problem at hand!
– Cities vs. countries
– Places in Europe vs. places in Asia
03/18/22 Heiko Paulheim 21
So… why does RDF2vec Work Then?
• Many downstream classification tasks are homogeneous
– e.g., classifying cities in different subclasses
• For homogeneous entities:
– relatedness provides finer-grained distinctions
03/18/22 Heiko Paulheim 22
Similarity vs. Relatedness
• Recap word embeddings:
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April
1976
– Google was officially founded as a company in January 2006
• Graph walks:
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
Germany
Angela_Merkel Hamburg
birthPlace
country
leader
Peter_Tschentscher
leader
residence
country
03/18/22 Heiko Paulheim 23
Order-Aware RDF2vec
• Using an order-aware variant of word2vec
• Experimental results:
– order-aware RDF2vec most often outperforms classic RDF2vec
– a bit more computation heavy, but still scales to DBpedia etc.
Ling et al. (2015): Two/Too Simple Adaptations of Word2Vec for Syntax Problems.
03/18/22 Heiko Paulheim 24
Similarity vs. Relatedness
• Exploiting different notions of proximity
– Use case: table interpretation (a special case of entity disambiguation)
related
similar
03/18/22 Heiko Paulheim 25
Similarity vs. Relatedness in Graph Walks
• Which parts of a walk denote what?
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
– California → leader → Gavin_Newsom → birthPlace → San_Francisco
• Common predicates (leader, birthPlace)
– Similar entities
• Common entities (Hamburg)
– Related entities
– For same-class entities: similar entities!
Portisch and Paulheim (under review): Walk this Way! Entity Walks and Property Walks for RDF2vec.
03/18/22 Heiko Paulheim 26
Similarity vs. Relatedness in Graph Walks
• Given that observation:
– Common predicates (leader, birthPlace)
• Similar classes
– Common entities (Hamburg)
• Related entities
• For same-class entities: similar entities!
• ...we should be able to learn tailored embeddings
– using walks of predicates → embedding space encodes similarity
– using walks of entities → embedding space encodes relatedness
Portisch and Paulheim (under review): Walk this Way! Entity Walks and Property Walks for RDF2vec.
03/18/22 Heiko Paulheim 27
Similarity vs. Relatedness in Graph Walks
• Classic RDF2vec walks:
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
• p-walk (predicates only except for focus entity)
– country → leader → Angela_Merkel → birthPlace → mayor
• e-walk (entities only)
– Berlin → Germany → Angela_Merkel → Hamburg → Elbphilharmonie
Portisch and Paulheim (under review): Walk this Way! Entity Walks and Property Walks for RDF2vec.
03/18/22 Heiko Paulheim 28
The RDF2vec Zoo
• We now have an entire zoo of RDF2vec variants
– SG vs. CBOW
– Order-aware vs. unordered (“classic”)
– Classic walks vs. e-walks vs. p-walks
03/18/22 Heiko Paulheim 29
The RDF2vec Zoo – Preliminary Evaluation
Classic is usually
quite good
￘
￘
oa variants are
often superior
03/18/22 Heiko Paulheim 30
The RDF2vec Zoo – Breeding New Embeddings
• Preliminary results show good results by combining embeddings
Adler_Mannheim → city → Mannheim → country → Germany
Adler_Mannheim → stadium → SAP_Arena → location → Mannheim
SAP_Arena → location → Mannheim → country → Germany
...
Classic random walks
city → Mannheim → country
stadium → location → Mannheim → country
location → Mannheim → federal_state → location
...
p-walks
Adler_Mannheim → Mannheim → Germany
Adler_Mannheim → SAP_Arena → Mannheim → Germany
SAP_Arena → Mannheim → Baden-Württemberg → Germany
...
e-walks
concatenated
vector
Global PCA
03/18/22 Heiko Paulheim 31
The RDF2vec Zoo – Breeding New Embeddings
• Combinations can be task specific
– Based on general embeddings
– Combination can pick up task-specific signals
Adler_Mannheim → city → Mannheim → country → Germany
Adler_Mannheim → stadium → SAP_Arena → location → Mannheim
SAP_Arena → location → Mannheim → country → Germany
...
Classic random walks
city → Mannheim → country
stadium → location → Mannheim → country
location → Mannheim → federal_state → location
...
p-walks
Adler_Mannheim → Mannheim → Germany
Adler_Mannheim → SAP_Arena → Mannheim → Germany
SAP_Arena → Mannheim → Baden-Württemberg → Germany
...
e-walks
concatenated
vector
w
3
w
1
(weighted)
local PCA
w
2
Task data
03/18/22 Heiko Paulheim 32
Which Classes can be Learned with RDF2vec?
• We already saw that there are different notions of classes
• Idea: compile a list of class definitions as a benchmark
– Classes are expressed as DL formulae, e.g.
– r.T, e.g. Class person with children
– r.{e}, e.g.: Class person born in New York City
– R.{e}, e.g., Class person with any relation to New York City
– r.C, e.g., Class person playing in a basketball team
– …
• First attempt:
– Create SPARQL queries in DBpedia
to get positive and negative examples
– Train binary classifier on different embeddings
03/18/22 Heiko Paulheim 33
Which Classes can be Learned with RDF2vec?
• Formulating hypotheses
– e.g., r.T, cannot be learned when using e walks
• Testing hypotheses
– using queries against DBpedia
• Seeing surprises
– e.g., models trained on e-walks can reach ~90% accuracy in that case
03/18/22 Heiko Paulheim 34
Which Classes can be Learned with RDF2vec?
• Challenge: isolating effects
– Let’s consider, r.T: e.g. almaMater.T
– In theory, we should not be able to learn this with e-walks
– Frequent entities in the neighborhoods of positive examples:
• Politician (3k examples)
• Bachelor of Arts (3k examples)
• Harvard Law School (2k examples)
• Lawyer (2k examples)
• Northwestern University (2k examples)
• Harvard University (2k examples)
• Doctor of Philosophy (2k examples)
• …
– Those signals are visible to e-walks!
03/18/22 Heiko Paulheim 35
Which Classes can be Learned with RDF2vec?
• Maybe, DBpedia is not
such a great testbed
– Hidden patterns, e.g.,
for relation cooccurence
– Many inter-pattern dependencies
– Information not missing at random
• Possible solution:
– Synthetic knowledge graphs!
– First experiments show
better visibility of expected effects
03/18/22 Heiko Paulheim 36
Alternatives to Understand KG Embeddings
cartoon
superhero
• Approach 1: learn symbolic
interpretation function for dimensions
• Each dimension of the embedding model
is a target for a separate learning problem
• Learn a function to explain the dimension
• E.g.:
• Just an approximation used for explanations and justifications
y≈−|∃character .Superhero|
03/18/22 Heiko Paulheim 37
Alternatives to Understand KG Embeddings
• Approach 2: learn symbolic substitute function for similarity function
Right hand side picture: RelFinder (https://interactivesystems.info/developments/relfinder)
03/18/22 Heiko Paulheim 38
Alternatives to Understand KG Embeddings
• Approach 3: generate
symbolic interpretations
for individual predictions
– Inspired by LIME:
• Generate perturbed
examples
• Label them using
embedding+downstream classifier
• Learn symbolic model on this labeled set
– Good news:
• RDF2vec can, in principle, create embeddings for unseen entities
• Those can be used to classify perturbed examples
https://c3.ai/glossary/data-science/lime-local-interpretable-model-agnostic-explanations/
03/18/22 Heiko Paulheim 39
Summary
• Knowledge Graph Embeddings with RDF2vec
– Encode similarity and relatedness
• Explicit trade-off is possible!
– Variations visited: walk extraction, order-awareness, materialization, ...
– Additional insights that are not explicit in the graph
• aka latent semantics
03/18/22 Heiko Paulheim 40
More on RDF2vec
• Collection of
– Implementations
– Pre-trained models
– >45 use cases
in various domains
03/18/22 Heiko Paulheim 41
Thank you!
http://www.heikopaulheim.com
@heikopaulheim
03/18/22 Heiko Paulheim 42
What do
Knowledge Graph Embeddings Learn?
includes: some new Adventures in RDF2vec
Heiko Paulheim
University of Mannheim
Heiko Paulheim

Más contenido relacionado

La actualidad más candente

Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Databricks
 
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataGraphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Neo4j
 
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & TomorrowAmsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Neo4j
 

La actualidad más candente (20)

Graphs for Data Science and Machine Learning
Graphs for Data Science and Machine LearningGraphs for Data Science and Machine Learning
Graphs for Data Science and Machine Learning
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Smarter Fraud Detection With Graph Data Science
Smarter Fraud Detection With Graph Data ScienceSmarter Fraud Detection With Graph Data Science
Smarter Fraud Detection With Graph Data Science
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
 
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataGraphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
 
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & TomorrowAmsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
 
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
 
Growth Strategies for Bootstrapped Companies Savant Growth
Growth Strategies for Bootstrapped Companies  Savant GrowthGrowth Strategies for Bootstrapped Companies  Savant Growth
Growth Strategies for Bootstrapped Companies Savant Growth
 

Similar a What_do_Knowledge_Graph_Embeddings_Learn.pdf

Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
Shenghui Wang
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
Gezim Sejdiu
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 

Similar a What_do_Knowledge_Graph_Embeddings_Learn.pdf (20)

New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUSSemantics 2017 - Trying Not to Die Benchmarking using LITMUS
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?Can Deep Learning Techniques Improve Entity Linking?
Can Deep Learning Techniques Improve Entity Linking?
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)
 
Re-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutRe-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playout
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
20130527 library linkeddata
20130527 library linkeddata20130527 library linkeddata
20130527 library linkeddata
 
Machine Learning Methods for Analysing and Linking RDF Data
Machine Learning Methods for Analysing and Linking RDF DataMachine Learning Methods for Analysing and Linking RDF Data
Machine Learning Methods for Analysing and Linking RDF Data
 
Linked data and voyager
Linked data and voyagerLinked data and voyager
Linked data and voyager
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashing
 

Más de Heiko Paulheim

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 

Más de Heiko Paulheim (20)

Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpedia
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 

Último

Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 

Último (20)

Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 

What_do_Knowledge_Graph_Embeddings_Learn.pdf

  • 1. 03/18/22 Heiko Paulheim 1 What do Knowledge Graph Embeddings Learn? includes: some new Adventures in RDF2vec Heiko Paulheim University of Mannheim Heiko Paulheim
  • 2. 03/18/22 Heiko Paulheim 2 Graphs vs. Vectors • Data Science tools for prediction etc. – Python, Weka, R, RapidMiner, … – Algorithms that work on vectors, not graphs • Bridges built over the past years: – FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015), Python KG Extension (2021) ?
  • 3. 03/18/22 Heiko Paulheim 3 Graphs vs. Vectors • Transformation strategies (aka propositionalization) – e.g., types: type_horror_movie=true – e.g., data values: year=2011 – e.g., aggregates: nominations=7 ?
  • 4. 03/18/22 Heiko Paulheim 4 Graphs vs. Vectors • Observations with simple propositionalization strategies – Even simple features (e.g., add all numbers and types) can help on many problems – More sophisticated features often bring additional improvements • Combinations of relations and individuals – e.g., movies directed by Steven Spielberg • Combinations of relations and types – e.g., movies directed by Oscar-winning directors • … – But • The search space is enormous! • Generate first, filter later does not scale well
  • 5. 03/18/22 Heiko Paulheim 5 Towards RDF2vec • Excursion: word embeddings – word2vec proposed by Mikolov et al. (2013) – predict a word from its context or vice versa • Idea: similar words appear in similar contexts, like – Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976 – Google was officially founded as a company in January 2006 – usually trained on large text corpora • projection layer: embedding vectors
  • 6. 03/18/22 Heiko Paulheim 6 From Word Embeddings to Graph Embeddings • Basic idea: – extract random walks from an RDF graph: Mulholland Dr. David Lynch US – feed walks into word2vec algorithm • Order of magnitude (e.g., DBpedia) – ~6M entities (“words”) – start up to 500 random walks per entity, length up to 8 → corpus of >20B tokens • Result: – entity embeddings – most often outperform other propositionalization techniques director nationality Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
  • 7. 03/18/22 Heiko Paulheim 7 The End of Petar’s PhD Journey… • ...and the beginning of the RDF2vec adventure
  • 8. 03/18/22 Heiko Paulheim 8 Why does RDF2vec Work? • Example: PCA plot of an excerpt of a cities classification problem – From cities classification task in the embedding evaluation framework by Pellegrino et al.
  • 9. 03/18/22 Heiko Paulheim 9 Why does RDF2vec Work? • In downstream machine learning, we usually want class separation – to make the life of the classifier as easy as possible • Class separation means – Similar entities (i.e., same class) are projected closely to each other – Dissimilar entities (i.e., different classes) are projected far away from each other
  • 10. 03/18/22 Heiko Paulheim 10 Why does RDF2vec Work? • Observation: close projection of similar entities – Usage example: content-based recommender system based on k-NN Ristoski and Paulheim (2016): RDF2vec: RDF graph embeddings for data mining
  • 11. 03/18/22 Heiko Paulheim 11 Embeddings for Link Prediction • RDF2vec observations – similar instances form clusters, direction of relation is ~stable – link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing) Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
  • 12. 03/18/22 Heiko Paulheim 12 Embeddings for Link Prediction • In RDF2vec, relation preservation is a by-product • TransE (and its descendants): direct modeling – Formulates RDF embedding as an optimization problem – Find mapping of entities and relations to Rn so that • across all triples <s,p,o> Σ ||s+p-o|| is minimized • try to obtain a smaller error for existing triples than for non-existing ones Bordes et al: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013. Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
  • 13. 03/18/22 Heiko Paulheim 13 Link Prediction vs. Node Embedding • Hypothesis: – Embeddings for link prediction also cluster similar entities – Node embeddings can also be used for link prediction Portisch et al. (2022): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
  • 14. 03/18/22 Heiko Paulheim 14 Close Projection of Similar Entities • What does similar mean?
  • 15. 03/18/22 Heiko Paulheim 15 Similarity vs. Relatedness • Closest 10 entities to Angela Merkel in different vector spaces Portisch et al. (2022): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
  • 16. 03/18/22 Heiko Paulheim 16 Back to Class Separation • What is a class? – e.g., cities per se – e.g., cities in the Netherlands – e.g., cities in the Netherlands above 100k inhabitants • Or something different, such as – e.g., everything located in Amsterdam – e.g., everything Amsterdam is known for
  • 17. 03/18/22 Heiko Paulheim 17 Back to Class Separation • Observation: there are different kinds of classes: – Classes of objects of the same category (e.g., cities) → those are similar – Classes of objects of different categories (e.g., buildings, dishes, organizations, persons) → those are related
  • 18. 03/18/22 Heiko Paulheim 18 Intermediate Observation • In most vector spaces of link prediction embeddings (TransE etc.): proximity ~ similarity • In RDF2vec embedding space: proximity ~ a mix of similarity and relatedness
  • 19. 03/18/22 Heiko Paulheim 19 So… why does RDF2vec Work Then? • Recap: downstream ML algorithms need class separation – but RDF2vec groups items by similarity and relatedness • Why is RDF2vec still so good at classification?
  • 20. 03/18/22 Heiko Paulheim 20 Example • It depends on the classification problem at hand! – Cities vs. countries – Places in Europe vs. places in Asia
  • 21. 03/18/22 Heiko Paulheim 21 So… why does RDF2vec Work Then? • Many downstream classification tasks are homogeneous – e.g., classifying cities in different subclasses • For homogeneous entities: – relatedness provides finer-grained distinctions
  • 22. 03/18/22 Heiko Paulheim 22 Similarity vs. Relatedness • Recap word embeddings: – Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976 – Google was officially founded as a company in January 2006 • Graph walks: – Hamburg → country → Germany → leader → Angela_Merkel – Germany → leader → Angela_Merkel → birthPlace → Hamburg – Hamburg → leader → Peter_Tschentscher → residence → Hamburg Germany Angela_Merkel Hamburg birthPlace country leader Peter_Tschentscher leader residence country
  • 23. 03/18/22 Heiko Paulheim 23 Order-Aware RDF2vec • Using an order-aware variant of word2vec • Experimental results: – order-aware RDF2vec most often outperforms classic RDF2vec – a bit more computation heavy, but still scales to DBpedia etc. Ling et al. (2015): Two/Too Simple Adaptations of Word2Vec for Syntax Problems.
  • 24. 03/18/22 Heiko Paulheim 24 Similarity vs. Relatedness • Exploiting different notions of proximity – Use case: table interpretation (a special case of entity disambiguation) related similar
  • 25. 03/18/22 Heiko Paulheim 25 Similarity vs. Relatedness in Graph Walks • Which parts of a walk denote what? – Hamburg → country → Germany → leader → Angela_Merkel – Germany → leader → Angela_Merkel → birthPlace → Hamburg – Hamburg → leader → Peter_Tschentscher → residence → Hamburg – California → leader → Gavin_Newsom → birthPlace → San_Francisco • Common predicates (leader, birthPlace) – Similar entities • Common entities (Hamburg) – Related entities – For same-class entities: similar entities! Portisch and Paulheim (under review): Walk this Way! Entity Walks and Property Walks for RDF2vec.
  • 26. 03/18/22 Heiko Paulheim 26 Similarity vs. Relatedness in Graph Walks • Given that observation: – Common predicates (leader, birthPlace) • Similar classes – Common entities (Hamburg) • Related entities • For same-class entities: similar entities! • ...we should be able to learn tailored embeddings – using walks of predicates → embedding space encodes similarity – using walks of entities → embedding space encodes relatedness Portisch and Paulheim (under review): Walk this Way! Entity Walks and Property Walks for RDF2vec.
  • 27. 03/18/22 Heiko Paulheim 27 Similarity vs. Relatedness in Graph Walks • Classic RDF2vec walks: – Germany → leader → Angela_Merkel → birthPlace → Hamburg • p-walk (predicates only except for focus entity) – country → leader → Angela_Merkel → birthPlace → mayor • e-walk (entities only) – Berlin → Germany → Angela_Merkel → Hamburg → Elbphilharmonie Portisch and Paulheim (under review): Walk this Way! Entity Walks and Property Walks for RDF2vec.
  • 28. 03/18/22 Heiko Paulheim 28 The RDF2vec Zoo • We now have an entire zoo of RDF2vec variants – SG vs. CBOW – Order-aware vs. unordered (“classic”) – Classic walks vs. e-walks vs. p-walks
  • 29. 03/18/22 Heiko Paulheim 29 The RDF2vec Zoo – Preliminary Evaluation Classic is usually quite good ￘ ￘ oa variants are often superior
  • 30. 03/18/22 Heiko Paulheim 30 The RDF2vec Zoo – Breeding New Embeddings • Preliminary results show good results by combining embeddings Adler_Mannheim → city → Mannheim → country → Germany Adler_Mannheim → stadium → SAP_Arena → location → Mannheim SAP_Arena → location → Mannheim → country → Germany ... Classic random walks city → Mannheim → country stadium → location → Mannheim → country location → Mannheim → federal_state → location ... p-walks Adler_Mannheim → Mannheim → Germany Adler_Mannheim → SAP_Arena → Mannheim → Germany SAP_Arena → Mannheim → Baden-Württemberg → Germany ... e-walks concatenated vector Global PCA
  • 31. 03/18/22 Heiko Paulheim 31 The RDF2vec Zoo – Breeding New Embeddings • Combinations can be task specific – Based on general embeddings – Combination can pick up task-specific signals Adler_Mannheim → city → Mannheim → country → Germany Adler_Mannheim → stadium → SAP_Arena → location → Mannheim SAP_Arena → location → Mannheim → country → Germany ... Classic random walks city → Mannheim → country stadium → location → Mannheim → country location → Mannheim → federal_state → location ... p-walks Adler_Mannheim → Mannheim → Germany Adler_Mannheim → SAP_Arena → Mannheim → Germany SAP_Arena → Mannheim → Baden-Württemberg → Germany ... e-walks concatenated vector w 3 w 1 (weighted) local PCA w 2 Task data
  • 32. 03/18/22 Heiko Paulheim 32 Which Classes can be Learned with RDF2vec? • We already saw that there are different notions of classes • Idea: compile a list of class definitions as a benchmark – Classes are expressed as DL formulae, e.g. – r.T, e.g. Class person with children – r.{e}, e.g.: Class person born in New York City – R.{e}, e.g., Class person with any relation to New York City – r.C, e.g., Class person playing in a basketball team – … • First attempt: – Create SPARQL queries in DBpedia to get positive and negative examples – Train binary classifier on different embeddings
  • 33. 03/18/22 Heiko Paulheim 33 Which Classes can be Learned with RDF2vec? • Formulating hypotheses – e.g., r.T, cannot be learned when using e walks • Testing hypotheses – using queries against DBpedia • Seeing surprises – e.g., models trained on e-walks can reach ~90% accuracy in that case
  • 34. 03/18/22 Heiko Paulheim 34 Which Classes can be Learned with RDF2vec? • Challenge: isolating effects – Let’s consider, r.T: e.g. almaMater.T – In theory, we should not be able to learn this with e-walks – Frequent entities in the neighborhoods of positive examples: • Politician (3k examples) • Bachelor of Arts (3k examples) • Harvard Law School (2k examples) • Lawyer (2k examples) • Northwestern University (2k examples) • Harvard University (2k examples) • Doctor of Philosophy (2k examples) • … – Those signals are visible to e-walks!
  • 35. 03/18/22 Heiko Paulheim 35 Which Classes can be Learned with RDF2vec? • Maybe, DBpedia is not such a great testbed – Hidden patterns, e.g., for relation cooccurence – Many inter-pattern dependencies – Information not missing at random • Possible solution: – Synthetic knowledge graphs! – First experiments show better visibility of expected effects
  • 36. 03/18/22 Heiko Paulheim 36 Alternatives to Understand KG Embeddings cartoon superhero • Approach 1: learn symbolic interpretation function for dimensions • Each dimension of the embedding model is a target for a separate learning problem • Learn a function to explain the dimension • E.g.: • Just an approximation used for explanations and justifications y≈−|∃character .Superhero|
  • 37. 03/18/22 Heiko Paulheim 37 Alternatives to Understand KG Embeddings • Approach 2: learn symbolic substitute function for similarity function Right hand side picture: RelFinder (https://interactivesystems.info/developments/relfinder)
  • 38. 03/18/22 Heiko Paulheim 38 Alternatives to Understand KG Embeddings • Approach 3: generate symbolic interpretations for individual predictions – Inspired by LIME: • Generate perturbed examples • Label them using embedding+downstream classifier • Learn symbolic model on this labeled set – Good news: • RDF2vec can, in principle, create embeddings for unseen entities • Those can be used to classify perturbed examples https://c3.ai/glossary/data-science/lime-local-interpretable-model-agnostic-explanations/
  • 39. 03/18/22 Heiko Paulheim 39 Summary • Knowledge Graph Embeddings with RDF2vec – Encode similarity and relatedness • Explicit trade-off is possible! – Variations visited: walk extraction, order-awareness, materialization, ... – Additional insights that are not explicit in the graph • aka latent semantics
  • 40. 03/18/22 Heiko Paulheim 40 More on RDF2vec • Collection of – Implementations – Pre-trained models – >45 use cases in various domains
  • 41. 03/18/22 Heiko Paulheim 41 Thank you! http://www.heikopaulheim.com @heikopaulheim
  • 42. 03/18/22 Heiko Paulheim 42 What do Knowledge Graph Embeddings Learn? includes: some new Adventures in RDF2vec Heiko Paulheim University of Mannheim Heiko Paulheim