SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
Metrics for Evaluating Quality of
Embeddings for Ontological Concepts
Faisal Alshargi
University of Leipzig
Saeedeh Shekarpour
University of Dayton
Tommaso Soru
University of Leipzig
Amit Sheth
Kno.e.sis Center
1
Motivation: the growth of Linked Open Data
2
March 2018
~3000 Datasets
>190 Triples
Motivation: Knowledge Transformation
It is necessary to dispose this valuable knowledge for extrinsic tasks such as natural language processing
or data mining.
➔ the knowledge (i.e. schema level and instance level) has to be transformed from discrete
representations to numerical representations (called embeddings).
3
Research Gap and our Contribution
Existing deficiency: is the lack of an evaluation framework for comprehensive and fair judgment on the
quality of the embeddings of concepts
Evaluation can be intrinsic or extrinsic
Our Contribution: concerned with evaluating the quality of embeddings for concepts. It extends the
state of the art by providing several intrinsic metrics for evaluating the quality of the embedding of
concepts on three aspects:
i. the categorization aspect,
ii. the hierarchical aspect, and
iii. the relational aspect.
4
Ontological Concept
Ontological concepts play a crucial role in
➔ capturing the semantics of a particular domain
➔ typing entities which bridge a schema level and
an instance level
➔ determining valid types of sources and
destinations for relations in a knowledge graph.
Thus, the embeddings of the concepts are expected to
truly reflect characteristics of ontological concepts in
the embedding space.
5
Contribution
contributing in providing several metrics for evaluating the quality of the embedding of concepts from
three perspectives:
(i) how the embedding of concepts behaves for categorizing their instantiated entities;
(ii) how the embedding of concepts behaves with respect to hierarchical semantics described in the
underlying ontology; and
(iii) how the embedding of concepts behaves with respect to relations
6
Embedding Models (1)
➔ Word2vec: Skip-Gram Model and Continuous Bag of Words (CBOW) Model. learns two separate
embeddings for each target word wi , (i) the word embedding and (ii) the context embedding.
➔ GloVe Model: The GloVe model (Pennington, Socher, and Manning 2014) is a global log-bilinear
regression model for the unsupervised learning of word embeddings. It captures global statistics of
words in a corpus and benefits the advantages of the other two models: (i) global matrix
factorization and (ii) local context window methods.
➔ RDF2Vec Model: RDF2Vec (Ristoski and Paulheim 2016) is an approach for learning embeddings
of entities in RDF graphs. It initially converts the RDF graphs into a set of sequences using two
strategies: (i) Weisfeiler-Lehman Subtree RDF Graph Kernels, and (ii) graph random walks. Then,
word2vec is employed for learning embeddings over these produced sequences.
7
Embedding Models (2)
➔ Translation-based Models: The TransE and TransH models assume that the embeddings of both
the entities and relations of a knowledge graph are represented in the same semantic space,
whereas the TransR considers two separate embedding spaces for entities and relations. All three
approaches share the same principle, summing the vectors of the subject and the predicate, one
can obtain an approximation of the vectors of the objects.
➔ Other Knowledge Graph Embedding (KGE) Models.
◆ HolE (Holographic Embeddings)
◆ DistMult
◆ ComplEx
◆ ...
All approaches above have shown to reach state-of-the-art performances on link prediction and triplet
classification.
8
Excluding of non-scalable KGE Approaches
➔ We selected the knowledge graph embedding approaches for the evaluation of our metrics among
RDF2Vec, TransE and three of the methods described in the previous subsection (i.e., HolE,
DistMult, and ComplEx)
➔ Differently from RDF2Vec, we could not find DBpedia embeddings pre-trained using any of the
other approaches online, thus we conducted a scalability test on them to verify their ability to
handle the size of DBpediawe decided to select only the more scalable RDF2Vec approach in our
evaluation
9
Data Preparation
➔ Corpus: DBpedia + Wikipedia
➔ From the DBpedia ontology, we selected 12 concepts, which are positioned in various levels of the
hierarchy. Furthermore, for each concept, we retrieved 10,000 entities typed by it (in case of
unavailability, all existing entities were retrieved).
➔ For each concept class, we retrieved 10,000 instances and their respective labels; in case of
unavailability, all existing instances were retrieved.
➔ Then, the embeddings of these concepts as well as their associated instances were computed from
the embedding models: (i) skip-gram, and (ii) CBOW and (iii) GloVe trained on Wikipedia and
DBpedia
10
Task 1: Evaluating the Categorization Aspect of
Concepts in Embeddings
➔ Ontological concepts C categorize entities by typing them, mainly using rdf:type. In other words, all
the entities with a common type share specific characteristics. For example, all the entities with
the type dbo:Country have common characteristics distinguishing them from the entities with the
type dbo:Person.
➔ Research Question: How far is the categorization aspect of concepts captured (i.e., encoded) by an
embedding model? In other words, we aim to measure the quality of the embeddings for concepts
via observing their behaviour in categorization tasks.
11
Categorization metric
➔ In the context of unstructured data, this metric aligns a clustering of words into different
categories
➔ how well the embedding of a concept performs as the background concept of the entities typed by
it.
➔ To quantify this metric, we compute the averaged vector of the embeddings of all the entities
having type c and then compute the cosine similarity of this averaged vector and the embedding of
the concept Vc
12
Experiment
13
Coherence metric
It measures whether or not a group of words adjacent in the embedding space are mutually related.
Commonly, this relatedness task has been evaluated in a subjective manner (i.e. using a human judge).
However, in the context of structured data we define the concept of relatedness as the related entities
which share a background concept, a background concept is the concept from which a given entity is
typed (i.e. inherited). For example, the entities dbr:Berlin and dbr:Sana’a are related because both are
typed by the concept dbo:City
14
Coherence metric
A. Quantitative Evaluation: For the given concept c and the given radius n, we find the n-top similar
entities from the pool (having the highest cosine similarity with Vc ). Then, the coherence metric
for the given concept c with the radius n is computed as the number of entities having the same
background concept as the given concept;
B. Qualitative Evaluation: visualization approach
15
Experiment (radius 20)
16
Task 2: Evaluating Hierarchical Aspect of
Concepts in Embeddings
There is a relatively long standing research for measuring the similarity of two given concepts s(ci , cj )
either across ontologies or inside a common ontology.
Typically, the similarity of concepts is calculated at the lexical level and at the conceptual level.
We present three metrics which can be employed for evaluating the embeddings of concepts with
respect to the hierarchical structure and the semantics.
17
Absolute Error
We introduce the metric absolute semantic error which quantitatively measures the quality of
embeddings for concepts against their semantic similarity.
18
Semantic Relatedness metric
We tune this metric from word embedding for knowledge graphs by exchanging words for concepts.
Typically, this metric represents the relatedness score of two given words. In the context of a knowledge
graph, we give a pair of concepts to human judges (usually domain experts) to rate the relatedness score
on a predefined scale, then, the correlation of the cosine similarity of the embeddings for concepts is
measured with human judge scores using Spearman or Pearson.
19
Visualization
The embeddings of all concepts of the knowledge graph can be represented in a two-dimensional
visualization. This approach is an appropriate means for qualitative evaluation of the hierarchical aspect
of concepts. The visualizations are given to a human who judges them to recognize patterns revealing the
hierarchical structure and the semantics.
20
Experiment
21
Task 3: Evaluating Relational Aspect of
Concepts in Embedding
A. Relation Validation: whether or not the inferred relation is compatible with the type of entities
engaged. For example, the relation capital is valid if it is recognized between entities with the types
country and city.
B. Domain and Range of Relations: This validation process in a knowledge graph is eased by
considering the axioms rdfs:domain and rdfs:range of the schema properties and rdf:type of entities.
The expectation from embeddings generated for relations is to truly reflect compatibility with the
embeddings of the concepts asserted in the domain and range
22
Metrics for evaluating the quality of the
embeddings for concepts and relations
A. Selectional preference: This metric presented in assesses the relevance of a given noun as a
subject or object of a given verb (e.g. people-eat or city-talk). We tune this metric for knowledge
graphs as pairs of concept-relation which are represented to a human judge for the approval or
disapproval of their compatibility
B. Semantic transition distance: The inspiration for this metric comes from where Mikolov
demonstrated that capital cities and their corresponding countries follow the same distance. We
introduce this metric relying on an objective assessment. This metric considers the relational
axioms (i.e. rdfs:domain and rdfs:range) in a knowledge graph.
23
Discussion
➔ there is no single embedding model which shows superior performance in every scenario. For
example, while the skip-gram model performs better in the categorization task, the GloVe and
CBOW model perform better for the hierarchical task. Thus, one conclusion is that each of these
models is suited for a specific scenario.
➔ The other conclusion is that it seems that each embedding model captures specific features of the
ontological concepts, so integrating or aligning these embeddings can be a solution for fully
capturing all of these features.
➔ Although our initial expectation was that the embeddings learned from the knowledge graph (i.e.
DBpedia) should have higher quality in comparison to the embeddings learned from unstructured
data (i.e. Wikipedia), in practice we did not observe that as a constant behaviour.
24
Conclusion
➔ Required to generate embeddings with respect to the these three aspects
➔ Improving and extending the evaluation scenario
25
26

Más contenido relacionado

La actualidad más candente

Bloque 3 sugerencias didacticas 1 - primaria
Bloque 3 sugerencias didacticas 1 - primariaBloque 3 sugerencias didacticas 1 - primaria
Bloque 3 sugerencias didacticas 1 - primaria
alo_jl
 
_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf
_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf
_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf
NUBIALRRAGA
 
2. principios pedagógicos
2.  principios pedagógicos2.  principios pedagógicos
2. principios pedagógicos
thaniaacosta
 
Aprendizaje esperado
Aprendizaje esperadoAprendizaje esperado
Aprendizaje esperado
Chikiflopi
 
Fase 4 verificación de la aplicación de la normatividad educativa520024 3
Fase  4 verificación de la aplicación de la normatividad educativa520024 3Fase  4 verificación de la aplicación de la normatividad educativa520024 3
Fase 4 verificación de la aplicación de la normatividad educativa520024 3
ANDRESRADA15
 
Propuesta de intervención Educativa para la atención de alumnos con necesidad...
Propuesta de intervención Educativa para la atención de alumnos con necesidad...Propuesta de intervención Educativa para la atención de alumnos con necesidad...
Propuesta de intervención Educativa para la atención de alumnos con necesidad...
Lupita Cervantes Collado
 
En qué consiste la prueba enlace
En qué consiste la prueba enlaceEn qué consiste la prueba enlace
En qué consiste la prueba enlace
Vafeln
 

La actualidad más candente (20)

Informe de analisis del contexto
Informe de analisis del contextoInforme de analisis del contexto
Informe de analisis del contexto
 
Bloque 3 sugerencias didacticas 1 - primaria
Bloque 3 sugerencias didacticas 1 - primariaBloque 3 sugerencias didacticas 1 - primaria
Bloque 3 sugerencias didacticas 1 - primaria
 
0. protocolos de atención a la diversidad parte 1 v 12 de septiembre de 2021[...
0. protocolos de atención a la diversidad parte 1 v 12 de septiembre de 2021[...0. protocolos de atención a la diversidad parte 1 v 12 de septiembre de 2021[...
0. protocolos de atención a la diversidad parte 1 v 12 de septiembre de 2021[...
 
Software Educativo
Software EducativoSoftware Educativo
Software Educativo
 
Guia 22
Guia 22Guia 22
Guia 22
 
_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf
_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf
_ CURSO COMPLETO HABILIDADES DOCENTES-1.pdf
 
2. principios pedagógicos
2.  principios pedagógicos2.  principios pedagógicos
2. principios pedagógicos
 
Aprendizaje esperado
Aprendizaje esperadoAprendizaje esperado
Aprendizaje esperado
 
Cuadernillo para el participante_AS.docx
Cuadernillo para el participante_AS.docxCuadernillo para el participante_AS.docx
Cuadernillo para el participante_AS.docx
 
Impacto de las TIC en las en las Necesidades Educativas Especiales (NEE)
Impacto de las TIC en las en las Necesidades Educativas Especiales (NEE)Impacto de las TIC en las en las Necesidades Educativas Especiales (NEE)
Impacto de las TIC en las en las Necesidades Educativas Especiales (NEE)
 
Sistema educativo en Portugal
Sistema educativo en Portugal Sistema educativo en Portugal
Sistema educativo en Portugal
 
Plan de-estudios 2011
Plan de-estudios 2011Plan de-estudios 2011
Plan de-estudios 2011
 
Fase 4 verificación de la aplicación de la normatividad educativa520024 3
Fase  4 verificación de la aplicación de la normatividad educativa520024 3Fase  4 verificación de la aplicación de la normatividad educativa520024 3
Fase 4 verificación de la aplicación de la normatividad educativa520024 3
 
Programa Primaria Segundo grado RIEB 2011
Programa Primaria Segundo grado RIEB 2011Programa Primaria Segundo grado RIEB 2011
Programa Primaria Segundo grado RIEB 2011
 
Propuesta de intervención Educativa para la atención de alumnos con necesidad...
Propuesta de intervención Educativa para la atención de alumnos con necesidad...Propuesta de intervención Educativa para la atención de alumnos con necesidad...
Propuesta de intervención Educativa para la atención de alumnos con necesidad...
 
Capacitaciòn Modelo Curricular
Capacitaciòn Modelo CurricularCapacitaciòn Modelo Curricular
Capacitaciòn Modelo Curricular
 
Acomodo Razonable
Acomodo RazonableAcomodo Razonable
Acomodo Razonable
 
Programa de Estudio 2010 Ciclo 3 - Syllabus 2010 Cycle 3 PNIEB
Programa de Estudio 2010 Ciclo 3 - Syllabus 2010 Cycle 3 PNIEBPrograma de Estudio 2010 Ciclo 3 - Syllabus 2010 Cycle 3 PNIEB
Programa de Estudio 2010 Ciclo 3 - Syllabus 2010 Cycle 3 PNIEB
 
En qué consiste la prueba enlace
En qué consiste la prueba enlaceEn qué consiste la prueba enlace
En qué consiste la prueba enlace
 
Programa de Estudio 2011. Guía para la Educadora (Educación Básica Preescolar)
Programa de Estudio 2011. Guía para la Educadora (Educación Básica Preescolar)Programa de Estudio 2011. Guía para la Educadora (Educación Básica Preescolar)
Programa de Estudio 2011. Guía para la Educadora (Educación Básica Preescolar)
 

Similar a Metrics for Evaluating Quality of Embeddings for Ontological Concepts

Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
Andre Freitas
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
IJwest
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve  part ii - towards data science[Emnlp] what is glo ve  part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
Nikhil Jaiswal
 

Similar a Metrics for Evaluating Quality of Embeddings for Ontological Concepts (20)

Marvin_Capstone
Marvin_CapstoneMarvin_Capstone
Marvin_Capstone
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
 
G04124041046
G04124041046G04124041046
G04124041046
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONSSEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
Effect of word embedding vector dimensionality on sentiment analysis through ...
Effect of word embedding vector dimensionality on sentiment analysis through ...Effect of word embedding vector dimensionality on sentiment analysis through ...
Effect of word embedding vector dimensionality on sentiment analysis through ...
 
Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...Schema-agnositc queries over large-schema databases: a distributional semanti...
Schema-agnositc queries over large-schema databases: a distributional semanti...
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
factorization methods
factorization methodsfactorization methods
factorization methods
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
1808.10245v1 (1).pdf
1808.10245v1 (1).pdf1808.10245v1 (1).pdf
1808.10245v1 (1).pdf
 
Secured Ontology Mapping
Secured Ontology Mapping Secured Ontology Mapping
Secured Ontology Mapping
 
[Emnlp] what is glo ve part ii - towards data science
[Emnlp] what is glo ve  part ii - towards data science[Emnlp] what is glo ve  part ii - towards data science
[Emnlp] what is glo ve part ii - towards data science
 
Semantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional ApproachSemantics at Scale: A Distributional Approach
Semantics at Scale: A Distributional Approach
 

Más de Saeedeh Shekarpour

CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on RelationsCEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
Saeedeh Shekarpour
 

Más de Saeedeh Shekarpour (7)

CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on RelationsCEVO: Comprehensive EVent Ontology  Enhancing Cognitive Annotation on Relations
CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation on Relations
 
A quality type aware annotated corpus and lexicon for harassment research
A quality type aware annotated corpus and lexicon for harassment researchA quality type aware annotated corpus and lexicon for harassment research
A quality type aware annotated corpus and lexicon for harassment research
 
Windowing of attention
Windowing of attentionWindowing of attention
Windowing of attention
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
Sina presentation in IBM
Sina presentation in IBMSina presentation in IBM
Sina presentation in IBM
 
Wi presentation
Wi presentationWi presentation
Wi presentation
 

Último

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Último (20)

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 

Metrics for Evaluating Quality of Embeddings for Ontological Concepts

  • 1. Metrics for Evaluating Quality of Embeddings for Ontological Concepts Faisal Alshargi University of Leipzig Saeedeh Shekarpour University of Dayton Tommaso Soru University of Leipzig Amit Sheth Kno.e.sis Center 1
  • 2. Motivation: the growth of Linked Open Data 2 March 2018 ~3000 Datasets >190 Triples
  • 3. Motivation: Knowledge Transformation It is necessary to dispose this valuable knowledge for extrinsic tasks such as natural language processing or data mining. ➔ the knowledge (i.e. schema level and instance level) has to be transformed from discrete representations to numerical representations (called embeddings). 3
  • 4. Research Gap and our Contribution Existing deficiency: is the lack of an evaluation framework for comprehensive and fair judgment on the quality of the embeddings of concepts Evaluation can be intrinsic or extrinsic Our Contribution: concerned with evaluating the quality of embeddings for concepts. It extends the state of the art by providing several intrinsic metrics for evaluating the quality of the embedding of concepts on three aspects: i. the categorization aspect, ii. the hierarchical aspect, and iii. the relational aspect. 4
  • 5. Ontological Concept Ontological concepts play a crucial role in ➔ capturing the semantics of a particular domain ➔ typing entities which bridge a schema level and an instance level ➔ determining valid types of sources and destinations for relations in a knowledge graph. Thus, the embeddings of the concepts are expected to truly reflect characteristics of ontological concepts in the embedding space. 5
  • 6. Contribution contributing in providing several metrics for evaluating the quality of the embedding of concepts from three perspectives: (i) how the embedding of concepts behaves for categorizing their instantiated entities; (ii) how the embedding of concepts behaves with respect to hierarchical semantics described in the underlying ontology; and (iii) how the embedding of concepts behaves with respect to relations 6
  • 7. Embedding Models (1) ➔ Word2vec: Skip-Gram Model and Continuous Bag of Words (CBOW) Model. learns two separate embeddings for each target word wi , (i) the word embedding and (ii) the context embedding. ➔ GloVe Model: The GloVe model (Pennington, Socher, and Manning 2014) is a global log-bilinear regression model for the unsupervised learning of word embeddings. It captures global statistics of words in a corpus and benefits the advantages of the other two models: (i) global matrix factorization and (ii) local context window methods. ➔ RDF2Vec Model: RDF2Vec (Ristoski and Paulheim 2016) is an approach for learning embeddings of entities in RDF graphs. It initially converts the RDF graphs into a set of sequences using two strategies: (i) Weisfeiler-Lehman Subtree RDF Graph Kernels, and (ii) graph random walks. Then, word2vec is employed for learning embeddings over these produced sequences. 7
  • 8. Embedding Models (2) ➔ Translation-based Models: The TransE and TransH models assume that the embeddings of both the entities and relations of a knowledge graph are represented in the same semantic space, whereas the TransR considers two separate embedding spaces for entities and relations. All three approaches share the same principle, summing the vectors of the subject and the predicate, one can obtain an approximation of the vectors of the objects. ➔ Other Knowledge Graph Embedding (KGE) Models. ◆ HolE (Holographic Embeddings) ◆ DistMult ◆ ComplEx ◆ ... All approaches above have shown to reach state-of-the-art performances on link prediction and triplet classification. 8
  • 9. Excluding of non-scalable KGE Approaches ➔ We selected the knowledge graph embedding approaches for the evaluation of our metrics among RDF2Vec, TransE and three of the methods described in the previous subsection (i.e., HolE, DistMult, and ComplEx) ➔ Differently from RDF2Vec, we could not find DBpedia embeddings pre-trained using any of the other approaches online, thus we conducted a scalability test on them to verify their ability to handle the size of DBpediawe decided to select only the more scalable RDF2Vec approach in our evaluation 9
  • 10. Data Preparation ➔ Corpus: DBpedia + Wikipedia ➔ From the DBpedia ontology, we selected 12 concepts, which are positioned in various levels of the hierarchy. Furthermore, for each concept, we retrieved 10,000 entities typed by it (in case of unavailability, all existing entities were retrieved). ➔ For each concept class, we retrieved 10,000 instances and their respective labels; in case of unavailability, all existing instances were retrieved. ➔ Then, the embeddings of these concepts as well as their associated instances were computed from the embedding models: (i) skip-gram, and (ii) CBOW and (iii) GloVe trained on Wikipedia and DBpedia 10
  • 11. Task 1: Evaluating the Categorization Aspect of Concepts in Embeddings ➔ Ontological concepts C categorize entities by typing them, mainly using rdf:type. In other words, all the entities with a common type share specific characteristics. For example, all the entities with the type dbo:Country have common characteristics distinguishing them from the entities with the type dbo:Person. ➔ Research Question: How far is the categorization aspect of concepts captured (i.e., encoded) by an embedding model? In other words, we aim to measure the quality of the embeddings for concepts via observing their behaviour in categorization tasks. 11
  • 12. Categorization metric ➔ In the context of unstructured data, this metric aligns a clustering of words into different categories ➔ how well the embedding of a concept performs as the background concept of the entities typed by it. ➔ To quantify this metric, we compute the averaged vector of the embeddings of all the entities having type c and then compute the cosine similarity of this averaged vector and the embedding of the concept Vc 12
  • 14. Coherence metric It measures whether or not a group of words adjacent in the embedding space are mutually related. Commonly, this relatedness task has been evaluated in a subjective manner (i.e. using a human judge). However, in the context of structured data we define the concept of relatedness as the related entities which share a background concept, a background concept is the concept from which a given entity is typed (i.e. inherited). For example, the entities dbr:Berlin and dbr:Sana’a are related because both are typed by the concept dbo:City 14
  • 15. Coherence metric A. Quantitative Evaluation: For the given concept c and the given radius n, we find the n-top similar entities from the pool (having the highest cosine similarity with Vc ). Then, the coherence metric for the given concept c with the radius n is computed as the number of entities having the same background concept as the given concept; B. Qualitative Evaluation: visualization approach 15
  • 17. Task 2: Evaluating Hierarchical Aspect of Concepts in Embeddings There is a relatively long standing research for measuring the similarity of two given concepts s(ci , cj ) either across ontologies or inside a common ontology. Typically, the similarity of concepts is calculated at the lexical level and at the conceptual level. We present three metrics which can be employed for evaluating the embeddings of concepts with respect to the hierarchical structure and the semantics. 17
  • 18. Absolute Error We introduce the metric absolute semantic error which quantitatively measures the quality of embeddings for concepts against their semantic similarity. 18
  • 19. Semantic Relatedness metric We tune this metric from word embedding for knowledge graphs by exchanging words for concepts. Typically, this metric represents the relatedness score of two given words. In the context of a knowledge graph, we give a pair of concepts to human judges (usually domain experts) to rate the relatedness score on a predefined scale, then, the correlation of the cosine similarity of the embeddings for concepts is measured with human judge scores using Spearman or Pearson. 19
  • 20. Visualization The embeddings of all concepts of the knowledge graph can be represented in a two-dimensional visualization. This approach is an appropriate means for qualitative evaluation of the hierarchical aspect of concepts. The visualizations are given to a human who judges them to recognize patterns revealing the hierarchical structure and the semantics. 20
  • 22. Task 3: Evaluating Relational Aspect of Concepts in Embedding A. Relation Validation: whether or not the inferred relation is compatible with the type of entities engaged. For example, the relation capital is valid if it is recognized between entities with the types country and city. B. Domain and Range of Relations: This validation process in a knowledge graph is eased by considering the axioms rdfs:domain and rdfs:range of the schema properties and rdf:type of entities. The expectation from embeddings generated for relations is to truly reflect compatibility with the embeddings of the concepts asserted in the domain and range 22
  • 23. Metrics for evaluating the quality of the embeddings for concepts and relations A. Selectional preference: This metric presented in assesses the relevance of a given noun as a subject or object of a given verb (e.g. people-eat or city-talk). We tune this metric for knowledge graphs as pairs of concept-relation which are represented to a human judge for the approval or disapproval of their compatibility B. Semantic transition distance: The inspiration for this metric comes from where Mikolov demonstrated that capital cities and their corresponding countries follow the same distance. We introduce this metric relying on an objective assessment. This metric considers the relational axioms (i.e. rdfs:domain and rdfs:range) in a knowledge graph. 23
  • 24. Discussion ➔ there is no single embedding model which shows superior performance in every scenario. For example, while the skip-gram model performs better in the categorization task, the GloVe and CBOW model perform better for the hierarchical task. Thus, one conclusion is that each of these models is suited for a specific scenario. ➔ The other conclusion is that it seems that each embedding model captures specific features of the ontological concepts, so integrating or aligning these embeddings can be a solution for fully capturing all of these features. ➔ Although our initial expectation was that the embeddings learned from the knowledge graph (i.e. DBpedia) should have higher quality in comparison to the embeddings learned from unstructured data (i.e. Wikipedia), in practice we did not observe that as a constant behaviour. 24
  • 25. Conclusion ➔ Required to generate embeddings with respect to the these three aspects ➔ Improving and extending the evaluation scenario 25
  • 26. 26