3. Bibliography
[1] Jerome Euzenat and Pavel Shvaiko. 2010. Ontology Matching (1st ed.).
Springer Publishing Company, Incorporated.
[2] Namyoun Choi, Il-Yeol Song, and Hyoil Han. 2006. A survey on ontology
mapping. SIGMOD Rec. 35, 3 (September 2006), 34-41.
[3] Yannis Kalfoglou and Marco Schorlemmer. 2003. Ontology mapping: the
state of the art. Knowl. Eng. Rev. 18, 1 (January 2003), 1-31.
[4] Noy, N., 2005. Ontology Mapping and Alignment. Search, p.1-34. Available
at: http://www.aifb.uni-karlsruhe.de/WBS/meh/foam/.
[5] Casanova, M. A., 2012. Tecnologias de Banco de Dados para a Web
Semântica - Módulo 9a - Ontologias - Matching.
5. Context
● We have to deal with heterogeneity
● Different models are based on different
domains of knowledge and use different
tools, at different detail levels
● Distributed nature of ontology development
has led to different ontologies in the same
or overlapping domains
6. The need for ontology matching
● Creating global ontologies from local ontologies
● Reuse information between ontologies
● Dealing with heterogeneity
● Queries across multiple distributed resources
● Data transformation
9. What is ontology matching?
It is the process of finding relationships
or correspondences between entities of
different ontologies.
entities - classes, instances, properties
or formulas
14. Classifying ontology matching in
regard to the use
● Matching local ontologies to global ontologies
● Matching ontologies of complementary domains
● Merging two ontologies of the same domain
16. Synthetic Classifications
● Granularity/Input Interpretation Layer
○ e.g. element- or structure-level
● Kind of Input Layer
○ Classification based on the kind of input used by
elementary matching techniques
● Basic Techniques Layer
○ Classification based on how input information is
interpreted
17. Granularity/Input Interpretation Layer
● Element-level matching techniques
○ Analysing entities or instances in isolation
○ Ignoring their relations with other entities or their
instances
● Structure-level techniques
○ Analysing how entities or their instances appear
together in a structure (e.g. by representing
ontologies as a graph)
18. Granularity/Input Interpretation Layer
Syntactic techniques
○ Interpret the input with regard to its sole structure
External techniques
○ Use external resources of a domain and common
knowledge
Semantic techniques
○ Interpret the input by using model-theoretic
semantics
20. Kind of Input Layer
● Terminological
○ Strings found in the ontology descriptions
● Structural
○ Structures found in the ontology descriptions
● Semantic
○ Requires some semantic interpretation of the
ontology
● Extensional
○ Use data instances
● In some papers, semantic=logic;
extensional=semantic
21. Kind of Input Layer (Second level)
● Terminological
○ String-based: terms as sequences of characters
○ Linguistic: interpretation of the terms as linguistic
objects
● Structural
○ Internal: consider the internal structure of entities
○ Relational: consider the relation of entities with other
entities
23. Basic Techniques Layer
A label can be interpreted as
○ A string (a sequence of letters)
○ A word or a phrase in some natural language
A hierarchy can be considered as
○ A graph
○ A taxonomy
24. Basic Techniques Layer
Element-level
● String-based
● Language-based
● Based on linguistic resources
● Constraint-based
● Alignment reuse
● Based on upper level and domain specific formal
ontologies
26. Element-level Techniques
● String-based techniques
● The more similar the strings, the more likely they
are to denote the same concepts
● Distance functions map a pair of strings to a real
number
● Language-based techniques
● Based on natural language processing techniques
exploiting morphological properties of the input
words
27. Element-level Techniques
● Constraint-based techniques
● Deal with the internal constraints being applied to the
definitions of entities, such as types, cardinality of
attributes, etc
● Linguistic resources
● Lexicons or domain specific thesauri, used to match
words based on linguistic relations between them like
synonyms, hyponyms, etc
28. Element-level Techniques
● Alignment reuse
● Record alignments of previously matched
ontologies
● Upper level and domain specific ontologies
● Used as external sources of common knowledge
29. Structure-level Techniques
● Graph-based techniques
● Treat input ontologies as labelled graphs
● If two nodes from two ontologies are similar, their
neighbours may also be somehow similar
● Taxonomy-based techniques
● is-a links connect terms that are already similar,
therefore their neighbours may also be somehow
similar
33. Name-based Techniques
● They can be applied to the name, the label
or the comments of entities in order to find
those which are similar
● They can be used for comparing class
names and/or URIs
34. String-based methods
● Based on string similarity only
● Useful if conceptual schemas (or ontologies) use
very similar strings to denote the same concepts
● Yield a low similarity, if schemas use synonyms with
different syntax
● Yield many false positives, if pairs of strings with low
similarity are selected
36. String-based methods
Levenshtein (edit) distance
● Measure the similarity between two strings by
the minimum number of insertions, deletions, and
substitutions of characters required to transform
one string into the other
● Example:
Levenshtein(“Gaming”, “Games”) = 2 substitutions [“i” by “e” and “n” by “s”]
+ 1 deletion [“g”]
= 3
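The definition above can be sketched with the usual dynamic-programming formulation. This is a minimal illustration, not a tuned implementation; the function names `levenshtein` and `normalized_similarity` are chosen here for the example, and the normalization by the longer string's length is one common convention, assumed rather than taken from the slides.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    # previous[j] = distance between the prefix of s processed so far and t[:j]
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs from s
                current[j - 1] + 1,      # insert ct into s
                previous[j - 1] + cost,  # substitute cs by ct (or keep it)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(s: str, t: str) -> float:
    """Map the distance to [0, 1]; assumed convention: divide by the longer length."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


print(levenshtein("Gaming", "Games"))  # 3
```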
37. String-based methods
Token-based distance
● Usually applied to the complete description of a
concept
● Treats strings as a bag of words (multisets of
substrings)
● May split strings into independent tokens
● Example: "InProceedings" is represented by
● the bag of words {In, Proceedings}
● or a bag of substrings of length 3 {InP, roc, eed, ing, s}
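The two representations of "InProceedings" can be reproduced with a few lines of Python. This is a sketch under stated assumptions: splitting on capital letters is only one possible tokenization rule, and the helper names `camel_case_tokens` and `chunks` are hypothetical.

```python
import re
from collections import Counter


def camel_case_tokens(label: str) -> list:
    """Split a label such as 'InProceedings' at capital letters (assumed rule)."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", label)


def chunks(label: str, size: int = 3) -> list:
    """Cut a label into consecutive, non-overlapping substrings of the given size."""
    return [label[i:i + size] for i in range(0, len(label), size)]


# A bag (multiset) keeps the number of occurrences of each token.
print(Counter(camel_case_tokens("InProceedings")))
print(Counter(chunks("InProceedings", 3)))
```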
38. String-based methods
Bag of words represented as a vector
● Each dimension corresponds to a token
● Each position of the vector is the number of occurrences of the
token
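Once bags of words are vectors of token counts, any vector measure can compare them. The sketch below uses cosine similarity, which is an assumption made for illustration: the slide does not say which measure is applied to the vectors.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b) -> float:
    """Cosine of the angle between two token-count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    # Only tokens present in both bags contribute to the dot product.
    dot = sum(va[tok] * vb[tok] for tok in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag shares nothing with any other bag
    return dot / (norm_a * norm_b)


print(cosine_similarity(["in", "proceedings"], ["proceedings"]))
```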
40. Language-based methods
Intrinsic methods
● reduce each term to a normal form to facilitate
matching
● use traditional natural language processing
techniques
● stopword elimination
● tokenization: segment strings into sequences of tokens
● lemmatization: reduce words to normal forms
● suppress tense, gender and number
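The normalization steps listed above can be chained as follows. This is only a sketch: the stopword list is a tiny illustrative sample, and `naive_lemma` is a crude stand-in for a real lemmatizer (it only strips some English plural endings), so both are assumptions rather than part of the original material.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "or"}  # illustrative sample


def tokenize(label: str) -> list:
    """Segment a label at capital letters, underscores, spaces, etc.; lowercase tokens."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z][a-z]*", label)]


def naive_lemma(word: str) -> str:
    """Very rough plural stripping; a real system would use a proper lemmatizer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(label: str) -> list:
    """Tokenization, then stopword elimination, then (naive) lemmatization."""
    return [naive_lemma(tok) for tok in tokenize(label) if tok not in STOPWORDS]


print(normalize("AuthorsOfTheBook"))  # ['author', 'book']
print(normalize("book_author"))       # ['book', 'author']
```

After normalization the two labels yield the same set of terms, so a set-based comparison would match them even though the raw strings differ.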
42. Language-based methods
Extrinsic methods
Use dictionaries, lexicons and terminologies to
help match terms from different schemas or
ontologies
● e.g. a terminology - a thesaurus which very often
contains phrases rather than single words
● deal with synonyms
● word sense disambiguation
43. Language-based methods
WordNet – an example of an external resource
● an electronic lexical database for English
● based on the notion of synsets (sets of synonyms)
● a synset denotes a concept or a sense of a group of terms
● WordNet also provides:
● a hypernym structure (superconcept / subconcept)
● a meronym relation (part of)
● textual descriptions of the concepts (glossary)
44. Language-based methods
● Example
● WordNet 2.0 entry for the word author
author1 noun: Someone who originates or causes or initiates something;
Example ‘he was the generator of several complaints’. Synonym
generator, source. Hypernym maker. Hyponym coiner.
author2 noun: Writes (books or stories or articles or the like) professionally
(for pay). Synonym writer2. Hypernym communicator. Hyponym
abstractor, alliterator, authoress, biographer, coauthor, commentator,
contributor, cyberpunk, drafter, dramatist, encyclopedist, essayist, folk
writer, framer, gagman, ghostwriter, Gothic romancer, hack, journalist,
librettist, lyricist, novelist, pamphleteer, paragrapher, poet, polemist,
rhymer, scriptwriter, space writer, speechwriter, tragedian, wordmonger,
word-painter, wordsmith, Andersen, Asimov...
author3 verb.: Be the author of; Example ‘She authored this play’.
Hypernym write. Hyponym co-author, ghost.
45. Language-based methods
● Example
● fragment of the WordNet hierarchy (limited to nouns) for
“illustrator”, “author”, “creator”, “person”, “writer”
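Synsets of each term, writing $\Sigma(t)$ for the set of synsets of term $t$:
$\Sigma(\text{“author”}) = \{A1, A2W2\}$
$\Sigma(\text{“writer”}) = \{W1, A2W2, W3\}$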
46. Language-based methods
Example – Synonym Similarity
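$\sigma(s,t) = 1$ iff $\Sigma(s) \cap \Sigma(t) \neq \emptyset$ (terms have a synset in common)
$\sigma(s,t) = 0$ otherwise
where $\Sigma(t)$ is the set of synsets of term $t$
$\Sigma(\text{“author”}) = \{A1, A2W2\}$
$\Sigma(\text{“writer”}) = \{W1, A2W2, W3\}$
$\Sigma(\text{“author”}) \cap \Sigma(\text{“writer”}) = \{A2W2\} \neq \emptyset$, so $\sigma(\text{“author”}, \text{“writer”}) = 1$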
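The synonym similarity of this example can be written directly over sets. In the sketch below the synset table is typed in by hand from the slide rather than read from WordNet, and the entry for "illustrator" uses a made-up identifier `I1` purely to show the non-matching case.

```python
# Synset identifiers copied from the example; "I1" is a hypothetical placeholder.
SYNSETS = {
    "author": {"A1", "A2W2"},
    "writer": {"W1", "A2W2", "W3"},
    "illustrator": {"I1"},
}


def synonym_similarity(s: str, t: str, synsets=SYNSETS) -> int:
    """Return 1 if the two terms share at least one synset, 0 otherwise."""
    shared = synsets.get(s, set()) & synsets.get(t, set())
    return 1 if shared else 0


print(synonym_similarity("author", "writer"))       # 1
print(synonym_similarity("author", "illustrator"))  # 0
```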
48. Structure-based techniques
Internal structure (constraint-based approaches)
● based on the internal structure of classes
● calculate the similarity between two classes based on
○ the set of their properties, including keys
○ the range of their properties (attributes and relations)
○ the cardinality of their properties
○ the transitivity or symmetry of their properties
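One possible way to turn these criteria into a number is to compare the constraint "signatures" of the properties of two classes. The sketch below is an assumption made for illustration, not a method from the slides: it keeps only range and maximum cardinality, ignores property names on purpose, and scores the overlap of the two signature sets. The classes and properties are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Property:
    name: str
    range: str                       # datatype or class the property points to
    max_cardinality: Optional[int]   # None means unbounded


def internal_similarity(props_a: Iterable[Property], props_b: Iterable[Property]) -> float:
    """Share of (range, cardinality) signatures that the two classes have in common."""
    sig_a = {(p.range, p.max_cardinality) for p in props_a}
    sig_b = {(p.range, p.max_cardinality) for p in props_b}
    union = sig_a | sig_b
    if not union:
        return 0.0  # no properties at all: no structural evidence either way
    return len(sig_a & sig_b) / len(union)


# Hypothetical classes: names differ, constraints mostly agree.
book = [Property("title", "string", 1), Property("author", "Person", None)]
volume = [Property("name", "string", 1), Property("writer", "Person", None),
          Property("pages", "integer", 1)]
print(internal_similarity(book, volume))
```

A score like this says little on its own, since unrelated classes can share datatypes, which is why it is usually combined with other evidence.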
50. Structure-based techniques
Internal structure (constraint-based approaches)
● positive point:
● can be used to eliminate incompatible matches
● negative points:
● does not provide much information about the classes to
compare
● different classes may have properties with the same datatypes
● different models of a concept use different, and incompatible,
types
● approach suggested:
● use this method in combination with other methods
51. Structure-based techniques
Relational Structure
● similarity between two concepts
● based on the relations between the concepts with other
concepts
○ similar concepts should have similar related concepts
● given a relation r, a pair of concepts may be:
○ directly related through r
○ inversely related through r
○ transitively related through r
○ the maximal elements of r+
53. Structure-based techniques
Taxonomic Structure
● Similarity between two concepts
○ Based on the graph of the subClassOf relation
○ Example
■ $\delta(e, e')$ = number of edges of the taxonomy between e and e’,
normalized by dividing by the longest path
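The edge-counting measure can be sketched with a breadth-first search over the subClassOf graph. Two assumptions are made here: edges are traversed in both directions, and "the longest path" is taken to be the largest shortest-path length between any two classes of the taxonomy. The toy taxonomy is hypothetical.

```python
from collections import deque


def edge_distance(edges, start, goal):
    """Number of subClassOf edges on the shortest path between two classes."""
    neighbours = {}
    for child, parent in edges:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two classes are not connected


def taxonomic_distance(edges, a, b):
    """Edge distance divided by the longest shortest path in the taxonomy."""
    nodes = {n for edge in edges for n in edge}
    all_distances = [edge_distance(edges, x, y) for x in nodes for y in nodes]
    longest = max(d for d in all_distances if d is not None)
    d = edge_distance(edges, a, b)
    return None if d is None else d / longest


# Hypothetical taxonomy, written as (subclass, superclass) pairs.
TAXONOMY = [("Person", "Thing"), ("Creator", "Person"), ("Author", "Creator"),
            ("Illustrator", "Creator"), ("Book", "Thing")]
print(taxonomic_distance(TAXONOMY, "Author", "Illustrator"))  # 0.5
```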
54. Structure-based techniques
Bounded path matchers
● use anchors relating paths from two distinct
taxonomies
● take two paths with links from two distinct taxonomies
● compare terms and their positions along these paths
● identify similar terms
55. Structure-based techniques
Example
“Book -> Volume” and
“Popular -> Autobiography”
implies that possibly
“Science -> Biography” or
“Science -> Essay”
56. Structure-based techniques
Summary of relational structure methods
● Powerful methods to match conceptual schemas and
ontologies
○ Allow relations between concepts to be taken into account
● Often used in combination with internal structural and
terminological methods
58. Extensional techniques
● Jaccard Similarity: Given two sets A and B, let P(X)
be the probability that a random instance is in the set
X. The Jaccard similarity is $\sigma(A, B) = P(A \cap B) / P(A \cup B)$.
● Note that the Jaccard Similarity reaches 1 when A = B
and 0 when they are disjoint.
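With finite sets of instances, the probabilities can be estimated by counting, so the measure becomes the size of the intersection divided by the size of the union. A minimal sketch, in which the instance identifiers are invented and the value returned for two empty sets is a convention assumed here:

```python
def jaccard(a, b) -> float:
    """Size of the intersection of two instance sets divided by the size of their union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty extensions count as identical
    return len(a & b) / len(union)


# Hypothetical instances of two classes taken from different ontologies.
writers = {"i1", "i2", "i3"}
authors = {"i2", "i3", "i4"}
print(jaccard(writers, authors))  # 0.5
```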
59. Semantic-based techniques
● Semantic-based techniques rely on the axioms of
ontologies and on deductive methods.
● Ontology matching is an inductive task, so deductive
methods do not perform well alone; a preprocessing step is needed.
● That step must first compensate for the lack of a
common ground between the ontologies.
● For those reasons, authors propose applying semantic
techniques in two steps: the so-called anchoring step
and the deriving relations step.
60. Semantic-based techniques
● Anchoring: matching ontologies o' and o'' to the
background ontology o. This can be done using any
method described so far.
● Deriving relations: the (indirect) matching of
ontologies o' and o'' by using the correspondences
discovered during the anchoring step.
● Example: Micro-company: Has at most 5 employees.
SME: Has at most 10 associates.
anchoring: employees ---> EMPLOYEE <--- associates
Micro-company ---> FIRM <--- SME
deriving relations: Micro-company is a subclass of SME.
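The reasoning in this example can be mimicked in a few lines. This is a toy illustration, not a description logic reasoner: the class `AtMost`, the `ANCHORS` table and the function `is_subclass` are names invented here, and the only rule encoded is that "at most n" is subsumed by "at most m" over the same anchored property when n <= m.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtMost:
    """A class defined by an 'at most `limit` values of `prop`' restriction."""
    prop: str
    limit: int


# Anchoring step: local property names mapped to the background ontology.
ANCHORS = {"employees": "EMPLOYEE", "associates": "EMPLOYEE"}


def is_subclass(sub: AtMost, sup: AtMost, anchors=ANCHORS) -> bool:
    """Deriving step: same anchored property and a tighter (or equal) bound."""
    anchored_sub = anchors.get(sub.prop)
    anchored_sup = anchors.get(sup.prop)
    if anchored_sub is None or anchored_sub != anchored_sup:
        return False  # no common ground, nothing can be derived
    return sub.limit <= sup.limit


micro_company = AtMost("employees", 5)
sme = AtMost("associates", 10)
print(is_subclass(micro_company, sme))  # True
print(is_subclass(sme, micro_company))  # False
```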
62. Matching strategies - Global
Methods
● Aggregating the results of the basic methods
● Developing a strategy for computing these
similarities
● Learning from data the best method and the
best parameters for matching
● Using probabilistic methods to combine
matchers or to derive missing correspondences
● Involving users in the loop
● Extracting the alignments from the resulting
(dis)similarity
66. Similarity aggregation
Compound similarity is concerned with the
aggregation of heterogeneous similarities
○ e.g. a single similarity measure for two classes, composed
of the similarity of their names, the similarity of
their superclasses, the similarity of their instances
and that of their properties
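One simple way to build such a compound measure is a weighted average of the individual similarities. The weighted average is an assumption chosen for this sketch (other aggregations exist), and the similarity values and weights below are invented for illustration.

```python
def weighted_aggregate(similarities: dict, weights: dict) -> float:
    """Weighted average of several similarity values keyed by the kind of evidence."""
    total_weight = sum(weights[kind] for kind in similarities)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(similarities[kind] * weights[kind] for kind in similarities)
    return weighted_sum / total_weight


# Hypothetical similarities between two classes, one value per kind of evidence.
similarities = {"name": 0.8, "superclasses": 0.5, "instances": 0.2, "properties": 0.5}
weights = {"name": 2.0, "superclasses": 1.0, "instances": 1.0, "properties": 1.0}
print(weighted_aggregate(similarities, weights))
```

Here the name similarity counts twice as much as each of the others; in practice the weights would be tuned or learned from data.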