Ontologies are used in numerous research disciplines and commercial applications to annotate real-world objects uniformly and semantically. A domain often has multiple interrelated ontologies, and repositories such as BioPortal already provide mappings (links) between them. Manually verified mappings in particular can be reused (1) to create new mappings between so far unconnected sources, and (2) to avoid an expensive re-determination, e.g. when the underlying ontologies change.
New ontology mappings can be determined by reusing and composing previously determined mappings that involve intermediate ontologies. Composing mappings is very efficient and can yield mappings of very high quality, especially when the intermediate ontologies overlap well with the ontologies to be matched. Moreover, due to the rapid development of application domains, ontologies are frequently changed to include up-to-date knowledge. These changes strongly affect dependent data and applications such as ontology mappings and ontology-based annotations. Thus, existing mappings may become invalid and need to be migrated to the most recent ontology versions, so that users and dependent applications can consume up-to-date mappings.
In this talk, I will give a brief introduction to ontology mappings and provide an overview of reuse-based approaches for mapping creation and maintenance that are currently studied at the Database Group at Leipzig University.
1. 1
REUSE OF ONTOLOGY MAPPINGS
Anika Groß,
Database Group, Universität Leipzig
Canberra, March 2016
2. 2
ONTOLOGIES
• Structured representation of knowledge
• Used for annotation as standardized semantic description of object properties
• Very large ontologies in the life sciences
[Figure: example domains (Anatomy, Molecular biology, Chemistry, Medicine) and an excerpt of an anatomy ontology with the concepts Anatomic Structure, System, or Substance; Tissue; Organ; Lung; Skin; Kidney; …]
4. 4
• Overlapping ontologies → creation of mappings/alignments
• Useful for data integration, analysis across sources …
• Ontology mapping: set of semantic correspondences (links) between concepts of different ontologies
• Manual or semi-automatic identification (matching)
ONTOLOGY MAPPINGS
[Figure: mapping OM_{O1,O2} between O1 (body, head, neck, trunk, limbs, upper extremities, lower extremities, tail) and O2 (body, head, neck, limbs, limb segments, tail) with five "=" and two "<" correspondences]
5. 5
• Ontologies are not static!
• Research, new knowledge → continuous changes
• Release of new versions
• Ontology changes
→ Impact on dependent mappings and applications?
EVOLUTION OF ONTOLOGIES AND MAPPINGS
[Figure: ontologies O1 and O2 connected by the mapping OM_{O1,O2}]
6. 6
REUSE EXISTING MAPPINGS TO …
→ create new ontology mappings
• “Indirect” matching: combine existing mappings to create new mappings between so far unconnected sources
→ create up-to-date ontology mappings
• Migration of outdated mappings to currently valid ontology versions
1) Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching
3) Adaptation of ontology mappings
4) Outlook
7. 7
ONTOLOGY MATCHING WORKFLOW
• Manual creation of mappings between very large ontologies is too labor-intensive
• Semi-automatic generation of semantic correspondences: linguistic, structural, instance-based matching techniques
[Figure: matching workflow: O1, O2 and further input (e.g. instances, dictionary) → Preprocessing → Matching → Postprocessing → Mapping, e.g. sim(O1.a, O2.b) = 0.8, sim(O1.a, O2.c) = 0.5, sim(O1.c, O2.c) = 1.0]
8. 8
MAPPING COMPOSITION
• Indirect composition-based matching
• Via intermediate ontology (IO): important hub ontology, synonym dictionary, …
• Find new correspondences via composition
• Reuse existing mappings to increase match quality & save computation time
[Figure: O1 and O2 are matched indirectly (?) via existing mappings to an intermediate ontology IO]
[Example: MA_0001421 (Name: cervical vertebra 1), UBERON:0001092 (Name: atlas; Synonyms: cervical vertebra 1, C1 vertebra), NCI_C32239 (Name: C1 Vertebra; Synonym: Atlas)]
Groß, Hartung, Kirsten, Rahm: Mapping Composition for Matching Large Life Science
Ontologies. 2nd International Conference on Biomedical Ontology (ICBO), 2011
9. 9
• Use mappings to intermediate ontologies IO1, …, IOk to indirectly match O1 and O2
• Reduce matching effort by reusing mappings to IO → very fast composition
INDIRECT MATCHING
[Figure, left: O1 and O2 connected via the intermediate ontologies IO1, IO2, …, IOk]
→ IO should have a significant overlap with O1 and O2
→ IO1, …, IOk may complement each other
[Figure, right: hub ontology HO connected to O1, O2, …, On and to a new ontology Onew]
→ Centralized hub HO → many mappings to other ontologies
→ Onew aligned with any Oi via HO
10. 10
• (Binary) compose operator
• Composes two mappings M_{O1,IO} and M_{IO,O2} to create a new mapping M_{O1,O2}
COMPOSE OPERATOR
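One common reading of a binary compose operator is a relational join over the shared intermediate concepts. The sketch below assumes that reading, represents a mapping as a plain set of (source, target) pairs, and ignores similarity values; the function name `compose` and the toy data are illustrative assumptions, not the operator definition used in the talk.

```python
from collections import defaultdict


def compose(m_o1_io, m_io_o2):
    """Compose two mappings given as sets of (source, target) pairs.

    A pair (a, c) is produced whenever an intermediate concept b exists
    with (a, b) in the first mapping and (b, c) in the second mapping.
    """
    # Index the second mapping by its intermediate concept for a fast join.
    by_intermediate = defaultdict(set)
    for b, c in m_io_o2:
        by_intermediate[b].add(c)
    return {(a, c) for a, b in m_o1_io for c in by_intermediate.get(b, ())}


# Hypothetical toy mappings that reuse identifiers from the composition example.
m_ma_uberon = {("MA_0001421", "UBERON:0001092")}
m_uberon_nci = {("UBERON:0001092", "NCI_C32239")}
print(compose(m_ma_uberon, m_uberon_nci))
```

Indexing one input by the intermediate concept keeps the join linear in the size of both mappings, which fits the claim that composition is fast compared with direct matching.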
11. 11
COMPOSEMATCH
Input: Two ontologies O1 and O2, list of intermediate ontologies IO1, …, IOk, occurrence count occ
Output: Composed mapping CM_{O1,O2}
MapList ← empty
for each IOi ∈ IO do
  M_{O1,IOi} ← getMapping(O1, IOi)
  M_{IOi,O2} ← getMapping(IOi, O2)
  MapList.add(compose(M_{O1,IOi}, M_{IOi,O2}))
end for
return merge(MapList, occ)
[Figure: example with O1, O2 and two intermediate ontologies IO1 and IO2; MapList holds the composed mappings {(a,a), (b,b)} and {(c,c), (a,a)}]
occ = 1: CM_{O1,O2} = {(a,a),(b,b),(c,c)}
occ = 2: CM_{O1,O2} = {(a,a)}
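The ComposeMatch procedure can be sketched in a few lines of Python. This is a minimal sketch under assumptions: mappings are sets of (source, target) pairs, `get_mapping` is a hypothetical lookup for precomputed mappings, and `merge` keeps a correspondence if it occurs in at least `occ` composed mappings. The toy mappings are invented so that the two composed mappings equal the ones in the example.

```python
from collections import Counter, defaultdict


def compose(m1, m2):
    """Join two mappings (sets of pairs) on the shared intermediate concept."""
    index = defaultdict(set)
    for b, c in m2:
        index[b].add(c)
    return {(a, c) for a, b in m1 for c in index.get(b, ())}


def merge(map_list, occ):
    """Keep correspondences that occur in at least `occ` composed mappings."""
    counts = Counter(pair for mapping in map_list for pair in mapping)
    return {pair for pair, n in counts.items() if n >= occ}


def compose_match(o1, o2, intermediates, get_mapping, occ=1):
    map_list = []
    for io in intermediates:
        m_o1_io = get_mapping(o1, io)
        m_io_o2 = get_mapping(io, o2)
        map_list.append(compose(m_o1_io, m_io_o2))
    return merge(map_list, occ)


# Hypothetical precomputed mappings (assumption, chosen to mirror the example).
stored = {
    ("O1", "IO1"): {("a", "a"), ("b", "b")},
    ("IO1", "O2"): {("a", "a"), ("b", "b")},
    ("O1", "IO2"): {("a", "a"), ("c", "c")},
    ("IO2", "O2"): {("a", "a"), ("c", "c")},
}


def get_mapping(source, target):
    return stored[(source, target)]


print(sorted(compose_match("O1", "O2", ["IO1", "IO2"], get_mapping, occ=1)))
print(sorted(compose_match("O1", "O2", ["IO1", "IO2"], get_mapping, occ=2)))
```

With occ = 1 the merge is a union of all composed mappings; higher values of occ act as a voting threshold across intermediate ontologies.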
12. 12
EVALUATION SETUP
• Match problem
• Adult Mouse Anatomy (MA)
• NCI Thesaurus Anatomy part (NCIT)
• Gold standard ~1500 correspondences
• Precompute mappings using a match strategy
[Figure: MA and NCIT linked via the intermediate ontologies Uberon, UMLS, RadLex and FMA; #concepts labels in the figure: ~5,000, ~88,000, ~30,800, ~81,000, ~2,700, ~3,300]
13. 13
EVALUATION SETUP
• Match strategy used to precompute the mappings between MA, NCIT and the intermediate ontologies (Uberon, UMLS, RadLex, FMA):
  Preprocessing (normalization) → Linguistic matcher (name, synonyms, trigram, t = 0.8) → Selection & postprocessing
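A linguistic matcher of this kind can be illustrated as follows. The sketch assumes a Dice coefficient over character trigrams of lower-cased names and synonyms and keeps a concept pair if its best name pair reaches the threshold t = 0.8; the exact trigram variant and normalization used in the evaluation are not stated on the slides, and the toy concepts are assumptions.

```python
def trigrams(name):
    """Character trigrams of a lower-cased, trimmed name."""
    s = name.lower().strip()
    if len(s) < 3:
        return {s}
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_sim(a, b):
    """Dice coefficient over the trigram sets of two strings."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def linguistic_match(concepts1, concepts2, t=0.8):
    """Match concepts given as dicts: id -> list of names and synonyms."""
    result = {}
    for id1, names1 in concepts1.items():
        for id2, names2 in concepts2.items():
            best = max(trigram_sim(n1, n2) for n1 in names1 for n2 in names2)
            if best >= t:
                result[(id1, id2)] = best
    return result


# Hypothetical toy concepts (names and synonyms are assumptions).
ma = {"MA_0001421": ["cervical vertebra 1"]}
uberon = {"UBERON:0001092": ["atlas", "cervical vertebra 1", "C1 vertebra"]}
nci = {"NCI_C32239": ["C1 Vertebra", "Atlas"]}

print(linguistic_match(ma, uberon))   # found via a shared synonym
print(linguistic_match(uberon, nci))  # found via "atlas" / "Atlas"
print(linguistic_match(ma, nci))      # no direct match above the threshold
```

In this toy case the direct match between the first and the last source fails, while both mappings to the intermediate source succeed, which is the situation composition-based matching exploits.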
14. 14
• Direct match result compared to composeMatch via each IO
• Additional matching of unmatched parts (extendMatch)
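RESULTS
[Figure: match quality of the direct match compared to composeMatch per intermediate ontology IO; value labels in the figure: 88.2% and 86%]
• Uberon & UMLS → best evaluated intermediate ontologies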
15. 15
• Combination of four composed mappings
• Correspondences have to occur in at least 1, …, 4 mappings
RESULTS
• union (occ = 1): Precision 92.7, Recall 87.8, F-Measure 90.2
• Higher occurrence → Recall ↓
• extendMatch → Recall ↑
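The reported F-Measure is consistent with the harmonic mean of the reported precision and recall. A short check, assuming the standard F1 definition:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for union (occ = 1), in percent.
print(round(f_measure(92.7, 87.8), 1))  # 90.2
```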
16. 16
RESULTS
Top Results OAEI: http://oaei.ontologymatching.org/[year]/anatomy
Other systems later adopted similar techniques to make use of domain-specific background knowledge (e.g. including Uberon, UMLS)
17. 17
COMPOSITION VIA SEVERAL SOURCES
• Many “mapping path” alternatives…
• Which intermediate source(s) should be used?
[Figure: alternative mapping paths between a source S and a target T via intermediate sources A, B, C; example labels: GeoNames, LinkedGeoData; candidate intermediate source PubMed: wrong domain]
Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
18. 18
COMPOSITION VIA SEVERAL SOURCES
• Many “mapping path” alternatives…
• Which intermediate source(s) should be used?
[Figure: alternative mapping paths between a source S and a target T via intermediate sources A, B, C; example labels: GeoNames, LinkedGeoData, PubMed; candidate intermediate source WorldFactBook: too special]
19. 19
COMPOSITION VIA SEVERAL SOURCES
• Many “mapping path” alternatives…
• Which intermediate source(s) should be used?
[Figure: alternative mapping paths between a source S and a target T via intermediate sources A, B, C; example labels: GeoNames, LinkedGeoData, PubMed, WorldFactBook; candidate intermediate source DBpedia: OK, universal knowledge source]
21. 21
EFFECTIVENESS OF MAPPINGS FOR COMPOSITION
[Figure: Source S, Intermediate I and Target T with the mappings M_{S,I} and M_{I,T}; domain(M_{S,I}) lies in S, range(M_{S,I}) and domain(M_{I,T}) lie in I, range(M_{I,T}) lies in T]
Binary:
n-ary:
1. Mapping coverage in S and T should be high
2. Overlap of entities in I should be high
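The two criteria can be made concrete with simple set-based indicators. The sketch below is an illustration only and not the effectiveness formula of the talk: it assumes mappings as sets of (source, target) pairs, measures coverage as the fraction of S (or T) that takes part in a mapping, and measures the overlap in I as a Jaccard ratio; all names are hypothetical.

```python
def domain(mapping):
    return {s for s, _ in mapping}


def range_(mapping):
    return {t for _, t in mapping}


def path_indicators(source, target, m_s_i, m_i_t):
    """Return (coverage in S, coverage in T, overlap in I) for one path."""
    cov_s = len(domain(m_s_i) & source) / len(source)
    cov_t = len(range_(m_i_t) & target) / len(target)
    left, right = range_(m_s_i), domain(m_i_t)
    union = left | right
    overlap = len(left & right) / len(union) if union else 0.0
    return cov_s, cov_t, overlap


# Hypothetical toy data.
S = {"s1", "s2", "s3", "s4"}
T = {"t1", "t2"}
m_s_i = {("s1", "i1"), ("s2", "i2")}
m_i_t = {("i2", "t1"), ("i3", "t2")}
print(path_indicators(S, T, m_s_i, m_i_t))
```

A path with high coverage but little overlap in I yields few composed correspondences, which is why both criteria matter when ranking mapping paths.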
22. 22
Mapping-based
• Take all mapping paths between S and T
• Different path filtering methods
  1) Effectiveness: k most effective mapping paths (selEff)
  2) Complement: k best complementing mapping paths w.r.t. S and T (selComp)
Link-based
• Select best routes in a graph of links between entities/concepts (not on “mapping level”)
• Graph-based approach
  • Transformation of S, T and mappings in M into a weighted, directed graph
  • Application of Shortest-Path algorithm to solve mapping composition problem
DIFFERENT COMPOSITION STRATEGIES
Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15.
GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
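The link-based strategy can be sketched with a standard shortest-path search. The sketch assumes that every link carries a similarity in [0, 1], that the edge cost is 1 - similarity, and that Dijkstra's algorithm is run from a source entity until the first target entity is reached; the actual graph construction and weighting of the cited approach may differ, and all names and data are hypothetical.

```python
import heapq


def build_graph(mappings):
    """Turn mappings (dicts: (src, dst) -> similarity) into an adjacency list."""
    graph = {}
    for mapping in mappings:
        for (src, dst), sim in mapping.items():
            graph.setdefault(src, []).append((dst, 1.0 - sim))
    return graph


def cheapest_target(graph, start, targets):
    """Dijkstra search from `start`; returns (target, cost) or None."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node in targets:
            return node, d
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None


# Hypothetical links: S -> A -> T and S -> B -> T.
m_s_a = {("s1", "a1"): 0.9}
m_a_t = {("a1", "t1"): 0.9}
m_s_b = {("s1", "b1"): 1.0}
m_b_t = {("b1", "t2"): 0.6}
graph = build_graph([m_s_a, m_a_t, m_s_b, m_b_t])
print(cheapest_target(graph, "s1", {"t1", "t2"}))
```

Working on the level of individual links lets different source entities use different intermediate sources, instead of committing to one mapping path for the whole source.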
23. 23
EVALUATION
[Figure: F-measure (axis 60 to 100) of the strategies all, selEff, selComp and link for NYT-DBp, NYT-FB, NYT-GeoN (Geography, Instance Matching track) and MA-NCIT (Anatomy track)]
• selEff, selComp, link always better than naïve (all) approach
• Selection strategies better for Anatomy
• Link strategy slightly better for Geography
+ Best Compose approach always better than direct match
Reuse of mappings and composition strategies → very useful to create new correspondences/links
24. 24
REUSE EXISTING MAPPINGS TO …
→ create new ontology mappings
• “Indirect” matching: combine existing mappings to create new mappings between so far unconnected sources
→ create up-to-date ontology mappings
• Migration of outdated mappings to currently valid ontology versions
1) Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching
3) Adaptation of ontology mappings
4) Outlook
25. 25
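[Figure: ontology versions O1, O1′ and O2, O2′; the mapping OM_{O1,O2} is given, the mapping OM_{O1′,O2′} between the new versions is open (?)]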
Requirements
• High mapping quality
• Mapping consistency
• Include new concepts
• Reduction of manual effort, involve user feedback
• Support of semantic mappings
• Mappings can become invalid → need to be updated
• Reuse existing mappings (avoid full re-determination)
MAPPING ADAPTATION PROBLEM
Groß: Evolution von ontologiebasierten Mappings in den Lebenswissenschaften,
Dissertation, Universität Leipzig, 2014.
Groß, Dos Reis, Hartung, Pruski, Rahm: Semi-automatic adaptation of mappings between life science
ontologies. Proc. 9th Intl. Conference on Data Integration in the Life Sciences (DILS), 2013.
29. 29
GOMMA
• Generic Ontology Matching and Mapping Management
• Generic infrastructure to manage and analyze evolution of ontologies and mappings
• COnto-Diff: Diff Evolution Mapping diff(O_old, O_new)
  • Based on match mapping between two ontology versions O_old and O_new
  • Set of basic and complex change operations: addC, addR, …; delC, delR, toObsolete, …; split, merge, substitute, …
30. 30
• Combine ‘old‘ ontology mapping with ontology evolution mapping
(between old and new version): compose-Operator
• Reuse and adapt existing correspondences
COMPOSITION-BASED ADAPTATION
• Semantic correspondence types?
+ Matching added concepts (O1′ \ O1, O2′ \ O2)
[Figure: old mapping OM_{O1,O2} between O1 and O2, composed with the evolution mapping OM_{O2,O2′}; O2′ contains body, head and neck, trunk, limbs, lower limbs, upper limbs]
semType: = equivalent, < less general, > more general
31. 31
COMPOSITION-BASED ADAPTATION
[Figure: composed result mapping OM_{O1,O2′} between O1 and O2′; the semantic type of a composed correspondence can remain open (?)]
semType: = equivalent, < less general, > more general
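Why a composed correspondence can end up with an open type (?) becomes visible when the semantic types are composed explicitly. The sketch assumes correspondences as (source, target, semType) triples and purely logical rules: "=" is neutral, equal directions are transitive, and opposite directions give no determined type (None). The rule table and the toy correspondences are assumptions and not taken from the talk.

```python
def compose_sem_type(t1, t2):
    """Compose two semantic types; None means the type is undetermined."""
    if t1 == "=":
        return t2
    if t2 == "=":
        return t1
    return t1 if t1 == t2 else None


def compose_semantic(om_o1_o2, om_o2_o2new):
    """Compose triples (a, b, type) and (b, c, type) into (a, c, type)."""
    result = set()
    for a, b, t1 in om_o1_o2:
        for b2, c, t2 in om_o2_o2new:
            if b == b2:
                result.add((a, c, compose_sem_type(t1, t2)))
    return result


# Hypothetical correspondences, loosely based on the concept names in the figure.
om_o1_o2 = {
    ("head", "head", "="),
    ("lower extremities", "limb segments", "<"),
}
om_o2_o2new = {
    ("head", "head and neck", "<"),
    ("limb segments", "lower limbs", ">"),
}
for corr in sorted(compose_semantic(om_o1_o2, om_o2_o2new), key=str):
    print(corr)
```

The second composed triple combines "<" with ">" and therefore has no determined type, which is the kind of case that needs further handling or expert verification.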
33. 33
• Modular, flexible adaptation approach
• Individual migration for different change operations
using Change Handler 𝐶𝐻
• Reuse and adaptation of existing correspondences
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
34. 34
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
[Figure: old mapping OM_{O1,O2} between O1 and O2 and evolution mapping OM_{O2,O2′} between O2 and O2′ (body, head and neck, trunk, limbs, lower limbs, upper limbs), with correspondences of type =, < and >]
35. 35
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
[Figure: old mapping OM_{O1,O2} and evolution mapping OM_{O2,O2′}, annotated with the change operations of diff_{O2,O2′}]
diff_{O2,O2′}:
• merge({head, neck}, head and neck)
• split(limb segments, {lower limbs, upper limbs})
• addC(trunk)
• delC(tail)
36. 36
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
DiffAdapt(OM_{O2,O1}, diff_{O2,O2′}, O2, O2′, O1, CH)
1. Determination of affected correspondences OM_infl using diff_{O2,O2′}
2. Reuse of unaffected mapping part: OM_{O2′,O1} ← OM_{O2,O1} \ OM_infl
3. For each ch ∈ CH
  • Adaptation of OM_infl using a change handler strategy (diff_{O2,O2′}, O2, O2′, O1)
4. Union of OM_infl with unaffected mapping part: OM_{O2′,O1} ← OM_{O2′,O1} ∪ OM_infl
[Figure: mapping OM_{O1,O2} partitioned into the affected correspondences OM_infl and the unaffected part, next to the evolution mapping OM_{O2,O2′}]
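The four DiffAdapt steps can be sketched as follows. This is a minimal sketch under assumptions: correspondences are (O2 concept, O1 concept, semType) triples, the diff is a list of (operation, old concepts, new concepts) tuples, and the two change handlers are simplified illustrations (not the handlers of the talk) that set the type to None where it cannot be derived and would need verification.

```python
def affected(om, diff):
    """Step 1: correspondences whose O2 concept is touched by a change."""
    changed = {c for _, old, _ in diff for c in old}
    return {corr for corr in om if corr[0] in changed}


def handle_merge(om_infl, diff):
    out = set()
    for op, old, new in diff:
        if op != "merge":
            continue
        merged = new[0]  # the merged concept is more general than its parts
        for c2, c1, sem in om_infl:
            if c2 in old:
                out.add((merged, c1, ">" if sem in ("=", ">") else None))
    return out


def handle_split(om_infl, diff):
    out = set()
    for op, old, new in diff:
        if op != "split":
            continue
        for c2, c1, sem in om_infl:
            if c2 in old:
                for part in new:  # each part is less general than the original
                    out.add((part, c1, "<" if sem in ("=", "<") else None))
    return out


def diff_adapt(om_o2_o1, diff, change_handlers):
    om_infl = affected(om_o2_o1, diff)       # step 1
    om_new = om_o2_o1 - om_infl              # step 2: reuse unaffected part
    adapted = set()
    for handler in change_handlers:          # step 3: one handler per change type
        adapted |= handler(om_infl, diff)
    return om_new | adapted                  # step 4


# Change operations as listed for diff_{O2,O2'}; the mapping itself is hypothetical.
diff = [
    ("merge", ("head", "neck"), ("head and neck",)),
    ("split", ("limb segments",), ("lower limbs", "upper limbs")),
    ("addC", (), ("trunk",)),
    ("delC", ("tail",), ()),
]
om_o2_o1 = {
    ("head", "head", "="), ("neck", "neck", "="), ("body", "body", "="),
    ("tail", "tail", "="), ("limb segments", "lower extremities", ">"),
}
for corr in sorted(diff_adapt(om_o2_o1, diff, [handle_merge, handle_split]), key=str):
    print(corr)
```

Correspondences of deleted concepts are dropped because no handler re-creates them, and unaffected correspondences are carried over unchanged, so only the affected part needs adaptation or review.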
39. 39
MAPPING QUALITY SCT-NCIT
[Figure: Precision, Recall and F-Measure (axis 70 to 100) for Unaff, the composition-based approaches CA and CA+m, and the diff-based approaches DA and DA+m; Recall and F-Measure of Unaff shown as reference]
• Unaffected correspondences only (Unaff): good results
• CA: Precision ↓
• CA+m: Recall ↑, F-Measure ≈ 90%
• Diff-based approaches: increased quality, especially Precision ↑
• DA+m: best quality, F-Measure ≈ 94%
40. 40
Adaptation Strategy
1) Automatic detection of consistent mappings
w.r.t. new ontology version
2) Recommendations for new correspondences
→ Aim: complete mapping
3) Expert validation of correspondence (𝑡𝑜𝑉𝑒𝑟𝑖𝑓𝑦 status)
SEMI-AUTOMATIC MAPPING ADAPTATION
High mapping quality
Consistent mapping
New correspondences for new concepts
Reduction of manual effort
Consider mapping semantics
41. 41
• Ontology matching and entity linking
• Integration of larger sets of heterogeneous sources:
holistic matching and reuse of clustered entities
• Semantic enrichment with concepts of ontologies
• Interactive tools for link verification
• Mapping semantics
• Use of semantic relationships (is-a, part-of, …) in
mappings and Diff
• Evolution and adaptation of ontology-based annotations
OUTLOOK