FCA-MERGE: Bottom-Up Merging of Ontologies
Gerd Stumme and Alexander Maedche
Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany
{stumme, maedche}@aifb.uni-karlsruhe.de
http://www.aifb.uni-karlsruhe.de/WBS
Abstract

Ontologies have been established for knowledge sharing and are widely used as a means for conceptually structuring domains of interest. With the growing usage of ontologies, the problem of overlapping knowledge in a common domain becomes critical. We propose the new method FCA-MERGE for merging ontologies following a bottom-up approach which offers a structural description of the merging process. The method is guided by application-specific instances of the two given source ontologies that are to be merged. We apply techniques from natural language processing and formal concept analysis to derive a lattice of concepts as a structural result of FCA-MERGE. The generated result is then explored and transformed into the merged ontology with human interaction.

1 Introduction

Ontologies have been established for knowledge sharing and are widely used as a means for conceptually structuring domains of interest. With the growing usage of ontologies, the problem of overlapping knowledge in a common domain occurs more often and becomes critical. Domain-specific ontologies are modeled by multiple authors in multiple settings. These ontologies lay the foundation for building new domain-specific ontologies in similar domains by assembling and extending multiple ontologies from repositories.

The process of ontology merging takes as input two source ontologies and returns a merged ontology based on the given source ontologies. Manual ontology merging using conventional editing tools without intelligent support is difficult, labor-intensive, and error-prone. Therefore, several systems and frameworks for supporting the knowledge engineer in the ontology merging task have recently been proposed [Hovy, 1998; Chalupsky, 2000; Noy and Musen, 2000; McGuinness et al, 2000]. The approaches rely on syntactic and semantic matching heuristics which are derived from the behavior of ontology engineers when confronted with the task of merging ontologies, i.e., human behavior is simulated. Although some of them locally use different kinds of logics for comparisons, these approaches do not offer a structural description of the global merging process.

We propose the new method FCA-MERGE for merging ontologies following a bottom-up approach which offers a global structural description of the merging process. For both source ontologies, it extracts instances from a given set of domain-specific text documents by applying natural language processing techniques. Based on the extracted instances we apply mathematically founded techniques taken from Formal Concept Analysis [Wille, 1982; Ganter and Wille, 1999] to derive a lattice of concepts as a structural result of FCA-MERGE. The produced result is explored and transformed to the merged ontology by the ontology engineer. The extraction of instances from text documents circumvents the problem that in most applications there are no objects which are simultaneously instances of both ontologies, and which could be used as a basis for identifying similar concepts.

The remainder of the paper is organized as follows. In Section 2, we briefly introduce some basic definitions, concentrating on a formal definition of what an ontology is, and recall the basics of Formal Concept Analysis. Before we present our generic method for ontology merging in Section 4, we give an overview of existing and related work in Section 3. Section 5 provides a detailed description of FCA-MERGE. Section 6 summarizes the paper and concludes with an outlook on future work.

2 Ontologies and Formal Concept Analysis

In this section, we briefly introduce some basic definitions. We thereby concentrate on a formal definition of what an ontology is and recall the basics of Formal Concept Analysis.

2.1 Ontologies

There is no common formal definition of what an ontology is. However, most approaches share a few core items: concepts, a hierarchical IS-A-relation, and further relations. For the sake of generality, we do not discuss more specific features like constraints, functions, or axioms here. We formalize the core in the following way.

Definition: A (core) ontology is a tuple $O := (C, \mathit{is\_a}, R, \sigma)$, where $C$ is a set whose elements are called concepts, $\mathit{is\_a}$ is a partial order on $C$ (i.e., a binary relation $\mathit{is\_a} \subseteq C \times C$ which is reflexive, transitive, and antisymmetric), $R$ is a set whose elements are called relation names (or relations for short), and $\sigma\colon R \to C^{+}$ is a function which assigns to each relation name its arity.

As said above, the definition considers only the core elements of most languages for ontology representation. It is possible to map the definition to most types of ontology representation languages. Our implementation, for instance, is based on Frame Logic [Kifer et al, 1995]. Frame Logic has a well-founded semantics, but we do not refer to it in this paper.

2.2 Formal Concept Analysis

We recall the basics of Formal Concept Analysis (FCA) as far as they are needed for this paper. A more extensive overview is given in [Ganter and Wille, 1999]. To allow a mathematical description of concepts as being composed of extensions and intensions, FCA starts with a formal context defined as a triple $K := (G, M, I)$, where $G$ is a set of objects, $M$ is a set of attributes, and $I$ is a binary relation between $G$ and $M$ (i.e., $I \subseteq G \times M$). $(g, m) \in I$ is read "object $g$ has attribute $m$".

Definition: For $A \subseteq G$, we define $A' := \{m \in M \mid \forall g \in A\colon (g, m) \in I\}$ and, for $B \subseteq M$, we define $B' := \{g \in G \mid \forall m \in B\colon (g, m) \in I\}$.

A formal concept of a formal context $(G, M, I)$ is defined as a pair $(A, B)$ with $A \subseteq G$, $B \subseteq M$, $A' = B$ and $B' = A$. The sets $A$ and $B$ are called the extent and the intent of the formal concept $(A, B)$. The subconcept-superconcept relation is formalized by

$(A_1, B_1) \le (A_2, B_2) :\Leftrightarrow A_1 \subseteq A_2 \quad (\Leftrightarrow B_1 \supseteq B_2)$.

The set of all formal concepts of a context $K$ together with the partial order $\le$ is always a complete lattice,¹ called the concept lattice of $K$ and denoted by $\underline{\mathfrak{B}}(K)$.

¹ I.e., for each set of formal concepts, there is always a greatest common subconcept and a least common superconcept.

A possible confusion might arise from the double use of the word 'concept' in FCA and in ontologies. This comes from the fact that FCA and ontologies are two models for the concept of 'concept' which arose independently. In order to distinguish both notions, we will always refer to the FCA concepts as 'formal concepts'. The concepts in ontologies are referred to just as 'concepts' or as 'ontology concepts'. There is no direct counterpart of formal concepts in ontologies. Ontology concepts are best compared to FCA attributes, as both can be considered as unary predicates on the set of objects.

3 Related Work

A first approach for supporting the merging of ontologies is described in [Hovy, 1998]. There, several heuristics are described for identifying corresponding concepts in different ontologies, e.g., comparing the names of two concepts, comparing the natural language definitions of two concepts by linguistic techniques, and checking the closeness of two concepts in the concept hierarchy.

The OntoMorph system [Chalupsky, 2000] offers two kinds of mechanisms for translating and merging ontologies: syntactic rewriting supports the translation between two different knowledge representation languages, while semantic rewriting offers means for inference-based transformations. It explicitly allows violating the preservation of semantics in exchange for a more expressive, flexible transformation mechanism.

In [McGuinness et al, 2000] the Chimaera system is described. It provides support for merging of ontological terms from different sources, for checking the coverage and correctness of ontologies, and for maintaining ontologies over time. Chimaera supports the merging of ontologies by coalescing two semantically identical terms from different ontologies and by identifying terms that should be related by subsumption or disjointness relationships. Chimaera offers a broad collection of functions, but the underlying assumptions about structural properties of the ontologies at hand are not made explicit.

Prompt [Noy and Musen, 2000] is an algorithm for ontology merging and alignment embedded in Protégé 2000. It starts with the identification of matching class names. Based on this initial step, an iterative approach is carried out for performing automatic updates, finding resulting conflicts, and making suggestions to remove these conflicts.

The tools described above offer extensive merging functionalities, most of them based on syntactic and semantic matching heuristics, which are derived from the behavior of ontology engineers when confronted with the task of merging ontologies. OntoMorph and Chimaera use an approach based on description logics that influences the merging process locally, e.g., by checking subsumption relationships between terms. None of these approaches offers a structural description of the global merging process. FCA-MERGE can be regarded as complementary to existing work, offering a structural description of the overall merging process with an underlying mathematical framework.

The work closest to our approach is described in [Schmitt and Saake, 1998]. They apply Formal Concept Analysis to a related problem, namely database schema integration. As in our approach, a knowledge engineer has to interpret the results in order to make modeling decisions. Our technique differs in two points: there is no need for knowledge acquisition from a domain expert in the preprocessing phase, and it additionally suggests new concepts and relations for the target ontology.

4 Bottom-Up Ontology Merging

As said above, we propose a bottom-up approach for ontology merging. Our mechanism is based on application-specific instances of the two given ontologies $O_1$ and $O_2$ that are to be merged. The overall process of merging two ontologies is depicted in Figure 1 and consists of three steps, namely (i) instance extraction and computing of two formal contexts $K_1$ and $K_2$, (ii) the FCA-MERGE core algorithm that derives a common context and computes a concept lattice, and (iii) the generation of the final merged ontology based on the concept lattice.

Our method takes as input data the two ontologies and a set $D$ of natural language documents. The documents have to
be relevant to both ontologies, so that the documents are described by the concepts contained in the ontology. The documents may be taken from the target application which requires the final merged ontology. From the documents in $D$, we extract instances. The mechanism for instance extraction is further described in Subsection 5.1. It returns, for each ontology, a formal context indicating which ontology concepts appear in which documents.

[Figure 1: Ontology Merging Method. Stages shown: Linguistic Processing 1, Linguistic Processing 2, FCA-Merge, Lattice Exploration, new ontology.]

The extraction of the instances from documents is necessary because there are usually no instances which are already classified by both ontologies. However, if this situation is given, one can skip the first step and use the classification of the instances directly as input for the two formal contexts.

The second step of our ontology merging approach comprises the FCA-MERGE core algorithm. The core algorithm merges the two contexts and computes a concept lattice from the merged context using FCA techniques. More precisely, it computes a pruned concept lattice which has the same degree of detail as the two source ontologies. The techniques applied for generating the pruned concept lattice are described in Subsection 5.2 in more detail.

Instance extraction and the FCA-MERGE core algorithm are fully automatic. The final step of deriving the merged ontology from the concept lattice requires human interaction. Based on the pruned concept lattice and the sets of relation names $R_1$ and $R_2$, the ontology engineer creates the concepts and relations of the target ontology. We offer graphical means of the ontology engineering environment OntoEdit for supporting this process.

For obtaining good results, a few assumptions have to be met by the input data: Firstly, the documents have to be relevant to each of the source ontologies. A document from which no instance is extracted for each source ontology can be neglected for our task. Secondly, the documents have to cover all concepts from the source ontologies. Concepts which are not covered have to be treated manually after our merging procedure (or the set of documents has to be expanded). And last but not least, the documents must separate the concepts well enough. If two concepts which are considered as different always appear in the same documents, FCA-MERGE will map them to the same concept in the target ontology (unless this decision is overruled by the knowledge engineer). When this situation appears too often, the knowledge engineer might want to add more documents which further separate the concepts.

5 The FCA-MERGE Method

In this section, we discuss the three steps of FCA-MERGE in more detail. We illustrate FCA-MERGE with a small example taken from the tourism domain, where we have built several specific ontology-based information systems. Our general experiments are based on tourism ontologies that have been modeled in an ontology engineering seminar. Different ontologies have been modeled for a given text corpus on the web, which is provided by a WWW provider for tourist information.² The corpus describes actual objects, like locations, accommodations, furnishings of accommodations, administrative information, and cultural events. For the scenario described here, we have selected two ontologies: The first ontology contains 67 concepts and 31 relations, and the second ontology contains 51 concepts and 22 relations. The underlying text corpus consists of 233 natural language documents taken from the WWW provider described above. For demonstration purposes, we restrict ourselves first to two very small subsets $O_1$ and $O_2$ of the two ontologies described above, and to 14 out of the 233 documents. These examples are translated into English. In Subsection 5.3, we provide some examples from the merging of the larger ontologies.

² URL: http://www.all-in-all.com

5.1 Linguistic Analysis and Context Generation

The aim of this first step is to generate, for each ontology $O_i$, $i \in \{1, 2\}$, a formal context $K_i := (G_i, M_i, I_i)$. The set of documents $D$ is taken as object set ($G_i := D$), and the set of concepts is taken as attribute set ($M_i := C_i$). While these sets come for free, the difficult step is generating the binary relation $I_i$. The relation $(g, m) \in I_i$ shall hold whenever document $g$ contains an instance of $m$.

The computation uses linguistic techniques as described in the sequel. We conceive an information extraction-based approach for ontology-based extraction, which has been implemented on top of SMES (Saarbrücken Message Extraction System), a shallow text processor for German (cf. [Neumann et al, 1997]). The architecture of SMES comprises a tokenizer based on regular expressions, a lexical analysis component including a word and a domain lexicon, and a chunk parser. The tokenizer scans the text in order to identify boundaries of words and complex expressions like "$20.00" or "Mecklenburg-Vorpommern",³ and to expand abbreviations.

³ a region in the north east of Germany

The lexicon contains more than 120,000 stem entries and more than 12,000 subcategorization frames describing information used for lexical analysis and chunk parsing. Furthermore, the domain-specific part of the lexicon contains lexical entries that express natural language representations of concepts and relations. Lexical entries may refer to several concepts or relations, and one concept or relation may be referred to by several lexical entries.

Lexical analysis uses the lexicon to perform (1) morphological analysis, i.e., the identification of the canonical common stem of a set of related word forms and the analysis of compounds, (2) recognition of named entities, (3) part-of-speech tagging, and (4) retrieval of domain-specific information. While steps (1), (2), and (3) can be viewed as standard for information extraction approaches, step (4) is of specific interest for our instance extraction mechanism. This step associates single words or complex expressions with a concept from the ontology if a corresponding entry in the domain-specific part of the lexicon exists. For instance, the expression "Hotel Schwarzer Adler" is associated with the concept Hotel. If the concept Hotel is in ontology $O_1$ and document $g$ contains the expression "Hotel Schwarzer Adler", then the relation $(g, \text{Hotel}) \in I_1$ holds.

Finally, the transitivity of the $\mathit{is\_a}$-relation is compiled into the formal context, i.e., $(g, m) \in I$ and $m \;\mathit{is\_a}\; n$ implies $(g, n) \in I$. This means that if $(g, \text{Hotel}) \in I_1$ holds and Hotel $\mathit{is\_a}$ Accommodation, then the document also describes an instance of the concept Accommodation: $(g, \text{Accommodation}) \in I_1$.

[Figure 2: The two contexts $K_1$ and $K_2$ as result of the first step. Rows: doc1 to doc14. Columns of $I_1$: Vacation, Hotel, Event, Concert, Root. Columns of $I_2$: Hotel, Accommodation, Musical, Root.]

Figure 2 depicts the contexts $K_1$ and $K_2$ that have been generated from the documents for the small example ontologies. E.g., document doc5 contains instances of the concepts Event, Concert, and Root of ontology $O_1$, and Musical and Root of ontology $O_2$. All other documents contain some information on hotels, as they contain instances of the concept Hotel both in ontology $O_1$ and in ontology $O_2$.

5.2 Generating the Pruned Concept Lattice

The second step takes as input the two formal contexts $K_1$ and $K_2$ which were generated in the last step, and returns a pruned concept lattice (see below), which will be used as input in the next step.

First we merge the two formal contexts into a new formal context $K$, from which we will derive the pruned concept lattice. Before merging the two formal contexts, we have to disambiguate the attribute sets, since $C_1$ and $C_2$ may contain the same concepts: Let $M_i' := \{(m, i) \mid m \in M_i\}$, for $i \in \{1, 2\}$. The disambiguation of the concepts allows the possibility that the same concept exists in both ontologies, but is treated differently. For instance, a Campground may be considered as an Accommodation in the first ontology, but not in the second one. Then the merged formal context is obtained by $K := (G, M, I)$ with $G := D$, $M := M_1' \cup M_2'$, and $(g, (m, i)) \in I :\Leftrightarrow (g, m) \in I_i$.

We will not compute the whole concept lattice of $K$, as it would provide too many overly specific concepts. We restrict the computation to those formal concepts which are above at least one formal concept generated by an (ontology) concept of the source ontologies. This assures that we remain within the range of specificity of the source ontologies. More precisely, the pruned concept lattice is given by

$\underline{\mathfrak{B}}_p(K) := \{(A, B) \in \underline{\mathfrak{B}}(K) \mid \exists m \in M\colon (\{m\}', \{m\}'') \le (A, B)\}$.

For our example, the pruned concept lattice is shown in Figure 3. It consists of six formal concepts. Two formal concepts of the total concept lattice are pruned since they are too specific compared to the two source ontologies.

[Figure 3: The pruned concept lattice. Node labels: Root_1, Root_2; Vacation_1; Hotel_1, Hotel_2, Accommodation_2; Event_1; Concert_1, Musical_2.]

The computation of the pruned concept lattice is done with the algorithm TITANIC [Stumme et al, 2000]. It is slightly modified to allow the pruning. Compared to other algorithms for computing concept lattices, TITANIC has, for our purpose, the advantage that it computes the formal concepts via their key sets (or minimal generators). A key set is a minimal set of attributes which generates a given formal concept. I.e., $K \subseteq M$ is a key set for the formal concept $(A, B)$ if and only if $(K', K'') = (A, B)$ and $(X', X'') \ne (A, B)$ for all $X \subseteq K$ with $X \ne K$.

In our application, key sets serve two purposes. Firstly, they indicate if the generated formal concept gives rise to a new concept in the target ontology or not. A concept is new if and only if it has no key sets of cardinality one. Secondly, the key sets of cardinality two or more can be used as generic names for new concepts and they indicate the arity of new relations.

5.3 Generating the new Ontology from the Concept Lattice

While the previous steps (instance extraction, context derivation, context merging, and TITANIC) are fully automatic, the derivation of the merged ontology from the concept lattice requires human interaction, since it heavily relies on background knowledge of the domain expert.

The result from the last step is a pruned concept lattice. From it we have to derive the target ontology. Each of the formal concepts of the pruned concept lattice is a candidate for a concept, a relation, or a new subsumption in the target ontology. In the sequel we describe the strategy which underlies this derivation process. Of course, most of the technical details are hidden from the user.
There are a number of queries which may be used to focus on the most relevant parts of the pruned concept lattice. We discuss these queries after the description of the general strategy, which follows now.

As the documents are not needed for the generation of the target ontology, we restrict our attention to the intents of the formal concepts, which are sets of (ontology) concepts of the source ontologies. For each formal concept of the pruned concept lattice, we analyze the related key sets. For each formal concept, the following cases can be distinguished:

1. It has exactly one key set of cardinality 1.
2. It has two or more key sets of cardinality 1.
3. It has no key sets of cardinality 0 or 1.
4. It has the empty set as key set.⁴

⁴ This implies (by the definition of key sets) that the formal concept does not have another key set.

The generation of the target ontology starts with all concepts being in one of the first two situations. The first case is the easiest: The formal concept is generated by exactly one ontology concept from one of the source ontologies. It can be included in the target ontology without interaction of the knowledge engineer. In our example, these are the two formal concepts labeled by Vacation_1 and by Event_1.

In the second case, two or more concepts of the source ontologies generate the same formal concept. This indicates that the concepts should be merged into one concept in the target ontology. The user is asked which of the names to retain. In the example, this is the case for two formal concepts: The key sets {Concert_1} and {Musical_2} generate the same formal concept, and are thus suggested to be merged; and the key sets {Hotel_1}, {Hotel_2}, and {Accommodation_2} also generate the same formal concept.⁵ The latter case is interesting, since it includes two concepts of the same ontology. This means that the set of documents does not provide enough details to separate these two concepts. Either the knowledge engineer decides to merge the concepts (for instance because he observes that the distinction is of no importance in the target application), or he adds them as separate concepts to the target ontology. If there are too many suggestions to merge concepts which should be distinguished, this is an indication that the set of documents was not large enough. In such a case, the user might want to re-launch FCA-MERGE with a larger set of documents.

⁵ {Root_1} and {Root_2} are no key sets, as each of them has a subset (namely the empty set) generating the same formal concept.

When all formal concepts in the first two cases are dealt with, then all concepts from the source ontologies are included in the target ontology. Now, all relations from the two source ontologies are copied into the target ontology. Possible conflicts and duplicates have to be resolved by the ontology engineer.

In the next step, we deal with all formal concepts covered by the third case. They are all generated by at least two concepts from the source ontologies, and are candidates for new ontology concepts or relations in the target ontology. The decision whether to add a concept or a relation to the target ontology (or to discard the suggestion) is a modeling decision, and is left to the user. The key sets provide suggestions either for the name of the new concept, or for the concepts which should be linked with the new relation. Only those key sets with minimal cardinality are considered, as they provide the shortest names for new concepts and minimal arities for new relations, respectively.

For instance, the formal concept in the middle of Figure 3 has {Hotel_2, Event_1}, {Hotel_1, Event_1}, and {Accommodation_2, Event_1} as key sets. The user can now decide whether to create a new concept with the default name HotelEvent (which is unlikely in this situation), or to create a new relation with arity (Hotel, Event), e.g., the relation organizesEvent.

Key sets of cardinality 2 serve yet another purpose: $\{m_1, m_2\}$ being a key set implies that neither $m_1 \;\mathit{is\_a}\; m_2$ nor $m_2 \;\mathit{is\_a}\; m_1$ currently holds. Thus when the user does not use a key set of cardinality 2 for generating a new concept or relation, she should check if it is reasonable to add one of the two subsumptions to the target ontology. This case does not show up in our small example. An example from the large ontologies is given at the end of the section.

There is exactly one formal concept in the fourth case (as the empty set is always a key set). This formal concept gives rise to a new largest concept in the target ontology, the Root concept. It is up to the knowledge engineer to accept or to reject this concept. Many ontology tools require the existence of such a largest concept. In our example, this is the formal concept labeled by Root_1 and Root_2.

Finally, the $\mathit{is\_a}$ order on the concepts of the target ontology can be derived automatically from the pruned concept lattice: If the concepts $c_1$ and $c_2$ are derived from the formal concepts $(A_1, B_1)$ and $(A_2, B_2)$, respectively, then $c_1 \;\mathit{is\_a}\; c_2$ if and only if $B_1 \supseteq B_2$ (or if the user explicitly modeled it based on a key set of cardinality 2).

Querying the pruned concept lattice. In order to support the knowledge engineer in the different steps, there are a number of queries for focusing his attention on the significant parts of the pruned concept lattice.

Two queries support the handling of the second case (in which different ontology concepts generate the same formal concept). The first is a list of all pairs $(m_1, m_2) \in C_1 \times C_2$ with $\{m_1\}' = \{m_2\}'$. It indicates which concepts from the different source ontologies should be merged.

In our small example, this list contains for instance the pair (Concert_1, Musical_2). In the larger application (which is based on the German language), pairs like (Zoo_1, Tierpark_2) and (Zoo_1, Tiergarten_2) are listed. We decided to merge Zoo [engl.: zoo] and Tierpark [zoo], but not Zoo and Tiergarten [zoological garden].

The second query returns, for ontology $O_i$ with $i \in \{1, 2\}$, the list of pairs $(m, n) \in C_i \times C_i$ with $\{m\}' = \{n\}'$. It helps checking which concepts out of a single ontology might be subject to merge. The user might either conclude that some of these concept pairs can be merged because their differentiation is not necessary in the target application; or he might decide that the set of documents must be extended because it does not differentiate the concepts enough.

In the small example, the list for $O_1$ contains only the pair (Hotel_1, Accommodation_1). In the larger application,
we had additionally pairs like (Räumliches, Gebiet) and (Auto, Fortbewegungsmittel). For the target application, we merged Räumliches [spatial thing] and Gebiet [region], but not Auto [car] and Fortbewegungsmittel [means of travel].

The number of suggestions provided for the third situation can be quite high. There are three queries which present only the most significant formal concepts. These queries can also be combined.

Firstly, one can fix an upper bound for the cardinality of the key sets. The lower the bound is, the fewer new concepts are presented. A typical value is 2, which makes it possible to retain all concepts from the two source ontologies (as they are generated by key sets of cardinality 1), and to discover new binary relations between concepts from the different source ontologies, but no relations of higher arity. If one is interested in having exactly the old concepts and relations in the target ontology, and no suggestions for new concepts and relations, then the upper bound for the key set size is set to 1.

Secondly, one can fix a minimum support. This prunes all formal concepts where the cardinality of the extent is too low (compared to the overall number of documents). The default is no pruning, i.e., a minimum support of 0 %. It is also possible to fix different minimum supports for different cardinalities of the key sets. The typical case is to set the minimum support to 0 % for key sets of cardinality 1, and to a higher percentage for key sets of higher cardinality. This way we retain all concepts from the source ontologies, and generate new concepts and relations only if they have a certain (statistical) significance.

Thirdly, one can consider only those key sets of cardinality 2 in which the two concepts come from one ontology each. This way, only those formal concepts are presented which give rise to concepts or relations linking the two source ontologies. This restriction is useful whenever the quality of each source ontology per se is known to be high, i.e., when there is no need to extend each of the source ontologies alone.

In the small example, there are no key sets with cardinality 3 or higher. The three key sets with cardinality 2 (as given above) all have a support of $11/14 \approx 78.6\,\%$. In the larger application, we fixed 2 as upper bound for the cardinality of the key sets. We obtained key sets like (Telefon_1 [telephone], Öffentliche_Einrichtung_2 [public institution]) (support = 24.5 %), (Unterkunft_1 [accommodation], Fortbewegungsmittel_2 [means of travel]) (1.7 %), (Schloß_1 [castle], Bauwerk_2 [building]) (2.1 %), and (Zimmer_1 [room], Bibliothek_2 [library]) (2.1 %). The first gave rise to a new concept Telefonzelle [public phone], the second to a new binary relation hatVerkehrsanbindung [hasPublicTransportConnection], the third to a new subsumption Schloß $\mathit{is\_a}$ Bauwerk, and the fourth was discarded as meaningless.

6 Conclusion and Future Work

FCA-MERGE is a bottom-up technique for merging ontologies based on a set of documents. In this paper, we described the three steps of the technique: the linguistic analysis of the texts which returns two formal contexts; the merging of the two contexts and the computation of the pruned concept lattice; and the semi-automatic ontology creation phase which supports the user in modeling the target ontology. The paper described the underlying assumptions and discussed the methodology.

Future work includes the closer integration of the FCA-MERGE method in the ontology engineering environment OntoEdit. In particular, we will offer views on the pruned concept lattice based on the queries described in Subsection 5.3.

The evaluation of ontology merging is an open issue [Noy and Musen, 2000]. We plan to use FCA-MERGE to generate independently a set of merged ontologies (based on two given source ontologies). Comparing these merged ontologies using the standard information retrieval measures as proposed in [Noy and Musen, 2000] will allow us to evaluate the performance of FCA-MERGE.

On the theoretical side, an interesting open question is the extension of the formalism to features of specific ontology languages, like for instance functions or axioms. The question is (i) how they can be exploited for the merging process, and (ii) how new functions and axioms describing the interplay between the source ontologies can be generated for the target ontology.

References

[Chalupsky, 2000] H. Chalupsky: OntoMorph: A translation system for symbolic knowledge. Proc. KR '00.
[Ganter and Wille, 1999] B. Ganter, R. Wille: Formal Concept Analysis: mathematical foundations. Springer.
[Hovy, 1998] E. Hovy: Combining and standardizing large-scale, practical ontologies for machine translation and other uses. Proc. 1st Intl. Conf. on Language Resources and Evaluation, Granada.
[Kifer et al, 1995] M. Kifer, G. Lausen, J. Wu: Logical foundations of object-oriented and frame-based languages. Journal of the ACM 42.
[McGuinness et al, 2000] D. L. McGuinness, R. Fikes, J. Rice, S. Wilder: An environment for merging and testing large ontologies. Proc. KR '00.
[Neumann et al, 1997] G. Neumann, R. Backofen, J. Baur, M. Becker, C. Braun: An information extraction core system for real world German text processing. Proc. ANLP-97, Washington.
[Noy and Musen, 2000] N. Fridman Noy, M. A. Musen: PROMPT: algorithm and tool for automated ontology merging and alignment. Proc. AAAI '00.
[Schmitt and Saake, 1998] I. Schmitt, G. Saake: Merging inheritance hierarchies for database integration. Proc. CoopIS'98. IEEE Computer Science Press.
[Stumme et al, 2000] G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, L. Lakhal: Fast computation of concept lattices using data mining techniques. Proc. KRDB '00, CEUR-Workshop Proc. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/
[Wille, 1982] R. Wille: Restructuring lattice theory: an approach based on hierarchies of concepts. In: I. Rival (ed.): Ordered sets. Reidel, Dordrecht, 445-470.