Using ontology for natural language processing
Crăcăoanu Constantin Sergiu
January 21, 2012
Abstract
Natural language processing is a set of methods and techniques used to
mediate human-machine communication. To make this possible we have to
define a communication format and software able to analyse, understand
and give an appropriate response. At the communication level a formal
representation of knowledge is needed, and an ontology can provide it.
Keywords: natural language processing, ontology, artificial intelligence
1 Introduction
An ontology represents knowledge in a formal model and is based on a
conceptualization; the conceptualization of a knowledge area must be understood
as the objects, concepts and other entities that are assumed to exist, together
with the relations that hold among them.
Depending on its purpose, context, coverage and the way it is used, an
ontology can be general, middle or specific.
Natural language processing is considered a sub-field of artificial intelligence
whose main goal is to make systems smart enough to draw inferences and respond
with a correct and complete answer when requested by a user. Using ontologies in
natural language processing is a relatively new part of artificial intelligence.
An artificial neural network is a computational model inspired by biological
neural networks that is able to learn; it is used to solve problems that need an
answer based on the previous experience of the system.
2 Technical
The NLP view described in this article uses a combination of general and specific
ontologies. There are two basic methods to create an ontology: from scratch
or from already existing ontologies. There are at least three ways of combining
ontologies: inclusion, restriction and refinement.
Our approach has three parts:
• a general ontology based on lexemes is needed. The Suggested Upper Merged
Ontology (SUMO) is currently the best candidate because, together with its
domain ontologies, it forms the largest formal public ontology in existence
today and it is the only formal ontology that has been mapped to all of the
WordNet lexicon. WordNet is a large lexical database of English. Nouns, verbs,
adjectives and adverbs are grouped into sets of cognitive synonyms (synsets),
each expressing a distinct concept.
• a middle or specific level ontology must be used.
• a program that mediates between human language and machine language by
using the two types of ontologies.
3 Architecture
The next diagram shows the relation between human language, ontologies and
natural language processing.
Human knowledge is mapped to a middle or specific ontology. This ontology
uses the general ontology when needed. The NLP program chooses the correct
ontology for the current domain and applies the corresponding algorithm. Then
the knowledge is translated into machine language. It is widely agreed today
that NLP has not yet reached its goal of making machines understand human
language by drawing inferences, but for now we receive an answer to our current
request and sometimes computers seem smart.
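The selection step described above can be illustrated with a minimal sketch. It assumes that an ontology can be reduced to a mapping from terms to concept names and that a specific ontology keeps a link to the general ontology it relies on. The class `Ontology`, the function `choose_ontology`, the registry and all sample terms are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    """A toy ontology: a name, a level and a term -> concept mapping."""
    name: str
    level: str  # "general", "middle" or "specific"
    concepts: dict = field(default_factory=dict)
    parent: Optional["Ontology"] = None  # the more general ontology it relies on

    def lookup(self, term: str) -> Optional[str]:
        """Resolve a term locally, falling back to the more general ontology."""
        if term in self.concepts:
            return self.concepts[term]
        if self.parent is not None:
            return self.parent.lookup(term)
        return None


def choose_ontology(domain: str, registry: dict, default: Ontology) -> Ontology:
    """Pick the ontology registered for a domain, or the general one if none exists."""
    return registry.get(domain, default)


# Hypothetical content, for illustration only.
general = Ontology("general", "general", {"money": "CurrencyMeasure", "car": "Vehicle"})
banking = Ontology("banking", "specific", {"account": "BankAccount"}, parent=general)
registry = {"banking": banking}

active = choose_ontology("banking", registry, general)
print(active.lookup("account"))  # resolved in the specific ontology
print(active.lookup("money"))    # resolved through the general ontology
print(active.lookup("engine"))   # unknown term, no concept found
```

The fallback in `lookup` is one possible reading of "uses the general ontology when needed"; other strategies, such as merging the ontologies in advance, would fit the description equally well.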
4 How to use ontologies
This section describes two concrete cases of using ontologies with natural
language processing.
The first example is using ontologies to translate automatically from one
language to another (for example from German to English). For this, five
ontologies can be used:
• ontology mapped for language A lexicon
• ontology with grammar rules for lexicon A
• ontology mapped for language B lexicon
• ontology with grammar rules for lexicon B
• ontology containing a dictionary that maps language A to B
These ontologies are merged to form a new ontology. An algorithm to combine
and use the resulting ontology is also needed. The program that implements this
algorithm would have as an input a text in language A and as an output the
corresponding text in language B.
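A toy version of such a program might look as follows. This is only a sketch under strong assumptions: each ontology is reduced to a plain dictionary, each grammar is a single phrase rule given as a sequence of distinct part-of-speech tags, and translation is word by word. The functions `merge` and `translate`, the tag names and the sample vocabulary are hypothetical and not part of the approach described above.

```python
def merge(lexicon_a, grammar_a, lexicon_b, grammar_b, dictionary):
    """Combine the five ontologies into one structure (here a plain dict)."""
    return {
        "source": {"lexicon": lexicon_a, "rule": grammar_a},
        "target": {"lexicon": lexicon_b, "rule": grammar_b},
        "dictionary": dictionary,
    }


def translate(text, merged):
    """Translate one phrase from language A to language B."""
    source, target = merged["source"], merged["target"]
    words = text.lower().split()
    # Tag every word with its part of speech from the lexicon of language A.
    tags = tuple(source["lexicon"][word] for word in words)
    if tags != source["rule"]:
        raise ValueError(f"structure {tags} is not covered by the grammar of A")
    # Translate word by word, remembering the role each word plays.
    by_tag = {tag: merged["dictionary"][word] for tag, word in zip(tags, words)}
    # Emit the words in the order required by the grammar of language B.
    result = [by_tag[tag] for tag in target["rule"]]
    # Every produced word must exist in the lexicon of language B with that role.
    for tag, word in zip(target["rule"], result):
        if target["lexicon"].get(word) != tag:
            raise ValueError(f"{word!r} is not a {tag} in the lexicon of B")
    return " ".join(result)


# Hypothetical German -> English data, for illustration only.
merged = merge(
    lexicon_a={"das": "DET", "rote": "ADJ", "auto": "NOUN"},
    grammar_a=("DET", "ADJ", "NOUN"),
    lexicon_b={"the": "DET", "red": "ADJ", "car": "NOUN"},
    grammar_b=("DET", "ADJ", "NOUN"),
    dictionary={"das": "the", "rote": "red", "auto": "car"},
)
print(translate("Das rote Auto", merged))
```

A realistic system would need many grammar rules, morphology and word sense disambiguation; the sketch only shows how the five ontologies take part in a single translation step.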
The resulting ontology can be extended by using an artificial neural network:
after each training translation the system can learn new rules that must be
taken into account.
A second example of using an ontology is an automatic speech recognition and
generation system. Such a system can be used, for example, in the automotive
industry or in banking services. One ontology is needed to generate two
grammars, one for speech recognition and one for speech generation. The system
puts questions to the user, gives or receives answers and executes commands.
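The idea of deriving two grammars from one ontology can be sketched as follows, assuming the ontology is a small dictionary of commands and that a grammar can be reduced to a table of fixed phrases. The command names, the phrase templates and the functions `recognition_grammar` and `generation_grammar` are hypothetical choices made only for this illustration.

```python
# A toy command ontology for an in-car assistant (hypothetical content).
ontology = {
    "commands": {
        "open_window": {"verbs": ["open", "lower"], "object": "window"},
        "lock_doors": {"verbs": ["lock"], "object": "doors"},
    }
}


def recognition_grammar(ontology):
    """Map every accepted utterance to the command it triggers."""
    return {
        f"{verb} the {spec['object']}": name
        for name, spec in ontology["commands"].items()
        for verb in spec["verbs"]
    }


def generation_grammar(ontology):
    """Map every command to a confirmation question the system can ask."""
    return {
        name: f"Do you want me to {spec['verbs'][0]} the {spec['object']}?"
        for name, spec in ontology["commands"].items()
    }


recognize = recognition_grammar(ontology)
generate = generation_grammar(ontology)

command = recognize["lower the window"]  # the user utterance is recognized
print(generate[command])                 # the system asks for confirmation
```

Because both tables are generated from the same source, adding a command to the ontology extends recognition and generation at the same time.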
5 How to extend ontologies
To completely define our solution for mediating human-machine communication
we also need to define a way of extending ontologies. According to Huang et
al. [4], three elements must be determined to define the semantics of an
ontology concept: the concept's name, its properties and its relationships.
The proposed solution for extending ontologies is based on an artificial neural
network. Every ontology is represented by a directed graph G. Every graph lies
in a plane of its own: nodes are connected horizontally within the same type of
ontology (general, middle or specific), but the graphs are also connected
vertically, since a specific ontology is based on a middle ontology and a middle
ontology uses the general ontology.
Figure 1: there are three planes, one for the general ontology, one for the
middle ontologies and one for the specific ontologies
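The layered structure can be sketched as a single directed graph whose nodes carry the level of the ontology they belong to. This is an assumed encoding, not the one used in the figure: an edge is called horizontal when both ends lie on the same level and vertical otherwise. The class `LayeredOntologyGraph` and the sample concepts are hypothetical.

```python
from collections import defaultdict

LEVELS = ("general", "middle", "specific")


class LayeredOntologyGraph:
    """Directed graph whose nodes are concepts placed on one of three levels."""

    def __init__(self):
        self.level_of = {}               # concept name -> level
        self.edges = defaultdict(list)   # source -> [(label, target), ...]

    def add_concept(self, name, level):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level_of[name] = level

    def add_edge(self, source, label, target):
        for node in (source, target):
            if node not in self.level_of:
                raise KeyError(f"unknown concept: {node}")
        self.edges[source].append((label, target))

    def kind(self, source, target):
        """Horizontal edges stay on one level, vertical edges cross levels."""
        same = self.level_of[source] == self.level_of[target]
        return "horizontal" if same else "vertical"

    def edges_of_kind(self, kind):
        return [
            (source, label, target)
            for source, outgoing in self.edges.items()
            for label, target in outgoing
            if self.kind(source, target) == kind
        ]


# Hypothetical banking concepts, for illustration only.
graph = LayeredOntologyGraph()
graph.add_concept("Entity", "general")
graph.add_concept("Account", "middle")
graph.add_concept("Customer", "middle")
graph.add_concept("SavingsAccount", "specific")
graph.add_edge("SavingsAccount", "is_a", "Account")  # specific -> middle
graph.add_edge("Account", "is_a", "Entity")          # middle -> general
graph.add_edge("Customer", "owns", "Account")        # within the middle level

print(graph.edges_of_kind("horizontal"))
print(graph.edges_of_kind("vertical"))
```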
Graph description:
$G = \{V_G, E_G\}$
$V_G$ is the set of nodes; every node has two views:
• it represents a concept: it has a name
• it is a perceptron: all inferences about this concept are represented by
the formula $\sum_{i=1}^{n} x_i w_i c_k$, where $x_i$ is the value of the
$i$-th input, $w_i$ is the weight of this input and $c_k$ is the context;
$x_i, w_i \in [0, 1]$ and $c_k \in \{0, 1\}$; the learning rule is
$w_i = w_i + (T - A) x_i$, where $T$ is the correct result that the neuron
should have shown and $A$ is the actual output of the neuron; $c_k = 1$ only
if there is a number of inferences $\geq \Theta$ that influence each other
$E_G$ is the set of edges; every edge represents a property or a relationship.
Properties and relationships are the equivalent of inferences, which are grouped
into subsets that influence each other.
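The node definition can be made concrete with a minimal sketch. It assumes that the actual output A is the raw weighted sum (the text does not specify an activation function), that the context is decided by comparing a count of mutually influencing inferences with Θ, and that weights are clamped to [0, 1] after each update so they stay in the stated range. The class `ConceptNode` and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConceptNode:
    """A graph node that is both a named concept and a perceptron."""
    name: str
    weights: list   # w_i, each in [0, 1]
    theta: int      # threshold Θ on the number of related inferences

    def context(self, related_inferences: int) -> int:
        """c_k = 1 only if at least Θ inferences influence each other."""
        return 1 if related_inferences >= self.theta else 0

    def output(self, inputs, related_inferences: int) -> float:
        """Weighted sum of x_i * w_i * c_k over all inputs."""
        c_k = self.context(related_inferences)
        return sum(x * w * c_k for x, w in zip(inputs, self.weights))

    def learn(self, inputs, related_inferences: int, target: float) -> float:
        """Apply w_i <- w_i + (T - A) * x_i and return the error T - A."""
        error = target - self.output(inputs, related_inferences)
        # Clamping is an assumption made here to keep w_i inside [0, 1].
        self.weights = [
            min(1.0, max(0.0, w + error * x))
            for w, x in zip(self.weights, inputs)
        ]
        return error


# Hypothetical values, for illustration only.
node = ConceptNode("Vehicle", weights=[0.2, 0.5], theta=2)
inputs = [1.0, 0.0]

before = node.output(inputs, related_inferences=3)
node.learn(inputs, related_inferences=3, target=1.0)
after = node.output(inputs, related_inferences=3)
inactive = node.output(inputs, related_inferences=1)  # context is 0 here
print(before, after, inactive)
```

Only the weight of the active input changes, because the update is proportional to $x_i$; the second weight keeps its initial value.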
Until now we have defined an artificial neural network, but the main purpose
is to be able to extend the ontology, and this is done by using training sessions.
After each session the knowledge grows, and training can stop when the trained
system is smart enough for a specified set of requirements.
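The stopping criterion can be sketched as a loop over training sessions. The sketch assumes that the "specified set of requirements" is a list of input and target pairs that must be reproduced within a tolerance, that the context is always active, and that weights are clamped to [0, 1]. The function names, the tolerance and the session limit are hypothetical parameters.

```python
def predict(weights, inputs):
    """Weighted sum of the inputs (context assumed to be active)."""
    return sum(x * w for x, w in zip(inputs, weights))


def train_session(weights, examples):
    """One pass over the examples using w_i <- w_i + (T - A) * x_i."""
    for inputs, target in examples:
        error = target - predict(weights, inputs)
        # Clamping to [0, 1] is an assumption, not stated by the learning rule.
        weights = [min(1.0, max(0.0, w + error * x)) for w, x in zip(weights, inputs)]
    return weights


def meets_requirements(weights, requirements, tolerance):
    """True when every required answer is reproduced within the tolerance."""
    return all(
        abs(target - predict(weights, inputs)) <= tolerance
        for inputs, target in requirements
    )


def train(weights, examples, requirements, tolerance=0.05, max_sessions=100):
    """Run sessions until the requirements are met or the limit is reached."""
    sessions = 0
    while sessions < max_sessions and not meets_requirements(weights, requirements, tolerance):
        weights = train_session(weights, examples)
        sessions += 1
    return weights, sessions


# Hypothetical training data, for illustration only.
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
weights, sessions = train([0.2, 0.5], examples, requirements=examples)
print(weights, sessions)
```

The session limit guards against data that the simple rule cannot fit, in which case the loop would otherwise never stop.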
6 Conclusions
Natural language processing can successfully use ontologies to mediate
human-machine communication. The final goal for this research domain is to
transform natural language processing into natural language understanding by
the machine.
A complete natural language understanding system must be able to:
• Paraphrase an input text
• Translate text into another language
• Answer questions about the contents of the text
• Draw inferences from the text
The first three objectives have been accomplished to some degree, but the
fourth remains only a concept that might become reality if NLP uses ontologies
for constructing inferences.
References
[1] Elizabeth D. Liddy, Natural Language Processing, Encyclopedia of Library
and Information Science, Second Edition, DOI: 10.1081/E-ELIS-120008664
[2] Dario Bianchi and Agostino Poggi, Ontology Based Automatic Speech
Recognition and Generation for Human-Agent Interaction, University of
Modena and Reggio Emilia, Italy, June 14-June 16, ISBN: 0-7695-2183-5
[3] Ru-Yng Chang, Chu-Ren Huang, Feng-Ju Lo, Sueming Chang, From General
Ontology to Specialized Ontology: A study based on a single author
historical corpus
[4] Jingshan Huang, Jiangbo Dang, Jose M. Vidal, Michael N. Huhns, Ontology
Matching Using an Artificial Neural Network to Learn Weights