On Natural Language Dialogue with Assistive Robots
Vladimir A. Kulyukin
Computer Science Assistive Technology Laboratory
Department of Computer Science
Utah State University
vladimir.kulyukin@usu.edu
ABSTRACT
This paper examines the appropriateness of natural language dialogue (NLD) with assistive robots. Assistive robots are defined in terms of an existing human-robot interaction taxonomy. A decision support procedure is outlined for assistive technology researchers and practitioners to evaluate the appropriateness of NLD in assistive robots. Several conjectures are made on when NLD may be appropriate as a human-robot interaction mode.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems – human factors.

General Terms
Performance, Design, Experimentation, Human Factors.

Keywords
assistive technology, natural language dialogue, assistive robotics.

1. INTRODUCTION
Human-robot interaction (HRI) is an active research area whose findings are of significance to assistive robotics. Robotic solutions have now been implemented in wheelchair navigation [45], hospital delivery [40], microsurgery [44], robot-assisted navigation [1, 6, 27, 28], care for the elderly [34], and life support partners [24]. In each of these tasks, the effectiveness of HRI is critical.

An important question facing many assistive technology (AT) researchers and practitioners today is this: what is the best way for humans to interact with assistive robots? As more robotic devices find their way into healthcare and rehabilitation, it is imperative to find ways for humans to interact effectively with those devices. Many researchers have argued that natural language dialogue (NLD) is a promising HRI mode [12, 25, 32, 40, 42]. In this paper, this claim is examined in the context of assistive robotics. In examining the NLD claim, we do not argue either for or against NLD per se. Rather, our objective is to develop a decision support procedure that may be helpful to the assistive robotics community and the AT community at large in evaluating the appropriateness of NLD in prospective assistive robotic applications.

Our paper is organized as follows. First, assistive robots are defined in terms of an existing HRI taxonomy. Second, the NLD claim is stated and the main arguments for and against it are examined. Third, the NLD claim is critiqued in the context of assistive robotics. Fourth, a decision support procedure is outlined for evaluating the appropriateness of NLD in assistive robots. Finally, several conjectures are made on when NLD may be appropriate as an HRI mode.

2. WHAT IS AN ASSISTIVE ROBOT?
Before we can discuss the appropriateness of NLD in assistive robots, we need to define what is meant by an assistive robot. One way to define a concept is to place that concept in an existing taxonomy. We will use the HRI taxonomy developed by Yanco and Drury [46].

Yanco and Drury [46] postulate eight categories: 1) autonomy level, 2) amount of intervention, 3) human-robot ratio, 4) type of shared human-robot interaction, 5) decision support for operators, 6) criticality, 7) time/space, and 8) composition of robot teams.

The autonomy level measures the percentage of time that the robot operates independently. The amount of intervention measures the amount of time that the robot's operator operates the robot. Both categories are real-valued in the range from 0 to 1. The human-robot ratio is a non-reduced fraction of the number of human operators over the number of robots. The type of shared human-robot interaction elaborates the human-robot ratio through a finite set of symbolic values, e.g., one human - one robot, one human - robot team, one human - multiple robots, human team - one robot, etc. The decision support category defines the information provided to operators for decision support. This category has the following symbolically valued subcategories: available sensors, available sensor information, type of sensor fusion, and type of pre-processing. Criticality estimates how critical it is for a robot to complete a specific task. A critical task is defined to be one where a failure affects the life of a human. Criticality can be high, medium, or low. The time/space category describes HRI based on whether humans and robots interact at the same time (synchronous) or not (asynchronous) and in the same space (collocated) or not (non-collocated). Finally, the composition of robot teams can be homogeneous or heterogeneous. If the robot team is heterogeneous, the available robot types can be further specified.
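To make the categories concrete, the following is a minimal Python sketch of the classification record used in the analysis below. The type names, field names, and the illustrative numeric values are ours, not Yanco and Drury's notation.

```python
# A minimal sketch (our naming, not Yanco and Drury's) of the six
# taxonomy categories used below to classify assistive robots.
from dataclasses import dataclass
from enum import Enum

class Criticality(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class HRIClassification:
    autonomy_level: float    # fraction of time the robot acts on its own, in [0, 1]
    intervention: float      # fraction of time the operator runs the robot, in [0, 1]
    human_robot_ratio: str   # non-reduced fraction, e.g., "1:1" or "5:1"
    interaction_type: str    # e.g., "one human - one robot"
    criticality: Criticality
    synchronous: bool        # humans and robot interact at the same time
    collocated: bool         # humans and robot share the same space

# A typical assistive robot, as argued below: partially autonomous,
# operated by one person, highly critical, synchronous, and collocated.
# The 0.7/0.3 figures are purely illustrative.
typical_assistive_robot = HRIClassification(
    autonomy_level=0.7,
    intervention=0.3,
    human_robot_ratio="1:1",
    interaction_type="one human - one robot",
    criticality=Criticality.HIGH,
    synchronous=True,
    collocated=True,
)
```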
Where in this taxonomy can we place assistive robots? To answer this question, we first need to determine which categories should be used. We believe that two categories in the proposed taxonomy may not be appropriate for our purposes: decision support and composition of robot teams. Our main concern with the decision support category is that it is underspecified: the types of available sensors and sensor fusion algorithms used in assistive robots vary a great deal and can be classified only at a very generic level. Moreover, since sensor data pre-processing is determined by sensor fusion algorithms, it cannot be given an independent value. Although composition of robot teams is well defined, it may not be applicable to assistive robots for practical reasons: we are not aware of a single AT system in which a team of assistive robots is used. This is not to say, of course, that such teams are impossible in principle and will not be deployed in the future.

Let us now attempt to describe assistive robots in terms of the remaining six categories. Since an assistive robot is designed to provide support to a disabled individual [6, 28, 45] or to a caregiver [40, 44], it is unlikely to be fully autonomous. More often than not, it is designed to receive directions from a human, complete the tasks specified by those directions, report on its progress, and wait for more directions. Thus, its autonomy level is less than 1 and the amount of human intervention is greater than 0. The majority of existing assistive robots have a human-robot ratio of 1: one human operating one robot. For some assistive robots, e.g., Hygeirobot, a hospital delivery robot [40], the ratio can be more than 1. In principle, many nurses can request Hygeirobot to deliver certain medicines to different rooms. However, this type of ratio is an exception, not the rule. Thus, the types of HRI in assistive robots are either one human - one robot (more frequent) or human team - one robot (rare). The criticality of assistive robots is high for obvious reasons. Finally, since assistive robots share time and space with their human operators, HRI with assistive robots is synchronous and collocated.

3. THE NLD CLAIM
The essence of the NLD claim, stated simply, is as follows: NLD is a promising HRI mode in robots capable of collaborating with humans in dynamic and complex environments [32, 40, 42]. The claim is often challenged on methodological grounds: if there is a library of robust routines that a robot can execute, why bother with NLD at all? Why not create a graphical user interface (GUI) to that library that allows human operators to invoke various routines by pointing and clicking? Indeed, many current approaches to HRI are GUI-based [10, 15, 36]. The robot's abilities are expressed in a GUI through graphical components. To cause the robot to take an action, an operator sends a message to the robot through a GUI event. The robot sends feedback to the GUI and the interaction cycle repeats.

The NLD proponents respond to this challenge with three arguments. First, it is argued that, to humans, language is the most readily available means of communication. If the robot can do NLD, the human can interact with the robot naturally, without any of the possibly steep learning curves required for many GUI-based approaches. Second, it is argued that, from the practical point of view, GUI-based approaches are appropriate in environments where the operator has access to a monitor, a keyboard, or some other hardware device, e.g., an engineer monitoring a mining robot or an astronaut monitoring a space shuttle arm. In some environments, however, access to hardware devices is either not available or impractical. For example, an injured person in an urban disaster area is unlikely to communicate with a rescue robot through a GUI. Thus, NLD in robots interacting with humans has practical benefits. Finally, an argument is made that building robots capable of NLD yields insights into human cognition [19, 25, 42]. Unlike the first two arguments made by the NLD camp, this argument does not address either the practicality or the naturalness of NLD, the most important objective being a cognitively plausible integration of language, perception, and action. Although one may question the appropriateness or feasibility of the third argument in a specific context, the argument, in and of itself, appears to be sound insomuch as the objective is to test a computational realization of some cognitive theory. Consequently, below we will focus only on the first two arguments.

Regardless of which of the three arguments they advance in defense of NLD, many researchers appear to show a consensus on what the generic architecture of an NLD system should be. The consensus architecture is shown in Figure 1. While the names of individual components vary from system to system, their overall functionality is the same or similar in all systems. The differences that exist among NLD systems have to do with how different components are realized, communication protocols among the components, and the overall distribution of functionality among the components.

Figure 1: NLD Architecture.

The Speech Recognition module converts phonemes to symbols. Symbols are given to the natural language processing (NLP) module, which performs their syntactic and/or semantic analysis and gives the computed representation to the dialogue management system (DMS).
The DMS makes decisions as to what action should be performed by the robot. If the robot is to perform an action, a representation of that action is given to the robot hardware for execution. If the input from the NLP component requires a response, the DMS determines whether the response can be generated from the current internal state or whether it requires a query to the robot hardware. In the former case, the DMS generates an appropriate representation of the response and sends it to the natural language generation (NLG) module; in the latter case, the DMS sends a query to the robot hardware, waits for the response, translates the response into an appropriate representation, and gives that representation to the NLG module. Finally, the NLG component converts the input representation into symbols that are given to the Speech Synthesis module for vocalization.
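Read directly off the two preceding paragraphs, the consensus control flow can be summarized in the following sketch. Every class and method name here is a hypothetical stand-in, not the API of any of the systems surveyed later.

```python
# A minimal sketch of the consensus NLD architecture's control flow.
# All component objects (asr, nlp, dms, nlg, tts, robot) are
# hypothetical stand-ins, not any particular system's API.

class NLDPipeline:
    def __init__(self, asr, nlp, dms, nlg, tts, robot):
        self.asr, self.nlp, self.dms = asr, nlp, dms
        self.nlg, self.tts, self.robot = nlg, tts, robot

    def handle_utterance(self, audio):
        words = self.asr.recognize(audio)           # phonemes -> symbols
        meaning = self.nlp.parse(words)             # syntactic/semantic analysis
        decision = self.dms.decide(meaning)         # what should happen next?
        if decision.action is not None:
            self.robot.execute(decision.action)     # hand the action to the hardware
        if decision.needs_response:
            if decision.answer_from_state:          # answerable from internal state
                rep = self.dms.response_from_state(meaning)
            else:                                   # requires querying the hardware
                status = self.robot.query(decision.query)
                rep = self.dms.response_from_status(status)
            self.tts.speak(self.nlg.generate(rep))  # representation -> speech
```

Note that the recognizer and the synthesizer sit at the ends of one-way links, while the DMS-hardware link is the only two-way one; these are precisely the links examined in the next section.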
4. A CRITIQUE OF THE NLD CLAIM
Before we offer our critique of the NLD claim, we would like to reiterate that the critique should be taken only in the context of assistive robotics. In examining the NLD claim and its arguments, we will start with the computational realization of the claim in the consensus architecture and, after identifying its weaknesses, we will analyze what impact those weaknesses have on the arguments in defense of the NLD claim.

We believe that there are three main weaknesses in the NLD architecture: the uni-directional link between speech recognition and NLP, the uni-directional link between NLG and speech synthesis, and the bi-directional link between the DMS and the robot hardware. In Figure 1, the weak links are labeled with question mark icons.

4.1 Speech Recognition
From the perspective of assistive robotics, there are two problems posed by speech recognition: speech recognition errors and shared vocabulary. Speech recognition errors present a serious safety concern for AT applications. An automatic speech recognition (ASR) system, such as Microsoft's SAPI 5.1 [3] or IBM's ViaVoice [2], may well average 95 to 97 percent accuracy in dictation experiments, where user training is available and the consequences of misrecognized words are easily absorbed. However, an assistive robot that misrecognizes 5 out of 100 commands is a definite risk, because high criticality is not guaranteed.

We learned this criticality lesson from our experiments with a robotic guide for the blind [28]. When we first started working on the robotic guide, we thought of using ASR as a means for the visually impaired to interact with the robot [41]. We quickly discovered that many speech recognition errors occur when the person guided by the robot stops and engages in conversation with someone [29]. Since speech recognition runs continuously, some phrases said by the guided person to the interlocutor were erroneously recognized as route directives, which caused the robot to start moving in a wrong direction. Several times the ASR system on our robotic guide interpreted coughs and throat-clearing sounds as directives to go to different locations.
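One commonly suggested software mitigation is to accept an ASR hypothesis only when it matches a small closed command vocabulary with sufficient confidence. The sketch below is illustrative only: the vocabulary and the threshold are invented for the example, it is not the mechanism we ultimately adopted in [28], and, as the episodes above show, no threshold fully prevents side conversation from being misread as a command.

```python
# A sketch of confidence- and vocabulary-gated command acceptance.
# The directive set and the threshold are illustrative, not from [28].
ROUTE_DIRECTIVES = {"go to the elevator", "go to the office", "stop", "turn around"}
CONFIDENCE_THRESHOLD = 0.85  # illustrative value

def accept_command(hypothesis: str, confidence: float):
    """Return the directive to execute, or None to ignore the audio."""
    text = hypothesis.strip().lower()
    if text in ROUTE_DIRECTIVES and confidence >= CONFIDENCE_THRESHOLD:
        return text
    return None  # out-of-vocabulary or low-confidence input is discarded
```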
Since NLD always happens in a context, the robot and the user refer to that context through a shared vocabulary [16]. To put it differently, the human must know the vocabulary used by the robot. The acquisition of the shared vocabulary poses another challenge to the AT community: do the target users have sufficient cognitive and physical abilities and, if they do, do they have any incentives to undergo the necessary training? Certain cognitive disabilities rule out the use of speech from the start. Certain physical disabilities, e.g., spinal cord injuries, while not affecting the cognitive adequacy of the disabled, may lead to pronounced speech defects that ASR cannot handle. Sometimes training may not be feasible, because it is difficult, if not impossible, to find a representative sample of the target users. For example, it is difficult to collect a representative sample from the population of the visually impaired users of a robotic guide deployed at an airport [26]. In other contexts, the incentives for the target user to undergo training may be a serious issue. For example, a computer-savvy nurse may prefer a GUI-based interface to a hospital delivery robot simply because she is already more comfortable with GUIs and must use them anyway. The same observation is applicable to many office delivery robots [32]. In a typical office environment, many office workers have access to personal computers. If there is one thing that these people know how to do, it is pointing and clicking. Thus, it may well be more convenient for them to request the robot to bring them cups of coffee through a simple GUI client on their desktops than through an NLD with the robot.

4.2 Speech Synthesis
Speech remains the primary output mode in NLD systems [38]. However, as recent research in environmental sonification shows, speech beacons have drawbacks. Tran, Letowski, and Abouchacra [43] show experimentally that speech beacons are harder to localize than non-speech beacons. Kulyukin et al. [29] suggest that visually impaired users do not always choose speech beacons when presented with an opportunity to select from a set of speech and non-speech beacons to specify a navigation event. For example, in one of our experiments, several visually impaired users opted for a non-speech beacon that consisted of a short audio file with the sound of water bubbles to signify the presence of a water fountain.

Since non-speech beacons remain a relatively new research area, unanswered questions abound. One definite advantage of speech beacons is that they do not need to be learned, whereas non-speech beacons must be learned by the user prior to using the system. On the other hand, non-speech beacons appear to require less cognitive processing than speech beacons [43] and are easier to perceive in noisy environments [29, 41]. It is also harder to engage in a dialogue with a third party when one has to attend to frequent speech beacons produced by the system [39].

Finally, there appears to be a dearth of statistically valid data on the perception of speech vs. non-speech beacons. While small samples of participants in published audio perception studies are easily explained through funding constraints, the absence of a sound experimental design framework for evaluating audio perception that is accepted by most HRI researchers is a conceptual and practical hurdle.

4.3 DMS and Robot Hardware
It is often stated that NLD-capable robots are intended to be used by people with little or no computing experience [40]. Even if one is to assume that the problems with shared vocabulary, speech recognition, and speech perception are surmountable in a particular application, hardware reliability remains a serious concern. For the NLD architecture to operate at human rates, the DMS must be aware of the state of the robot hardware at all times. The degree of self-awareness is conditional on the degree of hardware reliability.
The less reliable the robot hardware, the less probable it is that the internal state of the DMS accurately reflects the actual state of the robot hardware. RHINO [9], a robotic museum guide briefly deployed at the Deutsches Museum in Bonn, Germany, is a good example of how even the most advanced and sophisticated hardware architectures offer no guarantee against run-time failures. RHINO ran 20 parallel processes on 3 on-board PCs and 2 off-board SUN workstations connected via a customized Ethernet-based point-to-point socket communication protocol. Even with these high software and hardware commitments, RHINO reportedly experienced six collisions over a period of forty-seven hours, although each tour was less than ten minutes long [9].

Lack of self-awareness stems from the NLD design practices prevalent in HRI. The first design practice, adopted in Hygeirobot [40], develops all of the NLD components on simulated robot hardware, postponing the integration and field tests with the actual hardware until later. Under the second design methodology, adopted in HARUNOBU-6 [35], a robotic travel aid to guide the visually impaired, and RHINO [9], the NLD capabilities are added after the robotic hardware is designed, developed, and deployed. Both practices increase the probability that the DMS and the robot hardware can be out of sync at run time. Under the first practice, the NLD designer must make assumptions about the robot hardware. Under the second practice, the NLD designer is forced to work with an API to the robot hardware that may not be sufficiently detailed to discover run-time problems with the hardware.

Another cause of weak self-awareness may lie in the dialogue management techniques used in the DMS component. These techniques can be divided into three broad groups: state-based [33], frame-based [20], and plan-based [4]. All of them are largely deterministic. Consequently, one may question their appropriateness for describing the inherently stochastic nature of robotic hardware. Probabilistic techniques have been attempted [34], but no evaluation data are given describing their performance.
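To make the determinism point concrete, consider a minimal state-based dialogue manager in the spirit of [33]. The transition table is ours and purely illustrative; the point is that every transition is deterministic, so nothing in such a table can represent the probability that a commanded maneuver fails to start, which is exactly the kind of mismatch discussed next.

```python
# A minimal state-based dialogue manager: a fixed table maps
# (state, input) to (next state, prompt). The table is illustrative.
TRANSITIONS = {
    ("idle",       "go to kitchen"): ("navigating", "Going to the kitchen."),
    ("navigating", "stop"):          ("idle",       "Stopping."),
    ("navigating", "arrived"):       ("idle",       "We have arrived."),
}

def step(state, user_input):
    """Advance the dialogue deterministically; unknown inputs leave the state unchanged."""
    return TRANSITIONS.get((state, user_input), (state, "I did not understand."))
```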
We became aware of the problem of weak self-awareness during navigation experiments with our robotic guide for the blind. The cause of the problem was always a mismatch between the verbalized intent of the robot and the robot's actual actions. During several trial runs, the robot informed the navigator that it had started making a u-turn only after it had already started executing the maneuver. Although the robot's message accurately described the robot's intention, it caused visible discomfort for the blind navigators. At several T-intersections the robot would tell the navigator that it was turning left (or right) and then, due to the presence of people, it would start drifting in the opposite direction before actually making a turn. When that happened, we observed that several blind navigators pulled hard on the robot's handle, sometimes driving the robot to a virtual halt. We conjecture that when a communication mismatch occurs, i.e., when the robot starts doing something other than what it said it would do, the human navigators become apprehensive and try to stop the robot.
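One way to reduce such mismatches is to verbalize a maneuver only after the hardware reports that it has actually begun. The following is a minimal sketch under an assumed hardware interface; robot.start, robot.current_maneuver, and tts.speak are hypothetical stand-ins, and whether announce-after-confirmation is the right ordering for a given user group is itself an empirical question.

```python
# A sketch of confirm-then-announce: speak only once the hardware's
# reported state matches the commanded maneuver. The robot and tts
# objects are hypothetical stand-ins for a real platform's interface.
import time

def announce_when_started(robot, tts, maneuver, timeout=2.0, poll=0.05):
    """Start a maneuver, then verbalize it only once it is under way."""
    robot.start(maneuver)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if robot.current_maneuver() == maneuver:  # state read back from hardware
            tts.speak(f"Now {maneuver}.")         # e.g., "Now turning left."
            return True
        time.sleep(poll)                          # wait for the hardware to catch up
    tts.speak(f"Unable to start {maneuver}.")     # report the failure instead
    return False
```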
4.4 How Do Robots Interact with People?
As assistive technology researchers, we are interested, first and foremost, in effective and safe interaction between a disabled person and an assistive device. Effective and safe interaction must eliminate or, at the very least, considerably reduce ambiguity. So, how do robots actually interact with people? To answer this question, it is natural to look at some robots that have been made to interact with humans in various contexts. Our sample of applications cannot possibly cover the full spectrum of research on interactive social robots. For a more comprehensive survey, the reader is referred to [11, 14].

Perzanowski et al. [37] present their research at the Naval Research Laboratory (NRL) on interpreting gestures and spoken natural language commands with their mobile robot. The robot, called Coyote, interprets such commands as "turn 30 degrees left/right," "turn to my left/right," etc. The speech recognition system used in Coyote is the SSI PE 200. The authors conclude that numbers are difficult to understand and pose major problems for automatic speech recognition. They conclude with a hope for a more sophisticated and advanced speech recognition system to continue their work on integrating natural language and gesture.

Huttenrauch et al. [22] present a longitudinal study of one user interacting with an office delivery robot. The paper states that the speech interface is the primary interaction mode when the robot is in close proximity to the user. When the user and the robot are not collocated in the same task space, the user controls and monitors the robot via a graphical interface. However, in the actual evaluation of the system, the speech interface is not used at all. The close-proximity communication is realized through a portable robot-control interface on an iPAQ Personal Digital Assistant (PDA). It is stated that the portable interface was created for the robot to be instructed in close proximity to the user without using a speech interface. No explanation is offered as to why speech was not used in the actual experiments.

Huttenrauch and Eklundh [21] present an HRI study with a service robot operating in an office environment. The robot's task is to bring coffee cups to a mobility-impaired user: the robot must navigate to a kitchen area and request that a coffee cup be filled and placed on a tray mounted on top of the robot. Participants in the experiments are requested to detect the robot and listen to its spoken request for filling a cup with coffee. If a participant decides to help, he or she fills a cup from a thermos and puts the filled cup on the robot's tray. While speech-based NLD is mentioned as a possibility, speech is used only on the output. To actually communicate with the robot, the participant must locate and press a button on the robot for confirmation that the coffee is ready for delivery.

Billard [5] describes Robota, a mini-humanoid doll robot. The robot is used for educational purposes. The objective is to teach students how to create robots with social skills. Speech is emphasized as an important social skill. The author mentions IBM's ViaVoice speech recognition system and Microsoft's SAPI. It is not clear from the article which one was actually used or whether both were used. It is stated that, to be an effective educational tool, Robota must recognize a number of key words and key phrases and generate sensible answers to queries. However, no statistics, either descriptive or inferential, are given on the usability of speech as an input or output mode.
Montemerlo et al. [34] describe the robot Pearl, which acts as a robotic guide for the elderly. In the reported experiment, six elderly people were guided to different locations at an assisted living institution. The researchers report problems both with speech synthesis and speech recognition, but do not give any statistical breakdown of the types of errors or of how they affected the robot's performance or the human's perception of the robot.

Gockley et al. [17] describe Valerie the Roboceptionist, a stationary robotic receptionist that resides in a small booth near the main entrance of a hall at Carnegie Mellon University and engages in NLD with people entering the hall. People can ask Valerie for office numbers and directions. While Valerie uses speech on the output, the input is restricted to the keyboard. The researchers state that the keyboard was chosen over speech because keyboard input is easier to control and more reliable than a typical speech recognition system.

Breazeal et al. [7] report several experiments with their Leonardo platform in which human subjects guide the robot to perform different tasks using speech and gesture. The basic tasks the subjects are required to perform with the robot are: teaching the robot the names and locations of different buttons placed in front of the robot, checking to see if the robot knows the names of the buttons, asking the robot to push the buttons, and telling the robot that the task is done. The speech understanding system uses Sphinx-4 [31] with a limited grammar to parse incoming phrases. The subjects are taught the list of allowed phrases prior to interacting with the system, which eliminates the shared vocabulary problem from the start. The list includes greetings, button labels, requests to point at or press the buttons, and acknowledgments of task completions. The researchers mention speech recognition failures as one of the two common sources of error, the other being the failures of the vision system to recognize a gesture. No statistics are given on speech recognition failures, probably because the focus of the research is on the effects of non-verbal communication on the efficiency of task completion.

We can make several generic observations on NLD in robots from these investigations. First, speech as an NLD input mode is used only when the person interacting with the robot is one of the developers or when the person receives training prior to interaction. Training, however, does not guarantee against run-time errors. Second, speech is used in laboratory environments where nuisance variables, e.g., human traffic and background noise, are easily controlled. Third, robots that work with people over extended periods of time in real environments either do not use NLD at all or do not use speech as an NLD input mode. Language input is solicited through GUIs or keyboards. Fourth, there is a dearth of statistically sound studies on the actual use of NLD dialogues. Presented evidence is mostly descriptive and anecdotal.
5. DECISION SUPPORT PROCEDURE
Given these observations, how can an AT researcher or practitioner evaluate the appropriateness of NLD in an assistive robot? Below we outline a decision support procedure as a series of questions that can be asked at the beginning of an assistive robotics project to determine if a given assistive robot should be dialogical. This procedure is to be viewed as a step toward a comprehensive, living document that will be of value to the AT community and will be subsequently refined and expanded through future research efforts.

• Does the target user have any disabilities prohibiting the use of natural language? As was stated above, certain cognitive and physical disabilities render the use of natural language, especially spoken language, practically impossible. If that is the case, some augmented communication techniques may be more appropriate than NLD.

• Does speech misrecognition undermine the criticality of the assistive robot? If the answer to this question is affirmative, speech is out of the question. The affirmative answer to this question led us to switch from wearable microphones to wearable keypads in our robotic guide for the blind [28].

• Can speech misrecognition be overcome through user training or a hardware device? This seemingly straightforward question is not easy to answer. The ability to achieve convergence through training an ASR system varies from individual to individual [23]. The most reliable way of achieving convergence is to go through the training and make the decision on the basis of the obtained results. In practice, however, this frequently turns out to be an expensive option. Several hardware solutions can reduce the ambiguity caused by speech in noisy environments. One such solution is a push-to-speak button: to speak to the robot, the user must press a button. The push-to-speak option may eliminate some ambiguity caused by continuous speech with a third party, coughs, or throat clearings (see the sketch after this list). Another hardware solution is the use of a keyboard.

• Can the target user acquire the shared vocabulary? The answer to this question has to do with the expertise level of the user and the expertise level of the vocabulary. Ideally, the expertise level at which the robot vocabulary is designed should match the expertise level of the user.

• Is the use of NLD economically justified? In many AT contexts, especially those that involve caregivers, e.g., assisted living homes and hospitals, there are already well-established procedures and processes for doing things. Before going ahead with NLD and redesigning the existing information infrastructure, it is worth pondering whether that infrastructure can be used without disruptive modification. The more disruptive the modification, the more the prospective caregiver is likely to resist.

• How reliable is the robot hardware? Special care is needed with proof-of-concept devices fresh out of research labs. It is not enough for the robot to perform well enough as a prototype. A good place to start is to evaluate the hardware in terms of standard task-oriented metrics for HRI [13].

• Are speech beacons appropriate? As we noted above, in some situations speech may not be an ideal output mode for a dialogue between the user and the assistive robot. In the absence of a sound audio perception design framework, the most reliable way of obtaining the answer is pilot experiments.
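The push-to-speak option mentioned above can be sketched as a simple gate between the microphone and the recognizer. The button and recognizer objects below are hypothetical stand-ins, not any particular product's interface.

```python
# A sketch of push-to-speak gating: audio reaches the recognizer only
# while the user holds the button, so side conversation, coughs, and
# third-party speech outside the press are never heard by the ASR.
# The button and recognizer objects are hypothetical stand-ins.

class PushToSpeakGate:
    def __init__(self, button, recognizer):
        self.button = button
        self.recognizer = recognizer

    def on_audio_frame(self, frame):
        if self.button.is_pressed():    # gate is open: the user wants to talk
            self.recognizer.feed(frame)
        # frames arriving while the button is up are simply dropped

    def on_button_release(self):
        return self.recognizer.finish() # decode only the gated audio
```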
6. THREE CONJECTURES
In this section, we will attempt to generalize our discussion and speculate on when NLD might be appropriate for HRI in general.
We subscribe to the view that the ultimate objective of robotics is the production of effective servants, not alternative life forms [18]. The robot must do what its user tells it to do when the user tells it to do so, not when the robot itself considers it fit to do so. Viewed under this bias, the relevant question is whether NLD is the most effective means of telling the robot what to do and receiving feedback from the robot.

To speculate on this question, let us postulate three real-valued variables: A, I, and P. A measures how autonomous the robot is, I measures how much the human operator must intervene in the actual operation of the robot, and P measures how much potential exists for NLD. Let us further agree that all variables are in the closed range from 0 to 1. Let us now make and discuss a few conjectures about P.

Conjecture 1: As A goes to 1, P goes to 0.

This conjecture states that the closer the robot is to full autonomy, the smaller the potential for NLD. In the extreme case of a factory assembly robot, there is no need for NLD. Generally speaking, the more the robot is capable of making and executing its own decisions autonomously, the less we need to interact with it through NLD. Full autonomy is currently achievable only in restricted environments. A fully autonomous robot, by definition, does routine, repetitive work that a human is not interested in doing.

Conjecture 2: As I goes to 1, P goes to 0.

This conjecture states that as the level of human intervention increases, the potential for NLD decreases. Language is a slow medium, because producing and interpreting utterances takes time. A larger amount of human intervention implies that the operator is involved in the decision-making process of the robot. This, in turn, implies that the operator is likely to have a formal model of what the robot is doing and why. Formal models are expressed in formal languages for which natural language is not appropriate due to its ambiguity and vagueness. There is a reason why the Mars Rover [30] did not have an NLD component: it is always faster for a knowledgeable operator to interact with the underlying computational model directly rather than through a slow intermediary such as natural language.

Conjecture 3: P > 0 when A ∈ [a1, 1 − δ1], where 0 < δ1 < 1 and 0 < a1 < 1 − δ1, and I ∈ [a2, 1 − δ2], where 0 < δ2 < 1 and 0 < a2 < 1 − δ2.

This conjecture states that the potential for NLD exists when the robot is partially autonomous and requires some human intervention. Obviously, A and I are inversely related. But how high should A be for NLD to be justified? We conjecture that A should be at 0.5 or above. In other words, the robot should be physically competent. Unfortunately, that does not seem sufficient. The robot's physical competence should also be cognitively interesting for the human. It is not all that exciting to have a dialogue with a robot that can only make turns and change rotational and translational velocities. As argued by Brooks [8], language may have been a late arrival on the road of evolution. Living forms had been working for millennia on their basic physical and perceptual skills before language appeared on the scene. Even if one does not accept the evolution argument, one is likely to agree that for language to be justified our robots must first become physically and cognitively interesting to us.
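The three conjectures can be read as a simple predicate on A and I. In the sketch below, the bounds a1, d1, a2, and d2 are the free parameters of Conjecture 3; the default values merely encode our guess that A should be at 0.5 or above and are otherwise arbitrary.

```python
# A direct reading of the three conjectures, with illustrative bounds.
# a1, d1, a2, d2 are the free parameters of Conjecture 3; the defaults
# below only encode the guess that A should be at 0.5 or above.

def nld_potential_exists(A, I, a1=0.5, d1=0.1, a2=0.1, d2=0.1):
    """Conjecture 3: P > 0 only when A and I both sit strictly inside (0, 1)."""
    assert 0.0 <= A <= 1.0 and 0.0 <= I <= 1.0
    return (a1 <= A <= 1.0 - d1) and (a2 <= I <= 1.0 - d2)

# Conjectures 1 and 2: at either extreme the potential vanishes.
assert not nld_potential_exists(A=1.0, I=0.0)  # fully autonomous robot
assert not nld_potential_exists(A=0.0, I=1.0)  # fully teleoperated robot
assert nld_potential_exists(A=0.7, I=0.3)      # partially autonomous robot
```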
7. CONCLUSIONS
We examined the claim that NLD is a promising HRI mode in the context of assistive robotics. We defined assistive robots in terms of an existing HRI taxonomy. Based on the results of our analysis, we outlined a preliminary decision support procedure for AT researchers and practitioners who need to evaluate the appropriateness of NLD in assistive robots. Finally, we offered several conjectures about when NLD may be appropriate as an HRI mode in general.

8. ACKNOWLEDGMENTS
The author would like to acknowledge that this research has been supported, in part, through an NSF CAREER grant (IIS-0346880) and two Community University Research Initiative (CURI) grants (CURI-04 and CURI-05) from the State of Utah. The author would like to thank the participants of the 2005 AAAI Spring Workshop on Dialogical Robots at Stanford University for their comments on his presentation "Should Rehabilitation Robots be Dialogical?" The present paper has benefited a great deal from the workshop discussions. The author is grateful to four anonymous reviewers for their comments.

9. REFERENCES
[1] Guido. http://www.haptica.com/walker.html. Haptica Corporation.
[2] ViaVoice, IBM ViaVoice Software. http://www3.ibm.com/software/speech/. IBM Corporation.
[3] SAPI Speech SDK 5.1. http://www.microsoft.com/speech/downloads/sdk51/. Microsoft Corporation.
[4] Allen, J., Byron, D., Dzikovska, M., Ferguson, G., Galescu, L., and Stent, A. Towards conversational human-computer interaction. AI Magazine, 22(4):27-37, 2001.
[5] Billard, A. Robota: Clever toy and educational tool. Robotics and Autonomous Systems, 42:259-269, 2003.
[6] Borenstein, J. and Ulrich, I. The GuideCane - a computerized travel guide for the active guidance of blind pedestrians. In Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 1994.
[7] Breazeal, C., Kidd, C., Thomaz, A., Huffman, G., and Berlin, M. Effectiveness of nonverbal communication on efficiency and robustness in human-robot teamwork. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 383-388. IEEE/RSJ, July 2005.
[8] Brooks, R. Elephants don't play chess. In Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, Pattie Maes, Ed., pp. 3-15. MIT/Elsevier, 1991.
[9] Burgard, W., Cremers, A., Fox, D., Hahnel, D., Schulz, D., Steiner, W., and Thrun, S. Experiences with an interactive museum tour-guide robot. Artificial Intelligence, 114(3):3-55, 1999.
[10] Cheng, G. and Zelinsky, A. Supervised autonomy: A framework for human-robot systems. Autonomous Robots, 3(10):251-266, 2001.
[11] Dautenhahn, K., Woods, S., Kaouri, C., Walters, M., Koay, K., and Werry, I. What is a robot companion - friend, assistant or butler? In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1488-1493. Edmonton, Canada, 2005.
[12] Fitzgerald, W. and Firby, R.J. Dialogue systems require a reactive task architecture. In Proceedings of the American Association for Artificial Intelligence (AAAI) Spring Symposium. AAAI, Palo Alto, CA, 2000.
[13] Fong, T., Kaber, D., Lewis, M., Scholtz, J., Schultz, A., and Steinfeld, A. Common metrics for human-robot interaction. http://citeseer.ist.psu.edu/667916.html. 2000.
[14] Fong, T., Nourbakhsh, I., and Dautenhahn, K. A survey of socially interactive robots. Robotics and Autonomous Systems, 42:143-166, 2003.
[15] Fong, T. and Thorpe, C. Vehicle teleoperation interfaces. Autonomous Robots, 2(11):9-18, 2001.
[16] Frankie, J., Rayner, M., and Hockey, B. Accuracy, coverage, and speed: What do they mean to users? In Proceedings of the Workshop on Natural Language Interfaces. The Hague, The Netherlands, 2000.
[17] Gockley, R., Bruce, A., Forlizzi, J., Michalowski, M., Mundell, A., Rosenthall, S., Sellner, B., Simmons, R., Snipes, K., Schultz, A., and Wang, J. Designing robots for long-term social interaction. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2199-2204. IEEE/RSJ, July 2005.
[18] Goodrich, M. and Olsen, D. Seven principles of efficient human-robot interaction. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, pp. 3943-3948. IEEE, October 2003.
[19] Horswill, I. Integrating vision and natural language without central models. In Proceedings of the American Association for Artificial Intelligence (AAAI) Fall Symposium on Embedded Language and Action. AAAI, Cambridge, MA, 1995.
[20] Hulsijn, J., Steetskamp, R., ter Doest, H., van de Burgt, S., and Nijholt, A. Speech sdk 5.1. In Proceedings of the Twente Workshop on Language Technology: Dialogue Management in Natural Language Systems. University of Twente, The Netherlands, 1996.
[21] Huttenrauch, H. and Eklundh, K. To help or not to help a service robot. In 12th IEEE International Workshop on Robot and Human Interactive Communication (ROMAN), pp. 379-384, 2003.
[22] Huttenrauch, H., Green, A., Norman, M., Oestreicher, L., and Eklundh, K. Involving users in the design of a mobile office robot. IEEE Transactions on Systems, Man and Cybernetics, 34(2):113-124, 2004.
[23] Koester, H. User performance with speech recognition: A literature review. Assistive Technology, 13(2):116-130, 2001.
[24] Krikke, J. Robotics research exploits opportunities for growth. Pervasive Computing, pp. 7-10, July-September 2005.
[25] Kulyukin, V. Human-robot interaction through gesture-free spoken dialogue. Autonomous Robots, 3(16):239-257, 2004.
[26] Kulyukin, V., Gharpure, C., De Graw, N., Nicholson, J., and Pavithran, S. A robotic guide for the visually impaired in indoor environments. In Proceedings of the Conference of the Rehabilitation Engineering and Assistive Technology Society of North America (RESNA). RESNA, 2004.
[27] Kulyukin, V., Gharpure, C., and Nicholson, J. RoboCart: Toward robot-assisted navigation of grocery stores by the visually impaired. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 979-984. IEEE/RSJ, July 2005.
[28] Kulyukin, V., Gharpure, C., Sute, P., De Graw, N., Nicholson, J., and Pavithran, S. A robotic wayfinding system for the visually impaired. In Proceedings of the Innovative Applications of Artificial Intelligence (IAAI) Conference. AAAI, 2004.
[29] Kulyukin, V., Sute, P., Gharpure, C., and Pavithran, S. Perception of audio cues in robot-assisted navigation. In Proceedings of the Conference of the Rehabilitation Engineering and Assistive Technology Society of North America (RESNA). RESNA, 2004.
[30] Jet Propulsion Laboratory. Mars Exploration Rover Mission. http://marsrovers.jpl.nasa.gov/home. NASA.
[31] Lamere, P., Kwok, P., Walker, W., Gouvea, E., Singh, R., Raj, B., and Wolf, P. Design of the CMU Sphinx-4 decoder. In 8th European Conference on Speech Communication and Technology (EUROSPEECH), 2003.
[32] Matsui, T., Asoh, H., Fry, J., Motomura, Y., Asano, F., Kurita, T., Hara, I., and Otsu, N. Integrated natural spoken dialogue system of Jijo-2 mobile robot for office services. In Proceedings of the American Association for Artificial Intelligence (AAAI) Annual Conference. AAAI, 1999.
[33] McTear, M. Modelling spoken dialogues with state transition diagrams: Experiences of the CSLU toolkit. In Proceedings of the International Conference on Spoken Language Processing, 1998.
[34] Montemerlo, M., Pineau, J., Roy, N., Thrun, S., and Verma, V. Experiences with a mobile robotic guide for the elderly. In Proceedings of the Annual Conference of the American Association for Artificial Intelligence (AAAI), pp. 587-592. AAAI, 2002.
[35] Mori, H. and Kotani, S. Robotic travel aid for the blind: HARUNOBU-6. In Proceedings of the Second European Conference on Disability, Virtual Reality, and Assistive Technology, 1998.
[36] Olsen, D. and Goodrich, M. Metrics for evaluating human-robot interactions. In Performance Metrics for Intelligent Systems (PERMIS). NIST, September 2003.
[37] Perzanowski, D., Schultz, A., and Adams, W. Integrating natural language and gesture in a robotics domain. In IEEE International Symposium on Computational Intelligence in Robotics and Automation, pp. 247-252. Gaithersburg, MD, USA, September 1998.
[38] Raman, T. Auditory User Interfaces: Toward the Speaking Computer. Kluwer Academic Publishers, Norwell, MA, 1997.
[39] Ross, D. and Blasch, B. Development of a wearable computer orientation system. IEEE Personal and Ubiquitous Computing, 6:49-63, 2002.
[40] Spiliotopoulos, D. Human-robot interaction based on spoken natural language dialogue. In Proceedings of the European Workshop on Service and Humanoid Robots, 2001.
[41] Sute, P. Perception of Audio Cues for the Visually Impaired. Master's Thesis, Department of Computer Science, Utah State University, Logan, UT, 2004.
[42] Torrance, M. Natural Communication with Robots. Master's Thesis, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 1994.
[43] Tran, T., Letowski, T., and Abouchacra, K. Evaluation of acoustic beacon characteristics for navigation tasks. Ergonomics, 43(6):807-827, 2000.
[44] Versweyveld, L. Voice-controlled surgical robot ready to assist in minimally invasive heart surgery. Virtual Medical Worlds Monthly, March 1998.
[45] Yanco, H. Shared User-Computer Control of a Robotic Wheelchair System. Ph.D. Thesis, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 2000.
[46] Yanco, H. and Drury, J. A taxonomy for human-robot interaction. In Proceedings of the AAAI Fall Symposium on Human-Robot Interaction, pp. 111-119, 2002.