Fusion engines are fundamental components of multimodal interactive systems: they interpret temporal combinations of deterministic as well as non-deterministic inputs whose meaning can vary according to the context, user and task. While various surveys have already been published on multimodal interactive systems, this paper focuses on the design, specification, construction and evaluation of fusion engines. The article first introduces the adopted terminology and the major challenges that fusion engines aim to solve. It then presents a history of the work on fusion engines, organized according to the main phases of the BRETAM model, followed by a classification of existing approaches. The classification dimensions include the types of applications, the fusion principles and the temporal aspects. Finally, unsolved challenges, such as software frameworks, quantitative evaluation, machine learning and adaptation, sketch future work in the field of fusion engines.
Fusion Engines for Input Multimodal Interfaces: a Survey
1. Special session on Multimodal Fusion
• A survey: Fusion Engines for Multimodal Input
• 5 papers
D. Lalanne (Switzerland), L. Nigay (France),
P. Palanque (France), P. Robinson (UK),
J. Vanderdonckt (Belgium)
2. Multimodal fusion
• Multimodal fusion for
• Perception
• Interaction
• Focus on multimodal interaction
• 4 papers on multimodal interaction
• 1 paper on multimodal perception (first one)
4. Input Fusion Engines
• Multimodal fusion
• Combining and interpreting data from multiple input modalities
• Usage of input modalities
| | Sequential | Parallel |
|---|---|---|
| Combined | Alternate | Synergistic |
| Independent | Exclusive | Concurrent |
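The usage categories listed above can be read as the answers to two yes/no questions: are the modalities combined or independent, and are they used in parallel or sequentially? The following is a minimal Python sketch of that reading, assuming the usual interpretation of this design space (combined + sequential = alternate, combined + parallel = synergistic, independent + sequential = exclusive, independent + parallel = concurrent). The names `Usage` and `classify_usage` are illustrative and do not come from the slides.

```python
from enum import Enum


class Usage(Enum):
    ALTERNATE = "alternate"      # combined, sequential
    SYNERGISTIC = "synergistic"  # combined, parallel
    EXCLUSIVE = "exclusive"      # independent, sequential
    CONCURRENT = "concurrent"    # independent, parallel


def classify_usage(combined: bool, parallel: bool) -> Usage:
    """Map the two design-space axes to one of the four usage categories."""
    if combined:
        return Usage.SYNERGISTIC if parallel else Usage.ALTERNATE
    return Usage.CONCURRENT if parallel else Usage.EXCLUSIVE


# Speech and a pointing gesture produced together and interpreted as one command:
print(classify_usage(combined=True, parallel=True).value)
```

Only the two "combined" cells require a fusion engine; the "independent" cells describe modalities whose inputs are interpreted separately.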
5. Input Fusion Engines
• Why combined usage (sequential, parallel)?
• Natural interaction is multimodal by nature.
• The combination of input modalities increases the bandwidth of the human-computer interaction.
6. Fusion engines
• A very dynamic domain
• ~15 years of contributions: 1993-2008
7. Input Fusion engines
• Some key features
• Multiple and temporal combinations
• Types of data and time synchronization
• Probabilistic inputs
• Non-deterministic inputs
• Robustness
• Error handling
• Adaptation to context
• Context = (user, environment, platform)
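Several of the features above (temporal combination, time synchronization, probabilistic inputs, error handling) can be illustrated together. The sketch below is a hypothetical, simplified fusion step and not the method of any engine named in this deck: it pairs speech hypotheses from an N-best list with gesture events that fall inside a time window, scores each pair by the product of the recognizer confidences, and returns `None` when no pair is good enough. The names `InputEvent` and `fuse`, the 1.5-second window and the 0.3 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InputEvent:
    modality: str      # e.g. "speech" or "gesture"
    value: str         # recognized content
    confidence: float  # recognizer score in [0, 1]
    timestamp: float   # seconds


def fuse(
    speech_nbest: List[InputEvent],
    gestures: List[InputEvent],
    window: float = 1.5,
    min_score: float = 0.3,
) -> Optional[Tuple[InputEvent, InputEvent, float]]:
    """Return the best (speech, gesture, score) pair, or None if none qualifies."""
    best = None
    for s in speech_nbest:
        for g in gestures:
            # Temporal constraint: both events must fall within the window.
            if abs(s.timestamp - g.timestamp) > window:
                continue
            # Probabilistic inputs: combine the two recognizer confidences.
            score = s.confidence * g.confidence
            if score >= min_score and (best is None or score > best[2]):
                best = (s, g, score)
    return best  # None lets the caller handle the error, e.g. ask the user again


speech = [
    InputEvent("speech", "put that there", 0.8, 10.0),
    InputEvent("speech", "cut that hair", 0.4, 10.0),
]
gestures = [
    InputEvent("gesture", "point(120,45)", 0.9, 10.4),
    InputEvent("gesture", "point(5,5)", 0.95, 14.0),  # too late to be paired
]
result = fuse(speech, gestures)
if result is not None:
    print(result[0].value, result[1].value)
```

Adaptation to context could enter such a sketch through the parameters, for instance a wider window for a given user or platform, but that choice is outside what the slides specify.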
12.
| Phase | Reference | Tool/language/program | Fusion: Notation | Fusion: Type | Fusion: Level | Input Devices | Ambiguity Resolution | Time Representation: Quantitative | Time Representation: Qualitative | Application types |
|---|---|---|---|---|---|---|---|---|---|---|
| B | Bolt [4] | Put that here system | None | None | Dialog | Speech, gesture | ? | N | ? | Map manipulation |
| R | Wahlster | XTRA | None | Unification | Dialog | Keyboard, Mouse | | N | Y | Map manipulation |
| | Neal [26] | Cubricon | Generalized Augmented Transition Network | Procedural | Dialog | Speech, Mouse, Keyboard | Proximity-based | N | Y | Map manipulation |
| E | Koons [19] | No name | Parse tree | Frame-based | Dialog | Speech, Eye gaze, Gesture | First solution | Y | Y | 3D World |
| | Nigay [28] | Pac-Amodeus | Melting Pot | Frame-based | Dialog + low level | Speech, Keyboard, Mouse | Context-based resolution | Y | N | Flight Scheduling |
| | Cohen [9] | Quickset | Feature Structure | Unification | Dialog | Pen, Voice | S / G & G / S & N best | Y | N | Simulation System training |
| | Bellik [3] | MEDITOR | None | Frame-based | Dialog + low level | Speech, Mouse | History Buffer | Y | Y | Text Editor |
| | Martin [22] | TYCOON | Set of processes – Guided Propagation Networks | Procedural | Dialog | Speech, Keyboard, Mouse | Probability-based resolution | Y | Y | Edition of graphical user interfaces |
| | Johnston [18] | FST | Finite State Automata | Procedural | Dialog | Speech, pen | Possible (N best) | Y | Y | Corporate Directory |
| T & A | Krahnstoever [20] | iMap | Stream Stamped | Frame-based | Dialog | Speech, gesture | Not given | Y | N | Crisis Management |
| | Dumas [12] | HephaisTK | XML Typed (SMUIML) | Frame-based | Dialog | Speech, Mouse, Phidgets | First one | Y | Y | Meeting assistants |
| | Holzapfel [17] | No Name | Typed Feature Structure | Unification | Dialog | Speech, gesture | N Best list | Y | N | Humanoid Robot |
| | Pfleger [33] | PATE | XML Typed | Unification | Dialog | Speech, pen | N Best list | Y | Y | Bathroom design Tool |
| | Milota [25] | No Name | Multimodal Parse Tree | Unification | Dialog | Speech, Mouse, keyboard, Touchscreen | S / G & G / S | Y | N | Graphic Design |
| | Melichar [24] | WCI | Multimodal Generic Dialog Node | Unification | Dialog | Speech, Mouse, Keyboard | First One | ? | ? | Multimedia DB |
| | Sun [37] | PUMPP | Matrix | Unification | Dialog | Speech, gesture | S / G | N | Y | Traffic Control |
| | Bourguet [7] | Mengine | Finite State machine | Procedural | Low level | Speech, Mouse | Not given | N | Y | No example |
| | Latoschik [21] | No Name | Temporal Augmented Transition Network | Procedural | Dialog | Speech, gesture | Fuzzy constraints | Y | Y | Virtual reality |
| | Bouchet [5] [6], Mansoux [23] | ICARE (Input/Output) | Melting pot | Frame-based | Dialog + low level | Speech, Helmet visor HOTAS, Tactile surface, GPS localization, Magnetometer, Mouse, Keyboard | Context-based resolution | Y | N | Aircraft Cockpit, Authentication, Mobile Augmented Reality systems (Game, Post-it), Augmented Surgery |
| | Navarre [30] | Petshop | Petri nets | Procedural | Dialog + low level | Speech, mouse, Keyboard, Touchscreen | *** | Y | Y | Aircraft Cockpit |
| | Flippo [14] | No Name | Semantic tree | Hybrid | Dialog | Speech, Mouse, Gaze, gesture | Feedback for missing data | Y | N | Collaborative Map |
| | Portillo [34] | MIMUS | Feature Value Structure (DTAC) | Hybrid | Dialog | Speech, Mouse | Knowledgeable agent | Y | N | |
| | Duarte [11] | FAME | Behavioral Matrix | Hybrid | Dialog | Speech, Mouse, Keyboard | Not given | ? | ? | Digital talking Book |
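Several entries in the comparison above list "Unification" as their fusion type. As a rough illustration of the general idea only, and not of how any of the cited engines implements it, the Python sketch below unifies two feature structures represented as nested dictionaries: features present in only one structure are copied, nested structures are unified recursively, and conflicting atomic values make the unification fail. The function name `unify` and the example feature names are hypothetical.

```python
from typing import Any, Dict, Optional

FeatureStructure = Dict[str, Any]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Merge two feature structures; return None if any feature conflicts."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val  # feature only in b: copy it
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)  # nested structures: unify recursively
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            return None  # conflicting atomic values: unification fails
    return result


# Partial interpretation from speech: the action and the kind of object.
from_speech = {"action": "move", "object": {"type": "unit"}}
# Partial interpretation from pen: which object and where.
from_pen = {"object": {"type": "unit", "id": "tank-7"}, "location": (120, 45)}

print(unify(from_speech, from_pen))
print(unify({"action": "move"}, {"action": "delete"}))
```

A failed unification is one simple way to discard incompatible hypotheses, for example when walking down an N-best list.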
14. Special session
Multimodal Fusion
• Content
• A survey
• 5 papers
• Schedule
• 10 min introduction and survey outlook
• 15 min per paper + 5 min questions
• 10 min for questions on the session
D. Lalanne (Switzerland), L. Nigay (France),
P. Palanque (France), P. Robinson (UK),
J. Vanderdonckt (Belgium)
15. Special session
Multimodal Fusion
• H. Mendonça: Agent-based fusion
• B. Dumas: An evaluation framework to benchmark fusion engines
• L. Nigay: CARE-based fusion
• J. Ladry & P. Palanque: Petri net based formal description and execution of fusion engines
• M. Sezgin: Fusion of speech and facial expression recognition