Gesture-controlled applications are typically tied to specific gestures, as well as to specific recognition methods and specific gesture-detection devices. We propose a concern-separation architecture that mediates three concerns: gesture acquisition, gesture recognition, and gestural control. It enables application developers to respond to gesture-independent commands, recognized by plug-in gesture-recognition modules that process gesture data through both device-dependent and device-independent data formats and callbacks. Its feasibility is demonstrated with a sample implementation.
Separating Gesture Detection and Application Control Concerns with a Multimodal Architecture
1. Separating Gesture Detection and Application Control Concerns with a Multimodal Architecture
Luís Fernandes et al.
INESC TEC & UTAD - PORTUGAL
2. Problem
“Most gestures are neither natural nor easy to learn or
remember. (…) Even the simple headshake is puzzling when
cultures intermix”
D.A. Norman (2010)
8. Final thoughts
Results are very preliminary
Open issues and topics for reflection:
Access to low-level data
Diverse set of data formats
9. Bringing User Experience empirical data to gesture-control
and somatic interaction in virtual reality videogames:
an Exploratory Study with a multimodal interaction prototype
google it in December!
Subsequent paper – accepted for publication this November
Good morning everyone,
My name is Luís Fernandes and I’m here to present a Multimodal Architecture to separate gesture detection and application control concerns.
The proposal herein was developed in the context of a corporate-funded innovation project called InMERSE.
In so-called ‘natural’ user interfaces, gestural interaction with the user’s environment is a central element.
They purport to be natural by leveraging users’ pre-existing skills. However, that is not quite the case, since the meaning associated with gestures varies across cultures, social groups, and sometimes even from person to person.
So current gestural command methods for applications will likely become obsolete.
To prevent this from happening, our contribution is an architecture proposal that separates three concerns: gesture acquisition, gesture recognition, and gestural control.
Today, more and more low-cost computational devices and sensors are available, and gestural interaction devices are one such case.
But they differ considerably in detection method and data structure: some use image processing, while others, such as the Myo armband, use electromyography.
To produce a concrete implementation of the proposed architecture, we employed two distinct gesture-acquisition devices: Leap Motion and the recent Kinect 2. Leap Motion samples the space above it at regular intervals to detect the positions of forearms, hands, and fingers.
Kinect 2, although it operates in a similar way, targets the full body. This means the two devices produce different data structures, and each device’s API or SDK ends up being different.
The core of this architecture is therefore the separation of three concerns: gesture acquisition, gesture recognition, and application commands.
In this picture we can see that the different gesture-acquisition devices (Leap Motion and Kinect 2) are interfaced by device-specific Adapter modules;
multiple Decoder modules are plugged in to provide gesture-recognition services tuned to different requirements;
and the core Framework module provides the intermediation and abstraction services, enabling Application modules to react to abstract Commands rather than to the specific gestures that elicited them.
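As a rough illustration of this plug-in structure, the sketch below uses hypothetical interface and type names; they are assumptions for this example, not the actual InMERSE API.

```java
// Illustrative sketch of the plug-in structure (names are assumptions).

// Common abstraction over device-dependent frame formats.
abstract class FrameData {
    final long timestampMillis;
    FrameData(long timestampMillis) { this.timestampMillis = timestampMillis; }
}

// Framework-side sink that Adapters feed with raw frames.
interface FrameSink {
    void onFrame(FrameData frame);
}

// Device-specific Adapter: wraps one acquisition device (e.g. Leap Motion,
// Kinect 2) and pushes its frames into the framework.
interface DeviceAdapter {
    void start(FrameSink sink);  // begin streaming frames to the framework
    void stop();
}

// Plug-in Decoder: inspects recent frames and reports recognized gestures.
interface GestureDecoder {
    // Returns the name of a recognized gesture, or null if none was detected.
    String decode(java.util.List<FrameData> recentFrames);
}
```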
This architecture provides applications with access to three different kinds of data (a sketch of the corresponding listener follows this list):
Commands – gesture-independent information;
Gestures – transparent access to the Decoders’ output;
Basic Data – transparent access to the framework data structures containing the gestural data.
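A minimal sketch of how these three access levels could be exposed to an application, reusing the hypothetical FrameData type from the sketch above; the listener and method names are also assumptions.

```java
// Hypothetical application-facing listener exposing the three access levels.
interface ApplicationListener {
    // Commands: gesture-independent information (highest abstraction level).
    void onCommand(String command);

    // Gestures: transparent access to a Decoder's output.
    void onGesture(String gestureName, String decoderId);

    // Basic Data: transparent access to the framework's gestural data frames.
    void onFrame(FrameData frame);
}
```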
We have developed a prototype implementation of the InMERSE Architecture to ascertain its feasibility.
We store the basic data coming from each device in a Frame Buffer.
Decoders will then try to detect gestures based on that data.
And finally, based on a set of configurations, gestures are mapped to commands, which applications can listen to.
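Putting those three steps together, a simplified sketch of the pipeline might look as follows; it builds on the hypothetical types from the earlier sketches, and the buffer size and configuration entries are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch: raw frames enter a bounded buffer, Decoders inspect the
// buffer, and recognized gestures are mapped to commands via configuration.
class GesturePipeline implements FrameSink {
    private static final int BUFFER_SIZE = 120;  // frames kept for decoding
    private final Deque<FrameData> frameBuffer = new ArrayDeque<>();
    private final List<GestureDecoder> decoders = new ArrayList<>();
    private final List<ApplicationListener> listeners = new ArrayList<>();
    private final Map<String, String> gestureToCommand = new HashMap<>();

    GesturePipeline() {
        // Example configuration entries (hypothetical gesture and command names).
        gestureToCommand.put("swipe_left", "PAN_LEFT");
        gestureToCommand.put("swipe_right", "PAN_RIGHT");
        gestureToCommand.put("pinch_in", "ZOOM_OUT");
        gestureToCommand.put("pinch_out", "ZOOM_IN");
    }

    @Override
    public void onFrame(FrameData frame) {
        if (frameBuffer.size() == BUFFER_SIZE) {
            frameBuffer.removeFirst();           // drop the oldest frame
        }
        frameBuffer.addLast(frame);

        List<FrameData> snapshot = new ArrayList<>(frameBuffer);
        for (GestureDecoder decoder : decoders) {
            String gesture = decoder.decode(snapshot);
            if (gesture == null) {
                continue;                        // nothing recognized yet
            }
            String command = gestureToCommand.get(gesture);
            for (ApplicationListener listener : listeners) {
                listener.onGesture(gesture, decoder.getClass().getSimpleName());
                if (command != null) {
                    listener.onCommand(command); // configured mapping
                }
            }
        }
    }
}
```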
We also included an extra scenario: the need for an application to perform continuous hand-tracking operations, such as dragging virtual items or pointing at virtual elements. Thus, the framework can operate in two different modes: Acquisition mode or Detection mode.
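A tiny sketch of how these two modes might be represented; the mode names follow the talk, but the API itself is an assumption.

```java
// Two operating modes of the framework (the API shown here is an assumption).
enum FrameworkMode {
    ACQUISITION,  // stream continuous tracking data, e.g. for dragging or pointing
    DETECTION     // recognize discrete gestures and emit abstract commands
}

class FrameworkCore {
    private FrameworkMode mode = FrameworkMode.DETECTION;

    void setMode(FrameworkMode newMode) { this.mode = newMode; }
    FrameworkMode getMode() { return mode; }
}
```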
To test the operation of the implemented prototype, we developed a digital signage application, which can be used seamlessly with either the Leap Motion or the Kinect 2 gesture-acquisition device. The user can pan and zoom in and out using hand gestures.
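To illustrate, the signage application could react to abstract commands roughly as below, independently of which device produced them; this reuses the hypothetical ApplicationListener interface from the earlier sketch, and the command names are assumptions.

```java
// Sketch of a device-independent signage application reacting to commands.
class SignageApp implements ApplicationListener {
    @Override
    public void onCommand(String command) {
        switch (command) {
            case "PAN_LEFT":  panContent(-50);   break;
            case "PAN_RIGHT": panContent(50);    break;
            case "ZOOM_IN":   zoomContent(1.25); break;
            case "ZOOM_OUT":  zoomContent(0.8);  break;
            default:          /* ignore unmapped commands */ break;
        }
    }

    // Gesture and frame callbacks are unused: the app works at command level.
    @Override public void onGesture(String gestureName, String decoderId) { }
    @Override public void onFrame(FrameData frame) { }

    private void panContent(int pixels)     { /* move the displayed content */ }
    private void zoomContent(double factor) { /* scale the displayed content */ }
}
```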
The usability and efficiency of this prototype architecture still need to be tested in depth, as the results are very preliminary.
There are still open issues and topics for reflection, requiring further work.
Access to low-level data seems to defeat the purpose of separation of concerns – and it does.
However, it means an application can use the command-based approach whenever possible, and only be tied to lower-level data when that is unavoidable.
While at the moment we only consider skeleton data with joint positions, using traditional polymorphism to accommodate the differences between devices, in the future we should support a more diverse set of data formats, using techniques such as ontology-based representation.
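A small sketch of how traditional polymorphism can cover the current skeleton-based formats, extending the hypothetical FrameData type from earlier; the class and field names are illustrative only.

```java
import java.util.Map;

// Device-independent view of skeleton data: joint name -> 3D position.
abstract class SkeletonFrame extends FrameData {
    SkeletonFrame(long timestampMillis) { super(timestampMillis); }
    abstract Map<String, double[]> jointPositions();
}

// Leap Motion: forearms, hands and fingers only.
class LeapMotionFrame extends SkeletonFrame {
    private final Map<String, double[]> handJoints;
    LeapMotionFrame(long timestampMillis, Map<String, double[]> handJoints) {
        super(timestampMillis);
        this.handJoints = handJoints;
    }
    @Override Map<String, double[]> jointPositions() { return handJoints; }
}

// Kinect 2: full-body skeleton.
class KinectFrame extends SkeletonFrame {
    private final Map<String, double[]> bodyJoints;
    KinectFrame(long timestampMillis, Map<String, double[]> bodyJoints) {
        super(timestampMillis);
        this.bodyJoints = bodyJoints;
    }
    @Override Map<String, double[]> jointPositions() { return bodyJoints; }
}
```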