3. Vision-Based UI
Technologies
• Head and face tracking
• Face recognition
• Gaze tracking
• Facial expression analysis
• Lip reading
• Gesture recognition
• Body tracking
Applications
• Keyboardless UI
• Speech assistance
• Games
• Social interfaces
• Avatar control
• Virtual environments
User Properties
• Presence
• Location
• Identity
• Expression
• Gesture
• Focus of attention
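The user properties above can be thought of as one record that the vision pipeline keeps up to date. A minimal sketch, assuming a hypothetical face-detection result as input (field and function names are illustrative, not from any particular system):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class UserState:
    """Properties a vision-based UI might maintain about the user."""
    present: bool = False
    location: Optional[Tuple[int, int]] = None   # image coordinates of the head
    identity: Optional[str] = None               # from face recognition
    expression: Optional[str] = None             # from expression analysis
    gesture: Optional[str] = None                # from gesture recognition
    focus_of_attention: Optional[str] = None     # from gaze/head-pose tracking

def update_from_face_detection(state: UserState, face_box):
    """Fold one (hypothetical) face-detection result into the user state."""
    if face_box is None:
        state.present = False
        state.location = None
    else:
        x, y, w, h = face_box
        state.present = True
        state.location = (x + w // 2, y + h // 2)  # center of the face box
    return state

state = update_from_face_detection(UserState(), (100, 60, 80, 80))
```

Each recognizer (identity, expression, gesture) would update its own field the same way, leaving the rest of the record untouched.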
4. Issues
• Is the event-based model appropriate?
• What is a perceptual event?
• Is there a useful, reliable subset?
• Non-deterministic events
• Future progress (expanding the event set)
• Input/output modalities? (vision, speech, haptic, taste, smell?)
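One way to make the "perceptual event" question concrete: unlike a key press, a perceptual event is non-deterministic and carries a confidence. A minimal sketch (the event types and threshold values are assumptions, not from the talk):

```python
from collections import defaultdict

class PerceptualEventBus:
    """Event-based model extended to non-deterministic perceptual input.

    Each event carries a confidence; handlers subscribe with a minimum
    confidence and never see events below it.
    """
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler, min_confidence=0.5):
        self._handlers[event_type].append((handler, min_confidence))

    def publish(self, event_type, confidence, **data):
        for handler, min_conf in self._handlers[event_type]:
            if confidence >= min_conf:
                handler(confidence=confidence, **data)

bus = PerceptualEventBus()
seen = []
bus.subscribe("face_appeared", lambda **e: seen.append(e), min_confidence=0.7)
bus.publish("face_appeared", confidence=0.9, location=(120, 80))  # delivered
bus.publish("face_appeared", confidence=0.4, location=(10, 10))   # filtered out
```

Expanding the event set then just means publishing new event types; the dispatch model stays the same.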
5. Issues (cont.)
• Allocation of resources
• Multiple goal management
• Training, calibration
• Quality and control of sensors
• Environment restrictions
6. Why PUIs?
• Transfer of natural, social skills
• People already anthropomorphize technology
• Computer technology will be pervasive (ubiquitous), not just “PCs” - device-oriented interfaces often not appropriate
• Examples: Lifelike characters (Peedy), MS Bob (boo!), Office Asst., MS Agent...
7. Future of human-computer environments
• Move away from “glorified typewriter” model (master-slave relationship)
• New physical and social environments for computing (not just for engineers!)
• GUI, mouse and keyboard will be less useful, too constraining
• Ubiquitous computing environment (EasyLiving?)
8. How do people interact?
• Peer-peer or employer-employee relationship
• Modalities
– Touch (haptic)
– Words only (text: books, letters)
– Voice only (speech: phone, radio) (non-speech sound: music, misc. sounds)
– Visual (video: movies, TV)
– Face-to-face (all of the above)
9. Relevant technologies
• Speech/sound recognition
• Natural language processing
• Dialog management
• Vision (recognition and tracking)
• User modeling
• Haptic I/O
• Graphics/animation/visualization
• Text-to-speech
• Sound generation
• Locomotion
• Multi-modal interfaces
AVSP (audio-visual speech processing) is just one (small) intersection of interesting/useful technologies!!
10. VBUI research growth
• CVPR 1991
– 3 out of 146 papers
– blocks, chairs, airplanes
• CVPR 1997
– 30 out of 172 papers
– faces, heads, hands, bodies… (and blocks, chairs, airplanes)
11. Hands-Free PC
(Slide shows a project roadmap diagram; node labels:)
Head Control, Attention Modeling, Disability I, Aware Scroll Bar, 3D View w/ Parallax, Hand Control, Dictation + Hands, Game Control I, Awareness, “MS Hello”, Speech Aid I, PUI, Face Analysis, Face Chat, Speech Aid II, Hands-Free PC, Hands-Plus PC, Hands-Off Web Browser, Disability II, speech input
12. Wall PC
(Slide shows a project roadmap diagram; node labels:)
Hands-Free PC, Arm / Body Gestures, “Peedy” w/ sight, Visual Conductor, Body Chat, Game Control II, Gestures w/ Props, Multiple People in View, Wall PC, MS Home I, Game Control III, Body + Hand + Face Chat, Vision at a Distance
13. Immersive Computing
(Slide shows a project roadmap diagram; node labels:)
Wall PC, Tracking People, Vision At An Angle, Multiple Cameras / Controllable Cameras, Group Chat, Immersive Computing, MS Home II
14. Evolution of User Interfaces
When   Paradigm    Implementation
1950s  None        Switches, punched cards
1970s  Typewriter  Command-line interface
1980s  Desktop     Graphical UI (GUI)
2000s  ???         ???
15. The Next Big Thing in UI?
• Multimodal UI
– naturally co-occurring modalities
• Tangible UI
– coupling of physical objects and digital data
• Immersive environments
– Wearable computers, VR, AR, smart rooms...
• Post-WIMP interaction techniques
– 2D/3D widgets, sketching, gestures...
• Natural or SILK Interfaces
– speech, image, language, knowledge base
16. Evolution of User Interfaces
When   Paradigm             Implementation
1950s  None                 Switches, punched cards
1970s  Typewriter           Command-line interface
1980s  Desktop              Graphical UI (GUI)
2000s  Natural Interaction  Perceptual UI (PUI)
17. Perceptual User Interfaces
• Goal: For people to be able to interact with computers in a similar fashion to how they interact with each other and with the physical world
• Bidirectional - both human and machine perception
• Integrate speech, vision, language, haptics, UI, dialog management, learning…
Highly interactive, multimodal interfaces modeled after
natural human-to-human interaction
18. What is This Good For?
• Both control and awareness
• Redundant channels for heterogeneous users, tasks
• Ubiquitous computing scenarios
• Affective and social interfaces
– Transfer of natural social skills - easy to learn
– People already anthropomorphize technology (Reeves & Nass)
• Augmenting human-human communication
• Back channels of communication (e.g., nodding, “hmm”)
• Leverage human capabilities
– People perceive and do multiple things at once
• Focus on activity, not just tasks
19. Examples
Control
• Speech and pen gesture
• Two-handed input
• Text and pen gesture
• Gesture and gaze
• Speech, gesture, keyboard
Context
• Human presence and identity
• Backchannel info
• Facial expression
• Level of interest, engagement
• Focus of attention
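The "speech and pen gesture" style of control is classically resolved by late fusion: a spoken deictic ("that", "there") is bound to the pointing gesture nearest in time. A minimal sketch; the window size and event shapes are assumptions for illustration:

```python
# "Put-that-there"-style late fusion: resolve a spoken deictic command
# against the most recent pointing gesture (timestamps in seconds).
FUSION_WINDOW = 1.0  # max time gap between speech and gesture (assumed)

def fuse(speech_event, gesture_events):
    """Return (command text, pointed target or None)."""
    text = speech_event["text"]
    if "that" not in text and "there" not in text:
        return text, None  # nothing deictic to resolve
    # candidate gestures close enough in time to the utterance
    candidates = [g for g in gesture_events
                  if abs(g["time"] - speech_event["time"]) <= FUSION_WINDOW]
    if not candidates:
        return text, None
    # bind the deictic to the temporally nearest pointing gesture
    best = min(candidates, key=lambda g: abs(g["time"] - speech_event["time"]))
    return text, best["target"]

speech = {"time": 5.2, "text": "delete that"}
gestures = [{"time": 3.0, "target": "icon_a"}, {"time": 5.0, "target": "icon_b"}]
text, target = fuse(speech, gestures)  # "that" resolves to icon_b
```

Gesture-and-gaze or speech-and-gaze combinations fuse the same way, with gaze fixations standing in for pointing events.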
25. What Your Brain Does
(Slide shows an annotated news photo — Clinton greeting Lewinsky, CNN caption, Washington 1995? — with the region labels a viewer effortlessly produces:)
Almost certain to be Bill Clinton; gray hair; neck; right ear; left eye (open); nose; cheek; Armani suit; white shirt; lapel; Clinton occluding Monica; Monica Lewinsky; Monica’s mouth (smiling); right eye (open); dark brown hair; pony tail; woman’s dress suit; necklace; person contour; person with glasses in crowd; dark circular overlay; illuminated from above; CNN caption
28. Perceptual User Interfaces
• Goal: For people to be able to interact with computers in a similar fashion to how they interact with each other and with the physical world
• Bidirectional - both human and machine perception
• Integrate speech, vision, language, haptics, UI, dialog management, learning, perception/cognitive abilities…
• Emphasis on transparent, passive sensing where appropriate
Highly interactive, multimodal interfaces modeled after
natural human-to-human interaction
29. Examples
Control
• Speech and pen gesture
• Two-handed input
• Text and pen gesture
• Gesture and gaze
• Speech, gesture, keyboard
• Body gesture and speech
Context
• Human presence and identity
• Backchannel info
• Facial expression
• Level of interest, engagement
• Focus of attention
31. Vision-Based Interfaces (VBI)
• Visual cues are important in communication!
• Useful visual cues
– presence
– location
– identity (and age, sex, nationality, etc.)
– facial expression
– body language
– attention (gaze direction)
– gestures for control and communication
– lip movement
– activity
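Of the cues above, attention (gaze direction) has a particularly simple first approximation: treat the user as attending to the display whenever head pose stays within a small angular cone. A crude sketch; the cone size is an assumption, not from the talk:

```python
import math

ATTENTION_CONE_DEG = 15.0  # illustrative threshold, not from any system

def looking_at_display(yaw_deg, pitch_deg, cone=ATTENTION_CONE_DEG):
    """True if the head is oriented within `cone` degrees of the display.

    yaw/pitch are head-pose angles in degrees, 0/0 = facing the screen.
    """
    angle = math.hypot(yaw_deg, pitch_deg)  # angular distance from straight-ahead
    return angle <= cone

looking_at_display(5.0, 3.0)    # roughly facing the screen
looking_at_display(40.0, 0.0)   # turned well away
```

Real gaze trackers estimate eye direction rather than head pose, but head orientation alone is often enough for coarse focus-of-attention cues.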
33. Elements of VBI
Head tracking
Gaze tracking
Lip reading
Face recognition
Facial expression
Hand tracking
Hand gestures
Arm gestures
Body tracking
Activity analysis
34. VBI Research Projects
• User tracking (“draping”)
• Appearance-based gesture recognition
• 3D articulated body tracking
Common themes:
– fast
– software only
– interactive
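Appearance-based gesture recognition, at its simplest, classifies an observed image patch by comparing it against stored templates and picking the nearest one. A toy sketch (the feature vectors and gesture labels are invented for illustration; real systems use richer image features):

```python
def nearest_template(observation, templates):
    """Return the label of the template closest (squared L2) to the observation."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: dist(observation, templates[label]))

# Toy appearance templates: each gesture as a tiny flattened feature vector.
templates = {
    "open_hand": [1.0, 1.0, 1.0, 0.0],
    "fist":      [0.0, 0.0, 1.0, 1.0],
    "point":     [1.0, 0.0, 0.0, 1.0],
}
label = nearest_template([0.9, 0.8, 1.0, 0.1], templates)  # -> "open_hand"
```

This nearest-neighbor matching is also why such systems can be fast, software-only, and interactive: no 3D model fitting is required.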