3. Speech User Interface basic concepts
Speech Recognition
•The translation of spoken text into written text
Phonetic representation of speech "#'spit&S#" → algorithm → "speech"
•Statistical Processing
•Hidden Markov Models
•Dynamic Time Warping
Types of speech recognition:
•Command and control
•Dictation
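Dynamic Time Warping, listed above, can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a sequence of single numbers (real systems compare feature vectors), and the names `dtw_distance`, `template`, `slow` and `other` are illustrative only.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


template = [1, 2, 3, 2, 1]               # stored reference pattern
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]    # same pattern spoken at half speed
other = [3, 3, 1, 1, 3]                  # a different pattern

same_cost = dtw_distance(template, slow)
other_cost = dtw_distance(template, other)
```

The slower utterance aligns with the template at zero cost, while the different pattern does not, which is why the technique tolerates variations in speaking rate.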
4. Speech User Interface basic concepts
Speech Recognition Components
•Audio input (front-end)
•Grammars – contain commands that can be spoken by the user
•Acoustic models – language-dependent, used to “define” the language features
•Recognition algorithms (back-end)
Diagram: Audio input → Front end (feature extraction) → Back end (recognition algorithms, using acoustic models and grammars) → result
5. Speech User Interface basic concepts
Speech Recognition APIs
•Microsoft SAPI
•IBM: Embedded ViaVoice
•Nuance: VoCon
•VoiceBox Speech Recognition
6. Speech User Interface basic concepts
Speech Synthesis
•The translation of written text into spoken text
"speech" → g2p (grapheme-to-phoneme conversion) → "#'spit&S#"
7. Speech User Interface basic concepts
Speech Synthesis APIs
•Microsoft SAPI
•SoftVoice TTS
•Apple PlainTalk
•Nuance: Vocalizer
•SVOX TTS
•eSpeak
8. Speech User Interface basic concepts - Usage
In car:
•Control media player / radio stations
•Control navigation
•Control phone book and phone activities
•Find POI locations (POI: point of interest)
•E-mail/SMS reading
On the web:
•HTML 5 speech input
•Google Search with voice input
•Reading of web page content
9. Speech Recognition Challenges – Accuracy
Audio Input
Problem: Audio signal quality
Impact: loss of recognition accuracy
Solution 1: Echo cancellation
Solution 2: Beamforming
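As an illustration of Solution 2, the sketch below shows delay-and-sum, one simple form of beamforming; the slide does not say which variant is used. It assumes integer sample delays that are already known, and `delay_and_sum`, `mic1` and `mic2` are hypothetical names.

```python
def delay_and_sum(channels, delays):
    """Shift each microphone channel by its delay (in samples) and average."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    out = []
    for t in range(length):
        total = sum(ch[t + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out


# The same pulse reaches the second microphone one sample later.
mic1 = [0.0, 1.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0]

steered = delay_and_sum([mic1, mic2], delays=[0, 1])    # aligned on the speaker
unsteered = delay_and_sum([mic1, mic2], delays=[0, 0])  # no alignment
```

With the correct delays the pulse adds up to full amplitude, while without alignment it is smeared to half amplitude. Sound arriving from other directions is attenuated in the same way.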
10. Speech Recognition Challenges – Accuracy
Audio Input
Problem: Talk-over (the user speaks while the TTS prompt is still playing)
Impact: loss of recognition accuracy
Solution: Barge-In
Diagram: TTS prompt and User speech
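A minimal sketch of the barge-in idea: the prompt is played frame by frame, and playback stops as soon as the microphone energy suggests that the user has started to speak. The energy threshold, the frame representation and the function name `play_prompt_with_barge_in` are assumptions made for illustration.

```python
def play_prompt_with_barge_in(prompt_frames, mic_energy, threshold=0.5):
    """Play prompt frames until the microphone energy crosses the threshold.

    Returns (frames_played, barge_in_index). barge_in_index is None when the
    user never interrupted the prompt.
    """
    played = []
    for i, frame in enumerate(prompt_frames):
        if i < len(mic_energy) and mic_energy[i] > threshold:
            # Stop the TTS prompt; recognition starts from frame i.
            return played, i
        played.append(frame)
    return played, None


prompt = ["Please", "say", "a", "command"]
energy = [0.1, 0.1, 0.9, 0.8]  # the user starts talking during the third frame
played, start = play_prompt_with_barge_in(prompt, energy)
```

In practice the microphone also picks up the prompt itself, so barge-in is usually combined with echo cancellation.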
11. Speech Recognition Challenges – User responsiveness
Speech Recognition
Problem: the user starts to speak the command before the recognition resources are ready
Solution: Delayed speech recognition
Diagram (Delayed Speech Recognition): Front-end processing, Resource loading, Back-end processing
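One way to read delayed speech recognition is sketched below: the front end processes audio immediately and buffers its output, and the back end consumes the buffer once resource loading has finished. This interpretation, the class `DelayedRecognizer` and its placeholder front end and back end are assumptions, not the design described on the slide.

```python
from collections import deque


class DelayedRecognizer:
    def __init__(self):
        self.ready = False       # set once the back-end resources are loaded
        self.pending = deque()   # front-end output waiting for the back end
        self.processed = []

    def front_end(self, frame):
        # Placeholder for feature extraction.
        return frame.upper()

    def back_end(self, features):
        # Placeholder for the recognition algorithm.
        self.processed.append(features)

    def on_audio_frame(self, frame):
        features = self.front_end(frame)
        if self.ready:
            self.back_end(features)
        else:
            self.pending.append(features)  # keep the speech, do not drop it

    def on_resources_loaded(self):
        self.ready = True
        while self.pending:
            self.back_end(self.pending.popleft())


rec = DelayedRecognizer()
rec.on_audio_frame("play")    # the user speaks before loading has finished
rec.on_audio_frame("radio")
rec.on_resources_loaded()
rec.on_audio_frame("one")
```

The user's early speech is kept in order and recognized as soon as the back end becomes available.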
13. Speech Recognition Challenges – Performance
Grammars
Use cases:
•Command & Control grammars
  •200 – 500 commands
•Navigation grammars
  •100k+ static data
•Music grammars
  •10k+ dynamic data
14. Speech Recognition Challenges – Performance
Grammars (1)
Problem: Grammar size too big
Impact:
• increased loading times of files from disk to memory
Solution: Grammar optimization
•merging of similar command tokens
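Merging of similar command tokens can be sketched as a prefix tree in which commands that start with the same words share those tokens. The command list and the helper names `build_trie` and `count_tokens` are hypothetical; a real grammar compiler works on its own grammar format.

```python
def build_trie(commands):
    """Store commands word by word so that shared prefixes are kept once."""
    root = {}
    for command in commands:
        node = root
        for token in command.split():
            node = node.setdefault(token, {})
        node["<end>"] = {}  # marks the end of a complete command
    return root


def count_tokens(node):
    """Count the tokens stored in the trie, ignoring end markers."""
    return sum(
        1 + count_tokens(child)
        for token, child in node.items()
        if token != "<end>"
    )


commands = ["call home", "call office", "call mobile", "play radio", "play album"]
flat_size = sum(len(c.split()) for c in commands)
merged_size = count_tokens(build_trie(commands))
```

Here the flat list stores 10 tokens and the merged structure stores 7, because "call" and "play" are kept only once.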
16. Speech Recognition Challenges – Performance
Grammars (3)
Problem: Grammar token collisions
Impact:
• loss of recognition accuracy
Solution:
•replacement of collision prone tokens with synonyms
•adding special pronunciation tokens to collision words
Examples:
sum – sun – sung
bet – bed
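Collision-prone tokens like the ones above can be flagged automatically. The sketch below uses the edit distance between spellings as a rough stand-in; a real system would more plausibly compare phonetic transcriptions. The function names and the distance threshold are assumptions.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def collision_pairs(tokens, max_distance=1):
    """Return token pairs that are close enough to be confused."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


tokens = ["sum", "sun", "sung", "bet", "bed", "radio"]
pairs = collision_pairs(tokens)
```

The flagged pairs are candidates for replacement with synonyms or for special pronunciation tokens.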
17. Speech Recognition Challenges – Performance
Dynamic Grammars
Problem: synchronization with USB devices, phones and navigation databases takes too much time
Solution 1: implementation of a caching mechanism
18. Speech Recognition Challenges – Performance
Use id3 parser to read titles, artists, composers, genre, album, etc. from mp3 files
Example:
Title: One
Artist: U2
Album: Achtung Baby
Genre: rock
...
Diagram: id3 data → transcriptions (Phoneme cache) → dynamic grammar
Dynamic grammar, add to slot:
title <DYN_TITLE>
artist <DYN_ARTIST>
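The phoneme cache can be sketched as a lookup table placed in front of the slow transcription step, so that a second synchronization of the same device does not repeat work. `PhonemeCache`, `fake_transcribe` and `fill_slots` are hypothetical names, and the transcription function is only a placeholder for a real g2p engine.

```python
class PhonemeCache:
    def __init__(self, transcribe):
        self._transcribe = transcribe  # slow text-to-phoneme function
        self._cache = {}
        self.misses = 0                # how often the slow path was taken

    def get(self, text):
        key = text.lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._transcribe(text)
        return self._cache[key]


def fake_transcribe(text):
    # Placeholder: a real system would run grapheme-to-phoneme conversion.
    return "#" + text.lower() + "#"


def fill_slots(tracks, cache):
    """Build the dynamic grammar slots from parsed id3 tags."""
    return {
        "<DYN_TITLE>": [cache.get(t["title"]) for t in tracks],
        "<DYN_ARTIST>": [cache.get(t["artist"]) for t in tracks],
    }


cache = PhonemeCache(fake_transcribe)
tracks = [{"title": "One", "artist": "U2"}]  # metadata from the slide's example

first_sync = fill_slots(tracks, cache)
second_sync = fill_slots(tracks, cache)      # same device plugged in again
```

After the second synchronization the miss counter is still 2, since both transcriptions were served from the cache.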
19. Speech Recognition Challenges – Performance
Dynamic Grammars
Solution 2: split the processing in two, and dispatch part of the work to a different processor
CPU1: use id3 parser to read titles, artists, composers, genre, album, etc. from mp3 files
Example:
Title: One
Artist: U2
Album: Achtung Baby
Genre: rock
...
Diagram: the remaining steps (preprocessing step, dynamic grammar) are split between CPU1 and CPU2
Dynamic grammar, add to slot:
title <DYN_TITLE>
artist <DYN_ARTIST>
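A minimal sketch of splitting the work in two stages. It assumes that tag parsing stays on the first processor and that the preprocessing step is dispatched to the second; the slide does not spell out the exact split. A single-worker thread pool stands in for the second processor, and `FAKE_FILES`, `parse_tags` and `preprocess` are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parsed metadata, standing in for a real id3 parser.
FAKE_FILES = {"one.mp3": {"title": "One", "artist": "U2"}}


def parse_tags(path):
    # Stage 1 (assumed to run on CPU1): read the tags of one file.
    return FAKE_FILES[path]


def preprocess(tags):
    # Stage 2 (assumed to run on CPU2): placeholder preprocessing step.
    return {field: "#" + text.lower() + "#" for field, text in tags.items()}


def build_dynamic_grammar(paths, worker):
    # Parse locally, hand each result to the worker, then collect the slots.
    futures = [worker.submit(preprocess, parse_tags(p)) for p in paths]
    grammar = {"<DYN_TITLE>": [], "<DYN_ARTIST>": []}
    for future in futures:
        result = future.result()
        grammar["<DYN_TITLE>"].append(result["title"])
        grammar["<DYN_ARTIST>"].append(result["artist"])
    return grammar


with ThreadPoolExecutor(max_workers=1) as cpu2:
    grammar = build_dynamic_grammar(["one.mp3"], cpu2)
```

Because parsing of the next file can start while the previous one is still being preprocessed, the two stages overlap in time.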
20. Speech Recognition Challenges – Reliability
Reliability - the ability of the system to keep operating over time
Problem: system has to operate correctly over long periods of time
Solution 1: automated tests
Solution 2: drive tests
21. Speech Recognition Challenges – Fault tolerance
Problem: Recovery from system failures must be possible
Solution:
•system is modeled in a modular manner, with components that communicate via internal car area network
•individual components can be restarted without affecting other system components
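The restart behaviour described above can be sketched with a small supervisor that checks each component and restarts only the ones that have failed. `Component` and `Supervisor` are hypothetical classes; in the real system the components communicate over the car network rather than through direct method calls.

```python
class Component:
    def __init__(self, name):
        self.name = name
        self.running = True
        self.restarts = 0

    def restart(self):
        self.running = True
        self.restarts += 1


class Supervisor:
    def __init__(self, components):
        self.components = {c.name: c for c in components}

    def check(self):
        """Restart every component that is not running; return their names."""
        restarted = []
        for component in self.components.values():
            if not component.running:
                component.restart()
                restarted.append(component.name)
        return restarted


supervisor = Supervisor([Component("recognizer"), Component("tts")])
supervisor.components["recognizer"].running = False  # simulate a failure
restarted = supervisor.check()
```

Only the failed recognizer is restarted; the TTS component keeps running untouched.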