Speech recognition challenges

Presenter: Alexandru Chica

Contents

Speech User Interface basic concepts

•Speech recognition

•Speech synthesis


Speech Recognition Challenges

•Accuracy

•User responsiveness

•Performance

•Reliability

•Fault tolerance


Speech Recognition

•The translation of spoken words into written text

Phonetic representation of speech -> algorithm -> text:
"#'spit&S#" -> algorithm -> "speech"

•Statistical Processing
•Hidden Markov Models
•Dynamic Time Warping

Types of speech recognition:
•Command and control
•Dictation
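
As an illustration of the Dynamic Time Warping bullet above, here is a minimal sketch that aligns two feature sequences spoken at different speeds. The one-dimensional features, the absolute-difference cost, and the name `dtw_distance` are assumptions made for this example; real recognizers compare multi-dimensional acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: repeat a[i-1], repeat b[j-1], or advance both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow_utterance = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]  # same shape, half speed
print(dtw_distance(template, slow_utterance))
```

The slow utterance aligns with the template at zero cost because warping absorbs the difference in speaking rate.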


Speech Recognition Components

•Audio input (front-end)
•Grammars – contain commands that can be spoken by the user
•Acoustic models – language dependent, used to “define” the language features
•Recognition algorithms (back-end)

Front end: Audio input / feature extraction
Back end: Grammars, Acoustic models, Recognition algorithms -> result


Speech Recognition APIs

•Microsoft SAPI
•IBM: Embedded ViaVoice
•Nuance: VoCon
•VoiceBox Speech Recognition


Speech Synthesis

•The translation of written text into spoken words

"speech" -> g2p (grapheme-to-phoneme) -> "#'spit&S#"


Speech Synthesis APIs

•Microsoft SAPI
•SoftVoice TTS
•Apple PlainTalk
•Nuance: Vocalizer
•SVOX TTS
•eSpeak

Speech User Interface basic concepts - Usage

In car:
•Control media player / radio stations

•Control navigation

•Control phone book and phone activities

•Find POI locations (POI: point of interest)

•E-mail/SMS reading

On the web:
•HTML5 speech input

•Google Search with voice input

•Reading of web page content

Speech Recognition Challenges – Accuracy

Audio Input

Problem: Audio signal quality
Impact: loss of recognition accuracy

Solution 1: Echo cancellation

Solution 2: Beamforming
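
One simple form of beamforming is delay-and-sum: each microphone channel is shifted by the delay with which the speaker's voice reaches it, and the aligned channels are averaged so that uncorrelated noise is attenuated. The sketch below assumes the per-microphone delays are already known as whole samples; `delay_and_sum` and the toy signals are made up for illustration, and a real system would estimate the delays from the array geometry.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average the channels."""
    # Only samples that exist in every shifted channel can be combined.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    output = []
    for t in range(length):
        aligned = [ch[t + d] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output


speech = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
# Microphone 0 hears the speech immediately, with +0.25 noise on every sample.
mic0 = [s + 0.25 for s in speech]
# Microphone 1 hears it two samples later, with -0.25 noise on every sample.
mic1 = [0.0, 0.0] + [s - 0.25 for s in speech]

print(delay_and_sum([mic0, mic1], delays=[0, 2]))
```

In this toy case the two noise terms cancel exactly, so the output equals the clean speech samples.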

Speech Recognition Challenges – Accuracy

Audio Input

Problem: Talk-over problem
Impact: loss of recognition accuracy

Solution: Barge-In

Diagram: timeline of the TTS prompt and the User speech
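
A rough sketch of barge-in: while the TTS prompt is playing, the microphone is monitored, and the prompt is cut off as soon as the user starts to speak. The energy threshold, the frame format, and the names `frame_energy` and `play_prompt_with_barge_in` are assumptions for this example. A real system would also need echo cancellation so the prompt itself is not mistaken for user speech.

```python
def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)


def play_prompt_with_barge_in(prompt_frames, mic_frames, threshold=0.01):
    """Play the prompt frame by frame and stop when the microphone picks up speech.

    Returns (frames_played, barge_in_index); barge_in_index is None if the
    user never interrupted the prompt.
    """
    played = 0
    for index, (_prompt_frame, mic_frame) in enumerate(zip(prompt_frames, mic_frames)):
        if frame_energy(mic_frame) > threshold:
            # User is talking over the prompt: stop TTS, hand audio to the recognizer.
            return played, index
        played += 1  # placeholder for sending _prompt_frame to the loudspeaker
    return played, None


silence = [0.0, 0.0, 0.0, 0.0]
user_speech = [0.5, -0.5, 0.5, -0.5]
prompt = [[0.1] * 4 for _ in range(6)]
microphone = [silence, silence, silence, user_speech, user_speech, user_speech]
print(play_prompt_with_barge_in(prompt, microphone))
```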

Speech Recognition Challenges – User responsiveness

Speech Recognition

Problem: the user starts to speak the command before the resources are ready
Solution: Delayed speech recognition

Diagram: Delayed Speech Recognition (timeline of resource loading, front-end processing and back-end processing)
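
One way to read delayed speech recognition: the front end keeps capturing audio while the back-end resources are still loading, and the buffered frames are replayed to the back end once it is ready. The single-threaded sketch below follows that reading; the class `DelayedRecognizer`, its method names, and the string frames are hypothetical.

```python
from collections import deque


class DelayedRecognizer:
    """Buffers front-end audio frames until the back-end resources are loaded."""

    def __init__(self):
        self.ready = False
        self.pending = deque()   # frames captured before the back end was ready
        self.decoded = []        # frames the back end has consumed, in order

    def on_audio_frame(self, frame):
        if self.ready:
            self._decode(frame)
        else:
            self.pending.append(frame)  # do not drop the start of the command

    def on_resources_loaded(self):
        self.ready = True
        while self.pending:
            self._decode(self.pending.popleft())  # replay in capture order

    def _decode(self, frame):
        self.decoded.append(frame)  # placeholder for real back-end processing


recognizer = DelayedRecognizer()
recognizer.on_audio_frame("frame-1")   # user already speaking
recognizer.on_audio_frame("frame-2")
recognizer.on_resources_loaded()       # grammars and models finished loading
recognizer.on_audio_frame("frame-3")
print(recognizer.decoded)
```

No frame is lost and the order is preserved, so the user does not have to repeat the command.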

Speech Recognition Challenges – User responsiveness

Speech Recognition

Problem: synchronization with multiple applications (media, phone, navigation)

Solution: apply concurrency design patterns

•Active Object

•Monitor

•Double-checked locking
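
As a sketch of the Active Object pattern listed above: applications such as media or phone enqueue requests, and one worker thread owned by the object executes them in order, so the recognizer state needs no locks. The class `ActiveRecognizer` and the request format are assumptions for this example.

```python
import queue
import threading
from concurrent.futures import Future


class ActiveRecognizer:
    """Active Object: callers enqueue requests, one worker thread runs them in order."""

    def __init__(self):
        self._requests = queue.Queue()
        self.log = []  # only the worker thread writes to this
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, app, command):
        """Called from any application thread; returns immediately with a Future."""
        future = Future()
        self._requests.put((app, command, future))
        return future

    def stop(self):
        self._requests.put(None)  # sentinel: tells the worker to exit
        self._worker.join()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:
                break
            app, command, future = item
            self.log.append((app, command))       # placeholder for real work
            future.set_result(f"{app}:{command} done")


recognizer = ActiveRecognizer()
media = recognizer.submit("media", "play")
phone = recognizer.submit("phone", "call")
print(media.result(timeout=2), phone.result(timeout=2))
recognizer.stop()
```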

Speech Recognition Challenges – Performance

Grammars

Use cases:

•Command & Control grammars: 200 – 500 commands

•Navigation grammars: 100k+ static data entries

•Music grammars: 10k+ dynamic data entries


Grammars (1)

Problem: Grammar size is too large
Impact:
• increased loading times of files from disk to memory

Solution: Grammar optimization
•merging of similar command tokens
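
One possible reading of merging similar command tokens is to store commands in a prefix tree, so that leading words shared by several commands are kept only once. The sketch below uses that interpretation; the helper names and the sample commands are invented for illustration and are not taken from any grammar toolkit.

```python
def build_prefix_tree(commands):
    """Store commands word by word so that shared leading words are merged."""
    root = {}
    for command in commands:
        node = root
        for word in command.split():
            node = node.setdefault(word, {})
        node["<end>"] = {}  # marks that a complete command ends here
    return root


def count_word_nodes(tree):
    """Number of word tokens stored in the tree (end markers are not counted)."""
    return sum(1 + count_word_nodes(child) for word, child in tree.items() if word != "<end>")


commands = ["play next song", "play next album", "play previous song", "call home"]
flat_size = sum(len(c.split()) for c in commands)
merged_size = count_word_nodes(build_prefix_tree(commands))
print(flat_size, merged_size)  # 11 tokens as a flat list, 8 after merging
```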


Grammars (2)

•removal / replacement of recursive rules


Grammars (3)

Problem: Grammar token collisions
Impact:
• loss of recognition accuracy
Solution:
•replacement of collision prone tokens with synonyms
•adding special pronunciation tokens to collision words

Examples:

sum – sun – sung

bet – bed
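
Collision-prone tokens like the examples above can be flagged automatically by measuring how similar two tokens are. The sketch below uses edit distance on the spelling as a stand-in; a real check would compare phoneme transcriptions instead. The function names, the distance threshold, and the extra token "radio" are assumptions for this example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def collision_pairs(tokens, max_distance=1):
    """Return token pairs that are close enough to be confused."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


print(collision_pairs(["sum", "sun", "sung", "bet", "bed", "radio"]))
# [('sum', 'sun'), ('sun', 'sung'), ('bet', 'bed')]
```

Pairs reported this way are candidates for a synonym replacement or a special pronunciation token.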


Dynamic Grammars

Problem: synchronization with USB devices, phones, navigation databases takes too much time

Solution 1: implementation of a caching mechanism

Use id3 parser to read titles, artists, composers, genre, album, etc. from mp3 files

Example: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...

Phoneme cache -> transcriptions -> dynamic grammar

Add to slot:
title <DYN_TITLE>
artist <DYN_ARTIST>
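
A minimal sketch of the phoneme cache idea: each metadata string is transcribed only once, and later occurrences (for example the same artist on every track of an album) are served from the cache. `PhonemeCache`, `fake_g2p`, and `fill_slots` are hypothetical names; `fake_g2p` merely stands in for an expensive grapheme-to-phoneme call and does not produce real phonemes.

```python
class PhonemeCache:
    """Remembers transcriptions so each distinct string is transcribed once."""

    def __init__(self, g2p):
        self._g2p = g2p
        self._cache = {}
        self.misses = 0  # number of times the expensive g2p call was needed

    def transcribe(self, text):
        key = text.lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._g2p(key)
        return self._cache[key]


def fake_g2p(text):
    """Hypothetical stand-in for grapheme-to-phoneme conversion."""
    return " ".join(text)


def fill_slots(tracks, cache):
    """Build the transcriptions that would be added to the dynamic grammar slots."""
    return {
        "<DYN_TITLE>": [cache.transcribe(t["title"]) for t in tracks],
        "<DYN_ARTIST>": [cache.transcribe(t["artist"]) for t in tracks],
    }


tracks = [
    {"title": "One", "artist": "U2"},
    {"title": "Mysterious Ways", "artist": "U2"},  # illustrative second track
]
cache = PhonemeCache(fake_g2p)
slots = fill_slots(tracks, cache)
print(cache.misses)  # 3 transcriptions for 4 strings: "U2" is reused
```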


Dynamic Grammars

Solution 2: split the processing in two, and dispatch part of the work to a different processor

CPU1: use id3 parser to read titles, artists, composers, genre, album, etc. from mp3 files

Example: Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...

CPU2: preprocessing step

Dynamic grammar, add to slot:
title <DYN_TITLE>
artist <DYN_ARTIST>
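
The split into two processing stages can be sketched as a producer/consumer pipeline. In the example below two threads connected by a queue stand in for the two processors; on real hardware the second stage would run on the other CPU. Which stage belongs to which CPU, the lowercase "preprocessing", and all function names are assumptions for this example.

```python
import queue
import threading


def parse_tags(files, handoff):
    """Stage 1 (standing in for CPU1): emit the metadata of each file."""
    for tags in files:
        handoff.put(tags)   # placeholder for real ID3 parsing
    handoff.put(None)       # sentinel: no more files


def preprocess(handoff, slots):
    """Stage 2 (standing in for CPU2): prepare entries for the grammar slots."""
    while True:
        tags = handoff.get()
        if tags is None:
            break
        slots["<DYN_TITLE>"].append(tags["title"].lower())
        slots["<DYN_ARTIST>"].append(tags["artist"].lower())


def run_pipeline(files):
    handoff = queue.Queue()
    slots = {"<DYN_TITLE>": [], "<DYN_ARTIST>": []}
    stage1 = threading.Thread(target=parse_tags, args=(files, handoff))
    stage2 = threading.Thread(target=preprocess, args=(handoff, slots))
    stage1.start()
    stage2.start()
    stage1.join()
    stage2.join()
    return slots


print(run_pipeline([{"title": "One", "artist": "U2"}]))
```

Stage 2 starts working on the first file while stage 1 is still reading the next one, which is where the time saving would come from.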

Speech Recognition Challenges – Reliability

Reliability - the ability of the system to keep operating over time

Problem: the system has to operate correctly over long periods of time

Solution 1: automated tests

Solution 2: drive tests

Speech Recognition Challenges – Fault tolerance

Problem: Recovery from system failures must be possible

Solution:

• the system is designed in a modular manner, with components that communicate via the internal car area network.

• individual components can be restarted without affecting other system components
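
The restart behaviour described above can be sketched with a small supervisor that checks each component and restarts only the ones that have failed. `Component`, `Supervisor`, and the component names are hypothetical, and the in-process objects stand in for modules that would really communicate over the car network.

```python
class Component:
    """A separately restartable part of the system (e.g. ASR, TTS, media)."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.restarts = 0

    def start(self):
        self.running = True

    def crash(self):
        self.running = False  # simulates a component failure


class Supervisor:
    """Starts all components and restarts only those that have failed."""

    def __init__(self, components):
        self.components = {c.name: c for c in components}
        for component in components:
            component.start()

    def health_check(self):
        restarted = []
        for component in self.components.values():
            if not component.running:
                component.start()
                component.restarts += 1
                restarted.append(component.name)
        return restarted


supervisor = Supervisor([Component("asr"), Component("tts"), Component("media")])
supervisor.components["asr"].crash()
print(supervisor.health_check())  # ['asr']
```

Only the failed component is restarted; the other two keep running with a restart count of zero.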


TTS & ASR Demo


Questions?


Thank You
