Speech Recognition Challenges




                    Presenter: Alexandru Chica
Contents

 Speech User Interface basic concepts

    •Speech recognition

    •Speech synthesis

 Speech Recognition Challenges

    •Accuracy

    •User responsiveness

    •Performance

    •Reliability

    •Fault tolerance
Speech User Interface basic concepts

 Speech Recognition

    •The translation of spoken text into written text

    •Pipeline: audio → algorithm → phonetic representation of speech ("#'spit&S#") → text ("speech")

    •Algorithms:
         •Statistical processing
         •Hidden Markov Models
         •Dynamic Time Warping
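
Dynamic Time Warping, one of the algorithms listed above, aligns two sequences that differ in speaking rate. The sketch below is a minimal illustration on one-dimensional feature values, assuming an absolute-difference local cost; the function name and the sample sequences are made up for the example, and a real recognizer would compare multi-dimensional acoustic feature vectors.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the DTW alignment cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allow a match, or stretching either sequence by one step.
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    return cost[n][m]


# Hypothetical feature sequences: the second is the first spoken more slowly.
template = [1.0, 2.0, 3.0]
slow_utterance = [1.0, 2.0, 2.0, 3.0]
other = [3.0, 1.0, 3.0]

same_word_cost = dtw_distance(template, slow_utterance)
other_word_cost = dtw_distance(template, other)
```

The stretched utterance aligns with the template at zero cost, while the unrelated sequence does not, which is the property that makes DTW usable for template matching.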




  Types of speech recognition:
     •Command and control
     •Dictation
Speech User Interface basic concepts

 Speech Recognition Components

    •Audio input (front-end)
    •Grammars – contain commands that can be spoken by the user
    •Acoustic models – language-dependent, used to “define” the language features
    •Recognition algorithms (back-end)


    Diagram: Audio input (front end) → feature extraction → acoustic models, grammars, recognition algorithms (back end) → result
Speech User Interface basic concepts

 Speech Recognition APIs




    •Microsoft SAPI
    •IBM: Embedded ViaVoice
    •Nuance: VoCon
    •VoiceBox Speech Recognition
Speech User Interface basic concepts

 Speech Synthesis

    •The translation of written text into spoken text


     "speech"  →  g2p (grapheme-to-phoneme)  →  "#'spit&S#"
Speech User Interface basic concepts

 Speech Synthesis APIs




    •Microsoft SAPI
    •SoftVoice TTS
    •Apple PlainTalk
    •Nuance: Vocalizer
    •SVOX TTS
    •eSpeak
Speech User Interface basic concepts - Usage

 In car:
     •Control media player / radio stations

     •Control navigation

     •Control phone book and phone activities

     •Find POI locations (POI: point of interest)

     •E-mail/SMS reading

 On the web:
     •HTML 5 speech input

     •Google Search with voice input

     •Reading of web page content
Speech Recognition Challenges – Accuracy

 Audio Input

 Problem: Audio signal quality
 Impact: loss of recognition accuracy

 Solution 1: Echo cancellation

 Solution 2: Beamforming
Speech Recognition Challenges – Accuracy

 Audio Input

 Problem: Talk-over problem
 Impact: loss of recognition accuracy

 Solution: Barge-In

    Diagram: timelines of the TTS prompt and the user's speech
Speech Recognition Challenges – User responsiveness

 Speech Recognition

 Problem: resources are not ready when the user starts to speak the command
 Solution: Delayed speech recognition




    Diagram (Delayed Speech Recognition): resource loading / front-end processing, followed by back-end processing
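
One way to read delayed speech recognition is that the front end keeps buffering audio while the back-end resources are still loading, and decoding starts only once they are ready. The sketch below illustrates that idea under stated assumptions: the class `DelayedRecognizer`, the loader function, and the "decoder" that simply joins bytes are all hypothetical stand-ins, not part of any real speech API.

```python
import threading
import time
from typing import Callable, List, Optional

Decoder = Callable[[List[bytes]], str]


class DelayedRecognizer:
    """Buffer front-end audio until the back-end resources are loaded."""

    def __init__(self, load_resources: Callable[[], Decoder]) -> None:
        self._frames: List[bytes] = []
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._decode: Optional[Decoder] = None
        # Resource loading runs in the background so capture is never blocked.
        loader = threading.Thread(
            target=self._load, args=(load_resources,), daemon=True
        )
        loader.start()

    def _load(self, load_resources: Callable[[], Decoder]) -> None:
        self._decode = load_resources()
        self._ready.set()

    def feed(self, frame: bytes) -> None:
        """Front end: store audio even if the back end is not ready yet."""
        with self._lock:
            self._frames.append(frame)

    def finish(self, timeout: float = 5.0) -> str:
        """Back end: wait for the resources, then decode everything buffered."""
        if not self._ready.wait(timeout):
            raise TimeoutError("recognition resources did not load in time")
        with self._lock:
            frames, self._frames = self._frames, []
        assert self._decode is not None
        return self._decode(frames)


def slow_loader() -> Decoder:
    time.sleep(0.05)  # simulated loading of grammars and acoustic models
    # Placeholder decoder: real decoding is far more involved.
    return lambda frames: b"".join(frames).decode("ascii")


recognizer = DelayedRecognizer(slow_loader)
recognizer.feed(b"play ")  # the user starts speaking before loading is done
recognizer.feed(b"radio")
text = recognizer.finish()
```

The user's first words are not lost, because nothing depends on the back end until `finish` is called.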
Speech Recognition Challenges – User responsiveness

 Speech Recognition

 Problem: synchronization with multiple applications (media, phone, navigation)

 Solution: apply concurrent design patterns

     •Active Object


     •Monitor


     •Double-checked locking
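
As an illustration of the first pattern in the list above, the sketch below shows an Active Object: applications enqueue requests and a single worker thread executes them in arrival order, so callers never touch the recognizer concurrently. The class name `SpeechService`, its methods, and the returned strings are assumptions made for the example.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Callable, List


class SpeechService:
    """Active Object: callers enqueue requests, one worker thread runs them."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self.log: List[str] = []  # only ever modified by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            item = self._requests.get()
            if item is None:  # shutdown marker
                break
            func, future = item
            try:
                future.set_result(func())
            except Exception as exc:
                future.set_exception(exc)

    def _submit(self, func: Callable[[], str]) -> Future:
        future: Future = Future()
        self._requests.put((func, future))
        return future

    def recognize(self, app: str, grammar: str) -> Future:
        """Return immediately; the result arrives through the Future."""
        def job() -> str:
            self.log.append(app)
            return f"{app}:{grammar}"  # placeholder for a recognition result
        return self._submit(job)

    def shutdown(self) -> None:
        self._requests.put(None)
        self._worker.join()


service = SpeechService()
futures = [
    service.recognize(app, "main") for app in ("media", "phone", "navigation")
]
results = [future.result(timeout=5) for future in futures]
service.shutdown()
```

Because all state changes happen on the worker thread, the media, phone, and navigation callers need no locks of their own.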
Speech Recognition Challenges – Performance

 Grammars

 Use cases:

 • Command & Control grammars
     • 200 – 500 commands

 •Navigation grammars
    • 100k+ static data

 •Music grammars
    • 10k+ dynamic data
Speech Recognition Challenges – Performance

 Grammars (1)

 Problem: Grammar size too big
 Impact:
    • increased loading times of files from disk to memory

 Solution: Grammar optimization
     •merging of similar command tokens
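
Merging similar command tokens can be pictured as factoring out a shared leading word, so the grammar stores it once instead of once per command. The sketch below assumes a generic BNF-like `( a | b )` notation and a hypothetical command list; it is not tied to any particular grammar format, and it assumes commands that share a first word each have at least two words.

```python
from collections import defaultdict
from typing import Dict, List


def merge_by_prefix(commands: List[str]) -> List[str]:
    """Merge commands that share their first token into one alternation."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for command in commands:
        head, _, tail = command.partition(" ")
        groups[head].append(tail)
    rules = []
    for head, tails in groups.items():
        if len(tails) == 1:
            # Nothing to merge; keep the command as it was.
            rules.append(f"{head} {tails[0]}".strip())
        else:
            rules.append(f"{head} ({' | '.join(tails)})")
    return rules


# Hypothetical command set for illustration.
commands = ["play song", "play album", "play artist", "call contact", "stop"]
rules = merge_by_prefix(commands)
```

Here five commands collapse into three rules, and the token "play" appears once rather than three times.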
Speech Recognition Challenges – Performance

 Grammars (2)

 •removal / replacement of recursion rules
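
One possible way to replace a recursion rule is to unroll it into a bounded set of alternatives. The sketch below assumes a right-recursive rule of the form `X ::= symbol | symbol X` and a maximum repetition count chosen by the grammar author; the function name and the `<digit>` symbol are illustrative only.

```python
from typing import List


def unroll_repetition(symbol: str, max_repeats: int) -> List[str]:
    """Expand  X ::= symbol | symbol X  into a bounded list of alternatives."""
    if max_repeats < 1:
        raise ValueError("max_repeats must be at least 1")
    return [" ".join([symbol] * count) for count in range(1, max_repeats + 1)]


alternatives = unroll_repetition("<digit>", 3)
```

The trade-off is that utterances longer than the chosen bound are no longer covered by the grammar.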
Speech Recognition Challenges – Performance

 Grammars (3)

 Problem: Grammar token collisions
 Impact:
     • loss of recognition accuracy
 Solution:
     •replacement of collision prone tokens with synonyms
     •adding special pronunciation tokens to collision words


 Examples:

     sum – sun – sung

     bet – bed
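
Collision-prone tokens like the examples above could be flagged automatically before a grammar ships. The sketch below uses edit distance over spelling as a crude proxy for acoustic similarity; this is an assumption for illustration, and a real check would more plausibly compare phonetic transcriptions. The token list and threshold are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, substitution)
            )
        previous = current
    return previous[-1]


def collision_pairs(
    tokens: List[str], max_distance: int = 1
) -> List[Tuple[str, str]]:
    """Return token pairs that are close enough to be confused."""
    return [
        (a, b)
        for a, b in combinations(tokens, 2)
        if edit_distance(a, b) <= max_distance
    ]


pairs = collision_pairs(["sum", "sun", "sung", "bet", "bed", "navigate"])
```

Each flagged pair is a candidate for one of the two fixes listed above: swapping in a synonym or adding a special pronunciation token.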
Speech Recognition Challenges – Performance

 Dynamic Grammars

 Problem: synchronization with USB devices, phones, navigation databases takes
 too much time

 Solution 1: implementation of a caching mechanism
Speech Recognition Challenges – Performance
 Diagram (caching flow):
     •Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files
      (e.g. Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...)
     •Phoneme cache: stores transcriptions
     •Dynamic grammar: add to slot: title <DYN_TITLE>, artist <DYN_ARTIST>
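
A phoneme cache of this kind can be sketched as a lookup table placed in front of the expensive transcription step, so that entries seen during an earlier synchronization are not transcribed again. Everything named below is an assumption: `PhonemeCache`, `fill_slot`, and in particular `fake_g2p`, which is a placeholder and not a real grapheme-to-phoneme converter.

```python
from typing import Callable, Dict, Iterable, List


class PhonemeCache:
    """Remember transcriptions so repeated entries are transcribed once."""

    def __init__(self, transcribe: Callable[[str], str]) -> None:
        self._transcribe = transcribe
        self._cache: Dict[str, str] = {}
        self.misses = 0  # how often the slow transcription step ran

    def get(self, text: str) -> str:
        key = text.strip().lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._transcribe(key)
        return self._cache[key]


def fake_g2p(text: str) -> str:
    # Placeholder only: it marks the text instead of producing phonemes.
    return "#" + text.replace(" ", "_") + "#"


def fill_slot(cache: PhonemeCache, entries: Iterable[str]) -> List[str]:
    """Collect the transcriptions that would be added to a grammar slot."""
    return [cache.get(entry) for entry in entries]


cache = PhonemeCache(fake_g2p)
first_sync = fill_slot(cache, ["One", "U2", "Achtung Baby"])
misses_after_first = cache.misses
# Second sync: the same device plus one hypothetical new entry.
second_sync = fill_slot(cache, ["One", "U2", "Achtung Baby", "New Track"])
misses_after_second = cache.misses
```

On the second synchronization only the new entry reaches the transcription step, which is where the time saving would come from.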
Speech Recognition Challenges – Performance

 Dynamic Grammars

 Solution 2: split the processing in two, and dispatch part of the work to a different
 processor
 Diagram (processing split between CPU1 and CPU2):
     •Use an ID3 parser to read titles, artists, composers, genre, album, etc. from MP3 files
      (e.g. Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...)
     •Preprocessing step
     •Dynamic grammar: add to slot: title <DYN_TITLE>, artist <DYN_ARTIST>
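
Splitting the work in two can be sketched as a two-stage pipeline joined by a queue. In the sketch below, two threads stand in for the two processors, which is an assumption made so the example is self-contained; which stage runs on which CPU in the original system is not specified here, and the function names and sample metadata are illustrative.

```python
import queue
import threading
from typing import Dict, List

_DONE = object()  # marker telling the second stage that no more items follow


def parse_stage(files: List[Dict[str, str]], out: queue.Queue) -> None:
    """First worker: extract the metadata fields needed by the grammar."""
    for tags in files:
        out.put((tags["title"], tags["artist"]))
    out.put(_DONE)


def grammar_stage(inp: queue.Queue, grammar: Dict[str, List[str]]) -> None:
    """Second worker: add each entry to the dynamic grammar slots."""
    while True:
        item = inp.get()
        if item is _DONE:
            break
        title, artist = item
        grammar["<DYN_TITLE>"].append(title)
        grammar["<DYN_ARTIST>"].append(artist)


# Stand-in for the output of an ID3 parser.
files = [{"title": "One", "artist": "U2"}]
grammar: Dict[str, List[str]] = {"<DYN_TITLE>": [], "<DYN_ARTIST>": []}
handoff: queue.Queue = queue.Queue()

workers = [
    threading.Thread(target=parse_stage, args=(files, handoff)),
    threading.Thread(target=grammar_stage, args=(handoff, grammar)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The second stage can start on the first entry while the first stage is still reading later files, so the two halves overlap in time.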
Speech Recognition Challenges – Reliability

 Reliability - the ability of the system to keep operating over time

 Problem: the system has to operate correctly over long periods of time

 Solution 1: automated tests

 Solution 2: drive tests
Speech Recognition Challenges – Fault tolerance

 Problem: Recovery from system failures must be possible

 Solution:

 • the system is modular, with components that communicate via the
   internal car area network.

 • individual components can be restarted without affecting other system
   components
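
The restart behaviour described above can be sketched as a supervisor that checks each component and restarts only the ones that have stopped. The `Component` and `Supervisor` classes and the component names are hypothetical; in a real system the components would be separate processes or control units rather than in-memory objects.

```python
from typing import Dict, Iterable, List


class Component:
    """Minimal stand-in for an independently restartable module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False
        self.starts = 0

    def start(self) -> None:
        self.running = True
        self.starts += 1

    def crash(self) -> None:
        self.running = False


class Supervisor:
    """Start all components and restart only those that have failed."""

    def __init__(self, names: Iterable[str]) -> None:
        self.components: Dict[str, Component] = {
            name: Component(name) for name in names
        }
        for component in self.components.values():
            component.start()

    def check_and_restart(self) -> List[str]:
        restarted = []
        for component in self.components.values():
            if not component.running:
                component.start()
                restarted.append(component.name)
        return restarted


supervisor = Supervisor(["asr", "tts", "audio"])
supervisor.components["asr"].crash()  # simulate a failure in one module
restarted = supervisor.check_and_restart()
```

Only the failed recognizer is started a second time; the other components keep running untouched.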
Speech Recognition Challenges




                    TTS & ASR Demo
Speech Recognition Challenges




                       Questions ?
Speech Recognition Challenges




                        Thank You
