Acoustic Speech Recognition Techniques
Audio Signal → Recognized Text
Sonu Kumar Mishra
BE Comp 2015-16
• Introduction
• Historical Survey
• Motivation
• General Model of ASR
• Feature Extraction
• Hidden Markov Model
• Existing Systems
• Developing ASR Systems
• Revolution in ASR
[Figure: speech communication between a human and a computer. Input side (understanding): speech → text → meaning. Output side (generation): meaning → text → speech.]
• The first speech recognition system, Audrey, was developed at Bell Laboratories in 1952. It could recognise digits spoken by a single speaker.
• In the 1970s, Carnegie Mellon University developed the HARPY system, which could recognise 1,011 words, including different pronunciations.
• In the 1980s, new systems based on the Hidden Markov Model (HMM) were introduced. The HMM is a statistical approach and was more robust than earlier techniques.
• Language is the fundamental mode of communication, but communicating effectively with machines in natural language remains a challenge.
• ASR systems can be used to control machines, find content online and help create content.
• About 80% of speech recognition systems and content are available for only 10 major languages, so there is a need to extend these systems to local languages.
• Being able to interact with machines in our own local languages would be a major achievement of the modern era.
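General model of an ASR system (block diagram):
• Speech input
• Analog-to-digital conversion
• Feature extraction (generate the speech "fingerprint")
• Compare and select words: match against word fingerprint templates (HMM)
• Compare and select: sentence-level match against sentence fingerprint templates
• Pick the most probable word
• Output
A training algorithm updates the probability matrices used in the comparison stages.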
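The "compare and select" stage matches the extracted fingerprint against stored word templates. The diagram uses HMM probabilities for this; as a simpler illustration, the Python sketch below swaps in dynamic time warping (DTW), a classic template-matching technique, to score an utterance against each word template and pick the closest one. The function names, the Euclidean frame distance and the toy two-dimensional "fingerprints" are hypothetical, chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors (one frame each)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq, template):
    """Dynamic time warping cost between two sequences of feature vectors.

    DTW lets an utterance spoken faster or slower than the stored
    template still align with it frame by frame.
    """
    n, m = len(seq), len(template)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of seq[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq[i - 1], template[j - 1])
            # Best of: template frame repeated, utterance frame repeated, both advance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(features, templates):
    """Pick the word whose fingerprint template has the lowest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Toy 2-D "fingerprints" (hypothetical; real systems use e.g. MFCC vectors per frame)
templates = {
    "yes": [[1.0, 0.0], [1.0, 1.0]],
    "no": [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
}
utterance = [[1.0, 0.1], [1.0, 0.1], [0.9, 1.0]]
print(recognize_word(utterance, templates))  # yes
```

Note that the three-frame utterance still matches the two-frame "yes" template: DTW absorbs the difference in speaking rate by repeating a template frame.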
Feature extraction: "Divide the sound wave, extract phonemes and represent them using a set of parameters."
• LPCC: low resource requirements, high popularity, easy implementation; suited to a single speaker, a single language and a vocabulary below 300 words.
• MFCC: moderate resource requirements, high popularity, easy-to-moderate implementation; suited to multiple speakers, multiple languages and a moderate vocabulary.
• RASTA-PLP: high resource requirements, low popularity, moderate-to-hard implementation; suited to multiple speakers, multiple languages and a large vocabulary.
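As a rough illustration of the MFCC entry, the sketch below turns the power spectrum of one frame into cepstral coefficients: triangular mel-spaced filters, a logarithm, then a type-II DCT. It assumes the power spectrum has already been computed. The mel formula 2595·log10(1 + f/700), 26 filters and 13 coefficients are common choices rather than values taken from the slide, and the helper names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (a common formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns num_filters rows with n_fft // 2 + 1 weights each (one per FFT bin).
    """
    num_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    # Map each mel point back to an FFT bin index (rounded down)
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for k in range(1, num_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        row = [0.0] * num_bins
        for b in range(left, center):  # rising edge
            row[b] = (b - left) / (center - left)
        for b in range(center, right):  # falling edge, peak 1.0 at center
            row[b] = (right - b) / (right - center)
        filters.append(row)
    return filters


def dct_ii(values, num_coeffs):
    """Unnormalized type-II DCT; decorrelates the log filterbank energies."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def mfcc_from_power_spectrum(power, filters, num_coeffs=13, eps=1e-10):
    """MFCCs for one frame: mel filterbank energies -> log -> DCT."""
    energies = [sum(w * p for w, p in zip(row, power)) for row in filters]
    log_energies = [math.log(e + eps) for e in energies]  # eps avoids log(0)
    return dct_ii(log_energies, num_coeffs)


sample_rate, n_fft = 16000, 512
filters = mel_filterbank(26, n_fft, sample_rate)
flat_power = [1.0] * (n_fft // 2 + 1)  # toy spectrum; real input comes from an FFT
coeffs = mfcc_from_power_spectrum(flat_power, filters)
print(len(coeffs))  # 13
```

The mel spacing puts more filters at low frequencies, where human hearing resolves pitch more finely, which is one reason MFCCs are popular for speech.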
Processing steps:
• Power spectral analysis (FFT)
• First-order derivative (delta)
• Energy normalization
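The three processing steps can be sketched in plain Python: a radix-2 FFT for power spectral analysis of a Hamming-windowed frame, a log-energy normalization, and first-order delta coefficients. The Hamming window, normalizing by subtracting the utterance's peak log energy, and a delta window of 2 frames are assumptions (common choices, not taken from the slide), and the function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame):
    """Power spectral analysis of one frame: Hamming window, FFT, |X[k]|^2 / N."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) ** 2 / n for c in spectrum[: n // 2 + 1]]  # non-negative frequencies only


def normalized_log_energy(frames, eps=1e-10):
    """Log energy per frame, shifted so the loudest frame is 0."""
    energies = [math.log(sum(s * s for s in f) + eps) for f in frames]
    peak = max(energies)
    return [e - peak for e in energies]


def deltas(features, window=2):
    """First-order derivative (delta) of a sequence of feature vectors.

    d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2),
    repeating the first/last frame at the edges.
    """
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(features[t])
        for n in range(1, window + 1):
            nxt, prv = features[min(t + n, T - 1)], features[max(t - n, 0)]
            for i in range(len(d)):
                d[i] += n * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


frames = [[math.sin(0.5 * i) * amp for i in range(8)] for amp in (0.5, 1.0, 2.0)]
spectra = [power_spectrum(f) for f in frames]  # 5 bins per 8-sample frame
energy = normalized_log_energy(frames)         # loudest (last) frame maps to 0.0
delta_energy = deltas([[e] for e in energy])
```

Deltas capture how features change over time, so they are usually appended to the static coefficients rather than replacing them.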
Outdated techniques vs. DNN: DNN-based systems handle multiple users, multiple languages and large vocabularies, but require abundant resources.
Phonetic dictionary (TTS synthesizer)
[Figure: a Markov chain of hidden (latent) states Z1, Z2, ..., Zn, each emitting an observed data point X1, X2, ..., Xn.]
Why HMM?
• Simple to apply to sequential and temporal data.
• Handles real-world applications.
• It works on the principle: new state = f(old state, noise).
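The principle "new state = f(old state, noise)" can be made concrete by sampling from an HMM: each hidden state is drawn from a distribution that depends only on the previous state, and each observation is drawn from a distribution that depends only on the current state. This is a minimal sketch assuming discrete states and discrete observation symbols indexed from 0; `sample_hmm` and the toy probabilities are hypothetical.

```python
import random


def sample_hmm(initial, transition, emission, length, rng=None):
    """Generate hidden states Z1..Zn and observations X1..Xn from a discrete HMM.

    initial[s]       : probability of starting in state s
    transition[s][t] : probability of moving from state s to state t
    emission[s][o]   : probability that state s emits observation symbol o
    """
    rng = rng or random.Random(0)  # fixed seed so runs are reproducible
    states, observations = [], []
    state = rng.choices(range(len(initial)), weights=initial)[0]
    for _ in range(length):
        states.append(state)
        # The observation depends only on the current hidden state
        observations.append(rng.choices(range(len(emission[state])), weights=emission[state])[0])
        # New state = f(old state, noise): the next state depends only on the current one
        state = rng.choices(range(len(transition[state])), weights=transition[state])[0]
    return states, observations


initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
states, observations = sample_hmm(initial, transition, emission, length=5)
print(states, observations)
```

In speech recognition only the observations (acoustic features) are seen; the hidden states (e.g. sub-phone units) have to be inferred.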
An HMM is defined by three sets of probabilities: initial probabilities, transition probabilities and observation probabilities.
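With those three parameter sets, two standard computations follow: the forward algorithm gives the probability of an observation sequence under a model (which lets one word model be scored against another), and the Viterbi algorithm recovers the most probable hidden-state sequence. The sketch assumes discrete observations and uses plain probabilities; real recognizers typically work in log space with continuous (for example Gaussian-mixture or neural) emission models. The toy numbers are illustrative only.

```python
def forward(obs, initial, transition, emission):
    """P(observation sequence | model), summing over all hidden-state paths."""
    n = len(initial)
    alpha = [initial[s] * emission[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n)) * emission[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def viterbi(obs, initial, transition, emission):
    """Most probable hidden-state path for the observations, with its probability."""
    n = len(initial)
    prob = [initial[s] * emission[s][obs[0]] for s in range(n)]
    paths = [[s] for s in range(n)]
    for o in obs[1:]:
        new_prob, new_paths = [], []
        for s in range(n):
            # Best predecessor: like forward(), but max instead of sum
            best = max(range(n), key=lambda p: prob[p] * transition[p][s])
            new_prob.append(prob[best] * transition[best][s] * emission[s][o])
            new_paths.append(paths[best] + [s])
        prob, paths = new_prob, new_paths
    last = max(range(n), key=lambda s: prob[s])
    return paths[last], prob[last]


initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]
print(forward(obs, initial, transition, emission))  # ≈ 0.03628
print(viterbi(obs, initial, transition, emission))  # ([0, 0, 1], ≈ 0.01512)
```

Both algorithms run in time proportional to (sequence length) × (number of states)², instead of enumerating every possible state path.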
Applications:
• Speech recognition
• Facial expression recognition
• Handwriting recognition
• Bioinformatics: analyzing biological data
• Large systems such as Siri, Google Voice and Cortana are based on neural networks.
• High-performance processors.
• AI algorithms.
• Parallel processing.
Building an ASR system from scratch requires time, money, scientists, engineers and computing power; the alternative is to use the built-in support of existing toolkits:
• Kaldi: a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.
• CMU Sphinx: an open-source toolkit for speech recognition, which includes a recognizer library written in C (PocketSphinx) and an adjustable, modifiable recognizer written in Java (Sphinx-4).
Components needed to build a recognizer: acoustic model, language model, input source and dictionary.
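These components fit together in the standard statistical formulation of ASR. Given the feature sequence X extracted from the input source, the recognizer searches for the word sequence Ŵ with the highest posterior probability, and Bayes' rule splits that into an acoustic-model term and a language-model term:

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\, P(W)
```

Here P(X | W) is the acoustic model, built by using the dictionary to expand each word into its phone sequence, and P(W) is the language model. P(X) can be dropped because it does not depend on W. This is the textbook form; practical decoders usually work with log probabilities and add tuning factors such as a language-model weight.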
• Using ASR systems to interact with the devices used in daily life.
• ASR systems that work in local languages.
• Developing neural-network-based ASR systems that work in all major languages.