SlideShare una empresa de Scribd logo
1 de 19
Speech Recognition
An Overview

Submitted To:

Submitted By:

Name: Dr. M Syamala Devi

Name: Varun Jain

Designation: Professor

Roll No.: 34

Department: DCSA, Panjab
University, Chandigarh

Class: MCA 5th Semester (M)

Panjab University, Chandigarh
Table of Contents


Introduction



Introduction – A closer look



Terms and Concepts in Speech Recognition



Pronunciation





Utterances
Grammar

How it works


Acoustic Model



Grammar



Recognized Text



Performance Measurement



Standards



Google Search by Voice (GSbV) – A case study


GSbV – A screenshot



GSbV – An example



Conclusions



References
Introduction


Have you ever talked to your computer?



I mean, have you really, really talked to your computer?



Where it actually recognized what you said and then did something as a result?



If you have, then you've used a technology known as speech recognition.



It is the process of converting spoken input to text.



Speech recognition allows you to provide input to an application with your voice.



Speech recognition is thus sometimes referred to as speech-to-text.



In the desktop world, you need a microphone to be able to do this.



The speech recognition process is performed by a software component known as
the speech recognition engine.



The primary function of the speech recognition engine is to process spoken input
and translate it into text that an application understands.
Introduction – A closer look


Upon hearing a voice as input, a speech recognition application can do one of
the following thing:






The application can interpret the result of the recognition as a command. In this
case, the application is a command and control application. An example of a
command and control application is one in which the caller says “check balance”,
and the application returns the current balance of the caller’s account.
If an application handles the recognized text simply as text, then it is considered a
dictation application. In a dictation application, if you said “check balance,” the
application would not interpret the result, but simply return the text “check
balance”.

For example, in response to hearing "Please say coffee, tea, or milk," you say
"coffee" and the speech recognition application you're calling tells you what
the flavor of the day is and then asks if you'd like to place an order.
Terms and Concepts in Speech
Recognition


There are basically three terms that are majorly used in context of speech
recognition concept.



These three are:


Utterance



Pronunciation



Grammar
Utterance


When the user says something, this is known as an utterance.



An utterance is any stream of speech between two periods of silence.



Utterances are sent to the speech engine to be processed.



Silence, in speech recognition, is almost as important as what is spoken, because
silence delineates the start and end of an utterance.



The speech recognition engine is "listening" for speech input.



When the engine detects audio input - in other words, a lack of silence -- the
beginning of an utterance is signaled.



Similarly, when the engine detects a certain amount of silence following the
audio, the end of the utterance occurs.



If the user doesn’t say anything, the engine returns what is known as a silence
timeout - an indication that there was no speech detected within the expected
timeframe, and the application takes an appropriate action, such as re-prompting
the user for input.
Pronunciation


The speech recognition engine uses all sorts of data, statistical models, and
algorithms to convert spoken input into text.



One piece of information that the speech recognition engine uses to process a
word is its pronunciation, which represents what the speech engine thinks a
word should sound like.



Words can have multiple pronunciations associated with them.



For example, the word “the” has at least two pronunciations in the U.S.
English language: “thee” and “thuh.”
Grammar


One must specify the words and phrases that users can say to your
application.



These words and phrases are defined to the speech recognition engine and
are used in the recognition process.



You can specify the valid words and phrases in a number of different ways,
but in application, you do this by specifying a grammar.



A grammar uses a particular syntax, or set of rules, to define the words and
phrases that can be recognized by the engine.



A grammar can be as simple as a list of words, or it can be flexible enough to
allow such variability in what can be said that it approaches natural language
capability.
How it works

Fig. 1 A basic working model of speech recognition
depicting how a general speech recognition works.
Acoustic Model


Acoustic models provide an estimate for the likelihood of the observed
features in a frame of speech given a particular phonetic context.



The features are typically related to measurements of the spectral
characteristics of a time-slice of speech.



While individual recipes for training acoustic models vary in their structure
and sequencing, the basic process involves aligning transcribed speech to
states within an existing acoustic model, accumulating frames associated with
each state, and re-estimating the probability distributions associated with the
state, given the features observed in those frames.
Grammar


A grammar can be as simple as a list of words, or it can be flexible enough to
allow such variability in what can be said that it approaches natural language
capability.



Grammars define the domain, or context, within which the recognition engine
works.



The engine compares the current utterance against the words and phrases in the
active grammars.



If the user says something that is not in the grammar, the speech engine will not
be able to decipher it correctly.



You can define a single grammar for your application, or you may have multiple
grammars.



Chances are, you will have multiple grammars, and you will activate each
grammar only when it is needed.
Recognized Text


It is an output of this model that produces a likelihood if what user has
provided as input commonly known as recognized text.



This text can be processed in two way by the application using this model:


As a command



As a dictation text



As a command, a further processing is made upon the text which serves as
output



As a dictation text, a stream of text is shown as output as a formatted text
just like a text can be shown on screen.
Performance Measurement


The performance of a speech recognition system is measurable.



Perhaps the most widely used measurement is accuracy.



Arguably the most important measurement of accuracy is whether the desired
end result occurred.



This measurement is useful in validating application design.



For example, if the user said "yes," the engine returned "yes," and the "YES"
action was executed, it is clear that the desired end result was achieved.



But what happens if the engine returns text that does not exactly match the
utterance?



For example, what if the user said "nope," the engine returned "no," yet the
"NO" action was executed?
Standard


There are a lot of standards by World Wide Web Consortium (W3C)



Some of them are enlisted below:


VoiceXML



SRGS (Speech Recognition Grammar Specification)



SISR (Semantic Interpretation for Speech Recognition)



SSML (Speech Synthesis Markup Language)



PLS (Pronunciation Lexicon Specification)



CCXML (Call Control eXtensible Markup Language)



MediaCTRL



MSML (Media Server Markup Language)



MSCML (Media Server Control Markup Language)
Google search by voice: A case study


Mobile web search is a rapidly growing area of interest.



Internet-enabled Smart phones account for an increasing share of the mobile
devices sold throughout the world, and most models offer a web browsing
experience that rivals desktop computers in display quality.



Users are increasingly turning to their mobile devices when doing web
searches, driving efforts to enhance the usability of web search on these
devices.



Although mobile device usability has improved, typing search queries can still
be cumbersome, error-prone, and even dangerous in some usage scenarios.
GSbV – A screenshot
GSbV – An example

Figure 3. An example of a speech recognizer app GSbV
Conclusions


Speech recognition will revolutionize the way people conduct business over
the Web and will, ultimately, differentiate world-class e-businesses.



VoiceXML ties speech recognition and telephony together and provides the
technology with which businesses can develop and deploy voice-enabled Web
solutions TODAY!



These solutions can greatly expand the accessibility of Web-based self-service
transactions to customers who would otherwise not have access, and, at the
same time, leverage a business’ existing Web investments.



Speech recognition and VoiceXML clearly represent the next wave of the Web.
References



http://cmusphinx.sourceforge.net/wiki/tutorialconcepts



ftp://public.dhe.ibm.com/software/pervasive/info/products/Introduction_to
_Speech_Reco gnition.pdf



https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved
=0CDAQFjAB&url=http%3A%2F%2Fresearch.google.com%2Fpubs%2Farchive%2F3
6340.pdf&ei=13GVUoW5CY6FrAeZ04GgDg&usg=AFQjCNEWeZqQk1pIkXjtzrNG__
ZR-UUNbA&bvm=bv.57155469,d.bmk&cad=rja



http://en.wikipedia.org/wiki/VoiceXML

Más contenido relacionado

La actualidad más candente

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySeminar Links
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition systemavinash raibole
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionHugo Moreno
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technologySrijanKumar18
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognitionfathitarek
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionAhmed Moawad
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionRHIMRJ Journal
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data miningJimit Rupani
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition systemAlok Tiwari
 
Artificial intelligence in speech recognition
Artificial intelligence in speech recognitionArtificial intelligence in speech recognition
Artificial intelligence in speech recognitionRajanivetha G
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing documenthimadrigupta
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overviewsajanazoya
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technologyKalluri Madhuri
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognitionVinay Jaisriram
 

La actualidad más candente (20)

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition system
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech Recognition
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Artificial intelligence in speech recognition
Artificial intelligence in speech recognitionArtificial intelligence in speech recognition
Artificial intelligence in speech recognition
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overview
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technology
 
Voice recognition
Voice recognitionVoice recognition
Voice recognition
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognition
 

Destacado

Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by IqbalIqbal
 
Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemREHMAT ULLAH
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project reportSarang Afle
 
Dev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM AubertDev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM Aubertaubertlm
 
Blackboard Pattern
Blackboard PatternBlackboard Pattern
Blackboard Patterntcab22
 
Blackboard architecture pattern
Blackboard architecture patternBlackboard architecture pattern
Blackboard architecture patternaish006
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition子毅 楊
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologyAamir-sheriff
 
Rajul computer presentation
Rajul computer presentationRajul computer presentation
Rajul computer presentationNeetu Jain
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challengesAlexandru Chica
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 
Applying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person ShootersApplying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person Shootershbbalfred
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionboddu syamprasad
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition systemAlok Tiwari
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYIJCERT
 

Destacado (19)

Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
 
Dev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM AubertDev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM Aubert
 
Blackboard Pattern
Blackboard PatternBlackboard Pattern
Blackboard Pattern
 
Blackboard architecture pattern
Blackboard architecture patternBlackboard architecture pattern
Blackboard architecture pattern
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition
 
fundamentals of speech recognition
fundamentals of speech recognitionfundamentals of speech recognition
fundamentals of speech recognition
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Rajul computer presentation
Rajul computer presentationRajul computer presentation
Rajul computer presentation
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challenges
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Applying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person ShootersApplying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person Shooters
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 

Similar a Speech recognition an overview

Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemIJERA Editor
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing SystemIRJET Journal
 
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONijistjournal
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech RecognitionThejus Joby
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsR Systems International
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOMsathiyaseelanm
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applicationsUtphala P
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandMohammad Liton Hossain
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For QuechuaAndrea Porter
 
NICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperNICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperClark Hill
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template MatchingIJORCS
 
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech ServerTulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech ServerJason Townsend, MBA
 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...NALESVPMEngg
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfNohaGhoweil
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxKevinSims18
 

Similar a Speech recognition an overview (20)

Seminar
SeminarSeminar
Seminar
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITION
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analytics
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOM
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applications
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice command
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
 
NICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperNICE Speech Analytics - White Paper
NICE Speech Analytics - White Paper
 
Google Voice-to-text
Google Voice-to-textGoogle Voice-to-text
Google Voice-to-text
 
10
1010
10
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
 
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech ServerTulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docx
 

Más de Varun Jain

Mobile devices
Mobile devicesMobile devices
Mobile devicesVarun Jain
 
Java in web 2 0 presentation
Java in web 2 0 presentationJava in web 2 0 presentation
Java in web 2 0 presentationVarun Jain
 
Ready transmesia media
Ready transmesia mediaReady transmesia media
Ready transmesia mediaVarun Jain
 
Computer communications and networks
Computer communications and networksComputer communications and networks
Computer communications and networksVarun Jain
 
The brain computing
The brain computingThe brain computing
The brain computingVarun Jain
 
Data warehousing
Data warehousingData warehousing
Data warehousingVarun Jain
 
Billing software
Billing softwareBilling software
Billing softwareVarun Jain
 
Making power point slides
Making power point slidesMaking power point slides
Making power point slidesVarun Jain
 

Más de Varun Jain (8)

Mobile devices
Mobile devicesMobile devices
Mobile devices
 
Java in web 2 0 presentation
Java in web 2 0 presentationJava in web 2 0 presentation
Java in web 2 0 presentation
 
Ready transmesia media
Ready transmesia mediaReady transmesia media
Ready transmesia media
 
Computer communications and networks
Computer communications and networksComputer communications and networks
Computer communications and networks
 
The brain computing
The brain computingThe brain computing
The brain computing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Billing software
Billing softwareBilling software
Billing software
 
Making power point slides
Making power point slidesMaking power point slides
Making power point slides
 

Último

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Speech recognition an overview

  • 1. Speech Recognition An Overview Submitted To: Submitted By: Name: Dr. M Syamala Devi Name: Varun Jain Designation: Professor Roll No.: 34 Department: DCSA, Panjab University, Chandigarh Class: MCA 5th Semester (M) Panjab University, Chandigarh
  • 2. Table of Contents  Introduction  Introduction – A closer look  Terms and Concepts in Speech Recognition   Pronunciation   Utterances Grammar How it works  Acoustic Model  Grammar  Recognized Text  Performance Measurement  Standards  Google Search by Voice (GSbV) – A case study  GSbV – A screenshot  GSbV – An example  Conclusions  References
  • 3. Introduction  Have you ever talked to your computer?  I mean, have you really, really talked to your computer?  Where it actually recognized what you said and then did something as a result?  If you have, then you've used a technology known as speech recognition.  It is the process of converting spoken input to text.  Speech recognition allows you to provide input to an application with your voice.  Speech recognition is thus sometimes referred to as speech-to-text.  In the desktop world, you need a microphone to be able to do this.  The speech recognition process is performed by a software component known as the speech recognition engine.  The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands.
  • 4. Introduction – A closer look  Upon hearing a voice as input, a speech recognition application can do one of the following thing:    The application can interpret the result of the recognition as a command. In this case, the application is a command and control application. An example of a command and control application is one in which the caller says “check balance”, and the application returns the current balance of the caller’s account. If an application handles the recognized text simply as text, then it is considered a dictation application. In a dictation application, if you said “check balance,” the application would not interpret the result, but simply return the text “check balance”. For example, in response to hearing "Please say coffee, tea, or milk," you say "coffee" and the speech recognition application you're calling tells you what the flavor of the day is and then asks if you'd like to place an order.
  • 5. Terms and Concepts in Speech Recognition  There are basically three terms that are majorly used in context of speech recognition concept.  These three are:  Utterance  Pronunciation  Grammar
  • 6. Utterance  When the user says something, this is known as an utterance.  An utterance is any stream of speech between two periods of silence.  Utterances are sent to the speech engine to be processed.  Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance.  The speech recognition engine is "listening" for speech input.  When the engine detects audio input - in other words, a lack of silence -- the beginning of an utterance is signaled.  Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.  If the user doesn’t say anything, the engine returns what is known as a silence timeout - an indication that there was no speech detected within the expected timeframe, and the application takes an appropriate action, such as re-prompting the user for input.
  • 7. Pronunciation  The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text.  One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like.  Words can have multiple pronunciations associated with them.  For example, the word “the” has at least two pronunciations in the U.S. English language: “thee” and “thuh.”
  • 8. Grammar  One must specify the words and phrases that users can say to your application.  These words and phrases are defined to the speech recognition engine and are used in the recognition process.  You can specify the valid words and phrases in a number of different ways, but in application, you do this by specifying a grammar.  A grammar uses a particular syntax, or set of rules, to define the words and phrases that can be recognized by the engine.  A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.
  • 9. How it works Fig. 1 A basic working model of speech recognition depicting how a general speech recognition works.
  • 10. Acoustic Model  Acoustic models provide an estimate for the likelihood of the observed features in a frame of speech given a particular phonetic context.  The features are typically related to measurements of the spectral characteristics of a time-slice of speech.  While individual recipes for training acoustic models vary in their structure and sequencing, the basic process involves aligning transcribed speech to states within an existing acoustic model, accumulating frames associated with each state, and re-estimating the probability distributions associated with the state, given the features observed in those frames.
  • 11. Grammar  A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.  Grammars define the domain, or context, within which the recognition engine works.  The engine compares the current utterance against the words and phrases in the active grammars.  If the user says something that is not in the grammar, the speech engine will not be able to decipher it correctly.  You can define a single grammar for your application, or you may have multiple grammars.  Chances are, you will have multiple grammars, and you will activate each grammar only when it is needed.
  • 12. Recognized Text  It is an output of this model that produces a likelihood if what user has provided as input commonly known as recognized text.  This text can be processed in two way by the application using this model:  As a command  As a dictation text  As a command, a further processing is made upon the text which serves as output  As a dictation text, a stream of text is shown as output as a formatted text just like a text can be shown on screen.
  • 13. Performance Measurement  The performance of a speech recognition system is measurable.  Perhaps the most widely used measurement is accuracy.  Arguably the most important measurement of accuracy is whether the desired end result occurred.  This measurement is useful in validating application design.  For example, if the user said "yes," the engine returned "yes," and the "YES" action was executed, it is clear that the desired end result was achieved.  But what happens if the engine returns text that does not exactly match the utterance?  For example, what if the user said "nope," the engine returned "no," yet the "NO" action was executed?
  • 14. Standard  There are a lot of standards by World Wide Web Consortium (W3C)  Some of them are enlisted below:  VoiceXML  SRGS (Speech Recognition Grammar Specification)  SISR (Semantic Interpretation for Speech Recognition)  SSML (Speech Synthesis Markup Language)  PLS (Pronunciation Lexicon Specification)  CCXML (Call Control eXtensible Markup Language)  MediaCTRL  MSML (Media Server Markup Language)  MSCML (Media Server Control Markup Language)
  • 15. Google search by voice: A case study  Mobile web search is a rapidly growing area of interest.  Internet-enabled Smart phones account for an increasing share of the mobile devices sold throughout the world, and most models offer a web browsing experience that rivals desktop computers in display quality.  Users are increasingly turning to their mobile devices when doing web searches, driving efforts to enhance the usability of web search on these devices.  Although mobile device usability has improved, typing search queries can still be cumbersome, error-prone, and even dangerous in some usage scenarios.
  • 16. GSbV – A screenshot
  • 17. GSbV – An example Figure 3. An example of a speech recognizer app GSbV
  • 18. Conclusions  Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses.  VoiceXML ties speech recognition and telephony together and provides the technology with which businesses can develop and deploy voice-enabled Web solutions TODAY!  These solutions can greatly expand the accessibility of Web-based self-service transactions to customers who would otherwise not have access, and, at the same time, leverage a business’ existing Web investments.  Speech recognition and VoiceXML clearly represent the next wave of the Web.