Speech recognition, history of speech recognition, what is speech recognition, voice recognition software, advantages and disadvantages of speech recognition, voice recognition, voice recognition in operating systems, types of speech recognition
2. History of speech recognition:
1950s and 1960s: Baby Talk
The first speech recognition systems could understand only digits.
(Given the complexity of human language, it makes sense that
inventors and engineers first focused on numbers.) In 1952, Bell
Laboratories designed the "Audrey" system, which recognized digits
spoken by a single voice. Ten years later, IBM demonstrated its
"Shoebox" machine at the 1962 World's Fair; it could understand
16 words spoken in English.
Labs in the United States, Japan, England, and the Soviet Union
developed other hardware dedicated to recognizing spoken
sounds, expanding speech recognition technology to support four
vowels and nine consonants.
These efforts may not sound like much, but they were an
impressive start, especially when you consider how primitive
computers themselves were at the time.
3. 1970S: SPEECH RECOGNITION TAKES OFF
•Speech recognition technology made major strides in the 1970s, thanks to
interest and funding from the U.S. Department of Defense. The DoD's DARPA
Speech Understanding Research (SUR) program, from 1971 to 1976, was one of
the largest of its kind in the history of speech recognition, and among other
things it was responsible for Carnegie Mellon's "Harpy" speech-understanding
system.
• Harpy could understand 1,011 words, approximately the vocabulary of an
average three-year-old. Harpy was significant because it introduced a more
efficient search approach, called beam search, to "prune the finite-state network
of possible sentences," according to Readings in Speech Recognition by Alex
Waibel and Kai-Fu Lee. (The story of speech recognition is very much tied to
advances in search methodology and technology, as Google's entrance into
speech recognition on mobile devices proved just a few years ago.)
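The idea behind beam search can be sketched in a few lines. The toy word network and probabilities below are invented for illustration; the point is the pruning step, which keeps only the best-scoring partial hypotheses at each step instead of exploring every path through the network.

```python
# A minimal sketch of beam search over a toy word network: at each step,
# keep only the best-scoring partial hypotheses (the "beam") rather than
# expanding every possible path. Scores are summed log-probabilities.

def beam_search(network, start, steps, beam_width):
    """Return the best word sequences of length `steps` starting at `start`."""
    beams = [(0.0, [start])]          # each hypothesis is (score, path)
    for _ in range(steps):
        candidates = []
        for score, path in beams:
            for nxt, logp in network.get(path[-1], []):
                candidates.append((score + logp, path + [nxt]))
        # Prune: keep only the top `beam_width` hypotheses.
        candidates.sort(reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy finite-state network: word -> [(next_word, log-probability), ...]
network = {
    "<s>":   [("show", -0.4), ("shore", -1.2)],
    "show":  [("me", -0.3), ("knee", -2.0)],
    "shore": [("me", -1.0)],
    "me":    [("flights", -0.5), ("fights", -1.5)],
    "knee":  [("flights", -1.8)],
}

best = beam_search(network, "<s>", 3, beam_width=2)
print(best[0][1])   # most probable word sequence through the network
```

A narrow beam makes the search fast but risks pruning the path that would have won later; widening the beam trades speed for accuracy, the same trade-off real recognizers tune.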
•The '70s also marked a few other important milestones in speech recognition
technology, including the founding of the first commercial speech recognition
company, Threshold Technology, as well as Bell Laboratories' introduction of a
system that could interpret multiple people's voices.
4. 1980S: SPEECH RECOGNITION TURNS TOWARD PREDICTION
Over the next decade, thanks to new approaches to understanding what
people say, speech recognition vocabulary jumped from a few hundred
words to several thousand words, and had the potential to recognize an
unlimited number of words. One major reason was a new statistical method
known as the hidden Markov model.
Rather than simply using templates for words and looking for sound patterns,
HMM considered the probability of unknown sounds' being words. This
foundation would be in place for the next two decades (see Automatic Speech
Recognition—A Brief History of the Technology Development by B.H. Juang
and Lawrence R. Rabiner).
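The HMM idea can be made concrete with a toy example. All the probabilities below are invented, and a real recognizer works over phones and acoustic features rather than two whole words; the sketch only shows the core move, finding the most probable hidden word sequence for a sequence of observed sounds with the Viterbi algorithm.

```python
# A toy illustration of the hidden Markov model idea (not a real
# recognizer): given a short sequence of observed "sounds", find the
# most probable hidden word sequence with the Viterbi algorithm.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, best hidden-state path) for the observations."""
    # V[t][s] = (probability, path) of the best path ending in state s at time t
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                 V[t - 1][prev][1] + [s])
                for prev in states
            )
            V[t][s] = (prob, path)
    return max(V[-1].values())

states = ["one", "two"]                        # candidate hidden words
start_p = {"one": 0.6, "two": 0.4}
trans_p = {"one": {"one": 0.7, "two": 0.3},    # P(next word | word)
           "two": {"one": 0.4, "two": 0.6}}
emit_p = {"one": {"w": 0.7, "t": 0.1},         # P(observed sound | word)
          "two": {"w": 0.2, "t": 0.8}}

prob, path = viterbi(["w", "t", "t"], states, start_p, trans_p, emit_p)
print(path)
```

This is exactly the "probability of unknown sounds being words" framing: the emission table scores how each word could produce each sound, and the transition table scores how words follow each other.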
Equipped with this expanded vocabulary, speech recognition started to work
its way into commercial applications for business and specialized industry (for
instance, medical use). It even entered the home, in the form of Worlds of
Wonder's Julie doll (1987), which children could train to respond to their voice.
("Finally, the doll that understands you.")
5. In 1990, Dragon launched the first consumer speech recognition
product, Dragon Dictate, for an incredible price of $9,000. Seven years
later, the much-improved Dragon NaturallySpeaking arrived. The
application recognized continuous speech, so you could speak, well,
naturally, at about 100 words per minute. However, you had to train the
program for 45 minutes, and it was still expensive at $695.
In 1996, BellSouth introduced VAL, the first voice portal: a dial-in
interactive voice recognition system that was supposed to give you
information based on what you said on the phone. VAL paved the way
for all the inaccurate voice-activated menus that would plague callers
for the next 15 years and beyond.
6. 2000s: Speech Recognition Plateaus–Until Google Comes Along
By 2001, computer speech recognition had topped out at 80 percent
accuracy, and, near the end of the decade, the technology's progress
seemed to be stalled. Recognition systems did well when the language
universe was limited–but they were still "guessing," with the assistance
of statistical models, among similar-sounding words, and the known
language universe continued to grow as the Internet grew.
Did you know speech recognition and voice commands were built
into Windows Vista and Mac OS X? Many computer users weren't aware
that those features existed. Windows Speech Recognition and OS X's
voice commands were interesting, but not as accurate or as easy to use
as a plain old keyboard and mouse.
7. In 2010, Google added "personalized recognition" to Voice Search
on Android phones, so that the software could record users' voice searches
and produce a more accurate speech model. The company also added
Voice Search to its Chrome browser in mid-2011. Remember how we
started with 10 to 100 words, and then graduated to a few thousand?
Google's English Voice Search system now incorporates 230 billion words
from actual user queries.
And now along comes Siri. Like Google's Voice Search, Siri relies on
cloud-based processing. It draws on what it knows about you to generate
a contextual reply, and it responds to your voice input with personality.
(As my PCWorld colleague David Daw points out: "It's not just fun but
funny. When you ask Siri the meaning of life, it tells you '42' or 'All
evidence to date points to chocolate.' If you tell it you want to hide a
body, it helpfully volunteers nearby dumps and metal foundries.")
Speech recognition has gone from utility to entertainment. The child
seems all grown up.
8. THE FUTURE
Accurate, Ubiquitous Speech
The explosion of voice recognition apps indicates that speech
recognition's time has come, and that you can expect plenty more
apps in the future. These apps will not only let you control your PC by
voice or convert voice to text–they'll also support multiple languages,
offer assorted speaker voices for you to choose from, and integrate into
every part of your mobile devices (that is, they'll overcome Siri's
shortcomings).
The quality of speech recognition apps will improve, too. For
instance, Sensory's TrulyHandsfree Voice Control can hear and
understand you, even in noisy environments.
9. WHAT IS SPEECH RECOGNITION?
Speech recognition is the ability of a machine or program to identify
words and phrases in spoken language and convert them to a machine-readable
format.
Another definition
Speech recognition is an alternative to typing on a keyboard. Put
simply, you talk to your computer or mobile device and your words
appear on the screen. The software was developed to provide a fast
method of writing on a computer and can help people with a variety of disabilities.
It is useful for people with physical disabilities who often find typing
difficult, painful or impossible. Voice-recognition software can also help
those with spelling difficulties, including users with dyslexia, because
recognized words are almost always correctly spelled.
10. However, speech is more than sequences of phones that form words
and sentences. Speech also carries other information: for example, the
prosody of the speech indicates grammatical structures, and the stress
of a word signals its importance or topicality. This information is
sometimes called the paralinguistic content of speech.
11. Advantages
Speech is a very natural way to interact, and it is
not necessary to sit at a keyboard or work with a
remote control.
No training required for users!
Disadvantages
Even the best speech recognition systems
sometimes make errors. If there is noise or some
other sound in the room (e.g. the television or a
kettle boiling), the number of errors will increase.
Speech Recognition works best if the
microphone is close to the user (e.g. in a phone,
or if the user is wearing a microphone). More
distant microphones (e.g. on a table or wall) will
tend to increase the number of errors.
12. Voice recognition software
Voice-recognition software programs work by analyzing
sounds and converting them to text. They also use
knowledge of how English is usually spoken to decide
what the speaker most probably said. Once correctly set
up, the systems should recognize around 95% of what is
said if you speak clearly.
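The two knowledge sources described above, what the audio sounded like and how English is usually spoken, can be sketched as a rescoring step. Everything below is invented for illustration: an acoustic front end (not shown) proposes candidate transcriptions with scores, and a tiny bigram language model favors the candidate that reads like typical English.

```python
import math

# Hypothetical rescoring sketch: combine an acoustic score for each
# candidate transcription with a bigram language-model score, and pick
# the candidate with the best total. All numbers here are invented.

bigram_logp = {
    ("recognize", "speech"): math.log(0.010),
    ("wreck", "a"):          math.log(0.002),
    ("a", "nice"):           math.log(0.008),
    ("nice", "beach"):       math.log(0.004),
}
UNSEEN = math.log(1e-6)   # back-off score for unseen word pairs

def lm_score(words):
    """Sum the bigram log-probabilities of consecutive word pairs."""
    return sum(bigram_logp.get(pair, UNSEEN)
               for pair in zip(words, words[1:]))

# (candidate words, acoustic log-score) -- both candidates sound alike.
candidates = [
    (["recognize", "speech"], -5.0),
    (["wreck", "a", "nice", "beach"], -4.8),
]

best = max(candidates, key=lambda c: c[1] + lm_score(c[0]))
print(" ".join(best[0]))
```

The acoustics alone slightly prefer the longer candidate here, but the language model overrules it, which is the sense in which these systems "decide what the speaker most probably said."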
13. Voice recognition in operating systems
Mobile Devices / Smart phones
Many cell phone handsets have basic dial-by-voice
features built in. Smartphones such as the iPhone or
BlackBerry also support this, and a number of
third-party apps have implemented natural-language
speech recognition support.
15. Smart phones and mobile devices are in the middle
of major innovations in technology to provide
hands-free access to features and navigation, often
called voice commands, voice-enabled, voice
actions or speech recognition. This technology has
major implications as assistive technology for people
with disabilities. As long as a user has a strong, clear
voice, these devices become easier to use and give
increased access to the Internet, to mobile devices,
and to communication.
16. Windows 7 built-in speech recognition
Windows Speech Recognition by Microsoft is the speech recognition
system that comes built into Windows Vista and Windows 7, both of
which include version 8.0 of the Microsoft speech recognition engine.
Speech Recognition is available only in English, French, Spanish,
German, Japanese, Simplified Chinese, and Traditional Chinese, and only in
the corresponding version of Windows. That means that you cannot use the
French speech recognition engine on an English version of Windows.
Windows XP or 2000 only
e-Speaking – software for Windows XP that facilitates use of
the Microsoft Speech API by adding ability to create commands to perform
custom actions.
Microsoft Speech API – Speech recognition functionality included as part of
Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC
Edition. It can also be downloaded as part of the Speech SDK 5.1 for
Windows applications, but since that is aimed at developers building speech
applications, the pure SDK form lacks any user interface, and thus is
unsuitable for end users.
Vestec Inc. - Specializing in Natural Language Understanding and Speech
Recognition solutions. Its ASR, NLU and TTS engines support 17 languages in
server, embedded (on low-cost chips) or cloud-based environments.
18. Types of speech recognition
1. Text-To-Speech:
As it sounds, Text-To-Speech (or TTS) converts
a string of text into an audio clip.
It helps blind users operate computers, but it
can also simply improve the computing
experience. There are several programs
available that perform TTS, some of which
are command-line based (ideal for scripting)
and others which provide a handy GUI.
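One small, self-contained piece of a TTS pipeline that can be sketched without an audio engine is the text-normalization front end, which expands tokens like digits into speakable words before synthesis. The digit-by-digit mapping below is a deliberate simplification; real TTS engines handle numbers, dates, abbreviations, and much more.

```python
# A sketch of one TTS front-end step: text normalization, which expands
# digits into speakable words before any audio is synthesized. Real TTS
# engines handle far more cases (dates, currency, abbreviations, ...).

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize(text):
    """Expand each digit character of a numeric token into its spoken word."""
    words = []
    for token in text.split():
        if token.isdigit():
            words.extend(DIGITS[d] for d in token)   # read digits one by one
        else:
            words.append(token)
    return " ".join(words)

print(normalize("gate 42 is open"))   # -> "gate four two is open"
```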
19. 2. Simple Voice Control/Commands:
This is the most basic form of Speech-To-Text
application. These programs are designed to
recognize a small number of specific, typically
one-word commands and then perform an action.
This is often used as an alternative to an
application launcher, allowing the user, for
instance, to say the word "firefox" and have
the OS open a new browser window.
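The dispatch half of such a system is simple enough to sketch. Assume a recognizer (not shown here) has already converted the audio into a text command; a small table then maps the fixed command vocabulary to actions. The command names and action strings below are invented for illustration.

```python
# A sketch of simple voice control: a recognizer (not shown) turns audio
# into a one-word text command, and this dispatcher maps the small fixed
# command vocabulary to actions. Actions just return strings here; a real
# launcher would start the program instead.

COMMANDS = {
    "firefox": lambda: "opening browser",
    "music":   lambda: "starting music player",
    "lights":  lambda: "toggling lights",
}

def dispatch(recognized_word):
    """Run the action for a recognized one-word command, if any."""
    action = COMMANDS.get(recognized_word.lower())
    if action is None:
        return "unrecognized command"
    return action()

print(dispatch("firefox"))   # -> "opening browser"
```

Restricting the vocabulary to a handful of dissimilar words is what makes this style of recognition robust compared with full dictation.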
20. 3. Full dictation/recognition:
Full dictation/recognition software allows the
user to read full sentences or paragraphs and
transcribes that speech into text on the fly. This
could be used, for instance, to dictate an
entire letter into the window of an email
client. In some cases, these types of
applications need to be trained to your voice,
and they can improve in accuracy the more they
are used.