IT FOR MANAGERS
REPORT ON
SPEECH RECOGNITION SYSTEM
SUBMITTED TO DR. ROSHAN A. SHEIKH
MARCH, 2009
IQBAL S/O SHAHZAD
REGISTRATION # 9952
MBA(M) - SECTION A
ABSTRACT
This report has been submitted to Dr. Roshan A. Sheikh of Iqra University Karachi, as a
requirement for the completion of the course IT for Managers for MBA students. I have
prepared this brief report on the Speech Recognition System after two weeks of study and
research on the topic. I have done my best in presenting and explaining the concepts and
interpreting the report in its proper form.
This report presents an overview of speech recognition technology, software,
development and applications. It begins with an introduction to Speech Recognition Technology
then it explains how such systems work, and the level of accuracy that can be expected.
Applications of speech recognition technology in education and beyond are then explored. A
brief comparison of the most common systems is presented, as well as notes on the main
centres of speech recognition research in the UK educational sector. The report concludes with
potential uses of speech recognition in education, probable main uses of the technology in the
future, and a selection of key web-based resources. It also covers the software being
used for this purpose in homes and in business environments.
A video is also presented with this report, which shows an example of how speech
recognition can be used in Windows Vista. This video was prepared solely by me on my personal
computer. It is available in the soft copy of the project on the attached CD.
TABLE OF CONTENTS
1. Introduction
   1.1 Introduction
   1.2 A Closer Look
2. Terms and Concepts
   2.1 Utterances
   2.2 Pronunciation
   2.3 Grammar
   2.4 Speaker Dependence
   2.5 Accuracy
   2.6 Training
3. How Speech Recognition Works
   3.1 How Speech Recognition Works
   3.2 Acceptance and Rejection
4. Types of Speech Recognition
   4.1 Isolated Words
   4.2 Connected Words
   4.3 Continuous Speech
   4.4 Spontaneous Speech
   4.5 Voice Verification / Identification
5. Hardware
   5.1 Sound Cards
   5.2 Microphones
   5.3 Computers / Processors
6. Uses / Applications of Speech Recognition
   6.1 Military
      6.1.1 High-Performance Fighter Aircraft
      6.1.2 Helicopters
      6.1.3 Training Air Traffic Controllers
   6.2 People with Disabilities
   6.3 Speech Recognition in Telephony Environment
      6.3.1 Communications Management and Personal Assistants
      6.3.2 General Information
      6.3.3 E-Commerce
   6.4 Potential Uses in Education
   6.5 Computer and Video Games
   6.6 Medical Transcription
   6.7 Mobile Devices
   6.8 Voice Security Systems
7. Future Applications
   7.1 Home / Domestic Appliances
   7.2 Wearable Computers
   7.3 Precision Surgery
8. Speech Recognition Software
   8.1 Free Software
   8.2 Commercial Software
      8.2.1 Dragon NaturallySpeaking
      8.2.2 IBM ViaVoice
      8.2.3 Microsoft Speech Recognition System
      8.2.4 MacSpeech Dictate
      8.2.5 Philips Speech Engine
      8.2.6 Other Commercial Software
9. Conclusion
1. INTRODUCTION
Have you ever talked to your computer? I mean, have you really, really talked to your
computer? Where it actually recognized what you said and then did something as a result? If
you have, then you've used a technology known as speech recognition.
Designing a machine that understands human behavior, particularly the capability of
speaking naturally and responding properly to spoken language, has intrigued engineers and
scientists for centuries. Today speech technologies are commercially available for a limited but
interesting range of tasks. These technologies enable machines to respond correctly and
reliably to human voices, and provide useful and valuable services. While we are still far from
having a machine that converses with humans on any topic like another human, many
important scientific and technological advances have taken place, bringing us closer to the
machines that recognize and understand fluently spoken speech.
Speech recognition, simply, is the process of converting spoken input to text; it is thus
sometimes referred to as speech-to-text. Speech recognition, also referred to as voice
recognition, is software technology that lets the user control computer functions and dictate
text by voice. For example, a person can move the mouse cursor with a voice command, such
as "mouse up"; control application functions, such as opening a file menu; create documents,
such as letters or reports; or start a media player by saying "Music".
1.2 A Closer Look
The speech recognition process is performed by a software component known as the
speech recognition engine. The primary function of the speech recognition engine is to process
spoken input and translate it into text that an application understands. The application can then
do one of two things:
 The application can interpret the result of the recognition as a command. In this case,
the application is a command and control application. An example of a command and
control application is one in which the caller says “check balance”, and the application
returns the current balance of the caller’s account.
 If an application handles the recognized text simply as text, then it is considered a
dictation application. In a dictation application, if you said “check balance,” the
application would not interpret the result, but simply return the text “check balance”.
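To make the distinction concrete, here is a minimal Python sketch of the same recognized text handled both ways; the command table and handler names are hypothetical, not part of any real engine:

# Hypothetical sketch: the same recognized text handled two ways.
def handle_as_command(recognized_text):
    # Command and control: map the text onto an action.
    commands = {"check balance": lambda: "Your balance is $100"}
    action = commands.get(recognized_text.lower())
    return action() if action else "Command not understood"

def handle_as_dictation(recognized_text):
    # Dictation: return the text itself, uninterpreted.
    return recognized_text

print(handle_as_command("check balance"))    # -> Your balance is $100
print(handle_as_dictation("check balance"))  # -> check balance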
Speech recognition is an alternative to traditional methods of interacting with a
computer, such as textual input through a keyboard. An effective system can replace, or reduce
the reliance on, standard keyboard and mouse input. This can especially assist the following:
 People who have few keyboard skills or little experience, who are slow typists, or who do
not have the time or resources to develop keyboard skills.
 Dyslexic people, or others who have problems with character or word use and
manipulation in a textual form.
 People with physical disabilities that affect either their data entry, or ability to read (and
therefore check) what they have entered.
A speech recognition system consists of the following:
 A microphone, for the person to speak into.
 Speech recognition software.
 A computer to take and interpret the speech.
 A good quality soundcard for input and/or output.
 Proper, clear pronunciation by the user.
However, systems on computers meant for more individual use, such as for personal
word processing, usually require a degree of “training” before use. Here, an individual user
“trains” the system to understand words or word fragments (see section 2.6); this training is
often referred to as “enrolment”.
2. TERMS AND CONCEPTS
Following are a few of the basic terms and concepts that are fundamental to speech
recognition. It is important to have a good understanding of these concepts.
2.1 Utterances
When the user says something, this is known as an utterance. An utterance is any
stream of speech between two periods of silence. Utterances are sent to the speech engine to
be processed.
Silence, in speech recognition, is almost as important as what is spoken, because silence
delineates the start and end of an utterance. Here's how it works. The speech recognition
engine is "listening" for speech input. When the engine detects audio input - in other words, a
lack of silence -- the beginning of an utterance is signaled. Similarly, when the engine detects a
certain amount of silence following the audio, the end of the utterance occurs.
Utterances are sent to the speech engine to be processed. If the user doesn’t say
anything, the engine returns what is known as a silence timeout - an indication that there was
no speech detected within the expected timeframe, and the application takes an appropriate
action, such as reprompting the user for input.
An utterance can be a single word, or it can contain multiple words (a phrase or a
sentence). For example, “Word”, “Microsoft Word,” or “I’d like to run Microsoft Word” are all
examples of possible utterances. Whether these words and phrases are valid at a particular
point in a dialog is determined by which grammars are active. Note that there are small
snippets of silence between the words spoken within a phrase. If the user pauses too long
between the words of a phrase, the end of an utterance can be detected too soon, and only a
partial phrase will be processed by the engine.
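As a toy illustration of this endpointing idea, the Python sketch below marks an utterance using a simple frame-energy threshold; the threshold and frame counts are invented, and real engines use far more robust voice-activity detection:

# Toy endpointing: an utterance begins when frame energy rises above a
# threshold and ends after enough consecutive silent frames.
def find_utterance(frames, energy_threshold=0.01, max_silent_frames=30):
    start, silent = None, 0
    for i, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            if start is None:
                start = i            # beginning of utterance signaled
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= max_silent_frames:
                return start, i - silent   # end of utterance detected
    # None here corresponds to a silence timeout: no speech was detected.
    return (start, len(frames)) if start is not None else None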
2.2 Pronunciation
The speech recognition engine uses all sorts of data, statistical models, and algorithms
to convert spoken input into text. One piece of information that the speech recognition engine
uses to process a word is its pronunciation, which represents what the speech engine thinks a
word should sound like.
Words can have multiple pronunciations associated with them. For example, the word
“the” has at least two pronunciations in the U.S. English language: “thee” and “thuh”.
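Internally, engines often hold this information in a pronunciation lexicon mapping each word to one or more phoneme sequences. A minimal Python sketch, using ARPAbet-style phoneme symbols for illustration:

# A tiny pronunciation lexicon: one word may have several pronunciations.
lexicon = {
    "the": [["DH", "IY"],    # "thee"
            ["DH", "AH"]],   # "thuh"
    "word": [["W", "ER", "D"]],
}

for phones in lexicon["the"]:
    print("-".join(phones))   # DH-IY, then DH-AH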
2.3 Grammar
Grammars define the domain, or context, within which the recognition engine works.
The engine compares the current utterance against the words and phrases in the active
grammars. If the user says something that is not in the grammar, the speech engine will not be
able to understand it correctly. For this reason, general-purpose speech engines usually have very large grammars.
Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by
the Speech Recognition system. Generally, smaller vocabularies are easier for a computer to
recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry
need not be a single word; an entry can be as long as a sentence or two. Smaller
vocabularies can have as few as one or two recognized utterances (e.g. "Wake Up"), while very
large vocabularies can have a hundred thousand or more!
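A minimal Python sketch of checking an utterance against the active grammars; the phrase lists are invented for illustration:

# The engine only accepts utterances that appear in an active grammar.
active_grammars = {
    "wake": {"wake up"},
    "apps": {"word", "microsoft word", "i'd like to run microsoft word"},
}

def in_grammar(utterance):
    text = utterance.lower().strip()
    return any(text in phrases for phrases in active_grammars.values())

print(in_grammar("Microsoft Word"))            # True
print(in_grammar("open the pod bay doors"))    # False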
2.4 Speaker Dependence
Speaker dependence describes the degree to which a speech recognition system
requires knowledge of a speaker’s individual voice characteristics to successfully process
speech. The speech recognition engine can “learn” how you speak words and phrases; it can be
trained to your voice.
Speech recognition systems that require a user to train the system to his/her voice are
known as speaker-dependent systems. If you are familiar with desktop dictation systems, most
are speaker dependent like IBM Via Voice. Because they operate on very large vocabularies,
dictation systems perform much better when the speaker has spent the time to train the
system to his/her voice.
Speech recognition systems that do not require a user to train the system are known as
speaker-independent systems. Speech recognition in the VoiceXML world must be speaker-
independent. Think of how many users (hundreds, maybe thousands) may be calling into your
web site. You cannot require that each caller train the system to his or her voice. The speech
recognition system in a voice-enabled web application MUST successfully process the speech of
many different callers without having to understand the individual voice characteristics of each
caller.
2.5 Accuracy
The ability of a recognizer can be examined by measuring its accuracy, or how well it
recognizes utterances. The performance of a speech recognition system is measurable. Perhaps
the most widely used measurement is accuracy. It is typically a quantitative measurement and
can be calculated in several ways. Arguably the most important measurement of accuracy is
whether the desired end result occurred. This measurement is useful in validating application
design. For example, if the user said "yes," the engine returned "yes," and the "YES" action was
executed, it is clear that the desired result was achieved. But what happens if the engine
returns text that does not exactly match the utterance? For example, what if the user said
"nope," the engine returned "no," yet the "NO" action was executed? Should that be
considered a successful dialog? The answer to that question is yes, because the desired result
was achieved.
Another measurement of recognition accuracy is whether the engine recognized the
utterance exactly as spoken. This measure of recognition accuracy is expressed as a percentage
and represents the number of utterances recognized correctly out of the total number of
utterances spoken. It is a useful measurement when validating grammar design. Using the
previous example, if the engine returned "nope" when the user said "no," this would be
considered a recognition error. Based on the accuracy measurement, you may want to analyze
your grammar to determine if there is anything you can do to improve accuracy. For instance,
you might need to add "nope" as a valid word to your grammar. You may also want to check
your grammar to see if it allows words that are acoustically similar (for example,
"repeat/delete," "Austin/Boston," and "Addison/Madison"), and determine if there is any way
you can make the allowable words more distinctive to the engine.
Recognition accuracy is an important measure for all speech recognition applications. It
is tied to grammar design and to the environment of the user. Good ASR (Automatic Speech
Recognition) systems have an accuracy of 98% or more!
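As a concrete illustration, the utterance-level accuracy measure described above can be computed in a few lines; a minimal Python sketch with invented test data:

# Utterance-level recognition accuracy, as a percentage.
def recognition_accuracy(spoken, recognized):
    correct = sum(1 for s, r in zip(spoken, recognized) if s == r)
    return 100.0 * correct / len(spoken)

spoken     = ["no", "yes", "repeat", "delete"]
recognized = ["nope", "yes", "repeat", "repeat"]   # "repeat/delete" confusion
print(recognition_accuracy(spoken, recognized))    # 50.0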
2.6 Training
Some speech recognizers have the ability to adapt to a speaker. When the system has
this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system
is trained by having the speaker repeat standard or common phrases and adjusting its
comparison algorithms to match that particular speaker. Training a recognizer usually improves
its accuracy.
Training can also be used by speakers who have difficulty speaking, or pronouncing
certain words. As long as the speaker can consistently repeat an utterance, ASR systems with
training should be able to adapt.
3. HOW SPEECH RECOGNITION WORKS
Now that we've discussed some of the basic terms and concepts involved in speech
recognition, let's put them together and take a look at how the speech recognition process
works.
As you can probably imagine, the speech recognition engine has a rather complex task
to handle: taking raw audio input and translating it into recognized text that an
application understands. The major components involved are:
 Audio input: the raw digital audio, transformed into a more useful acoustic representation.
 Grammar: applied so the speech recognizer knows which words and phonemes to expect. A
grammar could be anything from a context-free grammar to full-blown English.
 Acoustic model: the engine's knowledge of how speech sounds in the environment in which
it operates.
 Recognized text: the text string the engine returns as its result.
The first thing we want to take a look at is the audio input coming into the recognition
engine. It is important to understand that this audio stream is rarely pristine. It contains not
only the speech data (what was said) but also background noise. This noise can interfere with
the recognition process, and the speech engine must handle (and possibly even adapt to) the
environment within which the audio is spoken.
As we've discussed, it is the job of the speech recognition engine to convert spoken
input into text. To do this, it employs all sorts of data, statistics, and software algorithms. Its
first job is to process the incoming audio signal and convert it into a format best suited for
further analysis. Once the speech data is in the proper format, the engine searches for the best
match. It does this by taking into consideration the words and phrases it knows about (the
active grammars), along with its knowledge of the environment in which it is operating. The
knowledge of the environment is provided in the form of an acoustic model. Once it identifies
the most likely match for what was said, it returns what it recognized as a text string.
Most speech engines try very hard to find a match, and are usually very "forgiving." But
it is important to note that the engine is always returning its best guess for what was said.
[Figure: an example of a digital audio signal]
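For readers who want to experiment with this pipeline end to end, the free CMU Sphinx engine (listed in section 8.1) can be driven from Python via the third-party SpeechRecognition package. This is just one possible route, not the only one, and utterance.wav is a placeholder file name:

# Sketch: raw audio in, recognized text out, via the CMU Sphinx engine.
# Requires the third-party packages SpeechRecognition and pocketsphinx.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("utterance.wav") as source:    # placeholder file name
    audio = recognizer.record(source)            # capture the digital audio
try:
    # The engine searches its language model for the best match
    # and returns its best guess as a text string.
    print(recognizer.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Utterance rejected: no confident match found")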
3.2 Acceptance and Rejection
When the recognition engine processes an utterance, it returns a result. The result can
be either of two states: acceptance or rejection. An accepted utterance is one in which the
engine returns recognized text.
Whatever the caller says, the speech recognition engine tries very hard to match the
utterance to a word or phrase in the active grammar. Sometimes the match may be poor
because the caller said something that the application was not expecting, or the caller spoke
indistinctly. In these cases, the speech engine returns the closest match, which might be
incorrect. Some engines also return a confidence score along with the text to indicate the
likelihood that the returned text is correct.
Not all utterances that are processed by the speech engine are accepted. Acceptance or
rejection is flagged by the engine with each processed utterance.
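As an illustration, an application might act on acceptance, rejection, and confidence like this; the 0.5 threshold and the function shape are invented for the sketch, not taken from any particular engine:

# Accept or reject a recognition result using the engine's confidence score.
def handle_result(text, confidence, threshold=0.5):
    if text is None or confidence < threshold:
        return "REJECTED: please repeat that"   # e.g. reprompt the caller
    return "ACCEPTED: " + text

print(handle_result("check balance", 0.92))   # accepted
print(handle_result("check balance", 0.31))   # rejected, likely a poor match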
4. TYPES OF SPEECH RECOGNITION
Speech recognition systems can be separated into several different classes by describing
what types of utterances they have the ability to recognize. These classes are based on the fact
that one of the difficulties of ASR is the ability to determine when a speaker starts and finishes
an utterance. Most packages can fit into more than one class, depending on which mode
they're using.
4.1 Isolated Words
Isolated word recognizers usually require each utterance to have quiet (lack of an audio
signal) on BOTH sides of the sample window. This doesn't mean that the system accepts only
single words; rather, it requires a single utterance at a time. Often, these systems have
"Listen/Not-Listen" states, where they require the speaker to wait between utterances (usually
doing processing during the pauses). "Isolated Utterance" might be a better name for this class.
4.2 Connected Words
Connected word systems (or more correctly 'connected utterances') are similar to isolated
words, but allow separate utterances to be 'run together' with a minimal pause between them.
4.3 Continuous Speech
Continuous recognition is the next step. Recognizers with continuous speech capabilities
are some of the most difficult to create because they must utilize special methods to determine
utterance boundaries. Continuous speech recognizers allow users to speak almost naturally,
while the computer determines the content. Basically, it's computer dictation.
4.4 Spontaneous Speech
There appears to be a variety of definitions for what spontaneous speech actually is. At
a basic level, it can be thought of as speech that is natural sounding and not rehearsed. An ASR
system with spontaneous speech ability should be able to handle a variety of natural speech
features such as words being run together, "ums" and "ahs", and even slight stutters.
4.5 Voice Verification/Identification
Some ASR systems have the ability to identify specific users. Such voice verification and
security systems are discussed in section 6.8.
5. HARDWARE
5.1 Sound Cards
Because speech requires a relatively low bandwidth, just about any medium- to high-
quality 16-bit sound card will get the job done. You must have sound enabled on your system,
with the correct drivers installed. Sound card quality often starts a heated discussion
about its impact on accuracy and noise.
Sound cards with the 'cleanest' A/D (analog to digital) conversions are recommended,
but most often the clarity of the digital sample is more dependent on the microphone quality
and even more dependent on the environmental noise. Electrical "noise" from monitors, PCI
slots, hard drives, etc. is usually nothing compared to audible noise from the computer fans,
squeaking chairs, or heavy breathing.
Some ASR software packages may require a specific sound card. It's usually a good idea
to stay away from specific hardware requirements, because it limits many of your possible
future options and decisions. You'll have to weigh the benefits and costs if you are considering
packages that require specific hardware to function properly.
5.2 Microphones
A quality microphone is key when utilizing ASR. In most cases, a desktop microphone
just won't do the job: it tends to pick up more ambient noise, which gives ASR programs a hard
time.
Handheld microphones are also not the best choice, as they can be cumbersome to pick
up all the time. While they do limit the amount of ambient noise, they are most useful in
applications that require changing speakers often, or when speaking to the recognizer isn't
done frequently (when wearing a headset isn't an option).
The best choice, and by far the most common, is the headset style. It allows the ambient
noise to be minimized, while allowing you to have the microphone at the tip of your tongue all
the time. Headsets are available without earphones and with earphones (mono or stereo). I
recommend the stereo headphones, but it's just a matter of personal taste.
A quick note about levels: don't forget to turn up your microphone volume. This can be
done with a program such as XMixer or OSS Mixer, and care should be taken to avoid feedback
noise. If the ASR software includes auto-adjustment programs, use them instead, as they are
optimized for their particular recognition system.
5.3 Computers/Processors
ASR applications can be heavily dependent on processing speed. This is because a large
amount of digital filtering and signal processing can take place in ASR.
As with just about any CPU-intensive software, the faster the better; likewise, the more
memory the better. It's possible to do some speech recognition with a 100 MHz processor and
16 MB of RAM, but for fast processing (large dictionaries, complex recognition schemes, or high
sample rates), you should shoot for a minimum of a 1 GHz processor and 1 GB of RAM. Because
of the processing required, most
software packages list their minimum requirements.
6. USES / APPLICATIONS
6.1 Military
6.1.1 High-performance fighter aircraft
Substantial efforts have been devoted in the last decade to the test and evaluation of
speech recognition in fighter aircraft. Of particular note are the U.S. program in speech
recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft, the program
in France on installing speech recognition systems on Mirage aircraft, and programs in the UK
dealing with a variety of aircraft platforms. In these programs, speech recognizers have been
operated successfully in fighter aircraft with applications including: setting radio frequencies,
commanding an autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight displays. Generally, only very limited, constrained
vocabularies have been used successfully, and a major effort has been devoted to integration of
the speech recognizer with the avionics system.
Some important conclusions from the work were as follows:
1. Speech recognition has definite potential for reducing pilot workload, but this potential was
not realized consistently.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor
for making the speech recognition system useful — with lower recognition rates, pilots
would not use the system.
3. More natural vocabulary and grammar, and shorter training times would be useful, but only
if very high recognition rates could be maintained.
4. Laboratory research in robust speech recognition for military environments has produced
promising results which, if extendable to the cockpit, should improve the utility of speech
recognition in high-performance aircraft.
The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-
dependent system, i.e. it requires each pilot to create a template. The system is not used for
any safety critical or weapon critical tasks, such as weapon release or lowering of the
undercarriage, but is used for a wide range of other cockpit functions. Voice commands are
confirmed by visual and/or aural feedback. The system is seen as a major design feature in the
reduction of pilot workload, and even allows the pilot to assign targets to himself with two
simple voice commands or to any of his wingmen with only five commands.
6.1.2 Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain
strongly to the helicopter environment as well as to the fighter environment. The acoustic noise
problem is actually more severe in the helicopter environment, not only because of the high
noise levels but also because the helicopter pilot generally does not wear a facemask, which
would reduce acoustic noise in the microphone. Substantial test and evaluation programs have
been carried out in the past decade in speech recognition systems applications in helicopters,
notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the
Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech
recognition in the Puma helicopter. There has also been much useful work in Canada. Results
have been encouraging, and voice applications have included: control of communication radios;
setting of navigation systems; and control of an automated target handover system.
As in fighter applications, the overriding issue for voice in helicopters is the impact on
pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these
represent only a feasibility demonstration in a test environment. Much remains to be done
both in speech recognition and in overall speech technology in order to consistently achieve
performance improvements in operational settings.
6.1.3 Training Air Traffic Controllers
Training for military air traffic controllers (ATC) represents an excellent application for
speech recognition systems. Many ATC training systems currently require a person to act as a
"pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the
dialog which the controller would have to conduct with pilots in a real ATC situation. Speech
recognition and synthesis techniques offer the potential to eliminate the need for a person to
act as pseudo-pilot, thus reducing training and support personnel. Air controller tasks are also
characterized by highly structured speech as the primary output of the controller, hence
reducing the difficulty of the speech recognition task.
The U.S. Naval Training Equipment Center has sponsored a number of developments of
prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short
of providing graceful interaction between the trainee and the system. However, the prototype
training systems have demonstrated a significant potential for voice interaction in these
systems, and in other training applications. The U.S. Navy has sponsored a large-scale effort in
ATC training systems, where a commercial speech recognition unit was integrated with a
complex training system including displays and scenario creation. Although the recognizer was
constrained in vocabulary, one of the goals of the training programs was to teach the
controllers to speak in a constrained language, using specific vocabulary specifically designed
for the ATC task. Research in France has focused on the application of speech recognition in
ATC training systems, directed at issues both in speech recognition and in application of task-
domain grammar constraints.
Another approach to ATC simulation with speech recognition has been created by
Supremis. The Supremis system is not constrained by rigid grammars imposed by the underlying
limitations of other recognition strategies.
6.2 People with Disabilities
It has been suggested that one of the most promising areas for the application of speech
recognition is in helping handicapped people (Leggett and Williams, 1984). Speech recognition
technology helps people with disabilities interact with computers more easily. People with
motor limitations, who cannot use a standard keyboard and mouse, can use their voices to
navigate the computer and create documents. For example, Braille input/output devices, touch
screen systems, and trackballs have all been used successfully in classrooms. The technology
is also useful to people with learning disabilities who experience difficulty with spelling and
writing. Some individuals with speech impairments may use speech recognition as a therapeutic
tool to improve vocal quality. People with overuse or repetitive stress injuries also benefit from
using speech recognition to operate their computers hands free. Speech recognition technology
has great potential to provide people with disabilities greater access to computers and a world
of opportunities.
Mr. Jones is a reporter who must submit his articles in HTML for publishing in an on-line
journal. Over his twenty-year career, he has developed repetitive stress injury (RSI) in his hands
and arms, and it has become painful for him to type. He uses a combination of speech
recognition and an alternative keyboard to prepare his articles, but he doesn't use a mouse. It
took him several months to become sufficiently accustomed to using speech recognition to be
comfortable working for many hours at a time. There are some things he has not worked out
yet, such as a sound card conflict that arises whenever he tries to use speech recognition on
Web sites that have streaming audio. (Source : http://www.w3.org/WAI/EO/Drafts/PWD-Use-
Web/).
6.3 Speech Recognition in Telephony Environment
William Meisel, who holds a Ph.D. in Electrical Engineering, ran a speech recognition
company for ten years. He is president of the speech industry consulting firm TMA Associates
and publisher and editor of the Speech Recognition Update newsletter. According to him:
Telephone speech recognition creates a Voice Web. Sites that support speech
recognition constitute the Voice Web. Most sites today have individual phone numbers
(typically toll-free). Such sites are often called "voice portals". There are, however, likely to be
more popular voice portals than Web portals; every wireless and landline telephone service
provider will eventually be a voice portal, and there will be independent, corporate, and
specialized voice portals. VoiceXML, a new standard created by the VoiceXML Forum
(www.voicexml.org) and the W3C Voice Browser working group (www.w3.org/voice), is a way
that companies can provide a voice-interactive application on a Web server without needing
speech engines or telephone line interface hardware. The VoiceXML code is downloaded to the
voice portal and executed by a VoiceXML interpreter, much as a Web browser on a PC
interprets HTML.
(Source : William Meisel’s Guide Book on The Voice Web)
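To make the VoiceXML idea concrete, here is a minimal VoiceXML 2.0 dialog, held in a Python string for illustration; it relies on the standard built-in boolean grammar, and a real service would serve such a document from a web server for the voice portal's interpreter to fetch:

# A minimal VoiceXML 2.0 dialog, held in a Python string for illustration.
# It uses the built-in "boolean" (yes/no) grammar defined by the standard.
VXML_BALANCE = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <field name="wants_balance" type="boolean">
      <prompt>Would you like to hear your account balance?</prompt>
      <filled>
        <if cond="wants_balance">
          <prompt>Your balance is one hundred dollars.</prompt>
        </if>
        <exit/>
      </filled>
    </field>
  </form>
</vxml>"""

print(VXML_BALANCE)  # a web server would return this to the interpreter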
The Voice Web is not just an extension of the Internet, although information on existing
Web sites can be used to support interactive voice services. It can run applications totally unlike
visual Web applications and totally independent of the HTML-based Web. Some of the
applications that the Voice Web is supporting are listed here.
6.3.1 Communications management and personal assistants
Communications management usually includes dialing by name using a personal
directory. Personal-assistant functionality includes call screening, taking and accessing voice
messages, and one-number access to the subscriber (scanning several subscriber numbers
based on subscriber instructions). Other personalized features include maintaining a schedule
and delivering reminders. Unified messaging includes features such as reviewing email or fax
headers by phone using text-to-speech. Since subscribers will make calls through their personal
assistant, the voice portal can potentially get additional revenues from providing bundled local
and/or long-distance service.
Enterprise applications, such as voice-activated auto attendants that direct calls by
name, can form a corporate voice portal. Corporate voice portals can also provide such services as
reservations for a conference, location of a local store outlet, or a connection to customer
service.
6.3.2 General information
General information includes weather, sports scores, horoscopes, general news,
financial news, stock quotes, traffic conditions, and driving directions. Such information is
intended to make a voice-enabled service part of a subscriber’s daily habit. Information can be
customized, using, for example, the user’s personal stock portfolio or the user’s current
location. As voice portals evolve, the caller will be able to "voicemark" specialized voice-
equipped Web sites.
6.3.3 E-commerce
V-commerce supports a variety of transactions that can result in product or service
sales. These include transactions similar to ordering from a Web site or telephone catalog
service. They also include finding a business by saying its trade name or its category.
Entertainment is part of e-commerce, and it will be part of the Voice Web. For example,
the caller can use speech recognition to choose audio channels to listen to.
(Source : Receiver Magazine, Vodafone - 2001)
6.4 Potential uses in education
Contact with a number of practitioners and researchers in the field of speech
recognition led to some interesting speculation regarding the feasible use of this technology in
education.
Potential applications, each followed by its problems and likelihood:

1. Teaching students of foreign languages to pronounce vocabulary correctly.
   Unlikely in the near future on a large scale, due to the software training currently involved.

2. Teaching overseas students to pronounce English correctly.
   As above: unlikely in the near future on a large scale.

3. Making notes of observations during scientific experiments, so the scientist/researcher can
   focus on the observation without needing to view the monitor or keyboard (similar to how a
   coroner verbally records notes during an autopsy).
   Likely, and probably already used in individual circumstances. Noise from the experiment,
   the researcher's need to rapidly record observations, and a vocabulary that covers the
   scientific terms all present issues.

4. Enabling students who are physically handicapped and unable to use a keyboard to enter
   text verbally.
   Used already, and becoming increasingly widespread.

5. Enabling people with textual interpretive problems, e.g. dyslexia, to enter text verbally.
   Used already, and becoming increasingly widespread.

6. Restricting access to a high-security computer, where a keyboard or other input device may
   be used by hackers.
   Interest from a number of people, though a lack of "proof of concept" research hinders
   further development. Unlikely to be available in the near future.

7. Narrative-oriented research, where transcripts are automatically generated, removing the
   time needed to generate transcripts manually as well as human error.
   Likely in the near future. Current speech recognition technology imposes an unacceptable
   compromise between accuracy and inhibiting the interviewee. Quicker and easier training
   systems for the interviewee will help, as will increases in portable computing processing
   power.

8. Capturing the speech of a lecturer or tutor.
   Unlikely on a large scale, due to vocabulary, training and interpretive issues. In addition,
   filming the lecture produces combined audio and visual content, which may be more useful.

9. Using a speech recognition system in an examination.
   Very likely. Technically this is possible, and within current UK examination guidelines it
   appears to be acceptable.
(Source : http://www.becta.org.uk/technology/speechrecog/docs/finalreport.pdf - the
final report (June 2000) from an experimental project to see how effective speech recognition
technologies could be for people with special educational needs.)
6.5 Computer and Video Games
Speech input has been used in a limited number of computer and video games, on a
variety of PC and console-based platforms, over the past decade. For example, the game
Seaman involved growing and controlling strange half-man, half-fish characters in a virtual
aquarium. A microphone, sold with the game, allowed the player to issue one of a pre-
determined list of command words and questions to the fish. The accuracy of interpretation, in
use, seemed variable; during gaming sessions, colleagues with strong accents had to speak in an
exaggerated and slower manner in order for the game to understand their commands.
Microphone-based games are available for two of the three main video game consoles
(PlayStation 2 and Xbox). However, these games primarily use speech in an online player-to-
player manner, rather than spoken words being interpreted electronically. For example,
MotoGP for the Xbox allows online players to ride against each other in a motorbike race
simulation, and speak (via microphone headset) to the nearest players (bikers) in the race.
There is currently interest in, but less development of, video games that interpret speech.
The Microsoft Xbox, Nintendo GameCube, and Sony PlayStation 2 consoles all offer
games with speech input/output. Currently, most games are war-action-shooter games. In
these, speech recognition provides high-level commands to virtual teammates who respond
with a variety of recorded quips. Let's take as examples two graphically realistic, tactical
squad-based shooter games: Ghost Recon 2 and SOCOM II: U.S. Navy Seals. Both games are
available on the Sony PlayStation 2. The speech recognition systems for these games are
provided by Fonix and ScanSoft, respectively.
In Ghost Recon 2, the user is the leader of a team of
three secret Special Forces soldiers who must capture various
military targets in North Korea in the year 2007. The team is
critical to the user’s survival from enemy gunfire. Saying “Move
out!” directs the team to move ahead of you as you make your
way through the virtual, hilly terrain toward various objectives.
The speech commands (“Move out,” “Covering fire,”
“Grenade,” “Take point,” “Hold position,” “Regroup”) are
easily-recalled, high-level instructions to the team members. The commands that can be
obeyed depend on the immediate situation. If you say, "Take point," and the hostile fire is too
great, the designated team member may say, "No can do, Captain." Occasionally, the retort is
somewhat less respectful.
In SOCOM II: U.S. Navy Seals, a team of four men,
including the first-person leader, attempts to stop an arms
smuggling group in rural Albania. The team has to avoid the
enemy, meet an informant, blow up weapons caches, and make
their escape. The speech commands in this game are spoken in
three parts, using a simple grammar. The commands may be
addressed to "Fireteam" (all other team members) or to
individuals, like "Able" (your partner). Then there are
approximately 12 action commands including “Fire at will,”
“Deploy,” “Move to,” “Get down,” and others. The third part of
the command includes nine letters of the military alphabet
(“Charlie,” “Delta,” etc.) indicating where the “Move to” and
similar commands are intended. They represent the specific
locations of game objectives.
(Source: Article from The Speech Technology Magazine Apr 2005,
http://www.speechtechmag.com/Articles/ReadArticle.aspx?ArticleID=29432)
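The three-part command structure described above lends itself to a very simple grammar. A hypothetical Python sketch of such a parser, with word lists abridged from the description:

# Hypothetical parser for the three-part commands described above:
# <addressee> <action> <location>.
ADDRESSEES = {"fireteam", "able"}
ACTIONS = {"fire at will", "deploy", "move to", "get down"}
LOCATIONS = {"alpha", "bravo", "charlie", "delta", "echo",
             "foxtrot", "golf", "hotel", "india"}

def parse_command(utterance):
    text = utterance.lower().strip()
    for who in ADDRESSEES:
        if text.startswith(who):
            rest = text[len(who):].strip()
            for act in ACTIONS:
                if rest.startswith(act):
                    loc = rest[len(act):].strip()
                    if not loc or loc in LOCATIONS:
                        return who, act, loc or None
    return None  # not in the grammar: the utterance would be rejected

print(parse_command("Fireteam move to Charlie"))
# -> ('fireteam', 'move to', 'charlie')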
6.6 Medical Transcription
Medical transcription, also known as MT, is an allied
health profession which deals with the process of transcription:
converting voice-recorded reports, as dictated by physicians
and/or other healthcare professionals, into text format.
Every day, doctors scour the market looking for new
ways to help simplify their office routines and reduce their
costs. Medical Transcription software saves their time and
money. The speech recognition product produces accurate
and fully formatted transcriptions from clinicians' dictations.
The goal is to minimize editing time by MTs and, as a result,
increase MT productivity. It interprets and formats a
document, so that it is close to a final product.
Benefits:
 Organized and formatted document sections
 Punctuation inserted even if not spoken
 Numbers interpreted and presented appropriately. This includes dosages, measurements,
lists, etc.
 Formatting based on each organization’s preferences and specifications
 Inserts speech-activated ‘normals’
 No explicit training required
 Continually learns and improves from MT edits
Examples:
When a clinician dictates: "Exam…vital signs…two twelve…eighty eight and
regular…thirteen…BP one forty one hundred and one thirty five ninety five"
Speech Recognition software can output: PHYSICAL EXAMINATION: VITAL SIGNS: Weight
212, pulse 88 and regular, respiration 13, blood pressure is 140/100, 135/95.
When a provider says: "The following problems were reviewed…hypertension …please
enter my hypertension template…use my normal cad"
Speech Recognition software can output: PROBLEMS: The following problems were
reviewed:
 Hypertension: No headache, visual disturbance, chest pain, palpitation, focal neurologic
complaint, dyspnea, edema, claudication, or complaint from current medication.
 Coronary artery disease: No chest pain, dyspnea, PND, orthopnea, palpitation, weakness,
syncope, or obvious problems related to medications.
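A greatly simplified Python sketch of the rule-based formatting step suggested by the examples above; the number mappings are invented, and commercial products use far richer language models:

# Toy formatting pass: replace spoken numbers and capitalize the result.
# The mappings below are invented for illustration.
SPOKEN_NUMBERS = {
    "two twelve": "212",
    "eighty eight": "88",
    "thirteen": "13",
    "one forty": "140",
    "one hundred": "100",
}

def format_dictation(spoken):
    text = spoken.lower()
    for words, digits in SPOKEN_NUMBERS.items():
        text = text.replace(words, digits)
    return text.upper()

print(format_dictation("vital signs...two twelve...eighty eight and regular"))
# -> VITAL SIGNS...212...88 AND REGULAR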
6.7 Mobile Devices
The growth of cellular telephony combined with recent advances in speech recognition
technology results in sizeable potential opportunities for mobile speech recognition
applications. Speech recognition in mobile phones has already been introduced, but there is a
lot of work to be done in this particular field. When speech recognition was first introduced in
mobile phones, it was used to call a contact by saying his or her name. The user first needed to
record a voice clip of each contact's name and associate it with that contact. When the user
later said a name, the mobile compared it with the recorded clips and called the person whose
name was spoken.
New smart mobile phones are being introduced every month. These phones don't require
recording the names first; they have their own speech system, which can read the names as
written in English. When the user says a name, the phone uses its speech system to compare the
spoken sound with the saved contacts and then calls the contact whose name was spoken.
Nuance Communications has launched the Nuance Mobile Speech Platform, which will
improve the text-to-speech and speech recognition abilities of mobile devices. Through this
platform, end users will be able to perform searches, dictate emails and SMS messages, and
have any incoming emails and messages read out to them, which will improve the usability and
efficiency of mobile devices.
The Nuance Mobile Speech Platform can be used to speech-enable a mobile application,
and specifically offers pre-built components for the following:
 Nuance Local Search - search business names and categories, residential listings, weather,
dining and entertainment, movies, etc.
 Nuance Mobile Navigation - voice destination entry (including street addresses, businesses
and points of interest) and spoken turn-by-turn directions.
 Nuance Content Search - search catalogs with items in music, video, games and more.
 Nuance Mobile Web Search - search the Web from a mobile device.
 Nuance Mobile Communications - compose email, SMS, and IM messages by speaking.
(Source: Nuance Communications http://www.nuance.com)
6.8 Voice Security Systems
Voice Security Systems technology uses a person's voice print to uniquely identify
individuals using biometric speaker verification technology. Speech is processed through a non-
contact method; you do not need to see or to touch the person to be able to recognize them.
The popularity of speaker verification is swiftly growing because speech is easy to obtain
without the addition of dedicated hardware. Improved, robust speech recognition algorithms
and PC hardware have also brought this one-time futuristic idea into the present.
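In outline, speaker verification compares a stored voice print against a fresh voice sample. A hedged Python sketch using cosine similarity between feature vectors; the vectors and the 0.8 threshold are purely illustrative:

# Toy speaker verification: compare voice-print feature vectors.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrolled_print, sample_print, threshold=0.8):
    # Accept the claimed identity only if the prints are similar enough.
    return cosine_similarity(enrolled_print, sample_print) >= threshold

print(verify([0.2, 0.7, 0.1], [0.25, 0.65, 0.12]))  # True: likely the same speaker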
At Voice Security Systems, a decade of research
and development has led them to believe that the
explosive speech processing market is here to stay.
Their Voice Protect® method of biometric voice
authentication is ideally suited for low memory,
database independent applications using smart cards
or other physical devices such as cell phones. Due to
the value of biometric security for use in fraud
prevention, and the added convenience of knowing a
person is who they claim to be, they believe speaker
verification will be more widely accepted by the
consumer market before speech recognition.
Voice Security Systems can deliver biometric security technology to the market at a
lower cost than anyone else in the industry, with no recurring maintenance costs such as
database management or complicated user training. Once the Voice Protect® technology is
built into a product it will continue to function independently for the life of the product.
Voice security systems can be applied in our daily lives; for example, they can be
successfully applied in garage door openers, computers and laptops, automobiles, PDAs and
handheld devices, smartcard applications, cell phones, door access systems, and ATM machines.
(Source: Voice Security Systems Inc. http://www.voice-security.com/)
7. FUTURE APPLICATIONS
There are a number of scenarios where speech recognition is either being delivered,
developed for, researched or seriously discussed. As with many contemporary technologies,
such as the Internet, online payment systems and mobile phone functionality, development is
at least partially market-driven.
IBM intends to have better-than-human Automatic Speech Recognition by 2010. Bill
Gates predicted that by 2011 the quality of ASR will catch up to humans. Justin Rattner from
Intel said in 2005 that by 2015, computers will have "strong capabilities" in speech-to-text.
At some point in the future, speech recognition may become speech understanding. The
statistical models that allow computers to decide what a person just said may someday allow
them to grasp the meaning behind the words. Although it is a huge leap in terms of
computational power and software sophistication, some researchers argue that speech
recognition development offers the most direct line from the computers of today to true
artificial intelligence. We can talk to our computers today. In 25 years, they may very well talk
back.
7.1 Home Appliances
Designers have developed very convenient user interfaces to consumer appliances.
What could be easier than pressing buttons on a remote control to select television channels or
flipping a switch to turn on a light? These types of direct manipulation user interfaces will
continue to be widely used. However, because current buttons and switches are not intelligent,
you cannot ask your remote control when "Star Trek" is on, and you must walk to the light
switch before turning the light on. Speech enables consumer appliances to act intelligently,
responding to speech commands and answering verbal questions. For example, speech
enhances consumer appliances by enabling the user to say instructions such as:
1. To the VCR: "Record tonight's 'Star Trek'."
2. To the coffeepot: "Start at 6:30 a.m. tomorrow."
3. To the light switch: "Turn on the lights one half-hour before sunset."
There is, inevitably, interest in the use of speech recognition in domestic appliances
such as ovens, refrigerators, dishwashers and washing machines. One school of thought is that,
like the use of speech recognition in cars, this can reduce the number of parts and therefore the
cost of production of the machine. However, removal of the normal buttons and controls would
present problems for people who, for physical or learning reasons, cannot use speech
recognition systems.
7.2 Wearable Computers
Perhaps the most futuristic application is in the use and functionality of wearable
computers i.e. unobtrusive devices that you can wear like a watch, or are even embedded in
your clothes. These would allow people to go about their everyday lives, but still store
information (thoughts, notes, to-do lists) verbally, or communicate via email, phone or
videophone, through wearable devices. Crucially, this would be done without having to interact
with the device, or even remember that it is there; the user would just speak, the device would
know what to do with the speech, and would carry out the appropriate task.
The rapid miniaturization of computing
devices, the rapid rise in processing power, and
advances in mobile wireless technologies, are
making these devices more feasible. There are
still significant problems, such as background
noise and the idiosyncrasies of an individual’s
language, to overcome. However, it is
speculated that reliable versions of such devices
will become commercially available during this
decade.
The conventional human-computer interface, such as the GUI, which assumes a keyboard,
mouse, and bit-map display, is insufficient for the wearable environment. Although
handwritten character recognizers and keyboards that can be used with
one hand have been developed as input devices for computers, speech recognition has recently
received more interest. The main reason for this is that it permits both hands and eyes to be
kept free and therefore is less restricted in its use and can achieve quicker communication. In
addition, speech can convey not only linguistic information but also the emotion and identity of
speakers. IBM's wearable PC, for example, has a microphone in its controller and can
recognize speech once ViaVoice has been installed.
7.3 Precision Surgery
Developments in keyhole and micro surgery have clearly shown that an approach of as
little invasive or non-essential surgery as possible improves success rates and patient recovery
times. There is occasional speculation in various medical fora regarding the use of speech
recognition in precision surgery, where a procedure is partially or totally carried out by
automated means.
For example, in removing a tumour or blockage without damaging surrounding tissue, a
command could be given to make an incision of a precise and small length e.g. 2 millimetres.
However, the legal implications of such technology are a formidable barrier to significant
developments in this area. If speech were incorrectly interpreted and, for example, a limb were
accidentally sliced off, who would be liable: the surgeon, the surgery system developers, or the
speech recognition software developers?
8. SPEECH RECOGNITION SOFTWARE
Modern speech recognition software enables a single computer user to speak text
and/or commands to the computer, largely, but not entirely, bypassing the use of the keyboard
and mouse interface.
The idea has been portrayed in science fiction for many decades, quite frequently
depicting computers that do not even have keyboards or mice. Such computers are also
typically depicted as being able to keep up no matter how fast a person speaks, and without
regard to who the speaker is, the language spoken, or even how many speakers there are. In
other words, they depict a computer that hears as well as a multilingual person does.
Attempts to develop usable speech recognition software began in the mid-1900s, and
proved to be far more daunting than anyone had imagined. It also turned out to require so
much computing power that only the most modern computers are now able to perform the
functions required in real time (i.e., as fast as you can speak).
The first commercially practical products became available around 1990, (e.g. the Voice
Navigator, a standalone computer dedicated 100% to speech recognition) and used up all the
available computing power of the machine, which would send its output to a second computer.
They weren't particularly accurate and could only understand a single person at a time,
requiring retraining, not of the operator but of the machine itself, to work for another person.
Despite these limitations, they could type so rapidly that even after taking time to make
corrections, a person with disabilities could easily accomplish more work with the machine than
without it. For persons with physical disabilities, the ability to simply talk to your computer
could be a priceless asset. Consider, for instance, an author with Parkinson's disease who can
barely control his hands, yet is conveniently able to create an article.
8.1 Free Software
Many software packages are available for speech recognition, and a number of them are free of
cost. Some free packages are:
 XVoice
(http://www.compapp.dcu.ie/~tdoris/Xvoice/
http://www.zachary.com/creemer/xvoice.html)
 CVoiceControl/kVoiceControl
(http://www.kiecza.de/daniel/linux/index.html)
 Ears
(ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/)
 NICO ANN Toolkit
(http://www.speech.kth.se/NICO/index.html)
 Myers' Hidden Markov Model Software
(http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html)
 Jialong He's Speech Recognition Research Tool
(http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html)
 Open Mind Speech
(http://freespeech.sourceforge.net)
 GVoice
(http://www.cse.ogi.edu/~omega/gnome/gvoice/)
 ISIP
(http://www.isip.msstate.edu/project/speech/)
 CMU Sphinx
(http://www.speech.cs.cmu.edu/sphinx/Sphinx.html)
8.2 Commercial Software
8.2.1 Dragon NaturallySpeaking
Dragon NaturallySpeaking is almost universally regarded in reviews as the best voice-
recognition software, with the potential for 99.8 percent accuracy (reviews say 95 percent is
more realistic). NaturallySpeaking integrates easily with Microsoft productivity software. The
Preferred version can also be used with a compatible digital-audio recorder, MP3
player/recorder or PDA for recording voice notes or lectures on the go; NaturallySpeaking will
later transcribe your recordings. Reviews say Dragon NaturallySpeaking is the most
sophisticated product on the market, but that if you have
Windows Vista or plan to buy a new computer with it, you
should try the voice-recognition capabilities included with
Vista, which by most accounts are nearly as robust as Dragon
NaturallySpeaking.
(Source: http://www.nuance.com/naturallyspeaking/)
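To put the two accuracy figures above in perspective, a quick back-of-the-envelope
calculation (mine, not the reviewers') translates them into correction workload:

```python
# Expected word errors in a 1,000-word dictated document at the claimed
# (99.8%) versus the more realistic (95%) accuracy quoted in reviews.
for accuracy in (0.998, 0.95):
    errors = round(1000 * (1 - accuracy))
    print(f"{accuracy:.1%} accuracy -> about {errors} errors per 1,000 words")
```

At 95 percent accuracy a user corrects roughly 50 words per 1,000 dictated, twenty-five times
the error rate implied by the 99.8 percent claim.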
8.2.2 IBM ViaVoice
IBM ViaVoice is a range of language-specific continuous speech recognition software
products offered by IBM. The current version is designed primarily for use in embedded
devices.
Individual language editions may have different features, specifications, technical
support, and microphone support. Some of the products or editions available are:
• Advanced Edition,
• Standard Edition,
• Personal Edition,
• ViaVoice for Mac OS X Edition,
• Pro USB Edition,
• Simply Dictation for Mac.
Prior to the development of ViaVoice, IBM developed
a product named VoiceType. In 1997, ViaVoice was first
introduced to the general public. Two years later, in 1999,
IBM released a free-of-charge version of ViaVoice.
I didn't find a single review that recommends ViaVoice
over Dragon NaturallySpeaking, but ViaVoice is the only
program that will run on older or less powerful computers.
Dragon NaturallySpeaking is extremely demanding (you need
at the very least 512 MB RAM, a recent processor and 1 GB
free hard-drive space). However, reviews say ViaVoice isn't as
accurate as Dragon NaturallySpeaking, and mistakes aren't as easy to correct. ViaVoice hasn't
been updated in years.
(Source: http://www.ibm.com/software/speech/)
8.2.3 Microsoft Speech Recognition System
In 1993, Microsoft hired Xuedong Huang from CMU to lead its speech efforts. Microsoft
has been involved in research on both speech recognition and text-to-speech.[2] The company's
research eventually led to the development of the Speech API (SAPI).
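To make this concrete, the sketch below reaches SAPI through its COM interface from a
script. It exercises the synthesis side of the API purely to show that SAPI exposes speech
services to ordinary applications; Windows and the third-party pywin32 package are assumed
prerequisites, not details taken from this report.

```python
# Minimal sketch: calling the Microsoft Speech API (SAPI) over COM from Python.
# Assumes Windows plus pywin32 (pip install pywin32) - an illustrative setup.
import win32com.client

# "SAPI.SpVoice" is the COM ProgID of the SAPI speech synthesizer.
voice = win32com.client.Dispatch("SAPI.SpVoice")
voice.Speak("SAPI exposes speech services through a COM interface.")
```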
Speech recognition technology has been used in some of Microsoft's products, including
Microsoft Dictation (a research prototype that ran on Windows 9x). It was also included in
Office XP, Office 2003[3], Microsoft Plus! for Windows XP, Windows XP Tablet PC Edition, and
Windows Mobile (as Microsoft Voice Command)[4]. However, prior to Windows Vista, speech
recognition was not mainstream. In response, Windows Speech Recognition was bundled with
Windows Vista and released in 2006, making the operating system the first mainstream version
of Microsoft Windows to offer fully integrated support for speech recognition.
Windows Speech Recognition in Windows Vista empowers users to interact with their
computers by voice. It was designed for people who want to significantly limit their use of the
mouse and keyboard while maintaining or increasing their overall productivity. You can dictate
documents and emails in mainstream applications, use voice commands to start and switch
between applications, control the operating system, and even fill out forms on the Web.
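As a hedged illustration of the command-and-control side (my own sketch, not Microsoft's
API), the fragment below shows what happens after an engine returns recognized text: the
application maps a small fixed set of command phrases onto actions. The phrases and target
programs are hypothetical examples.

```python
# Toy command-and-control dispatcher: map recognized utterances to actions.
# The engine that produces `recognized_text` is assumed to exist elsewhere.
import subprocess

COMMANDS = {
    "start notepad": lambda: subprocess.Popen(["notepad.exe"]),
    "start calculator": lambda: subprocess.Popen(["calc.exe"]),
}

def dispatch(recognized_text: str) -> None:
    # Normalize the utterance the same way the command phrases are written.
    action = COMMANDS.get(recognized_text.strip().lower())
    if action is not None:
        action()  # run the bound action
    else:
        print(f"No command bound to: {recognized_text!r}")

dispatch("Start Notepad")
```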
Windows Speech Recognition is a new feature in Windows Vista, built using the latest
Microsoft speech technologies. Windows Vista Speech Recognition provides excellent
recognition accuracy that improves with each use as it adapts to your speaking style and
vocabulary. Speech Recognition is available in English (U.S.), English (U.K.), German (Germany),
French (France), Spanish (Spain), Japanese, Chinese (Traditional), and Chinese (Simplified).
Early reviews say it rivals Dragon NaturallySpeaking 9 for accuracy. If you buy a new
computer, you'll get Vista by default, so you can try out its voice-recognition features before
buying other software. You can also upgrade an older computer to Vista, but the system
requirements are demanding. Reviewers say Dragon NaturallySpeaking has a slight edge, but
cite no compelling reason to buy it if you have or plan to buy Vista.
(Source: http://www.microsoft.com/speech/speech2007/default.mspx)
8.2.4 MacSpeech Dictate
MacSpeech is a company that develops speech recognition software for Apple
Macintosh computers. Established in 1996 by current CEO Andrew Taylor, it is currently the
only company developing voice dictation systems for the Macintosh, and its full product line is
devoted to speech recognition and dictation. In 2008 its previous flagship product, iListen, was
replaced by Dictate, which is built around Nuance's licensed Dragon NaturallySpeaking engine.
Reviews say Dictate, introduced in early 2008, is as accurate in tests as Dragon
NaturallySpeaking itself and much better than its predecessor, iListen. Dictate comes with a
microphone headset, and no products directly compete with it.
(Source: http://www.macspeech.com/dictate/)
8.2.5 Philips SpeechMagic
SpeechMagic is an industrial-grade platform for capturing information in digital form,
developed by Philips Speech Recognition Systems of Vienna, Austria. SpeechMagic features
large-vocabulary speech recognition as well as a number of services aimed at supporting
“accurate, convenient and efficient” information capture in healthcare IT applications. The
technology is used mainly in the healthcare sector; however, applications are also available for
the legal market and for tax consultants.
SpeechMagic supports 25 recognition languages and provides more than 150 ConTexts
(industry-specific vocabularies). More than 8,000 healthcare sites in 45 nations use
SpeechMagic to capture information and create professional documents. The world’s largest
site powered by SpeechMagic is in the United States, with more than 60,000 authors, more
than 3,000 editors and a throughput of 400 million lines per year.
In 2005, growth consulting company Frost & Sullivan recognized SpeechMagic with the
Market Leadership Award in European Healthcare. In 2007, Frost & Sullivan presented Philips
Speech Recognition Systems with the Global Excellence Award in Speech Recognition.
(Source: http://www.myspeech.com/)
8.2.6 Other Commercial Software
There are many other commercial software packages for speech recognition, including:
• HTK
(http://htk.eng.cam.ac.uk/)
• CSLU Toolkit
(http://cslu.cse.ogi.edu/toolkit/)
• Simmortel Voice
(http://www.simmortel.com)
• Quack.com by AOL
(http://www.quack.com)
• SpeechWorks
(http://www.speechworks.com)
• Babel Technologies
(http://www.babeltech.com)
• Vocalis Speechware
(http://www.vocalisspeechware.com)
• Entropic
(http://htk.eng.cam.ac.uk)
9. CONCLUSION
Speech recognition will revolutionize the way people conduct business over the Web
and will, ultimately, differentiate world-class e-businesses. VoiceXML ties speech recognition
and telephony together and provides the technology with which businesses can develop and
deploy voice-enabled Web solutions today. These solutions can greatly expand the accessibility
of Web-based self-service transactions to customers who would otherwise not have access
and, at the same time, leverage a business's existing Web investments. Speech recognition and
VoiceXML clearly represent the next wave of the Web. In the near future, people will operate
their home and business computers by speech rather than by keyboard and mouse, and home
automation may come to be built largely on speech recognition.
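As a final hedged sketch of what tying speech recognition and telephony together looks like
in practice, the fragment below assembles a minimal VoiceXML dialog as a string; the dialog
itself (a two-option prompt) is a hypothetical example, and in a real deployment the document
would be served to a VoiceXML gateway, which performs the actual recognition and telephony
work.

```python
# Build a minimal, hypothetical VoiceXML 2.0 dialog. The gateway that fetches
# this document, not this script, performs speech recognition and telephony.
VXML_DIALOG = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="choice">
      <prompt>Say balance or transactions.</prompt>
      <option>balance</option>
      <option>transactions</option>
      <filled>
        <prompt>You said <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>"""

print(VXML_DIALOG)  # a web server would return this to the voice gateway
```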

Más contenido relacionado

La actualidad más candente

Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentationhimanshubhatti
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminarDiptimaya Sarangi
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySeminar Links
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technologySrijanKumar18
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition systemAlok Tiwari
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognitionananth
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologyAamir-sheriff
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionAhmed Moawad
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by IqbalIqbal
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionHugo Moreno
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionRHIMRJ Journal
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognitionVinay Jaisriram
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data miningJimit Rupani
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speechBilgin Aksoy
 
Speech recognition techniques
Speech recognition techniquesSpeech recognition techniques
Speech recognition techniquessonukumar142
 

La actualidad más candente (20)

Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech Recognition
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognition
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speech
 
Speech recognition techniques
Speech recognition techniquesSpeech recognition techniques
Speech recognition techniques
 

Destacado

Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project reportSarang Afle
 
Text to-speech & voice recognition
Text to-speech & voice recognitionText to-speech & voice recognition
Text to-speech & voice recognitionMark Williams
 
Ideological rationale
Ideological rationaleIdeological rationale
Ideological rationaleRabia Nawaz
 
Allama Muhammad Iqbal
Allama Muhammad IqbalAllama Muhammad Iqbal
Allama Muhammad IqbalAfia Shahid
 
The Great Leader M A Jinnah
The Great Leader M A JinnahThe Great Leader M A Jinnah
The Great Leader M A Jinnahkharison
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 

Destacado (7)

Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
 
Bhutto speeches 1948-66
Bhutto speeches 1948-66Bhutto speeches 1948-66
Bhutto speeches 1948-66
 
Text to-speech & voice recognition
Text to-speech & voice recognitionText to-speech & voice recognition
Text to-speech & voice recognition
 
Ideological rationale
Ideological rationaleIdeological rationale
Ideological rationale
 
Allama Muhammad Iqbal
Allama Muhammad IqbalAllama Muhammad Iqbal
Allama Muhammad Iqbal
 
The Great Leader M A Jinnah
The Great Leader M A JinnahThe Great Leader M A Jinnah
The Great Leader M A Jinnah
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 

Similar a Speech Recognition by Iqbal

A survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionA survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionIRJET Journal
 
Real Time Direct Speech-to-Speech Translation
Real Time Direct Speech-to-Speech TranslationReal Time Direct Speech-to-Speech Translation
Real Time Direct Speech-to-Speech TranslationIRJET Journal
 
Desktop Based Voice Assistant Application Using Machine Learning Approach
Desktop Based Voice Assistant Application Using Machine Learning ApproachDesktop Based Voice Assistant Application Using Machine Learning Approach
Desktop Based Voice Assistant Application Using Machine Learning ApproachIRJET Journal
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOMsathiyaseelanm
 
Control mouse and computer system using voice commands
Control mouse and computer system using voice commandsControl mouse and computer system using voice commands
Control mouse and computer system using voice commandseSAT Journals
 
VOCAL- Voice Command Application using Artificial Intelligence
VOCAL- Voice Command Application using Artificial IntelligenceVOCAL- Voice Command Application using Artificial Intelligence
VOCAL- Voice Command Application using Artificial IntelligenceIRJET Journal
 
Voice Assistant Using Python and AI
Voice Assistant Using Python and AIVoice Assistant Using Python and AI
Voice Assistant Using Python and AIIRJET Journal
 
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...IRJET Journal
 
IRJET - E-Assistant: An Interactive Bot for Banking Sector using NLP Process
IRJET -  	  E-Assistant: An Interactive Bot for Banking Sector using NLP ProcessIRJET -  	  E-Assistant: An Interactive Bot for Banking Sector using NLP Process
IRJET - E-Assistant: An Interactive Bot for Banking Sector using NLP ProcessIRJET Journal
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandMohammad Liton Hossain
 
IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET Journal
 
Assistive Examination System for Visually Impaired
Assistive Examination System for Visually ImpairedAssistive Examination System for Visually Impaired
Assistive Examination System for Visually ImpairedEditor IJCATR
 
Computer science basics for nonit students
Computer science basics for nonit studentsComputer science basics for nonit students
Computer science basics for nonit studentsSrikanth KS
 
11-miwai10_submission_12
11-miwai10_submission_1211-miwai10_submission_12
11-miwai10_submission_12Long Tran
 
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine TranslationIRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine TranslationIRJET Journal
 
IRJET- Virtual Vision for Blinds
IRJET- Virtual Vision for BlindsIRJET- Virtual Vision for Blinds
IRJET- Virtual Vision for BlindsIRJET Journal
 
IRJET- ASL Language Translation using ML
IRJET- ASL Language Translation using MLIRJET- ASL Language Translation using ML
IRJET- ASL Language Translation using MLIRJET Journal
 
1P A R T Introduction to Analytics and AII
1P A R T Introduction to Analytics and AII1P A R T Introduction to Analytics and AII
1P A R T Introduction to Analytics and AIITatianaMajor22
 
Speech enabled interactive voice response system
Speech enabled interactive voice response systemSpeech enabled interactive voice response system
Speech enabled interactive voice response systemeSAT Journals
 

Similar a Speech Recognition by Iqbal (20)

A survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionA survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech Recognition
 
Real Time Direct Speech-to-Speech Translation
Real Time Direct Speech-to-Speech TranslationReal Time Direct Speech-to-Speech Translation
Real Time Direct Speech-to-Speech Translation
 
Desktop Based Voice Assistant Application Using Machine Learning Approach
Desktop Based Voice Assistant Application Using Machine Learning ApproachDesktop Based Voice Assistant Application Using Machine Learning Approach
Desktop Based Voice Assistant Application Using Machine Learning Approach
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOM
 
Desktop assistant
Desktop assistant Desktop assistant
Desktop assistant
 
Control mouse and computer system using voice commands
Control mouse and computer system using voice commandsControl mouse and computer system using voice commands
Control mouse and computer system using voice commands
 
VOCAL- Voice Command Application using Artificial Intelligence
VOCAL- Voice Command Application using Artificial IntelligenceVOCAL- Voice Command Application using Artificial Intelligence
VOCAL- Voice Command Application using Artificial Intelligence
 
Voice Assistant Using Python and AI
Voice Assistant Using Python and AIVoice Assistant Using Python and AI
Voice Assistant Using Python and AI
 
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
 
IRJET - E-Assistant: An Interactive Bot for Banking Sector using NLP Process
IRJET -  	  E-Assistant: An Interactive Bot for Banking Sector using NLP ProcessIRJET -  	  E-Assistant: An Interactive Bot for Banking Sector using NLP Process
IRJET - E-Assistant: An Interactive Bot for Banking Sector using NLP Process
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice command
 
IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech Recognition
 
Assistive Examination System for Visually Impaired
Assistive Examination System for Visually ImpairedAssistive Examination System for Visually Impaired
Assistive Examination System for Visually Impaired
 
Computer science basics for nonit students
Computer science basics for nonit studentsComputer science basics for nonit students
Computer science basics for nonit students
 
11-miwai10_submission_12
11-miwai10_submission_1211-miwai10_submission_12
11-miwai10_submission_12
 
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine TranslationIRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
 
IRJET- Virtual Vision for Blinds
IRJET- Virtual Vision for BlindsIRJET- Virtual Vision for Blinds
IRJET- Virtual Vision for Blinds
 
IRJET- ASL Language Translation using ML
IRJET- ASL Language Translation using MLIRJET- ASL Language Translation using ML
IRJET- ASL Language Translation using ML
 
1P A R T Introduction to Analytics and AII
1P A R T Introduction to Analytics and AII1P A R T Introduction to Analytics and AII
1P A R T Introduction to Analytics and AII
 
Speech enabled interactive voice response system
Speech enabled interactive voice response systemSpeech enabled interactive voice response system
Speech enabled interactive voice response system
 

Más de Iqbal

Demutualization Of Stock Exchanges
Demutualization Of Stock ExchangesDemutualization Of Stock Exchanges
Demutualization Of Stock ExchangesIqbal
 
Leadership by Iqbal
Leadership by IqbalLeadership by Iqbal
Leadership by IqbalIqbal
 
Revenue Management by Iqbal
Revenue Management by IqbalRevenue Management by Iqbal
Revenue Management by IqbalIqbal
 
Motivation from Concepts to Application by Iqbal
Motivation from Concepts to Application by IqbalMotivation from Concepts to Application by Iqbal
Motivation from Concepts to Application by IqbalIqbal
 
Revenue management by Iqbal
Revenue management by IqbalRevenue management by Iqbal
Revenue management by IqbalIqbal
 
Understanding and Managing Speaker Anxiety by Iqbal
Understanding and Managing Speaker Anxiety by IqbalUnderstanding and Managing Speaker Anxiety by Iqbal
Understanding and Managing Speaker Anxiety by IqbalIqbal
 
Understanding and Managing Speaker Anxiety Final
Understanding and Managing Speaker Anxiety FinalUnderstanding and Managing Speaker Anxiety Final
Understanding and Managing Speaker Anxiety FinalIqbal
 
Correct Usage of Nouns and Pronouns by Iqbal
Correct Usage of Nouns and Pronouns by IqbalCorrect Usage of Nouns and Pronouns by Iqbal
Correct Usage of Nouns and Pronouns by IqbalIqbal
 
Stress Management by Iqbal
Stress Management by IqbalStress Management by Iqbal
Stress Management by IqbalIqbal
 
Leadership by Iqbal
Leadership by IqbalLeadership by Iqbal
Leadership by IqbalIqbal
 

Más de Iqbal (10)

Demutualization Of Stock Exchanges
Demutualization Of Stock ExchangesDemutualization Of Stock Exchanges
Demutualization Of Stock Exchanges
 
Leadership by Iqbal
Leadership by IqbalLeadership by Iqbal
Leadership by Iqbal
 
Revenue Management by Iqbal
Revenue Management by IqbalRevenue Management by Iqbal
Revenue Management by Iqbal
 
Motivation from Concepts to Application by Iqbal
Motivation from Concepts to Application by IqbalMotivation from Concepts to Application by Iqbal
Motivation from Concepts to Application by Iqbal
 
Revenue management by Iqbal
Revenue management by IqbalRevenue management by Iqbal
Revenue management by Iqbal
 
Understanding and Managing Speaker Anxiety by Iqbal
Understanding and Managing Speaker Anxiety by IqbalUnderstanding and Managing Speaker Anxiety by Iqbal
Understanding and Managing Speaker Anxiety by Iqbal
 
Understanding and Managing Speaker Anxiety Final
Understanding and Managing Speaker Anxiety FinalUnderstanding and Managing Speaker Anxiety Final
Understanding and Managing Speaker Anxiety Final
 
Correct Usage of Nouns and Pronouns by Iqbal
Correct Usage of Nouns and Pronouns by IqbalCorrect Usage of Nouns and Pronouns by Iqbal
Correct Usage of Nouns and Pronouns by Iqbal
 
Stress Management by Iqbal
Stress Management by IqbalStress Management by Iqbal
Stress Management by Iqbal
 
Leadership by Iqbal
Leadership by IqbalLeadership by Iqbal
Leadership by Iqbal
 

Último

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Último (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Speech Recognition by Iqbal

  • 1. IITT FFOORR MMAANNAAGGEERRSS RREEPPOORRTT OONN SSPPEEEECCHH RREECCOOGGNNIITTIIOONN SSYYSSTTEEMM SSUUBBMMIITTTTEEDD TTOO DDRR.. RROOSSHHAANN AA.. SSHHEEIIKKHH MMAARRCCHH,, 22000099 IQBAL S/O SHAHZAD REGISTRATION # 9952 MBA(M) - SECTION A
  • 2. Speech Recognition System IT Project IQBAL P a g e | 1 AABBSSTTRRAACCTT This report has been submitted to Dr. Roshan A. Sheikh of Iqra University Karachi, as a requirement for the completion of the course , IT for Managers for MBA students. I have prepared this brief report on Speech Recognition System after deep study and research on the topic for two weeks. I have done by best in presenting, explaining the concepts and interpreting the report in its proper form. This report presents an overview of speech recognition technology, software, development and applications. It begins with an introduction to Speech Recognition Technology then it explains how such systems work, and the level of accuracy that can be expected. Applications of speech recognition technology in education and beyond are then explored. A brief comparison of the most common systems is presented, as well as notes on the main centres of speech recognition research in the UK educational sector. The report concludes with potential uses of speech recognition in education, probable main uses of the technology in the future, and a selection of key web-based resources. It also includes software that are being used for this purpose in homes and also in business environment. A video is also presented with this report which shows an example of how we can use speech recognition in windows vista. This video is prepared solely by me on my personal computer. It is available in the soft copy of the project in attached CD.
  • 3. Speech Recognition System IT Project IQBAL P a g e | 2 TTAABBLLEE OOFF CCOONNTTEENNTTSS 1. Introduction ………………………………………………………………………………………….......... 4 1.1 Introduction ………………………………………………………………………………………… 4 1.2 Closer Look …………………………………………………………………………………………. 4-5 2. Terms and Concepts ……………………………………………………………………………….……… 6 2.1 Utterances ………………………………………………………………………………….………. 6 2.2 Pronunciation …………………………………………………………………………….…….…. 6 2.3 Grammar …………………………………………………………………………………….……… 7 2.4 Speaker Dependence ……………………………………………………………….….……… 7 2.5 Accuracy …………………………………………………………………………………….………. 8 2.6 Training ………………………………………………………………………………….….………. 8-9 3. How Speech Recognition Works ………………………………………………………………….… 10 3.1 How Speech Recognition Works ……………………………………………………….… 10 3.2 Acceptance and Regection ……………………………………………………………….… 11-12 4. Types of Speech Recognition ………………………………………………………………………… 13 4.1 Isolated Words …………………………………………………………………………………… 13 4.2 Connected Words ………………………………………………………………………………. 13 4.3 Continuous Speech …………………………………………………………………………….. 13 4.4 Spontaneous Speech ………………………………………………………………………….. 13-14 4.5 Voice Verification / Identification ………………………………………………………. 14 5. Hardware ……………………………………………………………………………………………………... 15 5.1 Soud Cards …………………………………………………………………………………………. 15 5.2 Microphones ……………………………………………………………………………………… 15-16 5.3 Computers / Processors …………………………………………………………………….. 16 6. Uses / Applications of Speech Recognition ………………………………………………….. 17 6.1 Military ……………………………………………………………………………………………... 17 6.1.1 High Performance Fighter Aircrafts ………………………………………. 17 6.1.2 Helicopters ……………………………………………………………………………. 18 6.1.3 Training Air Traffic Controllers ……………………………………………… 18-19 6.2 People with Disabilities ………………………………………………………………………. 19 6.3 Speech Recognition in Telephony Environment ………………………………….. 20 6.3.1 Communications Management and Personal Assistants …………. 21
  • 4. Speech Recognition System IT Project IQBAL P a g e | 3 6.3.2 General Information …………………………………………………….…………. 21 6.3.3 E-Commerce …………………………………………………………………………… 21 6.4 Potential Uses in Education ………………………………………………………………… 22-23 6.5 Computer and Video Games ………………………………………………………………. 23-24 6.6 Medical Transcription ………………………………………………………………………… 24-25 6.7 Mobile Devices …………………………………………………………………………………... 25-26 6.8 Voice Security Systems ……………………………………………………………………….. 26-27 7. Future Applications ………………………………………………………………………………………. 28 7.1 Home / Domestic Appliances …………………………………………………………….. 28-29 7.2 Wearable Computers ………………………………………………………………………… 29 7.3 Precision Surgery ………………………………………………………………………………. 30 8. Speech Recognition Software ………………………………………………………………………. 31 8.1 Free Software ……………………………………………………………………………………. 31-32 8.2 Commercial Software ……………………………………………………………………….. 32 8.2.1 Dragon Naturally Speeking ……………………………………………………. 32-33 8.2.2 IBM Via Voice ……………………………………………………………………….. 33 8.2.3 Microsoft Speech Recognition System …………………………………… 34 8.2.4 MacSpeech Dictate ……………………………………………………………….. 35 8.2.5 Philips Speech Engine ……………………………………………………………. 35-36 8.2.6 Other commercial software ………………………………………………….. 36 9. Conclusion …………………………………………………………………………………………………… 37
  • 5. Speech Recognition System IT Project IQBAL P a g e | 4 11.. IINNTTRROODDUUCCTTIIOONN Have you ever talked to your computer? I mean, have you really, really talked to your computer? Where it actually recognized what you said and then did something as a result? If you have, then you've used a technology known as speech recognition. Designing a machine that understand human behavior, particularly the capability of speaking naturally and responding properly to spoken language, has intrigued engineers and scientists for centuries. Today speech technologies are commercially available for a limited but interesting range of tasks. These technologies enable machines to respond correctly and reliably to human voices, and provide useful and valuable services. While we are still far from having a machine that converses with humans on any topic like another human, many important scientific and technological advances have taken place, bringing us closer to the machines that recognize and understand fluently spoken speech. “Speech Recognition Simply is the process of converting spoken input to text. Speech recognition is thus sometimes referred to as speech-to-text. Speech recognition, also referred to as voice recognition, is software technology that lets the user control computer functions and dictate text by voice. For example, a person can move the mouse cursor with a voice command, such as “mouse up;” control application functions, such as opening up a file menu; or create documents, such as letters or reports or start media player by saying “Music”. 1.2 A Closer Look The speech recognition process is performed by a software component known as the speech recognition engine. The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands. The application can then do one of two things:  The application can interpret the result of the recognition as a command. In this case, the application is a command and control application. An example of a command and control application is one in which the caller says “check balance”, and the application returns the current balance of the caller’s account.  If an application handles the recognized text simply as text, then it is considered a dictation application. In a dictation application, if you said “check balance,” the application would not interpret the result, but simply return the text “check balance”.
  • 6. Speech Recognition System IT Project IQBAL P a g e | 5 Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace, or reduce the reliability on, standard keyboard and mouse input. This can especially assist the following:  People who have little keyboard skills or experience, who are slow typists, or do not have the time or resources to develop keyboard skills.  Dyslexic people, or others who have problems with character or word use and manipulation in a textual form.  People with physical disabilities that affect either their data entry, or ability to read (and therefore check) what they have entered. A speech recognition system consists of the following:  A microphone, for the person to speak into.  Speech recognition software.  A computer to take and interpret the speech.  A good quality soundcard for input and/or output.  A proper and good pronunciation. However, systems on computers meant for more individual use, such as for personal word processing, usually require a degree of “training” before use. Here, an individual user “trains” the system to understand words or word fragments (see section 2.6); this training is often referred to as “enrolment”.
  • 7. Speech Recognition System IT Project IQBAL P a g e | 6 22.. TTEERRMMSS AANNDD CCOONNCCEEPPTTSS Following are a few of the basic terms and concepts that are fundamental to speech recognition. It is important to have a good understanding of these concepts. 2.1 Utterances When the user says something, this is known as an utterance. An utterance is any stream of speech between two periods of silence. Utterances are sent to the speech engine to be processed. Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance. Here's how it works. The speech recognition engine is "listening" for speech input. When the engine detects audio input - in other words, a lack of silence -- the beginning of an utterance is signaled. Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs. Utterances are sent to the speech engine to be processed. If the user doesn’t say anything, the engine returns what is known as a silence timeout - an indication that there was no speech detected within the expected timeframe, and the application takes an appropriate action, such as reprompting the user for input. An utterance can be a single word, or it can contain multiple words (a phrase or a sentence). For example, “Word”, “Microsoft Word,” or “I’d like to run Microsoft Word” are all examples of possible utterances. Whether these words and phrases are valid at a particular point in a dialog is determined by which grammars are active. Note that there are small snippets of silence between the words spoken within a phrase. If the user pauses too long between the words of a phrase, the end of an utterance can be detected too soon, and only a partial phrase will be processed by the engine. 2.2 Pronunciation The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in the U.S. English language: “thee” and “thuh”.
  • 8. Speech Recognition System IT Project IQBAL P a g e | 7 2.3 Grammar Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly. So usually speech engines have a very vast grammar. Vocabularies (or dictionaries) are lists of words or utterances that can be recognized by the Speech Recognition system. Generally, smaller vocabularies are easier for a computer to recognize, while larger vocabularies are more difficult. Unlike normal dictionaries, each entry doesn't have to be a single word. They can be as long as a sentence or two. Smaller vocabularies can have as few as 1 or 2 recognized utterances (e.g."Wake Up"), while very large vocabularies can have a hundred thousand or more! 2.4 Speaker Dependence Speaker dependence describes the degree to which a speech recognition system requires knowledge of a speaker’s individual voice characteristics to successfully process speech. The speech recognition engine can “learn” how you speak words and phrases; it can be trained to your voice. Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. If you are familiar with desktop dictation systems, most are speaker dependent like IBM Via Voice. Because they operate on very large vocabularies, dictation systems perform much better when the speaker has spent the time to train the system to his/her voice. Speech recognition systems that do not require a user to train the system are known as speaker-independent systems. Speech recognition in the VoiceXML world must be speaker- independent. Think of how many users (hundreds, maybe thousands) may be calling into your web site. You cannot require that each caller train the system to his or her voice. The speech recognition systemin a voice-enabled web application MUST successfully process the speech of many different callers without having to understand the individual voice characteristics of each caller.
  • 9. Speech Recognition System IT Project IQBAL P a g e | 8 2.5 Accuracy The ability of a recognizer can be examined by measuring its accuracy − or how well it recognizes utterances. The performance of a speech recognition system is measurable. Perhaps the most widely used measurement is accuracy. It is typically a quantitative measurement and can be calculated in several ways. Arguably the most important measurement of accuracy is whether the desired end result occurred. This measurement is useful in validating application design. For example, if the user said "yes," the engine returned "yes," and the "YES" action was executed, it is clear that the desired result was achieved. But what happens if the engine returns text that does not exactly match the utterance? For example, what if the user said "nope," the engine returned "no," yet the "NO" action was executed? Should that be considered a successful dialog? The answer to that question is yes because the desired result was acheived. Another measurement of recognition accuracy is whether the engine recognized the utterance exactly as spoken. This measure of recognition accuracy is expressed as a percentage and represents the number of utterances recognized correctly out of the total number of utterances spoken. It is a useful measurement when validating grammar design. Using the previous example, if the engine returned "nope" when the user said "no," this would be considered a recognition error. Based on the accuracy measurement, you may want to analyze your grammar to determine if there is anything you can do to improve accuracy. For instance, you might need to add "nope" as a valid word to your grammar. You may also want to check your grammar to see if it allows words that are acoustically similar (for example, "repeat/delete," "Austin/Boston," and "Addison/Madison"), and determine if there is any way you can make the allowable words more distinctive to the engine. Recognition accuracy is an important measure for all speech recognition applications. It is tied to grammar design and to the environment of the user. Good ASR (Automatic Speech Recognition) systems have an accuracy of 98% or more! 2.6 Training Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place. An ASR (Automatic Speech Recognition) system is trained by having the speaker repeat standard or common phrases and adjusting its comparison algorithms to match that particular speaker. Training a recognizer usually improves its accuracy.
  • 10. Speech Recognition System IT Project IQBAL P a g e | 9 Training can also be used by speakers that have difficulty speaking, or pronouncing certain words. As long as the speaker can consistently repeat an utterance, ASR systems with training should be able to adapt.
  • 11. Speech Recognition System IT Project IQBAL P a g e | 10 33.. HHOOWW SSPPEEEECCHH RREECCOOGGNNIITTIIOONN WWOORRKKSS Now that we've discussed some of the basic terms and concepts involved in speech recognition, let's put them together and take a look at how the speech recognition process works. As you can probably imagine, the speech recognition engine has a rather complex task to handle, that of taking raw audio input and translating it to recognized text that an application understands. As shown in the diagram below, the major components we want to discuss are:  Audio input - Transform of the digital audio into a better acoustic representation  Apply a "grammar" so the speech recognizer knows what phonemes to expect. A grammar could be anything from a context-free grammar to full-blown English.  Acoustic Model  Recognized text The first thing we want to take a look at is the audio input coming into the recognition engine. It is important to understand that this audio stream is rarely pristine. It contains not only the speech data (what was said) but also background noise. This noise can interfere with
  • 12. Speech Recognition System IT Project IQBAL P a g e | 11 the recognition process, and the speech engine must handle (and possibly even adapt to) the environment within which the audio is spoken. As we've discussed, it is the job of the speech recognition engine to convert spoken input into text. To do this, it employs all sorts of data, statistics, and software algorithms. Its first job is to process the incoming audio signal and convert it into a format best suited for further analysis. Once the speech data is in the proper format, the engine searches for the best match. It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating. The knowledge of the environment is provided in the form of an acoustic model. Once it identifies the most likely match for what was said, it returns what it recognized as a text string. Most speech engines try very hard to find a match, and are usually very "forgiving." But it is important to note that the engine is always returning it's best guess for what was said. (This is an example of a digital audio) 3.2 Acceptance and Rejection When the recognition engine processes an utterance, it returns a result. The result can be either of two states: acceptance or rejection. An accepted utterance is one in which the engine returns recognized text. Whatever the caller says, the speech recognition engine tries very hard to match the utterance to a word or phrase in the active grammar. Sometimes the match may be poor because the caller said something that the application was not expecting, or the caller spoke indistinctly. In these cases, the speech engine returns the closest match, which might be
  • 13. Speech Recognition System IT Project IQBAL P a g e | 12 incorrect. Some engines also return a confidence score along with the text to indicate the likelihood that the returned text is correct. Not all utterances that are processed by the speech engine are accepted. Acceptance or rejection is flagged by the engine with each processed utterance.
  • 14. Speech Recognition System IT Project IQBAL P a g e | 13 44.. TTYYPPEESS OOFF SSPPEEEECCHH RREECCOOGGNNIITTIIOONN Speech recognition systems can be separated in several different classes by describing what types of utterances they have the ability to recognize. These classes are based on the fact that one of the difficulties of ASR is the ability to determine when a speaker starts and finishes an utterance. Most packages can fit into more than one class, depending on which mode they're using. 4.1 Isolated Words Isolated word recognizers usually require each utterance to have quiet (lack of an audio signal) on BOTH sides of the sample window. It doesn't mean that it accepts single words, but does require a single utterance at a time. Often, these systems have "Listen/Not−Listen" states, where they require the speaker to wait between utterances (usually doing processing during the pauses). Isolated Utterance might be a better name for this class. 4.2 Connected Words Connect word systems (or more correctly 'connected utterances') are similar to Isolated words, but allow separate utterances to be 'run−together' with a minimal pause between them. 4.3 Continuous Speech Continuous recognition is the next step. Recognizers with continuous speech capabilities are some of the most difficult to create because they must utilize special methods to determine utterance boundaries. Continuous speech recognizers allow users to speak almost naturally, while the computer determines the content. Basically, it's computer dictation. 4.4 Spontaneous Speech There appears to be a variety of definitions for what spontaneous speech actually is. At a basic level, it can be thought of as speech that is natural sounding and not rehearsed. An ASR
  • 15. Speech Recognition System IT Project IQBAL P a g e | 14 system with spontaneous speech ability should be able to handle a variety of natural speech features such as words being run together, "ums" and "ahs", and even slight stutters. 4.5 Voice Verification/Identification Some ASR systems have the ability to identify specific users. This document doesn't cover verification or security systems.
5. HARDWARE

5.1 Sound Cards

Because speech requires relatively low bandwidth (quantified in the short example at the end of this section), just about any medium- to high-quality 16-bit sound card will get the job done. You must have sound enabled in your kernel, and you must have the correct drivers installed.

Sound card quality often starts a heated discussion about its impact on accuracy and noise. Sound cards with the "cleanest" A/D (analog-to-digital) conversion are recommended, but most often the clarity of the digital sample depends more on the microphone quality, and even more on the environmental noise. Electrical "noise" from monitors, PCI slots, hard drives and so on is usually nothing compared to audible noise from computer fans, squeaking chairs, or heavy breathing.

Some ASR software packages may require a specific sound card. It is usually a good idea to stay away from specific hardware requirements, because they limit many of your possible future options and decisions. You will have to weigh the benefits and costs if you are considering packages that require specific hardware to function properly.

5.2 Microphones

A quality microphone is key when using ASR. In most cases, a desktop microphone just won't do the job: desktop microphones tend to pick up more ambient noise, which gives ASR programs a hard time. Hand-held microphones are also not the best choice, as they are cumbersome to pick up all the time. While they do limit the amount of ambient noise, they are most useful in applications that require changing speakers often, or where speaking to the recognizer is infrequent and wearing a headset is not an option.

The best choice, and by far the most common, is the headset style. It minimizes ambient noise while keeping the microphone at the tip of your tongue all the time. Headsets are available with or without earphones (mono or stereo). I recommend the stereo headphones, but it is just a matter of personal taste.

A quick note about levels: do not forget to turn up your microphone volume. This can be done with a program such as XMixer or OSS Mixer, and care should be taken to avoid feedback noise. If the ASR software includes auto-adjustment programs, use them instead, as they are optimized for their particular recognition system.

5.3 Computers/Processors

ASR applications can be heavily dependent on processing speed, because a large amount of digital filtering and signal processing can take place in ASR. As with just about any CPU-intensive software, the faster the better; likewise, the more memory the better. It is possible to do some speech recognition with a 100 MHz processor and 16 MB of RAM, but for fast processing (large dictionaries, complex recognition schemes, or high sample rates) you should aim for a minimum of a 1 GHz processor and 1 GB of RAM. Because of the processing required, most software packages list their minimum requirements.
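The "relatively low bandwidth" claim in section 5.1 is easy to quantify. The figures below are common speech-capture conventions, and the arithmetic is just a worked example:

```python
# Back-of-the-envelope data rate for speech audio.
sample_rate = 16_000    # Hz; a common rate for speech (telephony uses 8 kHz)
sample_width = 2        # bytes per sample (16-bit)
channels = 1            # mono is typical for ASR

bytes_per_second = sample_rate * sample_width * channels
print(bytes_per_second)                    # 32000 bytes/s, about 31 KB/s
print(bytes_per_second * 60 / 1_000_000)   # about 1.9 MB per minute of speech
```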
6. USES / APPLICATIONS

6.1 Military

6.1.1 High-performance fighter aircraft

Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft, the program in France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system. Some important conclusions from the work were as follows:

1. Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.
3. More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained.
4. Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.

The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but it is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot workload, and it even allows the pilot to assign targets to himself with two simple voice commands, or to any of his wingmen with only five commands.

6.1.2 Helicopters

The problems of achieving high recognition accuracy under stress and noise apply as strongly to the helicopter environment as to the fighter environment. The acoustic noise problem is actually more severe in helicopters, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade on speech recognition systems in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter, and there has also been much useful work in Canada. Results have been encouraging, and voice applications have included control of communication radios, setting of navigation systems, and control of an automated target handover system.

As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done, both in speech recognition and in overall speech technology, to consistently achieve performance improvements in operational settings.

6.1.3 Training Air Traffic Controllers

Training for military air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller which simulates the dialog the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. Air controller tasks are also characterized by highly structured speech as the primary output of the controller, which reduces the difficulty of the speech recognition task.

The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated significant potential for voice interaction in these and other training applications. The U.S. Navy has sponsored a large-scale effort in ATC training systems, in which a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using a vocabulary specifically designed for the ATC task. Research in France has focused on the application of speech recognition to ATC training systems, addressing issues both in speech recognition and in the application of task-domain grammar constraints. Another approach to ATC simulation with speech recognition has been created by Supremis. The Supremis system is not constrained by the rigid grammars imposed by the underlying limitations of other recognition strategies.

6.2 People with Disabilities

It has been suggested that one of the most promising areas for the application of speech recognition is in helping handicapped people (Leggett and Williams, 1984). Speech recognition technology helps people with disabilities interact with computers more easily. People with motor limitations who cannot use a standard keyboard and mouse can use their voices to navigate the computer and create documents; for comparison, Braille input/output devices, touch-screen systems and trackballs have all been used successfully in classrooms. The technology is also useful to people with learning disabilities who experience difficulty with spelling and writing. Some individuals with speech impairments may use speech recognition as a therapeutic tool to improve vocal quality, and people with overuse or repetitive stress injuries benefit from using speech recognition to operate their computers hands-free. Speech recognition technology has great potential to give people with disabilities greater access to computers and a world of opportunities.

Consider an example: Mr. Jones is a reporter who must submit his articles in HTML for publishing in an online journal. Over his twenty-year career, he has developed repetitive stress injury (RSI) in his hands and arms, and it has become painful for him to type. He uses a combination of speech recognition and an alternative keyboard to prepare his articles, but he does not use a mouse. It took him several months to become sufficiently accustomed to speech recognition to be comfortable working for many hours at a time. There are some things he has not worked out yet, such as a sound card conflict that arises whenever he tries to use speech recognition on Web sites that have streaming audio. (Source: http://www.w3.org/WAI/EO/Drafts/PWD-Use-Web/)
6.3 Speech Recognition in the Telephony Environment

William Meisel, who holds a Ph.D. in Electrical Engineering, ran a speech recognition company for ten years. He is president of the speech industry consulting firm TMA Associates and publisher and editor of the Speech Recognition Update newsletter. According to him, telephone speech recognition creates a "Voice Web": the sites that support speech recognition constitute the Voice Web. Most such sites today have individual phone numbers (typically toll-free) and are often called "voice portals". There are, however, likely to be more voice portals than Web portals; every wireless and landline telephone service provider will eventually be a voice portal, and there will be independent, corporate, and specialized voice portals.

VoiceXML, a standard created by the VoiceXML Forum (www.voicexml.org) and the W3C Voice Browser working group (www.w3.org/voice), is a way for companies to provide a voice-interactive application on a Web server without needing speech engines or telephone line interface hardware. The VoiceXML code is downloaded to the voice portal and executed by a VoiceXML interpreter, much as a Web browser on a PC interprets HTML. (Source: William Meisel's guide book on The Voice Web)
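To give a flavour of the standard, here is a minimal, illustrative VoiceXML 2.0 document. The prompt wording, the grammar file and the submit URL are placeholder assumptions, not taken from any real deployment; a voice portal's interpreter would fetch this page, speak the prompt, match the caller's answer against the referenced grammar, and post the result back to an ordinary Web server:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <!-- Collect one piece of spoken input from the caller. -->
    <field name="city">
      <prompt>Which city would you like the weather for?</prompt>
      <!-- Hypothetical grammar file listing the recognizable city names. -->
      <grammar src="cities.grxml" type="application/srgs+xml"/>
    </field>
    <!-- Once filled, submit the recognized value like an HTML form. -->
    <block>
      <submit next="http://example.com/weather" namelist="city"/>
    </block>
  </form>
</vxml>
```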
The Voice Web is not just an extension of the Internet, although information on existing Web sites can be used to support interactive voice services. It can run applications totally unlike visual Web applications and totally independent of the HTML-based Web. Some of the applications that the Voice Web is supporting are listed here.

6.3.1 Communications management and personal assistants

Communications management usually includes dialing by name using a personal directory. Personal-assistant functionality includes call screening, taking and accessing voice messages, and one-number access to the subscriber (scanning several subscriber numbers based on subscriber instructions). Other personalized features include maintaining a schedule and delivering reminders. Unified messaging includes features such as reviewing email or fax headers by phone using text-to-speech. Since subscribers will make calls through their personal assistant, the voice portal can potentially earn additional revenue from providing bundled local and/or long-distance service. Enterprise applications, such as voice-activated auto attendants that direct calls by name, can form a corporate voice portal. Corporate voice portals can also provide services such as reservations for a conference, location of a local store outlet, or a connection to customer service.

6.3.2 General information

General information includes weather, sports scores, horoscopes, general news, financial news, stock quotes, traffic conditions, and driving directions. Such information is intended to make a voice-enabled service part of a subscriber's daily habit. Information can be customized using, for example, the user's personal stock portfolio or the user's current location. As voice portals evolve, the caller will be able to "voicemark" specialized voice-equipped Web sites.

6.3.3 E-commerce

V-commerce supports a variety of transactions that can result in product or service sales. These include transactions similar to ordering from a Web site or a telephone catalog service. They also include finding a business by saying its trade name or its category. Entertainment is part of e-commerce, and it will be part of the Voice Web; for example, the caller can use speech recognition to choose audio channels to listen to. (Source: Receiver Magazine, Vodafone, 2001)
6.4 Potential Uses in Education

Contact with a number of practitioners and researchers in the field of speech recognition led to some interesting speculation regarding feasible uses of this technology in education:

1. Teaching students of foreign languages to pronounce vocabulary correctly: unlikely in the near future on a large scale, due to the software training currently involved.

2. Teaching overseas students to pronounce English correctly: unlikely in the near future on a large scale, for the same reason.

3. Making notes of observations during scientific experiments, so the scientist/researcher can focus on the observation without needing to view the monitor or keyboard (similar to how a coroner verbally records notes during an autopsy): likely, and probably already used in individual circumstances. Noise from the experiment, the researcher's need to record some observations rapidly, and the need for a vocabulary that covers the scientific terms all present issues.

4. Enabling students who are physically handicapped and unable to use a keyboard to enter text verbally: used already, and becoming increasingly widespread.

5. Enabling people with textual interpretive problems, e.g. dyslexia, to enter text verbally: used already, and becoming increasingly widespread.

6. Restricting access on a high-security computer, where a keyboard or other input device might be used by hackers: interest from a number of people, though a lack of proof-of-concept research hinders further development. Unlikely to be available in the near future.

7. Narrative-oriented research, where transcripts are automatically generated, removing the time needed to generate the transcript manually, as well as human error: likely in the near future. Current speech recognition technology imposes an unacceptable compromise between accuracy and inhibiting the interviewee; quicker and easier training systems for the interviewee will help, as will increases in portable computing power.

8. Capturing the speech of a lecturer or tutor: unlikely on a large scale, due to vocabulary, training and interpretive issues. In addition, filming the lecture captures audio and visual content combined, which may be more useful.

9. Using a speech recognition system in an examination: very likely. Technically this is possible, and within current UK examination guidelines it appears to be acceptable.

(Source: http://www.becta.org.uk/technology/speechrecog/docs/finalreport.pdf, the final report (June 2000) of an experimental project on how effective speech recognition technologies could be for people with special educational needs.)

6.5 Computer and Video Games

Speech input has been used in a limited number of computer and video games, on a variety of PC and console-based platforms, over the past decade. For example, the game Seaman involved growing and controlling strange half-man, half-fish characters in a virtual aquarium. A microphone, sold with the game, allowed the player to issue one of a predetermined list of command words and questions to the fish. The accuracy of interpretation, in use, seemed variable; during gaming sessions, colleagues with strong accents had to speak in an exaggerated and slower manner for the game to understand their commands.

Microphone-based games are available for two of the three main video game consoles (PlayStation 2 and Xbox). However, these games primarily use speech in an online player-to-player manner, rather than spoken words being interpreted electronically. For example, MotoGP for the Xbox allows online players to ride against each other in a motorbike racing simulation and to speak (via microphone headset) to the nearest players (bikers) in the race.

There is currently interest in, but less development of, video games that interpret speech. The Microsoft Xbox, Nintendo GameCube, and Sony PlayStation 2 consoles all offer games with speech input/output. Currently, most are war-action shooter games in which speech recognition provides high-level commands to virtual teammates, who respond with a variety of recorded quips. Consider two examples, the graphically realistic, tactical squad-based shooter games Ghost Recon 2 and SOCOM II: U.S. Navy Seals, both available on the Sony PlayStation 2. The speech recognition systems for these games are provided by Fonix and ScanSoft, respectively.

In Ghost Recon 2, the user is the leader of a team of three secret Special Forces soldiers who must capture various military targets in North Korea in the year 2007. The team is critical to the user's survival under enemy gunfire. Saying "Move out!" directs the team to move ahead of you as you make your way through the virtual, hilly terrain toward various objectives. The speech commands ("Move out," "Covering fire," "Grenade," "Take point," "Hold position," "Regroup") are easily recalled, high-level instructions to the team members. The commands that can be obeyed depend on the immediate situation: if you say "Take point" and the hostile fire is too great, the designated team member may say, "No can do, Captain." Occasionally, the retort is somewhat less respectful.

In SOCOM II: U.S. Navy Seals, a team of four men, including the first-person leader, attempts to stop an arms smuggling group in rural Albania. The team has to avoid the enemy, meet an informant, blow up weapons caches, and make their escape. The speech commands in this game are spoken in three parts, using a simple grammar. A command may be addressed to "Fireteam" (all other team members) or to an individual such as "Able" (your partner). Then there are approximately 12 action commands, including "Fire at will," "Deploy," "Move to," and "Get down." The third part of the command is one of nine letters of the military alphabet ("Charlie," "Delta," etc.) indicating where the "Move to" and similar commands are intended; they represent the specific locations of game objectives. (Source: article from Speech Technology Magazine, April 2005, http://www.speechtechmag.com/Articles/ReadArticle.aspx?ArticleID=29432)
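The three-part command structure just described is easy to picture as a tiny grammar. The sketch below reconstructs the idea in Python; the word lists are abbreviated guesses based only on the commands quoted above, not the game's actual vocabulary:

```python
# Toy parser for an addressee + action + optional-location command grammar.
ADDRESSEES = {"Fireteam", "Able"}
ACTIONS = {"Fire at will", "Deploy", "Move to", "Get down"}
LOCATIONS = {"Charlie", "Delta", "Echo"}   # subset of the nine letters

def parse(command):
    """Split a command into (addressee, action, location or None);
    return None if the phrase does not fit the grammar."""
    for addressee in ADDRESSEES:
        if not command.startswith(addressee + " "):
            continue
        rest = command[len(addressee) + 1:]
        if rest in ACTIONS:
            return addressee, rest, None
        for action in ACTIONS:
            for location in LOCATIONS:
                if rest == f"{action} {location}":
                    return addressee, action, location
    return None

print(parse("Able Move to Charlie"))   # ('Able', 'Move to', 'Charlie')
print(parse("Fireteam Deploy"))        # ('Fireteam', 'Deploy', None)
print(parse("Able Dance"))             # None: out-of-grammar command
```

Constraining recognition to such a small, fixed grammar is exactly what makes in-game speech input tractable: the engine only ever has to choose among a few dozen phrases.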
6.6 Medical Transcription

Medical transcription, also known as MT, is an allied health profession which deals in the process of transcription: converting voice-recorded reports, as dictated by physicians and other healthcare professionals, into text format. Every day, doctors scour the market looking for new ways to simplify their office routines and reduce their costs, and medical transcription software saves them time and money. Speech recognition products in this field produce accurate and fully formatted transcriptions from clinicians' dictations. The goal is to minimize editing time by MTs and, as a result, increase MT productivity: the software interprets and formats a document so that it is close to a final product. Benefits include:

- Organized and formatted document sections
- Punctuation inserted even if not spoken
- Numbers interpreted and presented appropriately, including dosages, measurements, lists, etc.
- Formatting based on each organization's preferences and specifications
- Insertion of speech-activated "normals"
- No explicit training required
- Continual learning and improvement from MT edits

Examples: when a clinician dictates

"Exam…vital signs…two twelve…eighty eight and regular…thirteen…BP one forty one hundred and one thirty five ninety five"

speech recognition software can output:

PHYSICAL EXAMINATION: VITAL SIGNS: Weight 212, pulse 88 and regular, respiration 13, blood pressure is 140/100, 135/95.

When a provider says

"The following problems were reviewed…hypertension…please enter my hypertension template…use my normal cad"

speech recognition software can output:

PROBLEMS: The following problems were reviewed:
- Hypertension: No headache, visual disturbance, chest pain, palpitation, focal neurologic complaint, dyspnea, edema, claudication, or complaint from current medication.
- Coronary artery disease: No chest pain, dyspnea, PND, orthopnea, palpitation, weakness, syncope, or obvious problems related to medications.
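The "two twelve" to "212" and "eighty eight" to "88" conversions above are a number-normalization step. Here is a toy sketch of the idea; the rules cover only the patterns in the example and are nothing like a full medical-transcription formatter:

```python
# Toy spoken-number normalizer: tens+units combine ("eighty eight" -> 88),
# while separate groups concatenate as digits ("two twelve" -> 212).
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def normalize(spoken):
    words = spoken.lower().split()
    groups, i = [], 0
    while i < len(words):
        w = words[i]
        if w in TENS:
            value = TENS[w]
            if i + 1 < len(words) and words[i + 1] in UNITS:
                value += UNITS[words[i + 1]]   # "eighty" + "eight" -> 88
                i += 1
            groups.append(str(value))
        elif w in TEENS:
            groups.append(str(TEENS[w]))
        elif w in UNITS:
            groups.append(str(UNITS[w]))
        i += 1
    return "".join(groups)

print(normalize("two twelve"))     # 212
print(normalize("eighty eight"))   # 88
print(normalize("thirteen"))       # 13
```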
6.7 Mobile Devices

The growth of cellular telephony, combined with recent advances in speech recognition technology, creates sizeable potential opportunities for mobile speech recognition applications. Speech recognition in mobile phones has already been introduced, but there is a lot of work still to be done in this field. When speech recognition was first introduced in mobiles, it was used to call a contact by saying the contact's name. The user first needed to record a voice clip of the name of each contact and associate it with that contact; when the user later said a name, the phone compared it with the recorded sounds and called the person whose name was spoken.

New smartphones are introduced every month, and they no longer require recording the names first. They have their own speech system, which can read the names as written, so when the user says a name, the phone compares the spoken sound against its saved contacts and calls the contact whose name was spoken.

Nuance Communications has launched the Nuance Mobile Speech Platform, which will improve the text-to-speech and speech recognition abilities of mobile devices. Through this platform, end users will be able to perform searches, dictate emails and SMS messages, and have incoming emails and messages read out to them, improving the usability and efficiency of mobile devices. The Nuance Mobile Speech Platform can be used to speech-enable a mobile application, and specifically offers pre-built components for the following:

- Nuance Local Search: search business names and categories, residential listings, weather, dining and entertainment, movies, etc.
- Nuance Mobile Navigation: voice destination entry (including street addresses, businesses and points of interest) and spoken turn-by-turn directions.
- Nuance Content Search: search catalogs with items in music, video, games and more.
- Nuance Mobile Web Search: search the Web from a mobile device.
- Nuance Mobile Communications: compose email, SMS, and IM messages by speaking.

(Source: Nuance Communications, http://www.nuance.com)
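The record-a-clip-per-contact approach described above amounts to template matching. A classic way to compare a new utterance against stored templates while tolerating different speaking speeds is dynamic time warping (DTW); the sketch below uses made-up one-dimensional feature sequences purely for illustration, whereas a real dialer would match sequences of spectral feature vectors:

```python
# Template-matching voice dialing: pick the contact whose stored
# template is nearest to the new utterance under DTW distance.
def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def best_contact(utterance, templates):
    """Return the contact name whose template is closest."""
    return min(templates,
               key=lambda name: dtw_distance(utterance, templates[name]))

templates = {"alice": [1.0, 2.0, 3.0, 2.0], "bob": [3.0, 1.0, 1.0]}
print(best_contact([1.1, 2.1, 2.9, 2.2], templates))   # alice
```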
6.8 Voice Security Systems

Voice security technology uses a person's voice print to uniquely identify individuals through biometric speaker verification. Speech is processed by a non-contact method: you do not need to see or touch people to recognize them. The popularity of speaker verification is growing swiftly because speech is easy to obtain without dedicated hardware, and improved, robust speech recognition algorithms and PC hardware have brought this one-time futuristic idea into the present.

At Voice Security Systems, a decade of research and development has led them to believe that the explosive speech processing market is here to stay. Their Voice Protect® method of biometric voice authentication is ideally suited for low-memory, database-independent applications using smart cards or other physical devices such as cell phones. Because of the value of biometric security in fraud prevention, and the added convenience of knowing that people are who they claim to be, the company believes speaker verification will be accepted by the consumer market more widely, and sooner, than speech recognition. Voice Security Systems claims it can deliver biometric security technology to the market at a lower cost than anyone else in the industry, with no recurring maintenance costs such as database management or complicated user training. Once the Voice Protect® technology is built into a product, it continues to function independently for the life of the product. The technology can be applied in our daily lives, for example in garage door openers, computers and laptops, automobiles, PDAs and handheld devices, smart-card applications, cell phones, door access systems and ATMs. (Source: Voice Security Systems Inc., http://www.voice-security.com/)
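As a toy illustration of the verification step, the sketch below compares an enrolled voice-print vector with a freshly captured one and accepts the identity claim only above a similarity threshold. The vectors, the cosine measure and the threshold are illustrative assumptions; this report has no details of how Voice Protect® actually works:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def verify(enrolled, sample, threshold=0.95):
    """Accept the identity claim only if the voice prints are close."""
    return cosine(enrolled, sample) >= threshold

enrolled_print = [0.2, 0.7, 0.1, 0.9]   # made-up enrollment vector
print(verify(enrolled_print, [0.22, 0.68, 0.12, 0.88]))   # True: same speaker
print(verify(enrolled_print, [0.9, 0.1, 0.8, 0.2]))       # False: impostor
```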
7. FUTURE APPLICATIONS

There are a number of scenarios in which speech recognition is being delivered, developed for, researched or seriously discussed. As with many contemporary technologies, such as the Internet, online payment systems and mobile phone functionality, development is at least partially demand-driven. IBM intends to have better-than-human automatic speech recognition by 2010; Bill Gates has predicted that by 2011 the quality of ASR will catch up to humans; and Justin Rattner of Intel said in 2005 that by 2015 computers will have "strong capabilities" in speech-to-text.

At some point in the future, speech recognition may become speech understanding. The statistical models that allow computers to decide what a person just said may someday allow them to grasp the meaning behind the words. Although that is a huge leap in terms of computational power and software sophistication, some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence. We can talk to our computers today; in 25 years, they may very well talk back.

7.1 Home Appliances

Designers have developed very convenient user interfaces for consumer appliances. What could be easier than pressing buttons on a remote control to select television channels, or flipping a switch to turn on a light? These types of direct-manipulation user interfaces will continue to be widely used. However, because current buttons and switches are not intelligent, you cannot ask your remote control when "Star Trek" is on, and you must walk to the light switch before turning the light on. Speech enables consumer appliances to act intelligently, responding to speech commands and answering verbal questions. For example, speech enhances consumer appliances by enabling the user to give instructions such as:

1. To the VCR: "Record tonight's 'Star Trek'."
2. To the coffeepot: "Start at 6:30 a.m. tomorrow."
3. To the light switch: "Turn on the lights one half-hour before sunset."

There is, inevitably, interest in the use of speech recognition in domestic appliances such as ovens, refrigerators, dishwashers and washing machines. One school of thought is that, like the use of speech recognition in cars, this can reduce the number of parts and therefore the cost of producing the machine. However, removal of the normal buttons and controls would present problems for people who, for physical or learning reasons, cannot use speech recognition systems.

7.2 Wearable Computers

Perhaps the most futuristic application is in the use and functionality of wearable computers, i.e. unobtrusive devices that you can wear like a watch, or that are even embedded in your clothes. These would allow people to go about their everyday lives but still store information (thoughts, notes, to-do lists) verbally, or communicate via email, phone or videophone, through wearable devices. Crucially, this would be done without having to interact with the device, or even remember that it is there; the user would just speak, and the device would know what to do with the speech and carry out the appropriate task. The rapid miniaturization of computing devices, the rapid rise in processing power, and advances in mobile wireless technologies are making these devices more feasible. There are still significant problems to overcome, such as background noise and the idiosyncrasies of an individual's language. However, it is speculated that reliable versions of such devices will become commercially available during this decade.

The conventional human-computer interface, the GUI, which assumes a keyboard, mouse and bit-mapped display, is insufficient for the wearable environment. Although handwritten-character recognizers and keyboards that can be used with one hand have been developed as input devices, speech recognition has recently received more interest. The main reason is that it leaves both hands and eyes free, is therefore less restricted in its use, and can achieve quicker communication. In addition, speech can convey not only linguistic information but also the emotion and identity of speakers. IBM's wearable PC described above has a microphone in its controller and can recognize speech once ViaVoice has been installed.
7.3 Precision Surgery

Developments in keyhole and micro surgery have clearly shown that an approach of as little invasive or non-essential surgery as possible increases success rates and shortens patient recovery times. There is occasional speculation in various medical fora regarding the use of speech recognition in precision surgery, where a procedure is partially or totally carried out by automated means. For example, in removing a tumour or blockage without damaging surrounding tissue, a command could be given to make an incision of a precise and small length, e.g. 2 millimetres. However, the legal implications of such technology are a formidable barrier to significant developments in this area. If speech were incorrectly interpreted and, say, a limb were accidentally severed, who would be liable: the surgeon, the surgery system developers, or the speech recognition software developers?
8. SPEECH RECOGNITION SOFTWARE

Modern speech recognition software enables a computer user to speak text and/or commands to the computer, largely, but not entirely, bypassing the keyboard and mouse interface. The idea has been portrayed in science fiction for many decades, quite frequently depicting computers that do not even have keyboards or mice. Such computers are also typically depicted as able to keep up no matter how fast a person speaks, without regard to who the speaker is, the language spoken, or even how many speakers there are. In other words, they depict a computer that hears the way a multilingual person does.

Attempts to develop usable speech recognition software began in the mid-twentieth century and proved far more daunting than anyone had imagined. The task also turned out to require so much computing power that only modern computers can perform the necessary functions in real time (i.e., as fast as you can speak). The first commercially practical products became available around 1990 (e.g. the Voice Navigator, a standalone computer dedicated entirely to speech recognition, which sent its output to a second computer) and used up all the available computing power of the machine. They were not particularly accurate and could only understand a single person at a time, requiring retraining, not of the operator but of the machine itself, to work for another person. Despite these limitations, they could type so rapidly that, even after taking time to make corrections, a person with disabilities could easily accomplish more work with the machine than without it. For persons with physical disabilities, the ability simply to talk to a computer can be a priceless asset; consider, for instance, an author with Parkinson's disease who can barely control his hands, yet can conveniently create an article.

8.1 Free Software

Many software packages are used for speech recognition, and a number of them are free of charge. Some free packages are:

- XVoice (http://www.compapp.dcu.ie/~tdoris/Xvoice/, http://www.zachary.com/creemer/xvoice.html)
- CVoiceControl/kVoiceControl (http://www.kiecza.de/daniel/linux/index.html)
- Ears (ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/)
- NICO ANN Toolkit (http://www.speech.kth.se/NICO/index.html)
- Myers' Hidden Markov Model Software (http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html)
- Jialong He's Speech Recognition Research Tool (http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html)
- Open Mind Speech (http://freespeech.sourceforge.net)
- GVoice (http://www.cse.ogi.edu/~omega/gnome/gvoice/)
- ISIP (http://www.isip.msstate.edu/project/speech/)
- CMU Sphinx (http://www.speech.cs.cmu.edu/sphinx/Sphinx.html)

8.2 Commercial Software

8.2.1 Dragon NaturallySpeaking

Dragon NaturallySpeaking is almost universally regarded in reviews as the best voice recognition software, with a claimed potential of 99.8 percent accuracy (reviews say 95 percent is more realistic). NaturallySpeaking integrates easily with Microsoft productivity software. The Preferred version can also be used with a compatible digital audio recorder, MP3 player/recorder or PDA for recording voice notes or lectures on the go; NaturallySpeaking will later transcribe your recordings.
Reviews say Dragon NaturallySpeaking is the most sophisticated product on the market, but that if you have Windows Vista, or plan to buy a new computer with it, you should first try the voice recognition capabilities included with Vista, which by most accounts are nearly as robust as Dragon NaturallySpeaking. (Source: http://www.nuance.com/naturallyspeaking/)

8.2.2 IBM ViaVoice

IBM ViaVoice is a range of language-specific continuous speech recognition software products offered by IBM. The current version is designed primarily for use in embedded devices. Individual language editions may differ in features, specifications, technical support, and microphone support. Some of the products or editions available are:

- Advanced Edition
- Standard Edition
- Personal Edition
- ViaVoice for Mac OS X Edition
- Pro USB Edition
- Simply Dictation for Mac

Prior to the development of ViaVoice, IBM developed a product named VoiceType. ViaVoice was first introduced to the general public in 1997, and two years later, in 1999, IBM released a free-of-charge version. I did not find a single review that recommends ViaVoice over Dragon NaturallySpeaking, but ViaVoice is the only program that will run on older or less powerful computers; Dragon NaturallySpeaking is extremely demanding (you need at the very least 512 MB of RAM, a recent processor and 1 GB of free hard-drive space). However, reviews say ViaVoice is not as accurate as Dragon NaturallySpeaking, mistakes are not as easy to correct, and the product has not been updated in years. (Source: http://www.ibm.com/software/speech/)
8.2.3 Microsoft Speech Recognition System

In 1993, Microsoft hired Xuedong Huang from CMU to lead its speech efforts. Microsoft has been involved in research on both speech recognition and text-to-speech, and the company's research eventually led to the development of the Speech API (SAPI). Speech recognition technology has been used in several Microsoft products, including Microsoft Dictation (a research prototype that ran on Windows 9x), Office XP, Office 2003, Microsoft Plus! for Windows XP, Windows XP Tablet PC Edition, and Windows Mobile (as Microsoft Voice Command). However, prior to Windows Vista, speech recognition was not mainstream. In response, Windows Speech Recognition was bundled with Windows Vista, released in 2006, making it the first mainstream version of Microsoft Windows to offer fully integrated support for speech recognition.

Windows Speech Recognition in Windows Vista lets users interact with their computers by voice. It was designed for people who want to limit their use of the mouse and keyboard significantly while maintaining or increasing their overall productivity. You can dictate documents and emails in mainstream applications, use voice commands to start and switch between applications, control the operating system, and even fill out forms on the Web. Windows Speech Recognition is a new feature in Windows Vista, built using the latest Microsoft speech technologies, and provides recognition accuracy that improves with each use as it adapts to your speaking style and vocabulary. Speech recognition is available in English (U.S.), English (U.K.), German (Germany), French (France), Spanish (Spain), Japanese, Chinese (Traditional), and Chinese (Simplified).

Early reviews say it rivals Dragon NaturallySpeaking 9 for accuracy. If you buy a new computer, you will get Vista by default, so you can try out its voice recognition features before buying other software. You can also upgrade an older computer to Vista, but the system requirements are demanding. Reviewers say Dragon NaturallySpeaking has a slight edge, but cite no compelling reason to buy it if you have, or plan to buy, Vista. (Source: http://www.microsoft.com/speech/speech2007/default.mspx)
8.2.4 MacSpeech Dictate

MacSpeech is a company that develops speech recognition software for Apple Macintosh computers. In 2008, its previous flagship product, iListen, was replaced by Dictate, which is built around Nuance's licensed Dragon NaturallySpeaking engine. MacSpeech was established in 1996 by current CEO Andrew Taylor and is currently the only company that develops voice dictation systems for the Macintosh; its full product line is devoted to speech recognition and dictation. Reviews say Dictate, introduced in early 2008, is as accurate as Dragon NaturallySpeaking in tests, and much better than the previous MacSpeech program, iListen. Dictate comes with a microphone headset, and no products directly compete with it. (Source: http://www.macspeech.com/dictate/)

8.2.5 Philips SpeechMagic

SpeechMagic is an industrial-grade platform for capturing information in digital format, developed by Philips Speech Recognition Systems of Vienna, Austria. SpeechMagic features large-vocabulary speech recognition as well as a number of services aimed at supporting "accurate, convenient and efficient" information capture in healthcare IT applications. The technology is mainly used in the healthcare sector; however, applications are also available for the legal market as well as for tax consultants. SpeechMagic supports 25 recognition languages and provides more than 150 ConTexts (industry-specific vocabularies). More than 8,000 healthcare sites in 45 nations use SpeechMagic to capture information and create professional documents. The world's largest site powered by SpeechMagic is in the United States, with more than 60,000 authors, more than 3,000 editors and a throughput of 400 million lines per year.
Growth consulting company Frost & Sullivan recognized SpeechMagic in 2005 with the Market Leadership Award in European Healthcare. In 2007, Frost & Sullivan presented Philips Speech Recognition Systems with the Global Excellence Award in Speech Recognition. (Source: http://www.myspeech.com/)

8.2.6 Other Commercial Software

There are many other commercial software packages used for speech recognition. Some of them are:

- HTK (http://htk.eng.cam.ac.uk/)
- CSLU Toolkit (http://cslu.cse.ogi.edu/toolkit/)
- Simmortel Voice (http://www.simmortel.com)
- Quack.com by AOL (http://www.quack.com)
- SpeechWorks (http://www.speechworks.com)
- Babel Technologies (http://www.babeltech.com)
- Vocalis Speechware (http://www.vocalisspeechware.com)
- Entropic (http://htk.eng.cam.ac.uk)
9. CONCLUSION

Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses. VoiceXML ties speech recognition and telephony together and provides the technology with which businesses can develop and deploy voice-enabled Web solutions today. These solutions can greatly expand the accessibility of Web-based self-service transactions to customers who would otherwise not have access and, at the same time, leverage a business's existing Web investments. Speech recognition and VoiceXML clearly represent the next wave of the Web. In the near future, people will operate their home and business computers by speech rather than by keyboard and mouse, and home automation will be completely based on speech recognition systems.