2. In 1952, Bell Laboratories designed the
"Audrey" system, which recognized digits
spoken by a single voice.
At the 1962 World's Fair, IBM demonstrated its
"Shoebox" machine, which could understand
16 words spoken in English and solve
arithmetic problems on voice command.
3.
4. The U.S. Department of Defense’s DARPA
Speech Understanding Research (SUR)
program in the 1970s was responsible for
Carnegie Mellon’s Harpy system.
Harpy could understand 1011 words
Harpy was significant because it was the
first to use beam search technology
Beam search keeps a predetermined number of
the best partial solutions as candidates, ranking
each by how close it appears to be to a complete solution
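
The beam search idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Harpy's actual implementation: the `BIGRAMS` table, its words, and its probabilities are made up, and partial solutions are ranked by cumulative log-probability as a stand-in for the "how close to a complete solution" estimate.

```python
import math

# Hypothetical toy model: P(next word | previous word). "<s>" marks the start.
BIGRAMS = {
    "<s>": {"call": 0.6, "tall": 0.4},
    "call": {"home": 0.7, "hum": 0.3},
    "tall": {"home": 0.2, "hum": 0.8},
}


def beam_search(bigrams, steps, beam_width):
    """Extend word sequences step by step, keeping only the best few."""
    # Each candidate is (word sequence, cumulative log-probability).
    beam = [(["<s>"], 0.0)]
    for _ in range(steps):
        expanded = []
        for words, logp in beam:
            for nxt, p in bigrams.get(words[-1], {}).items():
                expanded.append((words + [nxt], logp + math.log(p)))
        if not expanded:
            break  # no candidate can be extended any further
        # Prune: keep only the `beam_width` highest-scoring partial solutions.
        expanded.sort(key=lambda cand: cand[1], reverse=True)
        beam = expanded[:beam_width]
    # Drop the start marker before returning.
    return [(words[1:], logp) for words, logp in beam]


for words, logp in beam_search(BIGRAMS, steps=2, beam_width=2):
    print(" ".join(words), round(math.exp(logp), 2))
```

A smaller `beam_width` saves memory and time but may discard a candidate that would have turned out best later; a larger one behaves more like an exhaustive search.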
5. In the 1980s, speech recognition vocabulary
jumped dramatically due to a new statistical
method called the Hidden Markov Model (HMM)
Instead of using templates for words and
looking for sound patterns, HMMs estimated the
probability that unknown sounds were words
This gave speech recognition programs the
potential to recognize an unlimited number of
words
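
The probabilistic approach can be illustrated with a toy HMM decoded by the Viterbi algorithm. This is a minimal sketch with assumed values: the two hidden "phoneme" states (`S`, `AH`), the coarse sound labels (`hiss`, `tone`), and every probability are hypothetical and chosen only to keep the arithmetic easy to follow. A real recognizer would use far larger models trained on recorded speech.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that makes arriving in state s most likely.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical toy model: hidden phonemes and the sounds they tend to produce.
STATES = ("S", "AH")
START_P = {"S": 0.6, "AH": 0.4}
TRANS_P = {"S": {"S": 0.3, "AH": 0.7}, "AH": {"S": 0.4, "AH": 0.6}}
EMIT_P = {"S": {"hiss": 0.8, "tone": 0.2}, "AH": {"hiss": 0.1, "tone": 0.9}}

prob, path = viterbi(["hiss", "tone", "tone"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

Because the model scores any sound sequence rather than matching it against stored templates, adding a word means adding states and probabilities, not recording a new template.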
6. Introduced in 1987, the doll could be trained
by children to respond to their voice
http://www.youtube.com/watch?feature=player_embedded&v=UkU9SbIictc
7. In the 1990s, faster computers made it
possible for ordinary people to have speech
recognition software
In 1990, Dragon Dictate came out for $9,000
Seven years later, Dragon NaturallySpeaking
arrived for $695
It could understand words spoken at a natural
speed, but you had to train it for 45 minutes
8. By 2001, computer speech recognition had
topped out at 80 percent accuracy and
progress seemed to stall until the end of the
decade
Google’s voice search app for the iPhone and
Apple’s Siri brought speech recognition back
to the forefront
9. Interact with the calendar.
Search contacts.
Read and write messages
(text and email).
Interact with the Maps
app and location services.
Utilize search providers
Can understand English
(US, UK, Australia),
French, German, and
Japanese
10. Mobile App that allows
the user to speak one
language into the
phone and produces a
verbal translation
iPhone and Android
Thai, Chinese, French,
German, Iraqi,
Japanese, Korean,
Spanish,
Tagalog-English
German-Spanish
11. Ford SYNC technology
◦ Music
◦ Directions
◦ Hands-free calling
◦ http://www.youtube.com/watch?v=MyIgbcdOliw
Nuance Dragon NaturallySpeaking
◦ Audi, BMW, Fiat, Hyundai, Mercedes,
Jaguar, Porsche, Volkswagen
◦ “One-Shot Destination Entry” and full
control of the “infotainment system”
12. Microphone on Kinect
Start by saying “Xbox,”
and then saying one of
the commands on
screen
Understands English,
French, German, Italian,
Spanish, Japanese
Minimal background
noise, clarity important
13. Medical
◦ Allow doctors to dictate notes into a patient’s file
during examinations
Court of Law
◦ Record and digitize court proceedings in real time
◦ Reduce time and cost, increase efficiency
Educational
◦ Rapid speech-to-text, aiding kids with disabilities
◦ http://tinyurl.com/5wtl8wv
14. Speeds up “writing”
Improvements in spelling
Beneficial for people with
physical or mental disabilities
Ability to multitask
Frees users from the physical limitation of
using their hands
16. Do you feel that the pros outweigh the cons?
Is it worth investing in this software despite
current limitations?
Does anyone have Siri? Does it actually help
you?
Would anyone prefer to use the speech-to-
text software to write papers?
Editor's notes
- Average vocabulary of a 3-year-old
- Beam search is an optimization of best-first search that reduces its memory requirements