The document discusses voice technology, providing an overview of key concepts and systems like Alexa, Google Home, and Cortana. It describes how voice technology works using natural language processing and machine learning to understand commands. Examples are given of intents, synonyms, slots and discoverability features that enhance the voice user experience. Stats on the growing voice industry are also presented, along with tips for creating skills and actions using platforms from Amazon and Google.
2. Hello world!
• Started in 2017
• Over 20 years’ experience in web
content and social media
• Social media content, strategy,
advertising, training and
campaign management
• Video editing and motion
graphics
• Chatbots and voice technology
3. What we’ll cover
• What is voice technology?
• How does voice technology
work?
• Why should we use it?
• Who’s using it well?
• Best practice in voice design
• Practical demonstrations
4. What is voice technology?
• Amazon Alexa
• Google Home
• Microsoft Cortana
• Apple Siri
• Samsung Bixby
11. A little bit of history
• Alan Turing was a pioneer of
modern computing
• He devised the Turing Test in
1950
13. MIT AI Laboratory
• Professor Marvin Minsky set
up the research group in the
early 1960s to explore
artificial intelligence, machine
learning and natural
language processing
15. ELIZA
• One major project that
emerged from the MIT AI
Laboratory was ELIZA in 1964
• Essentially this was an early
chatbot where individuals had
a conversation with a
computer
• They were not told they were
talking to a machine
16. ELIZA meet DOCTOR
• ELIZA simulated
conversations using pattern
matching and substitution
methodology, but did not
understand the context of
words
• One of the most popular
scripts ELIZA ran was
DOCTOR, which simulated a
psychotherapist
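To make "pattern matching and substitution" concrete, here is a minimal sketch in the spirit of ELIZA. The rules, the pronoun table and the function names are hypothetical illustrations, not the original DOCTOR script. Note that nothing in it understands what the words mean.

```python
import re

# Hypothetical rules for illustration; NOT taken from the original DOCTOR script.
RULES = [
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
]

# Pronoun swaps applied to the captured fragment (the "substitution" step).
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are"}

DEFAULT_REPLY = "Please tell me more."


def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(utterance: str) -> str:
    """Return the reply for the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(respond("I am worried about my job."))
```

The reply is assembled purely from the user's own words, which is why this kind of program can seem attentive without tracking any context.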
18. Back to the future
• Bringing things back up to
date, AI and natural language
processing technology can
now take context into account
• The Turing Test has still not
been passed, but we are
getting closer
19. Google Duplex
• Google recently
demonstrated their Duplex
technology that links voice
technology to cloud services
such as Google Calendar
• A recurrent neural network
built with TensorFlow
Extended (TFX) processes
complex interactions and
tracks context
22. Making progress
• Duplex is said to be effective
in 80% of situations, so it
doesn’t yet pass the Turing
Test
• Deep Learning expert
Andrew Ng predicts that
once speech recognition is
99% accurate voice will be
the primary way we interact
with computers
23. The final 4%
• Estimates suggest we are at
around 95% currently
• The final 4% is very
challenging!
32. Adding functionality
• Amazon Alexa and Google
Home devices can add new
functionality via Skills and
Actions
• These give the devices new
capabilities, and anyone can
build them
33. Powerfully simple
• It is fairly quick and simple to
create content for these
devices
• There are now over 40,000
Alexa Skills available with an
active developer community
46. How does voice technology work?
• Voice technology uses
Natural Language Processing
to understand and interpret
voice commands
• This is underpinned by
machine learning techniques
47. Voice technology in action
• Device listens for invocation
• User gives wake word
• Device returns welcome message
• User gives intent
• Device returns response
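The five steps above can be sketched as a tiny state machine. This is a simplified illustration, not how any real device is implemented: the class name, the exact-match wake phrase and the one-intent-per-session rule are all assumptions made to keep the example short.

```python
from typing import Dict, Optional


class VoiceSession:
    """Toy model of the listen / wake / welcome / intent / response loop."""

    def __init__(self, wake_phrase: str, welcome: str, responses: Dict[str, str]) -> None:
        self.wake_phrase = wake_phrase.lower()
        self.welcome = welcome
        self.responses = responses
        self.awake = False  # False means "listening for invocation"

    def hear(self, utterance: str) -> Optional[str]:
        text = utterance.strip().lower()
        if not self.awake:
            if text == self.wake_phrase:
                self.awake = True
                return self.welcome
            return None  # still listening; everything else is ignored
        # Assumption: one intent per session, then back to listening.
        self.awake = False
        return self.responses.get(text, "Sorry, I didn't catch that.")


if __name__ == "__main__":
    session = VoiceSession(
        "alexa open coffee wizard",
        "Welcome to Coffee Wizard.",
        {"recommend a coffee": "Try a flat white."},
    )
    print(session.hear("Alexa open Coffee Wizard"))
    print(session.hear("Recommend a coffee"))
```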
48. Intents
• An intent is used to trigger a
response
• For example a Skill / Action
could ask where you want to
go on holiday - New York,
Paris or Tokyo?
• Each of these choices would
be a separate intent and
produce different responses
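As a rough sketch of the holiday example, each intent can be mapped to its own handler. The intent names (NewYorkIntent and so on) and the reply text are hypothetical; real platforms let you choose your own names.

```python
from typing import Callable, Dict


def new_york_intent() -> str:
    return "New York it is. Pack for a busy city break."


def paris_intent() -> str:
    return "Paris it is. Leave room for pastries."


def tokyo_intent() -> str:
    return "Tokyo it is. Expect a long flight."


# Hypothetical intent names; each one triggers a different response.
INTENT_HANDLERS: Dict[str, Callable[[], str]] = {
    "NewYorkIntent": new_york_intent,
    "ParisIntent": paris_intent,
    "TokyoIntent": tokyo_intent,
}

FALLBACK = "Sorry, I can only help with New York, Paris or Tokyo."


def handle_intent(intent_name: str) -> str:
    """Look up the handler for the recognised intent and run it."""
    handler = INTENT_HANDLERS.get(intent_name)
    return handler() if handler else FALLBACK


if __name__ == "__main__":
    print(handle_intent("ParisIntent"))
```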
49. Synonyms
• Intents are really powerful
and can include synonyms,
so if users have a different
name for something this can
be handled gracefully
• E.g. pavement / sidewalk
• AI is used with NLP so
phrases don’t have to be
exact
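A minimal sketch of synonym handling: every synonym resolves to one canonical value, so the rest of the Skill / Action only has to deal with "pavement". The fuzzy fallback uses Python's difflib as a crude stand-in for the platform's NLP, which is far more capable; the extra synonym "footpath" and the 0.8 cutoff are assumptions.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional

# Canonical value -> synonyms. "footpath" is an added example.
SLOT_VALUES: Dict[str, List[str]] = {
    "pavement": ["sidewalk", "footpath"],
}


def build_lookup(values: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten the table so every spoken form points at its canonical value."""
    lookup: Dict[str, str] = {}
    for canonical, synonyms in values.items():
        lookup[canonical] = canonical
        for synonym in synonyms:
            lookup[synonym] = canonical
    return lookup


LOOKUP = build_lookup(SLOT_VALUES)


def resolve(spoken: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the canonical value for a spoken word, or None if unknown."""
    word = spoken.strip().lower()
    if word in LOOKUP:
        return LOOKUP[word]
    # Crude approximation of "phrases don't have to be exact".
    close = get_close_matches(word, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None


if __name__ == "__main__":
    print(resolve("Sidewalk"))
```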
50. Slots
• You can also add slots to
intents that request specific
data be captured in a set
order
• This is particularly useful for
retail / ecommerce
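One way to picture slots is as an ordered checklist that the intent works through, prompting for whichever value is still missing. The sketch below assumes a hypothetical coffee order; the class, slot names and prompts are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


class SlotFillingIntent:
    """Collects slot values one at a time, in the order they were declared."""

    def __init__(self, name: str, slots: List[Tuple[str, str]]) -> None:
        self.name = name
        self.slots = slots  # (slot name, prompt) pairs, in the required order
        self.values: Dict[str, str] = {}

    def next_prompt(self) -> Optional[str]:
        """Prompt for the first unfilled slot, or None when all are filled."""
        for slot_name, prompt in self.slots:
            if slot_name not in self.values:
                return prompt
        return None

    def fill(self, answer: str) -> None:
        """Store the answer in the first unfilled slot."""
        for slot_name, _ in self.slots:
            if slot_name not in self.values:
                self.values[slot_name] = answer.strip().lower()
                return
        raise ValueError("all slots are already filled")


if __name__ == "__main__":
    order = SlotFillingIntent(
        "OrderCoffeeIntent",
        [("drink", "Which drink would you like?"), ("size", "What size?")],
    )
    print(order.next_prompt())
    order.fill("Latte")
    print(order.next_prompt())
```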
51. Explicit and implicit invocations
• Explicit invocation:
"Alexa, open Coffee Wizard"
• Implicit invocation:
"Alexa, recommend a coffee
for a sunny day"
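The difference between the two can be sketched with a simple check: an explicit invocation names the Skill after a launch word, while anything else has to be routed some other way. The regular expression, the launch words it accepts and the function name are assumptions for illustration, not the platform's actual parser.

```python
import re
from typing import Optional, Set

# Assumed launch phrasing: optional "Alexa", a launch word, then the Skill name.
LAUNCH = re.compile(
    r"^(?:alexa[,\s]+)?(?:open|launch|start)\s+(?P<skill>.+)$",
    re.IGNORECASE,
)


def explicit_skill(utterance: str, installed: Set[str]) -> Optional[str]:
    """Return the named Skill if the utterance is an explicit invocation."""
    match = LAUNCH.match(utterance.strip())
    if not match:
        return None  # implicit: the platform must work out which Skill fits
    name = match.group("skill").strip().lower()
    return name if name in installed else None


if __name__ == "__main__":
    skills = {"coffee wizard"}
    print(explicit_skill("Alexa open Coffee Wizard", skills))
    print(explicit_skill("Alexa recommend a coffee for a sunny day", skills))
```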
52. Discoverability
• It’s not always appropriate to
use explicit invocations, as
they can feel less
conversational and more
mechanistic
• Alexa uses HypRank, a
neural network to rank Skills
using natural language
53. HypRank
• Alexa uses HypRank, a
neural network that uses
contextual signals to rank
Skills using natural language
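HypRank itself is a neural network and its internals are not described here, so the following is only a toy stand-in to show the idea of ranking candidate Skills for an implicit request. It scores by keyword overlap plus an optional context bonus; the Skill descriptions, the scoring rule and the context_boost parameter are all invented for illustration.

```python
import re
from typing import Dict, List, Optional, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_skills(
    utterance: str,
    skills: Dict[str, str],
    context_boost: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Rank Skills by word overlap with the utterance, highest score first.

    NOT HypRank: a keyword-overlap toy with a hand-set context bonus.
    """
    boost = context_boost or {}
    words = tokenize(utterance)
    scored = []
    for name, description in skills.items():
        overlap = len(words & tokenize(description))
        scored.append((name, overlap + boost.get(name, 0.0)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "Coffee Wizard": "recommend a coffee drink for any weather",
        "Weather Now": "weather forecast for today",
    }
    for name, score in rank_skills("recommend a coffee for a sunny day", candidates):
        print(name, score)
```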
55. A few stats
• Voice technology will be a $601
million industry by 2019
Source: Technavio
• Over 21 million smart speakers
in the US by 2020
Source: Activate
• Google Assistant now available
on over 95% of Android devices
and the majority of iOS devices
Source: Alpine AI
56. Creating Skills and Actions
• Amazon and Google provide
developer-friendly tools for
building content
• AWS with Lambda
• Dialogflow with Firebase
• Both work with a variety of
languages (Node.js, Java,
Python, Go, etc.)
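To give a feel for the Lambda route, here is a minimal sketch of a Python handler for a custom Alexa Skill, written against the Alexa request and response JSON without any SDK. The intent name RecommendCoffeeIntent and the reply text are hypothetical, and a production Skill would normally also verify the request and handle the built-in intents.

```python
from typing import Any, Dict, Optional


def build_response(text: str, end_session: bool) -> Dict[str, Any]:
    """Wrap plain text in the JSON structure Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Entry point AWS Lambda calls with the Alexa request as `event`."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # User opened the Skill without asking for anything yet.
        return build_response("Welcome to Coffee Wizard. What can I do for you?", False)
    if request.get("type") == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "RecommendCoffeeIntent":  # hypothetical intent name
            return build_response("For a sunny day, try an iced latte.", True)
    return build_response("Sorry, I didn't understand that.", True)


if __name__ == "__main__":
    print(lambda_handler({"request": {"type": "LaunchRequest"}}))
```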
57. Using SSML
• SSML (Speech Synthesis
Markup Language) can be
used to control the
pronunciation, speed and
pitch of phrases
• For example you can make
Alexa pause, whisper or
place emphasis on specific
words
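As a small illustration, the helpers below assemble an SSML string with a pause, a whisper, emphasis and a change of speed and pitch. The helper names are made up for this sketch; the tags themselves (break, emphasis, prosody and Alexa's amazon:effect for whispering) are standard SSML or Alexa extensions, though support varies by platform.

```python
from xml.sax.saxutils import escape


def say(text: str) -> str:
    """Plain speech, escaped so characters like & don't break the markup."""
    return escape(text)


def pause(seconds: int) -> str:
    return f'<break time="{seconds}s"/>'


def whisper(text: str) -> str:
    # amazon:effect is an Alexa-specific extension, not part of core SSML.
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def emphasise(text: str, level: str = "strong") -> str:
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def prosody(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Control speaking speed and pitch for a phrase."""
    return f'<prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody>'


def speak(*parts: str) -> str:
    """Wrap the parts in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(speak(say("Here is a secret."), pause(1), whisper("Decaf is fine.")))
```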