This document discusses Amazon's artificial intelligence services, including Amazon Polly for text-to-speech, Amazon Lex for conversational interfaces, and Amazon Rekognition for image and video analysis. It provides overviews of the capabilities and features of each service, such as Polly's 47 text-to-speech voices across 24 languages, Lex's tools for building conversational bots, and Rekognition's face detection, analysis, and recognition tools. Examples and demos of each service are presented to illustrate their functionality.
1. Hands-on with Amazon AI
Julien Simon
Principal Technical Evangelist
julsimon@amazon.fr
@julsimon
2. Artificial Intelligence At Amazon
Thousands Of Employees Across The Company Focused on AI
• Discovery & Search
• Fulfilment & Logistics
• Enhance Existing Products
• Define New Categories Of Products
• Bring Machine Learning To All
4. Amazon AI: Three New Deep Learning Services
• Polly: Life-like Speech
• Rekognition: Image Analysis
• Lex: Conversational Engine
6. What is Amazon Polly
• A service that converts text into lifelike speech
• Offers 47 lifelike voices across 24 languages
• Low latency responses enable developers to build real-time systems
• Developers can store, replay and distribute generated speech
7. Amazon Polly: Quality
Natural sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such as abbreviations, numerical sequences, homographs, etc.
Today in Las Vegas, NV it's 54°F.
"We live for the music", live from the Madison Square Garden.
Highly intelligible
A measure of how comprehensible speech is.
“Peter Piper picked a peck of pickled peppers.”
8. Amazon Polly: Language Portfolio
Americas:
• Brazilian Portuguese
• Canadian French
• English (US)
• Spanish (US)
A-PAC:
• Australian English
• Indian English
• Japanese
EMEA:
• British English
• Danish
• Dutch
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English
9. Amazon Polly features: SSML
Speech Synthesis Markup Language (SSML) is a W3C recommendation: an XML-based markup language for speech synthesis applications
<speak>
My name is Kuklinski. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Kuklinski</say-as>
</prosody>
</speak>
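The SSML document above can also be generated from code. The following is a minimal sketch, assuming Python and the boto3 Polly client; the helper names `spell_out_ssml` and `synthesize` and the voice "Joanna" are illustrative choices, not taken from the slides.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def spell_out_ssml(name, rate="x-slow"):
    """Build the SSML from the slide: say the name, then spell it slowly."""
    safe = escape(name)  # escape &, < and > so the result stays valid XML
    return (
        "<speak>"
        f"My name is {safe}. It is spelled "
        f'<prosody rate="{rate}">'
        f'<say-as interpret-as="characters">{safe}</say-as>'
        "</prosody>"
        "</speak>"
    )


def synthesize(polly_client, ssml, voice_id="Joanna"):
    """Send SSML to Polly and return the audio bytes.

    polly_client is assumed to be boto3.client("polly"); the voice is an
    arbitrary example.
    """
    response = polly_client.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()
```

With AWS credentials configured, `synthesize(boto3.client("polly"), spell_out_ssml("Kuklinski"))` should return MP3 data that can be written to a file.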
10. Amazon Polly features: Lexicons
Enables developers to customize the pronunciation of words or phrases
My daughter’s name is Kaja.
<lexeme>
<grapheme>Kaja</grapheme>
<grapheme>kaja</grapheme>
<grapheme>KAJA</grapheme>
<phoneme>"kaI.@</phoneme>
</lexeme>
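The lexeme above is a fragment; Polly expects it wrapped in a complete Pronunciation Lexicon Specification (PLS) document. The sketch below builds one in Python. The `en-US` language and the `x-sampa` alphabet are assumptions (the phoneme on the slide looks like X-SAMPA), and `build_lexicon` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(graphemes, phoneme, lang="en-US", alphabet="x-sampa"):
    """Wrap one lexeme in a PLS <lexicon> element and return it as a string."""
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {
            "version": "1.0",
            "alphabet": alphabet,          # assumption: X-SAMPA notation
            f"{{{XML_NS}}}lang": lang,     # assumption: US English
        },
    )
    lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
    for grapheme in graphemes:
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
    ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(root, encoding="unicode")


lexicon_xml = build_lexicon(["Kaja", "kaja", "KAJA"], '"kaI.@')
```

The resulting string could then be uploaded with the boto3 Polly client, for example `boto3.client("polly").put_lexicon(Name="kaja", Content=lexicon_xml)`, where the lexicon name is again only an example.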
11. TEXT
Market grew by > 20%.
WORDS
Market grew by more than twenty percent
PHONEMES
ˈmɑɹ.kət ˈgɹu baɪ ˈmoʊɹ ˈðæn ˈtwɛn.ti pɚ.ˈsɛnt
Diagram labels: PROSODY CONTOUR, UNIT SELECTION AND ADAPTATION, TEXT PROCESSING, PROSODY MODIFICATION, STREAMING, Speech units inventory
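The text processing step on this slide expands symbols and digits into words before phonemes are produced. The toy Python sketch below reproduces only the slide's example; it is not how Polly is implemented, and the lookup tables are deliberately tiny, hypothetical stand-ins for a real normalizer.

```python
import re

# Hypothetical, deliberately incomplete lookup tables for illustration.
_SYMBOLS = {">": "more than", "<": "less than"}
_NUMBERS = {"10": "ten", "20": "twenty", "30": "thirty"}


def normalize(text):
    """Expand 'N%' and comparison symbols into spoken words."""

    def percent(match):
        digits = match.group(1)
        # Fall back to the raw digits when the number is not in the table.
        return f"{_NUMBERS.get(digits, digits)} percent"

    text = re.sub(r"(\d+)%", percent, text)
    for symbol, words in _SYMBOLS.items():
        text = text.replace(symbol, words)
    return text


print(normalize("Market grew by > 20%."))
# prints: Market grew by more than twenty percent.
```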
17. Lex Bot Structure
Example intent: BookHotel
• Utterances: Spoken or typed phrases that invoke your intent
• Intents: An intent performs an action in response to natural language user input
• Slots: Slots are input data required to fulfill the intent
• Fulfillment: Fulfillment mechanism for your intent
18. Utterances
I’d like to book a hotel
I want to make my hotel reservations
I want to book a hotel in New York City
Can you help me book my hotel?
19. Slots
Slot         Type   Values
Destination  City   New York City, Seattle, London, …
Check In     Date   Valid dates
Check Out    Date   Valid dates
20. Slot Elicitation
User: I’d like to book a hotel
Bot: Sure what city do you want to book?
User: New York City
Bot: What date do you check in?
User: Nov 30th
Slots filled: City = New York City, Check In = 11/30/2016
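Slot elicitation amounts to asking for the first slot that is still empty. The Python sketch below illustrates that idea only; Lex handles this internally, the slot keys `City`, `CheckIn` and `CheckOut` are assumed names, and the Check Out prompt is a hypothetical addition (the other two prompts come from the slide).

```python
from typing import Optional

# Slots in the order they are elicited, each with its prompt.
PROMPTS = {
    "City": "Sure what city do you want to book?",
    "CheckIn": "What date do you check in?",
    "CheckOut": "What date do you check out?",  # hypothetical prompt
}


def next_prompt(slots: dict) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when all are filled."""
    for name, prompt in PROMPTS.items():
        if not slots.get(name):
            return prompt
    return None


# Walk through the conversation on the slide.
slots = {}
print(next_prompt(slots))            # asks for the city
slots["City"] = "New York City"
print(next_prompt(slots))            # asks for the check-in date
slots["CheckIn"] = "11/30/2016"
```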
21. Fulfillment
AWS Lambda Integration: Intents and slots passed to AWS Lambda function for business logic implementation.
Return to Client: User input parsed to derive intents and slot values. Output returned to client for further processing.
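For the AWS Lambda integration path, a fulfillment function receives the intent and its slot values and returns a closing message. The handler below is a minimal Python sketch assuming the original Lex (V1) event and response shape; the slot keys `City` and `CheckIn` are assumed names, and a real function would run the booking logic before answering.

```python
def lambda_handler(event, context):
    """Fulfill the BookHotel intent and close the conversation."""
    intent = event["currentIntent"]
    if intent["name"] != "BookHotel":
        raise ValueError(f"Unsupported intent: {intent['name']}")

    slots = intent["slots"]
    # Business logic (reserving the room) would go here.
    message = f"Your hotel is booked for {slots['CheckIn']}"

    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        },
    }
```

The handler can be exercised locally by calling it with a hand-built event dictionary, without deploying it to Lambda.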
22. “Book a Hotel”
Input: “Book a Hotel in NYC”
Automatic Speech Recognition: Book a Hotel in NYC
Natural Language Understanding (Intent/Slot Model, Utterances): Hotel Booking, New York City
Slots: City = New York City, Check In = Nov 30th, Check Out = Dec 2nd
Prompt: “Can I go ahead with the booking?”
Confirmation: “Your hotel is booked for Nov 30th”
Polly: “Your hotel is booked for Nov 30th”
23. Amazon Lex - Technology
Amazon Lex: Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). Same technology that powers Alexa
APIs: Speech API, Language API
Developers: Console, SDK (Intents, Slots, Prompts, Utterances)
End-Users: Multi-Platform Clients (Mobile, IoT, Web, Chat) via API
Input: Speech or Text
Response: Speech (via Polly TTS) or Text
Fulfillment: Action via AWS Lambda, AWS Services
Authentication & Visibility: Cognito, CloudTrail, CloudWatch
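From a client's point of view, the text path through this architecture is one API call per user turn. The Python sketch below assumes the boto3 `lex-runtime` client (the original Lex runtime API); the bot name, alias and user id are hypothetical, and the client is passed in so the function can be tried with a stub.

```python
def send_text(lex_client, text, bot_name="BookHotelBot", bot_alias="prod",
              user_id="demo-user"):
    """Send one typed utterance to a Lex bot and summarize the reply.

    lex_client is assumed to be boto3.client("lex-runtime"); bot_name,
    bot_alias and user_id are placeholder values.
    """
    response = lex_client.post_text(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        inputText=text,
    )
    return {
        "intent": response.get("intentName"),
        "slots": response.get("slots"),
        "state": response.get("dialogState"),
        "message": response.get("message"),
    }
```

A call such as `send_text(boto3.client("lex-runtime"), "I want to book a hotel in New York City")` would be expected to return the recognized intent, the slots filled so far, and the bot's next prompt.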
26. Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organize millions of images
• Object and Scene Detection
• Facial Analysis
• Face Comparison
• Facial Recognition
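Two of these capabilities, object and scene detection and face comparison, can be sketched in a few lines of Python. The functions below assume a boto3 Rekognition client is passed in; the function names, the 80% thresholds and the 10-label limit are illustrative choices rather than values from the slides.

```python
def detect_label_names(rekognition_client, image_bytes, min_confidence=80.0):
    """Return the names of objects and scenes detected in an image.

    rekognition_client is assumed to be boto3.client("rekognition").
    """
    response = rekognition_client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return [label["Name"] for label in response["Labels"]]


def faces_match(rekognition_client, source_bytes, target_bytes, threshold=80.0):
    """Return True if the source face appears in the target image."""
    response = rekognition_client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,
    )
    return len(response["FaceMatches"]) > 0
```

Image bytes can be read from a local file with `open(path, "rb").read()`; images stored in Amazon S3 can be referenced through the `S3Object` form of the `Image` parameter instead.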