Slides from my talk at the AWS Loft in London
https://awsloft.london/session/2017/a5da881d-67f8-4af5-8ace-4f8adcf579db
"In this talk, we will show the audience how to build and deploy serverless AI-powered applications on AWS. In particular, two demos will be analysed in depth. The first demo is a simple mobile web app that allows a user to upload or take a picture with their mobile phone. The picture is analysed and the result is then spoken out loud using Amazon Polly. This demo is deployed using the AWS CLI (command line interface) with scripting techniques. The second demo is a podcast generator which connects to any RSS feed and converts that feed into a podcast. The result can then be played on iTunes or any podcast player. This demo uses AWS Lambda and Amazon Polly and is deployed using the Serverless Framework. We will go through the architecture, the APIs, the code itself and the deployment of those two applications using Amazon Rekognition, Amazon Polly, AWS Lambda, Amazon S3, Amazon Route 53, Elasticsearch, and more."
2. • Technical Evangelist, Developer Advocate,
… Software Engineer
• Own bed in Finland
• Previously:
• Solutions Architect @AWS
• Lead Cloud Architect @Dreambroker
• Director of Engineering, Software Engineer, DevOps, Manager, ... @Hdm
• Researcher @Nokia Research Center
• and a bunch of other stuff.
• Climber; likes ginger shots.
3. What to Expect from the Session
1. A little bit of history & theory never killed anyone
2. AI in AWS
3. Building AI-powered apps x3
5. Serverless means…
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
16. Text In, Life-like Speech Out
Amazon Polly
“Today in Seattle, WA, it’s 11°F” → “Today in Seattle Washington, it’s 11 degrees Fahrenheit”
47 lifelike voices spread across 24 languages
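The “47 voices across 24 languages” figure describes the catalogue at the time of the talk; the current list can be pulled from Polly's DescribeVoices API. A minimal Python sketch, assuming `polly_client` behaves like `boto3.client("polly")`; the helper name `summarize_voices` is hypothetical:

```python
from collections import Counter
from typing import Any, Dict, Optional


def summarize_voices(polly_client: Any) -> Dict[str, Any]:
    """Count Amazon Polly voices and the language codes they cover.

    Assumes `polly_client` behaves like boto3.client("polly").
    DescribeVoices is paginated via NextToken, so keep calling until
    a response no longer carries a token.
    """
    voices = []
    next_token: Optional[str] = None
    while True:
        kwargs = {"NextToken": next_token} if next_token else {}
        page = polly_client.describe_voices(**kwargs)
        voices.extend(page.get("Voices", []))
        next_token = page.get("NextToken")
        if not next_token:
            break
    per_language = Counter(voice["LanguageCode"] for voice in voices)
    return {
        "voices": len(voices),
        "languages": len(per_language),
        "per_language": dict(per_language),
    }
```

Taking the client as a parameter keeps the function easy to exercise with a stub; a real call would be `summarize_voices(boto3.client("polly"))`.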
17. A Focus On Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
“Today in Seattle, WA, it’s 11°F”
‘"We live for the music" live from the Madison Square Garden.’
18. 2. Intelligible and Easy to Understand
19. 3. Add Semantic Meaning to Text
“Richard’s number is 2122341237” (interpreted as a telephone number)
20. 4. Customized Pronunciation
“My daughter’s name is Kaja.”
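Customized pronunciation like the “Kaja” example can be done with a W3C Pronunciation Lexicon Specification (PLS) document, uploaded through Polly's PutLexicon API and referenced via `LexiconNames` at synthesis time. A minimal Python sketch, assuming `polly_client` behaves like `boto3.client("polly")` and that the lexicon's `xml:lang` matches the voice (Joanna is en-US). The alias spelling “Kaya”, the lexicon name “kaja”, and all helper names are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict
from xml.sax.saxutils import escape, quoteattr

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Build a minimal PLS lexicon mapping each written word to a spoken alias."""
    lexemes = "".join(
        f"<lexeme><grapheme>{escape(word)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>"
        for word, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" '
        f"xml:lang={quoteattr(lang)}>{lexemes}</lexicon>"
    )


def lexicon_aliases(lexicon_xml: str) -> Dict[str, str]:
    """Parse the lexicon back into {grapheme: alias} as a local sanity check."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    return {
        lex.find("pls:grapheme", ns).text: lex.find("pls:alias", ns).text
        for lex in root.findall("pls:lexeme", ns)
    }


def speak_with_lexicon(polly_client: Any, text: str, lexicon_name: str,
                       lexicon_xml: str, voice_id: str = "Joanna") -> Dict[str, Any]:
    """Upload (or overwrite) the lexicon, then synthesize text that uses it."""
    polly_client.put_lexicon(Name=lexicon_name, Content=lexicon_xml)
    return polly_client.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="mp3",
        LexiconNames=[lexicon_name],
    )


# Hypothetical alias: tells Polly to read "Kaja" as if it were spelled "Kaya".
lexicon_xml = build_alias_lexicon({"Kaja": "Kaya"})
```

An alias avoids hand-writing IPA; a `<phoneme>` element with the `ipa` alphabet is the more precise alternative when an alias spelling is not enough.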
21. A Focus On Voice Quality & Pronunciation
https://www.w3.org/TR/speech-synthesis/
<speak>
The spelling of my first name is
<prosody rate="x-slow">
<say-as interpret-as="characters">Adrian</say-as>
</prosody>
</speak>
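The same markup can be assembled in code. A minimal Python sketch, assuming the result is later sent to Polly with `TextType="ssml"`; the helper names are hypothetical. It escapes plain text so a stray `&` or `<` cannot break the document, rebuilds the spelled-out name from the slide, and uses `interpret-as="telephone"` for the phone-number example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Wrap plain text in <say-as>, escaping it so it stays valid XML."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def prosody(inner_ssml: str, rate: str) -> str:
    """Wrap already-built SSML in <prosody rate=...>."""
    return f"<prosody rate={quoteattr(rate)}>{inner_ssml}</prosody>"


def speak(*fragments: str) -> str:
    """Join SSML fragments inside <speak>; plain-text fragments must be escaped first."""
    return "<speak>" + " ".join(fragments) + "</speak>"


def is_well_formed(ssml: str) -> bool:
    """Cheap local XML check before sending TextType="ssml" to Polly."""
    try:
        ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return True


spelled_name = speak(
    escape("The spelling of my name is"),
    prosody(say_as("Adrian", "characters"), "x-slow"),
)
phone_number = speak(
    escape("Richard's number is"),
    say_as("2122341237", "telephone"),
)
```

To hear the result: `polly_client.synthesize_speech(Text=spelled_name, TextType="ssml", OutputFormat="mp3", VoiceId="Joanna")`.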
22. Duolingo Voices Its Language Learning Service Using Polly
Duolingo is a free language learning service where users help translate the web and rate translations.
“With Amazon Polly our users benefit from the most lifelike Text-to-Speech voices available on the market.”
Severin Hacker, CTO, Duolingo
• Spoken language crucial for language learning
• Accurate pronunciation matters
• Faster iteration thanks to TTS
• As good as natural human speech
24. <API>
Amazon Polly
</API>
aws polly synthesize-speech \
    --text "It was nice to live such a wonderful live show" \
    --output-format mp3 \
    --voice-id Joanna \
    --text-type text \
    joanna.mp3
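For reference, roughly the same request from Python with boto3. This is a sketch that assumes `polly_client` behaves like `boto3.client("polly")`; the function name `synthesize_to_file` is hypothetical:

```python
from contextlib import closing
from typing import Any


def synthesize_to_file(polly_client: Any, text: str, path: str,
                       voice_id: str = "Joanna") -> int:
    """Python counterpart of the CLI call; returns the number of bytes written.

    Assumes `polly_client` behaves like boto3.client("polly").
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        TextType="text",
    )
    # AudioStream is a streaming body: read it fully, then close it.
    with closing(response["AudioStream"]) as stream:
        audio = stream.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

`synthesize_to_file(boto3.client("polly"), "It was nice to live such a wonderful live show", "joanna.mp3")` writes an MP3 like the one the CLI command produces.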
28. Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organize millions of images
• Object and Scene Detection
• Facial Analysis
• Face Comparison
• Facial Recognition
36. Amazon Rekognition: Customers
• Digital Asset Management
• Media and Entertainment
• Travel and Hospitality
• Influencer Marketing
• Systems Integration
• Digital Advertising
• Consumer Storage
• Law Enforcement
• Public Safety
• eCommerce
• Education
40. Cognito Support for Identity
[Diagram: Sign In with username and password through Cognito User Pools, or through a SAML identity provider → Amazon Cognito: 2. Get AWS credentials → API Gateway, Lambda, DynamoDB, S3, Rekognition, Polly]
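The “Get AWS credentials” step exchanges a Cognito identity for temporary AWS credentials, which the app then uses to call services such as Rekognition and Polly directly. A minimal Python sketch, assuming `cognito_identity` behaves like `boto3.client("cognito-identity")`; the identity pool ID, the function name, and the `Logins` key shown in the docstring are placeholders:

```python
from typing import Any, Dict, Optional


def get_temporary_credentials(cognito_identity: Any, identity_pool_id: str,
                              logins: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Exchange a Cognito identity for temporary AWS credentials.

    `cognito_identity` is assumed to behave like boto3.client("cognito-identity").
    Omit `logins` for an unauthenticated (guest) identity, or pass
    {"cognito-idp.<region>.amazonaws.com/<user pool id>": id_token}
    for a user signed in through a Cognito User Pool.
    """
    extra = {"Logins": logins} if logins else {}
    identity_id = cognito_identity.get_id(
        IdentityPoolId=identity_pool_id, **extra
    )["IdentityId"]
    creds = cognito_identity.get_credentials_for_identity(
        IdentityId=identity_id, **extra
    )["Credentials"]
    # Cognito names the secret "SecretKey"; map onto boto3.client() keyword names.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dict can be splatted into a client, e.g. `boto3.client("polly", region_name="eu-west-1", **creds)` (region chosen for illustration). In a browser or mobile front end, the platform's AWS SDK would typically perform this exchange instead.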
45. AWS Step Functions: Branching Steps
[Diagram: Start → Select image converter, which branches to RAW to JPEG, RAW to PNG, RAW to TIFF, or Unsupported image type → Load in Database → End]
46. AWS Step Functions: Parallel Steps
[Diagram: Start → Process photo: Resize image, Extract metadata, and Facial recognition run in parallel → Load in Database → End]
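The converter-selection flow maps naturally onto a Choice state, and the photo-processing flow onto a Parallel state, in the Amazon States Language (ASL). A minimal Python sketch that builds both definitions as dicts; the state names, the `$.targetFormat` input field, the Lambda function names, the placeholder ARNs, and ending unsupported inputs in a Fail state are all assumptions, not taken from the slides. `missing_targets` is a hypothetical local check that every transition names a real state:

```python
import json
from typing import Optional

# Placeholder ARN prefix; REGION and ACCOUNT are not real values.
ARN_PREFIX = "arn:aws:lambda:REGION:ACCOUNT:function:"


def task(function_name: str, next_state: Optional[str] = None) -> dict:
    """A Task state invoking a (hypothetical) Lambda function."""
    state = {"Type": "Task", "Resource": ARN_PREFIX + function_name}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def branching_definition() -> dict:
    """Choice state: route on an assumed $.targetFormat field in the input."""
    choices = [
        {"Variable": "$.targetFormat", "StringEquals": fmt, "Next": target}
        for fmt, target in (("jpeg", "RawToJpeg"), ("png", "RawToPng"), ("tiff", "RawToTiff"))
    ]
    return {
        "StartAt": "SelectImageConverter",
        "States": {
            "SelectImageConverter": {"Type": "Choice", "Choices": choices,
                                     "Default": "UnsupportedImageType"},
            "RawToJpeg": task("rawToJpeg", "LoadInDatabase"),
            "RawToPng": task("rawToPng", "LoadInDatabase"),
            "RawToTiff": task("rawToTiff", "LoadInDatabase"),
            "UnsupportedImageType": {"Type": "Fail", "Error": "UnsupportedImageType"},
            "LoadInDatabase": task("loadInDatabase"),
        },
    }


def parallel_definition() -> dict:
    """Parallel state: three independent branches, then one database load."""
    def branch(state_name: str, function_name: str) -> dict:
        return {"StartAt": state_name, "States": {state_name: task(function_name)}}

    return {
        "StartAt": "ProcessPhoto",
        "States": {
            "ProcessPhoto": {
                "Type": "Parallel",
                "Branches": [
                    branch("ResizeImage", "resizeImage"),
                    branch("ExtractMetadata", "extractMetadata"),
                    branch("FacialRecognition", "facialRecognition"),
                ],
                # The Parallel state's output is an array with one entry per branch.
                "Next": "LoadInDatabase",
            },
            "LoadInDatabase": task("loadInDatabase"),
        },
    }


def missing_targets(definition: dict) -> set:
    """Transition targets that are not defined states (empty set = consistent)."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    missing = set()
    for state in states.values():
        targets.update(s for s in (state.get("Next"), state.get("Default")) if s)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
        for sub in state.get("Branches", []):
            missing |= missing_targets(sub)
    return missing | (targets - set(states))
```

With real ARNs filled in, `json.dumps(branching_definition())` is the kind of string passed as `definition` to the Step Functions `create_state_machine(name=..., definition=..., roleArn=...)` call.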
I have a lot to cover, so I am happy to field questions after the talk.
A trillion is 1,000,000,000,000, also known as 10 to the 12th power, or one million million. It’s such a large number that it’s hard to get your head around it, so sometimes “trillion” just means “wow, a lot.”
AWS is an AI enabler, for all the reasons mentioned here.
When AWS was established in 2006, one core premise was to allow anyone, even a student in a dorm room, to get access to the same technologies that Fortune 500 companies have. We called it the democratization of technology.
And the result of this is that we see a ton of machine learning on AWS today, literally from A through to Z: everything from Ancestry, who are using machine learning and deep learning to process genomic information and build out family trees, all the way through to Zillow, who use machine learning for house-price estimation on their website.
Amazon Web Services provides a rich ecosystem to help you build smarter applications. In this context, it is worth highlighting the higher-level AI services based on deep learning algorithms, like Amazon Rekognition, an image recognition service; Amazon Polly, a text-to-speech service; and Amazon Lex, a voice and text chatbot service.
We also provide the infrastructure, including GPU EC2 instances for fast parallel processing, which you can use in combination with any of the popular deep learning libraries like Apache MXNet, TensorFlow, Theano, etc., all of which are available on the AWS Deep Learning AMI.
For general machine learning purposes, you can also use EC2, or Amazon Elastic MapReduce (EMR) with Spark and Spark ML, to run any machine learning algorithm. Another popular library is Python's scikit-learn, which you can deploy on AWS Lambda, in containers, or on EC2.
So what I am trying to convey is that there is a lot of choice, and it boils down to picking the right tool for the job: you can trade off between building it yourself, with all the flexibility that brings, and picking a managed solution that gets you results fast without the heavy lifting.
The basics of Amazon Polly are pretty simple, but the service has deep functionality.
You can send the service a simple string of text, and it will generate lifelike speech in your choice of 47 different voices.
But it isn’t naive about the context of the text. For example, the strings ‘WA’ and ‘°F’ here would sound strange if they were read out literally.
Instead, Polly automatically expands ‘WA’ and ‘°F’ to ‘Washington’ and ‘degrees Fahrenheit’ to create more lifelike speech. The developer doesn’t have to do anything: just send the text and get lifelike speech back.
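To make the idea of text normalization concrete, here is a toy normalizer in Python. It only illustrates the concept with two hand-written rules and is not how Polly implements it; the abbreviation table and function names are hypothetical:

```python
import re

# Hypothetical, hand-written lookup table for this illustration only.
ABBREVIATIONS = {"WA": "Washington", "NY": "New York"}


def _expand_fahrenheit(match: re.Match) -> str:
    number = match.group(1)
    unit = "degree" if number == "1" else "degrees"
    return f"{number} {unit} Fahrenheit"


def normalize(text: str) -> str:
    """Toy normalizer: expands Fahrenheit temperatures and whole-word abbreviations."""
    # "11°F" -> "11 degrees Fahrenheit" (an optional space before ° is allowed)
    text = re.sub(r"(\d+)\s?°F\b", _expand_fahrenheit, text)
    # Expand abbreviations only when they stand alone as words.
    pattern = r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
    return re.sub(pattern, lambda m: ABBREVIATIONS[m.group(1)], text)
```

Even this toy version needs context (singular vs plural “degree”, word boundaries so “WASHINGTON” is left alone), which hints at why a managed service that handles this for you is convenient.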
Amazon Rekognition is a fully managed, deep learning-based image recognition service.
It was designed from the get-go to run at scale, and it comprehends scenes, objects, concepts and faces.
Given an image, it returns a list of labels. Given an image with one or more faces, it returns a bounding box for each face, along with face attributes.
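A minimal boto3 sketch of those two calls (DetectLabels and DetectFaces), assuming `rekognition` behaves like `boto3.client("rekognition")` and the image is stored in S3; the bucket, key, and helper names are hypothetical:

```python
from typing import Any, Dict, List


def s3_image(bucket: str, key: str) -> Dict[str, Any]:
    """Rekognition image reference for an object already stored in S3."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def label_names(rekognition: Any, image: Dict[str, Any],
                max_labels: int = 10, min_confidence: float = 70.0) -> List[str]:
    """Label names from DetectLabels, highest confidence first."""
    response = rekognition.detect_labels(
        Image=image, MaxLabels=max_labels, MinConfidence=min_confidence
    )
    labels = sorted(response["Labels"], key=lambda l: l["Confidence"], reverse=True)
    return [label["Name"] for label in labels]


def face_boxes(rekognition: Any, image: Dict[str, Any]) -> List[Dict[str, float]]:
    """One bounding box per detected face.

    Box values (Width, Height, Left, Top) are ratios of the image size.
    Attributes=["ALL"] also requests the full set of face attributes.
    """
    response = rekognition.detect_faces(Image=image, Attributes=["ALL"])
    return [face["BoundingBox"] for face in response["FaceDetails"]]
```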
Given two images with faces, it compares the largest face in the source image with the faces found in the target image and returns how similar each one is.
Rekognition provides high-quality face recognition at scale: you can create collections of millions of faces and search a collection for similar faces.
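A minimal boto3 sketch of face comparison and collection search, assuming `rekognition` behaves like `boto3.client("rekognition")` and that the collection was created once beforehand with `create_collection(CollectionId=...)`. The collection ID, external image ID, thresholds, and helper names are hypothetical:

```python
from typing import Any, Dict, List, Tuple


def best_similarity(rekognition: Any, source: Dict[str, Any], target: Dict[str, Any],
                    threshold: float = 80.0) -> float:
    """Highest similarity between the source's largest face and faces in target (0.0 if none)."""
    response = rekognition.compare_faces(
        SourceImage=source, TargetImage=target, SimilarityThreshold=threshold
    )
    return max((m["Similarity"] for m in response.get("FaceMatches", [])), default=0.0)


def index_face(rekognition: Any, collection_id: str, image: Dict[str, Any],
               external_id: str) -> List[str]:
    """Add the faces found in `image` to a collection; return their new FaceIds."""
    response = rekognition.index_faces(
        CollectionId=collection_id, Image=image, ExternalImageId=external_id
    )
    return [record["Face"]["FaceId"] for record in response["FaceRecords"]]


def search_collection(rekognition: Any, collection_id: str, image: Dict[str, Any],
                      threshold: float = 80.0, max_faces: int = 5) -> List[Tuple[str, float]]:
    """(ExternalImageId or FaceId, similarity) for stored faces similar to the query face."""
    response = rekognition.search_faces_by_image(
        CollectionId=collection_id, Image=image,
        FaceMatchThreshold=threshold, MaxFaces=max_faces,
    )
    return [
        (m["Face"].get("ExternalImageId", m["Face"]["FaceId"]), m["Similarity"])
        for m in response["FaceMatches"]
    ]
```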
Now let’s dive into each of these features and look at the APIs that support them.
Image moderation
Rekognition automatically detects explicit or suggestive adult content in your images, and provides confidence scores.
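A minimal boto3 sketch of image moderation with DetectModerationLabels, assuming `rekognition` behaves like `boto3.client("rekognition")`. The helper names and the “flag for human review above a threshold” policy are hypothetical, not a recommendation from the talk:

```python
from typing import Any, Dict, List, Tuple


def moderation_summary(rekognition: Any, image: Dict[str, Any],
                       min_confidence: float = 60.0) -> List[Tuple[str, str, float]]:
    """(Name, ParentName, Confidence) per moderation label; ParentName is "" at the top level."""
    response = rekognition.detect_moderation_labels(Image=image, MinConfidence=min_confidence)
    return [
        (label["Name"], label.get("ParentName", ""), label["Confidence"])
        for label in response["ModerationLabels"]
    ]


def needs_review(summary: List[Tuple[str, str, float]], review_threshold: float = 80.0) -> bool:
    """Hypothetical policy: flag the image if any label meets the review threshold."""
    return any(confidence >= review_threshold for _, _, confidence in summary)
```

Keeping the policy separate from the API call makes it easy to tune the threshold without touching the Rekognition request.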