BDA 301 An Introduction to Amazon Rekognition, for Deep Learning-based Computer Vision

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
David Pearson, AWS AI Services
April 2017
Amazon Rekognition
Extract Rich Image Metadata from Visual Content

Amazon AI
Intelligent Services Powered By Deep Learning

Rich Metadata Index
objects, scenes, facial attributes, persons
Amazon Rekognition
Deep Learning-Based Image Recognition Service

Deer 98.8%
Wildlife 95.1%
Conifer 95.1%
Spruce 95.1%
Wood 78.3%
Tree 63.5%
Forest 63.5%
Vegetation 61.9%
Pine 60.6%
Outdoors 54.0%
Flower 53.9%
Plant 52.9%
Nature 50.7%
Field 50.7%
Grass 50.7%

{
"Image": {
"Bytes": blob,
"S3Object": {
"Bucket": "string",
"Name": "string",
"Version": "string"
}
},
"MaxLabels": number,
"MinConfidence": number
}
DetectLabels
Amazon S3
Image Bucket

DetectLabels
"Labels": [
{
"Confidence": 98.9294204711914,
"Name": "Moss"
},
{
"Confidence": 98.9294204711914,
"Name": "Plant"
},
{
"Confidence": 97.35887908935547,
"Name": "Creek"
},
{
"Confidence": 97.35887908935547,
"Name": "Outdoors"
},
{
"Confidence": 97.35887908935547,
"Name": "Stream"
},
{
"Confidence": 97.35887908935547,
"Name": "Water"
},
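A minimal sketch of how the DetectLabels request and response shown above might be handled from Python with boto3, assuming AWS credentials and a region are already configured. The helper names (`build_request`, `label_names`, `detect_labels`) and the default values are illustrative and do not come from the slides.

```python
def build_request(bucket, key, max_labels=10, min_confidence=50.0):
    """Build DetectLabels parameters for an image stored in Amazon S3."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
        "MinConfidence": min_confidence,
    }


def label_names(response, min_confidence=0.0):
    """Return (name, confidence) pairs from a response, highest confidence first."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect_labels(bucket, key, **kwargs):
    """Call the service. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("rekognition")
    return client.detect_labels(**build_request(bucket, key, **kwargs))
```

With a hypothetical bucket and key, `label_names(detect_labels("my-image-bucket", "creek.jpg"), min_confidence=90)` would return only the labels the service scored at 90 or higher.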

Age Range 38-59
Beard: False 84.3%
Emotion: Happy 86.5%
Eyeglasses: False 99.6%
Eyes Open: True 99.9%
Gender: Male 99.9%
Mouth Open: False 86.2%
Mustache: False 98.4%
Smile: True 95.9%
Sunglasses: False 99.8%
Bounding Box
Height: 0.36716..
Left: 0.40222..
Top: 0.23582..
Width: 0.27222..
Landmarks
EyeLeft
EyeRight
Nose
MouthLeft
MouthRight
LeftPupil
RightPupil
LeftEyeBrowLeft
LeftEyeBrowRight
LeftEyeBrowUp
:
Quality
Brightness 52.5%
Sharpness 99.9%

"BoundingBox": {
"Height": 0.3449999988079071,
"Left": 0.09666666388511658,
"Top": 0.27166667580604553,
"Width": 0.23000000417232513
},
"Confidence": 100,
"Emotions": [
{"Confidence": 99.1335220336914,
"Type": "HAPPY" },
{"Confidence": 3.3275485038757324,
"Type": "CALM"},
{"Confidence": 0.31517744064331055,
"Type": "SAD"}
],
"Eyeglasses": {"Confidence": 99.8050537109375,
"Value": false},
"EyesOpen": {"Confidence": 99.99979400634766,
"Value": true},
"Gender": {"Confidence": 100,
"Value": "Female"}
DetectFaces
smart cropping
& ad overlays
sentiment
capture
demographic
analysis
face editing
& pixelation
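For use cases such as smart cropping or pixelation, the relative BoundingBox values in a DetectFaces result have to be converted to pixel coordinates. The sketch below shows one way to do that and to pick the highest-confidence emotion; the function names and the 900 x 600 image size in the usage note are assumptions for illustration.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a relative BoundingBox to (left, top, right, bottom) in pixels.

    Values are clamped to the image, since a box may extend past the edges.
    """
    left = max(0, round(bounding_box["Left"] * image_width))
    top = max(0, round(bounding_box["Top"] * image_height))
    right = min(image_width, left + round(bounding_box["Width"] * image_width))
    bottom = min(image_height, top + round(bounding_box["Height"] * image_height))
    return left, top, right, bottom


def dominant_emotion(face_detail):
    """Return the emotion type with the highest confidence, or None."""
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return None
    return max(emotions, key=lambda emotion: emotion["Confidence"])["Type"]
```

Applied to the sample face above and an assumed 900 x 600 image, `to_pixel_box` gives a crop rectangle that can be passed to an image library, and `dominant_emotion` returns `"HAPPY"`.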

"FaceMatches": [
{"Face": {"BoundingBox": {
"Height": 0.2683333456516266,
"Left": 0.5099999904632568,
"Top": 0.1783333271741867,
"Width": 0.17888888716697693},
"Confidence": 99.99845123291016},
"Similarity": 96
},
{"Face": {"BoundingBox": {
"Height": 0.2383333295583725,
"Left": 0.6233333349227905,
"Top": 0.3016666769981384,
"Width": 0.15888889133930206},
"Confidence": 99.71249389648438},
"Similarity": 0
}
],
"SourceImageFace": {"BoundingBox": {
"Height": 0.23983436822891235,
"Left": 0.28333333134651184,
"Top": 0.351423978805542,
"Width": 0.1599999964237213},
"Confidence": 99.99344635009766}
}
CompareFaces
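A CompareFaces result like the one above can contain candidate faces with very low similarity, so callers usually filter on the Similarity score. This is a hedged sketch: `matching_faces` and the default threshold of 80 are illustrative choices, while `compare_faces` and its `SimilarityThreshold` parameter are part of the boto3 Rekognition client.

```python
def matching_faces(response, min_similarity=80.0):
    """Return FaceMatches at or above the threshold, most similar first."""
    matches = [
        match
        for match in response.get("FaceMatches", [])
        if match["Similarity"] >= min_similarity
    ]
    return sorted(matches, key=lambda match: match["Similarity"], reverse=True)


def compare_faces(source_bytes, target_bytes, min_similarity=80.0):
    """Call the service with raw image bytes. Requires boto3 and credentials."""
    import boto3  # imported lazily so matching_faces works without it

    client = boto3.client("rekognition")
    return client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=min_similarity,
    )
```

For the sample response above, only the face with Similarity 96 passes the default threshold.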

Collection
IndexFaces
SearchFacesByImage
Nearest neighbor
search
FaceID: 4c55926e-69b3-5c80-8c9b-78ea01d30690
Similarity: 97
FaceID: 02e56305-1579-5b39-ba57-9afb0fd8782d
Similarity: 92
FaceID: 02e56305-1579-5b39-ba57-9afb0fd8782d
Similarity: 85
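The collection workflow above (index known faces, then search by image) might look like the following sketch with boto3. The function names, the threshold of 80, and the S3 keys passed in are assumptions; `create_collection`, `index_faces`, and `search_faces_by_image` are existing client calls.

```python
def rank_matches(response, max_results=3):
    """Return (face_id, similarity) pairs from a search response, best first."""
    pairs = [
        (match["Face"]["FaceId"], match["Similarity"])
        for match in response.get("FaceMatches", [])
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:max_results]


def index_and_search(collection_id, bucket, known_key, query_key):
    """Index one known face, then search the collection with a query image.

    Requires boto3 and credentials. create_collection raises an error if the
    collection already exists, so real code would handle that case.
    """
    import boto3  # imported lazily so rank_matches works without it

    client = boto3.client("rekognition")
    client.create_collection(CollectionId=collection_id)
    client.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": known_key}},
    )
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": query_key}},
        FaceMatchThreshold=80.0,
        MaxFaces=3,
    )
    return rank_matches(response)
```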

Collections and Access Patterns
Logging (public events; daily visitor logs; digital libraries)
• One potentially large collection per event / time period
• Enables wide searches
Social Tagging (photo storage and sharing)
• One collection per application user
• Enables automated friend tagging
Person Verification (employee gate check)
• One collection for each person to be verified
• Enables detection of stolen/shared IDs
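One way to apply the three patterns above is to derive the collection ID from the pattern and a key (an event, a user, or a person). This is only a sketch: the prefixes and the function name are made up for illustration, and the character and length check reflects the CollectionId constraint as documented for CreateCollection, which should be verified against the current API reference.

```python
import re

# Assumed constraint: letters, digits, underscore, period, hyphen; max 255 chars.
_VALID_ID = re.compile(r"^[a-zA-Z0-9_.\-]+$")

# Hypothetical prefixes, one per access pattern from the slide.
_PREFIXES = {
    "logging": "event",        # one collection per event / time period
    "social": "user",          # one collection per application user
    "verification": "person",  # one collection per person to be verified
}


def collection_id(pattern, key):
    """Return a collection ID such as 'event-2017-04-summit'."""
    if pattern not in _PREFIXES:
        raise ValueError(f"unknown access pattern: {pattern!r}")
    candidate = f"{_PREFIXES[pattern]}-{key}"
    if len(candidate) > 255 or not _VALID_ID.match(candidate):
        raise ValueError(f"invalid collection ID: {candidate!r}")
    return candidate
```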

Collections and Access Patterns
[Chart: number of collections versus number of faces per collection (scale up to 1M), comparing Person Verification, Social Friend Tagging, and Event Logging / Wide Search]

Amazon Rekognition Console
https://console.aws.amazon.com/rekognition/home

Amazon Rekognition Customers
• Law Enforcement and Public Safety
• Travel and Hospitality
• Digital Marketing and Advertising
• Media and Entertainment
• Internet of Things (IoT)

Law Enforcement and Public Safety
Washington County Sheriff (OR)
To follow leads from citizens & security cameras, a person
spends days manually searching thousands of images
The mobile and web app powered by Amazon Rekognition
compares new images with photos of previous offenders:
• Helps identify unknown theft suspects from security footage
• Provides leads by identifying possible witnesses & accomplices
• Identifies persons of interest who do not have identification

Travel and Hospitality
Anticipatory guest experiences for hotels using Amazon
Rekognition for facial recognition and sentiment capture
Kaliber is using Amazon Rekognition to help front desk agents
enhance relationships with guests:
• Recognize guests early for instant and personalized service
• Receive rich, contextualized guest information in real time
• Track guest sentiment throughout their stay
• Drive an 80% increase in guest satisfaction scores

Guest Workflow
1. Walk in
2. Be recognized
3. Be greeted
4. Capture sentiment to trigger actions
5. Enjoy personalized service
6. Leave with a fond farewell
“Kaliber allows us to bond with our guests from the
second they walk in my hotel.” – GM of a 5-star property

Simplified Architecture
One master guest collection
enables single-workflow
deployment across all
properties
Guest recognition triggers
real-time information retrieval
Automated pipeline
processing in AWS
improves reliability
Automated image
sampling constantly
improves recognition
quality

Influencer Marketing
Associate influencers with objects and scenes in social media
images in order to create high impact campaigns for clients
Using Amazon Rekognition for metadata extraction:
• Create rich media indexes of images from social media feeds, which
the application associates with influencers
• Enable analytics to profile environments where influence is strongest
• Connect client brands with the influencers most likely to have impact

Media and Entertainment
Identify who is on camera for each of 8 networks so
that recorded video can be indexed and searched
Video frame-sampling facial recognition solution
using Amazon Rekognition:
• Indexed 97,000 people into a face collection in 1 day
• Sample frames every 6 secs and test for image variance
• Upload images to Amazon S3 and call Amazon Rekognition
to find best facial match
• Store time stamp and faceID metadata
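The sampling step above can be sketched in plain Python. The slides do not say how image variance is tested, so the mean absolute pixel difference used here is a stand-in, and the threshold, function names, and flat grayscale frame representation are all assumptions.

```python
def sample_times(duration_seconds, interval_seconds=6):
    """Timestamps (in seconds) at which to grab a frame."""
    return list(range(0, int(duration_seconds) + 1, interval_seconds))


def mean_abs_difference(frame_a, frame_b):
    """Average per-pixel difference between two equally sized grayscale frames.

    Frames are flat lists of 0-255 values (an assumed representation).
    """
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def frames_to_submit(frames, threshold=10.0):
    """Return indexes of frames that differ enough from the last kept frame."""
    kept = []
    last = None
    for index, frame in enumerate(frames):
        if last is None or mean_abs_difference(last, frame) > threshold:
            kept.append(index)
            last = frame
    return kept
```

Only the frames returned by `frames_to_submit` would then be uploaded to Amazon S3 and sent for a face search, which avoids paying for near-identical frames.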

C-SPAN Indexing Architecture
Video feeds encoded from
8 locations (3 networks and
5 federal courthouses)
Frames extracted into
JPGs and hosted in
Amazon S3
Amazon SQS provides
asynchronous decoupling
Search Amazon Rekognition
collection for high similarity
matches
Results cache
drives search and
discovery requests
R3 hashing detects if a
scene significantly changes

Amazon Rekognition
Customers
• Digital Asset Management
• Media and Entertainment
• Travel and Hospitality
• Influencer Marketing
• Systems Integration
• Digital Advertising
• Consumer Storage
• Law Enforcement
• Public Safety
• eCommerce
• Education

Amazon Rekognition
for Media Metadata Generation
Shane Murphy, Cloud Solutions Engineer
Mark Kelly, Director Cloud Operations

Company Background
• Scripps Networks Interactive – Lifestyle Media
• Develop web and video content for distribution to
international audiences in 6 continents
• 190 million+ consumers each month
• Dozens of digital platforms, hundreds of thousands of
images, and petabytes of video.
• 2016: Digital content grew 700%
• 2017: Will produce 2,500 hours of linear television content

Media Metadata Attributes
• Easy – Size, resolution, name, etc.
• Harder – Classification. Room type, color
scheme, brand category, furniture style, etc.
• Must be fast
• Must be good (enough)

Problem Description
• Media management is core to our business.
• Manually creating metadata is time intensive, tedious,
and expensive.
• Automation is amazing!
• But how?

Classification - Current State
• Cutting edge – neural networks
• Example – MIT Places for Scene Recognition
http://places.csail.mit.edu/
• Complicated, bloated, computationally infeasible, static
• Only one problem type, but we have many classes to
identify

Let’s Simplify!
Our Strategy – Divide and Conquer
1. Use Amazon Rekognition to generate text labels for easy processing
2. Use supervised machine learning to train multiple predictive models
3. Set up multiple fan-out pipelines for automated classification
workloads

Amazon
Rekognition
Step 1: Generate Labels
Python (boto3) example
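import boto3

rekognize = boto3.client('rekognition')
image_labels = []
for img in training_images:
    response = rekognize.detect_labels(
        Image={'S3Object': {'Bucket': SOURCE_BUCKET, 'Name': img}},
        MinConfidence=MIN_CONFIDENCE)
    # Join the label names into one text string per image
    image_labels.append(' '.join(label['Name'] for label in response['Labels']))
# e.g. image_labels[0] == 'Plant Potted Plant Furniture Indoors Interior Design Room Kitchen'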

Step 2a: Transform Labels
         Plant  Room  Table  Lamp  Furniture  …  AttributeN
Image0     2     1      0     1       2       …
Image1     1     3      1     1       0       …
Image2     1     1      1     0       1       …
Bag of Words
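A bag-of-words transformation of this kind can be sketched with the standard library alone. The function name and the choice to split on whitespace are assumptions; splitting this way means a multi-word label such as "Potted Plant" also contributes to the "Plant" count.

```python
from collections import Counter


def bag_of_words(label_strings):
    """Turn one label string per image into a vocabulary and a count matrix.

    Returns (vocabulary, matrix), where matrix[i][j] is how often
    vocabulary[j] occurs in the labels of image i.
    """
    counts = [Counter(text.split()) for text in label_strings]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, matrix
```

Each row of the matrix is then a fixed-length feature vector that a classifier can consume.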

Step 2b – Derive Relationships
• Split the training data, use most of it to train, some to test
• Options - Decision trees, random forest, k nearest
neighbors, multinomial logistic regression
• Specifics determined by problem description and tuning
(art and science)
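As a sketch of the split-train-test idea, the code below implements a tiny k nearest neighbors classifier over bag-of-words vectors in plain Python. It is not the presenters' implementation: the function names, the squared Euclidean distance, and k = 3 are assumptions, and a library such as scikit-learn would be the more usual choice in practice.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (vector, label) rows and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, vector, k=3):
    """Majority vote among the k training rows nearest to vector."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda row: distance(row[0], vector))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train, vector, k) == label for vector, label in test)
    return correct / len(test)
```

Calling `accuracy(*train_test_split(rows))` on labeled rows gives a first estimate of how well a given model and k generalize before any tuning.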

Step 3 – Predict New Metadata
Input Labels = “Plant Potted Plant Indoors Interior Design Room
Bedroom Lamp Lampshade Table Lamp Apartment Housing Lighting
Dining Room Shelf Furniture Table Tabletop”
Predicted room type: “Dining Room”

Let’s Simplify!
Strategy – Transform to easier use case
• Sample video frames -> feed through Amazon Rekognition,
classifiers, and other analysis engines and parsers

Use Case – Fanout Video Pipeline
Amazon
Rekognition
Amazon
Elasticsearch
Service
Amazon S3

So what?
• Room type classification initial results – 75% accurate
• Immediate savings in image and video classification:
$500,000
• Time to market – thousands of hours saved per year
• Content Grouping and Dynamic Generation

Challenges in Our Approach
• Integration with Amazon Machine Learning
• Lack of Optical Character Recognition
• Model Management and Lifecycles
• Real time generation

Future Directions
• Revenue Opportunities!!! Product placement, logos, etc.
• Facial Recognition
• Landmark Detection
• Cultural Sensitivity
• Compliance and Terms of Service

Amazon Rekognition Availability and Pricing
Free Tier: 5,000 images processed per month for the first 12 months
General Availability in 3 regions:
US East (N. Virginia), US West (Oregon), EU (Ireland)
Image Analysis Tiers                              Price per 1,000 images processed
First 1 million images processed* per month       $1.00
Next 9 million images processed* per month        $0.80
Next 90 million images processed* per month       $0.60
Over 100 million images processed* per month      $0.40
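The tiered prices above are marginal rates, so a monthly bill is the sum over tiers rather than a single rate applied to the whole volume. The sketch below works through that arithmetic using the April 2017 prices from this slide; it ignores the free tier, and the function name is illustrative.

```python
# (images in tier, price in cents per 1,000 images); None means "no upper limit".
TIERS = [
    (1_000_000, 100),   # first 1 million images: $1.00 per 1,000
    (9_000_000, 80),    # next 9 million images:  $0.80 per 1,000
    (90_000_000, 60),   # next 90 million images: $0.60 per 1,000
    (None, 40),         # over 100 million images: $0.40 per 1,000
]


def monthly_cost(images):
    """Return the monthly cost in dollars for a number of processed images."""
    if images < 0:
        raise ValueError("images must be non-negative")
    remaining = images
    total = 0  # accumulated in (cents per 1,000 images) x images
    for size, cents_per_thousand in TIERS:
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * cents_per_thousand
        remaining -= in_tier
        if remaining == 0:
            break
    return total / 100_000  # 1,000 images x 100 cents per dollar
```

For example, 2.5 million images in a month cost 1,000 x $1.00 plus 1,500 x $0.80, which is $2,200.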

Developer Resources and more…
https://aws.amazon.com/blogs/ai/
https://aws.amazon.com/rekognition

IoT Use Case
real-time facial recognition at the edge
AWS Advanced Consulting Partner
• Migrations
• DevOps
• Managed Services
• Software & Hardware Engineering
• User Experience & Visual Design
• Rapid Prototyping
AWS Competencies: DevOps, IoT, Healthcare

NERF CS-18 N-Strike Elite Rapidstrike
Adafruit 2.8”
PiTFT display
Raspberry Pi 3
Amazon Rekognition
Training Image

Thank You!
pearsond@amazon.com
