Perform near-real-time analysis on faces (emotions, gender, age, etc.), taken from a live video stream with Azure Cognitive Services and AWS Rekognition.
Biometric Systems - Automate Video Streaming Analysis with Azure and AWS
1. Biometric Systems -
Automate Video Streaming Analysis
with Microsoft Azure
and Amazon Web Services
S. CLINCIU, R. FALCONI, F. GUIDI, C. NAVARRA
Sapienza – University of Rome
MSc in Engineering in Computer Science
Prof. M. De Marsico, Course of Biometric Systems
A.Y. 2019/20
2. OVERVIEW
• Introduction
• Ideas
• Technologies: .NET Core, Azure APIs, AWS SDK, OpenCV and Visual Studio
• Microsoft Azure: Dashboard, Cognitive Services, Face API, Computer Vision API and pricing
• Face Detection and Face Recognition according to Microsoft
• Experiments: Microsoft Azure compared to AWS
• AWS: Management Console, AWS Lambda, S3, Amazon Rekognition, Kinesis Video Streaming and pricing
• Face Detection and Face Recognition according to Amazon
• .NET Core and C# implementation
• Conclusions
• Appendices: how to run the code, references and useful links
3. INTRODUCTION
• The goal of our project is to:
• Perform near-real-time analysis on faces (emotion, gender,
age, etc.) taken from a live video stream with the OpenCV
.NET SDK
• Acquire frames from a video source
• Select which frames to analyze
• Submit these frames to the Microsoft Azure Face, Computer
Vision and Emotion APIs
• Consume each analysis result that is returned from the API
call
• Return a positive or negative result by comparing face
IDs
• Compare Microsoft Azure and AWS experimentally
4. IDEAS
BIOMETRIC SYSTEMS
MICROSOFT AZURE VS AWS
USING FACE ID AS A UNIQUE
IDENTIFIER STRING FOR EACH
DETECTED AND ANALYSED FACE
DISTIL ACTIONABLE INFO FROM
IMAGES OF THE REAL WORLD
DETECT, IDENTIFY, ANALYSE,
ORGANIZE, AND TAG FACES IN
PHOTOS WITH BOTH MICROSOFT
AZURE AND AWS
5. TECHNOLOGIES
• To develop our software, write the code and build the final
project, we used the following tools and technologies:
• Microsoft .NET Core framework and the C# programming language
• Visual Studio Enterprise 2019 integrated development
environment (IDE)
• OpenCV (Open Source Computer Vision Library) to provide an
infrastructure for biometric systems and apps
• Microsoft Azure with Cognitive Services, Computer Vision,
Emotion and Face APIs
• AWS with Amazon Rekognition and Amazon Kinesis
6. TECHNOLOGIES:
.NET CORE
• We developed our software in the C# programming
language on the Microsoft .NET Core runtime and
libraries
• .NET Core is a free and open-source, managed computer
software framework for Windows, Linux, and macOS
operating systems
• It is a cross-platform successor to .NET Framework
• The project is primarily developed by Microsoft and released
under the MIT License, but it is widely supported by
developers and competitors (such as Amazon)
7. TECHNOLOGIES: OPENCV
• OpenCV (Open Source Computer Vision Library) is a library of
programming functions mainly aimed at real-time computer
vision.
• The library is cross-platform and free for use under the
open-source BSD license.
• We used it to build a near-real-time video stream whose frames
are sent to Azure Cognitive Services.
8. TECHNOLOGIES:
AZURE
• Microsoft Azure is a cloud computing
service created by Microsoft for
building, testing, deploying, and
managing applications and services
through Microsoft-managed data
centers.
• It provides software as a service
(SaaS), platform as a service (PaaS)
and infrastructure as a service (IaaS)
and supports many different
programming languages.
• In our project, we used Cognitive
Services as described later.
9. AMAZON WEB SERVICES (AWS)
• Amazon Web Services (AWS) provides on-demand cloud
computing platforms and APIs to individuals, companies, and
governments, on a metered pay-as-you-go basis.
• Cloud computing web services provide a set of primitive
abstract technical infrastructure and distributed computing
building blocks and tools.
• In our project, we used AWS as a term of comparison for
Microsoft Azure, one of its major competitors.
10. TECHNOLOGIES:
VISUAL STUDIO
• Microsoft Visual Studio is an
integrated development environment
(IDE) from Microsoft.
• Team Explorer is used to integrate the
capabilities of Azure DevOps (either
Azure DevOps Services or Azure
DevOps Server) into the IDE.
11. MICROSOFT AZURE
DASHBOARD
• Microsoft Azure Dashboard is a focused
and organized view of your cloud
resources in the Azure portal.
• We used the dashboard as a workspace
to quickly launch tasks for
day-to-day operations and to monitor
resources.
12. MICROSOFT AZURE
COGNITIVE SERVICES
• A comprehensive family of AI
services and cognitive APIs to help
you build intelligent apps
• Cognitive Services bring AI within
reach of every developer—
without requiring machine-
learning expertise. All it takes is an
API call to embed the ability to
see, hear, speak, search,
understand, and accelerate
decision-making into your apps.
13. MICROSOFT AZURE
RELEVANT SERVICES
• Cognitive Services
• Face and Computer Vision APIs
• Face Detection
• Face Recognition:
Verification
• Face Recognition:
Identification
• Custom Vision
• Train your model in a web
app
• Tag and describe your
dataset
14. MICROSOFT AZURE
FACE AND COMPUTER VISION
APIS
• Explore the Azure services to get started with
Computer Vision and Face
• Get the API key and endpoint to authenticate
your applications and start sending calls to the
service. All Computer Vision calls and Docker
container activations require a key. Specify the
key in the request header (Web API), in
the Computer Vision client (SDK) or on
the command line (Docker container)
• Try the service in the API console: this requires an
API key and selecting your location (West
Europe)
• Make a web API call - requires your API Key
and endpoint
15. FACE DETECTION
• Face detection is the action of locating
human faces in an image and returning
different kinds of face-related data.
• Each detected face corresponds to a
faceRectangle field in the response, given as a set of
pixel coordinates.
• Using these coordinates, you can get the
location of the face and its size. In the API
response, faces are listed in size order
from largest to smallest.
• If you're detecting faces from a video feed,
you may be able to improve performance
by adjusting certain settings on your video
camera.
16. FACE DETECTION
• The face ID is a unique identifier string for
each detected face in an image. You can
request a face ID in your Face – Detect API
call.
• Face landmarks are a set of easy-to-find
points on a face, such as the pupils or the tip
of the nose. By default, there are 27
predefined landmark points.
• Attributes are a set of features that can
optionally be detected by the Face -
Detect API. Detectable attributes
include age, gender, glasses, hair, noise and
smile.
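As an illustration of the fields described above, here is a minimal sketch that reads a Detect-style JSON response and lists each face with its ID, rectangle and a few attributes, largest face first. The sample JSON, the IDs and the names `DetectResponseDemo` and `Summarize` are assumptions made for this example; only the field names `faceId`, `faceRectangle` and `faceAttributes` come from the Face API response format.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.Json;

public static class DetectResponseDemo
{
    // Hypothetical sample shaped like a Face - Detect response.
    // IDs and values are invented for illustration.
    public const string SampleJson = @"[
      { ""faceId"": ""face-b"",
        ""faceRectangle"": { ""top"": 10, ""left"": 400, ""width"": 80, ""height"": 80 },
        ""faceAttributes"": { ""age"": 45.0, ""gender"": ""male"", ""smile"": 0.1 } },
      { ""faceId"": ""face-a"",
        ""faceRectangle"": { ""top"": 40, ""left"": 60, ""width"": 200, ""height"": 200 },
        ""faceAttributes"": { ""age"": 31.0, ""gender"": ""female"", ""smile"": 0.9 } }
    ]";

    // Returns one summary line per face, ordered by rectangle area (largest first).
    public static string[] Summarize(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.EnumerateArray()
                .Select(face =>
                {
                    JsonElement rect = face.GetProperty("faceRectangle");
                    JsonElement attrs = face.GetProperty("faceAttributes");
                    int width = rect.GetProperty("width").GetInt32();
                    int height = rect.GetProperty("height").GetInt32();
                    string line = string.Format(
                        CultureInfo.InvariantCulture,
                        "{0}: {1}x{2} px at (left {3}, top {4}), age {5}, {6}, smile {7}",
                        face.GetProperty("faceId").GetString(),
                        width, height,
                        rect.GetProperty("left").GetInt32(),
                        rect.GetProperty("top").GetInt32(),
                        attrs.GetProperty("age").GetDouble(),
                        attrs.GetProperty("gender").GetString(),
                        attrs.GetProperty("smile").GetDouble());
                    return new { Area = width * height, Line = line };
                })
                .OrderByDescending(f => f.Area)
                .Select(f => f.Line)
                .ToArray(); // materialize before the JsonDocument is disposed
        }
    }

    public static void Main()
    {
        foreach (string line in Summarize(SampleJson))
        {
            Console.WriteLine(line);
        }
    }
}
```

The explicit sort is only there to show how the rectangle size can be used; a real response is already ordered from largest to smallest face.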
17. FACE RECOGNITION
Verify and Identify are the face recognition
operations, each with its underlying data
structures
Recognition describes the work of
comparing two different faces to
determine if they're similar or belong to
the same person
The Verify operation takes two face IDs and
determines whether they belong to the
same person
The Identify operation takes one or several
face IDs and returns the known persons each
face might belong to
18. FACE RECOGNITION
• The Verify operation can be called from
C# code through the Azure Cognitive Services
Face API client library.
• Every call to the Face API requires a
subscription key. This key can be
passed either as a query string
parameter or in the request
header.
• To get the subscription key, create the
resource in the Azure Marketplace,
reachable from the Azure portal.
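A minimal sketch of a Verify call with the subscription key in the request header is shown here. It only builds and prints the request, without sending it, so it runs without a real key. The environment variable names `FACE_ENDPOINT` and `FACE_KEY`, the fallback endpoint, the placeholder face IDs and the helper name `BuildVerifyRequest` are assumptions for this example; the project itself uses the Face client library rather than raw HTTP.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

public static class VerifyRequestDemo
{
    // Builds (but does not send) a Face - Verify request for two face IDs.
    public static HttpRequestMessage BuildVerifyRequest(
        string endpoint, string subscriptionKey, string faceId1, string faceId2)
    {
        // The request body carries the two face IDs to compare.
        string body = JsonSerializer.Serialize(new { faceId1, faceId2 });

        var request = new HttpRequestMessage(
            HttpMethod.Post, endpoint.TrimEnd('/') + "/face/v1.0/verify")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };

        // The subscription key travels in this request header.
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        return request;
    }

    public static void Main()
    {
        // Assumed variable names; replace with your own resource's values.
        string endpoint = Environment.GetEnvironmentVariable("FACE_ENDPOINT")
                          ?? "https://westeurope.api.cognitive.microsoft.com";
        string key = Environment.GetEnvironmentVariable("FACE_KEY") ?? "your-key-here";

        // Placeholder IDs; real ones come from earlier Detect calls.
        using (HttpRequestMessage request = BuildVerifyRequest(
            endpoint, key,
            "00000000-0000-0000-0000-000000000001",
            "00000000-0000-0000-0000-000000000002"))
        {
            Console.WriteLine(request.Method + " " + request.RequestUri);
            Console.WriteLine(request.Content.ReadAsStringAsync().GetAwaiter().GetResult());
        }
    }
}
```

Sending the request with `HttpClient.SendAsync` would return a JSON body reporting whether the two faces are judged identical and with what confidence.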
19. AZURE PRICING
TIER
The cost of your cognitive services depends on the
actual usage and the options you choose.
Both the Face API and the Computer Vision API, in Azure
Cognitive Services, cost 0.84 EUR / 1000 calls in Europe
at the time of writing.
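To make the per-call price concrete, the sketch below estimates the cost of analysing a video stream. The rate is the one quoted above; the sampling rate of one analysed frame per second and the names `AzureCostDemo` and `EstimateEur` are assumptions for this example.

```csharp
using System;
using System.Globalization;

public static class AzureCostDemo
{
    // Price quoted on the slide: 0.84 EUR per 1000 calls.
    public const decimal EurPerThousandCalls = 0.84m;

    // Cost in EUR of a given number of API calls at the quoted rate.
    public static decimal EstimateEur(long calls)
    {
        return calls * EurPerThousandCalls / 1000m;
    }

    public static void Main()
    {
        // Assumption: one frame is submitted per second for one hour.
        long calls = 60 * 60;
        decimal cost = EstimateEur(calls);
        Console.WriteLine(
            calls + " calls cost " + cost.ToString("F3", CultureInfo.InvariantCulture) + " EUR");
        // 3600 calls * 0.84 / 1000 = 3.024 EUR
    }
}
```

Since each analysed frame is one call, the choice of which frames to submit drives the bill directly.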
20. CUSTOM VISION
At the beginning, we used Custom Vision
for tests and as a workbench.
Custom Vision is a web
app powered by Azure that lets you:
• Easily train your models
• Analyze their performance and
quickly make predictions
• Use the Computer Vision API in a
user-friendly way, without writing code
21. AWS MANAGEMENT
CONSOLE
• Access and manage Amazon Web
Services through a simple and
intuitive web-based user interface.
• Administer your AWS account: the
Console facilitates cloud management
for all aspects of your AWS account.
• Finding Services in the AWS Console:
there are several ways for you to
locate and navigate to the services
you need thanks to the AWS Console.
• We used AWS IAM, Lambda, S3,
Rekognition and Kinesis.
22. AWS IAM
• AWS Identity and Access
Management (IAM) enables you to
manage access securely, because you
can create and manage AWS users,
groups and permissions to allow and
deny their access to any AWS
resources.
• IAM is a feature of your AWS account
offered at no additional charge. You
will be charged only for use of other
AWS services by your users.
23. AWS LAMBDA AND
AMAZON S3
• AWS Lambda is an event-driven, serverless computing platform. It runs
code in response to events and automatically manages the computing
resources required.
• Amazon Simple Storage Service (S3) provides object storage through a
web service interface.
• We present Lambda and S3 together because the Lambda function is stored in S3.
24. AMAZON REKOGNITION AND
AMAZON KINESIS
• Amazon Rekognition is a cloud-based Software as a Service (SaaS)
computer vision platform that provides several computer vision
capabilities.
• Amazon Kinesis makes it easy to collect and analyze real-time,
streaming data so you can react quickly to new information.
• We present Rekognition and Kinesis together because they work
closely with each other.
25. AWS LAMBDA
PRICING TIER
• AWS Lambda prices are clearly
declared, but they are not as easy to
understand as
Azure's.
• While Azure declares a cost of 0.84
EUR / 1000 API calls for Cognitive
Services, AWS explains its costs in a
less accessible way, as the
following image shows.
26. AMAZON S3
PRICING TIER
• In addition to AWS Lambda, you must
also pay for what you store in S3 (the
Lambda function itself and the video
stream) and for any request or data
retrieval.
• It follows that the total cost is the sum of
the Lambda and S3 costs, each of which is
the result of a complex calculation, as
shown in the following images.
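The structure of that sum can be sketched as follows. This is a simplified model under stated assumptions: Lambda is billed per request plus per GB-second of execution, S3 per GB stored plus per request, and free tiers, data transfer and Rekognition charges are ignored. All rates are parameters; the values used in `Main` are placeholders, not real AWS prices, and the type names are invented for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical container for the unit prices; fill in from the AWS pricing pages.
public sealed class AwsRates
{
    public decimal LambdaPerRequest;
    public decimal LambdaPerGbSecond;
    public decimal S3PerGbMonth;
    public decimal S3PerRequest;
}

public static class AwsCostDemo
{
    // Lambda cost: a per-request charge plus a charge for memory * execution time.
    public static decimal LambdaCost(
        long invocations, decimal avgSeconds, decimal memoryGb, AwsRates rates)
    {
        decimal gbSeconds = invocations * avgSeconds * memoryGb;
        return invocations * rates.LambdaPerRequest + gbSeconds * rates.LambdaPerGbSecond;
    }

    // S3 cost: a storage charge plus a per-request charge.
    public static decimal S3Cost(decimal storedGb, long requests, AwsRates rates)
    {
        return storedGb * rates.S3PerGbMonth + requests * rates.S3PerRequest;
    }

    public static void Main()
    {
        // Placeholder rates, chosen only to make the arithmetic visible.
        var rates = new AwsRates
        {
            LambdaPerRequest = 0.001m,
            LambdaPerGbSecond = 0.01m,
            S3PerGbMonth = 0.05m,
            S3PerRequest = 0.0001m
        };

        decimal total = LambdaCost(1000, 2m, 0.5m, rates) + S3Cost(10m, 1000, rates);
        Console.WriteLine(total.ToString("F2", CultureInfo.InvariantCulture));
    }
}
```

Even in this reduced form the estimate needs six inputs beyond the number of calls, which is the contrast with the single Azure rate that the slide points out.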
27. .NET CORE CLI
• In a console window (such as cmd,
PowerShell, or Bash), the command
dotnet new console -n face-quickstart
creates a new console app with the name
face-quickstart.
• This command creates a simple
"Hello World" C# project with a
single source file: Program.cs.
28. USED LIBRARIES
• Once Program.cs was in place, we added
the following using directives for the AWS
and Azure APIs respectively.
• In the application's Main method,
create variables for your resources'
Azure and AWS endpoints and keys.
• Within the application directory, we
installed the Face client library for
.NET, the AWS SDK for .NET, the AWS Toolkit for
Visual Studio and OpenCvSharp from
NuGet.
29. FROM A SIMPLE APPROACH
TO PARALLELIZING API CALLS
• The simplest design for a near-real-
time analysis system is an infinite loop.
• However, when analysis happens in
the cloud, the latency involved means
that an API call might take several
seconds.
• The solution to this lag problem is to
allow the long-running API calls to
execute in parallel with the frame-
grabbing. In C#, we could achieve this
using Task-Based Parallelism.
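A minimal sketch of this idea follows, with `Task.Delay` standing in for both the camera and the cloud API so that it runs without any service. The names `ParallelCallsDemo`, `AnalyzeAsync` and `RunAsync` and the timings are assumptions for this example, not the project's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelCallsDemo
{
    // Stand-in for a cloud API call: waits for the given latency, then returns a fake result.
    public static async Task<string> AnalyzeAsync(int frame, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return "frame " + frame + " analysed";
    }

    // Grabs frames in a loop and starts one analysis per frame without waiting for it.
    public static async Task<string[]> RunAsync(int frameCount)
    {
        var pending = new List<Task<string>>();
        for (int frame = 0; frame < frameCount; frame++)
        {
            // Start the call but do not await it: the loop moves on to the next frame
            // while the "API call" is still in flight.
            pending.Add(AnalyzeAsync(frame, 200));

            // Simulated time needed to grab the next frame.
            await Task.Delay(30);
        }

        // Wait for every outstanding call before returning.
        return await Task.WhenAll(pending);
    }

    public static void Main()
    {
        foreach (string result in RunAsync(5).GetAwaiter().GetResult())
        {
            Console.WriteLine(result);
        }
    }
}
```

With these numbers a purely sequential loop would need about 5 × 230 ms, while the overlapped version finishes shortly after the last call returns.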
30. A PRODUCER-CONSUMER DESIGN
• With C# Task-Based Parallelism, multiple API calls
might occur in parallel.
• The new problem is that results might be
returned in the wrong order.
• The final step is to add a "consumer" thread that
surfaces exceptions, stops waiting for long-running tasks and
ensures that the results get consumed in the
correct order.
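One way to sketch this design is a `BlockingCollection` of tasks: the producer enqueues each call as soon as it starts, and a single consumer waits on them in queue order. The simulated latencies, the timeout handling and the names `ProducerConsumerDemo` and `Run` are assumptions for this example. Note that a task that exceeds the timeout is abandoned here rather than cancelled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Stand-in for a cloud API call with a configurable latency.
    private static async Task<string> AnalyzeAsync(int frame, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return "frame " + frame;
    }

    // Starts one call per latency value and returns the results in frame order.
    public static List<string> Run(int[] latenciesMs, int timeoutMs)
    {
        var queue = new BlockingCollection<(int Frame, Task<string> Call)>();
        var consumed = new List<string>();

        // Consumer: takes tasks in the order they were queued and waits on each one,
        // so results are consumed in frame order even if they finish out of order.
        Task consumer = Task.Run(() =>
        {
            foreach ((int Frame, Task<string> Call) item in queue.GetConsumingEnumerable())
            {
                try
                {
                    if (item.Call.Wait(timeoutMs))
                    {
                        consumed.Add(item.Call.Result);
                    }
                    else
                    {
                        consumed.Add("frame " + item.Frame + " timed out");
                    }
                }
                catch (AggregateException ex)
                {
                    // A failed call surfaces here instead of being lost.
                    consumed.Add("frame " + item.Frame + " failed: " + ex.InnerException?.Message);
                }
            }
        });

        // Producer: starts each call and queues the task immediately.
        for (int frame = 0; frame < latenciesMs.Length; frame++)
        {
            queue.Add((frame, AnalyzeAsync(frame, latenciesMs[frame])));
        }
        queue.CompleteAdding();

        consumer.Wait();
        return consumed;
    }

    public static void Main()
    {
        // Frame 1 finishes first, yet the output is still in frame order.
        foreach (string line in Run(new[] { 150, 10, 60 }, 1000))
        {
            Console.WriteLine(line);
        }
    }
}
```

Queuing the tasks themselves, rather than their results, is what preserves the order: the consumer never sees frame 1 before it has finished with frame 0.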
31. AWS LAMBDA FUNCTION:
TWO CONSTRUCTORS
• The first is used when Lambda
invokes your function: it determines
the acceptable confidence level,
creates the S3 and Rekognition service
clients and gets the AWS
credentials for these clients from
IAM.
• The AWS Region is set to the region
your Lambda function is running
in.
• The second constructor is used for
testing purposes, to check that
everything is running well.
32. AWS LAMBDA
FUNCTIONHANDLER
• FunctionHandler is the method
Lambda calls after it constructs
the instance.
• The S3Event contains all the
information about the event
triggered in Amazon S3.
• The function loops through all the
S3 objects that were part of the
event and tells Rekognition to
detect labels.
• After the labels are detected, they
are added as tags to the S3 object.
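The label-to-tag step can be sketched without the AWS SDK by using a stand-in type for a detected label. `DetectedLabel`, `LabelTagging`, the sample labels and the threshold of 70 are assumptions for this example; in the real function the labels come from Rekognition and the tags are written through the S3 client. The cap of 10 reflects the S3 limit on tags per object.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Stand-in for a label returned by Rekognition: a name and a confidence in percent.
public sealed class DetectedLabel
{
    public string Name { get; }
    public float Confidence { get; }

    public DetectedLabel(string name, float confidence)
    {
        Name = name;
        Confidence = confidence;
    }
}

public static class LabelTagging
{
    // S3 allows at most 10 tags per object.
    public const int MaxTagsPerObject = 10;

    // Keeps the most confident labels above the threshold and turns them into key/value tags.
    public static List<KeyValuePair<string, string>> ToTags(
        IEnumerable<DetectedLabel> labels, float minConfidence)
    {
        return labels
            .Where(label => label.Confidence >= minConfidence)
            .OrderByDescending(label => label.Confidence)
            .Take(MaxTagsPerObject)
            .Select(label => new KeyValuePair<string, string>(
                label.Name,
                label.Confidence.ToString("F1", CultureInfo.InvariantCulture)))
            .ToList();
    }

    public static void Main()
    {
        // Invented labels for illustration.
        var labels = new[]
        {
            new DetectedLabel("Face", 99.2f),
            new DetectedLabel("Smile", 85.0f),
            new DetectedLabel("Glasses", 42.5f)
        };

        foreach (KeyValuePair<string, string> tag in ToTags(labels, 70f))
        {
            Console.WriteLine(tag.Key + " = " + tag.Value);
        }
    }
}
```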
33. AWS TOOLKIT AND
AWS EXPLORER
• The AWS Toolkit for Visual Studio is
an extension for Microsoft Visual
Studio running on Microsoft
Windows that makes it easier for
developers to develop, debug, and
deploy .NET applications using
Amazon Web Services.
• With the AWS Toolkit for Visual
Studio, you'll be able to get started
faster and be more productive when
building AWS applications using
AWS Explorer.
34. UPLOAD
AWS LAMBDA FUNCTION
• Publishing the project launches the deployment
process, which builds and packages
the Lambda project and then creates
the Lambda function.
• Once publishing is complete, the
Function view in the AWS Explorer
window is displayed. From here, you
can invoke a test function.
35. OPENCV SHARP
• NuGet is a free and open-source
package manager, distributed as a
Visual Studio extension.
• OpenCvSharp, installed through NuGet,
is an essential library in our project:
it connects the video frames to the
cloud providers and their APIs.
36. EXPERIMENTS DATASET
• We used the Extended Cohn-Kanade Dataset (CK+), which is released under a Creative Commons
Attribution license. It contains 123 subjects and 593 image sequences (327 of them with discrete
emotion labels) of both posed and spontaneous smiles, at a resolution of 640×490.
• A facial expression database is a collection of images or video clips with facial expressions of a range of
emotions.
• Well-annotated (emotion-tagged) media content of facial behavior is essential for training, testing, and
validation of algorithms for the development of expression recognition systems.
• The emotion annotation can be done in discrete emotion labels or on a continuous scale.
37. EXPERIMENTS
• AWS lets you upload an image to S3
and analyse it with Rekognition
through a Lambda function.
• Rekognition sends back its results.
• We uploaded different sets of images
from the CK+ Dataset to our AWS
bucket and to Azure Cognitive Services.
40. ROC CURVE
• A receiver operating characteristic (ROC)
curve plots the Genuine Accept Rate
against the False Match Rate of a
model as the decision threshold
varies.
[Figure: ROC curves, Genuine Accept Rate vs. False Match Rate, Azure and AWS]
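For reference, the points of such a curve can be computed from two lists of match scores, one for genuine comparisons and one for impostor comparisons. The sketch below uses invented scores, not our experimental data, and the names `RocDemo` and `RatesAt` are assumptions for this example.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class RocDemo
{
    // At a given threshold, a comparison is accepted when its score is >= threshold.
    // FMR: fraction of impostor comparisons accepted.
    // GAR: fraction of genuine comparisons accepted.
    public static (double Fmr, double Gar) RatesAt(
        double threshold, double[] genuineScores, double[] impostorScores)
    {
        double gar = genuineScores.Count(s => s >= threshold) / (double)genuineScores.Length;
        double fmr = impostorScores.Count(s => s >= threshold) / (double)impostorScores.Length;
        return (fmr, gar);
    }

    public static void Main()
    {
        // Invented similarity scores in [0, 1].
        double[] genuine = { 0.9, 0.8, 0.4 };
        double[] impostor = { 0.6, 0.2, 0.1, 0.3 };

        // Sweep the threshold; each value yields one (FMR, GAR) point of the curve.
        for (int step = 0; step <= 4; step++)
        {
            double threshold = step * 0.25;
            (double fmr, double gar) = RatesAt(threshold, genuine, impostor);
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture,
                "t={0:F2}  FMR={1:F2}  GAR={2:F2}", threshold, fmr, gar));
        }
    }
}
```

A better system reaches a high GAR while the FMR is still low, so its curve lies closer to the top-left corner.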
41. CMC CURVE
• A Cumulative Match Characteristic (CMC)
curve plots, for each rank k, the
identification accuracy: the fraction of probes
whose true match occurs within the top k candidates.
[Figure: CMC curves, identification accuracy vs. rank, Azure and AWS]
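A CMC curve can be computed from the rank at which each probe's true identity appears in the candidate list. The ranks below are invented for illustration, and the names `CmcDemo` and `Curve` are assumptions for this example.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class CmcDemo
{
    // curve[k - 1] is the fraction of probes whose true match is found at rank k or better.
    public static double[] Curve(int[] trueMatchRanks, int maxRank)
    {
        var curve = new double[maxRank];
        for (int k = 1; k <= maxRank; k++)
        {
            curve[k - 1] = trueMatchRanks.Count(rank => rank <= k) / (double)trueMatchRanks.Length;
        }
        return curve;
    }

    public static void Main()
    {
        // Invented ranks (1 = the true identity was the top candidate).
        int[] ranks = { 1, 1, 2, 1, 3 };

        double[] curve = Curve(ranks, 3);
        for (int k = 1; k <= curve.Length; k++)
        {
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture, "rank {0}: {1:F2}", k, curve[k - 1]));
        }
        // rank 1: 0.60, rank 2: 0.80, rank 3: 1.00
    }
}
```

The value at rank 1 is the usual recognition rate, and the curve is non-decreasing by construction.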
42. CONCLUSIONS
• By the numbers, Microsoft Azure Cognitive Services is noticeably
more accurate than Amazon Rekognition:
• Azure shows clear overfitting and overconfidence
problems, but…
• …overall AWS, even though it does not suffer from these
problems, gives rather poor results
• In our view, Microsoft Azure provides:
• The most user-friendly experience (the Dashboard is clearer
than the Management Console, and almost all services
share the same UI).
• Clearer names (Cognitive Services, Speech Services, Face and
Emotion APIs, etc. are self-explanatory and
easier to remember than Amazon
Rekognition, Polly and Kinesis).
• Clearer prices (Azure simply quotes EUR per call, while AWS
provides very complex tables and calculations).
• Hybrid cloud approaches, mixing local and web
infrastructure, which are fundamental for the Italian Public
Administration.
43. FUTURE WORK
• It would be very interesting to make more
comparisons and explore new services:
• Compare more services between AWS
and Microsoft Azure, such as
Amazon Polly and Azure Speech
Services.
• Compare more cloud providers with AWS
and Microsoft Azure, such as Google
Cloud and IBM Cloud.
44. HOW TO RUN
THE CODE
We have tried to make it easy to run
the project.
Get your own Cognitive Services API
keys on microsoft.com/cognitive. For
video frame analysis, the applicable
APIs are the Computer Vision API and the Face
API.
Open the sample in Visual Studio, insert the
API keys in the settings, then build
and run the application using IIS
(Internet Information Services).