Hi, my name is Kanchan Waikar. I am a Senior Specialist Architect at AWS, and today I am going to share how you can add differentiating features backed by machine learning to your applications.
First, I will talk about the different types of data that you typically extract insights from. Then I will share how you can use hundreds of pre-trained ML models to build differentiating features. And finally, I will share a sample use case, and Joseph and I will demo how you can integrate ML models into your Kafka applications.
There are typically three types of datasets that you use: in-house data sitting in your data lake in your S3 buckets; clickstream or real-time data, also known as data in motion; and third-party data that you procure from your data vendors.
Amazon S3 is the largest and most performant object storage service for structured and unstructured data, and the storage service of choice for building a data lake. And there are several tools and services that you can use to build your data lake in S3.
For your data in motion, Amazon Kinesis offers a variety of key capabilities: Kinesis Data Streams for your clickstream data, Kinesis Data Firehose for persisting data in S3, Kinesis Data Analytics for real-time analytics, and Kinesis Video Streams for video data. And then there is Amazon MSK, a managed service for Apache Kafka, which you can use to build and run applications that use Apache Kafka to process streaming data.
Many customers have in-house ML capabilities, and these customers need external, real-world, high-quality data. However, they need to worry about moving this data into their AWS cloud, not just once but at regular intervals, since third-party data vendors often produce and share data periodically. And this is exactly the problem AWS Data Exchange solves.
AWS Data Exchange makes it easy for you to procure third-party data from your data vendors. AWS Data Exchange contains over three thousand data products, and once procured, the data can be loaded into your S3 bucket. Once it's in your data lake, you can use whichever tools you wish to perform analytics.
AWS Data Exchange also supports incremental data delivery from sellers: whenever a new revision of a dataset you have subscribed to becomes available, you get a CloudWatch event notification that your application can use to consume the incremental data.
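As a sketch of what reacting to that notification could look like, here is a small handler that pulls the dataset and revision IDs out of the event and kicks off an export of the new revisions into S3. The event shape, the sample IDs, and the Data Exchange API calls are written from memory, so verify them against the service documentation before relying on them.

```python
import json

def revision_ids_from_event(event):
    """Pull the data set ID and new revision IDs out of an AWS Data Exchange
    'Revision Published To Data Set' event (shape assumed; verify against
    your own notifications)."""
    data_set_id = event["resources"][0]
    revision_ids = event["detail"]["RevisionIds"]
    return data_set_id, revision_ids

def export_revisions_to_s3(event, bucket):
    """Start an export job for each newly published revision (sketch;
    not executed here)."""
    import boto3  # lazy import so the parsing helper works without the SDK
    dx = boto3.client("dataexchange")
    data_set_id, revision_ids = revision_ids_from_event(event)
    job = dx.create_job(
        Type="EXPORT_REVISIONS_TO_S3",
        Details={"ExportRevisionsToS3": {
            "DataSetId": data_set_id,
            "RevisionDestinations": [
                {"Bucket": bucket, "RevisionId": rid} for rid in revision_ids
            ],
        }},
    )
    dx.start_job(JobId=job["Id"])

# A sample event in the assumed shape (IDs are made up):
sample = {
    "source": "aws.dataexchange",
    "detail-type": "Revision Published To Data Set",
    "resources": ["example-data-set-id"],
    "detail": {"RevisionIds": ["example-revision-id"]},
}
ds, revs = revision_ids_from_event(sample)
print(ds, revs)
```

Wiring this function to the CloudWatch event rule (for example, as a Lambda target) gives you hands-off consumption of each new revision.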
Once you have data in place, you can perform machine learning and extract insights your business needs.
And Amazon SageMaker can help you with that.
You can use Amazon SageMaker to build, train, and deploy ML models. It provides several features that help you with the end-to-end delivery and management of machine learning models.
If you prefer something out of the box, I recommend checking out the AWS AI services suite.
Amazon Rekognition makes it easy to add image and video analysis to your applications.
Amazon Transcribe provides speech-to-text capability.
Amazon Translate provides high-quality language translation.
Amazon Comprehend is a natural language processing (NLP) service.
Amazon Lex is a service for building conversational chatbots.
Amazon Forecast helps you train forecasting ML models.
And you can use Amazon Personalize to build recommendation systems.
In short, AWS offers a range of AI, ML, and analytics services.
And apart from first-party services, you also get a large selection of third-party AI and ML solutions in AWS Marketplace.
AWS Marketplace is where you find, try, buy, and deploy third-party software. It contains over 10,000 software products across categories such as machine learning, analytics, and data, among several others. They are easy to deploy, and many products are even available as SaaS offerings, so you have nothing to deploy.
And billing for these third-party products is consolidated on your AWS bill.
The AWS Marketplace offers a wide variety of pricing options to fit your specific needs, with many sellers providing free trials.
AWS Marketplace supports standard pricing options such as hourly, monthly, and annual. And if you are migrating to AWS and want to bring some of your existing tools, you can bring your own license as well.
When you have a relationship with a seller or consultant, you can negotiate the price with them, and they can generate a private offer for you via AWS Marketplace.
In fact, the Confluent Cloud SaaS product for Kafka that Joseph spoke about can be procured via AWS Marketplace.
Confluent Cloud manages Apache Kafka, Schema Registry, Connect, and ksqlDB for you, so you can focus on development and delivery for your real-time streaming and analytics use cases.
Now I am going to tell you about pre-trained machine learning models, which you can use to instantly add ML-backed differentiating features to your application.
A pre-trained ML model is an entity that accepts an input payload and returns a prediction. A pre-trained ML model typically solves one specific type of problem.
For example, there is an ML model that accepts a car's picture as input and returns a prediction of the car's make, model, and year.
There is another pre-trained machine learning model that identifies whether a person is wearing a mask or not.
And customers like using pre-trained models because they let you avoid the heavy lifting of hiring ML talent and training and tuning ML models from scratch.
So users look for high-quality pre-trained ML models, typically developed by machine learning vendors. However, it can be tricky to use third-party ML solutions.
There are plenty of high-quality models available from technology companies, but during the initial discussion phase itself, an important question arises: where do you evaluate and qualify the model?
Does it happen in the seller's environment or in the buyer's environment?
The seller wants to protect their IP.
And the buyer wants to protect their data, which is often sensitive to their business.
You also need to learn a seller-specific interface to interact with the model.
We heard all of these challenges from customers and decided to solve them via AWS Marketplace.
You can find hundreds of third-party machine learning models and algorithms that you can try, buy, and deploy in your AWS environment via Amazon SageMaker.
You can try them without having to learn different API interfaces just to evaluate them.
AWS Marketplace contains a large set of models from leading machine learning ISVs (independent software vendors), as well as some open-source frameworks from AWS.
There are two types of Amazon SageMaker-compatible machine learning products.
The first is model packages: an ML model is an entity that accepts an input payload and returns a prediction.
The second is algorithms, which you use to train a custom ML model.
For example, say you have a large dataset you want to train a regression model on. You can use high-performance AutoML algorithms such as AutoGluon-Tabular from AWS Marketplace and train a high-quality ML model without having to learn a whole lot about machine learning.
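As a rough sketch of that workflow, here is how you might prepare train/validation splits and hand them to a subscribed Marketplace algorithm through the SageMaker Python SDK's AlgorithmEstimator. The algorithm ARN, IAM role, instance type, channel names, and S3 paths are placeholders to check against the algorithm's listing, not the exact setup for AutoGluon-Tabular.

```python
import csv, io, random

def split_csv(csv_text, val_fraction=0.2, seed=0):
    """Split a CSV (header + rows) into train/validation strings, since
    training jobs typically take separate train and validation inputs."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    random.Random(seed).shuffle(body)
    cut = int(len(body) * (1 - val_fraction))
    def to_csv(rs):
        buf = io.StringIO()
        w = csv.writer(buf)
        w.writerow(header)
        w.writerows(rs)
        return buf.getvalue()
    return to_csv(body[:cut]), to_csv(body[cut:])

def train_with_marketplace_algorithm(algorithm_arn, role, train_s3, val_s3):
    """Train using a subscribed AWS Marketplace algorithm (sketch only;
    all arguments are placeholders)."""
    from sagemaker.algorithm import AlgorithmEstimator  # lazy: needs the SageMaker SDK
    est = AlgorithmEstimator(
        algorithm_arn=algorithm_arn,
        role=role,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
    )
    est.fit({"training": train_s3, "validation": val_s3})
    return est

data = "y,x1,x2\n1,0.5,3\n0,0.1,7\n1,0.9,2\n0,0.2,8\n1,0.7,1\n"
train_csv, val_csv = split_csv(data, val_fraction=0.4)
print(train_csv.count("\n") - 1, val_csv.count("\n") - 1)  # rows in each split
```

After you upload the two CSVs to S3, `fit` launches the training job; the resulting model can then be deployed like any other SageMaker model.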
The AWS Marketplace for machine learning models works very much like other AWS Marketplace products. You can browse and choose the model you like. Many models even have an option that lets you try a demo with your own data for free, without having to subscribe.
<click 1>
These models are deployed using Amazon SageMaker in your AWS account with network isolation to protect your data.
<click 1>
You can easily perform real-time or batch inference on these models via a REST API.
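For example, a real-time inference call could look roughly like this. The endpoint name and the response schema are assumptions, since each Marketplace model documents its own input and output formats.

```python
import json

def top_prediction(response_body):
    """Pick the highest-confidence prediction from a JSON response.
    The {"predictions": [{"label": ..., "score": ...}]} shape is an
    assumption; check the model's listing for its real format."""
    preds = json.loads(response_body)["predictions"]
    return max(preds, key=lambda p: p["score"])

def classify_image(endpoint_name, image_path):
    """Real-time inference against a deployed model package (sketch)."""
    import boto3  # lazy import so the parsing helper is usable without AWS
    runtime = boto3.client("sagemaker-runtime")
    with open(image_path, "rb") as f:
        resp = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="image/jpeg",
            Body=f.read(),
        )
    return top_prediction(resp["Body"].read())

# Parsing an example response in the assumed shape:
body = '{"predictions": [{"label": "hard_hat", "score": 0.91}, {"label": "no_hard_hat", "score": 0.09}]}'
print(top_prediction(body))
```

For batch workloads, the same model package can instead be used with a SageMaker batch transform job rather than a live endpoint.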
AWS Marketplace enables customers to use third-party models securely via four key features.
When sellers list their models, Amazon SageMaker performs static and dynamic vulnerability scans to help you secure your data.
Amazon SageMaker encrypts algorithm, model, and other system artifacts in transit and at rest, and it isolates the deployed algorithm/model artifacts from internet access, helping you secure your data.
These containers are deployed in an internet-free environment. You can even choose to deploy them in your own private VPC and control access to it. You can also configure and monitor VPC flow logs to see what's going into and coming out of the container.
Requests to the Amazon SageMaker API are made over a secure (SSL) connection.
Amazon SageMaker requires AWS Identity and Access Management (IAM) credentials, via an IAM execution role, to access resources and data in your deployment.
And you can also use Private Marketplace and AWS Service Catalog to further control the procurement and distribution of models.
Developers with little data science expertise, as well as business users, use these models to easily build AI/ML-backed solutions.
You can see multiple sample use cases on the slide. For example, you can use an ML model to perform optical character recognition and extract characters from an image.
Data analysts have to manually identify and use only good-quality data, and this process can often be expedited by using an ML model.
For example, a background noise classifier can be used to identify whether an audio file contains background noise. With ML models you can go one step further and use a source separation model that separates speech from background sounds, and then use the cleaned audio files for training your own ML models.
There are ML models in the image and text categories too.
During the feature engineering step, data analysts and data engineers identify and create the features that will be used.
There are also ML models that you can use to generate additional synthetic features.
For example, there is a model that accepts text and returns an emotion, which can be a powerful synthetic feature for your book-genre classification ML model.
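As a minimal illustration of that idea, here is how an emotion column could be appended to a dataset. The model call is injected as a callable, so a wrapper around a deployed endpoint (or, as here, a simple stub) can supply the predictions; the field names and labels are made up.

```python
def add_emotion_feature(rows, predict_emotion):
    """Append a model-derived 'emotion' column to tabular text data.
    predict_emotion is any callable mapping text -> label, e.g. a
    wrapper around a deployed Marketplace model endpoint."""
    return [dict(row, emotion=predict_emotion(row["text"])) for row in rows]

# Stub standing in for the real emotion-detection endpoint:
def stub_model(text):
    return "joy" if "happy" in text else "neutral"

books = [{"title": "A", "text": "a happy tale"},
         {"title": "B", "text": "a dry manual"}]
enriched = add_emotion_feature(books, stub_model)
print([r["emotion"] for r in enriched])  # → ['joy', 'neutral']
```

The enriched rows can then feed a downstream classifier, with the emotion label acting as the synthetic feature.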
Now let me show you how to explore an ML model, deploy it, and perform inference on it.
Here is a quick summary of what I did during my demo today.
I chose a model, subscribed to it, and then executed a CloudFormation template to deploy the model.
Once the model was deployed, I used the AWS CLI to perform inference.
You can see how application developers without any ML knowledge can integrate ML into their solutions.
Now, before I start talking about how you can integrate ML into your Kafka applications, I want to quickly cover a little more about Amazon MSK, the Amazon Managed Streaming for Apache Kafka service.
Amazon MSK makes it easy for you to build and run production applications on Apache Kafka without needing Apache Kafka infrastructure management expertise. That means you spend less time managing infrastructure and more time building applications.
With a few clicks in the Amazon MSK console you can create highly available Apache Kafka clusters with settings based on best practices. MSK automatically provisions and runs your clusters.
It continuously monitors cluster health and automatically replaces unhealthy nodes with no downtime to your application.
Now let me show you how you can add machine learning to your applications via a hypothetical use case.
Imagine that you work for a company that sells a safety surveillance product for construction sites.
Your customers have installed cameras provided by your company, and your company is supposed to monitor the sites to help them improve compliance with safety standards.
So you need to ensure that onsite workers are wearing personal protective equipment and hard hats, which help avoid serious injuries. You know that you need a solution that helps you identify non-compliance early in the game. You also need a system that scales to accommodate a large number of cameras.
Let me show you what I am talking about via this video. We see that there are two workers, and one of them is not wearing a hard hat, which means a guideline has not been followed.
We would like to identify this non-compliance incident so that it can be fixed, reducing the probability of accidents or head injuries.
Ideally, we want a system that summarizes the actions happening on site via a summary log, and whenever non-compliance happens, it should get detected.
On this slide, you can see, snapshot by snapshot, the progress happening at the worksite.
And around 7.5 seconds into the video, you see the driver handling the excavator getting detected and an alarm being generated, which can potentially prompt the driver to wear PPE.
To build such an architecture, here are the components we are going to need.
Amazon S3 – a scalable object storage service.
Detecting a hard hat or PPE is a machine learning task – you need what we call a machine learning model, which can take an image and return a prediction.
Next, you need Amazon SageMaker – a machine learning platform on which you can build, train, and deploy ML models. To build a solution for the problem I just discussed, we would deploy computer vision models that help us identify whether a person is in the picture and whether he or she is wearing a high-visibility vest and a hard hat.
Whenever we detect non-compliance, we want to notify administrators, and for that we would use SNS – Simple Notification Service – which lets us send email, text message, and other kinds of notifications.
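The notification step can be sketched in a few lines: format a human-readable alert and publish it to an SNS topic. The message fields and the topic ARN are illustrative placeholders, not a fixed schema from the demo.

```python
def format_alarm(camera_id, timestamp, finding):
    """Human-readable non-compliance message for administrators.
    Field names here are illustrative, not a fixed schema."""
    return (f"[PPE ALERT] camera={camera_id} t={timestamp}s: {finding}. "
            "Please review the footage and follow up on site.")

def notify(topic_arn, message):
    """Publish the alert to an SNS topic (sketch; the ARN is a placeholder)."""
    import boto3  # lazy import: formatting stays testable without AWS
    boto3.client("sns").publish(
        TopicArn=topic_arn,
        Subject="PPE non-compliance",
        Message=message,
    )

msg = format_alarm("cam-042", 7.5, "worker detected without hard hat")
print(msg)
```

Email and SMS subscribers on the topic would then receive the alert without any extra delivery code.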
We also need an event streaming platform that helps us scale the analysis of the summary logs generated for hundreds and thousands of cameras – we would use Kafka topics created using Confluent Cloud.
Similarly, we would use ksqlDB to transfer non-compliance messages into a separate topic for further processing.
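To make that ksqlDB step concrete, here is a hedged sketch of the kind of statement involved and how it could be submitted to ksqlDB's REST API. The stream and column names are made up for illustration, and the endpoint and credentials are placeholders for your own Confluent Cloud cluster.

```python
import json
from urllib import request

# The KSQL statement that copies non-compliant events into their own topic.
# Stream/column names (SUMMARY_LOGS, COMPLIANT, ...) are illustrative.
KSQL = """
CREATE STREAM NON_COMPLIANCE AS
  SELECT CAMERA_ID, TS, FINDING
  FROM SUMMARY_LOGS
  WHERE COMPLIANT = false
  EMIT CHANGES;
""".strip()

def submit_ksql(endpoint, statement, basic_auth=None):
    """POST a statement to the ksqlDB REST API's /ksql resource
    (sketch only; not executed here)."""
    payload = json.dumps({"ksql": statement, "streamsProperties": {}}).encode()
    req = request.Request(
        endpoint.rstrip("/") + "/ksql",
        data=payload,
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
    )
    if basic_auth:
        req.add_header("Authorization", "Basic " + basic_auth)
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

print("WHERE COMPLIANT = false" in KSQL)
```

Once the persistent query is created, ksqlDB keeps the derived topic populated continuously as new summary-log events arrive.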
Once stored in S3, we can use tools such as QuickSight, EMR, and Athena to transform, analyze, and visualize the data.
Now let me show you the architecture, as well as a demo of how we implemented the solution.
Here is how the architecture that uses ML models and Confluent Cloud from AWS Marketplace looks.
The feed from the camera can be fed to a Kinesis video stream and then stored in S3. The S3 bucket can be configured to trigger a Lambda function via event notifications; the function performs inference, generates summary logs, and identifies non-compliance alarms.
We want the architecture to be elastic, pay-as-you-go, and able to scale to hundreds and thousands of such cameras. I also didn't want to manage Kafka infrastructure, so I chose to use Confluent Cloud for Apache Kafka from AWS Marketplace. We push the summary logs into a Kafka topic, a ksqlDB query moves the alarm events into another topic, and a Lambda function generates the SNS notification.
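As a rough sketch of the pieces just described, here is what the Lambda function at the center of this pipeline could look like: parse the S3 event, run inference on the frame, and push a summary record to Kafka. Everything here is illustrative; the endpoint name, topic name, bootstrap server, and the prediction schema are assumptions rather than the exact code from the demo.

```python
import json

def keys_from_s3_event(event):
    """Extract (bucket, key) pairs from a standard S3 event notification."""
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event["Records"]]

def summarize(predictions):
    """Turn model output into a summary-log record; flags non-compliance
    when a person is detected without a hard hat (schema is illustrative)."""
    missing = [p for p in predictions
               if p["label"] == "person" and not p.get("hard_hat", False)]
    return {"people": sum(p["label"] == "person" for p in predictions),
            "compliant": not missing}

def handler(event, context):
    """Lambda sketch: run inference on each new frame and push the summary
    to a Kafka topic. Names below are placeholders."""
    import boto3                          # lazy imports: the pure helpers
    from confluent_kafka import Producer  # above are testable without them
    runtime = boto3.client("sagemaker-runtime")
    producer = Producer({"bootstrap.servers": "YOUR_CONFLUENT_BOOTSTRAP"})
    for bucket, key in keys_from_s3_event(event):
        frame = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
        resp = runtime.invoke_endpoint(EndpointName="ppe-detector",
                                       ContentType="image/jpeg", Body=frame)
        record = summarize(json.loads(resp["Body"].read())["predictions"])
        producer.produce("summary-logs", json.dumps(record))
    producer.flush()

# Exercising the pure helpers with sample data:
sample_event = {"Records": [{"s3": {"bucket": {"name": "frames"},
                                    "object": {"key": "cam-042/frame-0001.jpg"}}}]}
sample_preds = [{"label": "person", "hard_hat": True},
                {"label": "person", "hard_hat": False},
                {"label": "excavator"}]
print(keys_from_s3_event(sample_event), summarize(sample_preds))
```

From there, the ksqlDB query routes the non-compliant records to the alarm topic, and the SNS notification Lambda takes over.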
Now let us switch to the AWS and Confluent consoles, and show you how this architecture works.
Thanks, Joseph.
To give you a little more insight into the architecture, I have listed some relevant models on this slide.
In today's use case, we used a machine learning model that detects specialized construction machinery, such as a forklift.
We also used a machine learning model that identifies whether a person is wearing a hard hat and a high-visibility protective vest – which are mandated for construction workers under OSHA guidelines to minimize exposure to hazards that cause serious workplace injuries and illnesses.
So as you can see, with the pre-trained ML models and AI services offered by AWS, it becomes really easy to add machine-learning-backed features to your Kafka applications.
I recommend that you explore AWS Marketplace and identify ML models that are suitable for you. And if you need any customizations to an ML model, get in touch with the AWS Marketplace team.
To summarize: do experiment with different tools, such as Confluent Cloud, to see which tools or services can help you scale your architectures; evaluate them and see if they fit your needs.
Use pre-trained machine learning models to make your workflows intelligent and to add differentiating features that help you stand out.
Use managed solutions such as Confluent Cloud from AWS Marketplace.
And most importantly, innovate on behalf of your organization.
Feel free to reach out to me if you have any questions.