We are collecting tons of sensor data from billions of devices. How do you get value from your IoT data sources? In this session, we will explore different strategies for collecting and ingesting data, understanding its frequency, and leveraging the potential of the cloud to analyze and predict trends and behavior to get the most out of your deployed devices.
2. What to Expect from the Session
• Understand different kinds of data relevant to the IoT
• Learn how the AWS platform can help turn data into insights & actions
• Ideas & advice on how to integrate various AWS services with the Internet of Things
5. Rapid Growth from 1B to 50B Connectable “Things”
Source: McKinsey & Company 2013
All these “Things” generate data:
• Status information
• Sensor readings
• User interactions
• State changes
• Operational events
• …
6. One of the big challenges with the IoT is to collect, analyze, and act on data from devices to generate insights.
9. Three Ways to Analyze Data
• Past Data: Retrospective analysis and reporting
• Present Data: Here-and-now real-time processing and dashboards
• “Future Data”: Predictions to enable smart applications
10. Three Ways to Analyze Data
• Retrospective analysis and reporting
• Here-and-now real-time processing and dashboards
• Predictions to enable smart applications
AWS services shown: Amazon Kinesis, AWS Lambda, Amazon DynamoDB, Amazon EC2, Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR, Amazon Machine Learning
12. IoT Requires Quick Processing
- Discover patterns in live sensor data
- Correlate events as they happen
- Enrich live data with additional info
Why?
- Trigger quick reactions
- Adapt to usage of Things
- Users want quick reaction & feedback
Here-and-now real-time processing and dashboards
13. IoT Requires Past Context
- Provide context for current events
- Keep information of past events to determine long-term trends
Why?
- Enables learning from past data
- Enables reporting & explorative analysis to understand usage
- Usage monitoring and billing (long-term storage of usage & billing metrics)
Retrospective analysis and reporting
14. IoT Benefits from “Smart” Devices
- Detect patterns in event data
- Learn 'rules' / distributions in the data
Why?
- Predict future events
- Problems that are likely to appear
- Anticipate user actions (or desired outcomes)
- Actionable predictions: what to do next
Predictions to enable smart applications
16. Indoor Temperature / Climate Sensors
• Fleet of indoor air conditioning units with 3 sensors each
• Deliver updates on temperature, humidity & pressure
every couple of seconds
• Connected to the cloud
• “Semi-reliable”
17. Sample Message As Sent By Device
{
"temperature" : "100",
"humidity" : "92",
"pressure" : "8"
}
18. MQTT Topics
• Each device uses certificate authentication
• Sends messages via MQTT
• One topic per device: rooms/ac/${deviceID}
[Diagram: DeviceID 1, DeviceID 2, and DeviceID 3 each publish to their own MQTT topic on the AWS IoT Service (Pub/Sub Broker)]
19. Questions we might ask our Example Data:
- How many sensors are connected right now?
- Is the current temperature in line with yesterday's / last year's data?
- How did temperatures change over time?
- What is the relationship between pressure and temp?
- Are our sensor readings plausible?
- How can we tell a broken sensor from a good one?
- Do I have to wear a sweater to work?
21. AWS IoT: Securely Connect Devices
Highly scalable Pub/Sub Broker (MQTT): publishers and subscribers exchange messages via topics
• Secure by Default: Connect securely via X.509 certs and TLS v1.2 client mutual auth
• Multi-protocol Message Gateway: Millions of devices and apps can connect over MQTT or HTTP
• Elastic Pub/Sub Broker: Go from 1 to 1 billion long-lived connections with zero provisioning
22. AWS IoT: Front Door to AWS
• Device Registry: Cloud alter ego of a physical device. Persists metadata about the device.
• Rules and Actions: Match patterns and take actions to send data to other AWS services or republish
• Device Shadows: Apps and devices can access a “RESTful” Shadow (state) that is in sync with the device
[Diagram: a Device (Thing Name, Sensor Temp, Actuator Servo, GetTemp(), Output LED) connects to AWS IoT (Rules Engine, Shadow, Registry), which connects to S3, Lambda, Kinesis, Kinesis Firehose, DynamoDB, SNS, … and to a Mobile App]
23. AWS IoT Rules Engine
The Rules Engine evaluates inbound messages published into AWS IoT, transforms them, and delivers them to the appropriate endpoint based on business rules.
External endpoints can be reached via AWS Lambda and Amazon Simple Notification Service (SNS).
Actions:
• Invoke a Lambda function
• Put object in an S3 bucket
• Insert, Update, Read from a DynamoDB table
• Publish to an SNS Topic or Endpoint
• Publish to a Kinesis stream / Amazon Kinesis Firehose
• Republish to AWS IoT
24. Flexibility of Rules – An Example
Highlights: SQL-like syntax, WHERE operators, inline functions, actions
"sql": "SELECT *,
clientId() as MQTTClientId
FROM 'rooms/ac/+'
WHERE temperature > 85",
"actions": [
{
"sns": {
"roleArn": "arn:aws:iam::123456789012:role/SNSPutRole",
"targetArn": "arn:aws:sns:us-east-1:123456789012:TempWarningNotification"
}
}]
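To make the rule's selection logic concrete, here is a minimal sketch in plain Java of what the FROM filter and WHERE clause express. It is not AWS code: the class and method names are hypothetical, only the single-level '+' wildcard is handled, and parsing the temperature string as a number is an assumption made for illustration.

```java
import java.util.Map;

// Hypothetical helper, not part of any AWS SDK: mimics how the rule's
// FROM topic filter and WHERE clause select messages.
final class RuleMatchSketch {

    // Returns true if the topic matches the filter, where '+' matches
    // exactly one topic level (MQTT single-level wildcard).
    static boolean topicMatches(String filter, String topic) {
        String[] f = filter.split("/", -1);
        String[] t = topic.split("/", -1);
        if (f.length != t.length) {
            return false;
        }
        for (int i = 0; i < f.length; i++) {
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false;
            }
        }
        return true;
    }

    // Mirrors "WHERE temperature > 85": passes only if the temperature
    // field is present and parses as a number above the threshold.
    static boolean exceeds(Map<String, String> message, double threshold) {
        String raw = message.get("temperature");
        if (raw == null) {
            return false;
        }
        try {
            return Double.parseDouble(raw) > threshold;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Combines both checks for the example rule on this slide.
    static boolean ruleFires(String topic, Map<String, String> message) {
        return topicMatches("rooms/ac/+", topic) && exceeds(message, 85.0);
    }
}
```

For example, `RuleMatchSketch.ruleFires("rooms/ac/7", Map.of("temperature", "100"))` selects the message, while a reading of "70" or a topic such as rooms/ac/7/extra does not.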
26. Example: Receiving & Storing Data
- Devices set up as Things in Device Registry
- Each device sends data as JSON via MQTT
- One MQTT topic per device: rooms/ac/{deviceID}
- Each device has a certificate and access rights to use its
topic (already set up)
27. Our Goal:
• Move all (?) incoming data into permanent storage
• Make data available for later analysis:
- Reporting
- Billing / metering
- Explorative analysis
- Machine Learning
28. Our Approach:
1. Set up a Rule to Capture & Transform Incoming Data
2. Define an Action to Store the Data
3. Query & Analyze the Stored Data
29. 1) Set up a Rule to capture all sensor readings
{ "ruleName" : "Capture sensor readings",
"topicRulePayload" : {
"sql" : "SELECT *, clientId() as MQTTClientId
FROM 'rooms/ac/+' ",
"description": "capture data from all
sensors",
"actions" : [What goes here?],
"ruleDisabled" : false
}
}
30. 2) Define an Action to Store the Data
But where should we store it?
32. Storage Options: Amazon S3
Amazon S3
• Actions can directly write into (JSON) files on S3
• Very simple to configure, just provide bucket name
• Results in 1 file per event
• Lots of small files can be hard to handle
• Inefficient when processing with Hadoop / Amazon
EMR or when importing into Redshift
• Useful when you have a very low frequency of events,
e.g. when you only want to log outliers to S3
33. Storage Options: Amazon S3 (cont'd)
Amazon S3
• Buffer data using Amazon Kinesis or Amazon Kinesis
Firehose to get fewer, larger files
• Buffering, compression & output to S3 is built into
Firehose – no other infrastructure needed!
• Kinesis Connector Library can be extended to perform
transformation, filter or serialize data
• Additional Control over Buffering & Output Formats
• Added complexity: Requires Amazon EC2 workers
running Kinesis Connector Library
Amazon Kinesis
Firehose
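The idea of buffering many small events into fewer, larger files can be sketched in a few lines of plain Java. This is only an illustration of the batching concept: Firehose provides it as a managed service, and the class name, the count-based threshold, and the newline-delimited output are assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory buffer illustrating batching only;
// no AWS service is called here.
final class BatchBufferSketch {
    private final int maxRecords;
    private final List<String> pending = new ArrayList<>();
    private final List<String> flushedBatches = new ArrayList<>();

    BatchBufferSketch(int maxRecords) {
        if (maxRecords < 1) {
            throw new IllegalArgumentException("maxRecords must be >= 1");
        }
        this.maxRecords = maxRecords;
    }

    // Adds one JSON event; emits a batch once maxRecords events are buffered.
    void add(String jsonEvent) {
        pending.add(jsonEvent);
        if (pending.size() >= maxRecords) {
            flush();
        }
    }

    // Joins the buffered events into one newline-delimited "file" body.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushedBatches.add(String.join("\n", pending));
        pending.clear();
    }

    // Batches emitted so far, oldest first.
    List<String> batches() {
        return flushedBatches;
    }
}
```

With `maxRecords` set to 2, three incoming events yield one batch of two events immediately and a second batch of one event after an explicit `flush()`, instead of three separate single-event files.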
34. Storage Options: Amazon Redshift
• Actions can forward data to Amazon Kinesis Firehose
• Buffering & output to Redshift is built into Firehose
• Very easy to set up
• Fully managed
• Use Amazon Kinesis as an alternative
• More control: Use Kinesis Connector Library to
perform transformation, filter or serialize data
• Added complexity: Requires Kinesis Connector
Library etc. to execute on Amazon EC2
Amazon Kinesis
Firehose
Amazon Redshift
35. Storage Options: Amazon DynamoDB
• Actions can directly write into Amazon DynamoDB
• Creates one row per event, can define:
• Hash Key, Range Key and attributes to store
• E.g. Hash Key = deviceID, range key=timestamp…
• Very simple to configure, just provide table & field names
• Adding GSIs and LSIs provides additional flexibility and
enables different queries
• SELECTs can read from DynamoDB for fast lookups
Amazon
DynamoDB
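The hash key / range key layout (deviceID plus timestamp) can be pictured as a map of sorted maps. The sketch below is an in-memory analogy in plain Java, not DynamoDB code; the class name and the single stored attribute (temperature) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory analogy of a table keyed by
// hash key = deviceID and range key = timestamp.
final class KeyedReadingsSketch {
    // deviceID -> (timestamp -> temperature), sorted by timestamp
    private final Map<String, NavigableMap<Long, Double>> table = new HashMap<>();

    // Stores one row per event under its device and timestamp.
    void put(String deviceId, long timestamp, double temperature) {
        table.computeIfAbsent(deviceId, k -> new TreeMap<>())
             .put(timestamp, temperature);
    }

    // Range query on one hash key: rows with from <= timestamp <= to.
    NavigableMap<Long, Double> query(String deviceId, long from, long to) {
        NavigableMap<Long, Double> rows = table.get(deviceId);
        if (rows == null) {
            return new TreeMap<>();
        }
        return rows.subMap(from, true, to, true);
    }

    // Fast lookup of a device's most recent reading, or null if none.
    Double latest(String deviceId) {
        NavigableMap<Long, Double> rows = table.get(deviceId);
        return (rows == null || rows.isEmpty()) ? null : rows.lastEntry().getValue();
    }
}
```

The hash key narrows the lookup to one device, and the sorted range key makes "readings between two timestamps" or "latest reading" cheap, which is why this key choice suits per-device time series.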
37. Storage Options: Amazon DynamoDB (cont'd)
• AWS Lambda function provides additional flexibility:
• Transform data
• Write into different/multiple tables
• Enrich data with contextual information pulled in
from other sources
• Only able to process one event at a time! (i.e., AWS
Lambda –when called from AWS IoT– cannot aggregate
events before writing to DynamoDB)
Amazon
DynamoDB
AWS
Lambda
38. 3) Query & Analyze the Stored Data
How can we query the data?
41. Recommendations
Want to run a lot of queries constantly?
Use Kinesis Firehose to write into Amazon Redshift
Need fast lookups, e.g., in Rules or Lambda functions?
Write into DynamoDB, add indices if necessary
Have a need for heavy queries but not always-on?
Use Kinesis Firehose & S3, process with Amazon EMR.
43. 1) Set up a Rule to capture all sensor readings
{ "sql" : "SELECT *, topic(3) as deviceID,
timestamp() as reading_time,
clientId() as MQTTClientId
FROM 'rooms/ac/+' ",
"description": "Forward sensor data to Firehose",
"actions" : [{
"firehose" : {
"deliveryStreamName": "sensors-firehose",
"roleArn": "string"
}
}],
"ruleDisabled" : false }
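The SELECT clause above enriches each message with a device ID taken from the topic and a reading time. A rough plain-Java equivalent, under the assumptions that topic(3) means the third topic segment counted from 1 and that the timestamp is a millisecond value supplied by the caller, might look like this (names are hypothetical, and clientId() is left out because it is only known to the broker):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the enrichment expressed by the rule's SELECT clause.
final class EnrichSketch {

    // Returns the n-th topic segment, counting from 1, or null if absent.
    static String topicSegment(String topic, int n) {
        String[] parts = topic.split("/", -1);
        return (n >= 1 && n <= parts.length) ? parts[n - 1] : null;
    }

    // Copies the payload and adds deviceID and reading_time fields.
    static Map<String, String> enrich(String topic,
                                      Map<String, String> payload,
                                      long nowMillis) {
        Map<String, String> out = new LinkedHashMap<>(payload);
        out.put("deviceID", topicSegment(topic, 3));
        out.put("reading_time", Long.toString(nowMillis));
        return out;
    }
}
```

For a message published to rooms/ac/42, the enriched record carries the original sensor fields plus deviceID "42", so downstream storage no longer depends on the topic name.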
44. 2) Pump Data through Firehose into Redshift
Store data from all the field sensors in a database.
[Diagram: sensors/devices in a farm sending (Temp, Pressure, Humidity); each Thing/Device (private key & certificate, policy, SDK) publishes to AWS IoT; the Rule "SELECT * FROM 'rooms/ac/+'" and its Actions (IAM role, policy) forward to AWS services: Amazon Kinesis Firehose, then Amazon Redshift]
45. 3) Analyze Data using Amazon QuickSight
[Diagram: Thing/Device (private key & certificate, policy, SDK) publishes to AWS IoT; the Rule (IAM role, policy) forwards to AWS services: Amazon Kinesis Firehose, then Amazon Redshift, then Amazon QuickSight]
48. Our Goal:
• Alert on big temperature changes
• Collect & visualize metrics from current sensor readings
49. 1) Set up Rule to react to relevant sensor data
{ "ruleName" : "Notify on high temperatures",
"topicRulePayload" : {
"sql" : "SELECT *, clientId() as MQTTClientId
FROM 'rooms/ac/+'
WHERE temperature > 95 ",
"description": "Notify when temp exceeds 95",
"actions" : [What goes here?],
"ruleDisabled" : false
}
}
50. 1) Set up Rule to react to relevant sensor data
AWS IoT Rules
• only have access to the current event
• cannot take contextual information into account
Consider passing all the data to the Action for evaluation.
51. 2) Process the Data
What's the best way to
process this data?
53. Processing Options
AWS Lambda
• Processes a single event at a time (no batching)
• Enrich data with context information from other sources
• Perform transformations
• Run any Node.js / Java function
• No infrastructure to manage!
54. Processing Options
• Great for alerts: Sends push notifications, emails and SMS
• Call other systems via HTTP POST / webhooks
(on AWS or on-premises)
• SNS Topics support multiple subscribers, incl. AWS
Lambda and Amazon SQS
Amazon SNS
55. Processing Options
• Great when events arrive with varying frequency
• Buffer data for asynchronous processing
• Ensure that no event data is lost
• SNS Topics support multiple subscribers, incl. AWS
Lambda and Amazon SQS
• Easily deploy SQS workers on AWS Elastic Beanstalk (or
Amazon EC2)
Amazon SQS
56. Processing Options
• Provides access to a "rolling window" of event data
• Scalable, can consume events from a multitude of different
rules / topics / devices
• Supports many independent, concurrent readers (&writers)
• Multiple processing options: KCL application, AWS Lambda
Amazon Kinesis
57. Processing Options
• Scalable way to connect many different systems to the
stream of events, e.g., custom KCL code, Complex Event
Processing (CEP) products
• Amazon Kinesis is a hub for all stream processing needs
Amazon Kinesis
58. Example:
1. Read last N events from stream
2. Determine maximum and rate of increase since beginning
3. Decide if alert should be sent
Amazon Kinesis
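The three steps above can be sketched as a small sliding-window check in plain Java. This is a minimal illustration, not Kinesis or KCL code: the class name, the thresholds, and the definition of "rate of increase" as the average rise per reading from the oldest to the newest value in the window are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window alert check over the last N temperature readings.
final class SlidingWindowSketch {
    private final int capacity;
    private final double maxAllowed;
    private final double maxRisePerReading;
    private final Deque<Double> window = new ArrayDeque<>();

    SlidingWindowSketch(int capacity, double maxAllowed, double maxRisePerReading) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
        this.maxAllowed = maxAllowed;
        this.maxRisePerReading = maxRisePerReading;
    }

    // Step 1: keep only the last N readings.
    void add(double temperature) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(temperature);
    }

    // Step 2a: maximum over the window (negative infinity if empty).
    double max() {
        double m = Double.NEGATIVE_INFINITY;
        for (double v : window) {
            m = Math.max(m, v);
        }
        return m;
    }

    // Step 2b: average rise per reading from the oldest to the newest value.
    double rateOfIncrease() {
        if (window.size() < 2) {
            return 0.0;
        }
        return (window.peekLast() - window.peekFirst()) / (window.size() - 1);
    }

    // Step 3: alert if the maximum or the rate exceeds its threshold.
    boolean shouldAlert() {
        return !window.isEmpty()
                && (max() > maxAllowed || rateOfIncrease() > maxRisePerReading);
    }
}
```

With a window of 3, a limit of 95, and a maximum rise of 5 per reading, the sequence 70, 71, 72 raises no alert, while a following reading of 90 does, because the rate over the window (71, 72, 90) is 9.5.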
59. Recommendations
Only care about individual events?
Invoke an AWS Lambda Function via Rule / Action
For sliding window analysis and more flexibility
Stream into Kinesis and Run AWS Lambda function
Use Amazon Kinesis as a Hub for all incoming events.
60. 3) Visualize the Current Metrics
• Managed Amazon Elasticsearch as a service
• Easy & fast indexing of data – well suited for lookups on
streaming data
• Easy to use visualization / dashboards using Kibana
Amazon
Elasticsearch
Service
64. Machine learning and smart devices
Machine learning is the technology that
automatically finds patterns in your data and
uses them to make predictions for new data
points as they become available
Your devices + machine learning = smart devices
65. IoT Use Cases for Machine Learning
- Find potential problems by looking for patterns
- Identify engines that are about to break down
- Predict when supplies will run out
- Spot sensors that report implausible data
- Predict next movement / direction of a connected vehicle
- Based on driving parameters & observations from other cars
- Predict traffic jams before they occur
66. Amazon Machine Learning
Amazon
Machine Learning
• Real-time predictions (and batch)
• Training & evaluation of machine learning models
• Picks the right model & parameters, helps build training
data
67. Basic Approach
1. Collect / build training data
- Take past data for sensor readings (temperature, humidity,
pressure) –not the deviceID or timestamp– as input
- Target: we define which readings are 'correct' or incorrect and
add the target variable's value to the training data.
Amazon S3 Amazon Redshift
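As a rough illustration of step 1, the sketch below turns labeled past readings into CSV rows with the three sensor values as inputs and a 0/1 target column. The class name, the column name is_correct, and the CSV layout are assumptions for this example, not a format prescribed by the slides.

```java
import java.util.List;

// Hypothetical builder for labeled training rows (inputs plus target).
final class TrainingRowSketch {
    // deviceID and timestamp are deliberately left out of the inputs.
    static final String HEADER = "temperature,humidity,pressure,is_correct";

    // One row: three sensor inputs followed by the target (1 = correct, 0 = incorrect).
    static String toCsvRow(double temperature, double humidity,
                           double pressure, boolean correct) {
        return temperature + "," + humidity + "," + pressure + "," + (correct ? 1 : 0);
    }

    // Prepends the header and joins rows with newlines.
    static String toCsv(List<String> rows) {
        return HEADER + "\n" + String.join("\n", rows) + "\n";
    }
}
```

A reviewer who marks the sample reading (100, 92, 8) as implausible would produce the row 100.0,92.0,8.0,0.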
70. Basic Approach
4. Get predictions for events as they come in
[Diagram: AWS IoT, Amazon Kinesis, AWS Lambda, and Amazon Machine Learning, producing a prediction]
71. Basic Approach
1. Collect / build training data
- Determine input variables & target
- Evaluate the data to pick the target value for each set of
inputs in the data
2. Train a Machine Learning Model
- Builds a model based on the information in the training data
3. Create a real-time prediction endpoint for the model
- Outputs a prediction based on the input variables provided
4. Get predictions for events as they come in
72. Example Use Case: Filter out bad readings
1. Create a training data set based on past data & human
evaluation of the data
i.e., manually review the data and mark incorrect values
2. Train an Amazon ML model on this data to predict which
combinations are (in)correct
3. Invoke ML model on incoming data to predict
correctness
4. Alert staff via Amazon SNS push notification
74. Lambda Function
public String handleRequest(String input, Context context)
{
// Create AML client and cache endpoint
client = new AmazonMachineLearningClient(credentials);
// look up and cache the realtime endpoint for ML model
getRealtimeEndpoint();
PredictRequest request = new PredictRequest();
request.setMLModelId(mlModelId);
request.setPredictEndpoint(endpoint);
75. Lambda Function (continued)
// Populate record with relevant data
request.setRecord(jsonToMap(input));
PredictResult result = client.predict(request);
String label =
result.getPrediction().getPredictedLabel();
Float prob = result.getPrediction()
.getPredictedScores().get(label) * 100;
76. Lambda Function (continued)
String outputString = "Device is performing "
+ label + " with a probability of " + prob + " %";
// Publish the result to an SNS topic
PublishRequest publishRequest = new
PublishRequest(snsTopic, outputString);
PublishResult publishResult =
snsClient.publish(publishRequest);
return outputString;
}
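The Lambda function relies on a jsonToMap helper that the slides do not show. A minimal stand-in, assuming the input is a flat JSON object like the sample device message (string or plain numeric values, no nesting, no escaped quotes), could be written as follows. The class name and the regex-based approach are assumptions; a production function would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the jsonToMap helper used in the Lambda function.
final class JsonToMapSketch {
    // Matches "key" : "value" or "key" : number pairs in a flat JSON object.
    private static final Pattern PAIR = Pattern.compile(
            "\"([^\"]+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+(?:\\.\\d+)?))");

    // Returns field names mapped to their values as strings, in input order.
    static Map<String, String> jsonToMap(String json) {
        Map<String, String> record = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            record.put(m.group(1), value);
        }
        return record;
    }
}
```

Applied to the sample message, this yields a three-entry map of temperature, humidity, and pressure, which matches the string-to-string record shape that the prediction request expects.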
77. Recommendations
Rely on past data / context rather than defining 'rules'
Use Amazon Machine Learning for an easy start
Let real-time predictions drive reaction to patterns in
events
79. What Have We Built?
[Diagram: AWS IoT, Amazon Kinesis, AWS Lambda (shown twice), Amazon Machine Learning, Amazon Kinesis Firehose, Amazon Redshift, Amazon Elasticsearch Service]
80. Outlook: Where Do We Go From Here?
- Automated reactions to events: feeding back into the
system, i.e., enrich data based on correlated data,
predictions and past data, then react on predictions
- Complex Event Processing (CEP)
- Unsupervised learning…?
81. Related Sessions
MBL203 State of the Union – San Polo 3501B 11:00 AM
MBL203 Everything about AWS IoT – Venetian H 12:15 PM
MBL311 AWS IoT Security - Palazzo A 1:30 PM
MBL312 Rules and Shadow - Palazzo A 2:45 PM
MBL313 Devices SDK and Kits - Palazzo A 4:15 PM
MBL303 Mobile Devices and IoT - Delfino 4005 4:15 PM
MBL203 Devices in Motion - Delfino 4005 Friday 10:15 AM
MBL305 IoT Data and Analytics - Delfino 4005 Friday 11:30