Real-time event processing monitors the incoming data stream and initiates action based on detected events such as fraud, errors, or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS technologies to be covered include AWS Lambda and the Kinesis Client Library (KCL).
2. Agenda Overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Use Case Technical Deep Dive Sessions
• Data Collection and Storage
• Real-time Event Processing
• Analytics
3. Primitive Patterns: Collect, Process, Analyze, Store
• Stages: Data Collection and Storage, Data Processing, Data Analysis, Event Processing
• Services: S3, Kinesis, DynamoDB, RDS (Aurora), MySQL, AWS Lambda, KCL Apps, EMR, Redshift, Machine Learning
10. KCL Design Components
If a worker fails, KCL restarts processing of the shard at the last known processed record
11. Processing with Kinesis Client Library
• Connects to the stream and enumerates the shards
• Instantiates a record processor for every shard it manages
• Checkpoints processed records in Amazon DynamoDB
• Balances shard-worker associations when the worker instance count changes
• Balances shard-worker associations when shards are split or merged
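The checkpoint-and-restart behavior described above can be illustrated with a small simulation. This is a sketch only: it does not use the real KCL API. `CheckpointStore` is a hypothetical in-memory stand-in for the DynamoDB table that KCL checkpoints to, and `RecordProcessor`, `checkpoint_every`, and `fail_at` are names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for the DynamoDB checkpoint table."""
    positions: dict = field(default_factory=dict)

    def get(self, shard_id):
        # -1 means "nothing checkpointed yet for this shard"
        return self.positions.get(shard_id, -1)

    def put(self, shard_id, sequence):
        self.positions[shard_id] = sequence


class RecordProcessor:
    """One instance per shard, as in the per-shard processor model."""

    def __init__(self, shard_id, store, checkpoint_every=2):
        self.shard_id = shard_id
        self.store = store
        self.checkpoint_every = checkpoint_every
        self.processed = []

    def run(self, shard_records, fail_at=None):
        # Resume right after the last checkpointed position.
        start = self.store.get(self.shard_id) + 1
        for seq in range(start, len(shard_records)):
            if fail_at is not None and seq == fail_at:
                raise RuntimeError("simulated worker failure")
            self.processed.append(shard_records[seq])
            # Checkpoint periodically rather than after every record.
            if (seq + 1) % self.checkpoint_every == 0:
                self.store.put(self.shard_id, seq)
        # Checkpoint the end of the batch once everything is processed.
        if shard_records:
            self.store.put(self.shard_id, len(shard_records) - 1)


if __name__ == "__main__":
    store = CheckpointStore()
    records = ["a", "b", "c", "d", "e"]

    worker_1 = RecordProcessor("shard-0", store)
    try:
        worker_1.run(records, fail_at=3)
    except RuntimeError:
        pass  # the worker died; its last checkpoint survives in the store

    # A replacement worker picks up the shard from the checkpoint.
    worker_2 = RecordProcessor("shard-0", store)
    worker_2.run(records)
    print(worker_1.processed, worker_2.processed)
```

In this run the first worker handles "c" after its last checkpoint and then fails, so the replacement worker handles "c" again. Records between a checkpoint and a failure can be seen more than once, which is a reason to keep record handling idempotent.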
27. How can you use these features?
• “I want to send customized messages to different users”: SNS + Lambda
• “I want to send an offer when a user runs out of lives in my game”: Amazon Cognito + Lambda + SNS
• “I want to transform the records in a click stream or an IoT data stream”: Amazon Kinesis + Lambda
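As a rough sketch of the Amazon Kinesis + Lambda case, a Python Lambda function receives a batch of stream records whose payloads arrive base64-encoded under `Records[].kinesis.data`. The payload fields (`device_id`, `status`) and the `transform` helper below are assumptions made up for illustration; a real stream defines its own schema and would write the results somewhere rather than return them.

```python
import base64
import json


def transform(payload):
    """Hypothetical transformation: keep two fields and normalize the status."""
    return {
        "device_id": payload.get("device_id"),
        "status": str(payload.get("status", "unknown")).upper(),
    }


def handler(event, context):
    """Lambda entry point for a Kinesis event source."""
    transformed = []
    for record in event.get("Records", []):
        # Kinesis record data is delivered to Lambda base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payload = json.loads(raw)
        transformed.append(transform(payload))
    return transformed


if __name__ == "__main__":
    # Build a fake event locally to exercise the handler.
    body = json.dumps({"device_id": "d1", "status": "ok"}).encode("utf-8")
    fake_event = {
        "Records": [{"kinesis": {"data": base64.b64encode(body).decode("ascii")}}]
    }
    print(handler(fake_event, None))
```

Building the fake event by hand, as in the `__main__` block, is a convenient way to test the transformation logic before attaching the function to a stream.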
30. Read Data Directly into Hive, Pig, Streaming and Cascading
Real time sources into Batch Oriented Systems
Multi-Application Support & Check-pointing
Amazon EMR integration
31. Amazon EMR integration: Hive
CREATE TABLE call_data_records (
    start_time bigint,
    end_time bigint,
    phone_number STRING,
    carrier STRING,
    recorded_duration bigint,
    calculated_duration bigint,
    lat double,
    long double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
STORED BY 'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
TBLPROPERTIES("kinesis.stream.name"="MyTestStream");