Extracting real-time information from streaming data generated by mobile devices, sensors, and servers used to require distributed systems skills and writing custom code. This presentation will introduce Kinesis Streams and Kinesis Firehose, the AWS services for real-time streaming big data ingestion and processing.
We’ll provide an overview of the key scenarios and business use cases suitable for real-time processing, and how Kinesis can help customers shift from a traditional batch-oriented processing of data to a continual real-time processing model. We’ll explore the key concepts, attributes, APIs and features of the service, and discuss building a Kinesis-enabled application for real-time processing. This talk will also include key lessons learnt, architectural tips and design considerations in working with Kinesis and building real-time processing applications.
In this webinar, we will also provide an overview of Amazon Kinesis Firehose. We will then walk through a demo showing how to create an Amazon Kinesis Firehose delivery stream, send data to the stream, and configure it to load the data automatically into Amazon S3 and Amazon Redshift.
4. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
5. v
Examples
• Algorithmic Trading < 10 msec
• Real time bidding < 100 msec
• Common IoT scenarios < 5 to 10 sec
• Infrastructure Monitoring Dashboard < 1 min
• Google Maps Traffic < 5 mins
• Social Network and Media recommendation < 15 min to a Day
• Most Business Analytics Scenarios < 30 mins
• Social Network listening < Depends on how fast you want to respond>!
6. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
8. v
Examples
• Algorithmic Trading < 10 msec
• Real time bidding < 100 msec
• Common IoT scenarios < 5 to 10 sec
• Infrastructure Monitoring Dashboard < 1 min
• Google Maps Traffic < 5 mins
• Social Network and Media recommendation < 15 min to a Day
• Most Business Analytics Scenarios < 30 mins
• Social Network listening < Depends on how fast you want to respond!
10. v
Challenges
A. Speed of Analytics and Response
B. Volume of data
A. Maturity or Capabilities of Analytics Framework
B. Storing and Presentation of results
12. v
Some statistics about what AWS Data Services
• Metering service
• 10s of millions records per second
• Terabytes per hour
• Hundreds of thousands of sources
• Auditors guarantee 100% accuracy at month end
• Data Warehouse
• 100s extract-transform-load (ETL) jobs every day
• Hundreds of thousands of files per load cycle
• Hundreds of daily users
• Hundreds of queries per hour
14. v
Internal AWS Metering Service
Workload
• 10s of millions records/sec
• Multiple TB per hour
• 100,000s of sources
Pain points
• Doesn’t scale elastically
• Customers want real-time
alerts
• Expensive to operate
• Relies on eventually consistent
storage
15. v
Our Big Data Transition
Old requirements
• Capture huge amounts of data and process it in hourly or daily batches
New requirements
• Make decisions faster, sometimes in real-time
• Scale entire system elastically
• Make it easy to “keep everything”
• Multiple applications can process data in parallel
16. A General Purpose Data Flow
Many different technologies, at different stages of evolution
Client/Sensor Aggregator Continuous
Processing
Storage Analytics +
Reporting
?
17. v
Kinesis
Movement or activity in response to a stimulus.
A fully managed service for real-time processing of high-
volume, streaming data. Kinesis can store and process
terabytes of data an hour from hundreds of thousands of
sources. Data is replicated across multiple Availability
Zones to ensure high durability and availability.
19. Scenarios Accelerated Ingest-Transform-Load Continual Metrics/ KPI Extraction Responsive Data Analysis
Data Types
IT infrastructure, Applications logs, Social media, Fin. Market data, Web Clickstreams, Sensors, Geo/Location data
Software/
Technology
IT server , App logs ingestion IT operational metrics dashboards Devices / Sensor Operational
Intelligence
Digital Ad Tech./
Marketing
Advertising Data aggregation Advertising metrics like coverage,
yield, conversion
Analytics on User engagement with
Ads, Optimized bid/ buy engines
Financial Services Market/ Financial Transaction order data
collection
Financial market data metrics Fraud monitoring, and Value-at-Risk
assessment, Auditing of market order
data
Consumer Online/
E-Commerce
Online customer engagement data
aggregation
Consumer engagement metrics like
page views, CTR
Customer clickstream analytics,
Recommendation engines
Customer Scenarios across Industry Segments
1 2 3
20. What Biz. Problem needs to be solved?
Mobile/ Social Gaming Digital Advertising Tech.
Deliver continuous/ real-time delivery of game insight
data by 100’s of game servers
Generate real-time metrics, KPIs for online ad performance
for advertisers/ publishers
Custom-built solutions operationally complex to
manage, & not scalable
Store + Forward fleet of log servers, and Hadoop based
processing pipeline
• Delay with critical business data delivery
• Developer burden in building reliable, scalable
platform for real-time data ingestion/ processing
• Slow-down of real-time customer insights
• Lost data with Store/ Forward layer
• Operational burden in managing reliable, scalable platform
for real-time data ingestion/ processing
• Batch-driven real-time customer insights
Accelerate time to market of elastic, real-time
applications – while minimizing operational overhead
Generate freshest analytics on advertiser performance to
optimize marketing spend, and increase responsiveness to
clients
21. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
22. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
23. v
Amazon Kinesis Streams
Build your own data streaming applications
• Easy administration: Simply create a new stream, and set the desired level of capacity
with shards. Scale to match your data throughput rate and volume.
• Build real-time applications: Perform continual processing on streaming big data using
Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
• Low cost: Cost-efficient for workloads of any scale.
24. Kinesis Architecture
Amazon Web Services
AZ AZ AZ
Durable, highly consistent storage replicates data
across three data centers (availability zones)
Aggregate and
archive to S3
Millions of
sources producing
100s of terabytes
per hour
Front
End
Authentication
Authorization
Ordered stream
of events supports
multiple readers
Real-time
dashboards
and alarms
Machine learning
algorithms or
sliding window
analytics
Aggregate analysis
in Hadoop or a
data warehouse
Inexpensive: $0.028 per million puts
Run code in response to an event and
automatically manage compute.
26. Kinesis Stream:
Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests data up to
1MB/sec, and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or
removing Shards
• Replay data inside of 24Hr. Window
27. Putting Data into Kinesis
Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A Partition Key is supplied by producer and used to
distribute the PUTs across Shards
• Kinesis MD5 hashes supplied partition key over the hash
key range of a Shard
• A unique Sequence # is returned to the Producer upon a
successful PUT call
29. Building Kinesis Processing Apps: Kinesis Client Library
Client library for fault-tolerant, at least-once, Continuous Processing
o Java client library, source available on Github
o Build & Deploy app with KCL on your EC2 instance(s)
o KCL is intermediary b/w your application & stream
Automatically starts a Kinesis Worker for each shard
Simplifies reading by abstracting individual shards
Increase / Decrease Workers as # of shards changes
Checkpoints to keep track of a Worker’s location in the
stream, Restarts Workers if they fail
o Integrates with AutoScaling groups to redistribute workers to
new instances
30. Amazon Kinesis Connector Library
Customizable, Open Source code to Connect Kinesis with S3, Redshift, DynamoDB
S3
DynamoDB
Redshift
Kinesis
ITransformer
• Defines the
transformation
of records
from the
Amazon
Kinesis
stream in
order to suit
the user-
defined data
model
IFilter
• Excludes
irrelevant
records from
the
processing.
IBuffer
• Buffers the set
of records to
be processed
by specifying
size limit (# of
records)&
total byte
count
IEmitter
• Makes client
calls to other
AWS services
and persists
the records
stored in the
buffer.
31. v
USE Cases
Ultra Low Latency Analytics (seconds)
Complex Computations
• => Complex algorithm execution
• => Tuple Processing – every bit of data processed independently vs.
aggregation where it goes from 1st row to last row.
• => Moving Window Analysis – moving car from 2nd to 3rd min and then
5th to 6th min.
32. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
33. v
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
• Zero administration: Capture and deliver streaming data into S3, Redshift, and other
destinations without writing an application or managing infrastructure.
• Direct-to-data store integration: Batch, compress, and encrypt streaming data for
delivery into data destinations in as little as 60 secs using simple configurations.
• Seamless elasticity: Seamlessly scales to match data throughput w/o intervention
Capture and submit streaming
data to Firehose
Firehose loads streaming data
continuously into S3 and Redshift
Analyze streaming data using your favorite BI
tools
34. v
Amazon Kinesis Firehose to Redshift
A two-step process
• Use customer-provided S3 bucket as an intermediate destination
• Still the most efficient way to do large scale loads to Redshift.
• Never lose data, always safe, and available in your S3 bucket.
• Firehose issues customer-provided COPY command synchronously. It
continuously issues a COPY command once the previous COPY
command is finished and acknowledged back from Redshift.
1
2
35. v
USE Cases
Kinesis Firehose used when needed to do batch with more frequency. As
long as analysis can be done with SQL.
Micro-batching scenarios with latencies more 60 second tolerable
In case of Redshift Target – Analytics that can be achieved with standard SQL
and User Defined Functions (UDFs)
Most “Real-Time Business Insights”
kind of scenarios can be easily supported with
Kinesis Firehose + Redshift!
36. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
37. v
Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL
• Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.
• Build real-time applications: Perform continual processing on streaming big data
with sub-second processing latencies
• Scale elastically: Elastically scales to match data throughput without any operator
intervention.
Announcement Only!
Amazon Confidential
Connect to Kinesis streams,
Firehose delivery streams
Run standard SQL queries against
data streams
Kinesis Analytics can send processed data to
analytics tools so you can create alerts and
respond in real-time
38. v
USE Cases
Low latency time series analytics
Analytics that can be achieved with confines of supported SQL
• - Running Totals
• - Moving Averages
• - Number of people entering a stadium
39. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
40. v
Amazon DynamoDB Streams – time-ordered sequence of item-
level changes
• Time and partition ordered log
• Provides a stream of inserts, deletes, updates
• Old item
• New item
• Primary key
• Change type
• Stream items delivered exactly once
• Streams are asynchronous
• Scales with your table
DynamoDB DynamoDB Streams
41. v
USE Cases
Ultra Low Latency Analytics (seconds) when data is available in
Kinesis and DynamoDB Stream, e.g.
Energy meters data coming into Kinesis, to continuously update
billing info.
Changes to social network profile stored in DynamoDB, to transmit
updates to connection immediately (e.g. user adds a new job to his
profile).
42. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Stream and Kinesis Stream processing using Lambda
5. ElasticSearch
6. IoT Rule Engine
D. Demo
43. v
How Elasticsearch can help
• Combined with Logstash and Kibana, the ELK stack provides a tool for
real-time analytics and data visualization
53. v
USE Cases
Real-Time Dashboards (Kibana)
Alerting (Percolator API)
Real-Text Analytics, as in Social Media Listening
Real-Time Geospatial Queries and Geospatial Analysis
54. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis & Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Stream and Kinesis Stream processing using Lambda
5. ElasticSearch
6. IoT Rule Engine
D. Demo
55. v
AWS IoT
“Securely connect one or one-billion devices to AWS,
so they can interact with applications and other devices”
56. v
AWS IoT
DEVICE SDK
Set of client libraries to connect,
authenticate and exchange
messages
DEVICE GATEWAY
Communicate with devices via
MQTT and HTTP
AUTHENTICATION
AUTHORIZATION
Secure with mutual
authentication and encryption
RULES ENGINE
Transform messages
based on rules and route
to AWS Services
AWS Services
- - - - -
3P Services
DEVICE SHADOW
Persistent thing state during
intermittent connections
APPLICATIONS
AWS IoT API
DEVICE REGISTRY
Identity and Management of
your things
57. v
USE Cases
Processing sensor data (millions of data points from hundreds of thousands of
sensors) in real time for Alerting
Redirecting sensor data for multi-data-point analysis to Kinesis, DynamoDB
60. INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis & Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Stream and Kinesis Stream processing using Lambda
5. ElasticSearch
6. IoT Rule Engine
D. Demo
61. v
Demo Time.
Website - https://secure.amitksh.net/cdn/webinarWeek.html
Real time updates from Kinesis - https://secure.amitksh.net/rtChart.html
65. Online Labs & Training
Gain confidence and hands-on
experience with AWS.
Watch free Instructional Videos and
explore Self-Paced Labs
Instructor Led Classes
Learn how to design, deploy and
operate highly available, cost-effective
and secure applications on AWS in
courses led by qualified AWS instructors
Validate your technical expertise
with AWS and use practice exams
to help you prepare for AWS
Certification
AWS Certification
More info at http://aws.amazon.com/training