SlideShare una empresa de Scribd logo
1 de 41
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
25th of October 2016
Real-Time Streaming
Intro to Amazon Kinesis
Roy Ben-Alta
Architect and Business Development Manager
Amazon Web Services - Big Data, IoT Analytics Solutions
Agenda
Streaming data on AWS and Customer scenarios
Amazon Kinesis platform overview
Demo
Q&A
Big Data
Storage
Data
Warehousing
Real-time
Streaming
Distributed Analytics
(Hadoop, Spark, Presto)
NoSQL
Databases
Business
Intelligence
Relational
Databases
Internet of
Things (IoT)
Machine
Learning
Server-less
Compute
AWS provides the broadest platform for big data
analytics in the market today.
AnalyzeStore
Glacier
S3
DynamoDB
RDS, Aurora
AWS Big Data Portfolio
Data Pipeline
CloudSearch
EMR EC2
Redshift
Machine
Learning
ElasticSearch
Database
Migration
QuickSight
New
Kinesis Analytics -
SQL over Streams
New
Kinesis
FirehoseAmazon Snowball
Direct Connect
Collect
Kinesis Stream
Mobile Apps Web Clickstream Application Logs
Metering Records IoT Sensors Smart Buildings
[Wed Oct 11 14:32:52
2000] [error] [client
127.0.0.1] client
denied by server
configuration:
/export/home/live/ap/h
tdocs/test
Most Data is produced continuously
What is streaming data? Data that is…
• Moving – captured and processed with low latency (in
ms to low single-digit sec)
• Small (typically < 5KB) with high frequency (from
hundreds to millions of data records per sec)
• Sequenced – Data is produced, captured, and
processed by sequence, by time or a derivative
Scenarios
Accelerated Ingest-
Transform-Load to final
destination
Continual Metrics/
KPI Extraction
Responsive Data
Analytics
Ad Tech/
Marketing Analytics
Advertising data aggregation Advertising metrics like coverage,
yield, conversion, scoring
webpages
User activity engagement
analytics, optimized bid/ buy
engines
Consumer Online/
Gaming
Online customer engagement data
aggregation
Consumer/ app engagement
metrics like page views, CTR
Customer clickstream analytics,
recommendation engines,
Financial Services Market/ financial transaction order
data collection
Financial market data metrics Fraud monitoring, and value-at-
risk assessment, auditing of
market order data
IoT / Sensor Data Fitness device , vehicle sensor,
telemetry data ingestion
Wearable sensor operational
metrics, and dashboards
Devices / sensor operational
intelligence
1 2 3
Streaming Data on AWS: Customer Scenarios
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis Platform
Amazon Kinesis
Streams
• For Technical Developers
• Build your own custom
applications that process
or analyze streaming
data
Amazon Kinesis
Firehose
• For ETL, Data Engineer
• Easily load massive
volumes of streaming data
into S3, Amazon Redshift
and Amazon Elasticsearch
service
Amazon Kinesis
Analytics
• For all developers, data
scientists
• Easily analyze data
streams using standard
SQL queries
Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver, process streams on AWS
Amazon Kinesis Streams
Build your own data streaming applications
Easy administration: Simply create a new stream, and set the desired level of
capacity with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming big data
using Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and
more.
Low cost: Cost-efficient for workloads of any scale.
Data
Sources
App.4
[Machine
Learning]
AWSEndpoint
App.1
[Aggregate &
De-Duplicate]
Data
Sources
Data
Sources
Data
Sources
App.2
[Metric
Extraction]
Amazon S3
Amazon Redshift
App.3
[Sliding
Window
Analysis]
Availability
Zone
Shard 1
Shard 2
Shard N
Availability
Zone
Availability
Zone
Amazon Kinesis Streams
Managed service for real-time streaming
AWS Lambda
Amazon EMR
• Streams are made of shards
• Each shard ingests up to 1 MB/sec, and
1000 records/sec
• Each shard emits up to 2 MB/sec
• All data is stored for 24 hours by
default; storage can be extended for
up to 7 days
• Scale Kinesis streams using scaling util
• Replay data inside of 24-hour window
Amazon Kinesis Streams
Managed Ability to Capture and store Data
Time-based
seek
HTTP
POST
AWS SDK
Sending Reading
Sending & Reading from Amazon Kinesis
Get* APIs
Amazon Kinesis
Client Library
Amazon EMR
AWS Lambda
Amazon
Kinesis
Agent
Amazon
Kinesis
Producer
Library
AWS IoT
Amazon Kinesis Streams 3rd Party Connectors
Amazon Kinesis Customer Base Diversity
1 billion events/wk from
connected devices | IoT
17 PB of game data per
season | Entertainment
80 billion ad
impressions/day, 30 ms
response time | Ad Tech
300 GB/day click streams
from 300+ sites |
Enterprise
50 billion ad
impressions/day sub-50
ms responses | Ad Tech
Sleep tracker sensor analysis
| IoT
Amazon Kinesis as Databus
Funnel all
production events
through Amazon
Kinesis
Scenarios
Accelerated Ingest-
Transform-Load to final
destination
Continual Metrics/
KPI Extraction
Responsive Data
Analytics
Ad Tech/
Marketing Analytics
Advertising data aggregation Advertising metrics like coverage,
yield, conversion, scoring
webpages
User activity engagement
analytics, optimized bid/ buy
engines
Consumer Online/
Gaming
Online customer engagement data
aggregation
Consumer/ app engagement
metrics like page views, CTR
Customer clickstream analytics,
recommendation engines,
Financial Services Market/ financial transaction order
data collection
Financial market data metrics Fraud monitoring, and value-at-
risk assessment, auditing of
market order data
IoT / Sensor Data Fitness device , vehicle sensor,
telemetry data ingestion
Wearable sensor operational
metrics, and dashboards
Devices / sensor operational
intelligence
1 2 3
Streaming Data on AWS: Customer Scenarios
Fault Tolerant ingestion and delivery at scale is difficult
A. All of the challenges associated with capturing the data
B. Anti-pattern to send data to many destinations with large
amounts of small records
A. Issues downstream create backpressure on producers
B. Do not want to stop processing if one portion of pipeline
is down
C. Need to perform data prep before persistence
(compression, encryption, etc.)
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3, Amazon Redshift, Amazon
Elasticsearch service
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift
and Amazon Elasticsearch without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for
delivery into data destinations in as little as 60 secs using simple configurations.
Seamless elasticity: Seamlessly scales to match data throughput w/o intervention
Capture and submit
streaming data to Firehose
Analyze streaming data using your
favorite BI tools
Firehose loads streaming data
continuously into S3, Amazon Redshift
and Amazon Elasticsearch
How does Amazon Kinesis Firehose help?
A. Provides scalable and durable ingest like Kinesis Streams (but
not ordering)
B. Configurable aggregation of data before writing to final
destination
C. Provides fully managed buffer with zero administration if
destinations have issues
D. Independently delivers data to different destinations; separates
out processing and delivery in the case of Kinesis Analytics
integration
E. Configurable data-prep options like compression and encryption
Amazon Kinesis Firehose vs. Amazon Kinesis
Streams
Amazon Kinesis Streams is for use cases that require custom
processing, per incoming record, with sub-1 second processing
latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero
administration, ability to use existing analytics tools based on
Amazon S3, Amazon Redshift and Amazon Elasticsearch
Service and a data latency of 60 seconds or higher.
Scenarios
Accelerated Ingest-
Transform-Load to final
destination
Continual Metrics/
KPI Extraction
Responsive Data
Analytics
Ad Tech/
Marketing Analytics
Advertising data aggregation Advertising metrics like coverage,
yield, conversion, scoring
webpages
User activity engagement
analytics, optimized bid/ buy
engines
Consumer Online/
Gaming
Online customer engagement data
aggregation
Consumer/ app engagement
metrics like page views, CTR
Customer clickstream analytics,
recommendation engines,
Financial Services Market/ financial transaction order
data collection
Financial market data metrics Fraud monitoring, and value-at-
risk assessment, auditing of
market order data
IoT / Sensor Data Fitness device , vehicle sensor,
telemetry data ingestion
Wearable sensor operational
metrics, and dashboards
Devices / sensor operational
intelligence
1 2 3
Streaming Data on AWS: Customer Scenarios
Amazon Kinesis Analytics (New)
Apply SQL on streams: Easily connect to an Amazon Kinesis stream or Firehose
delivery Stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big data
with sub-second processing latencies.
Easy Scalability: Elastically scales to match data throughput.
Connect to Amazon Kinesis
streams,
Firehose delivery streams
Run standard SQL queries
against data streams
Amazon Kinesis Analytics can send processed
data to analytics tools so you can create alerts
and respond in real-time
Use SQL To Build Real-Time Applications
Easily write SQL code to process
streaming data
Connect to streaming source
Continuously deliver SQL results
Real-time analytical patterns
• Pre-processing: filtering, transformations
• Basic Analytics: Simple counts, aggregates over
windows
• Advanced Analytics: Detecting anomalies, event
correlation
• Post-processing: Alerting, triggering, final filters
Demo
Building Serverless IoT Analytics stack
Amazon Kinesis
Streams
Amazon
Kinesis
Analytics
AWS Lambda
Function
DynamoDB
Table
Amazon
Cognitio
Amazon Kinesis
Streams
Streaming Data Platform for IoT Sensor Data
You have hundreds of IoT sensors that are producing data continuously that
you need to ingest, analyze, and store.
You will:
1. Produce data continuously from hundreds of (simulated) IoT sensors
2. Durably ingest the incoming data using Kinesis Streams
3. Filter and aggregate the data using Kinesis Analytics
4. Deliver processed data to S3 using Kinesis Firehose
Kinesis
Stream
Amazon
S3
Kinesis
Firehose
Kinesis
Analytics
IoT sensors
Mature Streaming IoT Platform Architecture
Kinesis
Stream
Amazon
S3
Kinesis
Firehose
Kinesis
Analytics
IoT sensors
Kinesis
Stream
Amazon
DynamoDB
AWS
Lambda
Amazon
CloudWatch
CloudWatch
Alarms
CloudWatch
Dashboard
AWS IoT
IoT
rule
IoT
action
Kinesis
Firehose
Amazon
S3
AWS
Lambda
Amazon
SNS
mobile client
raw data
live
metric
alarm
threshold
Event
w/ metadata
Summary
1. Leverage AWS managed services;
Scalable/elastic, available, reliable, secure, no/low admin
2. Amazon kinesis is platform for streaming data ingestion, processing, and Analytics.
3. Create your first Amazon Kinesis stream. Configure hundreds of thousands of data
producers to put data into an Amazon Kinesis stream.
4. Choose the Processing framework for your use case:
Kinesis Analytics, Amazon EMR with Spark, Lambda, and more.
5. Check out our AWS Big Data Blog and find useful code sample for your use case:
• Getting started with Amazon Kinesis Analytics
• Spark SQL and Amazon Kinesis Streams
• Building Near Real-Time Discover platform using Kinesis Firehose and Lambda
Thank you!
aws.amazon.com/kinesis
Reference
We have many AWS Big Data Blogs which cover more examples. Full list here. Some good
ones:
1. Kinesis Streams
1. Implement Efficient and Reliable Producers with the Amazon Kinesis Producer Library
2. Presto and Amazon Kinesis
3. Querying Amazon Kinesis Streams Directly with SQL and Sparking Streaming
4. Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams
2. Kinesis Firehose
1. Persist Streaming Data to Amazon S3 using Amazon Kinesis Firehose and AWS Lambda
2. Building a Near Real-Time Discovery Platform with AWS
3. Kinesis Analytics
1. Writing SQL on Streaming Data With Amazon Kinesis Analytics Part 1 | Part 2
2. Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics
• Technical documentations
• Amazon Kinesis Agent
• Amazon Kinesis Streams and Spark Streaming
• Amazon Kinesis Producer Library Best Practice
• Amazon Kinesis Firehose and AWS Lambda
• Building Near Real-Time Discovery Platform with Amazon Kinesis
• Public case studies
• Glu mobile – Real-Time Analytics
• Hearst Publishing – Clickstream Analytics
• How Sonos Leverages Amazon Kinesis
• Nordstorm Online Stylist
Reference
APPENDIX
Putting Data into a Kinesis stream
• Data producers call PutRecord(s) to send
data to a Kinesis stream
• PutRecord {Data,StreamName,PartitionKey}
• PutRecords {Records{Data,PartitionKey}, StreamName}
• A Partition Key is supplied by producer and
used to distribute (MD5 hash) the PUTs
across (hash key range) of Shards
• A unique Sequence # is returned to the
Producer upon a successful PUT call
• Options: AWS SDKs, Kinesis Producer
Library (KPL), Kinesis Agent, FluentD,
Flume, and more…
Producer
Producer
Producer
Producer
Producer
Producer
Producer
Kinesis stream
Shard 1
Shard 2
Shard 3
Shard 4
Shard n
Most data producers are not reliable
Connectivity – producers are not always connected, need
to send data with low latency
Durability – producers have limited or no local storage,
need to get data elsewhere quickly or its lost
Efficiency – producers primary job is not data collection,
need low overhead for sending data
Distributed – large number of producers, need to receive
and maintain order across different keys
Getting Data from a Kinesis stream
• Each shard is polled continuously using
GetRecords, determine where to start
using GetSharIterator
• GetRecords {Limit, ShardIterator}
• GetShardIterator {StreamName, ShardId,
ShardIteratorType, StartingSequenceNumber,
Timestamp}
• Options: Kinesis Client Library (KCL) on
EC2, AWS Lambda, Spark Streaming
(EMR), Storm on EC2
• (Almost) All solutions use the KCL under
the hood
Kinesis stream
Shard 1
Shard 2
Shard 3
Shard 4
Shard n
Consumer
Consumer
Consumer
Why Amazon Kinesis Client Library?
• Open source client library available for Java, Ruby, Python,
Node.JS dev
• Deploy on your EC2 instances, scales easily with Elastic
Beanstalk
• Manages consumer to shard mapping and checkpoints
• KCL Application includes three components:
1. Record Processor Factory – Creates the record processor
2. Record Processor – Processor unit that processes data from a
shard in Amazon Kinesis Streams
3. Worker – Processing unit that maps to each application instance
Streaming Architecture Workflow: Lambda + Kinesis
Data Input Amazon
Kinesis
Action Lambda Data Output
IT application activity
Capture the
stream
Audit
Process the
stream
SNS
Metering records Condense Amazon Redshift
Change logs Backup S3
Financial data Store RDS
Transaction orders Process SQS
Server health metrics Monitor EC2
User clickstream Analyze EMR
IoT device data Respond Backend endpoint
Custom data Custom action Custom application
Nordstrom Online Stylist
Let’s start with a product that empowers our Data Creators –
Buzzing@Hearst
Hearst’s Serverless Data Pipeline

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingAurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Intro to AWS: Database Services
Intro to AWS: Database ServicesIntro to AWS: Database Services
Intro to AWS: Database Services
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
AWS Route53
AWS Route53AWS Route53
AWS Route53
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
 
Backing up Amazon EC2 with Amazon EBS Snapshots - June 2017 AWS Online Tech T...
Backing up Amazon EC2 with Amazon EBS Snapshots - June 2017 AWS Online Tech T...Backing up Amazon EC2 with Amazon EBS Snapshots - June 2017 AWS Online Tech T...
Backing up Amazon EC2 with Amazon EBS Snapshots - June 2017 AWS Online Tech T...
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
Azure Storage
Azure StorageAzure Storage
Azure Storage
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon Kinesis
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
AWS glue technical enablement training
AWS glue technical enablement trainingAWS glue technical enablement training
AWS glue technical enablement training
 

Destacado

Destacado (7)

Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Getting Started with Amazon CloudSearch
Getting Started with Amazon CloudSearchGetting Started with Amazon CloudSearch
Getting Started with Amazon CloudSearch
 
Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913
Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913
Using Amazon CloudSearch With Databases - CloudSearch Meetup 061913
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
 
Machine Learning & Data Lake for IoT scenarios on AWS
Machine Learning & Data Lake for IoT scenarios on AWSMachine Learning & Data Lake for IoT scenarios on AWS
Machine Learning & Data Lake for IoT scenarios on AWS
 

Similar a Real-Time Streaming: Intro to Amazon Kinesis

Similar a Real-Time Streaming: Intro to Amazon Kinesis (20)

Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo realPath to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
 
Analyzing Real-time Streaming Data with Amazon Kinesis
Analyzing Real-time Streaming Data with Amazon KinesisAnalyzing Real-time Streaming Data with Amazon Kinesis
Analyzing Real-time Streaming Data with Amazon Kinesis
 
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series
Introduction to Amazon Kinesis Firehose - AWS August Webinar SeriesIntroduction to Amazon Kinesis Firehose - AWS August Webinar Series
Introduction to Amazon Kinesis Firehose - AWS August Webinar Series
 
Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100
 
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Analysing Data in Real-time
Analysing Data in Real-timeAnalysing Data in Real-time
Analysing Data in Real-time
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
Slashing Big Data Complexity: How Comcast X1 Syndicates Streaming Analytics w...
 
Intro Presentation at AWS AWSome Day Glasgow September 2015
Intro Presentation at AWS AWSome Day Glasgow September 2015Intro Presentation at AWS AWSome Day Glasgow September 2015
Intro Presentation at AWS AWSome Day Glasgow September 2015
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis: Streaming Data ...
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
 

Más de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Real-Time Streaming: Intro to Amazon Kinesis

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 25th of October 2016 Real-Time Streaming Intro to Amazon Kinesis Roy Ben-Alta Architect and Business Development Manager Amazon Web Services - Big Data, IoT Analytics Solutions
  • 2. Agenda Streaming data on AWS and Customer scenarios Amazon Kinesis platform overview Demo Q&A
  • 3. Big Data Storage Data Warehousing Real-time Streaming Distributed Analytics (Hadoop, Spark, Presto) NoSQL Databases Business Intelligence Relational Databases Internet of Things (IoT) Machine Learning Server-less Compute AWS provides the broadest platform for big data analytics in the market today.
  • 4. AnalyzeStore Glacier S3 DynamoDB RDS, Aurora AWS Big Data Portfolio Data Pipeline CloudSearch EMR EC2 Redshift Machine Learning ElasticSearch Database Migration QuickSight New Kinesis Analytics - SQL over Streams New Kinesis FirehoseAmazon Snowball Direct Connect Collect Kinesis Stream
  • 5. Mobile Apps Web Clickstream Application Logs Metering Records IoT Sensors Smart Buildings [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/h tdocs/test Most Data is produced continuously
  • 6. What is streaming data? Data that is… • Moving – captured and processed with low latency (in ms to low single-digit sec) • Small (typically < 5KB) with high frequency (from hundreds to millions of data records per sec) • Sequenced – Data is produced, captured, and processed by sequence, by time or a derivative
  • 7. Scenarios Accelerated Ingest- Transform-Load to final destination Continual Metrics/ KPI Extraction Responsive Data Analytics Ad Tech/ Marketing Analytics Advertising data aggregation Advertising metrics like coverage, yield, conversion, scoring webpages User activity engagement analytics, optimized bid/ buy engines Consumer Online/ Gaming Online customer engagement data aggregation Consumer/ app engagement metrics like page views, CTR Customer clickstream analytics, recommendation engines, Financial Services Market/ financial transaction order data collection Financial market data metrics Fraud monitoring, and value-at- risk assessment, auditing of market order data IoT / Sensor Data Fitness device , vehicle sensor, telemetry data ingestion Wearable sensor operational metrics, and dashboards Devices / sensor operational intelligence 1 2 3 Streaming Data on AWS: Customer Scenarios
  • 8. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Kinesis Platform
  • 9. Amazon Kinesis Streams • For Technical Developers • Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose • For ETL, Data Engineer • Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch service Amazon Kinesis Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries Amazon Kinesis: Streaming Data Made Easy Services make it easy to capture, deliver, process streams on AWS
  • 10. Amazon Kinesis Streams Build your own data streaming applications Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume. Build real-time applications: Perform continual processing on streaming big data using Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more. Low cost: Cost-efficient for workloads of any scale.
  • 11. Data Sources App.4 [Machine Learning] AWSEndpoint App.1 [Aggregate & De-Duplicate] Data Sources Data Sources Data Sources App.2 [Metric Extraction] Amazon S3 Amazon Redshift App.3 [Sliding Window Analysis] Availability Zone Shard 1 Shard 2 Shard N Availability Zone Availability Zone Amazon Kinesis Streams Managed service for real-time streaming AWS Lambda Amazon EMR
  • 12. • Streams are made of shards • Each shard ingests up to 1 MB/sec, and 1000 records/sec • Each shard emits up to 2 MB/sec • All data is stored for 24 hours by default; storage can be extended for up to 7 days • Scale Kinesis streams using scaling util • Replay data inside of 24-hour window Amazon Kinesis Streams Managed Ability to Capture and store Data Time-based seek
  • 13. HTTP POST AWS SDK Sending Reading Sending & Reading from Amazon Kinesis Get* APIs Amazon Kinesis Client Library Amazon EMR AWS Lambda Amazon Kinesis Agent Amazon Kinesis Producer Library AWS IoT
  • 14. Amazon Kinesis Streams 3rd Party Connectors
  • 15. Amazon Kinesis Customer Base Diversity 1 billion events/wk from connected devices | IoT 17 PB of game data per season | Entertainment 80 billion ad impressions/day, 30 ms response time | Ad Tech 300 GB/day click streams from 300+ sites | Enterprise 50 billion ad impressions/day sub-50 ms responses | Ad Tech Sleep tracker sensor analysis | IoT Amazon Kinesis as Databus Funnel all production events through Amazon Kinesis
  • 16. Scenarios Accelerated Ingest- Transform-Load to final destination Continual Metrics/ KPI Extraction Responsive Data Analytics Ad Tech/ Marketing Analytics Advertising data aggregation Advertising metrics like coverage, yield, conversion, scoring webpages User activity engagement analytics, optimized bid/ buy engines Consumer Online/ Gaming Online customer engagement data aggregation Consumer/ app engagement metrics like page views, CTR Customer clickstream analytics, recommendation engines, Financial Services Market/ financial transaction order data collection Financial market data metrics Fraud monitoring, and value-at- risk assessment, auditing of market order data IoT / Sensor Data Fitness device , vehicle sensor, telemetry data ingestion Wearable sensor operational metrics, and dashboards Devices / sensor operational intelligence 1 2 3 Streaming Data on AWS: Customer Scenarios
  • 17. Fault Tolerant ingestion and delivery at scale is difficult A. All of the challenges associated with capturing the data B. Anti-pattern to send data to many destinations with large amounts of small records A. Issues downstream create backpressure on producers B. Do not want to stop processing if one portion of pipeline is down C. Need to perform data prep before persistence (compression, encryption, etc.)
  • 18. Amazon Kinesis Firehose Load massive volumes of streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch service Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift and Amazon Elasticsearch without writing an application or managing infrastructure. Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations. Seamless elasticity: Seamlessly scales to match data throughput w/o intervention Capture and submit streaming data to Firehose Analyze streaming data using your favorite BI tools Firehose loads streaming data continuously into S3, Amazon Redshift and Amazon Elasticsearch
  • 19. How does Amazon Kinesis Firehose help? A. Provides scalable and durable ingest like Kinesis Streams (but not ordering) B. Configurable aggregation of data before writing to final destination C. Provides fully managed buffer with zero administration if destinations have issues D. Independently delivers data to different destinations; separates out processing and delivery in the case of Kinesis Analytics integration E. Configurable data-prep options like compression and encryption
  • 20. Amazon Kinesis Firehose vs. Amazon Kinesis Streams Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks. Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch Service and a data latency of 60 seconds or higher.
  • 21. Scenarios Accelerated Ingest- Transform-Load to final destination Continual Metrics/ KPI Extraction Responsive Data Analytics Ad Tech/ Marketing Analytics Advertising data aggregation Advertising metrics like coverage, yield, conversion, scoring webpages User activity engagement analytics, optimized bid/ buy engines Consumer Online/ Gaming Online customer engagement data aggregation Consumer/ app engagement metrics like page views, CTR Customer clickstream analytics, recommendation engines, Financial Services Market/ financial transaction order data collection Financial market data metrics Fraud monitoring, and value-at- risk assessment, auditing of market order data IoT / Sensor Data Fitness device , vehicle sensor, telemetry data ingestion Wearable sensor operational metrics, and dashboards Devices / sensor operational intelligence 1 2 3 Streaming Data on AWS: Customer Scenarios
  • 22. Amazon Kinesis Analytics (New) Apply SQL on streams: Easily connect to an Amazon Kinesis stream or Firehose delivery Stream and apply SQL skills. Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies. Easy Scalability: Elastically scales to match data throughput. Connect to Amazon Kinesis streams, Firehose delivery streams Run standard SQL queries against data streams Amazon Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real-time
  • 23. Use SQL To Build Real-Time Applications Easily write SQL code to process streaming data Connect to streaming source Continuously deliver SQL results
  • 24. Real-time analytical patterns • Pre-processing: filtering, transformations • Basic Analytics: Simple counts, aggregates over windows • Advanced Analytics: Detecting anomalies, event correlation • Post-processing: Alerting, triggering, final filters
  • 25. Demo
  • 26. Building Serverless IoT Analytics stack Amazon Kinesis Streams Amazon Kinesis Analytics AWS Lambda Function DynamoDB Table Amazon Cognitio Amazon Kinesis Streams
  • 27. Streaming Data Platform for IoT Sensor Data You have hundreds of IoT sensors that are producing data continuously that you need to ingest, analyze, and store. You will: 1. Produce data continuously from hundreds of (simulated) IoT sensors 2. Durably ingest the incoming data using Kinesis Streams 3. Filter and aggregate the data using Kinesis Analytics 4. Deliver processed data to S3 using Kinesis Firehose Kinesis Stream Amazon S3 Kinesis Firehose Kinesis Analytics IoT sensors
  • 28. Mature Streaming IoT Platform Architecture Kinesis Stream Amazon S3 Kinesis Firehose Kinesis Analytics IoT sensors Kinesis Stream Amazon DynamoDB AWS Lambda Amazon CloudWatch CloudWatch Alarms CloudWatch Dashboard AWS IoT IoT rule IoT action Kinesis Firehose Amazon S3 AWS Lambda Amazon SNS mobile client raw data live metric alarm threshold Event w/ metadata
  • 29. Summary 1. Leverage AWS managed services; Scalable/elastic, available, reliable, secure, no/low admin 2. Amazon kinesis is platform for streaming data ingestion, processing, and Analytics. 3. Create your first Amazon Kinesis stream. Configure hundreds of thousands of data producers to put data into an Amazon Kinesis stream. 4. Choose the Processing framework for your use case: Kinesis Analytics, Amazon EMR with Spark, Lambda, and more. 5. Check out our AWS Big Data Blog and find useful code sample for your use case: • Getting started with Amazon Kinesis Analytics • Spark SQL and Amazon Kinesis Streams • Building Near Real-Time Discover platform using Kinesis Firehose and Lambda
  • 31. Reference We have many AWS Big Data Blogs which cover more examples. Full list here. Some good ones: 1. Kinesis Streams 1. Implement Efficient and Reliable Producers with the Amazon Kinesis Producer Library 2. Presto and Amazon Kinesis 3. Querying Amazon Kinesis Streams Directly with SQL and Sparking Streaming 4. Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams 2. Kinesis Firehose 1. Persist Streaming Data to Amazon S3 using Amazon Kinesis Firehose and AWS Lambda 2. Building a Near Real-Time Discovery Platform with AWS 3. Kinesis Analytics 1. Writing SQL on Streaming Data With Amazon Kinesis Analytics Part 1 | Part 2 2. Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics
  • 32. • Technical documentations • Amazon Kinesis Agent • Amazon Kinesis Streams and Spark Streaming • Amazon Kinesis Producer Library Best Practice • Amazon Kinesis Firehose and AWS Lambda • Building Near Real-Time Discovery Platform with Amazon Kinesis • Public case studies • Glu mobile – Real-Time Analytics • Hearst Publishing – Clickstream Analytics • How Sonos Leverages Amazon Kinesis • Nordstorm Online Stylist Reference
  • 34. Putting Data into a Kinesis stream • Data producers call PutRecord(s) to send data to a Kinesis stream • PutRecord {Data,StreamName,PartitionKey} • PutRecords {Records{Data,PartitionKey}, StreamName} • A Partition Key is supplied by producer and used to distribute (MD5 hash) the PUTs across (hash key range) of Shards • A unique Sequence # is returned to the Producer upon a successful PUT call • Options: AWS SDKs, Kinesis Producer Library (KPL), Kinesis Agent, FluentD, Flume, and more… Producer Producer Producer Producer Producer Producer Producer Kinesis stream Shard 1 Shard 2 Shard 3 Shard 4 Shard n
  • 35. Most data producers are not reliable Connectivity – producers are not always connected, need to send data with low latency Durability – producers have limited or no local storage, need to get data elsewhere quickly or its lost Efficiency – producers primary job is not data collection, need low overhead for sending data Distributed – large number of producers, need to receive and maintain order across different keys
  • 36. Getting Data from a Kinesis stream • Each shard is polled continuously using GetRecords, determine where to start using GetSharIterator • GetRecords {Limit, ShardIterator} • GetShardIterator {StreamName, ShardId, ShardIteratorType, StartingSequenceNumber, Timestamp} • Options: Kinesis Client Library (KCL) on EC2, AWS Lambda, Spark Streaming (EMR), Storm on EC2 • (Almost) All solutions use the KCL under the hood Kinesis stream Shard 1 Shard 2 Shard 3 Shard 4 Shard n Consumer Consumer Consumer
  • 37. Why Amazon Kinesis Client Library? • Open source client library available for Java, Ruby, Python, Node.JS dev • Deploy on your EC2 instances, scales easily with Elastic Beanstalk • Manages consumer to shard mapping and checkpoints • KCL Application includes three components: 1. Record Processor Factory – Creates the record processor 2. Record Processor – Processor unit that processes data from a shard in Amazon Kinesis Streams 3. Worker – Processing unit that maps to each application instance
  • 38. Streaming Architecture Workflow: Lambda + Kinesis Data Input Amazon Kinesis Action Lambda Data Output IT application activity Capture the stream Audit Process the stream SNS Metering records Condense Amazon Redshift Change logs Backup S3 Financial data Store RDS Transaction orders Process SQS Server health metrics Monitor EC2 User clickstream Analyze EMR IoT device data Respond Backend endpoint Custom data Custom action Custom application
  • 40. Let’s start with a product that empowers our Data Creators – Buzzing@Hearst