Big Data Analytics Evolvement and Innovation
From Batch to Streaming
Yubo Wang
Product Mgr, GCR
Time 16:00-16:40
Level 200
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda
• Real-time streaming data overview
• Streaming data services
• Benefits of streaming analytics
• Batch to streaming best practices
• How Amazon Flex moved from batch to streaming
What is batch processing?
"Execution of a series of jobs in a program on a computer without manual intervention" (Wikipedia)
• Data is collected over a period of time
• Process and analyze on a schedule
• Combine several processes to obtain final result
What Is Real-time Data?
Mobile Apps, Web Clickstream, Application Logs, Metering Records, IoT Sensors, Smart Buildings
Example log entry:
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
It’s All About the Pace
Batch Processing
• Hourly server logs
• Weekly or monthly bills
• Daily website clickstream
• Daily fraud reports
Stream Processing
• Real-time metrics
• Real-time spending alerts/caps
• Real-time clickstream analysis
• Real-time detection
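The difference in pace can be sketched in a few lines of Python. This is only an illustration: the event shape (a dict with a "page" key) and the names batch_page_views and StreamingPageViews are assumptions made for the example, not part of the talk.

```python
from collections import Counter


def batch_page_views(collected_events):
    """Batch style: count page views once, after all events were collected."""
    return Counter(event["page"] for event in collected_events)


class StreamingPageViews:
    """Stream style: keep a running count that is current after every event."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["page"]] += 1
        return self.counts[event["page"]]


events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]

# Batch: one answer, available only when the scheduled job runs.
daily_report = batch_page_views(events)

# Streaming: an up-to-date answer after each event arrives.
live = StreamingPageViews()
running_totals = [live.on_event(e) for e in events]
```

Both paths end with the same totals; the streaming version simply makes each intermediate total available as soon as its event arrives.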
The diminishing value of data
Recent data is highly valuable
• If you act on it in time
• Perishable insights (M. Gualtieri, Forrester)
Old + recent data is more valuable
• If you have the means to combine them
Simple Pattern for Streaming Data
Data Producer (example: Mobile Client)
• Continuously creates data
• Continuously writes data to a stream
• Can be almost anything
Streaming Service (example: Amazon Kinesis)
• Durably stores data
• Provides temporary buffer that preps data
• Supports very high throughput
Data Consumer (example: Amazon Kinesis app)
• Continuously processes data
• Cleans, prepares, & aggregates
• Transforms data to information
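The producer, streaming service, and consumer roles above can be sketched with an in-memory buffer. This is a toy under stated assumptions: a collections.deque stands in for the streaming service (it is not durable and not Amazon Kinesis), and the reading format with a "value" key is hypothetical.

```python
from collections import deque

# Stand-in for the streaming service: a simple in-memory buffer.
stream = deque()


def produce(stream, raw_readings):
    """Data producer: continuously writes data to the stream."""
    for reading in raw_readings:
        stream.append(reading)


def consume(stream):
    """Data consumer: drains the buffer, cleans records, and aggregates them."""
    cleaned = []
    while stream:
        reading = stream.popleft()
        if reading.get("value") is not None:          # clean: drop empty readings
            cleaned.append(float(reading["value"]))   # prepare: normalize to float
    if not cleaned:
        return None
    # Transform data to information: a count and an average.
    return {"count": len(cleaned), "average": sum(cleaned) / len(cleaned)}


produce(stream, [{"value": "20.5"}, {"value": None}, {"value": 21.5}])
summary = consume(stream)
```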
Processing real-time, streaming data
What are the key requirements?
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Collect -> Transform -> Analyze -> React -> Persist
Amazon Kinesis
• Amazon Kinesis Data Streams: Build custom applications that process and analyze streaming data
• Amazon Kinesis Data Analytics: Easily process and analyze streaming data with standard SQL
• Amazon Kinesis Data Firehose: Easily load streaming data into AWS
Amazon Kinesis Data Streams
• Easy administration and low cost
• Build real time applications with framework of choice
• Secure, durable storage
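As a rough sketch of what a producer for Kinesis Data Streams might look like in Python: the helper below calls put_record with the StreamName, Data, and PartitionKey parameters that the boto3 Kinesis client accepts. The stream name, the event fields, and the FakeKinesisClient used to exercise the helper without AWS credentials are all assumptions made for illustration.

```python
import json


def send_event(kinesis_client, stream_name, event):
    """Write one event to a stream, partitioned by device so that a
    device's records stay ordered within a shard."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["device_id"]),
    )


class FakeKinesisClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShardId": "shardId-000000000000",
                "SequenceNumber": str(len(self.calls))}


fake = FakeKinesisClient()
response = send_event(
    fake, "telemetry-stream", {"device_id": 42, "metric": "cpu", "value": 0.7}
)
```

With real credentials, a client created by boto3.client("kinesis") could be passed in place of the fake.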
Amazon Kinesis Data Analytics
• Powerful real time applications
• Easy to use, fully managed
• Automatic elasticity
Amazon Kinesis Data Firehose
• Zero administration and seamless elasticity
• Direct-to-data store integration
• Serverless, continuous data transformations
Amazon S3
Amazon Redshift
Amazon Kinesis Data Analytics Applications
Easily write SQL code to process streaming data
Connect to streaming source
Continuously deliver SQL results
Video is critical to many applications
Smart Home, Security Monitoring, Smart City, Industrial Automation, Computer Vision
Unlocking Real-Time Video Analytics
Find the Red Car
Amazon Kinesis: Real-Time Analytics
Easily collect, process, and analyze video and data streams in real time
• Kinesis Video Streams (new at re:Invent 2017): Capture, process, and store video streams for analytics
• Kinesis Data Streams: Build custom applications that analyze data streams
• Kinesis Data Firehose: Load data streams into AWS data stores
• Kinesis Data Analytics: Analyze data streams with SQL
Amazon Kinesis Video Streams
Stream video and time-encoded data for analytics
• Stream video from millions of devices
• Easily build vision-enabled apps
• Secure
• Durable, searchable storage
• Fully managed
[Diagram labels: Kinesis Video Streams, Amazon AI Services, Apache MXNet, TensorFlow, Custom Video Processing, 3rd Party Partners]
Use case: Smart Home
Example: Pet Monitor
Use case: Smart City
Example: Amber Alert System
Use case: Industrial Automation
Example: Equipment Preventive Maintenance
Benefits of streaming analysis
Immediate results
• Real-time aggregations
• Filtering
• Anomaly detection
Reduced complexity
• Fewer scheduled jobs to manage
• Kinesis is a fully managed solution
Scalable
• Enables parallel processing
• Horizontally scales, based on your ingest rate
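To make "anomaly detection" on a stream concrete, here is a minimal sketch that flags a value when it lies far from the running mean of the values seen so far. It uses a plain running z-score (Welford's online update), chosen only for brevity; it is not the algorithm of any AWS service, and the threshold and warmup values are arbitrary assumptions.

```python
import math


class RunningAnomalyDetector:
    """Flags values more than `threshold` standard deviations from the
    running mean, updating the statistics one value at a time."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup      # values to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0             # running sum of squared deviations

    def observe(self, value):
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                is_anomaly = True
        # Welford's online update of the mean and squared deviations.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return is_anomaly


detector = RunningAnomalyDetector()
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0]
flags = [detector.observe(r) for r in readings]
```

Only the last reading is flagged; each value is examined once as it arrives, so no scheduled job has to rescan the history.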
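Real-Time Analytics Core Use Cases
Scenarios/Verticals:
Digital Ad Tech/Marketing
• Accelerated Ingest-Transform-Load: Publisher, bidder data aggregation
• Continuous Metrics Generation: Advertising metrics like coverage, yield, and conversion
• Machine Learning and Actionable Insights: User engagement with ads, optimized bid/buy engines
IoT
• Accelerated Ingest-Transform-Load: Sensor, device telemetry data ingestion
• Continuous Metrics Generation: Operational metrics and dashboards
• Machine Learning and Actionable Insights: Device operational intelligence and alerts
Gaming
• Accelerated Ingest-Transform-Load: Online data aggregation, e.g., top 10 players
• Continuous Metrics Generation: Massively multiplayer online game (MMOG) live dashboard
• Machine Learning and Actionable Insights: Leader board generation, player-skill match
Consumer Online
• Accelerated Ingest-Transform-Load: Clickstream analytics
• Continuous Metrics Generation: Metrics like impressions and page views
• Machine Learning and Actionable Insights: Recommendation engines, proactive care
Operation Security
• Accelerated Ingest-Transform-Load: DevOps tools, ingesting VPC Flow Logs
• Continuous Metrics Generation: Subscribe to Amazon CloudWatch Logs and analyze logs in real-time
• Machine Learning and Actionable Insights: Anomaly detection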
Three Common Scenarios
• Streaming Ingest-Transform-Load: Deliver data to analytics tools faster and cheaper
• Continuous Metric Generation: Compute analytics as the data is generated
• Actionable Insights: React to analytics based off of insights
Web Analytics and Leaderboards
[Architecture diagram]
• Ingest web app data: Lightweight JS client code (with Amazon Cognito) OR Web Server on Amazon EC2 Instance -> Amazon Kinesis Data Streams
• Compute top 10 users: Amazon Kinesis Data Analytics
• Persist to feed live apps: AWS Lambda function -> Amazon DynamoDB Table
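The "compute top 10 users" step can be illustrated in plain Python. In the talk this step runs in Amazon Kinesis Data Analytics; the sketch below only shows the shape of the computation, and the event fields user_id and points are hypothetical.

```python
from collections import Counter


def top_users(click_events, n=10):
    """Sum points per user and return the n highest scorers."""
    scores = Counter()
    for event in click_events:
        # Events without an explicit score count as one point.
        scores[event["user_id"]] += event.get("points", 1)
    return scores.most_common(n)


events = [
    {"user_id": "ana", "points": 5},
    {"user_id": "bo", "points": 3},
    {"user_id": "ana", "points": 2},
    {"user_id": "cy"},
]
leaderboard = top_users(events, n=2)
```

A consumer such as the Lambda function in the diagram could then persist each refreshed leaderboard to a table that feeds the live app.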
Monitoring IoT Devices
[Architecture diagram]
• Ingest sensor data: IoT sensors -> AWS IoT -> Amazon Kinesis Data Streams
• Compute avg temp every 10 sec: Amazon Kinesis Data Analytics
• Persist time series analytic to database: AWS Lambda function -> Amazon RDS MySQL DB instance
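Computing an average temperature every 10 seconds is a tumbling-window aggregation. The following sketch shows the idea in Python, assuming readings arrive as (timestamp in seconds, temperature) pairs; the function name and input format are assumptions, and a streaming SQL engine would express the same thing declaratively.

```python
from collections import defaultdict


def tumbling_average(readings, window_seconds=10):
    """Group readings into fixed, non-overlapping windows and average each."""
    sums = defaultdict(lambda: [0.0, 0])   # window start -> [total, count]
    for timestamp, temperature in readings:
        window_start = int(timestamp // window_seconds) * window_seconds
        sums[window_start][0] += temperature
        sums[window_start][1] += 1
    return {start: total / count
            for start, (total, count) in sorted(sums.items())}


readings = [(0, 20.0), (4, 22.0), (9.5, 24.0), (10, 30.0), (19, 32.0)]
averages = tumbling_average(readings)
```

Each result row (window start, average) is the kind of time series record that the diagram persists to the database.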
Analyzing CloudTrail Event Logs
[Architecture diagram]
• Ingest and deliver raw log data: AWS CloudTrail, Amazon CloudWatch events trigger, Amazon Kinesis Data Firehose, Amazon S3 bucket for raw data
• Compute operational metrics: Amazon Kinesis Data Analytics
• Deliver to real-time dashboards and archival: Amazon Kinesis Data Firehose, Amazon S3 bucket for processed data, AWS Lambda function, Amazon DynamoDB Table(s), Chart.JS Dashboard
Batch to streaming best practices
Migrate incrementally
• Don’t boil the ocean
• Begin by streaming data in parallel to existing batch processes
• Persist streaming data into durable storage, like Amazon S3
• Add in streaming analysis results to replace batch analysis
[Diagram labels: Data producer, Application databases, ETL, Data warehouse, Amazon Kinesis, Streaming data, Amazon S3]
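One way to picture "streaming data in parallel to existing batch processes" is a dual-write wrapper on the producer side. This is a hedged sketch, not the approach described in the talk: the DualWriter class and the list-backed sinks are hypothetical, and a real system would write to the existing database and to a stream such as Amazon Kinesis.

```python
class DualWriter:
    """Send every record to the existing batch path and, in parallel,
    to the new streaming path."""

    def __init__(self, batch_sink, stream_sink):
        self.batch_sink = batch_sink
        self.stream_sink = stream_sink

    def write(self, record):
        self.batch_sink(record)            # existing pipeline stays authoritative
        try:
            self.stream_sink(record)       # the new path must not break the old one
        except Exception as error:
            print(f"stream write failed, batch path unaffected: {error}")


batch_store, stream_store = [], []
writer = DualWriter(batch_store.append, stream_store.append)
for record in [{"id": 1}, {"id": 2}]:
    writer.write(record)

# During the migration, compare both paths before retiring the batch job.
outputs_match = batch_store == stream_store
```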
Batch to streaming best practices
Perform ITL rather than ETL
• ITL: Ingest-Transform-Load
• ETL: Extract-Transform-Load
• Transform data in near-real time rather than in a scheduled job
• Enrich data in near-real time
• Persist transformed and/or enriched data
[Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, Raw data, AWS Lambda function (Transform data), Enrichment source data, Transformed and/or enriched data, Amazon S3]
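The transform step in this pattern is typically an AWS Lambda function invoked by Firehose with a batch of base64-encoded records, each of which must be returned with its recordId, a result status, and re-encoded data. The sketch below follows that record contract; the enrichment table DEVICE_REGIONS and the device_id and region fields are hypothetical and stand in for a real enrichment source.

```python
import base64
import json

# Hypothetical enrichment source; a real one might be a database or cache.
DEVICE_REGIONS = {"dev-1": "us-west-2", "dev-2": "eu-west-1"}


def lambda_handler(event, context):
    """Decode each record, enrich it, and hand it back to Firehose."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["region"] = DEVICE_REGIONS.get(
                payload.get("device_id"), "unknown")
            # Newline-delimit records so downstream tools can split them.
            data = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Malformed input: return it unchanged and mark it as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
    return {"records": output}


sample_event = {"records": [{
    "recordId": "1",
    "data": base64.b64encode(
        b'{"device_id": "dev-1", "temp": 21}').decode("utf-8"),
}]}
result = lambda_handler(sample_event, None)
```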
Batch to streaming best practices
Aggregate upon arrival
• Continuously write raw data to persistent data store for archival and other analysis
• Aggregate in real time when window size < 1 hour
• Write aggregated data to persistent data store for immediate value
[Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, Raw data, Amazon Kinesis Analytics, Aggregate, Results, Aggregated data, Amazon S3]
Batch to streaming example
• Amazon delivery app (Android/iOS)
• Crowd-sourced model launched in
30+ U.S. cities
• Used by Amazon Logistics worldwide
• Deliveries for Amazon.com, Prime
Now, Amazon Fresh, restaurants,
grocery stores
• Millions of packages per year
The problem
• Collecting, processing, and storing telemetry data
• Telemetry data = remote measurements
• Includes metrics, crashes, logs, sensor data, clickstream data, etc.
The goal
• Understand what’s happening in the field
• Analyze all the data and make performance optimizations
• Focus our time on improving the app and the delivery flow
Getting from batch to streaming
• To solve our use cases, we had to incrementally improve our system
• We evolved from a batch-based system to a stream-based system
• Let’s walk through the iterations
Iteration 1: Use existing systems
• Collect metrics and send to an existing metrics service
• ETL jobs to load data into a big Oracle Data Warehouse
[Diagram labels: App, Data collection, Existing metrics service, ETL, DW]
Iteration 1: Use existing systems (status)
1. Batch process with 24-hour delay
2. Fixed, inflexible DB schema
3. Analysis difficult and slow via SQL
Iteration 2: Use AWS
• Collect metrics in the app using the Amazon Mobile Analytics SDK, which automatically loads data into Redshift
[Diagram labels: App, Data collection, CloudFormation, ETL system]
Iteration 2: Use AWS (status)
1. Batch process with 2-hour delay (previously 24-hour)
2. Fixed, inflexible DB schema
3. Analysis difficult and slow via SQL
Iteration 3: Automated DB schema
• Add shared configuration that is used in the app and automatically updates the Redshift schema
[Diagram labels: App, Schema config, Data collection, CloudFormation, ETL system]
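A shared configuration that drives schema updates could work roughly like this: compare the columns declared in the config with the columns the table already has, and emit one ALTER TABLE ... ADD COLUMN statement per missing column. The talk does not show the actual mechanism, so the function, table name, and column definitions below are assumptions for illustration only.

```python
def schema_migration_statements(table, current_columns, config_columns):
    """Return ALTER TABLE statements for columns that are declared in the
    shared config but not yet present in the table."""
    statements = []
    for name, sql_type in config_columns.items():
        if name not in current_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
    return statements


# Hypothetical current table definition and shared config.
current = {"event_time": "TIMESTAMP", "device_id": "VARCHAR(64)"}
shared_config = {
    "event_time": "TIMESTAMP",
    "device_id": "VARCHAR(64)",
    "battery_level": "INTEGER",
}
statements = schema_migration_statements("telemetry", current, shared_config)
```

The sketch builds SQL by string formatting, which is acceptable only because the names come from a trusted, version-controlled config rather than from user input.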
Iteration 3: Automated DB schema (status)
1. Batch process with 2-hour delay (previously 24-hour)
2. Auto-updating DB schema (previously fixed, inflexible)
3. Analysis difficult and slow via SQL
Iteration 4: Use Streams
• Introduce a Kinesis stream and Kinesis Firehose to publish to Redshift
• Partition data by date to simplify data retention policies
[Diagram labels: App, Data collection via Pinpoint, Schema config]
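Partitioning by date usually means encoding the date in the storage key, so that a retention policy can drop a whole day at once instead of filtering individual records. A minimal sketch, assuming a year/month/day key layout and a JSON object per record; the prefix, layout, and function name are illustrative assumptions rather than the layout used in the talk.

```python
from datetime import datetime, timezone


def partitioned_key(prefix, event_time, record_id):
    """Build a storage key whose path encodes the UTC date of the event."""
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}/{utc:%Y/%m/%d}/{record_id}.json"


key = partitioned_key(
    "telemetry",
    datetime(2017, 11, 30, 23, 15, tzinfo=timezone.utc),
    "abc123",
)
```

With this layout, expiring data older than a given day becomes a matter of deleting everything under the corresponding date prefix.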
Iteration 4: Use Streams (status)
1. Streaming process with a delay of a couple of minutes (previously batch with a 24-hour, then 2-hour, delay)
2. Auto-updating DB schema (previously fixed, inflexible)
3. Analysis difficult and slow via SQL
Iteration 5: Generic message types
• Use generic message types
• Publish the data to S3, Redshift, and Elasticsearch
[Diagram labels: App, Elasticsearch]
Iteration 5
[Diagram labels: App, Data collection, ProtoBuf, Consumer Lambdas, Consumer Redshifts, Elasticsearch, SQL reports, Dashboards]
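A generic message type can be pictured as an envelope that carries a type tag and an opaque payload, so that several consumers can read the same stream and each pick the message types it cares about. The diagram names ProtoBuf as the encoding; the sketch below uses JSON only to stay dependency-free, and the message types, consumer names, and list-backed sinks are hypothetical.

```python
import json


def make_message(message_type, payload):
    """Wrap any payload in a generic envelope with a type tag."""
    return json.dumps({"type": message_type, "payload": payload})


def fan_out(raw_message, consumers):
    """Deliver one message to every consumer that accepts its type."""
    message = json.loads(raw_message)
    delivered = []
    for name, (accepted_types, handler) in consumers.items():
        if message["type"] in accepted_types:
            handler(message["payload"])
            delivered.append(name)
    return delivered


archive, search_index = [], []
consumers = {
    "archive": ({"metric", "crash", "log"}, archive.append),
    "search": ({"crash", "log"}, search_index.append),
}
delivered = fan_out(make_message("crash", {"app_version": "1.2.3"}), consumers)
```

Adding a new destination then means registering another consumer, without changing the producer or the message schema.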
Iteration 5: Generic message types (status)
1. Streaming process with a delay of a few seconds (previously batch with a 24-hour, then 2-hour, delay)
2. Auto-updating DB schema and generic message types (previously fixed, inflexible)
3. Analysis flexible by processing message payload (previously difficult and slow via SQL)
Benefits of Streaming
1. Agility: real-time data means your business can react more quickly
2. Flexibility: generic message types give you flexible schemas, so your system can handle multiple data types and future use cases
3. Shareability: streams allow you to multiplex and share your data easily with your consumers
4. Extensibility: processing streams of data allows you to write it to multiple data storage systems, which enables a variety of analytics tools
Thank you

Más contenido relacionado

La actualidad más candente

Accelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAccelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAmazon Web Services
 
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술
AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술
AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술Amazon Web Services Korea
 
Making the Case for Integration Platform as a Service (iPaaS)
Making the Case for Integration Platform as a Service (iPaaS)Making the Case for Integration Platform as a Service (iPaaS)
Making the Case for Integration Platform as a Service (iPaaS)Axway
 
How to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSHow to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSAmazon Web Services
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWSsampath439572
 
How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...
How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...
How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxSwathiPonugumati
 
Building a Better Business Case for Migrating to Cloud
Building a Better Business Case for Migrating to CloudBuilding a Better Business Case for Migrating to Cloud
Building a Better Business Case for Migrating to CloudAmazon Web Services
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
Building the business case for AWS
Building the business case for AWSBuilding the business case for AWS
Building the business case for AWSAmazon Web Services
 
Introducing Amazon Connect-Keynote-Enterprise Connect 2017
Introducing Amazon Connect-Keynote-Enterprise Connect 2017Introducing Amazon Connect-Keynote-Enterprise Connect 2017
Introducing Amazon Connect-Keynote-Enterprise Connect 2017Amazon Web Services
 
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...Amazon Web Services
 
AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화
AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화
AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화Amazon Web Services Korea
 
Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...
Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...
Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...Amazon Web Services
 
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Amazon Web Services
 

La actualidad más candente (20)

Accelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAccelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdf
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Deep dive - AWS Fargate
Deep dive - AWS FargateDeep dive - AWS Fargate
Deep dive - AWS Fargate
 
AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술
AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술
AWS Summit Seoul 2023 | HL Mando가 AWS IoT Fleetwise로 그리는 미래 커넥티드 모빌리티 기술
 
Making the Case for Integration Platform as a Service (iPaaS)
Making the Case for Integration Platform as a Service (iPaaS)Making the Case for Integration Platform as a Service (iPaaS)
Making the Case for Integration Platform as a Service (iPaaS)
 
How to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWSHow to backup, restore and archive your data on AWS
How to backup, restore and archive your data on AWS
 
App Modernization
App ModernizationApp Modernization
App Modernization
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...
How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...
How Hess Has Continued to Optimize the AWS Cloud After Migrating - ENT218 - r...
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
AWS for Manufacturing
AWS for ManufacturingAWS for Manufacturing
AWS for Manufacturing
 
Building a Better Business Case for Migrating to Cloud
Building a Better Business Case for Migrating to CloudBuilding a Better Business Case for Migrating to Cloud
Building a Better Business Case for Migrating to Cloud
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
Building the business case for AWS
Building the business case for AWSBuilding the business case for AWS
Building the business case for AWS
 
Introducing Amazon Connect-Keynote-Enterprise Connect 2017
Introducing Amazon Connect-Keynote-Enterprise Connect 2017Introducing Amazon Connect-Keynote-Enterprise Connect 2017
Introducing Amazon Connect-Keynote-Enterprise Connect 2017
 
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
 
AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화
AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화
AWS Summit Seoul 2023 | 클라우드를 통한 온/오프라인 비즈니스의 통합, GS리테일의 현대화
 
Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...
Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...
Cross-account encryption with AWS KMS and Slack Enterprise Key Management - S...
 
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
Build Data Engineering Platforms with Amazon EMR (ANT204) - AWS re:Invent 2018
 

Similar a From Batch to Streaming - How Amazon Flex Uses Real-time Analytics

ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon KinesisABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon KinesisAmazon Web Services
 
Building a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWSBuilding a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWSInjae Kwak
 
ABD335_Real-Time Anomaly Detection Using Amazon Kinesis
ABD335_Real-Time Anomaly Detection Using Amazon KinesisABD335_Real-Time Anomaly Detection Using Amazon Kinesis
ABD335_Real-Time Anomaly Detection Using Amazon KinesisAmazon Web Services
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTAmazon Web Services
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018Amazon Web Services
 
Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018
Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018
Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018Amazon Web Services
 
Analyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SFAnalyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SFAmazon Web Services
 
Analyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF LoftAnalyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF LoftAmazon Web Services
 
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingGAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingAmazon Web Services
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Amazon Web Services
 
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Amazon Web Services
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Amazon Web Services
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksAmazon Web Services
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Amazon Web Services
 

Similar a From Batch to Streaming - How Amazon Flex Uses Real-time Analytics (20)

ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon KinesisABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
 
ABD217_From Batch to Streaming
ABD217_From Batch to StreamingABD217_From Batch to Streaming
ABD217_From Batch to Streaming
 
Building a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWSBuilding a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWS
 
ABD335_Real-Time Anomaly Detection Using Amazon Kinesis
ABD335_Real-Time Anomaly Detection Using Amazon KinesisABD335_Real-Time Anomaly Detection Using Amazon Kinesis
ABD335_Real-Time Anomaly Detection Using Amazon Kinesis
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
Amazon Kinesis - Building Serverless real-time solution - Tel Aviv Summit 2018
 
Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018
Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018
Don’t Wait Until Tomorrow: From Batch to Streaming (ANT360) - AWS re:Invent 2018
 
Analyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SFAnalyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SF
 
Analyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF LoftAnalyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF Loft
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingGAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
 
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
 

From Batch to Streaming - How Amazon Flex Uses Real-time Analytics

  • 1. Big Data Analytics Evolvement and Innovation: From Batch to Streaming. Yubo Wang, Product Mgr, GCR. Time: 16:00-16:40. Level: 200.
  • 2. Agenda: real-time streaming data overview; streaming data services; benefits of streaming analytics; batch to streaming best practices; how Amazon Flex moved from batch to streaming. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 3. What is batch processing? "Execution of a series of jobs in a program on a computer without manual intervention" (Wikipedia). Data is collected over a period of time; it is processed and analyzed on a schedule; several processes are combined to obtain the final result.
  • 4. What Is Real-time Data? Sources: mobile apps, web clickstream, application logs, metering records, IoT sensors, smart buildings. Example log line: [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
  • 5. It's All About the Pace. Batch processing: hourly server logs; weekly or monthly bills; daily web-site clickstream; daily fraud reports. Stream processing: real-time metrics; real-time spending alerts/caps; real-time clickstream analysis; real-time detection.
  • 6. The diminishing value of data. Recent data is highly valuable if you act on it in time ("perishable insights", M. Gualtieri, Forrester). Old + recent data is more valuable if you have the means to combine them.
  • 7. Simple Pattern for Streaming Data. Data producer (e.g., a mobile client): continuously creates data; continuously writes data to a stream; can be almost anything. Streaming service (e.g., Amazon Kinesis): durably stores data; provides a temporary buffer that preps data; supports very high throughput. Data consumer (e.g., an Amazon Kinesis app): continuously processes data; cleans, prepares, and aggregates; transforms data to information.
  • 8. Processing real-time, streaming data. What are the key requirements? Durable, continuous, fast, correct, reactive, reliable. Stages: Collect, Transform, Analyze, React, Persist.
  • 9. Amazon Kinesis. Amazon Kinesis Data Streams: build custom applications that process and analyze streaming data. Amazon Kinesis Data Analytics: easily process and analyze streaming data with standard SQL. Amazon Kinesis Data Firehose: easily load streaming data into AWS.
  • 10. Amazon Kinesis Data Streams: easy administration and low cost; build real-time applications with the framework of your choice; secure, durable storage.
  • 11. Amazon Kinesis Data Analytics: powerful real-time applications; easy to use, fully managed; automatic elasticity.
  • 12. Amazon Kinesis Data Firehose: zero administration and seamless elasticity; direct-to-data-store integration (Amazon S3, Amazon Redshift); serverless, continuous data transformations.
  • 13. Amazon Kinesis Data Analytics Applications: connect to a streaming source; easily write SQL code to process streaming data; continuously deliver SQL results.
  • 14. Video is critical to many applications: Smart Home, Security Monitoring, Smart City, Industrial Automation, Computer Vision.
  • 17. Amazon Kinesis: Real-Time Analytics. Easily collect, process, and analyze video and data streams in real time. Kinesis Video Streams (new at re:Invent 2017): capture, process, and store video streams for analytics. Kinesis Data Streams: build custom applications that analyze data streams. Kinesis Data Firehose: load data streams into AWS data stores. Kinesis Data Analytics: analyze data streams with SQL.
  • 18. Amazon Kinesis Video Streams: stream video and time-encoded data for analytics. Stream video from millions of devices; easily build vision-enabled apps; secure; durable, searchable storage; fully managed. Shown alongside Kinesis Video Streams: Amazon AI Services, Apache MXNet, TensorFlow, custom video processing, 3rd-party partners.
  • 19. Use case: Smart Home. Example: Pet Monitor.
  • 20. Use case: Smart City. Example: Amber Alert System.
  • 21. Use case: Industrial Automation. Example: Equipment Preventive Maintenance.
  • 22. Benefits of streaming analysis. Immediate results: real-time aggregations, filtering, anomaly detection. Reduced complexity: fewer scheduled jobs to manage; Kinesis is a fully managed solution. Scalable: enables parallel processing; scales horizontally, based on your ingest rate.
  • 23. Real-Time Analytics Core Use Cases. For each scenario/vertical, the three use cases are Accelerated Ingest-Transform-Load, Continuous Metrics Generation, and Machine Learning and Actionable Insights:
      • Digital Ad Tech/Marketing: publisher, bidder data aggregation; advertising metrics like coverage, yield, and conversion; user engagement with ads, optimized bid/buy engines.
      • IoT: sensor, device telemetry data ingestion; operational metrics and dashboards; device operational intelligence and alerts.
      • Gaming: online data aggregation, e.g., top 10 players; massively multiplayer online game (MMOG) live dashboard; leader board generation, player-skill match.
      • Consumer Online: clickstream analytics; metrics like impressions and page views; recommendation engines, proactive care.
      • Operation Security: DevOps tools, ingesting VPC Flow Logs; subscribe to Amazon CloudWatch Logs and analyze logs in real time; anomaly detection.
  • 24. Three Common Scenarios. Streaming Ingest-Transform-Load: deliver data to analytics tools faster and cheaper. Continuous Metric Generation: compute analytics as the data is generated. Actionable Insights: react to analytics based on insights.
  • 25. Web Analytics and Leaderboards. Pipeline: lightweight JS client code (with Amazon Cognito) OR a web server on an Amazon EC2 instance -> Amazon Kinesis Data Streams (ingest web app data) -> Amazon Kinesis Data Analytics (compute top 10 users) -> AWS Lambda function -> Amazon DynamoDB table (persist to feed live apps).
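The "compute top 10 users" step on slide 25 runs as SQL inside Kinesis Data Analytics. Purely as an illustration of the computation itself, here is a minimal plain-Python sketch. The event shape and the `user_id` field are assumptions made for this example and do not come from the slides.

```python
from collections import Counter

def top_users(events, n=10):
    """Return the n most active users as (user_id, event_count) pairs."""
    # Count one hit per event, keyed by the (hypothetical) user_id field.
    counts = Counter(event["user_id"] for event in events)
    return counts.most_common(n)

# Hypothetical click events ingested from the web app.
events = [
    {"user_id": "alice"},
    {"user_id": "bob"},
    {"user_id": "alice"},
]
leaders = top_users(events, n=2)
```

In a streaming setting this count would be recomputed per time window rather than over a fixed list.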
  • 26. Monitoring IoT Devices. Pipeline: IoT sensors -> AWS IoT -> Amazon Kinesis Data Streams (ingest sensor data) -> Amazon Kinesis Data Analytics (compute avg temp every 10 sec) -> AWS Lambda function -> Amazon RDS MySQL DB instance (persist time series analytic to database).
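"Compute avg temp every 10 sec" on slide 26 is a tumbling-window aggregation. The sketch below shows that idea in plain Python rather than in Kinesis Data Analytics SQL. The `(timestamp_seconds, temperature)` reading format is an assumption for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # window size taken from the slide

def average_by_window(readings, window_seconds=WINDOW_SECONDS):
    """Average temperature per non-overlapping (tumbling) window.

    readings: iterable of (timestamp_seconds, temperature) pairs (assumed shape).
    Returns a dict mapping each window's start time to its average.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, temperature in readings:
        # Each reading falls into exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        sums[window_start] += temperature
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}

readings = [(0, 20.0), (4, 22.0), (9, 24.0), (10, 30.0), (15, 32.0)]
averages = average_by_window(readings)
```

Here the first three readings fall into the window starting at 0 and the last two into the window starting at 10.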
  • 27. Analyzing CloudTrail Event Logs. Components: AWS CloudTrail, Amazon CloudWatch Events trigger, Amazon Kinesis Data Firehose (twice), Amazon Kinesis Data Analytics, AWS Lambda function, Amazon S3 bucket for raw data, Amazon S3 bucket for processed data, Amazon DynamoDB table(s), Chart.JS dashboard. Stages: ingest and deliver raw log data; compute operational metrics; deliver to real-time dashboards and archival.
  • 28. Batch to streaming best practices: Migrate incrementally. Don't boil the ocean; begin by streaming data in parallel to existing batch processes; persist streaming data into durable storage, like Amazon S3; add in streaming analysis results to replace batch analysis. Diagram components: data producer, application databases, ETL, data warehouse, Amazon Kinesis, streaming data, Amazon S3.
  • 29. Batch to streaming best practices: Perform ITL rather than ETL. ITL: Ingest-Transform-Load. ETL: Extract-Transform-Load. Transform data in near-real time rather than in a scheduled job; enrich data in near-real time; persist transformed and/or enriched data. Diagram: data producer -> Amazon Kinesis Firehose (raw streaming data) -> AWS Lambda function (transform data, using enrichment source data) -> Amazon S3 (raw data; transformed and/or enriched data).
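The transform-and-enrich step on slide 29 can be sketched as a Lambda function attached to a Firehose delivery stream. Firehose passes the function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` status, and re-encoded `data`. The JSON payload shape, the `device_id` field, and the `DEVICE_REGIONS` lookup below are hypothetical and stand in for whatever enrichment source is used in practice.

```python
import base64
import json

# Hypothetical enrichment source; in practice this might be a database or cache.
DEVICE_REGIONS = {"device-1": "us-west-2"}

def lambda_handler(event, context):
    """Enrich each Firehose record with a region looked up by device_id."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["region"] = DEVICE_REGIONS.get(payload.get("device_id"), "unknown")
            # Newline-delimit the JSON so downstream readers can split records.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, AttributeError, TypeError):
            # Payload was not the JSON object we assumed; hand it back marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}

# Local usage example with one well-formed and one malformed record.
sample_event = {"records": [
    {"recordId": "1", "data": base64.b64encode(b'{"device_id": "device-1"}').decode("utf-8")},
    {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
]}
result = lambda_handler(sample_event, None)
```

Returning failed records with `ProcessingFailed` instead of raising keeps one bad payload from failing the whole batch.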
  • 30. Batch to streaming best practices: Aggregate upon arrival. Continuously write raw data to a persistent data store for archival and other analysis; aggregate in real time when the window size is < 1 hour; write aggregated data to a persistent data store for immediate value. Diagram: data producer -> Amazon Kinesis Firehose (raw streaming data) -> Amazon S3 (raw data); Amazon Kinesis Analytics (aggregate results) -> Amazon S3 (aggregated data).
  • 31. Batch to streaming example
  • 32. Amazon delivery app (Android/iOS); crowd-sourced model launched in 30+ U.S. cities; used by Amazon Logistics worldwide.
  • 33. Deliveries for Amazon.com, Prime Now, Amazon Fresh, restaurants, grocery stores; millions of packages per year.
  • 34. The problem: collecting, processing, and storing telemetry data. Telemetry data = remote measurements; it includes metrics, crashes, logs, sensor data, clickstream data, etc.
  • 35. The goal: understand what's happening in the field; analyze all the data and make performance optimizations; focus our time on improving the app and the delivery flow.
  • 36. Getting from batch to streaming: to solve our use cases, we had to incrementally improve our system; we evolved from a batch-based system to a stream-based system; let's walk through the iterations.
  • 37. Iteration 1: Use existing systems. Collect metrics and send them to an existing metrics service; ETL jobs load the data into a big Oracle data warehouse. Diagram: App -> data collection -> existing metrics service -> ETL -> DW.
  • 38. Iteration 1: Use existing systems. Status: 1. Batch process with a 24-hour delay. 2. Fixed, inflexible DB schema. 3. Analysis difficult and slow via SQL.
  • 39. Iteration 2: Use AWS. Collect metrics in the app using the Amazon Mobile Analytics SDK, which automatically loads data into Redshift. Diagram components: App, data collection, CloudFormation, ETL system.
  • 40. Iteration 2: Use AWS. Status: 1. Batch process with a 2-hour delay (was 24 hours). 2. Fixed, inflexible DB schema. 3. Analysis difficult and slow via SQL.
  • 41. Iteration 3: Automated DB schema. Add a shared configuration that is used in the app and automatically updates the Redshift schema. Diagram components: App, data collection, schema config, CloudFormation, ETL system.
  • 42. Iteration 3: Automated DB schema. Status: 1. Batch process with a 2-hour delay (was 24 hours). 2. Auto-updating DB schema (was fixed, inflexible). 3. Analysis difficult and slow via SQL.
  • 43. Iteration 4: Use Streams. Introduce a Kinesis stream and Kinesis Firehose to publish to Redshift; partition data by date to simplify data retention policies. Diagram components: App, data collection via Pinpoint, schema config.
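Slide 43 does not say how the data is partitioned by date. One common reading is one partition (for example, a table or a storage prefix) per day, so that retention becomes a matter of dropping whole partitions. The sketch below works under that assumption; the `events_YYYY_MM_DD` naming scheme and both function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

PARTITION_FORMAT = "events_%Y_%m_%d"  # hypothetical naming scheme

def partition_for(epoch_seconds):
    """Name of the daily partition (UTC) that an event timestamp belongs to."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return day.strftime(PARTITION_FORMAT)

def expired_partitions(partitions, today, retention_days):
    """Partitions strictly older than the retention cutoff, safe to drop whole."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in partitions:
        day = datetime.strptime(name, PARTITION_FORMAT).date()
        if day < cutoff:
            expired.append(name)
    return expired

existing = ["events_2017_01_01", "events_2017_01_20"]
to_drop = expired_partitions(existing, today=date(2017, 1, 31), retention_days=14)
```

With a 14-day retention window on 31 January 2017, only the 1 January partition is older than the cutoff.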
  • 44. Iteration 4: Use Streams. Status: 1. Streaming process with a delay of a couple of minutes (was a batch process with a 2-hour delay). 2. Auto-updating DB schema (was fixed, inflexible). 3. Analysis difficult and slow via SQL.
  • 45. Iteration 5: Generic message types. Use generic message types; publish the data to S3, Redshift, and Elasticsearch.
  • 46. Iteration 5. Diagram components: App, data collection, ProtoBuf, consumer Lambdas, Elasticsearch, Redshift, SQL reports, dashboards.
  • 47. Iteration 5: Generic message types. Status: 1. Streaming process with a delay of a few seconds (was a couple of minutes). 2. Auto-updating DB schema and generic message types. 3. Analysis is flexible by processing the message payload (was difficult and slow via SQL).
  • 48. Benefits of Streaming. 1. Agility: real-time data means your business can react more quickly. 2. Flexibility: generic message types give you flexible schemas, so your system can handle multiple data types and future use cases. 3. Shareability: streams allow you to multiplex and share your data easily with your consumers. 4. Extensibility: processing streams of data allows us to write it to multiple data storage systems, which enables a variety of analytics tools.
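The flexibility and shareability points can be illustrated with a generic message envelope that several independent consumers read from the same stream. This is only a sketch: the slides mention ProtoBuf, while JSON is used here as a stand-in, and the envelope fields (`type`, `payload`), the in-memory `Stream` class, and both consumers are hypothetical.

```python
import json

def make_message(message_type, payload):
    """Wrap any payload in a generic envelope (JSON stand-in for ProtoBuf)."""
    return json.dumps({"type": message_type, "payload": payload})

class Stream:
    """In-memory stand-in for a stream that fans each message out to all consumers."""

    def __init__(self):
        self.consumers = []

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def publish(self, message):
        for consumer in self.consumers:
            consumer(json.loads(message))

archive = []       # plays the role of a raw archive such as S3
crash_counts = {}  # plays the role of a dashboard metric

def archive_consumer(message):
    # Stores every message regardless of type.
    archive.append(message)

def crash_consumer(message):
    # Only cares about one message type; ignores the rest.
    if message["type"] == "crash":
        screen = message["payload"].get("screen", "unknown")
        crash_counts[screen] = crash_counts.get(screen, 0) + 1

stream = Stream()
stream.subscribe(archive_consumer)
stream.subscribe(crash_consumer)
stream.publish(make_message("crash", {"screen": "checkout"}))
stream.publish(make_message("metric", {"name": "latency_ms", "value": 120}))
```

Adding a new message type needs no schema change in the envelope, and adding a new consumer needs no change to the producer.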