© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
October 2015
Streaming Data Flows with
Amazon Kinesis Firehose
Adi Krishnan, Product Manager Amazon Kinesis
What to Expect From This Session
Amazon Kinesis Firehose
Key concepts
Firehose Experience for S3
Firehose Experience for Redshift
Data delivery patterns
Putting data using the Amazon Kinesis Agent
Key API Overview
Pricing
Firehose Troubleshooting
Amazon Kinesis:
Streaming Data Done the AWS Way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no up-front costs
Elastically scalable
Right services for your specific use cases
Easy to provision, deploy, and manage
Real-time latencies
Zero administration: Capture and deliver streaming data into S3, Redshift, and other
destinations without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for
delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Scales to match data throughput without intervention
Capture and submit
streaming data to
Firehose
Firehose loads streaming data
continuously into S3 and
Redshift
Analyze streaming data using your
favorite BI tools
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon
Redshift
Amazon Kinesis Firehose
Customer Experience
1. Delivery Stream: The underlying entity of Firehose. Use Firehose
by creating a delivery stream to a specified destination and send
data to it.
• You do not have to create a stream or provision shards.
• You do not have to specify partition keys.
2. Records: The data producer sends data blobs as large as 1,000
KB to a delivery stream. That data blob is called a record.
3. Data Producers: Producers send records to a delivery stream.
For example, a web server that sends log data to a delivery stream
is a data producer.
Amazon Kinesis Firehose
3 Simple Concepts
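Since producers are responsible for keeping each data blob within the 1,000 KB record limit described above, a simple pre-send check can reject oversized blobs early. This is a hypothetical helper, not part of any SDK:

```python
# Hypothetical pre-send check for the Firehose record-size limit
# described above (data blobs up to 1,000 KB per record).

MAX_RECORD_BYTES = 1000 * 1024  # 1,000 KB

def is_valid_record(data: bytes) -> bool:
    """Return True if the blob fits within a single Firehose record."""
    return 0 < len(data) <= MAX_RECORD_BYTES

# A small log line fits; an empty or multi-megabyte blob does not.
```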
Amazon Kinesis Firehose Console Experience
Unified Console Experience for Firehose and Streams
Amazon Kinesis Firehose Console Experience (S3)
Create fully managed resources for delivery without building an app
Amazon Kinesis Firehose Console Experience
Configure data delivery options simply using the console
Amazon Kinesis Firehose to
S3 Data Delivery Specifics
Data Delivery Methodology for Amazon S3
Controlling size and frequency of S3 objects
Single delivery stream delivers to single S3 bucket
Buffer size/intervals control size + frequency of delivery
• Buffer size – 1 MB to 128 MB, or
• Buffer interval - 60 to 900 seconds
• Firehose concatenates records into a single larger object
• Condition satisfied first triggers data delivery
Compress data after data is buffered
• GZIP, ZIP, SNAPPY
• Delivered S3 objects might be smaller than total data Put into Firehose
• For Amazon Redshift, GZIP is currently the only supported format
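The "condition satisfied first" rule above can be sketched as a simple predicate; the parameter names and default values here are illustrative, within the documented ranges:

```python
# Sketch of Firehose's buffering rule: delivery to S3 triggers on
# whichever condition is satisfied first — buffer size (1 MB-128 MB)
# or buffer interval (60-900 seconds). Defaults below are illustrative.

def should_flush(buffered_bytes: int, seconds_since_last_flush: float,
                 buffer_size_mb: int = 5, buffer_interval_s: int = 300) -> bool:
    """Return True when either the size or the interval condition fires."""
    return (buffered_bytes >= buffer_size_mb * 1024 * 1024
            or seconds_since_last_flush >= buffer_interval_s)
```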
Data Delivery Methodology for Amazon S3
AWS IAM Roles and Encryption
Firehose needs an IAM role to access your S3 bucket
• Delivery stream assumes specified role to gain access to
target S3 bucket
Encrypt data with AWS Key Management Service (KMS)
• AWS KMS is a managed service to create and control the keys used
for data encryption
• Uses Hardware Security Modules (HSMs) to protect keys
• Firehose passes the KMS key ID to S3, which uses its existing KMS integration
• S3 uses bucket and object name as encryption context
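As an illustration of the role assumption above, the role's trust policy must allow the Firehose service to assume it. This is a hedged example; the role would additionally need S3 permissions (e.g. put-object-style actions) on the target bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "firehose.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
```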
Data Delivery Methodology for Amazon S3
S3 Object Name Format
Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH
before putting objects to Amazon S3. Each ‘/’ is a sub-folder
Specify S3-prefix to get your own top-level folder
• Modify this by adding your own top-level folder with ‘/’ to get
myApp/YYYY/MM/DD/HH
• Or simply prepend text without a ‘/’ to get myapp YYYY/MM/DD/HH
Expect the following naming pattern: “DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString”
• DeliveryStreamVersion == 1 if delivery stream config is not updated
• DeliveryStreamVersion increases by 1 for every config change
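Putting the prefix, time folders, and object-name pattern together, an illustrative reconstruction of a full S3 key (the random suffix and exact separator placement are assumptions):

```python
from datetime import datetime

def s3_object_key(stream_name: str, version: int, ts: datetime,
                  prefix: str = "", random_suffix: str = "AbCdEf123") -> str:
    """Illustrative reconstruction of the naming scheme above:
    [prefix]YYYY/MM/DD/HH/StreamName-Version-YYYY-MM-DD-HH-MM-SS-Random."""
    folder = ts.strftime("%Y/%m/%d/%H")          # UTC time prefix, '/' = sub-folder
    name = "{}-{}-{}-{}".format(stream_name, version,
                                ts.strftime("%Y-%m-%d-%H-%M-%S"), random_suffix)
    return "{}{}/{}".format(prefix, folder, name)
```

For example, with prefix `myApp/` and version 1, a record buffered at 13:45:09 UTC on 2015-10-07 would land under `myApp/2015/10/07/13/`.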
Data Delivery Methodology for Amazon S3
If Firehose cannot reach your S3 bucket
If the S3 bucket is not reachable, Firehose retries every
5 seconds until the issue is resolved.
Firehose durably stores your data for 24 hours
Resumes delivery (per configuration) as soon as your S3 bucket
becomes available, and catches up
Data can be lost if the bucket is not accessible for > 24 hours
Amazon Kinesis Firehose
to Redshift
Amazon Kinesis Firehose to Redshift
A two-step process
1. Use customer-provided S3 bucket as an intermediate destination
• Still the most efficient way to do large-scale loads into Redshift
• Never lose data: it is always safe and available in your S3 bucket
2. Firehose issues the customer-provided COPY command
synchronously. It issues the next COPY command only after the
previous one has finished and been acknowledged by Redshift.
Amazon Kinesis Firehose Console
Configure data delivery to Redshift simply using the console
Amazon Kinesis Firehose to Redshift
Data Delivery Specifics
Data Delivery Methodology for Amazon Redshift
2-step process executed on your behalf
Use customer-provided S3 bucket as an intermediate
destination
• Still the most efficient way to do large-scale loads to Redshift
• Never lose data, always safe, and available in your S3 bucket
Firehose issues customer-provided Redshift COPY
command synchronously
• Loads data from intermediate-S3 bucket to Redshift cluster
• Single delivery stream loads into a single Redshift cluster,
database, and table
Data Delivery Methodology for Amazon Redshift
Frequency of loads into Redshift Cluster
Based on delivery to S3 – buffer size and interval
Redshift COPY command frequency
• Firehose issues the next COPY command as soon as the
previous one is acknowledged by Redshift.
• Frequency of COPYs from S3 to Redshift is determined by how
fast your cluster can finish the COPY command
For efficiency, Firehose uses manifest copy
• Intermediate S3 bucket contains ‘manifests’ folder – that holds
a manifest of files that are to be copied into Redshift
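The manifest files mentioned above can be sketched as follows. The `entries`/`url`/`mandatory` layout follows Redshift's documented COPY manifest format; the helper name and bucket paths are hypothetical:

```python
import json

def build_manifest(s3_urls):
    """Sketch of a Redshift COPY manifest like those Firehose writes to
    the 'manifests' folder: each entry points at one staged S3 object."""
    return json.dumps(
        {"entries": [{"url": u, "mandatory": True} for u in s3_urls]},
        indent=2)
```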
Data Delivery Methodology for Amazon Redshift
Redshift COPY command mechanics
Firehose issues the Redshift COPY command with zero
error tolerance. If a single record within a file fails, the
whole batch of files within the COPY command fails
Firehose does not perform partial file or batch copies to
your Amazon Redshift table
Skipped objects' information is delivered to the S3 bucket as a
manifest file in the errors folder
Data Delivery Methodology for Amazon Redshift
If Redshift cluster is not accessible
If Firehose cannot reach the cluster, it retries every 5 minutes
for up to a total of 60 minutes
If the cluster is still not reachable after 60 minutes, Firehose skips
the current batch of S3 objects and moves on to the next
As before, skipped objects' information is delivered to the S3
bucket as a manifest file in the errors folder. Use this info for
manual backfill as appropriate
Amazon Kinesis Agent
Software agent makes submitting data to Firehose easy
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Delivers all data in a reliable, timely, and simple manner
• Emits AWS CloudWatch metrics to help you better monitor and
troubleshoot the streaming process
Supported on Amazon Linux AMI 2015.09 or later, or Red Hat
Enterprise Linux 7 or later. Install on Linux-based server
environments such as web servers, front ends, log servers, and more
• Also enabled for Streams
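As an illustration of the file-monitoring setup above, a minimal agent configuration might look like the following. This is a hedged sketch: the key names follow the agent's flow-based configuration file (typically `/etc/aws-kinesis/agent.json`), and the log path and stream name are placeholders:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/myapp/*.log",
      "deliveryStream": "my-delivery-stream"
    }
  ]
}
```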
CreateDeliveryStream: Create a DeliveryStream. You specify the S3 bucket
information where your data will be delivered.
DeleteDeliveryStream: Delete a DeliveryStream.
DescribeDeliveryStream: Describe a DeliveryStream and returns configuration
information.
ListDeliveryStreams: Return a list of DeliveryStreams under this account.
UpdateDestination: Update the destination S3 bucket information for a
DeliveryStream.
PutRecord: Put a single data record (up to 1000KB) to a DeliveryStream.
PutRecordBatch: Put multiple data records (up to 500 records OR 5MBs) to a
DeliveryStream.
Amazon Kinesis Firehose API Overview
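Because PutRecordBatch accepts at most 500 records or 5 MB per call (per the overview above), producers typically chunk their records before sending. A local, illustrative chunker (each yielded batch would then go into one PutRecordBatch request, e.g. via an AWS SDK Firehose client):

```python
# Local sketch of batching for PutRecordBatch: up to 500 records
# OR 5 MB per call. Constants mirror the documented limits.

MAX_BATCH_RECORDS = 500
MAX_BATCH_BYTES = 5 * 1024 * 1024

def batch_records(records):
    """Split an iterable of byte blobs into PutRecordBatch-sized batches."""
    batch, batch_bytes = [], 0
    for rec in records:
        # Start a new batch when adding this record would break a limit.
        if batch and (len(batch) >= MAX_BATCH_RECORDS
                      or batch_bytes + len(rec) > MAX_BATCH_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(rec)
        batch_bytes += len(rec)
    if batch:
        yield batch
```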
Amazon Kinesis Firehose Pricing
Simple, pay-as-you-go and no up-front costs
Per 1 GB of data ingested: $0.035
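A quick worked example of the pay-as-you-go rate above (the helper name is illustrative):

```python
# Worked example of Firehose pricing: $0.035 per GB of data ingested.

PRICE_PER_GB = 0.035

def monthly_cost(gb_ingested: float) -> float:
    """Ingestion cost in USD, rounded to cents."""
    return round(gb_ingested * PRICE_PER_GB, 2)

# e.g. ingesting 1 TB (1,024 GB) in a month costs monthly_cost(1024).
```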
Amazon Kinesis Firehose
Metrics and Troubleshooting
Incoming.Bytes: The number of bytes ingested into the Firehose delivery stream.
Incoming.Records: The number of records ingested into the Firehose delivery stream.
DeliveryToS3.Bytes: The number of bytes delivered to Amazon S3.
DeliveryToS3.DataFreshness: Age of the oldest record in Firehose. Any record older than this age has been delivered to the S3 bucket.
DeliveryToS3.Records: The number of records delivered to Amazon S3.
DeliveryToS3.Success: Sum of successful Amazon S3 put commands over the sum of all Amazon S3 put commands.
Amazon Kinesis Firehose Monitoring
Load to S3 specific metrics
DeliveryToRedshift.Bytes: The number of bytes copied to Amazon Redshift.
DeliveryToRedshift.Records: The number of records copied to Amazon Redshift.
DeliveryToRedshift.Success: Sum of successful Amazon Redshift COPY commands over the sum of all Amazon Redshift COPY commands.
Amazon Kinesis Firehose Monitoring
Load to Redshift specific metrics
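The metrics above can be pulled from CloudWatch. The sketch below only builds the request parameters (the namespace and dimension name are assumptions based on standard CloudWatch conventions; the period and statistic are arbitrary choices), with the actual SDK call left commented:

```python
from datetime import datetime, timedelta

def firehose_metric_params(stream_name, metric_name, minutes=60):
    """Build parameters for a CloudWatch get_metric_statistics call
    against a Firehose delivery-stream metric (names per the tables above)."""
    end = datetime.utcnow()
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,           # 5-minute datapoints (assumption)
        "Statistics": ["Sum"],
    }

# Where it would be used (requires AWS credentials):
# import boto3
# cw = boto3.client("cloudwatch")
# stats = cw.get_metric_statistics(**firehose_metric_params(
#     "my-delivery-stream", "DeliveryToRedshift.Success"))
```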
Amazon Kinesis Firehose Troubleshooting
Is data flowing end-to-end?
Key question: Is data flowing through Firehose from producers to the
S3 destination end-to-end?
Key metrics: Incoming.Bytes, and Incoming.Records metrics along
with DeliveryToS3.Success and DeliveryToRedshift.Success can
be used to confirm that data is flowing end-to-end
Additionally, check the Put* API Bytes, Latency, and Request metrics
for Firehose and/or S3
Amazon Kinesis Firehose Troubleshooting
What is being captured, and what is being delivered?
Key question: How much data is being captured and delivered to the
destination?
Key metrics: Put* Records, Bytes, and Requests to determine the
data volume captured by Firehose. Next check DeliveryToS3.Bytes/
Records and DeliverytoRedshift.Bytes/Records
Additionally, check the S3 bucket-related metrics above to confirm the
data volume delivered to S3. Verify with Incoming.Records and
Incoming.Bytes
Amazon Kinesis Firehose Troubleshooting
Is my data showing up on time?
Key question: Is Firehose delivering data in a timely way?
Key metrics: DeliveryToS3.DataFreshness, in seconds, helps
determine the freshness of data. It shows the age of the oldest
undelivered record in Firehose.
Any record older than this age has been delivered into S3.
Amazon Kinesis Firehose Troubleshooting
Is something wrong with my Firehose or bucket or cluster?
Key question: Is something wrong with Firehose, or with the customer's
own configuration, such as the S3 bucket?
Key metrics: The DeliveryToS3.Success metric computes the sum of
successful S3 Puts over the sum of all S3 Puts managed by Firehose.
Similarly, the DeliveryToRedshift.Success metric computes the sum of
successful COPY commands over the sum of all COPY commands.
If this trends below 1, it suggests potential issues with the S3 bucket
configuration such that Firehose puts to S3 are failing.
Amazon Kinesis Firehose Troubleshooting
Key steps if data isn’t showing up in S3
Check Incoming.Bytes and Incoming.Records metrics to ensure that data
is Put successfully.
Check DeliveryToS3.Success metric to ensure that Firehose is putting
data to your S3 bucket.
Ensure that the S3 bucket specified in your delivery stream still exists.
Ensure that the IAM role specified in your delivery stream has access to
your S3 bucket.
Amazon Kinesis Firehose Troubleshooting
Key steps if data isn’t showing up in Redshift
Check Incoming.Bytes and Incoming.Records metrics to ensure that data
is Put successfully, and the DeliveryToS3.Success metric to ensure that
data is being staged in the S3 bucket.
Check DeliveryToRedshift.Success metric to ensure that Firehose has
copied data from your S3 bucket to the cluster.
Check the Redshift STL_LOAD_ERRORS table to verify the reason for
the COPY failure.
Ensure that the Redshift configuration in Firehose is accurate.
Amazon Kinesis Firehose
or
Amazon Kinesis Streams?
Amazon Kinesis Streams is a service for workloads that
require custom processing, per incoming record, with
sub-1-second processing latency, and a choice of stream
processing frameworks.
Amazon Kinesis Firehose is a service for workloads that
require zero administration, the ability to use existing
analytics tools based on S3 or Redshift, and a data
latency of 60 seconds or higher.
Amazon Kinesis
Streaming Data Made Easy
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on
AWS
Amazon Kinesis
Streams
Build your own custom
applications that process or
analyze streaming data
Amazon Kinesis
Firehose
Easily load massive volumes
of streaming data into
Amazon S3 and Redshift
Thank You
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

AWS October Webinar Series - Introducing Amazon Kinesis Firehose

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. October 2015 Streaming Data Flows with Amazon Kinesis Firehose Adi Krishnan, Product Manager Amazon Kinesis
  • 2. What to Expect From This Session Amazon Kinesis Firehose Key concepts Firehose Experience for S3 Firehose Experience for Redshift Data delivery patterns Putting data using the Amazon Kinesis Agent Key API Overview Pricing Firehose Troubleshooting
  • 3. Amazon Kinesis: Streaming Data Done the AWS Way Makes it easy to capture, deliver, and process real-time data streams Pay as you go, no up-front costs Elastically scalable Right services for your specific use cases Easy to provision, deploy, and manage Real-time latencies
  • 4. Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
    Zero administration: Capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure.
    Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.
    Seamless elasticity: Scales seamlessly to match data throughput without intervention.
    Capture and submit streaming data to Firehose; Firehose loads streaming data continuously into S3 and Redshift; analyze streaming data using your favorite BI tools.
  • 6. Amazon Kinesis Firehose: 3 Simple Concepts
    1. Delivery Stream: The underlying entity of Firehose. Use Firehose by creating a delivery stream to a specified destination and sending data to it.
    • You do not have to create a stream or provision shards.
    • You do not have to specify partition keys.
    2. Records: The data producer sends data blobs as large as 1,000 KB to a delivery stream. Each data blob is called a record.
    3. Data Producers: Producers send records to a delivery stream. For example, a web server that sends log data to a delivery stream is a data producer.
  • 7. Amazon Kinesis Firehose Console Experience Unified Console Experience for Firehose and Streams
  • 8. Amazon Kinesis Firehose Console Experience (S3) Create fully managed resources for delivery without building an app
  • 9. Amazon Kinesis Firehose Console Experience Configure data delivery options simply using the console
  • 10. Amazon Kinesis Firehose to S3 Data Delivery Specifics
  • 11. Data Delivery Methodology for Amazon S3: Controlling the size and frequency of S3 objects
    A single delivery stream delivers to a single S3 bucket.
    Buffer size and buffer interval control the size and frequency of delivery:
    • Buffer size – 1 MB to 128 MB, or
    • Buffer interval – 60 to 900 seconds
    • Firehose concatenates records into a single larger object
    • Whichever condition is satisfied first triggers data delivery
    Data is compressed after it is buffered:
    • GZIP, ZIP, SNAPPY
    • Delivered S3 objects might be smaller than the total data put into Firehose
    • For Amazon Redshift, GZIP is currently the only supported format
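The "whichever condition is satisfied first" buffering rule can be sketched as a small helper. This is an illustration of the rule as described on the slide, not Firehose's internal implementation; the function name and defaults are hypothetical.

```python
def should_flush(buffered_bytes, seconds_since_last_flush,
                 buffer_size_mb=5, buffer_interval_s=300):
    """Return True when either configured buffer condition is satisfied.

    Firehose delivers the buffered batch to S3 as soon as EITHER the
    buffer-size hint or the buffer-interval hint is reached.
    """
    # Firehose accepts a buffer size of 1 MB to 128 MB
    assert 1 <= buffer_size_mb <= 128
    # ...and a buffer interval of 60 to 900 seconds
    assert 60 <= buffer_interval_s <= 900
    return (buffered_bytes >= buffer_size_mb * 1024 * 1024
            or seconds_since_last_flush >= buffer_interval_s)
```

Note the interplay with compression: the size hint applies to data before compression, which is why delivered GZIP/ZIP/SNAPPY objects can be smaller than the configured buffer size.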
  • 12. Data Delivery Methodology for Amazon S3: AWS IAM Roles and Encryption
    Firehose needs an IAM role to access your S3 bucket:
    • The delivery stream assumes the specified role to gain access to the target S3 bucket
    Encrypt data with AWS Key Management Service (KMS):
    • AWS KMS is a managed service to create and control the keys used for data encryption
    • Uses Hardware Security Modules (HSMs) to protect keys
    • Firehose passes the KMS key ID to S3, which uses its existing KMS integration
    • S3 uses the bucket and object name as the encryption context
  • 13. Data Delivery Methodology for Amazon S3: S3 Object Name Format
    Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH before putting objects to Amazon S3. Each ‘/’ creates a sub-folder.
    Specify an S3 prefix to get your own top-level folder:
    • Add your own top-level folder ending with ‘/’ to get myApp/YYYY/MM/DD/HH
    • Or simply prepend text without a ‘/’ to get myappYYYY/MM/DD/HH
    Expect the following naming pattern: “DeliveryStreamName-DeliveryStreamVersion-YYYY-MM-DD-HH-MM-SS-RandomString”
    • DeliveryStreamVersion == 1 if the delivery stream configuration has not been updated
    • DeliveryStreamVersion increases by 1 for every configuration change
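To make the naming pattern concrete, here is a sketch that assembles an object key from the pieces described above. This is a reconstruction for illustration only; the random suffix and helper name are hypothetical, and Firehose generates the real keys itself.

```python
from datetime import datetime


def s3_object_key(stream_name, version, when, s3_prefix="", random_suffix="abc123"):
    """Assemble an S3 key following the pattern described on the slide:

    <s3_prefix>YYYY/MM/DD/HH/<StreamName>-<Version>-YYYY-MM-DD-HH-MM-SS-<Random>

    `when` is a UTC datetime; `random_suffix` stands in for the random
    string Firehose appends.
    """
    time_prefix = when.strftime("%Y/%m/%d/%H")          # one sub-folder per '/'
    object_name = "{}-{}-{}-{}".format(
        stream_name, version, when.strftime("%Y-%m-%d-%H-%M-%S"), random_suffix)
    return "{}{}/{}".format(s3_prefix, time_prefix, object_name)
```

For example, a stream named `myStream` at version 1 with the prefix `myApp/` would produce keys like `myApp/2015/10/07/13/myStream-1-2015-10-07-13-45-30-abc123`.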
  • 14. Data Delivery Methodology for Amazon S3: If Firehose cannot reach your S3 bucket
    If the Amazon S3 bucket is not reachable by Firehose, Firehose retries access every 5 seconds until the issue is resolved.
    Firehose durably stores your data for 24 hours.
    Delivery resumes (per configuration) as soon as your S3 bucket becomes available, and Firehose catches up.
    Data can be lost if the bucket is not accessible for more than 24 hours.
  • 16. Amazon Kinesis Firehose to Redshift: A two-step process
    1. Use a customer-provided S3 bucket as an intermediate destination.
    • Still the most efficient way to do large-scale loads to Redshift.
    • Never lose data: it is always safe and available in your S3 bucket.
  • 17. Amazon Kinesis Firehose to Redshift: A two-step process
    1. Use a customer-provided S3 bucket as an intermediate destination.
    • Still the most efficient way to do large-scale loads to Redshift.
    • Never lose data: it is always safe and available in your S3 bucket.
    2. Firehose issues the customer-provided COPY command synchronously. It continuously issues a new COPY command once the previous COPY command has finished and been acknowledged by Redshift.
  • 18. Amazon Kinesis Firehose Console Configure data delivery to Redshift simply using the console
  • 19. Amazon Kinesis Firehose to Redshift Data Delivery Specifics
  • 20. Data Delivery Methodology for Amazon Redshift: A 2-step process executed on your behalf
    Use a customer-provided S3 bucket as an intermediate destination:
    • Still the most efficient way to do large-scale loads to Redshift
    • Never lose data: it is always safe and available in your S3 bucket
    Firehose issues the customer-provided Redshift COPY command synchronously:
    • Loads data from the intermediate S3 bucket to the Redshift cluster
    • A single delivery stream loads into a single Redshift cluster, database, and table
  • 21. Data Delivery Methodology for Amazon Redshift: Frequency of loads into the Redshift cluster
    Based on delivery to S3 – buffer size and interval.
    Redshift COPY command frequency:
    • Firehose continuously issues the Redshift COPY command once the previous one is acknowledged by Redshift.
    • The frequency of COPYs from S3 to Redshift is determined by how fast your cluster can finish the COPY command.
    For efficiency, Firehose uses manifest copy:
    • The intermediate S3 bucket contains a ‘manifests’ folder that holds a manifest of the files to be copied into Redshift.
  • 22. Data Delivery Methodology for Amazon Redshift: Redshift COPY command mechanics
    Firehose issues the Redshift COPY command with zero error tolerance. If a single record within a file fails, the whole batch of files within the COPY command fails.
    Firehose does not perform partial file or batch copies to your Amazon Redshift table.
    Information about skipped objects is delivered to the S3 bucket as a manifest file in the errors folder.
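A manifest-based COPY of the kind Firehose issues looks roughly like the statement built below. The exact COPY command is customer-provided, so treat this as an illustrative sketch: the table name, manifest path, and role ARN are hypothetical, and only the MANIFEST and GZIP options from the slides are shown.

```python
def redshift_copy_command(table, manifest_s3_path, iam_role_arn, gzip=True):
    """Build a manifest-based Redshift COPY statement.

    MANIFEST tells Redshift to read the S3 object at `manifest_s3_path`
    as a list of files to load; GZIP matches the only compression format
    Firehose currently supports for Redshift delivery.
    """
    cmd = "COPY {} FROM '{}' CREDENTIALS 'aws_iam_role={}' MANIFEST".format(
        table, manifest_s3_path, iam_role_arn)
    if gzip:
        cmd += " GZIP"
    return cmd + ";"
```

Because the command runs with zero error tolerance, a single bad record fails the whole manifest's batch, which is why the errors-folder manifest matters for backfill.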
  • 23. Data Delivery Methodology for Amazon Redshift: If the Redshift cluster is not accessible
    If Firehose cannot reach the cluster, it retries every 5 minutes for up to a total of 60 minutes.
    If the cluster is still not reachable after 60 minutes, Firehose skips the current batch of S3 objects and moves on to the next.
    As before, information about skipped objects is delivered to the S3 bucket as a manifest file in the errors folder. Use this information for manual backfill as appropriate.
  • 24. Amazon Kinesis Agent
    Software agent makes submitting data to Firehose easy:
    • Monitors files and sends new data records to your delivery stream
    • Handles file rotation, checkpointing, and retry upon failures
    • Delivers all data in a reliable, timely, and simple manner
    • Emits AWS CloudWatch metrics to help you better monitor and troubleshoot the streaming process
    Supported on Amazon Linux AMI version 2015.09 or later, or Red Hat Enterprise Linux version 7 or later.
    Install on Linux-based server environments such as web servers, front ends, log servers, and more.
    • Also enabled for Streams
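A minimal agent configuration of the kind described above might look like the following (typically placed at /etc/aws-kinesis/agent.json). The file pattern and delivery stream name here are hypothetical examples; consult the agent documentation for the full option set.

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/app.log*",
      "deliveryStream": "my-delivery-stream"
    }
  ]
}
```

Each entry in `flows` maps a monitored file pattern to a Firehose delivery stream; the `filePattern` wildcard is what lets the agent follow rotated log files.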
  • 25. Amazon Kinesis Firehose API Overview
    CreateDeliveryStream: Create a delivery stream, specifying the S3 bucket to which your data will be delivered.
    DeleteDeliveryStream: Delete a delivery stream.
    DescribeDeliveryStream: Describe a delivery stream and return its configuration information.
    ListDeliveryStreams: Return a list of delivery streams under this account.
    UpdateDestination: Update the destination S3 bucket information for a delivery stream.
    PutRecord: Put a single data record (up to 1,000 KB) to a delivery stream.
    PutRecordBatch: Put multiple data records (up to 500 records or 5 MB) to a delivery stream.
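To stay within the PutRecordBatch limits listed above, a producer has to chunk its records. The helper below is a pure-Python sketch of that chunking (names are hypothetical); the boto3 call at the bottom is commented out because it requires AWS credentials and a real delivery stream.

```python
# Limits from the API overview above: at most 500 records or 5 MB per call.
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 5 * 1024 * 1024


def batch_records(records):
    """Yield lists of byte-string records, each list within the batch limits."""
    batch, batch_bytes = [], 0
    for rec in records:
        size = len(rec)
        # Start a new batch if adding this record would exceed either limit.
        if batch and (len(batch) >= MAX_RECORDS_PER_BATCH
                      or batch_bytes + size > MAX_BYTES_PER_BATCH):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(rec)
        batch_bytes += size
    if batch:
        yield batch


# Usage sketch (requires credentials and an existing delivery stream):
# import boto3
# firehose = boto3.client("firehose")
# for batch in batch_records(my_records):
#     firehose.put_record_batch(
#         DeliveryStreamName="my-delivery-stream",
#         Records=[{"Data": r} for r in batch])
```

A real producer should also inspect `FailedPutCount` in the PutRecordBatch response and retry the failed records.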
  • 26. Amazon Kinesis Firehose Pricing: Simple, pay-as-you-go, and no up-front costs
    Per 1 GB of data ingested: $0.035
  • 27. Amazon Kinesis Firehose Metrics and Troubleshooting
  • 28. Amazon Kinesis Firehose Monitoring: Load-to-S3-specific metrics
    IncomingBytes: The number of bytes ingested into the Firehose delivery stream.
    IncomingRecords: The number of records ingested into the Firehose delivery stream.
    DeliveryToS3.Bytes: The number of bytes delivered to Amazon S3.
    DeliveryToS3.DataFreshness: Age of the oldest record in Firehose. Any record older than this age has been delivered to the S3 bucket.
    DeliveryToS3.Records: The number of records delivered to Amazon S3.
    DeliveryToS3.Success: Sum of successful Amazon S3 put commands over the sum of all Amazon S3 put commands.
  • 29. Amazon Kinesis Firehose Monitoring: Load-to-Redshift-specific metrics
    DeliveryToRedshift.Bytes: The number of bytes copied to Amazon Redshift.
    DeliveryToRedshift.Records: The number of records copied to Amazon Redshift.
    DeliveryToRedshift.Success: Sum of successful Amazon Redshift COPY commands over the sum of all Amazon Redshift COPY commands.
  • 30. Amazon Kinesis Firehose Troubleshooting: Is data flowing end-to-end?
    Key question: Is data flowing through Firehose from producers to the S3 destination end-to-end?
    Key metrics: IncomingBytes and IncomingRecords, along with DeliveryToS3.Success and DeliveryToRedshift.Success, can be used to confirm that data is flowing end-to-end.
    Additionally, check the Put* API Bytes, Latency, and Request metrics for Firehose and/or S3.
  • 31. Amazon Kinesis Firehose Troubleshooting: What is being captured, and what is being delivered?
    Key question: How much data is being captured and delivered to the destination?
    Key metrics: Check the Put* Records, Bytes, and Requests metrics to determine the data volume captured by Firehose. Next, check DeliveryToS3.Bytes/Records and DeliveryToRedshift.Bytes/Records.
    Additionally, check the S3 bucket-related metrics above to confirm the data volume delivered to S3, and verify with IncomingRecords and IncomingBytes.
  • 32. Amazon Kinesis Firehose Troubleshooting: Is my data showing up ‘on time’?
    Key question: Is Firehose delivering data in a timely way?
    Key metric: DeliveryToS3.DataFreshness, in seconds, helps determine the freshness of data. It shows the maximum age of the oldest undelivered record in Firehose; any record older than this age has been delivered into S3.
  • 33. Amazon Kinesis Firehose Troubleshooting: Is something wrong with my Firehose, bucket, or cluster?
    Key question: Is something wrong with Firehose, or with the customer’s own configuration, such as the S3 bucket?
    Key metrics: The DeliveryToS3.Success metric computes the sum of successful S3 puts over the sum of all S3 puts managed by Firehose. Similarly, the DeliveryToRedshift.Success metric computes the sum of successful COPY commands over the sum of all COPY commands. If either trends below 1, it suggests a potential configuration issue – for example, an S3 bucket misconfiguration that causes Firehose puts to S3 to fail.
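The Success metrics are simple ratios, so the "trending below 1" check can be expressed directly. A minimal sketch (hypothetical helper; in practice you would read the metric values from CloudWatch):

```python
def delivery_success_ratio(successful_ops, total_ops):
    """Success ratio as defined above: successful puts (or COPYs) over all
    attempts. Returns None when there were no attempts in the period."""
    if total_ops == 0:
        return None
    return successful_ops / float(total_ops)


def looks_healthy(ratio):
    """A ratio of exactly 1.0 means every delivery attempt succeeded;
    anything lower suggests a bucket, role, or cluster configuration issue."""
    return ratio is not None and ratio >= 1.0
```

The same check applies to both DeliveryToS3.Success and DeliveryToRedshift.Success.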
  • 34. Amazon Kinesis Firehose Troubleshooting: Key steps if data isn’t showing up in S3
    Check the IncomingBytes and IncomingRecords metrics to ensure that data is being put successfully.
    Check the DeliveryToS3.Success metric to ensure that Firehose is putting data to your S3 bucket.
    Ensure that the S3 bucket specified in your delivery stream still exists.
    Ensure that the IAM role specified in your delivery stream has access to your S3 bucket.
  • 35. Amazon Kinesis Firehose Troubleshooting: Key steps if data isn’t showing up in Redshift
    Check the IncomingBytes and IncomingRecords metrics to ensure that data is being put successfully, and the DeliveryToS3.Success metric to ensure that data is being staged in the S3 bucket.
    Check the DeliveryToRedshift.Success metric to ensure that Firehose has copied data from your S3 bucket to the cluster.
    Check the Redshift STL_LOAD_ERRORS table to verify the reason for the COPY failure.
    Ensure that the Redshift configuration in Firehose is accurate.
  • 37. Amazon Kinesis Streams is a service for workloads that require custom processing, per incoming record, with sub-1-second processing latency and a choice of stream processing frameworks. Amazon Kinesis Firehose is a service for workloads that require zero administration, the ability to use existing analytics tools based on S3 or Redshift, and a data latency of 60 seconds or higher.
  • 39. Amazon Kinesis: Streaming data made easy
    Services make it easy to capture, deliver, and process streams on AWS.
    Amazon Kinesis Streams: Build your own custom applications that process or analyze streaming data.
    Amazon Kinesis Firehose: Easily load massive volumes of streaming data into Amazon S3 and Redshift.