Real-Time Data Processing
Using AWS Lambda
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Chris Marshall, Solutions Architect
10/10/2017
SMC303
Agenda
What’s Serverless Real-Time Data Processing?
Serverless Processing of Real-Time Streaming Data
Example Kinesis/Lambda Pipeline
Streaming Demo
Customer Story: Fannie Mae-Distributed Computing with Lambda
What’s Serverless Real-Time
Data Processing?
AWS Lambda
Bring your own code
• Node.js, Java, Python, C#
• Bring your own libraries (even native ones)
Simple resource model
• Select power rating from 128 MB to 1.5 GB
• CPU and network allocated proportionately
Flexible use
• Synchronous or asynchronous
• Integrated with other AWS services
Flexible authorization
• Securely grant access to resources and VPCs
• Fine-grained control for invoking your functions
Lambda Event Sources
Categories: data stores, endpoints, configuration repositories, event/message services
• Amazon S3
• Amazon DynamoDB
• Amazon Kinesis
• AWS CloudFormation
• AWS CloudTrail
• Amazon CloudWatch
• Amazon Cognito
• Amazon SNS
• Amazon SES
• Cron events
• AWS CodeCommit
• Amazon API Gateway
• Amazon Alexa
• AWS IoT
• AWS Step Functions
… more on the way!
Lambda Real-Time Event Sources: How It Works
ASYNCHRONOUS PUSH MODEL (Amazon S3, Amazon SNS)
• Mapping owned by event source
• Invokes Lambda via event source API
• Resource-based policy permissions
• Concurrent executions
• Async invocation
SYNCHRONOUS PUSH MODEL (Amazon Alexa, AWS IoT)
• Mapping owned by event source
• Invokes Lambda via event source API
• Resource-based policy permissions
• Concurrent executions
• Sync invocation
STREAM PULL MODEL (Amazon DynamoDB, Amazon Kinesis)
• Mapping owned by Lambda
• Lambda polls the streams
• Lambda function invokes when new records found on stream
• Lambda execution role policy permissions
• Sync invocation
Serverless Real-Time Data Processing Is...
[Diagram: capture, process, output]
• Capture data streams (EVENT SOURCE): IoT data, financial data, log data, clickstream data
• Process data streams (FUNCTION): Node.js, Python, Java, C#; no servers to provision or manage
• Output data: DATABASE, CLOUD SERVICES
Serverless Processing of
Real-Time Streaming Data
Amazon Kinesis
Amazon Kinesis Offering: Managed services for streaming data
ingestion and processing.
• Amazon Kinesis Streams: Build applications that process or
analyze streaming data.
• Amazon Kinesis Firehose: Load massive volumes of
streaming data into Amazon S3, Amazon Redshift, and
Amazon Elasticsearch Service.
• Amazon Kinesis Analytics: Analyze data streams using SQL
queries.
Easy to use: Focus on quickly launching data streaming
applications instead of managing infrastructure.
Real-Time: Collect real-time data streams and promptly
respond to key business events and operational triggers
with real-time latencies.
Processing Real-Time Streams: Lambda + Amazon Kinesis
Streaming data sent to Amazon
Kinesis and stored in shards
Multiple Lambda functions can be
triggered to process same Amazon
Kinesis stream for “fan out”
Lambda can process data and store
results, e.g. to DynamoDB or S3
Lambda can aggregate data to
services like Amazon Elasticsearch
Service for analytics
Lambda sends event data and
function info to Amazon CloudWatch
for capturing metrics and monitoring
[Diagram: Amazon Kinesis feeding AWS Lambda functions, with Amazon CloudWatch, Amazon DynamoDB, Amazon Elasticsearch Service, and Amazon S3]
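The processing step described above can be sketched as a small Python handler. This is a minimal illustration, not the presenter's demo code: it assumes each record's payload is JSON with a hypothetical `device` field, and `make_event` is a helper invented here to build a test event with only the fields the handler reads. The one grounded detail is that Lambda delivers each Kinesis payload base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and count records per device."""
    counts = {}
    for record in event.get("Records", []):
        # Lambda delivers the Kinesis payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        # Hypothetical processing step: "device" is an assumed field name.
        counts[item["device"]] = counts.get(item["device"], 0) + 1
    return counts


def make_event(items):
    """Build a minimal test event shaped like a Kinesis batch (subset of fields)."""
    records = []
    for item in items:
        data = base64.b64encode(json.dumps(item).encode("utf-8")).decode("ascii")
        records.append({"kinesis": {"partitionKey": item["device"], "data": data}})
    return {"Records": records}
```

Calling `handler(make_event([...]), None)` locally is enough to exercise the decoding path before wiring the function to a stream.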
Processing Streams: Set Up Amazon Kinesis Stream
Streams
Made up of Shards
Each shard ingests (writes) data at up to 1 MB/sec
Each shard emits (reads) data at up to 2 MB/sec
Each Shard supports 5 read transactions/sec
Data
All data is stored and is replayable for 24 hours (default)
Retention window can be configured up to 7 days
Partition key used to distribute PUTs across shards
Even partition key distribution optimizes throughput
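To see why partition key choice matters, here is a rough Python sketch of how keys spread across shards. Kinesis hashes the partition key with MD5 into a 128-bit hash key; the sketch additionally assumes the shards split that hash space into equal, contiguous ranges, which holds for a freshly created stream but not necessarily after splits or merges. The function names are illustrative only.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal-width hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many of the given keys land on each shard."""
    counts = [0] * shard_count
    for key in keys:
        counts[shard_for_key(key, shard_count)] += 1
    return counts
```

A single hot key sends every record to one shard, while many distinct keys spread the load; comparing `distribution(["hot"] * 100, 4)` with `distribution([f"user-{i}" for i in range(1000)], 4)` makes the difference visible.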
Best Practice
Determine an initial size (number of shards) to plan for expected maximum demand
✓ Leverage the “Help me decide how many shards I need” option in the Console
✓ Use the formula for the number of shards:
max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000)
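The shard formula above can be turned into a small helper. This sketch rounds the result up to a whole shard and enforces a minimum of one shard; both choices are assumptions added here, since the slide gives only the ratio.

```python
import math


def shards_needed(write_kb_per_sec, read_kb_per_sec):
    """Estimate shard count from peak write and read bandwidth (KB/sec)."""
    raw = max(write_kb_per_sec / 1000, read_kb_per_sec / 2000)
    # Round up to whole shards; a stream always has at least one shard.
    return max(1, math.ceil(raw))
```

For example, a stream that ingests 3,000 KB/sec and is read at 12,000 KB/sec (several consumers) is read-bound: `shards_needed(3000, 12000)` gives 6, not 3.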
Processing Streams: Create Lambda functions
Memory
CPU allocation proportional to the memory configured
Increasing memory makes your code execute faster (if CPU bound)
Increasing memory allows for larger record sizes processed
Timeout
Increasing timeout allows for longer functions, but longer wait in case of errors
Retries
With Amazon Kinesis, Lambda retries until the data expires
(i.e. after the retention period, 24 hours by default)
Permission model
Execution role defined for Lambda must have permission to access the stream
Best Practice
Write Lambda function code to be stateless
Instantiate AWS clients & database clients outside the scope of the function handler to take
advantage of connection re-use.
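The best practice above, creating clients outside the handler, can be illustrated with a stand-in. `FakeClient` is a hypothetical class invented for this sketch; in a real function its place would be taken by an AWS SDK or database client. The point is only that module-scope code runs once per execution environment, while the handler runs once per invocation.

```python
class FakeClient:
    """Stand-in for an SDK or database client; counts how often it is built."""

    instances = 0

    def __init__(self):
        FakeClient.instances += 1

    def put(self, item):
        # A real client would make a network call here.
        return {"stored": item}


# Module scope: built once per execution environment and reused while it stays warm.
CLIENT = FakeClient()


def handler(event, context):
    """Stateless handler: keeps nothing between calls, only reuses CLIENT."""
    return [CLIENT.put(record) for record in event.get("Records", [])]
```

Invoking `handler` several times in the same process leaves `FakeClient.instances` at 1, which mirrors the connection re-use the slide describes.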
Processing Streams: Configure Event Source
Amazon Kinesis mapped as event source in Lambda
Batch size
Max number of records that Lambda will send to one invocation
Not equivalent to effective batch size
Effective batch size is evaluated at each poll (about once per second), calculated as:
MIN(records available, batch size, 6 MB)
Increasing batch size allows fewer Lambda function invocations with more data
processed per function
Best Practices
Set starting position to “Trim Horizon” to read from the start of the stream (all retained data)
Set to “Latest” to read only the most recent data (new records only)
Set to “At timestamp” to pick up at a specific time
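The effective batch size rule on this slide, MIN(records available, batch size, 6 MB), mixes a record count with a byte limit. One way to read it is sketched below: take records in order until either the configured batch size or the 6 MB payload limit is reached. The function name is hypothetical, and the sketch ignores encoding overhead in the invocation payload, so treat the numbers as approximate.

```python
PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024  # 6 MB payload limit quoted on the slide


def effective_batch(record_sizes, batch_size, payload_limit=PAYLOAD_LIMIT_BYTES):
    """Return how many of the available records fit into one invocation."""
    taken = 0
    total_bytes = 0
    for size in record_sizes[:batch_size]:
        if total_bytes + size > payload_limit:
            break  # adding this record would exceed the payload limit
        total_bytes += size
        taken += 1
    return taken
```

With 40 small records available and a batch size of 100, the function returns 40; with ten 1 MiB records it stops at 6 regardless of the configured batch size.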
Processing streams: How It Works
Polling
Concurrent polling and processing per shard
Lambda polls every 1s if no records found
Will grab as much data as possible in one GetRecords call (Batch)
Batching
Batches are passed for invocation to Lambda through
function parameters
Batch size may impact duration if the Lambda function
takes longer to process more records
Sub batch in memory for invocation payload
Synchronous invocation
Batches invoked as synchronous RequestResponse type
Lambda honors Amazon Kinesis at-least-once semantics
Each shard blocks in order of synchronous invocation
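Because delivery is at least once, the same record can reach a function more than once, for example when a batch is retried. A common response is to make processing idempotent. The sketch below skips records whose `eventID` has already been handled; the in-memory `seen_ids` set is only a stand-in for a durable store, and `process_batch` and `apply` are names invented for this illustration.

```python
def process_batch(event, seen_ids, apply):
    """Apply each record at most once, keyed by its eventID."""
    applied = 0
    for record in event.get("Records", []):
        event_id = record["eventID"]
        if event_id in seen_ids:
            continue  # duplicate delivery: already handled
        apply(record)
        seen_ids.add(event_id)
        applied += 1
    return applied
```

Re-running the same batch against the same `seen_ids` applies nothing the second time, which is the behavior a redelivered batch should have.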
Processing streams: Tuning throughput
If put / ingestion rate is greater than the theoretical throughput, your
processing is at risk of falling behind
Maximum theoretical throughput
# shards * 2MB / Lambda function duration (s)
Effective theoretical throughput
# shards * batch size (MB) / Lambda function duration (s)
[Diagram: Source → Amazon Kinesis (shards) → Lambda (functions) → Destination 1, Destination 2]
• Scale Amazon Kinesis by splitting or merging shards
• Lambda will scale automatically
• Per shard, Lambda polls a batch, then waits for the response
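The two throughput formulas on this slide are easy to check with a few lines of Python. The helper names are made up for this sketch; the arithmetic is taken directly from the slide (2 MB per shard for the maximum, the batch size in MB for the effective figure).

```python
def max_throughput_mb_s(shards, duration_s):
    """Maximum theoretical throughput: # shards * 2 MB / function duration."""
    return shards * 2.0 / duration_s


def effective_throughput_mb_s(shards, batch_mb, duration_s):
    """Effective theoretical throughput: # shards * batch size (MB) / duration."""
    return shards * batch_mb / duration_s


def falling_behind(ingest_mb_s, shards, batch_mb, duration_s):
    """True when ingestion outpaces what the functions can process."""
    return ingest_mb_s > effective_throughput_mb_s(shards, batch_mb, duration_s)
```

With 4 shards, 0.5 MB batches, and a 2-second function, effective throughput is 1 MB/sec, so an ingestion rate of 1.5 MB/sec would fall behind until the function gets faster, the batch gets larger, or shards are added.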
Processing streams: Tuning Throughput w/ Retries
Retries
Will retry on execution failures until the record is expired
Throttles and errors impact duration and directly impact throughput
Best Practice
Retry with exponential back-off of up to 60s
Effective theoretical throughput with retries
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry)
[Diagram: Source → Amazon Kinesis (shards) → Lambda (functions) → Destination 1, Destination 2]
• Lambda will scale automatically
• Per shard, Lambda polls a batch and retries it on each error until it receives success
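The retry guidance on this slide can be sketched in two small functions: a back-off schedule capped at 60 seconds, and the throughput formula with retries. The 1-second base delay and the doubling factor are assumptions (the slide states only the 60-second cap), and `attempts` stands for the slide's "retries until expiry" term.

```python
def backoff_delays(attempts, base_s=1.0, cap_s=60.0):
    """Exponential back-off delays in seconds, doubling each time up to the cap."""
    return [min(cap_s, base_s * 2 ** n) for n in range(attempts)]


def throughput_with_retries(shards, batch_mb, duration_s, attempts):
    """(# shards * batch size (MB)) / (function duration (s) * attempts)."""
    return shards * batch_mb / (duration_s * attempts)
```

Under these assumptions the delays grow 1, 2, 4, ... and flatten at 60 seconds from the seventh attempt on. The second function shows the cost of retries: a batch that needs 4 attempts cuts effective throughput to a quarter.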
Processing streams: Common observations
Effective batch size may be less than configured during low throughput
Effective batch size will increase during higher throughput
Increased Lambda duration -> decreased # of invokes and GetRecords calls
Too many consumers of your stream may compete with Amazon Kinesis read
limits and induce ReadProvisionedThroughputExceeded errors and metrics
Processing streams: Monitoring with CloudWatch
Monitoring Amazon Kinesis Streams
• GetRecords: effective throughput
• PutRecord: bytes, latency, records, etc.
• GetRecords.IteratorAgeMilliseconds: how old your last processed records were
Monitoring Lambda functions
• Invocation count: number of times the function was invoked
• Duration: execution/processing time
• Error count: number of errors
• Throttle count: number of times the function was throttled
• Iterator Age: time elapsed between the final record in a batch being written to the stream and Lambda receiving that batch
Debugging
• Review all metrics
• Make custom logs
• View RAM consumed
• Search for log events
Example Kinesis/Lambda Pipeline
Kinesis/Lambda Demo
amzn.to/bigdata
Quiz: If I set my batch size to 100, each Lambda call…
A) Will get exactly 100 records
B) Will get 100 records or less
C) Will get an average of 100 records
D) Will get 95 ReadProvisionedThroughputExceeded
errors
I think this session…
A) Was really useful
B) Was too technical
C) Was not deep enough
D) When is lunch?
E) This guy is totally confusing
Kinesis/Lambda Demo
amzn.to/takeselfie
SELECT STREAM
    COUNT(*) AS MUSTACH_COUNT,
    STEP(ROWTIME BY INTERVAL '1' SECOND)
FROM SOURCE_STREAM
WHERE HAS_MUSTACH = TRUE
GROUP BY STEP(ROWTIME BY INTERVAL '1' SECOND);
End-to-End Architecture
[Diagram components: JavaScript SDK, Amazon Cognito, Amazon S3, Amazon Kinesis Stream, Amazon Kinesis Analytics, Amazon Kinesis Firehose, AWS Lambda, Amazon DynamoDB, Amazon Rekognition]
Demo architecture
[Diagram components: Amazon Cognito, S3 Bucket (HTML, JavaScript), Kinesis Ingestion Stream, Kinesis Analytics, Kinesis Aggregate Stream, Lambda Function, DynamoDB Table]
Data flow labels: raw device and quadrant data, aggregated data
Kinesis Analytics query shown in the diagram:
SELECT ROWTIME, userId, COUNT(*)
FROM STREAM
GROUP BY userId, FLOOR(ROWTIME to SECOND)
The demo application
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    UNIQUE_USER_COUNT INT, ANDROID_COUNT INT, IOS_COUNT INT, WINDOWS_PHONE_COUNT INT, OTHER_OS_COUNT INT,
    QUADRANT_A_COUNT INT, QUADRANT_B_COUNT INT, QUADRANT_C_COUNT INT, QUADRANT_D_COUNT INT,
    WINDOW_TIME TIMESTAMP);
CREATE OR REPLACE STREAM DISTINCT_USER_STREAM (
    COGNITO_ID VARCHAR(64), DEVICE VARCHAR(32), OS VARCHAR(32), QUADRANT char(1), DT TIMESTAMP);
CREATE OR REPLACE PUMP "DISTINCT_USER_PUMP" AS
INSERT INTO "DISTINCT_USER_STREAM"
SELECT STREAM DISTINCT
    "cognitoId",
    "device",
    "os",
    "quadrant",
    FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO SECOND)
FROM "SOURCE_SQL_STREAM_001";
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
    COUNT("DISTINCT_USER_STREAM".COGNITO_ID) AS UNIQUE_USER_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Android' THEN COGNITO_ID ELSE null END)) AS ANDROID_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'iOS' THEN COGNITO_ID ELSE null END)) AS IOS_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Windows Phone' THEN COGNITO_ID ELSE null END)) AS WINDOWS_PHONE_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'other' THEN COGNITO_ID ELSE null END)) AS OTHER_OS_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'A' THEN COGNITO_ID ELSE null END)) AS QUADRANT_A_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'B' THEN COGNITO_ID ELSE null END)) AS QUADRANT_B_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'C' THEN COGNITO_ID ELSE null END)) AS QUADRANT_C_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'D' THEN COGNITO_ID ELSE null END)) AS QUADRANT_D_COUNT,
    ROWTIME
FROM "DISTINCT_USER_STREAM"
GROUP BY
    FLOOR("DISTINCT_USER_STREAM".ROWTIME TO SECOND);
Serverless Data Processing with
Distributed Computing
Serverless Distributed Computing: Map-Reduce Model
Why Serverless Data Processing with Distributed
Computing?
Remove difficult infrastructure management
✓ Cluster administration
✓ Complex configuration tools
Enable simple, elastic, user-friendly distributed data processing
✓ Eliminate complexity of state management
✓ Bring distributed computing power to the masses
Serverless Distributed Computing: MapReduce
[Diagram, steps 1 to 6]
• Map phase: Driver (tracks job state), Input Bucket, Mapper Functions, mapper output
• S3 event source triggers the Coordinator
• Reduce phase: Reducer step 1 writes reducer output; the n'th reducer step is created recursively; the Final Reducer produces the Result
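The map and recursive reduce phases in this diagram can be imitated locally in a few lines of Python. This is a single-process simulation under stated assumptions, not the reference architecture's code: a word count stands in for the real workload, plain function calls stand in for Lambda invocations and S3 objects, and `fan_in` (how many partial results each reducer merges) is a parameter invented for the sketch.

```python
from collections import Counter


def mapper(chunk):
    """Map step: count the words in one input chunk."""
    return Counter(chunk.split())


def reducer(partials):
    """Reduce step: merge several partial counts into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(chunks, fan_in=2):
    """Run the map phase, then reduce in rounds until one result remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    outputs = [mapper(chunk) for chunk in chunks]  # map phase
    while len(outputs) > 1:  # each loop is one reducer step
        outputs = [
            reducer(outputs[i:i + fan_in])
            for i in range(0, len(outputs), fan_in)
        ]
    return outputs[0] if outputs else Counter()
```

Each pass of the `while` loop corresponds to one reducer step in the diagram, and the last pass plays the role of the final reducer.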
Fannie Mae
Distributed Computing with Lambda
Fannie Mae’s Serverless HPC Performance
Lambda service configuration:
• Initial burst rate = 2,000, incremental rate = 100 per
minute, throttle limit = 15,000.
• Lambda ramps up automatically from 2,000 to 15,000
concurrent executions.
Application Result:
• One simulation run of ~ 20 million mortgages takes 2
hours, >3 times faster than the existing process.
• The performance does not degrade during the ramp up
period.
• Lambdas’ CPU efficiency is close to 100%. Actual
elapsed time is consistent with the estimated elapsed
time based on Lambda billing time.
[Chart: number of new Lambda invocations every 5 minutes; maximum concurrent Lambdas = 15,000]
Complex Serverless HPC Reference Architecture
Break down a complex workload into multiple simple ones:
[Diagram: Input Bucket → Mapper Functions → mapper output → Reducers → reducer output → Final Reducer → Result]
Real-time Data Processing with
Lambda: Next Steps
Data Processing with AWS: Next steps
✓ Learn more about AWS Serverless at
https://aws.amazon.com/serverless
✓ Explore Real-time Clickstream Anomaly Detection with
Amazon Kinesis Analytics on the AWS Big Data Blog at
https://aws.amazon.com/blogs/big-data/real-time-clickstream-anomaly-detection-with-amazon-kinesis-analytics/
✓ Explore the AWS Lambda Reference Architecture on GitHub:
▪ Real-Time Streaming: https://github.com/awslabs/lambda-refarch-streamprocessing
▪ Distributed Computing Reference Architecture (serverless MapReduce):
https://github.com/awslabs/lambda-refarch-mapreduce
Data Processing with AWS: Next steps
✓ Create an Amazon Kinesis stream. Visit the Amazon Kinesis
Console and configure a stream to receive data (e.g. data from
social media feeds).
✓ Create and test a Lambda function to process streams from Amazon
Kinesis by visiting the Lambda console. First 1M requests each month
are on us!
✓ Read the Developer Guide and try the Lambda and Amazon
Kinesis Tutorial:
▪ http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
✓ Send questions, comments, feedback to the AWS Lambda Forums
Thank You!
Don’t Forget Evaluations!

AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017

  • 2. WIFI: awsDevDay | PASS: CodeHappy. UP NEXT: Real-Time Data Processing Using AWS Lambda
  • 3. THANKS TO OUR FRIENDS AT:
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Chris Marshall, Solutions Architect 10/10/2017 SMC303 Real-Time Data Processing Using AWS Lambda
  • 5. Agenda What’s Serverless Real-Time Data Processing? Serverless Processing of Real-Time Streaming Data Example Kinesis/Lambda Pipeline Streaming Demo Customer Story: Fannie Mae-Distributed Computing with Lambda
  • 7. AWS Lambda Bring your own code • Node.js, Java, Python, C# • Bring your own libraries (even native ones) Simple resource model • Select power rating from 128 MB to 1.5 GB • CPU and network allocated proportionately Flexible use • Synchronous or asynchronous • Integrated with other AWS services Flexible authorization • Securely grant access to resources and VPCs • Fine-grained control for invoking your functions
  • 8. Amazon S3 Amazon DynamoDB Amazon Kinesis AWS CloudFormation AWS CloudTrail Amazon CloudWatch Amazon Cognito Amazon SNS Amazon SES Cron events DATA STORES ENDPOINTS CONFIGURATION REPOSITORIES EVENT/MESSAGE SERVICES Lambda Event Sources … more on the way! AWS CodeCommit Amazon API Gateway Amazon Alexa AWS IoT AWS Step Functions
  • 9. Amazon DynamoDB Amazon Kinesis Amazon S3 Amazon SNS ASYNCHRONOUS PUSH MODEL STREAM PULL MODEL Lambda Real-Time Event Sources Amazon Alexa AWS IoT SYNCHRONOUS PUSH MODEL Mapping owned by Event Source Mapping owned by Lambda Invokes Lambda via Event Source API Lambda function invokes when new records found on stream Resource-based policy permissions Lambda Execution role policy permissions Concurrent executions Sync invocation Async Invocation Sync invocation Lambda polls the streams HOW IT WORKS
  • 10. Serverless Real-Time Data Processing Is.. Capture Data Streams IoT Data Financial Data Log Data No servers to provision or manage EVENT SOURCE Node.js Python Java C# Process Data Streams FUNCTION Clickstream Data Output Data DATABASE CLOUD SERVICES
  • 12. Amazon Kinesis Amazon Kinesis Offering: Managed services for streaming data ingestion and processing. • Amazon Kinesis Streams: Build applications that process or analyze streaming data. • Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Elasticsearch. • Amazon Kinesis Analytics: Analyze data streams using SQL queries. Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure. Real-Time: Collect real-time data streams and promptly respond to key business events and operational triggers. Real-time latencies.
  • 13. Processing Real-Time Streams: Lambda + Amazon Kinesis Streaming data sent to Amazon Kinesis and stored in shards Multiple Lambda functions can be triggered to process same Amazon Kinesis stream for “fan out” Lambda can process data and store results ex. to DynamoDB, S3 Lambda can aggregate data to services like Amazon Elasticsearch Service for analytics Lambda sends event data and function info to Amazon CloudWatch for capturing metrics and monitoring Amazon Kinesis AWS Lambda Amazon CloudWatch Amazon DynamoDB AWS Lambda Amazon Elasticsearch Service Amazon S3
  • 14. Processing Streams: Set Up Amazon Kinesis Stream. Streams: Made up of shards. Each shard ingests (accepts writes of) up to 1 MB/sec. Each shard emits (serves reads of) up to 2 MB/sec. Each shard supports 5 read transactions/sec. Data: All data is stored and is replayable for 24 hours (default). Retention window can be configured up to 7 days. Partition key used to distribute PUTs across shards. Even partition key distribution optimizes throughput. Best Practice: Determine an initial size/shards to plan for expected maximum demand. ✓ Leverage the “Help me decide how many shards I need” option in the Console. ✓ Use the formula for number of shards: max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000)
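The shard-count formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not an official sizing tool: the function name `shards_needed`, the rounding up to a whole shard, and the floor of one shard are assumptions added here.

```python
import math


def shards_needed(write_kb_per_sec: float, read_kb_per_sec: float) -> int:
    """Apply the slide's formula and round up to a whole number of shards."""
    by_write = write_kb_per_sec / 1000  # a shard ingests up to about 1 MB/sec
    by_read = read_kb_per_sec / 2000    # a shard emits up to about 2 MB/sec
    # Rounding up and the minimum of one shard are assumptions of this sketch.
    return max(1, math.ceil(max(by_write, by_read)))


if __name__ == "__main__":
    # 4,500 KB/sec written and 6,000 KB/sec read: writes are the bottleneck.
    print(shards_needed(4500, 6000))
```

With these inputs the write side needs 4.5 shards and the read side 3, so the larger value decides the stream size.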
  • 15. Processing Streams: Create Lambda functions. Memory: CPU allocation is proportional to the memory configured. Increasing memory makes your code execute faster (if CPU bound). Increasing memory allows larger record sizes to be processed. Timeout: Increasing the timeout allows for longer-running functions, but a longer wait in case of errors. Retries: With Amazon Kinesis, Lambda retries until the data expires (24 hours by default). Permission model: The execution role defined for Lambda must have permission to access the stream. Best Practice: Write Lambda function code to be stateless. Instantiate AWS clients and database clients outside the scope of the function handler to take advantage of connection re-use.
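The best practice above can be illustrated with a small Python handler. This is a sketch under stated assumptions: `ResultStore` is a hypothetical in-memory stand-in for a real AWS or database client, `sample_event` is a hypothetical helper for local testing, and the records are assumed to carry JSON payloads. The only part taken from the Kinesis event format is that each record's data arrives base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json


class ResultStore:
    """Hypothetical stand-in for an AWS or database client."""

    def __init__(self):
        self.items = []

    def put(self, item):
        self.items.append(item)


# Created once per container, outside the handler, so warm invocations reuse it.
STORE = ResultStore()


def handler(event, context):
    """Decode every Kinesis record in the batch and hand it to the store."""
    processed = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        STORE.put(json.loads(payload))  # assumes JSON payloads
        processed += 1
    return {"processed": processed}


def sample_event(payloads):
    """Hypothetical helper: build a Kinesis-shaped event for local testing."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(p).encode()).decode()}}
            for p in payloads
        ]
    }


if __name__ == "__main__":
    print(handler(sample_event([{"user": "a"}, {"user": "b"}]), None))
```

The handler keeps no state of its own between invocations; anything that must survive goes to the external store.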
  • 16. Processing Streams: Configure Event Source. Amazon Kinesis mapped as event source in Lambda. Batch size: Max number of records that Lambda will send to one invocation. Not equivalent to effective batch size. Effective batch size is evaluated every 1 second, calculated as: MIN(records available, batch size, 6 MB). Increasing batch size allows fewer Lambda function invocations with more data processed per function. Best Practices (starting position): Set to “Trim Horizon” for reading from the start of the stream (all data). Set to “Latest” for reading only the most recent data (records added after the mapping starts). Set to “At timestamp” to pick up at a specific time.
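The effective batch size rule, MIN(records available, batch size, 6 MB), can be sketched as follows. This is a simplification for illustration: the function name `effective_batch` is an assumption, and the real payload accounting also includes encoding and metadata overhead that is ignored here.

```python
PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024  # 6 MB invocation payload cap from the slide


def effective_batch(record_sizes, batch_size, payload_limit=PAYLOAD_LIMIT_BYTES):
    """Return how many of the available records go into one invocation.

    record_sizes lists the size in bytes of each record currently available.
    """
    count = 0
    total = 0
    for size in record_sizes[:batch_size]:  # never more than the configured batch size
        if total + size > payload_limit:    # stop before exceeding the payload cap
            break
        total += size
        count += 1
    return count


if __name__ == "__main__":
    print(effective_batch([1000] * 40, 100))         # fewer records available than batch size
    print(effective_batch([1000] * 500, 100))        # capped by batch size
    print(effective_batch([1024 * 1024] * 10, 100))  # capped by payload size
```

In every case the invocation receives at most the configured batch size, and often fewer records.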
  • 17. Processing streams: How It Works. Polling: Concurrent polling and processing per shard. Lambda polls every 1s if no records found. Will grab as much data as possible in one GetRecords call (batch). Batching: Batches are passed for invocation to Lambda through function parameters. Batch size may impact duration if the Lambda function takes longer to process more records. Sub-batch in memory for invocation payload. Synchronous invocation: Batches invoked as synchronous RequestResponse type. Lambda honors Amazon Kinesis at-least-once semantics. Each shard blocks in order of synchronous invocation.
  • 18. Processing streams: Tuning throughput. If put / ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind. Maximum theoretical throughput: # shards * 2 MB / Lambda function duration (s). Effective theoretical throughput: # shards * batch size (MB) / Lambda function duration (s). [Diagram: Source → Amazon Kinesis (shards) → Lambda (functions) → Destination 1, Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
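The two throughput formulas on this slide translate directly into code. The sketch below is illustrative only; the function names and the `is_falling_behind` check are assumptions added here, not part of the talk.

```python
def max_throughput_mb_per_sec(shards: int, duration_s: float) -> float:
    """Maximum theoretical throughput: # shards * 2 MB / function duration (s)."""
    return shards * 2.0 / duration_s


def effective_throughput_mb_per_sec(shards: int, batch_mb: float, duration_s: float) -> float:
    """Effective theoretical throughput: # shards * batch size (MB) / function duration (s)."""
    return shards * batch_mb / duration_s


def is_falling_behind(ingest_mb_per_sec: float, shards: int, batch_mb: float, duration_s: float) -> bool:
    """True when ingestion outpaces what the functions can drain."""
    return ingest_mb_per_sec > effective_throughput_mb_per_sec(shards, batch_mb, duration_s)


if __name__ == "__main__":
    # 4 shards, 0.5 MB batches, 1 second per invocation, 3 MB/sec arriving.
    print(effective_throughput_mb_per_sec(4, 0.5, 1.0))
    print(is_falling_behind(3.0, 4, 0.5, 1.0))
```

In this example the functions drain 2 MB/sec while 3 MB/sec arrives, so the options are to shorten the function duration, raise the batch size, or add shards.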
  • 19. Processing streams: Tuning Throughput w/ Retries. Retries: Will retry on execution failures until the record is expired. Throttles and errors impact duration and directly impact throughput. Best Practice: Retry with exponential back-off of up to 60s. Effective theoretical throughput with retries: ( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry ). [Diagram: Source → Amazon Kinesis (shards) → Lambda (functions) → Destination 1, Destination 2. Lambda polls a batch, receives an error, receives an error again, then receives success. Lambda will scale automatically.]
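One reading of the back-off best practice is a retry loop around calls the function makes to a downstream service, with delays that double until they reach the 60-second cap. The sketch below takes that reading; the names `backoff_delays` and `call_with_backoff`, the 1-second base delay, and the attempt limit are all assumptions, and production code would usually add random jitter.

```python
import time


def backoff_delays(max_attempts, base_s=1.0, cap_s=60.0):
    """Delays that double on each attempt and never exceed cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(max_attempts)]


def call_with_backoff(operation, max_attempts=8, base_s=1.0, cap_s=60.0, sleep=time.sleep):
    """Call operation(), sleeping with exponential back-off between failed attempts.

    The exception from the final attempt is re-raised to the caller.
    """
    delays = backoff_delays(max_attempts, base_s, cap_s)
    for attempt, delay in enumerate(delays):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(delay)


if __name__ == "__main__":
    print(backoff_delays(8))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.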
  • 20. Processing streams: Common observations. Effective batch size may be less than configured during low throughput. Effective batch size will increase during higher throughput. Increased Lambda duration -> decreased # of invokes and GetRecords calls. Too many consumers of your stream may compete with Amazon Kinesis read limits and induce ReadProvisionedThroughputExceeded errors and metrics. [Diagram: Amazon Kinesis → AWS Lambda]
  • 21. Processing streams: Monitoring with CloudWatch. Monitoring Amazon Kinesis Streams: • GetRecords: (effective throughput) • PutRecord: bytes, latency, records, etc. • GetRecords.IteratorAgeMilliseconds: how old your last processed records were. Monitoring Lambda functions: • Invocation count: Number of times the function was invoked • Duration: Execution/processing time • Error count: Number of errors • Throttle count: Number of times the function was throttled • Iterator Age: Time elapsed between the last record of a batch being written to the stream and Lambda receiving the batch. Debugging: • Review all metrics • Make custom logs • View RAM consumed • Search for log events
  • 24. Kinesis/Lambda Demo amzn.to/bigdata Quiz: If I set my batch size to 100, each Lambda call… A) Will get exactly 100 records B) Will get 100 records or less C) Will get an average of 100 records D) Will get 95 ReadProvisionedThroughputExceeded errors
  • 26. Kinesis/Lambda Demo amzn.to/bigdata I think this session… A) Was really useful B) Was too technical C) Was not deep enough D) When is lunch? E) This guy is totally confusing
  • 28. End-to-End Architecture. Query: SELECT STREAM COUNT(*) AS MUSTACH_COUNT, STEP(ROWTIME BY INTERVAL '1' SECOND) FROM SOURCE_STREAM WHERE HAS_MUSTACH = TRUE; [Diagram components: Amazon Kinesis Stream, Amazon Kinesis Analytics, Amazon Cognito, Amazon Kinesis Stream, Amazon DynamoDB, AWS Lambda, Amazon S3, JavaScript SDK, Amazon Rekognition, Amazon Kinesis Firehose, Amazon S3]
  • 29. Demo architecture. [Diagram components: Amazon Cognito, S3 Bucket (HTML, JavaScript), Kinesis Ingestion Stream, Kinesis Analytics, Kinesis Aggregate Stream, Lambda Function, DynamoDB Table. Data flow labels: Raw Device and Quadrant Data; Aggregated Data.] Query: SELECT ROWTIME, userId, COUNT(*) FROM STREAM GROUP BY userId, FLOOR(ROWTIME TO SECOND)
  • 30. The demo application
      CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
        UNIQUE_USER_COUNT INT,
        ANDROID_COUNT INT,
        IOS_COUNT INT,
        WINDOWS_PHONE_COUNT INT,
        OTHER_OS_COUNT INT,
        QUADRANT_A_COUNT INT,
        QUADRANT_B_COUNT INT,
        QUADRANT_C_COUNT INT,
        QUADRANT_D_COUNT INT,
        WINDOW_TIME TIMESTAMP);

      CREATE OR REPLACE STREAM DISTINCT_USER_STREAM (
        COGNITO_ID VARCHAR(64),
        DEVICE VARCHAR(32),
        OS VARCHAR(32),
        QUADRANT CHAR(1),
        DT TIMESTAMP);

      CREATE OR REPLACE PUMP "DISTINCT_USER_PUMP" AS
        INSERT INTO "DISTINCT_USER_STREAM"
        SELECT STREAM DISTINCT "cognitoId", "device", "os", "quadrant",
          FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO SECOND)
        FROM "SOURCE_SQL_STREAM_001";

      CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM
          COUNT("DISTINCT_USER_STREAM".COGNITO_ID) AS UNIQUE_USER_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Android' THEN COGNITO_ID ELSE null END)) AS ANDROID_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'iOS' THEN COGNITO_ID ELSE null END)) AS IOS_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Windows Phone' THEN COGNITO_ID ELSE null END)) AS WINDOWS_PHONE_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'other' THEN COGNITO_ID ELSE null END)) AS OTHER_OS_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'A' THEN COGNITO_ID ELSE null END)) AS QUADRANT_A_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'B' THEN COGNITO_ID ELSE null END)) AS QUADRANT_B_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'C' THEN COGNITO_ID ELSE null END)) AS QUADRANT_C_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'D' THEN COGNITO_ID ELSE null END)) AS QUADRANT_D_COUNT,
          ROWTIME
        FROM "DISTINCT_USER_STREAM"
        GROUP BY FLOOR("DISTINCT_USER_STREAM".ROWTIME TO SECOND);
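To make the two pumps easier to follow, here is a batch approximation in Python of what they compute: first de-duplicate rows within each one-second window, then count the distinct rows per OS and per quadrant. It is a sketch, not the streaming engine's semantics; the function name `aggregate_per_second` and the tuple layout of the input events are assumptions.

```python
from collections import defaultdict

# Maps the OS values tested in the SQL CASE expressions to their output columns.
OS_COLUMNS = {
    "Android": "ANDROID_COUNT",
    "iOS": "IOS_COUNT",
    "Windows Phone": "WINDOWS_PHONE_COUNT",
    "other": "OTHER_OS_COUNT",
}


def aggregate_per_second(events):
    """events: iterable of (epoch_seconds, cognito_id, device, os, quadrant)."""
    # Step 1 (DISTINCT_USER_PUMP): distinct rows per one-second window.
    distinct = {
        (int(ts), cid, device, os_name, quad)
        for ts, cid, device, os_name, quad in events
    }
    # Step 2 (OUTPUT_PUMP): count the distinct rows in each window.
    windows = defaultdict(lambda: defaultdict(int))
    for second, _cid, _device, os_name, quad in distinct:
        row = windows[second]
        row["UNIQUE_USER_COUNT"] += 1
        if os_name in OS_COLUMNS:  # unmatched values count nowhere, as in the CASE
            row[OS_COLUMNS[os_name]] += 1
        if quad in ("A", "B", "C", "D"):
            row[f"QUADRANT_{quad}_COUNT"] += 1
    return {second: dict(row) for second, row in windows.items()}


if __name__ == "__main__":
    sample = [
        (10.1, "u1", "phone", "iOS", "A"),
        (10.7, "u1", "phone", "iOS", "A"),  # duplicate within the same second
        (10.9, "u2", "tablet", "Android", "B"),
        (11.0, "u1", "phone", "iOS", "C"),
    ]
    print(aggregate_per_second(sample))
```

As in the SQL, a user who changes quadrant within one second produces two distinct rows and is counted twice in that window.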
  • 31. Serverless Data Processing with Distributed Computing 10101101 11001010
  • 32. Serverless Distributed Computing: Map-Reduce Model. Why Serverless Data Processing with Distributed Computing? Remove difficult infrastructure management: ✓ Cluster administration ✓ Complex configuration tools. Enable simple, elastic, user-friendly distributed data processing: ✓ Eliminate complexity of state management ✓ Bring distributed computing power to the masses
  • 33. Serverless Distributed Computing: MapReduce. [Diagram, steps 1 to 6: Input Bucket; Driver; job state; Mapper Functions (map phase); S3 event source; mapper output; Coordinator; Reducer step 1; reducer output; recursively create n’th reducer step; Final Reducer (reduce phase); Result]
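The flow in this diagram, a map phase followed by reducer steps that repeat until one result remains, can be sketched locally in Python. This is an in-process illustration using a word count as the job; it does not invoke Lambda or S3, and the names `mapper`, `reducer`, `run_job` and the `fan_in` parameter are assumptions.

```python
from collections import Counter
from functools import reduce


def mapper(chunk: str) -> Counter:
    """Map phase: each mapper counts the words in one input chunk."""
    return Counter(chunk.split())


def reducer(partials) -> Counter:
    """One reducer step: merge a group of partial counts."""
    return reduce(lambda a, b: a + b, partials, Counter())


def run_job(chunks, fan_in=2) -> Counter:
    """Run the mappers, then apply reducer steps until a single output remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    outputs = [mapper(chunk) for chunk in chunks]  # map phase
    while len(outputs) > 1:                        # n'th reducer step
        outputs = [
            reducer(outputs[i:i + fan_in])
            for i in range(0, len(outputs), fan_in)
        ]
    return outputs[0] if outputs else Counter()    # final reducer output


if __name__ == "__main__":
    print(run_job(["a b a", "b c", "a"]))
```

Each pass through the loop stands in for one reducer step; in the serverless version each mapper and reducer would be a separate function invocation reading from and writing to the bucket.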
  • 35. Fannie Mae’s Serverless HPC Performance. Lambda service configuration: • Initial burst rate = 2,000, incremental rate = 100 per minute, throttle limit = 15,000. • Lambda ramps up automatically from 2,000 to 15,000 concurrent executions. Application result: • One simulation run of ~20 million mortgages takes 2 hours, >3 times faster than the existing process. • The performance does not degrade during the ramp-up period. • Lambda’s CPU efficiency is close to 100%. Actual elapsed time is consistent with the estimated elapsed time based on Lambda billing time. [Chart: Number of New Lambda Invocations every 5 Mins; Maximum Concurrent Lambdas = 15,000]
  • 36. Complex Serverless HPC Reference Architecture. Break down a complex workload into multiple simple ones. [Diagram labels: Input Bucket; Mapper Functions; mapper output; Reducers; reducer output; …; Reducer; Final Reducer; Result]
  • 37. Real-time Data Processing with Lambda: Next Steps
  • 38. Data Processing with AWS: Next steps ✓ Learn more about AWS Serverless at https://aws.amazon.com/serverless ✓ Explore Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics on the AWS Big Data Blog at https://aws.amazon.com/blogs/big-data/real-time-clickstream-anomaly-detection-with-amazon-kinesis-analytics/ ✓ Explore the AWS Lambda Reference Architecture on GitHub: ▪ Real-Time Streaming: https://github.com/awslabs/lambda-refarch-streamprocessing ▪ Distributed Computing Reference Architecture (serverless MapReduce): https://github.com/awslabs/lambda-refarch-mapreduce
  • 39. Data Processing with AWS: Next steps ✓ Create an Amazon Kinesis stream. Visit the Amazon Kinesis Console and configure a stream to receive data, e.g. data from social media feeds. ✓ Create and test a Lambda function to process streams from Amazon Kinesis by visiting the Lambda console. First 1M requests each month are on us! ✓ Read the Developer Guide and try the Lambda and Amazon Kinesis Tutorial: ▪ http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html ✓ Send questions, comments, feedback to the AWS Lambda Forums