© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tara E. Walker | Technical Evangelist | @taraw
April 2017
SMC303
Real-Time Data Processing Using
AWS Lambda
Agenda
What’s Serverless Real-Time Data Processing?
Processing Streaming Data with Lambda and Kinesis
Streaming Data Processing Demo
Data Processing Pipeline with Lambda and MapReduce
Building a Big Data Processing Solution Demo
What’s Serverless Real-Time Data Processing?
Serverless Processing of Real-Time Streaming Data
Streaming Data Processing Demo
Serverless Data Processing with Distributed Computing
Customer Story:
Fannie Mae-Distributed Computing with Lambda
What’s Serverless Real-Time
Data Processing?
AWS Lambda
Efficient performance at scale: Easy to author, deploy,
maintain, secure & manage. Focus on business logic
to build back-end services that perform at scale.
Bring Your Own Code: Stateless, event-driven code
with native support for Node.js, Java, Python and C#.
No infrastructure to manage: Compute without
managing infrastructure like Amazon EC2 instances
and Auto Scaling groups.
Cost-effective: Automatically matches capacity to
request rate. Compute is billed in 100 ms increments.
Triggered by events: Direct sync & async API calls,
AWS service integrations, and 3rd-party triggers.
Lambda Event Sources
Categories: DATA STORES, ENDPOINTS, CONFIGURATION REPOSITORIES, EVENT/MESSAGE SERVICES
Services shown: Amazon S3, Amazon DynamoDB, Amazon Kinesis, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, Amazon Cognito, Amazon SNS, Amazon SES, Cron events, AWS CodeCommit, Amazon API Gateway, Amazon Alexa, AWS IoT, AWS Step Functions
… more on the way!
Serverless Real-Time Data Processing Is..
Capture Data Streams (EVENT SOURCE): IoT data, financial data, log data, clickstream data
Process Data Streams (FUNCTION): Node.js, Python, Java, C#; no servers to provision or manage
Output Data: DATABASE, CLOUD SERVICES
Lambda Real-Time Event Sources: HOW IT WORKS
ASYNCHRONOUS PUSH MODEL (Amazon S3, Amazon SNS): Async invocation
SYNCHRONOUS PUSH MODEL (Amazon Alexa, AWS IoT): Sync invocation
STREAM PULL MODEL (Amazon DynamoDB, Amazon Kinesis): Lambda polls the streams; Sync invocation
Push models: Mapping owned by Event Source; invokes Lambda via Event Source API; resource-based policy permissions
Stream pull model: Mapping owned by Lambda; Lambda function invokes when new records are found on the stream; Lambda execution role policy permissions
Concurrent executions
Serverless Processing of
Real-Time Streaming Data
Amazon Kinesis
Real-Time: Collect real-time data streams and
promptly respond to key business events and
operational triggers. Real-time latencies.
Easy to use: Focus on quickly launching data
streaming applications instead of managing
infrastructure.
Amazon Kinesis Offering: Managed services for
streaming data ingestion and processing.
• Amazon Kinesis Streams: Build applications
that process or analyze streaming data.
• Amazon Kinesis Firehose: Load massive
volumes of streaming data into Amazon S3
and Amazon Redshift.
• Amazon Kinesis Analytics: Analyze data
streams using SQL queries.
Processing Real-Time Streams: Lambda + Amazon Kinesis
Streaming data sent to Amazon
Kinesis and stored in shards
Multiple Lambda functions can be
triggered to process same Amazon
Kinesis stream for “fan out”
Lambda can process data and store
results, e.g., to DynamoDB or S3
Lambda can aggregate data to
services like Amazon Elasticsearch
Service for analytics
Lambda sends event data and
function info to Amazon CloudWatch
for capturing metrics and monitoring
[Diagram services: Amazon Kinesis, AWS Lambda, Amazon CloudWatch, Amazon DynamoDB, Amazon Elasticsearch Service, Amazon S3]
Processing Streams: Set Up Amazon Kinesis Stream
Streams
Made up of Shards
Each shard ingests (accepts writes of) up to 1 MB/sec
Each shard emits (serves reads of) up to 2 MB/sec
Each shard supports up to 5 read transactions/sec
Data
All data is stored and is replayable for 24 hours (the default retention period)
Make sure partition key distribution is even to optimize parallel throughput
The partition key is used to distribute PUTs across shards; choose a key with more distinct groups than
shards
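The effect of partition key choice can be illustrated with a small sketch. Kinesis maps each partition key to a 128-bit integer with an MD5 hash and routes the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space into equal ranges (true for a freshly created stream, not necessarily after resharding); `shard_for_key` and `distribution` are illustrative names, not part of any AWS SDK.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # an MD5 digest read as an integer is below this


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard for the given keys."""
    counts = Counter(shard_for_key(key, shard_count) for key in keys)
    return [counts.get(index, 0) for index in range(shard_count)]


if __name__ == "__main__":
    many_groups = [f"device-{i}" for i in range(10_000)]
    few_groups = ["us", "eu"] * 5_000
    print("10,000 distinct keys:", distribution(many_groups, 4))
    print("2 distinct keys:     ", distribution(few_groups, 4))
```

With only two distinct keys, at most two of the four shards can ever receive data, which is why the slide recommends a key with more groups than shards.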
Best Practice
Determine an initial size/shards to plan for expected maximum demand
• Leverage the “Help me decide how many shards I need” option in the Console
• Use this formula for the number of shards:
max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000)
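As a worked example of the formula above, here is a minimal sketch. Rounding up to a whole shard and enforcing a minimum of one shard are assumptions added here, since a stream cannot have a fractional shard; the function name is illustrative.

```python
import math


def shards_needed(incoming_write_kb_per_sec: float,
                  outgoing_read_kb_per_sec: float) -> int:
    """Apply the slide's formula, rounded up to a whole number of shards."""
    raw = max(incoming_write_kb_per_sec / 1000,
              outgoing_read_kb_per_sec / 2000)
    # Assumption: round up, and never return fewer than one shard.
    return max(1, math.ceil(raw))


if __name__ == "__main__":
    # 3,500 KB/s written, 5,000 KB/s read: the write side dominates (3.5 vs 2.5).
    print(shards_needed(3500, 5000))
```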
Processing Streams: Create Lambda functions
Memory
CPU allocation proportional to the memory configured
Increasing memory makes your code execute faster (if CPU bound)
Increasing memory allows larger records to be processed
Timeout
Increasing timeout allows for longer-running functions, but a longer wait in case of errors
Permission model
Execution role defined for Lambda must have permission to access the stream
Retries
With Amazon Kinesis, Lambda retries until the data expires (24 hours)
Best Practice
Write Lambda function code to be stateless
Instantiate AWS clients & database clients outside the scope of the function handler
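The two best practices above can be sketched as follows. The event shape (a `Records` list whose `kinesis.data` field is base64-encoded) matches what Lambda passes for a Kinesis event source. `InMemoryStore` is a hypothetical stand-in for a real database client such as a DynamoDB table, and `make_event` is a hypothetical helper for local testing; neither is part of the AWS SDK.

```python
import base64
import json


class InMemoryStore:
    """Hypothetical stand-in for an external database client."""

    def __init__(self):
        self.items = []

    def put_item(self, item):
        self.items.append(item)


# Created once per execution environment, outside the handler, so that warm
# invocations reuse the client instead of rebuilding it for every batch.
STORE = InMemoryStore()


def handler(event, context):
    """Decode each Kinesis record in the batch and persist it.

    The handler keeps no state of its own between invocations; everything
    it needs arrives in the event or lives in the external store.
    """
    records = event.get("Records", [])
    for record in records:
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)
        item["partition_key"] = record["kinesis"]["partitionKey"]
        STORE.put_item(item)
    return {"processed": len(records)}


def make_event(messages):
    """Build a test event from (partition_key, dict) pairs (hypothetical helper)."""
    return {
        "Records": [
            {"kinesis": {
                "partitionKey": key,
                "data": base64.b64encode(json.dumps(body).encode()).decode(),
            }}
            for key, body in messages
        ]
    }


if __name__ == "__main__":
    print(handler(make_event([("a", {"v": 1}), ("b", {"v": 2})]), None))
```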
Processing Streams: Configure Event Source
Amazon Kinesis mapped as event source in Lambda
Batch size
Max number of records that Lambda will send to one invocation
Not equivalent to effective batch size
Effective batch size is determined at each poll (every 250 ms) and calculated as:
MIN(records available, batch size, 6 MB)
Increasing batch size allows fewer Lambda function invocations with more
data processed per invocation
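The MIN(...) rule above mixes a record count with a payload size, so a small sketch may help. It assumes the 6 MB cap applies to the summed record sizes and counts it as 6 × 1024 × 1024 bytes; the function name and the per-record size list are illustrative.

```python
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # the 6 MB cap, counted in bytes (assumption)


def effective_batch(record_sizes, batch_size, max_payload=MAX_PAYLOAD_BYTES):
    """Return how many of the available records go into one invocation.

    record_sizes: sizes in bytes of the records currently available.
    batch_size:   the configured maximum number of records per invocation.
    """
    taken = 0
    payload = 0
    for size in record_sizes[:batch_size]:
        if payload + size > max_payload:
            break  # the next record would push the payload past the cap
        payload += size
        taken += 1
    return taken


if __name__ == "__main__":
    print(effective_batch([1000] * 50, batch_size=100))         # limited by records available
    print(effective_batch([1000] * 500, batch_size=100))        # limited by batch size
    print(effective_batch([1024 * 1024] * 10, batch_size=100))  # limited by payload cap
```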
Best Practices (starting position)
Set to “Trim Horizon” to read from the start of the
stream (all retained data)
Set to “Latest” to read only the most recent data (records that arrive after the event source mapping is created)
Processing streams: How It Works
Polling
Concurrent polling and processing per shard
Lambda polls every 250 ms if no records found
Will grab as much data as possible in one GetRecords call (Batch)
Batching
Batches are passed for invocation to Lambda through
function parameters
Batch size may impact duration if the Lambda function
takes longer to process more records
Sub batch in memory for invocation payload
Synchronous invocation
Batches are invoked synchronously (RequestResponse invocation type)
Lambda honors Amazon Kinesis at-least-once semantics
Each shard blocks on its synchronous invocation, so batches are processed in order per shard
Processing streams: Tuning throughput
If put / ingestion rate is greater than the theoretical throughput, your
processing is at risk of falling behind
Maximum theoretical throughput
# shards * 2MB / Lambda function duration (s)
Effective theoretical throughput
# shards * batch size (MB) / Lambda function duration (s)
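The two formulas above can be evaluated directly. This is a minimal sketch with illustrative function names; throughput is in MB/s, and the 2 MB figure is the per-shard read limit quoted earlier in the deck.

```python
SHARD_READ_MB = 2.0  # per-shard read limit from the earlier slide


def max_theoretical_throughput(shards: int, duration_s: float) -> float:
    """# shards * 2 MB / Lambda function duration (s)."""
    return shards * SHARD_READ_MB / duration_s


def effective_theoretical_throughput(shards: int, batch_size_mb: float,
                                     duration_s: float) -> float:
    """# shards * batch size (MB) / Lambda function duration (s)."""
    return shards * batch_size_mb / duration_s


def is_falling_behind(ingest_mb_per_s: float, shards: int,
                      batch_size_mb: float, duration_s: float) -> bool:
    """True when the put rate exceeds what the functions can drain."""
    return ingest_mb_per_s > effective_theoretical_throughput(
        shards, batch_size_mb, duration_s)


if __name__ == "__main__":
    print(max_theoretical_throughput(4, 0.5))
    print(effective_theoretical_throughput(4, 0.5, 1.0))
    print(is_falling_behind(3.0, 4, 0.5, 1.0))
```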
[Diagram: Source → Amazon Kinesis (Shards) → Lambda (Functions) → Destination 1, Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
Processing streams: Tuning Throughput w/ Retries
Retries
Lambda retries on execution failures until the record expires
Throttles and errors increase duration and directly reduce throughput
Best Practice
Retry with exponential backoff of up to 60s
Effective theoretical throughput with retries
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry )
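A minimal sketch of both points above: a backoff schedule capped at 60 s, and the throughput formula with retries. The 1 s base delay and the doubling factor are assumptions chosen for illustration (the slide only gives the 60 s ceiling), no jitter is applied, and the function names are illustrative.

```python
def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 60.0):
    """Delay before each retry: base * 2^n, never more than cap_s."""
    return [min(cap_s, base_s * 2 ** n) for n in range(attempts)]


def throughput_with_retries(shards: int, batch_size_mb: float,
                            duration_s: float, retries_until_expiry: int) -> float:
    """(# shards * batch size) / (function duration * retries until expiry)."""
    return (shards * batch_size_mb) / (duration_s * retries_until_expiry)


if __name__ == "__main__":
    print(backoff_delays(8))
    print(throughput_with_retries(4, 0.5, 1.0, 4))
```

In this example, four attempts per batch cut the effective throughput of the same stream from 2.0 MB/s to 0.5 MB/s.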
[Diagram: Source → Amazon Kinesis (Shards) → Lambda (Functions) → Destination 1, Destination 2. Lambda polls a batch, receives an error, receives an error again on retry, then receives success. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
Processing streams: Common observations
Effective batch size may be less than configured during low throughput
Effective batch size will increase during higher throughput
Increased Lambda duration -> decreased # of invokes and GetRecords calls
Too many consumers of your stream may compete for Amazon Kinesis read
limits and induce ReadProvisionedThroughputExceeded errors and metrics
Processing streams: Monitoring with CloudWatch
Monitoring Amazon Kinesis Streams
• GetRecords: (effective throughput)
• PutRecord: bytes, latency, records, etc.
• GetRecords.IteratorAgeMilliseconds: how old your
last processed records were
Monitoring Lambda functions
• Invocation count: Number of times the function was invoked
• Duration: Execution/processing time
• Error count: Number of errors
• Throttle count: Number of times the function was throttled
• Iterator Age: Time elapsed between the final record in a batch being
written to the stream and Lambda receiving the batch
Debugging
• Review all metrics
• Make custom logs
• View RAM consumed
• Search for log events
AWS X-Ray: Coming soon!
Streaming Data Processing
Demo
Serverless Data Processing with
Distributed Computing
Serverless Distributed Computing: Map-Reduce Model
Why Serverless Data Processing with Distributed
Computing?
Remove difficult infrastructure management
• Cluster administration
• Complex configuration tools
Enable simple, elastic, user-friendly distributed data
processing
• Eliminate complexity of state management
• Bring distributed computing power to the masses
Eliminate utilization concerns
• Makes code simpler by removing the complexities of multi-threaded
processing to optimize server usage
• Cost-effective option to run ad hoc MapReduce jobs
Easier, automatic horizontal scaling
• Provide the ability to process scientific and analytics
applications
Serverless Distributed Computing: MapReduce
[Diagram: serverless MapReduce flow, numbered steps 1 to 6. Labels in order: Input Bucket; Driver and job state; Mapper Functions (map phase); S3 event source on mapper output; Coordinator; Reducer step 1 and reducer output; recursively create n'th reducer step; Final Reducer and Result (reduce phase)]
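The flow in the diagram can be sketched locally in plain Python. This is not the reference architecture's code: there is no S3 or Lambda here, each function call stands in for one Lambda invocation, word counting is an arbitrary example workload, and all names are illustrative. It shows the shape of the job: a driver splits the input, mappers emit partial results, and reducer steps are applied repeatedly until one result remains.

```python
from collections import Counter
from functools import reduce


def split_input(lines, n_chunks):
    """Driver step: divide the input into n_chunks roughly equal parts."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def mapper(chunk):
    """Map phase: count words in one chunk (stands in for one mapper function)."""
    return Counter(word for line in chunk for word in line.split())


def reducer(left, right):
    """Merge two partial results."""
    return left + right


def reduce_in_steps(partials, fan_in=2):
    """Reduce phase: keep creating reducer steps until one result is left."""
    while len(partials) > 1:
        partials = [
            reduce(reducer, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    return partials[0]


def run_job(lines, n_chunks=4):
    """Run the whole job: split, map each chunk, then reduce step by step."""
    partials = [mapper(chunk) for chunk in split_input(lines, n_chunks)]
    return reduce_in_steps(partials)


if __name__ == "__main__":
    print(run_job(["a b a", "b c", "a"], n_chunks=2))
```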
Serverless Distributed Computing: PyWren
PyWren Prototype Developed at University of California, Berkeley
Uses Python with AWS Lambda stateless functions for large scale data
analytics
Achieved roughly 30-40 MB/s write and read performance per core to the S3
object store
Scaled to 60-80 GB/s across 2800 simultaneous functions
Serverless Distributed Computing: Benchmark
Using Amazon MapReduce Reference Architecture Framework
with Lambda
Dataset
Table | Rows | Bytes
Rankings | 90 Million | 6.38 GB
UserVisits | 775 Million | 126.8 GB
Documents | - | 136.9 GB
Queries:
• Scan query (90M rows, 6.38 GB of data)
• Select query on Page Rankings
• Aggregation query on UserVisits (775M rows, ~127 GB of
data)
A subset of the AMPLab benchmark was run to compare with other data
processing frameworks
Performance Benchmarks: Execution time for each workload in seconds
TECHNOLOGY | SCAN 1A | SCAN 1B | AGGREGATE 2A
Amazon Redshift (HDD) | 2.49 | 2.61 | 25.46
Serverless MapReduce | 39 | 47 | 200
Impala - Disk - 1.2.3 | 12.015 | 12.015 | 113.72
Impala - Mem - 1.2.3 | 2.17 | 3.01 | 84.35
Shark - Disk - 0.8.1 | 6.6 | 7 | 151.4
Shark - Mem - 0.8.1 | 1.7 | 1.8 | 83.7
Hive - 0.12 YARN | 50.49 | 59.93 | 730.62
Tez - 0.2.0 | 28.22 | 36.35 | 377.48
Fannie Mae: Distributed
Computing with Lambda
© 2017 Fannie Mae. Trademarks of Fannie Mae.
Bin Lu, Fannie Mae
4/18/2017
High Performance Computing Using
AWS Lambda for Financial Modeling
Fannie Mae Business
Fannie Mae is a leading source of financing for mortgage
lenders:
• Providing access to affordable mortgage financing in all
market conditions.
• Effectively managing and reducing risk to our business,
taxpayers, and the housing finance system.
In 2016, Fannie Mae provided $637B in liquidity to the
mortgage market, enabling
• 1.1M home purchases,
• 1.4M refinancings,
• 724K rental housing units.
Fannie Mae Financial Modeling
Financial modeling is a Monte Carlo simulation process to project future cash flows, which is used for managing
mortgage risk on a daily basis:
• Underwriting and valuation
• Risk management
• Financial reporting
• Loss mitigation and loan removal
~10 quadrillion (10 × 10^15) cash flow
projections each month in hundreds of
economic scenarios.
Fannie Mae Financial Modeling Infrastructure
High Performance Computing (HPC) grids are the key infrastructure component for financial modeling at Fannie Mae.
Fannie Mae's existing HPC grids no longer meet our growing business needs:
• They are 7 years old, with limited computing capacity, limited I/O capacity, limited storage and a complex API.
• It takes more than half a year to add incremental compute capacity and develop any new application.
We are looking for a new HPC facility to react to the rapidly changing market!
• Unlimited computing resources and unlimited storage.
• Serverless infrastructure with simple distributed computing API.
• Efficient cost model.
Fannie Mae’s Journey to AWS Serverless HPC Service
In 2016, Fannie Mae began to work with AWS to build the first serverless HPC computing platform in the
industry using Lambda service. This is also the first pilot program for Fannie Mae to develop an AWS cloud
native application.
Once the infrastructure is set up, we are able to develop a new application within a month and provision the
compute resources within minutes.
In March 2017, Fannie Mae successfully deployed the first financial modeling application to preproduction and
ran it on 15,000 concurrent executions.
Fannie Mae’s Serverless HPC Performance
Lambda service configuration:
• Initial burst rate = 2,000, incremental rate = 100 per minute,
throttle limit = 15,000.
• Lambda ramps up automatically from 2,000 to 15,000 concurrent
executions.
Application Result:
• One simulation run of ~ 20 million mortgages takes 2 hours, >3
times faster than the existing process.
• The performance does not degrade during the ramp up period.
• Lambdas’ CPU efficiency is close to 100%. Actual elapsed time is
consistent with the estimated elapsed time based on Lambda billing
time.
[Chart: number of new Lambda invocations every 5 minutes; maximum concurrent Lambdas = 15,000]
Simple Serverless HPC Reference Architecture
A map-reduce framework is used for simple parallel workloads:
• The input file in the S3 input bucket is split using EC2 into n triggers, which are saved in the S3 event bucket.
• Lambda automatically ramps up to n concurrent executions and writes outputs to the S3 mapper bucket.
• EC2 is used to aggregate the outputs and write the final result to the S3 reducer bucket.
[Diagram: Amazon S3 (Input) → Amazon EC2 (Splitter) → Amazon S3 (Event) → AWS Lambda (Mappers) → Amazon S3 (Mapper Result) → Amazon EC2 (Reducer) → Amazon S3 (Reducer Result)]
Complex Serverless HPC Reference Architecture
Break down a complex workload into multiple simple ones:
…
Benefits of Serverless HPC Service
Cost Effective
• Never pay for idle. The cost is based on actual vCPU usage, not elapsed time or maximum processing capacity
of the infrastructure.
• Performance improvement at zero cost: 1 Lambda x 15,000 hours = 15,000 Lambda x 1 hour.
Shorter Time to Market
• Ability to burst to cloud immediately to access additional computing resources.
• Ability to focus on business needs. No server to manage and no complex distributed computing code to write.
Most Complete Data Analytics Platform
• Streamlined integration with big data platform and BI tools / Data Lake.
• Business resiliency.
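The "performance improvement at zero cost" point above is simple arithmetic: when billing follows compute time actually used, total cost depends on the product of concurrency and run time, not on either one alone. A minimal sketch, where `rate_per_lambda_hour` is a hypothetical placeholder and not an actual AWS price:

```python
def lambda_hours(concurrency: int, hours_each: float) -> float:
    """Total billed compute time across all concurrent executions."""
    return concurrency * hours_each


def compute_cost(concurrency: int, hours_each: float,
                 rate_per_lambda_hour: float) -> float:
    """Cost under usage-based billing; the rate is a hypothetical placeholder."""
    return lambda_hours(concurrency, hours_each) * rate_per_lambda_hour


if __name__ == "__main__":
    rate = 0.06  # hypothetical rate, for illustration only
    serial = compute_cost(1, 15_000, rate)
    parallel = compute_cost(15_000, 1, rate)
    print(serial == parallel)  # same cost, finished 15,000 times sooner
```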
Considerations and Next Steps
Considerations:
• Maximize S3 performance by distributing the key names to evenly distribute objects across the partitions.
• Set up a separate AWS account for unlimited Lambda access / IP addresses.
• Adopt microservice architecture to migrate one business function/application at a time.
• Integrate with the AWS big data analytics platform for access to unlimited storage and state-of-the-art business
analytical tools.
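One common way to follow the first consideration above (distributing key names) is to prepend a short hash to each object key so that sequentially named files do not share a prefix. This is a sketch of that general technique under stated assumptions, not Fannie Mae's implementation; the prefix length and function name are chosen for illustration.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prefix an object key with the first hex characters of its MD5 hash."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}/{key}"


if __name__ == "__main__":
    for name in ("mapper/part-00001.csv", "mapper/part-00002.csv"):
        print(hashed_key(name))
```

The mapping is deterministic, so a reader can recompute the full key from the original name without keeping a lookup table.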
Next step:
• Production migration of the first application in Q2 2017.
• Complete migration of primary loan performance modeling applications to AWS in early 2018.
Real-time Data Processing with
Lambda: Next Steps
Data Processing with AWS: Next steps
• Learn more about AWS Serverless at
https://aws.amazon.com/serverless
• Explore the AWS Lambda Reference Architectures on GitHub:
• Real-Time Streaming:
https://github.com/awslabs/lambda-refarch-streamprocessing
• Distributed Computing Reference Architecture
(serverless MapReduce):
https://github.com/awslabs/lambda-refarch-mapreduce
• Create an Amazon Kinesis stream. Visit the Amazon Kinesis
Console and configure a stream to receive data, e.g., data from
social media feeds.
• Create & test a Lambda function to process streams from Amazon
Kinesis by visiting the Lambda console. The first 1M requests each month
are on us!
• Read the Developer Guide and try the Lambda and Amazon
Kinesis Tutorial:
http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
• Send questions, comments, and feedback to the AWS Lambda Forums
Thank you!
Tara E. Walker
AWS Technical Evangelist
@taraw

Más contenido relacionado

La actualidad más candente

AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016
Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016
Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016Amazon Web Services
 
Enterprise-Database-Migration-Strategies-and-Options-on-AWS
Enterprise-Database-Migration-Strategies-and-Options-on-AWSEnterprise-Database-Migration-Strategies-and-Options-on-AWS
Enterprise-Database-Migration-Strategies-and-Options-on-AWSAmazon Web Services
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceAmazon Web Services
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 
Webinar aws 101 a walk through the aws cloud- introduction to cloud computi...
Webinar aws 101   a walk through the aws cloud- introduction to cloud computi...Webinar aws 101   a walk through the aws cloud- introduction to cloud computi...
Webinar aws 101 a walk through the aws cloud- introduction to cloud computi...Amazon Web Services
 
Cloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for PartnersCloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for PartnersAmazon Web Services
 
Getting Started with AWS Compute Services
Getting Started with AWS Compute ServicesGetting Started with AWS Compute Services
Getting Started with AWS Compute ServicesAmazon Web Services
 
Introduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 NetworksIntroduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 Networksi2k2 Networks (P) Ltd.
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAmazon Web Services
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
AWS S3 Cost Optimization
AWS S3 Cost OptimizationAWS S3 Cost Optimization
AWS S3 Cost OptimizationEric Kim
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략 - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...
금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략  - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략  - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...
금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략 - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...Amazon Web Services Korea
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 

La actualidad más candente (20)

AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016
Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016
Introduction to AWS Cloud Computing | AWS Public Sector Summit 2016
 
Enterprise-Database-Migration-Strategies-and-Options-on-AWS
Enterprise-Database-Migration-Strategies-and-Options-on-AWSEnterprise-Database-Migration-Strategies-and-Options-on-AWS
Enterprise-Database-Migration-Strategies-and-Options-on-AWS
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
 
Webinar aws 101 a walk through the aws cloud- introduction to cloud computi...
Webinar aws 101   a walk through the aws cloud- introduction to cloud computi...Webinar aws 101   a walk through the aws cloud- introduction to cloud computi...
Webinar aws 101 a walk through the aws cloud- introduction to cloud computi...
 
Cloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for PartnersCloud Migration, Application Modernization and Security for Partners
Cloud Migration, Application Modernization and Security for Partners
 
Getting Started with AWS Compute Services
Getting Started with AWS Compute ServicesGetting Started with AWS Compute Services
Getting Started with AWS Compute Services
 
Introduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 NetworksIntroduction to Amazon Web Services by i2k2 Networks
Introduction to Amazon Web Services by i2k2 Networks
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices Webinar
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Data as a service
Data as a serviceData as a service
Data as a service
 
AWS S3 Cost Optimization
AWS S3 Cost OptimizationAWS S3 Cost Optimization
AWS S3 Cost Optimization
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략 - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...
금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략  - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략  - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...
금융 서비스 패러다임의 전환 가속화 시대, 신한금융투자의 Cloud First 전략 - 신중훈 AWS 솔루션즈 아키텍트 / 최성봉 클라우...
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 

Similar a SMC303 Real-time Data Processing Using AWS Lambda

Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Amazon Web Services
 
Raleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS LambdaRaleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS LambdaAmazon Web Services
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
 
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Amazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...Amazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosAmazon Web Services LATAM
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaAmazon Web Services
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
(CMP403) AWS Lambda: Simplifying Big Data Workloads
(CMP403) AWS Lambda: Simplifying Big Data Workloads(CMP403) AWS Lambda: Simplifying Big Data Workloads
(CMP403) AWS Lambda: Simplifying Big Data WorkloadsAmazon Web Services
 
Getting Started with Serverless Architectures | AWS Public Sector Summit 2016
Getting Started with Serverless Architectures | AWS Public Sector Summit 2016Getting Started with Serverless Architectures | AWS Public Sector Summit 2016
Getting Started with Serverless Architectures | AWS Public Sector Summit 2016Amazon Web Services
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석Amazon Web Services Korea
 
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)Amazon Web Services
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...Amazon Web Services
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesAmazon Web Services
 

SMC303 Real-time Data Processing Using AWS Lambda

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tara E. Walker | Technical Evangelist | @taraw April 2017 SMC303 Real-Time Data Processing Using AWS Lambda
  • 2. Agenda: What’s Serverless Real-Time Data Processing? Processing Streaming Data with Lambda and Kinesis; Streaming Data Processing Demo; Data Processing Pipeline with Lambda and MapReduce; Building a Big Data Processing Solution Demo; Serverless Processing of Real-Time Streaming Data; Serverless Data Processing with Distributed Computing; Customer Story: Fannie Mae, Distributed Computing with Lambda
  • 4. AWS Lambda: efficient performance at scale. Easy to author, deploy, maintain, secure & manage. Focus on business logic to build back-end services that perform at scale. Bring Your Own Code: Stateless, event-driven code with native support for Node.js, Java, Python and C#. No infrastructure to manage: Compute without managing infrastructure like Amazon EC2 instances and Auto Scaling groups. Cost-effective: Automatically matches capacity to request rate. Compute cost is billed in 100 ms increments. Triggered by events: Direct sync & async API calls, AWS service integrations, and 3rd party triggers.
  • 5. Lambda Event Sources (data stores, endpoints, configuration repositories, event/message services): Amazon S3, Amazon DynamoDB, Amazon Kinesis, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, Amazon Cognito, Amazon SNS, Amazon SES, cron events, AWS CodeCommit, Amazon API Gateway, Amazon Alexa, AWS IoT, AWS Step Functions … more on the way!
  • 6. Serverless Real-Time Data Processing Is... Capture data streams (event source): IoT data, financial data, log data, clickstream data. Process data streams (function): Node.js, Python, Java, C#. Output data: database, cloud services. No servers to provision or manage.
  • 7. Lambda Real-Time Event Sources: How It Works. Asynchronous push model: Amazon S3, Amazon SNS. Synchronous push model: Amazon Alexa, AWS IoT. Stream pull model: Amazon DynamoDB, Amazon Kinesis. Push models: mapping owned by the event source; invokes Lambda via the event source API; resource-based policy permissions; concurrent executions; sync or async invocation. Pull model: mapping owned by Lambda; Lambda polls the streams and invokes the function when new records are found on the stream; Lambda execution role policy permissions; sync invocation.
  • 9. Amazon Kinesis. Real-time: Collect real-time data streams and promptly respond to key business events and operational triggers. Real-time latencies. Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure. Amazon Kinesis offering: Managed services for streaming data ingestion and processing. • Amazon Kinesis Streams: Build applications that process or analyze streaming data. • Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3 and Amazon Redshift. • Amazon Kinesis Analytics: Analyze data streams using SQL queries.
  • 10. Processing Real-Time Streams: Lambda + Amazon Kinesis. Streaming data is sent to Amazon Kinesis and stored in shards. Multiple Lambda functions can be triggered to process the same Amazon Kinesis stream for “fan out”. Lambda can process data and store results, e.g. to DynamoDB or S3. Lambda can aggregate data to services like Amazon Elasticsearch Service for analytics. Lambda sends event data and function info to Amazon CloudWatch for capturing metrics and monitoring. Diagram: Amazon Kinesis, AWS Lambda, Amazon CloudWatch, Amazon DynamoDB, Amazon Elasticsearch Service, Amazon S3.
  • 11. Processing Streams: Set Up Amazon Kinesis Stream. Streams: Made up of shards. Each shard ingests (accepts writes of) up to 1 MB/sec. Each shard emits (serves reads of) up to 2 MB/sec. Each shard supports 5 reads/sec. Data: All data is stored and is replayable for 24 hours. Make sure partition key distribution is even to optimize parallel throughput. The partition key is used to distribute PUTs across shards; choose a key with more groups than shards. Best Practice: Determine an initial size (number of shards) to plan for expected maximum demand. Leverage the “Help me decide how many shards I need” option in the console, or use the formula: Number of shards = max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000).
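The shard formula on this slide can be tried out with a few lines of Python. This is a minimal sketch: the function name `shards_needed`, the rounding up to a whole shard, and the floor of one shard are assumptions added for illustration, not part of the slide.

```python
import math


def shards_needed(incoming_write_kb_per_s: float, outgoing_read_kb_per_s: float) -> int:
    """Apply the slide's formula, rounded up to a whole shard (rounding is an assumption)."""
    by_write = incoming_write_kb_per_s / 1000  # each shard ingests up to about 1 MB/sec
    by_read = outgoing_read_kb_per_s / 2000    # each shard emits up to about 2 MB/sec
    # A stream needs at least one shard, so never return less than 1.
    return max(1, math.ceil(max(by_write, by_read)))
```

For example, `shards_needed(3000, 12000)` is driven by the read side (12000 / 2000) rather than the write side (3000 / 1000), which shows why both bandwidths have to be estimated before sizing the stream.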
  • 12. Processing Streams: Create Lambda functions. Memory: CPU allocation is proportional to the memory configured. Increasing memory makes your code execute faster (if CPU bound). Increasing memory allows larger record sizes to be processed. Timeout: Increasing the timeout allows for longer functions, but a longer wait in case of errors. Permission model: The execution role defined for Lambda must have permission to access the stream. Retries: With Amazon Kinesis, Lambda retries until the data expires (24 hours). Best Practice: Write Lambda function code to be stateless. Instantiate AWS clients & database clients outside the scope of the function handler.
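The best practice above (stateless handler, clients created outside the handler) might look roughly like the following Python sketch. The `Records[].kinesis.data` field with base64-encoded payloads is the shape Lambda uses for Kinesis events; everything else is an assumption for illustration: `InMemoryTable` is a hypothetical stand-in for a real AWS or database client, and the payloads are assumed to be JSON.

```python
import base64
import json


class InMemoryTable:
    """Hypothetical stand-in for a database client (for example a DynamoDB table)."""

    def __init__(self):
        self.items = []

    def put_item(self, item):
        self.items.append(item)


# Built once per container, outside the handler, so warm invocations reuse it.
TABLE = InMemoryTable()


def handler(event, context):
    """Decode every Kinesis record in the batch and write it to the table.

    The handler keeps no state of its own between invocations.
    """
    processed = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])  # Kinesis data is base64
        TABLE.put_item(json.loads(payload))  # assumes producers send JSON
        processed += 1
    return {"processed": processed}
```

Creating the client at module level pays its start-up cost once per container rather than once per batch, which matters when the function is invoked for every poll of every shard.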
  • 13. Processing Streams: Configure Event Source. Amazon Kinesis is mapped as an event source in Lambda. Batch size: Max number of records that Lambda will send to one invocation. Not equivalent to effective batch size. Effective batch size is evaluated every 250 ms, calculated as: MIN(records available, batch size, 6 MB). Increasing batch size allows fewer Lambda function invocations with more data processed per function. Best Practices: Set to “Trim Horizon” for reading from the start of the stream (all data). Set to “Latest” for reading only the most recent data (new records).
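The MIN(records available, batch size, 6 MB) rule mixes a record count with a byte limit, so a short Python sketch may help. It is an illustration only: the function name `effective_batch`, the interpretation of 6 MB as 6 * 1024 * 1024 bytes, and the idea of filling the batch record by record are assumptions, not a description of Lambda's internal poller.

```python
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # 6 MB payload cap from the slide (byte value assumed)


def effective_batch(record_sizes, batch_size):
    """Return how many of the available records would go into one invocation.

    record_sizes: sizes in bytes of the records currently available on the shard.
    batch_size:   the configured maximum number of records per invocation.
    """
    taken, total = 0, 0
    for size in record_sizes:
        # Stop at whichever limit is hit first: record count or payload size.
        if taken == batch_size or total + size > MAX_PAYLOAD_BYTES:
            break
        taken += 1
        total += size
    return taken
```

With 50 small records available and a batch size of 100, the result is 50 (limited by what is available); with ten 1 MB records it is 6 (limited by payload size), which is why the configured batch size is only an upper bound.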
  • 14. Processing streams: How It Works. Polling: Concurrent polling and processing per shard. Lambda polls every 250 ms if no records are found. Will grab as much data as possible in one GetRecords call (batch). Batching: Batches are passed for invocation to Lambda through function parameters. Batch size may impact duration if the Lambda function takes longer to process more records. Sub-batch in memory for invocation payload. Synchronous invocation: Batches are invoked as synchronous RequestResponse type. Lambda honors Amazon Kinesis at-least-once semantics. Each shard blocks in order of synchronous invocation.
  • 15. Processing streams: Tuning throughput. If the put / ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind. Maximum theoretical throughput: # shards * 2 MB / Lambda function duration (s). Effective theoretical throughput: # shards * batch size (MB) / Lambda function duration (s). Diagram: Source, Amazon Kinesis shards, Lambda functions, Destination 1, Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
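The two throughput formulas on this slide translate directly into Python. A minimal sketch, with function names chosen here for illustration:

```python
def max_theoretical_throughput(shards: int, duration_s: float) -> float:
    """# shards * 2 MB / Lambda function duration (s), in MB/sec."""
    return shards * 2.0 / duration_s


def effective_theoretical_throughput(shards: int, batch_mb: float, duration_s: float) -> float:
    """# shards * batch size (MB) / Lambda function duration (s), in MB/sec."""
    return shards * batch_mb / duration_s


def at_risk_of_falling_behind(ingest_mb_per_s: float, shards: int,
                              batch_mb: float, duration_s: float) -> bool:
    """True when the ingestion rate exceeds the effective theoretical throughput."""
    return ingest_mb_per_s > effective_theoretical_throughput(shards, batch_mb, duration_s)
```

With 4 shards, 0.5 MB batches and a 1 second function duration, the effective figure is 2 MB/sec against a maximum of 8 MB/sec, so an ingestion rate of 3 MB/sec would already put processing at risk unless the batch size grows, the function gets faster, or shards are added.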
  • 16. Processing streams: Tuning Throughput w/ Retries. Retries: Lambda will retry on execution failures until the record is expired. Throttles and errors impact duration and directly impact throughput. Best Practice: Retry with exponential backoff of up to 60s. Effective theoretical throughput with retries: ( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry ). Diagram: Source, Amazon Kinesis shards, Lambda functions, Destination 1, Destination 2. Lambda polls a batch, receives an error, receives an error again, then receives success. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
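The backoff advice and the retry-adjusted formula can be sketched in Python as follows. The 60 second cap comes from the slide; the 1 second base delay, the doubling factor, and the function names are assumptions for illustration, and a production version would usually add random jitter.

```python
def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 60.0) -> list:
    """Delays for successive retries: base * 2^n, capped at cap_s (60 s per the slide)."""
    return [min(cap_s, base_s * 2 ** n) for n in range(attempts)]


def effective_throughput_with_retries(shards: int, batch_mb: float,
                                      duration_s: float, retries: int) -> float:
    """(# shards * batch size (MB)) / (function duration (s) * retries until expiry)."""
    return (shards * batch_mb) / (duration_s * retries)
```

`backoff_delays(8)` grows from 1 second and levels off at the 60 second cap from the seventh attempt on. In the throughput formula, every extra retry divides the effective rate, so a batch that needs 4 attempts cuts a 2 MB/sec pipeline to 0.5 MB/sec.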
  • 17. Processing streams: Common observations. Effective batch size may be less than configured during low throughput. Effective batch size will increase during higher throughput. Increased Lambda duration leads to a decreased number of invokes and GetRecords calls. Too many consumers of your stream may compete with Amazon Kinesis read limits and induce ReadProvisionedThroughputExceeded errors and metrics.
  • 18. Processing streams: Monitoring with CloudWatch. Monitoring Amazon Kinesis Streams: • GetRecords: effective throughput • PutRecord: bytes, latency, records, etc. • GetRecords.IteratorAgeMilliseconds: how old your last processed records were. Monitoring Lambda functions: • Invocation count: number of times the function was invoked • Duration: execution/processing time • Error count: number of errors • Throttle count: number of times the function was throttled • Iterator Age: time elapsed between the final record of a batch being written to the stream and the batch being received. Debugging: • Review all metrics • Make custom logs • View RAM consumed • Search for log events. AWS X-Ray: coming soon!
  • 20. Serverless Data Processing with Distributed Computing
  • 21. Serverless Distributed Computing: Map-Reduce Model. Why Serverless Data Processing with Distributed Computing? Remove difficult infrastructure management: • Cluster administration • Complex configuration tools. Enable simple, elastic, user-friendly distributed data processing: • Eliminate complexity of state management • Bring distributed computing power to the masses.
  • 22. Serverless Distributed Computing: Map-Reduce Model. Why Serverless Data Processing with Distributed Computing? Eliminate utilization concerns: • Makes code simpler by removing the complexities of multithreaded processing to optimize server usage • Cost-effective option to run ad hoc MapReduce jobs. Easier, automatic horizontal scaling: • Provides the ability to process scientific and analytics applications.
  • 23. Serverless Distributed Computing: MapReduce. Diagram (steps 1 to 6): input bucket; driver, which records job state; mapper functions (map phase); mapper output, with an S3 event source; coordinator; reducer step 1 and its reducer output, with the n-th reducer step created recursively (reduce phase); final reducer; result.
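The flow in this diagram (driver, mappers, coordinator, repeated reducer steps, final result) can be imitated locally in plain Python. This is only a sketch of the pattern, not the reference architecture's code: it runs in one process with no Lambda or S3, uses word counting as an assumed example workload, and all function names and the `fan_in` parameter are hypothetical. The slide describes reducer steps being created recursively; the loop below is the iterative equivalent.

```python
from collections import Counter
from functools import reduce


def mapper(chunk: str) -> Counter:
    """Map phase: count the words in one input object."""
    return Counter(chunk.split())


def reducer(partials: list) -> Counter:
    """One reducer step: merge a batch of partial counts into one."""
    return reduce(lambda a, b: a + b, partials, Counter())


def coordinator(partials: list, fan_in: int = 2) -> Counter:
    """Run reducer steps over batches of outputs until one result remains.

    fan_in must be at least 2, otherwise the list never shrinks.
    """
    while len(partials) > 1:
        partials = [reducer(partials[i:i + fan_in])
                    for i in range(0, len(partials), fan_in)]
    return partials[0] if partials else Counter()


def driver(chunks: list, fan_in: int = 2) -> Counter:
    """Driver: fan the input objects out to mappers, then hand off to the coordinator."""
    return coordinator([mapper(chunk) for chunk in chunks], fan_in)
```

Calling `driver(["a b a", "b c", "a", "c c"])` runs four mappers, then two reducer steps of fan-in 2, then a final reducer. In the serverless version each of those calls would be a separate function invocation, with intermediate outputs written to storage between steps.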
  • 24. Serverless Distributed Computing: PyWren. PyWren prototype developed at the University of California, Berkeley. Uses Python with AWS Lambda stateless functions for large-scale data analytics. Achieved about 30-40 MB/s write and read performance per core to the S3 object store. Scaled to 60-80 GB/s across 2800 simultaneous functions.
  • 25. Serverless Distributed Computing: Benchmark Using Amazon MapReduce Reference Architecture Framework with Lambda. Dataset queries: • Scan query (90 M rows, 6.36 GB of data) • Select query on Page Rankings • Aggregation query on UserVisits (775M rows, ~127 GB of data). Dataset:
      Rankings (rows)   Rankings (bytes)   UserVisits (rows)   UserVisits (bytes)   Documents (bytes)
      90 Million        6.38 GB            775 Million         126.8 GB             136.9 GB
  • 26. Serverless Distributed Computing: Benchmark. Using the Amazon MapReduce Reference Architecture Framework with Lambda. A subset of the AMPLab benchmark was run to compare with other data processing frameworks. Performance benchmarks: execution time for each workload, in seconds.
      TECHNOLOGY               SCAN 1A   SCAN 1B   AGGREGATE 2A
      Amazon Redshift (HDD)    2.49      2.61      25.46
      Serverless MapReduce     39        47        200
      Impala - Disk - 1.2.3    12.015    12.015    113.72
      Impala - Mem - 1.2.3     2.17      3.01      84.35
      Shark - Disk - 0.8.1     6.6       7         151.4
      Shark - Mem - 0.8.1      1.7       1.8       83.7
      Hive - 0.12 YARN         50.49     59.93     730.62
      Tez - 0.2.0              28.22     36.35     377.48
  • 28. High Performance Computing Using AWS Lambda for Financial Modeling. Bin Lu, Fannie Mae, 4/18/2017. © 2017 Fannie Mae. Trademarks of Fannie Mae.
  • 29. Fannie Mae Business. Fannie Mae is a leading source of financing for mortgage lenders: • Providing access to affordable mortgage financing in all market conditions. • Effectively managing and reducing risk to our business, taxpayers, and the housing finance system. In 2016, Fannie Mae provided $637B in liquidity to the mortgage market, enabling: • 1.1M home purchases • 1.4M refinancings • 724K rental housing units.
  • 30. Fannie Mae Financial Modeling. Financial modeling is a Monte Carlo simulation process to project future cash flows, which is used for managing mortgage risk on a daily basis: • Underwriting and valuation • Risk management • Financial reporting • Loss mitigation and loan removal. ~10 quadrillion (10 × 10^15) cash flow projections each month in hundreds of economic scenarios.
  • 31. Fannie Mae Financial Modeling Infrastructure. High-performance computing (HPC) grids are the key infrastructure component for financial modeling at Fannie Mae. Fannie Mae's existing HPC grids no longer meet our growing business needs: • They are 7 years old, with limited computing capacity, limited I/O capacity, limited storage, and a complex API. • It takes more than half a year to add incremental compute capacity and develop any new application. We are looking for a new HPC facility to react to the rapidly changing market: • Unlimited computing resources and unlimited storage. • Serverless infrastructure with a simple distributed computing API. • Efficient cost model.
  • 32. Fannie Mae’s Journey to AWS Serverless HPC Service. In 2016, Fannie Mae began working with AWS to build the first serverless HPC platform in the industry using the Lambda service. This is also Fannie Mae’s first pilot program to develop an AWS cloud-native application. Once the infrastructure is set up, we are able to develop a new application within a month and provision the compute resources within minutes. In March 2017, Fannie Mae successfully deployed the first financial modeling application to preproduction and ran it with 15,000 concurrent executions.
  • 33. Fannie Mae’s Serverless HPC Performance. Lambda service configuration: • Initial burst rate = 2,000, incremental rate = 100 per minute, throttle limit = 15,000. • Lambda ramps up automatically from 2,000 to 15,000 concurrent executions. Application result: • One simulation run of ~20 million mortgages takes 2 hours, more than 3 times faster than the existing process. • Performance does not degrade during the ramp-up period. • Lambda CPU efficiency is close to 100%: actual elapsed time is consistent with the estimated elapsed time based on Lambda billing time. [Chart: number of new Lambda invocations every 5 minutes; maximum concurrent Lambdas = 15,000.]
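The ramp-up figures on this slide can be turned into a small calculation. The sketch below assumes a simple linear reading of the configuration (start at the burst rate, add the incremental rate each minute, stop at the throttle limit). That reading is an assumption made for illustration, and the function names are hypothetical. It is not a description of how the Lambda service schedules capacity.

```python
def concurrency_limit(minute, burst=2000, per_minute=100, cap=15000):
    """Allowed concurrent executions `minute` minutes after the job starts,
    under the assumed linear ramp."""
    return min(cap, burst + per_minute * minute)


def minutes_to_cap(burst=2000, per_minute=100, cap=15000):
    """Minutes needed to ramp from the burst rate to the throttle limit."""
    remaining = max(0, cap - burst)
    return -(-remaining // per_minute)  # ceiling division on integers
```

Under these assumptions, `concurrency_limit(60)` is 8,000 and `minutes_to_cap()` is 130, which gives a sense of how the ramp compares with the length of a run.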
  • 34. Simple Serverless HPC Reference Architecture. A map-reduce framework is used for simple parallel workloads: • The input file in the S3 input bucket is split by EC2 into n triggers, which are saved in the S3 event bucket. • Lambda automatically ramps up to n concurrent executions and writes outputs to the S3 mapper bucket. • EC2 is used to aggregate the outputs and write the final result to the S3 reducer bucket. [Diagram: Amazon S3 Input; Amazon EC2 Splitter; Amazon S3 Event; AWS Lambda Mappers; Amazon S3 Mapper Result; Amazon EC2 Reducer; Amazon S3 Reducer Result.]
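The splitter step can be sketched as a function that divides an input of known size into n trigger descriptions, one per mapper. This is a minimal sketch under assumptions: the input is addressed by record index, and the `triggers/part-…` key names and the `start`/`end` fields are hypothetical. The slide does not say what Fannie Mae's trigger objects contain.

```python
def split_into_triggers(total_records, n):
    """Split `total_records` into n contiguous, near-equal ranges.

    Each returned dict describes one trigger object that a splitter could
    write to the event bucket; `end` is exclusive.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(total_records, n)
    triggers = []
    start = 0
    for i in range(n):
        # The first `extra` ranges take one additional record each.
        size = base + (1 if i < extra else 0)
        triggers.append({
            "key": f"triggers/part-{i:05d}.json",
            "start": start,
            "end": start + size,
        })
        start += size
    return triggers
```

Because every trigger carries its own range, each mapper invocation is stateless and can be retried independently.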
  • 35. Complex Serverless HPC Reference Architecture. Break down a complex workload into multiple simple ones: …
  • 36. Benefits of Serverless HPC Service. Cost effective: • Never pay for idle. The cost is based on actual vCPU usage, not elapsed time or the maximum processing capacity of the infrastructure. • Performance improvement at zero cost: 1 Lambda x 15,000 hours = 15,000 Lambdas x 1 hour. Shorter time to market: • Ability to burst to the cloud immediately to access additional computing resources. • Ability to focus on business needs. No servers to manage and no complex distributed computing code to write. Most complete data analytics platform: • Streamlined integration with big data platform and BI tools / Data Lake. • Business resiliency.
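The "performance improvement at zero cost" equation can be checked with a few lines of arithmetic. In this sketch the rate of 0.06 per function-hour is a made-up placeholder, not an AWS price, and `compute_cost` is a hypothetical helper. The only point is that billing on usage makes cost depend on the product of functions and hours, not on how the work is spread over time.

```python
def compute_cost(functions, hours_each, rate_per_function_hour):
    """Usage-based cost: total function-hours multiplied by a unit rate."""
    return (functions * hours_each) * rate_per_function_hour


# Hypothetical rate, used only to compare the two shapes of the same job.
RATE = 0.06
serial_cost = compute_cost(1, 15000, RATE)      # 1 function for 15,000 hours
parallel_cost = compute_cost(15000, 1, RATE)    # 15,000 functions for 1 hour
```

Both calls bill the same 15,000 function-hours, so the costs match while the elapsed time drops from 15,000 hours to 1 hour.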
  • 37. Considerations and Next Steps. Considerations: • Maximize S3 performance by distributing key names so that objects are spread evenly across partitions. • Set up a separate AWS account for unlimited Lambda access / IP addresses. • Adopt a microservice architecture to migrate one business function/application at a time. • Integrate with the AWS big data analytics platform for access to unlimited storage and state-of-the-art business analytics tools. Next steps: • Production migration of the first application in Q2 2017. • Complete migration of primary loan performance modeling applications to AWS in early 2018.
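One common way to follow the key-naming consideration is to put a short hash in front of each object name so that sequential names do not share a prefix. The sketch below illustrates that idea only. The function name `distributed_key`, the 4-character prefix, and the use of SHA-256 are assumptions for this example, not details from the talk.

```python
import hashlib


def distributed_key(name, prefix_len=4):
    """Prefix an object name with the first characters of its SHA-256 hash.

    The prefix is deterministic, so the same name always maps to the same key.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"
```

A mapper would then write its output under `distributed_key("output-00001.csv")` instead of the bare name, and any reader can recompute the key from the name alone.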
  • 38. Real-time Data Processing with Lambda: Next Steps
  • 39. Data Processing with AWS: Next steps • Learn more about AWS Serverless at https://aws.amazon.com/serverless • Explore the AWS Lambda Reference Architectures on GitHub: • Real-Time Streaming: https://github.com/awslabs/lambda-refarch-streamprocessing • Distributed Computing Reference Architecture (serverless MapReduce): https://github.com/awslabs/lambda-refarch-mapreduce
  • 40. Data Processing with AWS: Next steps • Create an Amazon Kinesis stream. Visit the Amazon Kinesis console and configure a stream to receive data, e.g., data from social media feeds. • Create and test a Lambda function to process streams from Amazon Kinesis by visiting the Lambda console. The first 1M requests each month are on us! • Read the Developer Guide and try the Lambda and Amazon Kinesis tutorial: http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html • Send questions, comments, and feedback to the AWS Lambda Forums.
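For the "create and test a Lambda function" step, a first handler might look like the sketch below. It relies on the event shape Lambda delivers for Kinesis, where each record's payload arrives base64-encoded under `kinesis.data`. It also assumes, for illustration only, that producers put JSON documents on the stream. The returned summary dict is a choice made for this example.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return a small summary."""
    payloads = []
    for record in event["Records"]:
        # Kinesis payloads are delivered to Lambda as base64-encoded bytes.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers write UTF-8 JSON documents to the stream.
        payloads.append(json.loads(raw.decode("utf-8")))
    return {"processed": len(payloads), "payloads": payloads}
```

The same function can be exercised locally by building a test event with a base64-encoded JSON string before wiring it to a real stream.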
  • 41. Thank you! Tara E. Walker AWS Technical Evangelist @taraw