SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
GPS: Real - Time Data Processin g with A WS
Lambda Quickly, at Scale, and What Comes Next?
J u a n V i l l a | P a r t n e r S o l u t i o n s A r c h i t e c t
G P S T E C 3 1 3
N o v e m b e r 2 7 , 2 0 1 7
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
About Me—Juan Villa
• Partner Solutions Architect
• Focus on IoT, serverless, and machine
learning
• Love to build cool software and
hardware projects
• Ex-Software Developer
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Real-Time vs Batch Processing?
B a t c h P r o c e s s i n g
IoT Data
Financial
Data
Log Data
NodeJS
Python
Java
C#
Clickstream
Data
Action
Pipeline Time/Decision Time
15+ minutes **
DataSources
** application specific, and can be less than 15 minutes, or more than 1 day
Serverless
Automatically Scales
Event Sources
S3 Bucket
Events
Stored
Scheduled
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Real-Time vs Batch Processing?
R e a l - T i m e ( E v e n t D r i v e n )
Action
Pipeline Time / Decision Time
0 sec–15 minutes **
IoT Data
Financial
Data
Log Data
Clickstream
Data
DataSources
** as fast as possible, depends on nature of event
Near real time?
NodeJS
Python
Java
C#
Serverless
Automatically Scales
Event Sources
EVENT SOURCE
Real-Time
Events
AWS Lambda
S e r v e r l e s s C o m p u t e
Efficient performance at scale: Easy to author, deploy,
maintain, secure, and manage. Focus on business logic to build
back-end services that perform at scale.
Bring your own code: Stateless, event-driven code with native
support for Node.js, Java, Python, and C# languages.
No Infrastructure to manage: Compute without managing
infrastructure like Amazon EC2 instances and Auto Scaling
groups.
Cost-effective: Automatically matches capacity to request rate.
Compute cost 100 ms increments.
Triggered by events: Direct Sync & Async API calls, AWS service
integrations, and third-party triggers.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Sources and Applications
Load Balancer
Parse Load Balancer Access Logs looking for anomalous patterns
Flow Logs
Parse VPC Flow Logs to detect anomalies and abuse
CloudTrail
Parse CloudTrail events to detect suspicious patterns
such as logins from another country
CloudWatch
(logs, events, alarms)
Configure event triggering and alarms from sources
such as CloudTrail or other event sources
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Sources and Applications
-- Many More --
DynamoDB Stream of real-time changes to database tables
DynamoDB
S3 events, moderate image with Amazon Rekognition
S3 Bucket
SNS
Trigger Lambda actions from SNS events
AWS IoT
IoT Rules, Lambda, and IoT device lifecycle events
Kinesis
Send events into a Kinesis stream for real-time processing
Your Application
Your own applications!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Event Driven Models
Amazon
S3
Amazon
SNS
ASYNCHRONOUS
PUSH MODEL
Async Invocation
Amazon
Alexa
AWS
IoT
SYNCHRONOUS
PUSH MODEL
Sync Invocation
Amazon
DynamoDB
Amazon
Kinesis
STREAM
PULL MODEL
Sync Invocation
Mapping owned by Lambda
Lambda function invokes when new records
found on stream
Lambda Execution role policy permissions
Lambda polls the streams
Invokes Lambda via Event Source API
Mapping owned by Event Source
Resource-based policy permissions
Concurrent executions
HOW IT WORKS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pipeline Example #1
P a r s i n g S e c u r i t y E v e n t s
CloudWatch
Events
CloudTrail
Kinesis
Stream
Lambda SNS
Notify when an EC2 instance tagged as
production is terminated.
Kinesis
Firehose
ElasticsearchEvent TargetEC2
Index audit trail events in ES, and search
with Kibana and live dashboards.
Event Source
~ 1 minute
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Configuring CloudWatch Event Rules
P a r s i n g S e c u r i t y E v e n t s
• Configure “Event Source”
• EC2 CloudTrail
“TerminateInstances”
event
• Can have more than one
event type in source filter
• Configure event “Targets”
• Kinesis
• SNS
• Can have more than one
target
Configure Target
Configure Source
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Configuring CloudWatch Event Rules
P a r s i n g S e c u r i t y E v e n t s
CloudWatch Event Source Pattern
• Will get EC2 CloudTrail audit events on the account
• CloudTrail audit events track API events, which
includes the principal, target resource, and action
that took place
• Specifically filtering for “TerminateInstances” events
• On average, events are emitted in ~1–2 minutes
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What Does the Event Look Like?
P a r s i n g S e c u r i t y E v e n t s
What do we need?
Summarized
• IAM user that terminated the
instance
• When the instance was terminated
• IP address of user that terminated
the instance
• What instance was terminated
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda Function
P a r s i n g S e c u r i t y E v e n t s
• Send SMS message to
administrator when a user
terminates an instance
tagger as a production
instance
• We could also send a
message to a Slack
Channel, or Pager Duty, or
send an email
• Got notified in 30
seconds!
Imports
Boto3 clients
Parse Kinesis Record
Extract Username and
InstanceId
Check if instance is
Production
Send SMS!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Executing the Lambda Function
P a r s i n g S e c u r i t y E v e n t s
• Configure a Kinesis trigger for our
Lambda function
• Lambda service handles the Kinesis
ingestion and execution of the
Lambda function
• Also handles errors and retries
• Finally, we get our text message
when an instance is terminated Huzzah!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deeper Look into Amazon Kinesis
Real-time: Collect real-time data streams and promptly respond to key
business events and operational triggers. Real-time latencies.
Easy to use: Focus on quickly launching data streaming applications instead
of managing infrastructure.
Amazon Kinesis offering: Managed services for streaming data ingestion
and processing.
• Amazon Kinesis Streams: Build applications that process or analyze
streaming data.
• Amazon Kinesis Firehose: Load massive volumes of streaming data
into Amazon S3, Amazon Redshift, and Elasticsearch.
• Amazon Kinesis Analytics: Analyze data streams using SQL queries.
LatestTrim Horizon
(1–7 days retention)
Record
Publish
Lambda Consumer
Iterator
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Creating a Kinesis Stream
Streams
• Made up of shards
• Each shard ingests/reads data up to 1 MB/sec
• Each shard emits/writes data up to 2 MB/sec
• Each shard supports 5 read transactions/sec
Data
• All data is stored and is replayable for 24 hours (default)
• Retention window can be configured up to 7 days
• Partition key used to distribute PUTs across shards
• Even partition key distribution optimizes throughput
Best practice
Determine an initial size/shards to plan for expected maximum demand
 Leverage “Help me decide how many shards I need” option in console
 Use formula for number of shards:
max(incoming_write_bandwidth_in_KB/1000, outgoing_read_bandwidth_in_KB / 2000)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Creating Lambda Function
Memory
CPU allocation proportional to the memory configured
Increasing memory makes your code execute faster (if CPU
bound)
Increasing memory allows for larger record sizes processed
Timeout
Increasing timeout allows for longer functions, but longer wait
in case of errors
Permission model
Execution role defined for Lambda must have permission to
access the stream
Retries
With Amazon Kinesis, Lambda retries until the data expires
(24 hours)
Best practice
Write Lambda function code to be stateless
Instantiate AWS clients and database clients
outside the scope of the function handler to
take advantage of connection re-use.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Configuring Event Trigger
Amazon Kinesis mapped as event source in Lambda
Batch size
Max number of records that Lambda will send to one invocation
Not equivalent to effective batch size
Effective batch size is every 250 ms—calculated as:
MIN(records available, batch size, 6MB)
Increasing batch size allows fewer Lambda function invocations with more data processed per
function
Best Practices
Set to “Trim Horizon” for reading from start of
stream (all data)
Set to “Latest” for reading most recent data
(LIFO) (latest data)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How It Works
Polling
Concurrent polling and processing per shard
Lambda polls every 250 ms if no records found
Will grab as much data as possible in one GetRecords call (Batch)
Batching
Batches are passed for invocation to Lambda through
function parameters
Batch size may impact duration if the Lambda function
takes longer to process more records
Sub batch in memory for invocation payload
Synchronous invocation
Batches invoked as synchronous RequestResponse type
Lambda honors Amazon Kinesis at least once semantics
Each shard blocks in order of synchronous invocation
250 ms
poll
Batch
Sync invocation
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tuning Throughput
If put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind
Maximum theoretical throughput
# shards * 2 MB / Lambda function duration (s)
Effective theoretical throughput
# shards * batch size (MB) / Lambda function duration (s)
… Shards
Scale Amazon Kinesis by splitting and merging shards
Lambda Lambda Functions
Lambda will scale automatically
Kinesis
Kinesis
Stream
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tuning Throughput with Retries
Retries
Will retry on execution failures until the record is expired
Throttles and errors impact duration and directly impact throughput
Best Practice
Retry with exponential back-off of up to 60 s
Effective theoretical throughput with retries
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry)
… Shards
Scale Amazon Kinesis by splitting and merging shards
Lambda Lambda Functions
Lambda will scale automatically
Kinesis
Kinesis
Stream
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitoring Your Real-Time Pipeline
•GetRecords: (effective throughput)
•PutRecord: bytes, latency, records, etc.
•GetRecords.IteratorAgeMilliseconds: how old your last
processed records were
Monitoring Amazon Kinesis Streams
Monitoring Lambda functions
•Invocation count: time function invoked
•Duration: execution/processing time
•Error count: number of errors
•Throttle count: number of time function throttled
•Iterator age: time elapsed from batch received and
final record written to stream
• Review all metrics
• Make custom logs
• View RAM consumed
• Search for log events
Debugging
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Example Debugging with X-Ray
Retries with Lambda/Exponential Back-offs
• Total Duration = 6 seconds (five retries before success)
• Lambda Invocation Duration = 2.2 seconds
• Actual Throughput = 1 * (10000/6) = 1666 records/s
• Ideal Throughput = 1 * (10000/2.2) = 4545 records/s
2.72x difference
AWS X-Ray
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Falling Behind
Not Processing Fast Enough
• Producers are inserting 100k records per minute
• 2 shards
• Lambda functions take 1.7 seconds to process 1000 records
• 2 * (1000/1.7) = 1176 records/s = ~70k/minute
100k produced per minute > ~70k processed per minute
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Resharding to Increase Throughput
Increase Number of Shards to Catch Up
• Increased number of shards to 4 (doubled)
• Lambda functions take 1.7 seconds to process 1000 records
• 4 * (1000/1.7) = 2352 records/s = ~141k / minute
100k produced per minute < ~141k processed per minute
REAL TIME!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Netflix and AWS
Amazon Kinesis Streams processes
multiple terabytes of log data each day, yet
events show up in our analytics in seconds.
We can discover and respond to issues in
real time, ensuring high availability and a
great customer experience.
• Netflix needed a solution for ingesting,
augmenting, and analyzing the multiple
terabytes of data its network generates
daily in the form of virtual private cloud
(VPC) flow logs.
• Netflix uses AWS to analyze billions of
messages across more than 100,000
application instances daily in real time,
enabling it to optimize user experience,
reduce costs, and improve application
resilience.
–John Bennett
Senior Software Engineer, Netflix
”
“
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What about EMR?
IoT Data
Financial
Data
Log Data
EVENT SOURCE
Apache
Spark
Clickstream
Data
Action
Pipeline
DataSources Amazon EMR
Apache Spark
Kinesis Connector
Real-Time
Events
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why EMR + Apache Spark?
• Mature features and libraries
• Machine learning functionality with SparkML
• Interact with Data using SparkSQL
• Spark Streaming provides support for operators
states such as sliding windows
• Fault tolerance/recovery out of the box
• Ability to combine batch and streaming
• Multiple languages (Scala, Java, Python, R, more)
SparkSQL (Python)
SparkML (Python)
GraphX (Scala)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Zillow and AWS
Zillow provides online home information to tens of
millions of buyers and sellers every day.
We can compute
Zestimates in seconds, as
opposed to hours, by using
Amazon Kinesis Streams
and Spark on Amazon
EMR.
• Needed to provide timely home valuations for all
new homes
• Runs Zestimate, its machine learning-based home-
valuation tool, on AWS
• Performs machine-learning jobs in hours instead of
a day
• Gives customers more accurate data on more than
100 million homes
• Scales storage and compute capacity on demand
–Jasjeet Thind
Vice President of Data Science and Engineering
”
“
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Next Steps
 Learn more about AWS Serverless at
https://aws.amazon.com/serverless
 Explore the AWS Lambda Reference Architecture on GitHub:
 Real-Time Streaming: https://github.com/awslabs/lambda-refarch-
streamprocessing
 Distributed Computing Reference Architecture (serverless
MapReduce) https://github.com/awslabs/lambda-refarch-mapreduce
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Next Steps
 Create an Amazon Kinesis stream. Visit the Amazon Kinesis Console and
configure a stream to receive data Ex. data from Social media feeds.
 Create and test a Lambda function to process streams from Amazon
Kinesis by visiting Lambda console. First 1M requests each month are
on us!
 Read the Developer Guide and try the Lambda and Amazon Kinesis
Tutorial:
 http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
 Send questions, comments, feedback to the AWS Lambda Forums
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!

Más contenido relacionado

La actualidad más candente

CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesCMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesAmazon Web Services
 
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Amazon Web Services
 
ABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsAmazon Web Services
 
WIN204-Simplifying Microsoft Architectures with AWS Services
WIN204-Simplifying Microsoft Architectures with AWS ServicesWIN204-Simplifying Microsoft Architectures with AWS Services
WIN204-Simplifying Microsoft Architectures with AWS ServicesAmazon Web Services
 
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Amazon Web Services
 
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...Amazon Web Services
 
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...Amazon Web Services
 
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise WorkloadsDAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise WorkloadsAmazon Web Services
 
Migrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWSMigrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWSAmazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersAmazon Web Services
 
CON309_Containerized Machine Learning on AWS
CON309_Containerized Machine Learning on AWSCON309_Containerized Machine Learning on AWS
CON309_Containerized Machine Learning on AWSAmazon Web Services
 
CMP211_Getting Started with Serverless Architectures
CMP211_Getting Started with Serverless ArchitecturesCMP211_Getting Started with Serverless Architectures
CMP211_Getting Started with Serverless ArchitecturesAmazon Web Services
 
CON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersCON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersAmazon Web Services
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...Amazon Web Services
 
NET309_Best Practices for Securing an Amazon Virtual Private Cloud
NET309_Best Practices for Securing an Amazon Virtual Private CloudNET309_Best Practices for Securing an Amazon Virtual Private Cloud
NET309_Best Practices for Securing an Amazon Virtual Private CloudAmazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 
ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...Amazon Web Services
 
AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017Amazon Web Services
 

La actualidad más candente (20)

CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web ServicesCMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
CMP323_AWS Batch Easy & Efficient Batch Computing on Amazon Web Services
 
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
 
ABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data Applications
 
WIN204-Simplifying Microsoft Architectures with AWS Services
WIN204-Simplifying Microsoft Architectures with AWS ServicesWIN204-Simplifying Microsoft Architectures with AWS Services
WIN204-Simplifying Microsoft Architectures with AWS Services
 
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
 
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
ATC303-Cache Me If You Can Minimizing Latency While Optimizing Cost Through A...
 
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
 
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise WorkloadsDAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
 
Migrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWSMigrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWS
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
CON309_Containerized Machine Learning on AWS
CON309_Containerized Machine Learning on AWSCON309_Containerized Machine Learning on AWS
CON309_Containerized Machine Learning on AWS
 
CMP211_Getting Started with Serverless Architectures
CMP211_Getting Started with Serverless ArchitecturesCMP211_Getting Started with Serverless Architectures
CMP211_Getting Started with Serverless Architectures
 
CON203_Driving Innovation with Containers
CON203_Driving Innovation with ContainersCON203_Driving Innovation with Containers
CON203_Driving Innovation with Containers
 
Introducing Amazon Fargate
Introducing Amazon FargateIntroducing Amazon Fargate
Introducing Amazon Fargate
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
 
NET309_Best Practices for Securing an Amazon Virtual Private Cloud
NET309_Best Practices for Securing an Amazon Virtual Private CloudNET309_Best Practices for Securing an Amazon Virtual Private Cloud
NET309_Best Practices for Securing an Amazon Virtual Private Cloud
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 
ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...
 
AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017AWS Commercial Management and Cost Optimisation - Dec 2017
AWS Commercial Management and Cost Optimisation - Dec 2017
 

Similar a GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, and What Comes Next

Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaAmazon Web Services
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Amazon Web Services
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon KinesisABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon KinesisAmazon Web Services
 
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...Amazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Amazon Web Services
 
Dive deep into technical enhancements - re:Invent Come to London 2.0
Dive deep into technical enhancements - re:Invent Come to London 2.0Dive deep into technical enhancements - re:Invent Come to London 2.0
Dive deep into technical enhancements - re:Invent Come to London 2.0Amazon Web Services
 
SRV304_Building High-Throughput Serverless Data Processing Pipelines
SRV304_Building High-Throughput Serverless Data Processing PipelinesSRV304_Building High-Throughput Serverless Data Processing Pipelines
SRV304_Building High-Throughput Serverless Data Processing PipelinesAmazon Web Services
 
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...Amazon Web Services
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon KinesisAmazon Web Services
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesisJampp
 
SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study
 SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study
SRV316 Serverless Data Processing at Scale: An Amazon.com Case StudyAmazon Web Services
 
Getting Started with AWS Lambda Serverless Computing
Getting Started with AWS Lambda Serverless ComputingGetting Started with AWS Lambda Serverless Computing
Getting Started with AWS Lambda Serverless ComputingAmazon Web Services
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural PatternsAmazon Web Services
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsAmazon Web Services
 

Similar a GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, and What Comes Next (20)

Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS Lambda
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon KinesisABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
 
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
 
STG401_This Is My Architecture
STG401_This Is My ArchitectureSTG401_This Is My Architecture
STG401_This Is My Architecture
 
Dive deep into technical enhancements - re:Invent Come to London 2.0
Dive deep into technical enhancements - re:Invent Come to London 2.0Dive deep into technical enhancements - re:Invent Come to London 2.0
Dive deep into technical enhancements - re:Invent Come to London 2.0
 
SRV304_Building High-Throughput Serverless Data Processing Pipelines
SRV304_Building High-Throughput Serverless Data Processing PipelinesSRV304_Building High-Throughput Serverless Data Processing Pipelines
SRV304_Building High-Throughput Serverless Data Processing Pipelines
 
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 
SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study
 SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study
SRV316 Serverless Data Processing at Scale: An Amazon.com Case Study
 
Getting Started with AWS Lambda Serverless Computing
Getting Started with AWS Lambda Serverless ComputingGetting Started with AWS Lambda Serverless Computing
Getting Started with AWS Lambda Serverless Computing
 
Serverless Developer Experience
Serverless Developer ExperienceServerless Developer Experience
Serverless Developer Experience
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural Patterns
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing Operations
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, and What Comes Next

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT GPS: Real - Time Data Processin g with A WS Lambda Quickly, at Scale, and What Comes Next? J u a n V i l l a | P a r t n e r S o l u t i o n s A r c h i t e c t G P S T E C 3 1 3 N o v e m b e r 2 7 , 2 0 1 7
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. About Me—Juan Villa • Partner Solutions Architect • Focus on IoT, serverless, and machine learning • Love to build cool software and hardware projects • Ex-Software Developer
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Real-Time vs Batch Processing? B a t c h P r o c e s s i n g IoT Data Financial Data Log Data NodeJS Python Java C# Clickstream Data Action Pipeline Time/Decision Time 15+ minutes ** DataSources ** application specific, and can be less than 15 minutes, or more than 1 day Serverless Automatically Scales Event Sources S3 Bucket Events Stored Scheduled
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Real-Time vs Batch Processing? R e a l - T i m e ( E v e n t D r i v e n ) Action Pipeline Time / Decision Time 0 sec–15 minutes ** IoT Data Financial Data Log Data Clickstream Data DataSources ** as fast as possible, depends on nature of event Near real time? NodeJS Python Java C# Serverless Automatically Scales Event Sources EVENT SOURCE Real-Time Events
  • 5. AWS Lambda S e r v e r l e s s C o m p u t e Efficient performance at scale: Easy to author, deploy, maintain, secure, and manage. Focus on business logic to build back-end services that perform at scale. Bring your own code: Stateless, event-driven code with native support for Node.js, Java, Python, and C# languages. No Infrastructure to manage: Compute without managing infrastructure like Amazon EC2 instances and Auto Scaling groups. Cost-effective: Automatically matches capacity to request rate. Compute cost 100 ms increments. Triggered by events: Direct Sync & Async API calls, AWS service integrations, and third-party triggers.
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Sources and Applications Load Balancer Parse Load Balancer Access Logs looking for anomalous patterns Flow Logs Parse VPC Flow Logs to detect anomalies and abuse CloudTrail Parse CloudTrail events to detect suspicious patterns such as logins from another country CloudWatch (logs, events, alarms) Configure event triggering and alarms from sources such as CloudTrail or other event sources
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Sources and Applications -- Many More -- DynamoDB Stream of real-time changes to database tables DynamoDB S3 events, moderate image with Amazon Rekognition S3 Bucket SNS Trigger Lambda actions from SNS events AWS IoT IoT Rules, Lambda, and IoT device lifecycle events Kinesis Send events into a Kinesis stream for real-time processing Your Application Your own applications!
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Event Driven Models Amazon S3 Amazon SNS ASYNCHRONOUS PUSH MODEL Async Invocation Amazon Alexa AWS IoT SYNCHRONOUS PUSH MODEL Sync Invocation Amazon DynamoDB Amazon Kinesis STREAM PULL MODEL Sync Invocation Mapping owned by Lambda Lambda function invokes when new records found on stream Lambda Execution role policy permissions Lambda polls the streams Invokes Lambda via Event Source API Mapping owned by Event Source Resource-based policy permissions Concurrent executions HOW IT WORKS
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pipeline Example #1 P a r s i n g S e c u r i t y E v e n t s CloudWatch Events CloudTrail Kinesis Stream Lambda SNS Notify when an EC2 instance tagged as production is terminated. Kinesis Firehose ElasticsearchEvent TargetEC2 Index audit trail events in ES, and search with Kibana and live dashboards. Event Source ~ 1 minute
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Configuring CloudWatch Event Rules P a r s i n g S e c u r i t y E v e n t s • Configure “Event Source” • EC2 CloudTrail “TerminateInstances” event • Can have more than one event type in source filter • Configure event “Targets” • Kinesis • SNS • Can have more than one target Configure Target Configure Source
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Configuring CloudWatch Event Rules P a r s i n g S e c u r i t y E v e n t s CloudWatch Event Source Pattern • Will get EC2 CloudTrail audit events on the account • CloudTrail audit events track API events, which includes the principal, target resource, and action that took place • Specifically filtering for “TerminateInstances” events • On average, events are emitted in ~1–2 minutes
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What Does the Event Look Like? P a r s i n g S e c u r i t y E v e n t s What do we need? Summarized • IAM user that terminated the instance • When the instance was terminated • IP address of user that terminated the instance • What instance was terminated
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda Function P a r s i n g S e c u r i t y E v e n t s • Send SMS message to administrator when a user terminates an instance tagger as a production instance • We could also send a message to a Slack Channel, or Pager Duty, or send an email • Got notified in 30 seconds! Imports Boto3 clients Parse Kinesis Record Extract Username and InstanceId Check if instance is Production Send SMS!
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Executing the Lambda Function P a r s i n g S e c u r i t y E v e n t s • Configure a Kinesis trigger for our Lambda function • Lambda service handles the Kinesis ingestion and execution of the Lambda function • Also handles errors and retries • Finally, we get our text message when an instance is terminated Huzzah!
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deeper Look into Amazon Kinesis Real-time: Collect real-time data streams and promptly respond to key business events and operational triggers. Real-time latencies. Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure. Amazon Kinesis offering: Managed services for streaming data ingestion and processing. • Amazon Kinesis Streams: Build applications that process or analyze streaming data. • Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Elasticsearch. • Amazon Kinesis Analytics: Analyze data streams using SQL queries. LatestTrim Horizon (1–7 days retention) Record Publish Lambda Consumer Iterator
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Creating a Kinesis Stream Streams • Made up of shards • Each shard ingests/reads data up to 1 MB/sec • Each shard emits/writes data up to 2 MB/sec • Each shard supports 5 read transactions/sec Data • All data is stored and is replayable for 24 hours (default) • Retention window can be configured up to 7 days • Partition key used to distribute PUTs across shards • Even partition key distribution optimizes throughput Best practice Determine an initial size/shards to plan for expected maximum demand  Leverage “Help me decide how many shards I need” option in console  Use formula for number of shards: max(incoming_write_bandwidth_in_KB/1000, outgoing_read_bandwidth_in_KB / 2000)
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Creating Lambda Function Memory CPU allocation proportional to the memory configured Increasing memory makes your code execute faster (if CPU bound) Increasing memory allows for larger record sizes processed Timeout Increasing timeout allows for longer functions, but longer wait in case of errors Permission model Execution role defined for Lambda must have permission to access the stream Retries With Amazon Kinesis, Lambda retries until the data expires (24 hours) Best practice Write Lambda function code to be stateless Instantiate AWS clients and database clients outside the scope of the function handler to take advantage of connection re-use.
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Configuring Event Trigger Amazon Kinesis mapped as event source in Lambda Batch size Max number of records that Lambda will send to one invocation Not equivalent to effective batch size Effective batch size is every 250 ms—calculated as: MIN(records available, batch size, 6MB) Increasing batch size allows fewer Lambda function invocations with more data processed per function Best Practices Set to “Trim Horizon” for reading from start of stream (all data) Set to “Latest” for reading most recent data (LIFO) (latest data)
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How It Works Polling Concurrent polling and processing per shard Lambda polls every 250 ms if no records found Will grab as much data as possible in one GetRecords call (Batch) Batching Batches are passed for invocation to Lambda through function parameters Batch size may impact duration if the Lambda function takes longer to process more records Sub batch in memory for invocation payload Synchronous invocation Batches invoked as synchronous RequestResponse type Lambda honors Amazon Kinesis at least once semantics Each shard blocks in order of synchronous invocation 250 ms poll Batch Sync invocation
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tuning Throughput If put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind Maximum theoretical throughput # shards * 2 MB / Lambda function duration (s) Effective theoretical throughput # shards * batch size (MB) / Lambda function duration (s) … Shards Scale Amazon Kinesis by splitting and merging shards Lambda Lambda Functions Lambda will scale automatically Kinesis Kinesis Stream
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tuning Throughput with Retries Retries Will retry on execution failures until the record is expired Throttles and errors impact duration and directly impact throughput Best Practice Retry with exponential back-off of up to 60 s Effective theoretical throughput with retries ( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry) … Shards Scale Amazon Kinesis by splitting and merging shards Lambda Lambda Functions Lambda will scale automatically Kinesis Kinesis Stream
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitoring Your Real-Time Pipeline •GetRecords: (effective throughput) •PutRecord: bytes, latency, records, etc. •GetRecords.IteratorAgeMilliseconds: how old your last processed records were Monitoring Amazon Kinesis Streams Monitoring Lambda functions •Invocation count: time function invoked •Duration: execution/processing time •Error count: number of errors •Throttle count: number of time function throttled •Iterator age: time elapsed from batch received and final record written to stream • Review all metrics • Make custom logs • View RAM consumed • Search for log events Debugging
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example Debugging with X-Ray Retries with Lambda/Exponential Back-offs • Total Duration = 6 seconds (five retries before success) • Lambda Invocation Duration = 2.2 seconds • Actual Throughput = 1 * (10000/6) = 1666 records/s • Ideal Throughput = 1 * (10000/2.2) = 4545 records/s 2.72x difference AWS X-Ray
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Falling Behind Not Processing Fast Enough • Producers are inserting 100k records per minute • 2 shards • Lambda functions take 1.7 seconds to process 1000 records • 2 * (1000/1.7) = 1176 records/s = ~70k/minute 100k produced per minute > ~70k processed per minute
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Resharding to Increase Throughput Increase Number of Shards to Catch Up • Increased number of shards to 4 (doubled) • Lambda functions take 1.7 seconds to process 1000 records • 4 * (1000/1.7) = 2352 records/s = ~141k / minute 100k produced per minute < ~141k processed per minute REAL TIME!
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Netflix and AWS Amazon Kinesis Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds. We can discover and respond to issues in real time, ensuring high availability and a great customer experience. • Netflix needed a solution for ingesting, augmenting, and analyzing the multiple terabytes of data its network generates daily in the form of virtual private cloud (VPC) flow logs. • Netflix uses AWS to analyze billions of messages across more than 100,000 application instances daily in real time, enabling it to optimize user experience, reduce costs, and improve application resilience. –John Bennett Senior Software Engineer, Netflix ” “
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What about EMR? IoT Data Financial Data Log Data EVENT SOURCE Apache Spark Clickstream Data Action Pipeline DataSources Amazon EMR Apache Spark Kinesis Connector Real-Time Events
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why EMR + Apache Spark? • Mature features and libraries • Machine learning functionality with SparkML • Interact with Data using SparkSQL • Spark Streaming provides support for operators states such as sliding windows • Fault tolerance/recovery out of the box • Ability to combine batch and streaming • Multiple languages (Scala, Java, Python, R, more) SparkSQL (Python) SparkML (Python) GraphX (Scala)
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Zillow and AWS Zillow provides online home information to tens of millions of buyers and sellers every day. We can compute Zestimates in seconds, as opposed to hours, by using Amazon Kinesis Streams and Spark on Amazon EMR. • Needed to provide timely home valuations for all new homes • Runs Zestimate, its machine learning-based home- valuation tool, on AWS • Performs machine-learning jobs in hours instead of a day • Gives customers more accurate data on more than 100 million homes • Scales storage and compute capacity on demand –Jasjeet Thind Vice President of Data Science and Engineering ” “
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Next Steps  Learn more about AWS Serverless at https://aws.amazon.com/serverless  Explore the AWS Lambda Reference Architecture on GitHub:  Real-Time Streaming: https://github.com/awslabs/lambda-refarch- streamprocessing  Distributed Computing Reference Architecture (serverless MapReduce) https://github.com/awslabs/lambda-refarch-mapreduce
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Next Steps  Create an Amazon Kinesis stream. Visit the Amazon Kinesis Console and configure a stream to receive data Ex. data from Social media feeds.  Create and test a Lambda function to process streams from Amazon Kinesis by visiting Lambda console. First 1M requests each month are on us!  Read the Developer Guide and try the Lambda and Amazon Kinesis Tutorial:  http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html  Send questions, comments, feedback to the AWS Lambda Forums
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!