© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ben Snively, Solutions Architect
Data and Analytics
November 2017
AWS re:INVENT
Best Practices for Building Serverless Big Data Applications (session ABD202)
Agenda
Serverless – what and why?
Serverless – which service when?
Common big data applications
Fitting serverless into big data applications
Next steps…
Serverless analytics evolution
Virtualized → Managed → Serverless
Provision servers → Configure clusters → Run analytics
Serverless characteristics
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
Serverless nicely fits into big data platforms
• Mix and match serverless, managed, and virtualized services
• Leverage services to easily:
  • Rapidly ingest, categorize, and discover your data
  • Query and analyze your data
  • Transform and load data
  • Provide custom event-based handlers
• Serverless lets you focus on analytics rather than on infrastructure or servers
AWS Lambda: serverless compute
• Runs your code in the cloud, fully managed and highly available
• Triggered through API calls or by state changes in your AWS resources
• Scales automatically to match the incoming event rate
• Supports Node.js (JavaScript), Python, Java, and C#
• Charged per 100 ms of execution time
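As a minimal sketch of the event-triggered model above, assume a Lambda function subscribed to a Kinesis stream whose producers write UTF-8 JSON. Lambda hands the function a batch under `Records`, with each payload base64-encoded in `record["kinesis"]["data"]`. The function name `kinesis_handler` and the returned summary are illustrative; a real function would forward the parsed events to a destination.

```python
import base64
import json


def kinesis_handler(event, context):
    """Decode and parse each record in a Kinesis-triggered Lambda batch.

    Assumes producers write UTF-8 JSON. Records that fail to decode or parse
    are counted and skipped instead of failing the whole batch.
    """
    parsed, malformed = [], 0
    for record in event.get("Records", []):
        try:
            # Kinesis delivers the payload base64-encoded under ["kinesis"]["data"].
            raw = base64.b64decode(record["kinesis"]["data"])
            parsed.append(json.loads(raw))
        except ValueError:
            # Covers bad base64 padding, invalid UTF-8, and invalid JSON.
            malformed += 1
    # Hypothetical downstream step: a real function would write `parsed`
    # to S3, DynamoDB, or another stream. Here it only reports counts.
    return {"processed": len(parsed), "malformed": malformed}
```

Skipping malformed records rather than raising is a deliberate choice in this sketch: with a stream trigger, an unhandled exception makes Lambda retry the whole batch, which can stall the shard behind a single bad record.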
Amazon Athena: interactive query service
• Query data directly in Amazon S3
• Use ANSI SQL
• Serverless
• Multiple data formats
• Pay per query
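A sketch of running an Athena query from Python, assuming boto3 and an existing Data Catalog database. `start_query_execution` and `get_query_execution` are boto3 Athena client calls; the client is passed in so the polling logic can be exercised without AWS credentials, and the database and S3 output location are caller-supplied placeholders.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, sleep=time.sleep):
    """Start an Athena query and block until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns (query_execution_id, final_state).
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files under this S3 prefix.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        # QUEUED and RUNNING are transient; stop on any terminal state.
        if state in TERMINAL_STATES:
            return query_id, state
        sleep(poll_seconds)
```

For example, `run_athena_query(boto3.client("athena"), "SELECT count(*) FROM weblogs", "analytics", "s3://my-bucket/athena-results/")`, where the table, database, and bucket names are hypothetical. Once the state is `SUCCEEDED`, rows can be read with `get_query_results` or directly from the output location.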
AWS Glue: serverless catalog and ETL/ELT service
Components: Data Catalog, job authoring, job execution
• Crawl, discover, and organize data
• Integration with managed and serverless analytics
• Serverless ETL: pay for what you consume
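Glue crawlers infer partitions from the S3 folder layout, and folders named `key=value` (Hive style) become named partition columns. A minimal sketch of a key builder for such a layout follows; the prefix, the hourly granularity, and the function name `partitioned_key` are illustrative assumptions, not Glue requirements.

```python
from datetime import datetime, timezone


def partitioned_key(prefix, event_time, filename):
    """Build an S3 object key with Hive-style year=/month=/day=/hour= folders.

    `event_time` should be timezone-aware; it is normalized to UTC so that
    partitions do not depend on the producer's local clock.
    """
    t = event_time.astimezone(timezone.utc)
    return (
        f"{prefix.rstrip('/')}/"
        f"year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/hour={t.hour:02d}/"
        f"{filename}"
    )
```

Partitioning by time lets Athena skip objects outside a query's `WHERE year = ... AND month = ...` range, which reduces the data scanned and therefore the per-query cost.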
Amazon Kinesis: streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS.
Kinesis Streams
• For technical developers
• Build your own custom applications that process or analyze streaming data
Kinesis Firehose
• For all developers and data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service (Amazon ES)
Kinesis Analytics
• For all developers and data scientists
• Easily analyze data streams using standard SQL queries
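For Kinesis Streams, a producer sketch assuming boto3: `put_records` accepts up to 500 records per request, and partial failures are reported per record in the response rather than raised as an exception. The client is passed in for testability, and `key_field` (the event attribute used as the partition key) is an illustrative parameter.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def send_events(client, stream_name, events, key_field):
    """Send dict events to a Kinesis stream in PutRecords batches.

    `client` is expected to behave like boto3.client("kinesis").
    Returns the events whose records failed so the caller can retry them.
    """
    failed = []
    for start in range(0, len(events), MAX_RECORDS_PER_CALL):
        batch = events[start:start + MAX_RECORDS_PER_CALL]
        records = [
            {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e[key_field])}
            for e in batch
        ]
        response = client.put_records(StreamName=stream_name, Records=records)
        if response.get("FailedRecordCount", 0):
            # Results are returned in request order; failed entries carry an ErrorCode.
            for event, result in zip(batch, response["Records"]):
                if "ErrorCode" in result:
                    failed.append(event)
    return failed
```

Records with the same partition key land on the same shard, so a high-cardinality key such as a user ID spreads load across shards. The sketch leaves retry and backoff to the caller and does not enforce the per-request size limit.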
Applying Serverless to Big Data Applications?
Characteristics of big data applications
• Future proof
• Flexible access
• Dive in anywhere
• Collect anything
Components of big data applications
• Catalog & Search
• Protect & Secure
• Access & User Interface
• Ingest & Store
• Prepare & Transform
• Analyze & Reason
Components of big data applications, with serverless services
• Ingest & Store: Amazon Kinesis, Amazon S3
• Catalog & Search: AWS Glue Data Catalog
• Prepare & Transform: AWS Lambda, AWS Glue
• Analyze & Reason: Kinesis Analytics, Amazon Athena
• Access & User Interface: Amazon QuickSight
• Protect & Secure
Serverless real-time analytics
• Ingest & Store: Kinesis (serverless Streams and Firehose)
• Prepare & Transform: AWS Lambda (preprocessing logic to transform real-time data)
• Analyze & Reason: Kinesis Analytics (SQL-based analytics on streaming data)
• Access & User Interface: QuickSight
Demonstration
Lambda pre-processing: transform, enrich, filter
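A sketch of the transform, enrich, and filter step written as a Kinesis Firehose data-transformation Lambda. Firehose passes base64-encoded `records`, and the function must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The filter rule (keep only events with a `user_id`) and the added `processed_by` field are hypothetical.

```python
import base64
import json


def firehose_handler(event, context):
    """Transform, enrich, and filter records for a Kinesis Firehose delivery stream.

    The filter rule (keep events with a "user_id") and the enrichment field
    ("processed_by") are hypothetical examples.
    """
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Unparseable input: report it to Firehose as a processing failure.
            output.append({"recordId": record_id, "result": "ProcessingFailed", "data": record["data"]})
            continue
        if not isinstance(payload, dict) or "user_id" not in payload:
            # Filter: Dropped records are acknowledged but not delivered.
            output.append({"recordId": record_id, "result": "Dropped", "data": record["data"]})
            continue
        payload["processed_by"] = "firehose-transform"  # enrich
        # Transform: re-serialize as one JSON object per line.
        line = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record_id, "result": "Ok",
                       "data": base64.b64encode(line).decode("ascii")})
    return {"records": output}
```

Appending a newline to each output record keeps the delivered S3 objects in newline-delimited JSON, a layout that Glue crawlers and Athena can read directly.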
Amazon Kinesis Analytics: SQL
In this example, 19 lines of SQL deliver serverless real-time analytics.
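The deck does not reproduce the demo's 19 lines of SQL. As a rough illustration of what streaming SQL typically computes, this plain-Python sketch counts events per key in fixed, non-overlapping (tumbling) time windows. It is not Kinesis Analytics code, and the `(timestamp, key)` event shape is an assumption.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) pairs. Each event falls
    into exactly one window, identified by the window's start time.
    """
    counts = Counter()
    for ts, key in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(sorted(counts.items()))
```

A streaming engine computes the same aggregation incrementally and emits each window's counts when the window closes, rather than over a finished list as here.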
Larger picture of what we showed:
Fitting into existing real-time analytics
Producer → stream → processing → outputs
• Streams: Kinesis Streams, DynamoDB Streams, Apache Kafka, Amazon SQS
• Processing: AWS Lambda, Kinesis Analytics, KCL, Spark Streaming, Apache Storm, Apache Flink
• Outputs (alerts, analytics, KPIs): Amazon SNS notifications, Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES
• Each stage mixes serverless, managed, and virtualized options
Serverless interactive analytics
• Ingest & Store: S3
• Prepare & Transform: AWS Glue
• Catalog & Search: AWS Glue Data Catalog
• Analyze & Reason: Amazon Athena
• Access & User Interface: QuickSight
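Before Athena can query newly landed S3 data, the Glue Data Catalog has to know about the new tables or partitions. One way, sketched here assuming boto3 and an already-configured crawler, is to start the crawler and wait for it to finish. `start_crawler` and `get_crawler` are Glue client calls; the function name `refresh_catalog`, the polling interval, and the poll limit are illustrative choices.

```python
import time


def refresh_catalog(glue, crawler_name, poll_seconds=15.0, max_polls=240, sleep=time.sleep):
    """Start a Glue crawler and wait until that run finishes.

    `glue` is expected to behave like boto3.client("glue"). Returns the
    LastCrawl status ("SUCCEEDED", "FAILED", or "CANCELLED").
    Raises TimeoutError if the run has not finished within max_polls checks.
    """
    crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
    # Remember the previous run so an idle crawler is not mistaken for "done".
    previous_start = crawler.get("LastCrawl", {}).get("StartTime")
    glue.start_crawler(Name=crawler_name)
    for _ in range(max_polls):
        sleep(poll_seconds)
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        last = crawler.get("LastCrawl", {})
        # Finished once the crawler is idle again and LastCrawl describes a new run.
        if crawler["State"] == "READY" and last.get("StartTime") != previous_start:
            return last["Status"]
    raise TimeoutError(f"crawler {crawler_name!r} did not finish")
```

Comparing `LastCrawl.StartTime` before and after the call avoids mistaking the idle state left by a previous run for completion of this one.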
Demonstration
Interactive analytics
Producer → Amazon S3 → query engine → Amazon QuickSight
• Query engines: Amazon Athena, Amazon Redshift, Amazon EMR (Presto, Impala, Spark)
• Options span serverless, managed, and virtualized services
Components of big data applications: integrated pipeline
• Ingest & Store: Kinesis, S3
• Catalog & Search: AWS Glue Data Catalog
• Prepare & Transform: AWS Lambda, AWS Glue
• Analyze & Reason: Kinesis Analytics, Amazon Athena
• Access & User Interface: QuickSight
• Protect & Secure
Data lake reference architecture
• Data Ingestion: get your data into S3 quickly and securely (Kinesis Firehose, Snowball, Database Migration Service, Direct Connect)
• Central Storage: secure, cost-effective storage in Amazon S3
• Catalog & Search: access and search metadata (DynamoDB, Elasticsearch)
• Access & User Interface: give your users easy and secure access (API Gateway, Identity & Access Management, Cognito)
• Processing & Analytics: use predictive and prescriptive analytics to gain a better understanding (QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS, Glue ETL)
• Protect and Secure: use entitlements to ensure data is secure and users' identities are verified (Security Token Service, CloudWatch, CloudTrail, Key Management Service)
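For the Protect and Secure layer, a sketch of enforcing encryption at rest on the central S3 store with Key Management Service keys, assuming boto3: a bucket policy that denies `PutObject` requests without SSE-KMS, plus an upload helper that sets the matching parameters. The function names are hypothetical, and bucket names and key IDs are placeholders supplied by the caller.

```python
import json

ENCRYPTION_HEADER = "s3:x-amz-server-side-encryption"


def require_kms_policy(bucket):
    """Return a bucket policy (JSON string) denying uploads without SSE-KMS."""
    objects = f"arn:aws:s3:::{bucket}/*"
    deny = {"Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": objects}
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            # Reject requests that ask for some other encryption type.
            {"Sid": "DenyWrongEncryptionHeader", **deny,
             "Condition": {"StringNotEquals": {ENCRYPTION_HEADER: "aws:kms"}}},
            # Reject requests that omit the encryption header entirely.
            {"Sid": "DenyMissingEncryptionHeader", **deny,
             "Condition": {"Null": {ENCRYPTION_HEADER: "true"}}},
        ],
    })


def put_encrypted(s3, bucket, key, body, kms_key_id):
    """Upload one object with SSE-KMS; `s3` is expected to behave like boto3.client("s3")."""
    return s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",  # S3 encrypts the object with the given KMS key
        SSEKMSKeyId=kms_key_id,
    )
```

The policy string can be applied with `put_bucket_policy(Bucket=..., Policy=...)` on a boto3 S3 client. The two Deny statements follow a commonly published pattern: one rejects a wrong encryption header and the other, using a `Null` condition, explicitly rejects requests that omit it.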
The right tool for the right job
• Users and workloads: data scientists, data engineers (IDE), machine learning/deep learning, business reporting
• Shared foundation: Data Catalog, Central Storage
What about existing Hadoop Clusters?
Serverless nicely fits into big data platforms
• Mix and match serverless, managed, and virtualized services
• Leverage services to easily:
  • Rapidly ingest, categorize, and discover your data
  • Query and analyze your data
  • Transform and load data
  • Provide custom event-based handlers
• Serverless lets you focus on analytics rather than on infrastructure or servers
• Pay only for what you use
Thank you!