SlideShare una empresa de Scribd logo
1 de 55
Descargar para leer sin conexión
Big	Data	Architectural	Pa0erns	and		
Best	Prac4ces	on	AWS	
Ran Tessler, AWS Solu0ons Architecture Manager
What to Expect from the Session
Big data challenges
How to simplify big data processing
What technologies should you use?
•  Why?
•  How?
Reference architecture
Design patterns
Ever Increasing Big Data
Volume
Velocity
Variety
Big Data Evolution
Batch	
	
Report	
	
Real-time		
	
Alerts	
Prediction	
	
Forecast
Plethora of Tools
Amazon
Glacier
S3
DynamoDB RDS
EMR
Amazon
Redshift
Amazon Kinesis
CloudSearch
Kinesis-
enabled
app
Lambda Amazon ML
SQS
ElastiCache
Data Pipeline
DynamoDB
Streams
Is there a reference architecture?
What tools should I use?
How?
Why?
Architectural Principles
•  Decoupled “data bus”
•  Data → Store → Process → Answers
•  Use the right tool for the job
•  Data structure, latency, throughput, access patterns
•  Use Lambda architecture ideas
•  Immutable (append-only) log, batch/speed/serving layer
•  Leverage AWS managed services
•  No/low admin
•  Big data ≠ big cost
Simplify Big Data Processing
ingest /
collect
store
process /
analyze
consume /
visualize
Time to Answer (Latency)
Throughput
Cost
Collect /
Ingest
Types of Data
•  Transactional
•  Database reads & writes (OLTP)
•  Cache
•  Search
•  Logs
•  Streams
•  File
•  Log files (/var/log)
•  Log collectors & frameworks
•  Stream
•  Log records
•  Sensors & IoT data
Database
File
Storage
Stream
Storage
A
iOS
 Android
Web Apps
Logstash
Logging
IoT
Applications
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Search
Collect Store
Logging
IoT
Store
Stream
Storage
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamStorage
FileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Database
File
Storage
Search
Collect Store
Logging
IoT
Applications
ü 
Stream Storage Options
•  AWS managed services
•  Amazon Kinesis → streams
•  DynamoDB Streams → table + streams
•  Amazon SQS → queue
•  Amazon SNS → pub/sub
•  Unmanaged
•  Apache Kafka → stream
Why Stream Storage?
•  Decouple producers & consumers
•  Persistent buffer
•  Collect multiple streams
•  Preserve client ordering
•  Streaming MapReduce
•  Parallel consumption
4 4 3 3 2 2 1 1
4 3 2 1
4 3 2 1
4 3 2 1
4 3 2 1
4 4 3 3 2 2 1 1
Shard 1 / Partition 1
Shard 2 / Partition 2
Consumer 1
Count of
Red = 4
Count of
Violet = 4
Consumer 2
Count of
Blue = 4
Count of
Green = 4
DynamoDB Stream Kinesis Stream Kafka Topic
What About Queues & Pub/Sub ?
•  Decouple producers &
consumers/subscribers
•  Persistent buffer
•  Collect multiple streams
•  No client ordering
•  No parallel consumption for
Amazon SQS
•  Amazon SNS can route
to multiple queues or ʎ
functions
•  No streaming MapReduce
Consumers
Producers
Producers
Amazon SNS
Amazon SQS
queue
topic
function
ʎ
AWS Lambda
Amazon SQS
queue
Subscriber
Which stream storage should I use?
Amazon
Kinesis
DynamoDB
Streams
Amazon SQS
Amazon SNS
Kafka
Managed Yes Yes Yes No
Ordering Yes Yes No Yes
Delivery at-least-once exactly-once at-least-once at-least-once
Lifetime 7 days 24 hours 14 days Configurable
Replication 3 AZ 3 AZ 3 AZ Configurable
Throughput No Limit No Limit No Limit ~ Nodes
Parallel Clients Yes Yes No (SQS) Yes
MapReduce Yes Yes No Yes
Record size 1MB 400KB 256KB Configurable
Cost Low Higher(table cost) Low-Medium Low (+admin)
File
Storage
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamStorage
FileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Database
Search
Collect Store
Logging
IoT
Applications
ü 
Why Is Amazon S3 Good for Big Data?
•  Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
•  No need to run compute clusters for storage (unlike HDFS)
•  Can run transient Hadoop clusters & Amazon EC2 Spot instances
•  Multiple distinct (Spark, Hive, Presto) clusters can use the same data
•  Unlimited number of objects
•  Very high bandwidth – no aggregate throughput limit
•  Highly available – can tolerate AZ failure
•  Designed for 99.999999999% durability
•  Tired-storage (Standard, IA, Amazon Glacier) via life-cycle policy
•  Secure – SSL, client/server-side encryption at rest
•  Low cost
What about HDFS & Amazon Glacier?
•  Use HDFS for very frequently
accessed (hot) data
•  Use Amazon S3 Standard for
frequently accessed data
•  Use Amazon S3 Standard –
IA for infrequently accessed
data
•  Use Amazon Glacier for
archiving cold data
Database +
Search
Tier
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamStorage
FileStorage
Transactional Data
File Data
Stream Data
Mobile
Apps
Search Data
Collect Store
ü 
Database + Search Tier Anti-pattern
RDBMS
Database + Search Tier
Applications
Best Practice — Use the Right Tool for the Job
Data Tier
Search
Amazon
Elasticsearch
Service
Amazon
CloudSearch
Cache
Redis
Memcached
SQL
Amazon Aurora
MySQL
PostgreSQL
Oracle
SQL Server
MariaDB
NoSQL
Cassandra
Amazon
DynamoDB
HBase
MongoDB
Applications
Database + Search Tier
Materialized Views
What Data Store Should I Use?
•  Data structure → Fixed schema, JSON, key-value
•  Access patterns → Store data in the format you will
access it
•  Data / access characteristics → Hot, warm, cold
•  Cost → Right cost
Data Structure and Access Patterns
Access Patterns What to use?
Put/Get (Key, Value) Cache, NoSQL
Simple relationships → 1:N, M:N NoSQL
Cross table joins, transaction, SQL SQL
Faceting, Search Search
Data Structure What to use?
Fixed schema NoSQL, SQL
Schema-free (JSON) NoSQL, Search
(Key, Value) NoSQL, Cache
What Is the Temperature of Your Data / Access ?
Data / Access Characteristics: Hot, Warm, Cold
Hot Warm Cold
Volume MB–GB GB–TB PB
Item size B–KB KB–MB KB–TB
Latency ms ms, sec min, hrs
Durability Low–High High Very High
Request rate Very High High Low
Cost/GB $$-$ $-¢¢ ¢
Hot Data Warm Data Cold Data
Cache
SQL
Request Rate
High Low
Cost/GB
High Low
Latency
Low High
Data Volume
Low High
GlacierStructure
NoSQL
Hot Data Warm Data Cold Data
Low
High
S3
Search
HDFS
What Data Store Should I Use?
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
Aurora
Amazon
Elasticsearch
Amazon
EMR (HDFS)
Amazon S3 Amazon Glacier
Average
latency
ms ms ms, sec ms,sec sec,min,hrs ms,sec,min
(~ size)
hrs
Data volume GB GB–TBs
(no limit)
GB–TB
(64 TB
Max)
GB–TB GB–PB
(~nodes)
MB–PB
(no limit)
GB–PB
(no limit)
Item size B-KB KB
(400 KB
max)
KB
(64 KB)
KB
(1 MB max)
MB-GB KB-GB
(5 TB max)
GB
(40 TB max)
Request rate High -
Very High
Very High
(no limit)
High High Low – Very
High
Low –
Very High
(no limit)
Very Low
Storage cost
GB/month
$$ ¢¢ ¢¢ ¢¢ ¢ ¢ ¢/10
Durability Low -
Moderate
Very High Very High High High Very High Very High
Hot Data Warm Data Cold Data
Hot Data Warm Data Cold Data
Example: Should I use Amazon S3 or Amazon DynamoDB?
“I’m currently scoping out a project that will greatly increase
my team’s use of Amazon S3. Hoping you could answer
some questions. The current iteration of the design calls for
many small files, perhaps up to a billion during peak. The
total size would be on the order of 1.5 TB per month…”
Cost Conscious Design
https://calculator.s3.amazonaws.com/index.html
Example: Should I use Amazon S3 or Amazon DynamoDB?
Cost Conscious Design
Request rate
(Writes/sec)
Object size
(Bytes)
Total size
(GB/month)
Objects per month
300 2048 1483 777,600,000
Request rate
(Writes/sec)
Object size
(Bytes)
Total size
(GB/month)
Objects per
month
300 2,048 1,483 777,600,000
Amazon S3 or
Amazon
DynamoDB?
Request rate
(Writes/sec)
Object size
(Bytes)
Total size
(GB/month)
Objects per
month
Scenario 1300 2,048 1,483 777,600,000
Scenario 2300 32,768 23,730 777,600,000
Amazon S3
Amazon DynamoDB
use
use
Process /
Analyze
Process / Analyze
Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of discovering
useful information, suggesting conclusions, and supporting
decision-making.
Examples
•  Interactive dashboards → Interactive analytics
•  Daily/weekly/monthly reports → Batch analytics
•  Billing/fraud alerts, 1 minute metrics → Real-time analytics
•  Sentiment analysis, prediction models → Machine learning
Interactive Analytics
Takes large amount of (warm/cold) data
Takes seconds to get answers back
Example: Self-service dashboards
Batch Analytics
Takes large amount of (warm/cold) data
Takes minutes or hours to get answers back
Example: Generating daily, weekly, or monthly reports
Real-Time Analytics
Take small amount of hot data and ask questions
Takes short amount of time (milliseconds or seconds) to
get your answer back
•  Real-time (event)
•  Real-time response to events in data streams
•  Example: Billing/Fraud Alerts
•  Near real-time (micro-batch)
•  Near real-time operations on small batches of events in data
streams
•  Example: 1 Minute Metrics
Predictions via Machine Learning
ML gives computers the ability to learn without being explicitly
programmed
Machine Learning Algorithms:
-  Supervised Learning ← “teach” program
-  Classification ← Is this transaction fraud? (Yes/No)
-  Regression ← Customer Life-time value?
-  Unsupervised Learning ← let it learn by itself
-  Clustering ← Market Segmentation
Analysis Tools and Frameworks
Machine Learning
•  Mahout, Spark ML, Amazon ML
Interactive Analytics
•  Amazon Redshift, Presto, Impala, Spark
Batch Processing
•  MapReduce, Hive, Pig, Spark
Stream Processing
•  Micro-batch: Spark Streaming, KCL, Hive, Pig
•  Real-time: Storm, AWS Lambda, KCL
StreamProcessing
Batch
Interactive
ML
Analyze
Stream
Processing
Batch
Processing
Interactive
Analytics
MLAmazon Machine
Learning
Amazon
Redshift
Impala
AmazonElasticMapReduce
Pig
Streaming
Amazon

Kinesis
AWS
Lambda
Spark Streaming Apache Storm Amazon Kinesis
Client Library
AWS Lambda Amazon EMR
(Hive, Pig)
Scale /
Throughput
~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes
Batch or Real-
time
Real-time Real-time Real-time Real-time Batch
Manageability Yes (Amazon
EMR)
Do it yourself Amazon EC2 +
Auto Scaling
AWS managed Yes (Amazon
EMR)
Fault Tolerance Single AZ Configurable Multi-AZ Multi-AZ Single AZ
Programming
languages
Java, Python,
Scala
Any language
via Thrift
Java, via
MultiLangDaemon
( .Net, Python,
Ruby, Node.js)
Node.js, Java,
Python
Hive, Pig,
Streaming
languages
High
What Stream Processing Technology
Should I Use?
What Interactive/Batch Processing Technology
Should I Use?
Amazon
Redshift
Impala Presto Spark Hive
Query Latency Low Low Low Low Medium (Tez) – High
(MapReduce)
Durability High High High High High
Data Volume 2PB Max ~Nodes ~Nodes ~Nodes ~Nodes
Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR)
Storage Native HDFS / S3A* HDFS / S3 HDFS / S3 HDFS / S3
SQL
Compatibility
High Medium High Low (SparkSQL) Medium (HQL)
HighMedium
Spark Streaming
Apache Storm
AWS Lambda
KCL
Amazon
Redshift Spark
Impala
Presto
Hive
Amazon
Redshift
Hive
Spark
Presto
Impala
Amazon Kinesis
Apache Kafka
Amazon
DynamoDB
Amazon S3data
Hot Cold
Data Temperature
ProcessingLatency
Low
High Answers
Amazon EMR (HDFS)
Hive
Native
KCL
AWS Lambda
Data Temperature vs Processing Latency
Real-time
Interactive
Batch
Batch
What about ETL?
Store Analyze
https://aws.amazon.com/big-data/partner-solutions/
ETL
Consume /
Visualize
Consume
•  Predictions
•  Analysis and Visualization
•  Notebooks
•  IDE
•  Applications & API
Consume
Analysis&Visualization
Amazon
QuickSight
Notebooks
Predictions
Apps & APIs
IDE
Store Analyze ConsumeETL
Business
users
Data Scientist,
Developers
Putting It All Together
Collect Store Analyze Consume
A
iOS
 Android
Web Apps
Logstash
Amazon
RDS
Amazon
DynamoDB
Amazon
ES
Amazon

S3
Apache
Kafka
Amazon

Glacier
Amazon

Kinesis
Amazon

DynamoDB
Amazon
Redshift
Impala
Pig
Amazon ML
Streaming
Amazon

Kinesis
AWS
Lambda
AmazonElasticMapReduce
Amazon
ElastiCache
SearchSQLNoSQLCache
StreamProcessing
Batch
Interactive
Logging
StreamStorage
IoT
Applications
FileStorage
Analysis&Visualization
Hot
Cold
Warm
Hot
Slow
Hot
ML
Fast
Fast
Amazon
QuickSight
Transactional Data
File Data
Stream Data
Notebooks
Predictions
Apps & APIs
Mobile
Apps
IDE
Search Data
ETL
Reference Architecture
Design Patterns
Multi-Stage Decoupled “Data Bus”
•  Multiple stages
•  Storage decoupled from processing
Store Process Store Process
process
store
Multiple Processing Applications (or
Connectors) Can Read from or Write to the Same
Data Stores
Amazon
Kinesis
AWS
Lambda
Amazon
DynamoDB
Amazon
Kinesis S3
Connector
Amazon S3
process
store
Amazon
Kinesis
AWS
Lambda
Amazon
DynamoDB
Amazon
Kinesis S3
Connector
Amazon S3
process
store
Processing Frameworks (KCL, Storm, Hive,
Spark, etc.) Could Read from Multiple Data
Stores
Storm Hive Spark
Batch Layer
Amazon
Kinesis
data
process
store
Amazon
Kinesis S3
Connector
Amazon S3
A
p
p
l
i
c
a
t
i
o
n
s
Amazon
Redshift
Amazon EMR
Presto
Hive
Pig
Spark
answer
Speed Layer
answer
Serving
Layer
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
RDS
Amazon
ES
answer
Amazon
ML
KCL
AWS Lambda
Spark Streaming
Storm
Lambda
Architecture
Summary
•  Build decoupled “data bus”
•  Data → Store ↔ Process → Answers
•  Use the right tool for the job
•  Latency, throughput, access patterns
•  Use Lambda architecture ideas
•  Immutable (append-only) log, batch/speed/serving layer
•  Leverage AWS managed services
•  No/low admin
•  Be cost conscious
•  Big data ≠ big cost
Ran Tessler	
AWS Solu0ons Architecture Manager	
tesslerr@amazon.com

Más contenido relacionado

La actualidad más candente

A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech TalksTackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech TalksAmazon Web Services
 
DAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQLDAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQLKamal Gupta
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...Amazon Web Services
 
SQream DB, GPU-accelerated data warehouse
SQream DB, GPU-accelerated data warehouseSQream DB, GPU-accelerated data warehouse
SQream DB, GPU-accelerated data warehouseNAVER Engineering
 
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech Talks
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech TalksMigrating Your Oracle Database to PostgreSQL - AWS Online Tech Talks
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech TalksAmazon Web Services
 
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Edureka!
 
AWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonAWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonRoberto Gaiser
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!Chris Taylor
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Amazon Web Services
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeAmazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...
Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...
Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...Amazon Web Services
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSData Science Milan
 

La actualidad más candente (20)

A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech TalksTackle Your Dark Data  Challenge with AWS Glue - AWS Online Tech Talks
Tackle Your Dark Data Challenge with AWS Glue - AWS Online Tech Talks
 
DAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQLDAT304_Amazon Aurora Performance Optimization with MySQL
DAT304_Amazon Aurora Performance Optimization with MySQL
 
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
Metrics-Driven Performance Tuning for AWS Glue ETL Jobs (ANT326) - AWS re:Inv...
 
SQream DB, GPU-accelerated data warehouse
SQream DB, GPU-accelerated data warehouseSQream DB, GPU-accelerated data warehouse
SQream DB, GPU-accelerated data warehouse
 
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech Talks
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech TalksMigrating Your Oracle Database to PostgreSQL - AWS Online Tech Talks
Migrating Your Oracle Database to PostgreSQL - AWS Online Tech Talks
 
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
Amazon Redshift Tutorial | AWS Tutorial for Beginners | AWS Certification Tra...
 
Amazon Redshift
Amazon Redshift Amazon Redshift
Amazon Redshift
 
AWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparisonAWS RDS Benchmark - Instance comparison
AWS RDS Benchmark - Instance comparison
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...
Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...
Amazon RDS & Amazon Aurora: Relational Databases on AWS - SRV206 - Atlanta AW...
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Speed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWSSpeed up data preparation for ML pipelines on AWS
Speed up data preparation for ML pipelines on AWS
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 

Destacado

Introduction to IAM + Best Practices
Introduction to IAM + Best PracticesIntroduction to IAM + Best Practices
Introduction to IAM + Best PracticesAmazon Web Services
 
ENT203 Integrating On-Premise Resources - AWS re: Invent 2012
ENT203 Integrating On-Premise Resources - AWS re: Invent 2012ENT203 Integrating On-Premise Resources - AWS re: Invent 2012
ENT203 Integrating On-Premise Resources - AWS re: Invent 2012Amazon Web Services
 
Building Your Practice on AWS: An APN Breakfast Session
Building Your Practice on AWS: An APN Breakfast SessionBuilding Your Practice on AWS: An APN Breakfast Session
Building Your Practice on AWS: An APN Breakfast SessionAmazon Web Services
 
AWS Mobile Hub + AWS Device Farm
AWS Mobile Hub + AWS Device FarmAWS Mobile Hub + AWS Device Farm
AWS Mobile Hub + AWS Device FarmAmazon Web Services
 
Develping mobile services on aws - Pop-up Loft Tel Aviv
Develping mobile services on aws - Pop-up Loft Tel AvivDevelping mobile services on aws - Pop-up Loft Tel Aviv
Develping mobile services on aws - Pop-up Loft Tel AvivAmazon Web Services
 
DevOps as a Pathway to AWS | AWS Public Sector Summit 2016
DevOps as a Pathway to AWS | AWS Public Sector Summit 2016DevOps as a Pathway to AWS | AWS Public Sector Summit 2016
DevOps as a Pathway to AWS | AWS Public Sector Summit 2016Amazon Web Services
 
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...Amazon Web Services
 
Amazon S3 - Masterclass - Pop-up Loft Tel Aviv
Amazon S3 - Masterclass - Pop-up Loft Tel AvivAmazon S3 - Masterclass - Pop-up Loft Tel Aviv
Amazon S3 - Masterclass - Pop-up Loft Tel AvivAmazon Web Services
 
Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Amazon Web Services
 
Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...
Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...
Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...Amazon Web Services
 
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...Amazon Web Services
 
Cloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New InfrastructureCloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New InfrastructureAmazon Web Services
 
Scaling by Design: AWS Web Services Patterns
Scaling by Design:AWS Web Services PatternsScaling by Design:AWS Web Services Patterns
Scaling by Design: AWS Web Services PatternsAmazon Web Services
 
Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...
Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...
Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...Amazon Web Services
 
Another Day, Another Billion Packets
Another Day, Another Billion PacketsAnother Day, Another Billion Packets
Another Day, Another Billion PacketsAmazon Web Services
 

Destacado (20)

Introduction to IAM + Best Practices
Introduction to IAM + Best PracticesIntroduction to IAM + Best Practices
Introduction to IAM + Best Practices
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Ram Disk
Ram DiskRam Disk
Ram Disk
 
ENT203 Integrating On-Premise Resources - AWS re: Invent 2012
ENT203 Integrating On-Premise Resources - AWS re: Invent 2012ENT203 Integrating On-Premise Resources - AWS re: Invent 2012
ENT203 Integrating On-Premise Resources - AWS re: Invent 2012
 
Amazon EC2
Amazon EC2Amazon EC2
Amazon EC2
 
Building Your Practice on AWS: An APN Breakfast Session
Building Your Practice on AWS: An APN Breakfast SessionBuilding Your Practice on AWS: An APN Breakfast Session
Building Your Practice on AWS: An APN Breakfast Session
 
AWS Mobile Hub + AWS Device Farm
AWS Mobile Hub + AWS Device FarmAWS Mobile Hub + AWS Device Farm
AWS Mobile Hub + AWS Device Farm
 
Develping mobile services on aws - Pop-up Loft Tel Aviv
Develping mobile services on aws - Pop-up Loft Tel AvivDevelping mobile services on aws - Pop-up Loft Tel Aviv
Develping mobile services on aws - Pop-up Loft Tel Aviv
 
DevOps as a Pathway to AWS | AWS Public Sector Summit 2016
DevOps as a Pathway to AWS | AWS Public Sector Summit 2016DevOps as a Pathway to AWS | AWS Public Sector Summit 2016
DevOps as a Pathway to AWS | AWS Public Sector Summit 2016
 
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec...
 
Amazon S3 - Masterclass - Pop-up Loft Tel Aviv
Amazon S3 - Masterclass - Pop-up Loft Tel AvivAmazon S3 - Masterclass - Pop-up Loft Tel Aviv
Amazon S3 - Masterclass - Pop-up Loft Tel Aviv
 
Keynote - Currency fair
Keynote - Currency fairKeynote - Currency fair
Keynote - Currency fair
 
Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401
 
Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...
Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...
Account Separation and Mandatory Access Control on AWS | Security Roadshow Du...
 
Workshop: We love APIs
Workshop: We love APIsWorkshop: We love APIs
Workshop: We love APIs
 
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps...
 
Cloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New InfrastructureCloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New Infrastructure
 
Scaling by Design: AWS Web Services Patterns
Scaling by Design:AWS Web Services PatternsScaling by Design:AWS Web Services Patterns
Scaling by Design: AWS Web Services Patterns
 
Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...
Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...
Amazon Simple Work Flow Engine (SWF): How Beamr uses SWF for video optimizati...
 
Another Day, Another Billion Packets
Another Day, Another Billion PacketsAnother Day, Another Billion Packets
Another Day, Another Billion Packets
 

Similar a Big Data Architectural Patterns and Best Practices on AWS

AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...Amazon Web Services
 
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSFebruary 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSAmazon Web Services
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rasmus Ekman
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayAmazon Web Services Korea
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 

Similar a Big Data Architectural Patterns and Best Practices on AWS (20)

AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
 
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWSFebruary 2016 Webinar Series - Architectural Patterns for Big Data on AWS
February 2016 Webinar Series - Architectural Patterns for Big Data on AWS
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Último (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Big Data Architectural Patterns and Best Practices on AWS

  • 2. What to Expect from the Session Big data challenges How to simplify big data processing What technologies should you use? •  Why? •  How? Reference architecture Design patterns
  • 3. Ever Increasing Big Data Volume Velocity Variety
  • 5. Plethora of Tools Amazon Glacier S3 DynamoDB RDS EMR Amazon Redshift Amazon Kinesis CloudSearch Kinesis- enabled app Lambda Amazon ML SQS ElastiCache Data Pipeline DynamoDB Streams
  • 6. Is there a reference architecture? What tools should I use? How? Why?
  • 7. Architectural Principles •  Decoupled “data bus” •  Data → Store → Process → Answers •  Use the right tool for the job •  Data structure, latency, throughput, access patterns •  Use Lambda architecture ideas •  Immutable (append-only) log, batch/speed/serving layer •  Leverage AWS managed services •  No/low admin •  Big data ≠ big cost
  • 8. Simplify Big Data Processing ingest / collect store process / analyze consume / visualize Time to Answer (Latency) Throughput Cost
  • 10. Types of Data •  Transactional •  Database reads & writes (OLTP) •  Cache •  Search •  Logs •  Streams •  File •  Log files (/var/log) •  Log collectors & frameworks •  Stream •  Log records •  Sensors & IoT data Database File Storage Stream Storage A iOS Android Web Apps Logstash Logging IoT Applications Transactional Data File Data Stream Data Mobile Apps Search Data Search Collect Store Logging IoT
  • 11. Store
  • 13. Stream Storage Options •  AWS managed services •  Amazon Kinesis → streams •  DynamoDB Streams → table + streams •  Amazon SQS → queue •  Amazon SNS → pub/sub •  Unmanaged •  Apache Kafka → stream
  • 14. Why Stream Storage? •  Decouple producers & consumers •  Persistent buffer •  Collect multiple streams •  Preserve client ordering •  Streaming MapReduce •  Parallel consumption 4 4 3 3 2 2 1 1 4 3 2 1 4 3 2 1 4 3 2 1 4 3 2 1 4 4 3 3 2 2 1 1 Shard 1 / Partition 1 Shard 2 / Partition 2 Consumer 1 Count of Red = 4 Count of Violet = 4 Consumer 2 Count of Blue = 4 Count of Green = 4 DynamoDB Stream Kinesis Stream Kafka Topic
  • 15. What About Queues & Pub/Sub ? •  Decouple producers & consumers/subscribers •  Persistent buffer •  Collect multiple streams •  No client ordering •  No parallel consumption for Amazon SQS •  Amazon SNS can route to multiple queues or ʎ functions •  No streaming MapReduce Consumers Producers Producers Amazon SNS Amazon SQS queue topic function ʎ AWS Lambda Amazon SQS queue Subscriber
  • 16. Which stream storage should I use? Amazon Kinesis DynamoDB Streams Amazon SQS Amazon SNS Kafka Managed Yes Yes Yes No Ordering Yes Yes No Yes Delivery at-least-once exactly-once at-least-once at-least-once Lifetime 7 days 24 hours 14 days Configurable Replication 3 AZ 3 AZ 3 AZ Configurable Throughput No Limit No Limit No Limit ~ Nodes Parallel Clients Yes Yes No (SQS) Yes MapReduce Yes Yes No Yes Record size 1MB 400KB 256KB Configurable Cost Low Higher(table cost) Low-Medium Low (+admin)
  • 18. Why Is Amazon S3 Good for Big Data? •  Natively supported by big data frameworks (Spark, Hive, Presto, etc.) •  No need to run compute clusters for storage (unlike HDFS) •  Can run transient Hadoop clusters & Amazon EC2 Spot instances •  Multiple distinct (Spark, Hive, Presto) clusters can use the same data •  Unlimited number of objects •  Very high bandwidth – no aggregate throughput limit •  Highly available – can tolerate AZ failure •  Designed for 99.999999999% durability •  Tired-storage (Standard, IA, Amazon Glacier) via life-cycle policy •  Secure – SSL, client/server-side encryption at rest •  Low cost
  • 19. What about HDFS & Amazon Glacier? •  Use HDFS for very frequently accessed (hot) data •  Use Amazon S3 Standard for frequently accessed data •  Use Amazon S3 Standard – IA for infrequently accessed data •  Use Amazon Glacier for archiving cold data
  • 20. Database + Search Tier A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon ElastiCache SearchSQLNoSQLCache StreamStorage FileStorage Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store ü 
  • 21. Database + Search Tier Anti-pattern RDBMS Database + Search Tier Applications
  • 22. Best Practice — Use the Right Tool for the Job Data Tier Search Amazon Elasticsearch Service Amazon CloudSearch Cache Redis Memcached SQL Amazon Aurora MySQL PostgreSQL Oracle SQL Server MariaDB NoSQL Cassandra Amazon DynamoDB HBase MongoDB Applications Database + Search Tier
  • 24. What Data Store Should I Use? •  Data structure → Fixed schema, JSON, key-value •  Access patterns → Store data in the format you will access it •  Data / access characteristics → Hot, warm, cold •  Cost → Right cost
  • 25. Data Structure and Access Patterns Access Patterns What to use? Put/Get (Key, Value) Cache, NoSQL Simple relationships → 1:N, M:N NoSQL Cross table joins, transaction, SQL SQL Faceting, Search Search Data Structure What to use? Fixed schema NoSQL, SQL Schema-free (JSON) NoSQL, Search (Key, Value) NoSQL, Cache
  • 26. What Is the Temperature of Your Data / Access ?
  • 27. Data / Access Characteristics: Hot, Warm, Cold Hot Warm Cold Volume MB–GB GB–TB PB Item size B–KB KB–MB KB–TB Latency ms ms, sec min, hrs Durability Low–High High Very High Request rate Very High High Low Cost/GB $$-$ $-¢¢ ¢ Hot Data Warm Data Cold Data
  • 28. Cache SQL Request Rate High Low Cost/GB High Low Latency Low High Data Volume Low High GlacierStructure NoSQL Hot Data Warm Data Cold Data Low High S3 Search HDFS
  • 29. What Data Store Should I Use? Amazon ElastiCache Amazon DynamoDB Amazon Aurora Amazon Elasticsearch Amazon EMR (HDFS) Amazon S3 Amazon Glacier Average latency ms ms ms, sec ms,sec sec,min,hrs ms,sec,min (~ size) hrs Data volume GB GB–TBs (no limit) GB–TB (64 TB Max) GB–TB GB–PB (~nodes) MB–PB (no limit) GB–PB (no limit) Item size B-KB KB (400 KB max) KB (64 KB) KB (1 MB max) MB-GB KB-GB (5 TB max) GB (40 TB max) Request rate High - Very High Very High (no limit) High High Low – Very High Low – Very High (no limit) Very Low Storage cost GB/month $$ ¢¢ ¢¢ ¢¢ ¢ ¢ ¢/10 Durability Low - Moderate Very High Very High High High Very High Very High Hot Data Warm Data Cold Data Hot Data Warm Data Cold Data
  • 30. Example: Should I use Amazon S3 or Amazon DynamoDB? “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…” Cost Conscious Design
  • 31. https://calculator.s3.amazonaws.com/index.html Example: Should I use Amazon S3 or Amazon DynamoDB? Cost Conscious Design Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2048 1483 777,600,000
  • 32. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month 300 2,048 1,483 777,600,000 Amazon S3 or Amazon DynamoDB?
  • 33. Request rate (Writes/sec) Object size (Bytes) Total size (GB/month) Objects per month Scenario 1300 2,048 1,483 777,600,000 Scenario 2300 32,768 23,730 777,600,000 Amazon S3 Amazon DynamoDB use use
  • 35. Process / Analyze Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Examples •  Interactive dashboards → Interactive analytics •  Daily/weekly/monthly reports → Batch analytics •  Billing/fraud alerts, 1 minute metrics → Real-time analytics •  Sentiment analysis, prediction models → Machine learning
  • 36. Interactive Analytics Takes large amount of (warm/cold) data Takes seconds to get answers back Example: Self-service dashboards
  • 37. Batch Analytics Takes large amount of (warm/cold) data Takes minutes or hours to get answers back Example: Generating daily, weekly, or monthly reports
  • 38. Real-Time Analytics Take small amount of hot data and ask questions Takes short amount of time (milliseconds or seconds) to get your answer back •  Real-time (event) •  Real-time response to events in data streams •  Example: Billing/Fraud Alerts •  Near real-time (micro-batch) •  Near real-time operations on small batches of events in data streams •  Example: 1 Minute Metrics
  • 39. Predictions via Machine Learning ML gives computers the ability to learn without being explicitly programmed Machine Learning Algorithms: -  Supervised Learning ← “teach” program -  Classification ← Is this transaction fraud? (Yes/No) -  Regression ← Customer Life-time value? -  Unsupervised Learning ← let it learn by itself -  Clustering ← Market Segmentation
  • 40. Analysis Tools and Frameworks Machine Learning •  Mahout, Spark ML, Amazon ML Interactive Analytics •  Amazon Redshift, Presto, Impala, Spark Batch Processing •  MapReduce, Hive, Pig, Spark Stream Processing •  Micro-batch: Spark Streaming, KCL, Hive, Pig •  Real-time: Storm, AWS Lambda, KCL StreamProcessing Batch Interactive ML Analyze Stream Processing Batch Processing Interactive Analytics MLAmazon Machine Learning Amazon Redshift Impala AmazonElasticMapReduce Pig Streaming Amazon
 Kinesis AWS Lambda
  • 41. Spark Streaming Apache Storm Amazon Kinesis Client Library AWS Lambda Amazon EMR (Hive, Pig) Scale / Throughput ~ Nodes ~ Nodes ~ Nodes Automatic ~ Nodes Batch or Real- time Real-time Real-time Real-time Real-time Batch Manageability Yes (Amazon EMR) Do it yourself Amazon EC2 + Auto Scaling AWS managed Yes (Amazon EMR) Fault Tolerance Single AZ Configurable Multi-AZ Multi-AZ Single AZ Programming languages Java, Python, Scala Any language via Thrift Java, via MultiLangDaemon ( .Net, Python, Ruby, Node.js) Node.js, Java, Python Hive, Pig, Streaming languages High What Stream Processing Technology Should I Use?
  • 42. What Interactive/Batch Processing Technology Should I Use? Amazon Redshift Impala Presto Spark Hive Query Latency Low Low Low Low Medium (Tez) – High (MapReduce) Durability High High High High High Data Volume 2PB Max ~Nodes ~Nodes ~Nodes ~Nodes Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR) Storage Native HDFS / S3A* HDFS / S3 HDFS / S3 HDFS / S3 SQL Compatibility High Medium High Low (SparkSQL) Medium (HQL) HighMedium
  • 43. Spark Streaming Apache Storm AWS Lambda KCL Amazon Redshift Spark Impala Presto Hive Amazon Redshift Hive Spark Presto Impala Amazon Kinesis Apache Kafka Amazon DynamoDB Amazon S3data Hot Cold Data Temperature ProcessingLatency Low High Answers Amazon EMR (HDFS) Hive Native KCL AWS Lambda Data Temperature vs Processing Latency Real-time Interactive Batch Batch
  • 44. What about ETL? Store Analyze https://aws.amazon.com/big-data/partner-solutions/ ETL
  • 46. Consume •  Predictions •  Analysis and Visualization •  Notebooks •  IDE •  Applications & API Consume Analysis&Visualization Amazon QuickSight Notebooks Predictions Apps & APIs IDE Store Analyze ConsumeETL Business users Data Scientist, Developers
  • 47. Putting It All Together
  • 48. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Amazon QuickSight Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL Reference Architecture
  • 50. Multi-Stage Decoupled “Data Bus” •  Multiple stages •  Storage decoupled from processing Store Process Store Process process store
  • 51. Multiple Processing Applications (or Connectors) Can Read from or Write to the Same Data Stores Amazon Kinesis AWS Lambda Amazon DynamoDB Amazon Kinesis S3 Connector Amazon S3 process store
  • 52. Amazon Kinesis AWS Lambda Amazon DynamoDB Amazon Kinesis S3 Connector Amazon S3 process store Processing Frameworks (KCL, Storm, Hive, Spark, etc.) Could Read from Multiple Data Stores Storm Hive Spark
  • 53. Batch Layer Amazon Kinesis data process store Amazon Kinesis S3 Connector Amazon S3 A p p l i c a t i o n s Amazon Redshift Amazon EMR Presto Hive Pig Spark answer Speed Layer answer Serving Layer Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES answer Amazon ML KCL AWS Lambda Spark Streaming Storm Lambda Architecture
  • 54. Summary •  Build decoupled “data bus” •  Data → Store ↔ Process → Answers •  Use the right tool for the job •  Latency, throughput, access patterns •  Use Lambda architecture ideas •  Immutable (append-only) log, batch/speed/serving layer •  Leverage AWS managed services •  No/low admin •  Be cost conscious •  Big data ≠ big cost
  • 55. Ran Tessler AWS Solu0ons Architecture Manager tesslerr@amazon.com