SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
DAT306 - How Amazon.com, with One of the
World’s Largest Data Warehouses, is Leveraging
Amazon Redshift
Erik Selberg (selberg@amazon.com) and
Abhishek Agrawal (abhagrwa@amazon.com)
November 14, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Agenda
• Amazon Data Warehouse Overview
• Amazon Data Warehouse and Amazon Redshift
Integration Project
• Amazon Redshift Best Practices
• Conclusion
Amazon Data Warehouse
Overview
Erik Selberg <selberg@amazon.com>
Amazon Data Warehouse
• Authoritative repository of data for all Amazon
• Petabytes of data

• Existing EDW is Oracle RAC; also using Amazon Elastic MapReduce
and now Amazon Redshift
• Owns managing the hardware and software infrastructure
–

Apart from Oracle DB, just Amazon IP

• Not part of AWS
Introducing the Elephant…
• Mission: Provide customers the
best value
–
–

Leverage AWS only if it provides the
best value
We aren’t moving 100% to Amazon
Redshift

• Publish best practices
–

If AWS isn’t the best, we’ll say so

• There is a conflict of interest
Control Plane (ETL
Manager)

Existing
EDW

Amazon
EMR

Amazon
Redshift

Amazon Data Warehouse Architecture
Amazon Data Warehouse – Growth Story
• Petabytes of data
• Growth of data volume – YoY storage
requirements have grown 67%
• Growth of processing volume – YoY
processing demand has grown 47%
Long-Term Sustainable Scale
$$ Wasted

Demand
SAN-based

Redshift
Coping with Change
Growth
changes
Demand
SAN

Capacity
Unmet

Redshift
Amazon Data Warehouse – Cost per Job
• Our main efficiency metric – Cost per Job (CPJ)

$CapEx $DataCenter $VendorSup port
PeakJobsPe rDay
What Drives Cost per Job…
Up?

Down?

•

Number of disks
– Data gets bigger!

•

Bidding
– 2+ vendors

•

Number of servers

•

Moore’s Law
– Vendors fight this!

•

Short-sighted negotiations
– 4th year support…

•

Data design

Data Center costs (power, rent)

•

Software (e.g. DBM)

•
Current State and Problems
• Existing EDW
– Multiple multi-petabyte clusters (redundancy and jobs)
– Why not <x>? CPJ not lower

• Data stored in SANs (not Exadata)
• Performs poorly on scans of 10T+
• Long procurement cycles (3 month minimum)
Amazon Data Warehouse and Amazon Redshift
Integration Project
• Spent 2013 evaluating Amazon Redshift for Amazon data
warehouse
– Where does Amazon Redshift provide a better CPJ?
– Can Amazon Redshift solve some pain (without introducing new pain)?

• Picked 10K jobs and 275 tables to copy
Current State of Affairs
• Biggest cluster size: 20+1 8XL
• Peak daily jobs: 7211 (using all 4 clusters)
• 4159 extracts
• 3052 loads
Some Results
• Benchmarking for 4159 jobs
– Outperforming 2719
– Underperforming 1440
– Avg. runtime
• 4:43 mins in Amazon Redshift
• 17:38 mins in existing EDW

• LOADS are slower
• EXTRACTS are faster

Job Type

RS Performance
Category

Job Count by
Category

EXTRACT
EXTRACT
EXTRACT
EXTRACT
EXTRACT
EXTRACT
LOAD
LOAD
LOAD
LOAD
LOAD
LOAD

10X Faster
5X Faster
3X Faster
2X Faster
1X or same
2X Slower
10X Faster
5X Faster
3X Faster
2X Faster
1X or same
2X Slower

945
487
393
301
480
1150
7
15
23
23
45
290
Amazon Redshift Best Practices
Abhishek Agrawal <abhagrwa@amazon.com>
Amazon Redshift Integration Best Practices
•

Integrating via Amazon S3 (Manifests)

•

Primary key enforcement

•

Idempotent loads
–
–

MERGE via INSERT/UPDATE
Mimic Trunc-Load [Backfills]

•

Trunc-partition using sort keys

•

Administration automation

•

Ensuring data correctness
Integrating via Amazon S3
• S3 in the US Standard Region is eventually consistent!
• S3 LIST might not give the entire list of data right after
you save it (this WILL eventually happen to you!)
• Amazon Redshift loads everything it sees in a bucket
– You may see all data files, Amazon Redshift may not, which can cause
missing data
Best Practices – Using Amazon S3
• Read/COPY
–
–

System table validation – STL_LOAD_ERRORS,
Verify files loaded are ‘intended’ files

• Write/ UNLOAD
–
–

System table validation – STL_UNLOAD_LOG
Verify all files that has the data are on S3

• Manifests
–
–
–

Metadata to know what to exactly to read from S3
Provides authoritative reference to data
Powerful in terms of user metadata format, encryption, etc.
Primary Key Enforcement
• Amazon Redshift does not enforce primary key
– You will need to do this to ensure data quality

• Best practice
– Introduce temp table to check duplicates in incoming data
– Validate against incoming data to catch offenders
– Put the data in target table and validate target data in the same
transaction before commit

• Yes, this IS a lot of overhead
Idempotent Loads
• Idempotent Loads – doing a load 2+ times the same as
doing 1 load
– Needed to manage load failures

• MERGE – leverages primary key, row at a time

• TRUNC / INSERT – load a partition at a time
MERGE
• No native Amazon Redshift MERGE support
• Merge is implemented as a multi-step process
–
–
–
–

Load the data in temp table
Figure out inserts and load
Figure out updates and modify target table
Validation for duplicates
TRUNC - INSERT
• Solution
– Distribute randomly
– Use sort keys to align data (mimics partition)
– Selectively delete and insert

• Issues
– Inserts are in an “unsorted” bucket – performance degrades without
periodic VACUUM
– Very slow (effectively row at a time)
Other Temp Table Uses
• Partial column data load
• Filtered data load
• Column transformations
Automating Administration
• Stored procs / Oracle workflow used to do
admin task like retention, stats, etc.
• Solution
– We introduced a software layer to prepare the administrative
task statements based on defined inputs
– Execute using JDBC connection
– Can schedule work like stats collection, vacuum, etc.
2013 Results
• CPJ is 55% less on Amazon Redshift in general
–
–
–
–

We can’t share the math, sorry YMMV
Between Redshift and Amazon data warehouse, known improvements get us to ~66%
Big wins are in big queries
Loads are slow and expensive

• Moved ~10K jobs to ~60 8XLs (4 clusters)

• We could move at most 45% of our work to Amazon Redshift with
minimal changes
2014 Plan
• Focus on big tables (100T+)
– Need to solve data expiry and backfill challenges

• Solve problems with CPU bound
• Interactive analytics (third-party vendor apps
with Amazon Redshift + Oracle)
Please give us your feedback on this
presentation

DAT306
As a thank you, we will select prize
winners daily for completed surveys!

Más contenido relacionado

La actualidad más candente

Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceAmazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData FlyData Inc.
 
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & DataductAmazon Web Services
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon RedshiftAmazon Web Services
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon RedshiftAmazon Web Services
 
Getting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex QueriesGetting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex Queriestimonk
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
AWS Webinar - Dynamo DB + Redshift 13_09_19
AWS Webinar - Dynamo DB + Redshift 13_09_19AWS Webinar - Dynamo DB + Redshift 13_09_19
AWS Webinar - Dynamo DB + Redshift 13_09_19Amazon Web Services
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAmazon Web Services
 
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Amazon Web Services
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesGetting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesAmazon Web Services
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)Amazon Web Services
 

La actualidad más candente (20)

Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performance
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData
 
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
(BDT404) Large-Scale ETL Data Flows w/AWS Data Pipeline & Dataduct
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
 
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
(DAT308) Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Getting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex QueriesGetting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex Queries
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
AWS Webinar - Dynamo DB + Redshift 13_09_19
AWS Webinar - Dynamo DB + Redshift 13_09_19AWS Webinar - Dynamo DB + Redshift 13_09_19
AWS Webinar - Dynamo DB + Redshift 13_09_19
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesGetting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
 

Destacado

Secure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelSecure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelAmazon Web Services
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...Amazon Web Services
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...Amazon Web Services
 
Media Success Stories from the Cloud
Media Success Stories from the CloudMedia Success Stories from the Cloud
Media Success Stories from the CloudAmazon Web Services
 
Deep Dive: Amazon Virtual Private Cloud
Deep Dive: Amazon Virtual Private CloudDeep Dive: Amazon Virtual Private Cloud
Deep Dive: Amazon Virtual Private CloudAmazon Web Services
 
AWS Webcast - Using the AWS Cloud for Disaster recovery_Public Sector
AWS Webcast - Using the AWS Cloud for Disaster recovery_Public SectorAWS Webcast - Using the AWS Cloud for Disaster recovery_Public Sector
AWS Webcast - Using the AWS Cloud for Disaster recovery_Public SectorAmazon Web Services
 
DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012
DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012
DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012Amazon Web Services
 
AWS Enterprise Summit Manila Windows .net
AWS Enterprise Summit Manila Windows .netAWS Enterprise Summit Manila Windows .net
AWS Enterprise Summit Manila Windows .netAmazon Web Services
 
Best practices for content delivery using amazon cloud front
Best practices for content delivery using amazon cloud frontBest practices for content delivery using amazon cloud front
Best practices for content delivery using amazon cloud frontAmazon Web Services
 
AWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data AnalyticsAWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data AnalyticsAmazon Web Services
 
The 2014 AWS Enterprise Summit Keynote
The 2014 AWS Enterprise Summit Keynote The 2014 AWS Enterprise Summit Keynote
The 2014 AWS Enterprise Summit Keynote Amazon Web Services
 
AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users
 AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users
AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million UsersAmazon Web Services
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAmazon Web Services
 
AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...
AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...
AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...Amazon Web Services
 
REA Sydney Customer Appreciation Day
REA Sydney Customer Appreciation DayREA Sydney Customer Appreciation Day
REA Sydney Customer Appreciation DayAmazon Web Services
 

Destacado (20)

Scalability and Availability
Scalability and AvailabilityScalability and Availability
Scalability and Availability
 
Application Portfolio Migration
Application Portfolio MigrationApplication Portfolio Migration
Application Portfolio Migration
 
Secure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by IntelSecure Hadoop as a Service - Session Sponsored by Intel
Secure Hadoop as a Service - Session Sponsored by Intel
 
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
AWS Summit 2013 | Singapore - Delivering Search for Today's Local, Social, an...
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
 
Media Success Stories from the Cloud
Media Success Stories from the CloudMedia Success Stories from the Cloud
Media Success Stories from the Cloud
 
Masterclass Live: Amazon EC2
Masterclass Live: Amazon EC2 Masterclass Live: Amazon EC2
Masterclass Live: Amazon EC2
 
Deep Dive: Amazon Virtual Private Cloud
Deep Dive: Amazon Virtual Private CloudDeep Dive: Amazon Virtual Private Cloud
Deep Dive: Amazon Virtual Private Cloud
 
AWS Webcast - Using the AWS Cloud for Disaster recovery_Public Sector
AWS Webcast - Using the AWS Cloud for Disaster recovery_Public SectorAWS Webcast - Using the AWS Cloud for Disaster recovery_Public Sector
AWS Webcast - Using the AWS Cloud for Disaster recovery_Public Sector
 
DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012
DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012
DAT203 Optimizing Your MongoDB Database on AWS - AWS re: Invent 2012
 
AWS Enterprise Summit Manila Windows .net
AWS Enterprise Summit Manila Windows .netAWS Enterprise Summit Manila Windows .net
AWS Enterprise Summit Manila Windows .net
 
Best practices for content delivery using amazon cloud front
Best practices for content delivery using amazon cloud frontBest practices for content delivery using amazon cloud front
Best practices for content delivery using amazon cloud front
 
AWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data AnalyticsAWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data Analytics
 
The 2014 AWS Enterprise Summit Keynote
The 2014 AWS Enterprise Summit Keynote The 2014 AWS Enterprise Summit Keynote
The 2014 AWS Enterprise Summit Keynote
 
AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users
 AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users
AWS Summit Auckland 2014 | Scaling on AWS for the First 10 Million Users
 
Workshop part3 – IOT
Workshop part3 – IOTWorkshop part3 – IOT
Workshop part3 – IOT
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go Squared
 
Protecting Your Data in AWS
Protecting Your Data in AWSProtecting Your Data in AWS
Protecting Your Data in AWS
 
AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...
AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...
AWS Customer Presentation: Centrastage - AWS Summit 2012 - London Customer Ta...
 
REA Sydney Customer Appreciation Day
REA Sydney Customer Appreciation DayREA Sydney Customer Appreciation Day
REA Sydney Customer Appreciation Day
 

Similar a How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013

Java Developers, make the database work for you (NLJUG JFall 2010)
Java Developers, make the database work for you (NLJUG JFall 2010)Java Developers, make the database work for you (NLJUG JFall 2010)
Java Developers, make the database work for you (NLJUG JFall 2010)Lucas Jellema
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Amazon Web Services
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
 
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...Amazon Web Services
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduceGareth Rogers
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big dataJack (Yaakov) Bezalel
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSAmazon Web Services
 
Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users mauerbac
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...Amazon Web Services
 
Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?Performance Tuning Corporation
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹Amazon Web Services
 

Similar a How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013 (20)

Java Developers, make the database work for you (NLJUG JFall 2010)
Java Developers, make the database work for you (NLJUG JFall 2010)Java Developers, make the database work for you (NLJUG JFall 2010)
Java Developers, make the database work for you (NLJUG JFall 2010)
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ...
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduce
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big data
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWS
 
Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
 
Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

How Amazon.com is Leveraging Amazon Redshift (DAT306) | AWS re:Invent 2013

  • 1. DAT306 - How Amazon.com, with One of the World’s Largest Data Warehouses, is Leveraging Amazon Redshift Erik Selberg (selberg@amazon.com) and Abhishek Agrawal (abhagrwa@amazon.com) November 14, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Agenda • Amazon Data Warehouse Overview • Amazon Data Warehouse and Amazon Redshift Integration Project • Amazon Redshift Best Practices • Conclusion
  • 3. Amazon Data Warehouse Overview Erik Selberg <selberg@amazon.com>
  • 4. Amazon Data Warehouse • Authoritative repository of data for all Amazon • Petabytes of data • Existing EDW is Oracle RAC; also using Amazon Elastic MapReduce and now Amazon Redshift • Owns managing the hardware and software infrastructure – Apart from Oracle DB, just Amazon IP • Not part of AWS
  • 5. Introducing the Elephant… • Mission: Provide customers the best value – – Leverage AWS only if it provides the best value We aren’t moving 100% to Amazon Redshift • Publish best practices – If AWS isn’t the best, we’ll say so • There is a conflict of interest
  • 7. Amazon Data Warehouse – Growth Story • Petabytes of data • Growth of data volume – YoY storage requirements have grown 67% • Growth of processing volume – YoY processing demand has grown 47%
  • 8. Long-Term Sustainable Scale $$ Wasted Demand SAN-based Redshift
  • 10. Amazon Data Warehouse – Cost per Job • Our main efficiency metric – Cost per Job (CPJ) $CapEx $DataCenter $VendorSup port PeakJobsPe rDay
  • 11. What Drives Cost per Job… Up? Down? • Number of disks – Data gets bigger! • Bidding – 2+ vendors • Number of servers • Moore’s Law – Vendors fight this! • Short-sighted negotiations – 4th year support… • Data design Data Center costs (power, rent) • Software (e.g. DBM) •
  • 12. Current State and Problems • Existing EDW – Multiple multi-petabyte clusters (redundancy and jobs) – Why not <x>? CPJ not lower • Data stored in SANs (not Exadata) • Performs poorly on scans of 10T+ • Long procurement cycles (3 month minimum)
  • 13. Amazon Data Warehouse and Amazon Redshift Integration Project • Spent 2013 evaluating Amazon Redshift for Amazon data warehouse – Where does Amazon Redshift provide a better CPJ? – Can Amazon Redshift solve some pain (without introducing new pain)? • Picked 10K jobs and 275 tables to copy
  • 14. Current State of Affairs • Biggest cluster size: 20+1 8XL • Peak daily jobs: 7211 (using all 4 clusters) • 4159 extracts • 3052 loads
  • 15. Some Results • Benchmarking for 4159 jobs – Outperforming 2719 – Underperforming 1440 – Avg. runtime • 4:43 mins in Amazon Redshift • 17:38 mins in existing EDW • LOADS are slower • EXTRACTS are faster Job Type RS Performance Category Job Count by Category EXTRACT EXTRACT EXTRACT EXTRACT EXTRACT EXTRACT LOAD LOAD LOAD LOAD LOAD LOAD 10X Faster 5X Faster 3X Faster 2X Faster 1X or same 2X Slower 10X Faster 5X Faster 3X Faster 2X Faster 1X or same 2X Slower 945 487 393 301 480 1150 7 15 23 23 45 290
  • 16. Amazon Redshift Best Practices Abhishek Agrawal <abhagrwa@amazon.com>
  • 17. Amazon Redshift Integration Best Practices • Integrating via Amazon S3 (Manifests) • Primary key enforcement • Idempotent loads – – MERGE via INSERT/UPDATE Mimic Trunc-Load [Backfills] • Trunc-partition using sort keys • Administration automation • Ensuring data correctness
  • 18. Integrating via Amazon S3 • S3 in the US Standard Region is eventually consistent! • S3 LIST might not give the entire list of data right after you save it (this WILL eventually happen to you!) • Amazon Redshift loads everything it sees in a bucket – You may see all data files, Amazon Redshift may not, which can cause missing data
  • 19. Best Practices – Using Amazon S3 • Read/COPY – – System table validation – STL_LOAD_ERRORS, Verify files loaded are ‘intended’ files • Write/ UNLOAD – – System table validation – STL_UNLOAD_LOG Verify all files that has the data are on S3 • Manifests – – – Metadata to know what to exactly to read from S3 Provides authoritative reference to data Powerful in terms of user metadata format, encryption, etc.
  • 20. Primary Key Enforcement • Amazon Redshift does not enforce primary key – You will need to do this to ensure data quality • Best practice – Introduce temp table to check duplicates in incoming data – Validate against incoming data to catch offenders – Put the data in target table and validate target data in the same transaction before commit • Yes, this IS a lot of overhead
  • 21. Idempotent Loads • Idempotent Loads – doing a load 2+ times the same as doing 1 load – Needed to manage load failures • MERGE – leverages primary key, row at a time • TRUNC / INSERT – load a partition at a time
  • 22. MERGE • No native Amazon Redshift MERGE support • Merge is implemented as a multi-step process – – – – Load the data in temp table Figure out inserts and load Figure out updates and modify target table Validation for duplicates
  • 23. TRUNC - INSERT • Solution – Distribute randomly – Use sort keys to align data (mimics partition) – Selectively delete and insert • Issues – Inserts are in an “unsorted” bucket – performance degrades without periodic VACUUM – Very slow (effectively row at a time)
  • 24. Other Temp Table Uses • Partial column data load • Filtered data load • Column transformations
  • 25. Automating Administration • Stored procs / Oracle workflow used to do admin task like retention, stats, etc. • Solution – We introduced a software layer to prepare the administrative task statements based on defined inputs – Execute using JDBC connection – Can schedule work like stats collection, vacuum, etc.
  • 26. 2013 Results • CPJ is 55% less on Amazon Redshift in general – – – – We can’t share the math, sorry YMMV Between Redshift and Amazon data warehouse, known improvements get us to ~66% Big wins are in big queries Loads are slow and expensive • Moved ~10K jobs to ~60 8XLs (4 clusters) • We could move at most 45% of our work to Amazon Redshift with minimal changes
  • 27. 2014 Plan • Focus on big tables (100T+) – Need to solve data expiry and backfill challenges • Solve problems with CPU bound • Interactive analytics (third-party vendor apps with Amazon Redshift + Oracle)
  • 28. Please give us your feedback on this presentation DAT306 As a thank you, we will select prize winners daily for completed surveys!