© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Osemeke Isibor
Solutions Architect, Amazon Web Services
Building a Modern Data Warehouse:
Deep Dive on Amazon Redshift
Why Modernise?
• Performance
• Scalability
• Cost
Introducing Amazon Redshift
Amazon Redshift
• Fully managed
• High performance SQL
• Massively Parallel Processing
• Petabyte-scale data warehousing service
AWS Analytics Architecture
Stages: Collect, Store, Analyze
Services shown in the diagram:
• Amazon Kinesis Data Firehose
• AWS Direct Connect
• AWS Snowball
• Amazon Kinesis Data Analytics
• Amazon Kinesis Data Streams
• Amazon S3
• Amazon Glacier
• Amazon CloudSearch
• Amazon RDS, Amazon Aurora
• Amazon DynamoDB
• Amazon ES
• Amazon EMR
• Amazon Redshift
• Amazon QuickSight
• AWS Database Migration Service
• AWS Glue
• Amazon Athena
• Amazon AI
Amazon Redshift is Fast
Delivers fast results for all types of workloads
“After investigating Redshift, Snowflake, and BigQuery, we found that Redshift offers top-of-the-line performance at best-in-market price points”
“…[Redshift] performance has blown away everyone here. We generally see 50-100X speedup over Hive”
Amazon Redshift is Cost Effective
No upfront costs, start small, and pay as you go
“450,000 online queries 98 percent faster than previous traditional data center, while reducing infrastructure costs by 80 percent.”
“Most competing data warehousing solutions would have cost us up to $1 million a year. By contrast, Amazon Redshift costs us just $100,000 all-in, representing a total cost savings of around 90%”
Redshift is Easy to Use
Provisioning in minutes
“With Amazon Redshift and Tableau, anyone in the company can set up any queries they like - from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas”
“The doors were blown wide open to create custom dashboards for anyone to instantly go in and see and assess what is going in our ad delivery landscape, something we have never been able to do until now.”
Redshift has Built-in Durability and Availability
• Automated backups
• Cross-region backups
• Cluster-level mirroring
• Streaming restore
• Monitoring
• Automatic patching
Redshift is Secure
• End-to-end data encryption
• Alerts & notifications
• Virtual private cloud
• AWS KMS & HSM
• Audit logging
• Certifications & compliance
Forrester Wave Big Data Warehouse Q2 2017
“Amazon Redshift has the largest adoption of BDW in the cloud.”
“With more than 5,000 deployments, Amazon Redshift has the largest data warehouse deployments in the cloud – some over 10 petabytes in size.”
AWS received a score of 5/5 (the highest score possible) in the customer base, market awareness, ability to execute, road map, support, and partners criteria.
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Amazon Redshift is Widely Available
Ireland
Frankfurt
London
Beijing
Mumbai
Seoul
Singapore
Sydney
Tokyo
São Paulo
US East – N Virginia
US East – Ohio
US West – Oregon
US West – N California
AWS GovCloud (US)
Canada – Central, Montreal
Selected Amazon Redshift Partners
• Data Integration
• Systems Integrators
• Business Intelligence
Amazon Redshift Architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore
• 2, 16, or 32 slices
[Diagram: clients connect over JDBC/ODBC to the leader node of a Redshift cluster; the compute nodes handle efficient data loads and streaming backup/restore with Amazon S3]
Redshift MPP Architecture
[Diagram: three compute nodes (Node 1, Node 2, Node 3), each split into Slice 1 and Slice 2; every slice has its own virtual core, 7.5 GB RAM, and local disk]
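The MPP layout above (three nodes, two slices each) can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the hash function, the `order_id` distribution key, and the sample rows are made up for illustration and are not how Amazon Redshift assigns rows internally. It shows the general idea that each slice works on its own share of the rows and the leader merges the partial results.

```python
from collections import Counter
from zlib import crc32

NODES = 3            # matches the three nodes in the diagram
SLICES_PER_NODE = 2  # matches "Slice 1 | Slice 2" per node


def slice_for(key):
    """Map a distribution key to a (node, slice) pair with a stable hash (assumed scheme)."""
    index = crc32(str(key).encode("utf-8")) % (NODES * SLICES_PER_NODE)
    return index // SLICES_PER_NODE + 1, index % SLICES_PER_NODE + 1


def distribute(rows, key):
    """Place each row on the slice chosen by its distribution key."""
    slices = {
        (node, sl): []
        for node in range(1, NODES + 1)
        for sl in range(1, SLICES_PER_NODE + 1)
    }
    for row in rows:
        slices[slice_for(row[key])].append(row)
    return slices


def parallel_count_by(slices, column):
    """Each slice counts its own rows; the leader merges the partial counts."""
    partials = [Counter(row[column] for row in rows) for rows in slices.values()]
    total = Counter()
    for partial in partials:  # leader-side merge step
        total.update(partial)
    return dict(total)


# Hypothetical sample data: 12 orders, every third one shipped by air.
rows = [{"order_id": i, "ship_mode": "air" if i % 3 == 0 else "ground"} for i in range(12)]
slices = distribute(rows, "order_id")
counts = parallel_count_by(slices, "ship_mode")
```

Whatever the hash does with individual rows, the merged counts equal what a single-node count would produce, which is the property the leader node relies on.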
Redshift Spectrum
Extend your data warehouse to your Amazon S3 data lake
• Scale compute and storage separately
• Join data across Amazon Redshift and Amazon S3
• Amazon Redshift SQL queries against exabytes in Amazon S3
• Stable query performance and unlimited concurrency
• Parquet, ORC, Grok, Avro, & CSV data formats
• Pay only for the amount of data scanned
[Diagram: Amazon Redshift data, the Redshift Spectrum query engine, and the Amazon S3 data lake]
Life of a Query
[Diagram, repeated on each step: a client connected over JDBC/ODBC to Amazon Redshift; nodes numbered 1, 2, 3, 4 … N; Amazon S3 (exabyte-scale object storage); Data Catalog (Apache Hive compatible metastore)]
1. Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
2. Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum.
3. Query plan is sent to all compute nodes.
4. Compute nodes obtain partition info from Data Catalog; dynamically prune partitions.
5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer.
6. Amazon Redshift Spectrum nodes scan your S3 data.
7. Amazon Redshift Spectrum projects, filters, and aggregates.
8. Final aggregations and joins with local Amazon Redshift tables done in-cluster.
9. Result is sent back to client.
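Steps 4 to 8 of the query walk-through can be sketched in a few lines of Python. This is a toy model, not the Redshift Spectrum implementation: the catalog layout, the `s3://example-bucket/...` keys, the sample rows, and every function name are hypothetical. It only shows the shape of the flow: prune partitions using catalog metadata, scan and pre-aggregate each remaining object, then merge the partial results in-cluster.

```python
from collections import Counter

# Hypothetical Data Catalog: partition value -> object keys in that partition.
CATALOG = {
    "2017-09": ["s3://example-bucket/clicks/month=2017-09/part-0"],
    "2017-10": [
        "s3://example-bucket/clicks/month=2017-10/part-0",
        "s3://example-bucket/clicks/month=2017-10/part-1",
    ],
}

# Hypothetical object contents standing in for files in Amazon S3.
OBJECTS = {
    "s3://example-bucket/clicks/month=2017-09/part-0": [{"page": "/home"}, {"page": "/home"}],
    "s3://example-bucket/clicks/month=2017-10/part-0": [{"page": "/home"}, {"page": "/products"}],
    "s3://example-bucket/clicks/month=2017-10/part-1": [{"page": "/products"}, {"page": "/contact"}],
}


def prune_partitions(catalog, predicate):
    """Step 4: keep only partitions whose value satisfies the predicate."""
    return {part: keys for part, keys in catalog.items() if predicate(part)}


def spectrum_scan(object_key, column):
    """Steps 6-7: scan one object, project one column, and pre-aggregate it."""
    return Counter(row[column] for row in OBJECTS[object_key])


def run_query(catalog, predicate, column):
    partitions = prune_partitions(catalog, predicate)
    # Step 5: one scan request per remaining object.
    partials = [spectrum_scan(key, column) for keys in partitions.values() for key in keys]
    # Step 8: final aggregation of the partial counts, done in-cluster.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return dict(final), len(partials)


result, requests = run_query(CATALOG, lambda month: month >= "2017-10", "page")
```

Because the September partition is pruned before any scan is issued, only two of the three objects are read, which is the mechanism behind paying only for the data scanned.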
Recently Released Features
Dense Compute Nodes (DC2)
2x performance at the same price as DC1
3x more I/O with 30% better storage utilization than DC1
• NVMe SSD
• DDR4 memory
• Intel E5-2686 v4 (Broadwell)
“Amazon Redshift’s new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue.”
Short Query Acceleration
Express Lane for Short Queries
• Short queries do not get stuck behind long-running queries
• Higher throughput – Less variability
• Adapts to your workload
• Transparent – It just works!
[Chart: Average Wait Queue Time for Small Queries (<1 sec.)]
Short Query Acceleration
Express Lane for Short Queries
How it works:
• Machine learning predicts the runtime of queries
• Short queries are routed to an express queue
• Elastic SQA: Resources dynamically dedicated to serve a burst of short queries
• Enable it today in the AWS Management Console
• Dynamic timeout based on your workload (coming soon)
[Diagram: queries from analytics and BI / dashboard tools pass through a machine learning classifier in Amazon Redshift]
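The routing idea behind short query acceleration can be sketched as follows. This is an illustration under loud assumptions: the predictor here is a simple average of past runtimes standing in for the machine learning classifier, and the one-second threshold, the function names, and the sample history are all hypothetical rather than anything Amazon Redshift exposes.

```python
from collections import deque

EXPRESS_THRESHOLD_SECONDS = 1.0  # assumed cutoff for a "short" query


def predict_runtime_seconds(query, history):
    """Stand-in for the ML classifier: mean of past runtimes for the same query text.

    Queries never seen before are treated as long (infinite predicted runtime).
    """
    past = history.get(query)
    return sum(past) / len(past) if past else float("inf")


def route(queries, history):
    """Send predicted-short queries to the express queue, everything else to the default queue."""
    express, default = deque(), deque()
    for query in queries:
        if predict_runtime_seconds(query, history) < EXPRESS_THRESHOLD_SECONDS:
            express.append(query)
        else:
            default.append(query)
    return express, default


# Hypothetical runtime history in seconds, keyed by query text.
history = {
    "SELECT 1": [0.1, 0.3],
    "SELECT * FROM big_fact_table": [120.0],
}
express, default = route(
    ["SELECT * FROM big_fact_table", "SELECT 1", "SELECT never_seen_before"],
    history,
)
```

The short query lands in the express queue even though it arrived after the long one, so it does not wait behind it.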
Result-Set Caching
Sub-second repeat queries
How it works:
1. Queries go to the leader node
2. If the cache contains the query result, it is returned with no processing
3. If the query result is not in cache, it is executed, and the result is cached
Caching frees up the Amazon Redshift cluster, increasing performance for all queries
[Diagram: analytics and BI / dashboard tools query Amazon Redshift; the results cache maps QUERY_ID to RESULT]
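The three caching steps can be sketched as a small Python class. It is a minimal sketch, assuming the cache is keyed on whitespace-normalized query text; the `LeaderNode` class, its methods, and the `execute` callback are hypothetical names for illustration, and a real system would also need to invalidate entries when the underlying data changes.

```python
class LeaderNode:
    """Toy leader node that answers repeat queries from a result cache."""

    def __init__(self, execute):
        self._execute = execute  # callback that actually runs a query
        self._cache = {}         # normalized query text -> cached result
        self.executions = 0      # how many queries reached the cluster

    @staticmethod
    def _key(sql):
        # Collapse whitespace so trivially reformatted queries share one entry.
        return " ".join(sql.split())

    def query(self, sql):
        key = self._key(sql)
        if key in self._cache:        # step 2: cache hit, no processing
            return self._cache[key]
        self.executions += 1          # step 3: cache miss, execute and store
        result = self._execute(sql)
        self._cache[key] = result
        return result

    def invalidate(self):
        """Drop all cached results, for example after the data changes."""
        self._cache.clear()


# Hypothetical executor: returns the length of the query text as its "result".
leader = LeaderNode(lambda sql: len(sql))
first = leader.query("SELECT COUNT(*) FROM sales")
second = leader.query("SELECT   COUNT(*)   FROM sales")  # same query, extra spaces
```

Only the first call reaches the executor; the second is served from the cache, which is what frees cluster resources for other queries.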
Commit Enhancements
• 16% faster data ingestion and insertion
• 40% faster data commits for busy clusters
[Chart: Total Commit Time by Month, Nov to Mar; ds2.8xlarge, cluster size: 10 and up, us-west-2; clusters with more than 90 backups a day; series p99, p95, p90, p50, Linear (p99); annotations -50%, -30%, -30%, -20%]
[Chart: Commit Duration Per Transaction for Busy Clusters]
Amazon Redshift is Self-Healing
Machine learning-based prediction and remediation of degraded disks, nodes, clusters, and network issues.
Ensures overall cluster and query performance
Nested Data (coming soon)
• Analyze nested and semi-structured data in Amazon S3 with Spectrum
• Allows easy ETL of nested data into Amazon Redshift using CTAS
• Support for open file formats: Parquet, ORC, JSON, Ion, and Avro
• Uses dot notation to extend your existing SQL
Example: Find click frequency for links on "/home":
s3data.clickStream: <<
{ "session_time": "20171013 14:05:00",
  "clicks": [ {"page": "/home", "referrer": ""},
              {"page": "/products", "referrer": "/home"} ]
},
{ "session_time": "20171013 14:06:00",
  "clicks": [ {"page": "/contact", "referrer": "/home"} ]
} >>
SELECT c.page,
       COUNT(*) AS count
FROM s3data.clickStream s,
     s.clicks c
WHERE s.session_time > '2017-10-01 00:00:00'
  AND c.referrer = '/home'
GROUP BY c.page;
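To make the semantics of the nested query concrete, here is a plain-Python sketch of the same computation over the two sample sessions from the slide. It is not how Redshift Spectrum evaluates the query; the function name is hypothetical, and the cutoff is written in the same compact `YYYYMMDD HH:MM:SS` format as the sample data (an assumption, so that a string comparison is meaningful).

```python
from collections import Counter

# The two sample sessions from the slide.
click_stream = [
    {
        "session_time": "20171013 14:05:00",
        "clicks": [
            {"page": "/home", "referrer": ""},
            {"page": "/products", "referrer": "/home"},
        ],
    },
    {
        "session_time": "20171013 14:06:00",
        "clicks": [{"page": "/contact", "referrer": "/home"}],
    },
]


def click_frequency(sessions, referrer, since):
    """Unnest each session's clicks, filter, and count clicks per page."""
    counts = Counter()
    for session in sessions:                    # FROM clickStream s
        if session["session_time"] > since:     # WHERE s.session_time > ...
            for click in session["clicks"]:     # , s.clicks c
                if click["referrer"] == referrer:
                    counts[click["page"]] += 1  # GROUP BY c.page, COUNT(*)
    return dict(counts)


frequency = click_frequency(click_stream, "/home", "20171001 00:00:00")
```

The inner loop plays the role of `s.clicks c` in the FROM clause: each element of the nested array becomes its own row before filtering and grouping.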
Nested Data (coming soon)
Improve query performance by analyzing nested data

Before: Orders and OrderItems as separate tables

Orders
OrderID  CustomerID  OrderTime  ShipMode
5        23          10.00      12.50
8        32          1.00       5.60

OrderItems
OrderID  ItemID  Quantity  Price
5        23      10.00     12.50
8        32      1.00      5.60
5        16      1.00      1.99
8        24      5.00      26.50

After: Orders with OrdersWithItems as a nested column

Orders
OrderID  CustomerID  OrderTime  ShipMode  OrdersWithItems (ItemID, Quantity, Price)
5        23          10.00      12.50     (23, 10.00, 12.50), (16, 1.00, 1.99)
8        32          1.00       5.60      (32, 1.00, 5.60), (24, 5.00, 26.50)

To improve query performance, the new Orders table includes OrdersWithItems as a nested column, eliminating join processing.
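The effect of nesting OrderItems inside Orders can be sketched in Python using the item rows from the slide. This is an illustration only: the function names are hypothetical, and the per-order total is an example query chosen here, not one from the deck. Once the items are stored inside each order, a per-order calculation reads a single record instead of joining two tables.

```python
from collections import defaultdict

# Item rows as shown in the OrderItems table on the slide.
order_items = [
    {"OrderID": 5, "ItemID": 23, "Quantity": 10.00, "Price": 12.50},
    {"OrderID": 8, "ItemID": 32, "Quantity": 1.00, "Price": 5.60},
    {"OrderID": 5, "ItemID": 16, "Quantity": 1.00, "Price": 1.99},
    {"OrderID": 8, "ItemID": 24, "Quantity": 5.00, "Price": 26.50},
]


def nest_items(order_ids, items):
    """Do the join once, up front: attach each order's items as a nested list."""
    grouped = defaultdict(list)
    for item in items:
        nested = {k: v for k, v in item.items() if k != "OrderID"}
        grouped[item["OrderID"]].append(nested)
    return [{"OrderID": oid, "OrdersWithItems": grouped[oid]} for oid in order_ids]


def order_total(order):
    """Per-order total computed from the nested column, with no join at query time."""
    return sum(item["Quantity"] * item["Price"] for item in order["OrdersWithItems"])


nested_orders = nest_items([5, 8], order_items)
totals = {order["OrderID"]: round(order_total(order), 2) for order in nested_orders}
```

The grouping work in `nest_items` happens once at load time, so later queries such as `order_total` touch only the order record itself.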
Summary
Get Started with Amazon Redshift
• Find out more: https://aws.amazon.com/redshift/
• Try Amazon Redshift
• Get help with your Proof-of-Concept
• Read Amazon Redshift blog articles: https://aws.amazon.com/redshift/blog-posts/
Thank you!

 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building a Modern Data Warehouse - Deep Dive on Amazon Redshift

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a Modern Data Warehouse: Deep Dive on Amazon Redshift. Osemeke Isibor, Solutions Architect, Amazon Web Services
  • 2. Why Modernise? Performance, Scalability, Cost
  • 3. Introducing Amazon Redshift
  • 4. Amazon Redshift: Fully managed; High performance SQL; Massively Parallel Processing; Petabyte-scale data warehousing service
  • 5. AWS Analytics Architecture (Collect, Store, Analyze). Services shown: Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Snowball, Amazon Kinesis Data Analytics, Amazon Kinesis Data Streams, Amazon S3, Amazon Glacier, Amazon CloudSearch, Amazon RDS, Amazon Aurora, Amazon DynamoDB, Amazon ES, Amazon EMR, Amazon Redshift, Amazon QuickSight, AWS Database Migration Service, AWS Glue, Amazon Athena, Amazon AI
  • 6. Amazon Redshift is Fast. Delivers fast results for all types of workloads. “After investigating Redshift, Snowflake, and BigQuery, we found that Redshift offers top-of-the-line performance at best-in-market price points” “…[Redshift] performance has blown away everyone here. We generally see 50-100X speedup over Hive”
  • 7. Amazon Redshift is Cost Effective. No upfront costs, start small, and pay as you go. “450,000 online queries 98 percent faster than previous traditional data center, while reducing infrastructure costs by 80 percent.” “Most competing data warehousing solutions would have cost us up to $1 million a year. By contrast, Amazon Redshift costs us just $100,000 all-in, representing a total cost savings of around 90%”
  • 8. Redshift is Easy to Use. Provisioning in minutes. “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like - from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas” “The doors were blown wide open to create custom dashboards for anyone to instantly go in and see and assess what is going in our ad delivery landscape, something we have never been able to do until now.”
  • 9. Redshift has Built-in Durability and Availability: Automated backups; Cross-region backups; Cluster-level mirroring; Streaming restore; Monitoring; Automatic patching
  • 10. Redshift is Secure: End-to-end data encryption; Alerts & notifications; Virtual private cloud; AWS KMS & HSM; Audit logging; Certifications & compliance
  • 11. Forrester Wave Big Data Warehouse Q2 2017. “Amazon Redshift has the largest adoption of BDW in the cloud.” “With more than 5,000 deployments, Amazon Redshift has the largest data warehouse deployments in the cloud – some over 10 petabytes in size.” AWS received a score of 5/5 (the highest score possible) in the customer base, market awareness, ability to execute, road map, support, and partners criteria. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
  • 12. Amazon Redshift is Widely Available: Ireland, Frankfurt, London, Beijing, Mumbai, Seoul, Singapore, Sydney, Tokyo, Sao Paulo, US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), AWS GovCloud (US), Canada (Central, Montreal)
  • 13. Selected Amazon Redshift Partners: Data Integration, Systems Integrators, Business Intelligence
  • 14. Amazon Redshift Architecture. Leader node: SQL endpoint; stores metadata; coordinates parallel SQL processing. Compute nodes: local, columnar storage; execute queries in parallel; load, backup, restore; 2, 16, or 32 slices. (Diagram labels: JDBC/ODBC, Redshift Cluster, Leader Node, Compute Nodes, Amazon S3, Efficient Data Loads, Streaming Backup/Restore.)
  • 15. Redshift MPP Architecture. (Diagram: Node 1, Node 2, and Node 3, each with Slice 1 and Slice 2; each slice is shown with a virtual core, 7.5 GB RAM, and local disk.)
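The node-and-slice layout above can be illustrated with a small sketch. This is a toy model, not Redshift's implementation: the CRC32 hash, the `slice_for` and `distribute` helpers, and the row shape are all assumptions made for illustration. It shows only the general MPP idea that rows are spread across slices by a key, each slice works on its own share, and a coordinator combines the partial results.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NODES = 3            # compute nodes, matching the diagram on the slide
SLICES_PER_NODE = 2  # slices per node, matching the diagram on the slide


def slice_for(key: str) -> Tuple[int, int]:
    """Map a distribution key to (node, slice) with a stable hash.

    CRC32 is an assumption for this sketch; the slides do not say
    which hash function Redshift uses.
    """
    total_slices = NODES * SLICES_PER_NODE
    idx = zlib.crc32(key.encode("utf-8")) % total_slices
    return idx // SLICES_PER_NODE + 1, idx % SLICES_PER_NODE + 1


def distribute(rows: List[dict], key_field: str) -> Dict[Tuple[int, int], List[dict]]:
    """Place each row on the slice chosen by its distribution key."""
    placement = defaultdict(list)
    for row in rows:
        placement[slice_for(str(row[key_field]))].append(row)
    return placement


def parallel_count(placement) -> Tuple[int, Dict[Tuple[int, int], int]]:
    """Each slice counts its own rows; the leader sums the partial counts."""
    partial = {slice_id: len(rows) for slice_id, rows in placement.items()}
    return sum(partial.values()), partial


if __name__ == "__main__":
    rows = [{"id": i} for i in range(100)]
    total, partial = parallel_count(distribute(rows, "id"))
    print(total, partial)
```

The same key always lands on the same slice, which is what lets per-slice work proceed without coordination until the final merge.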
  • 16. Redshift Spectrum: Extend your data warehouse to your Amazon S3 data lake. Scale compute and storage separately; Join data across Amazon Redshift and Amazon S3; Amazon Redshift SQL queries against exabytes in Amazon S3; Stable query performance and unlimited concurrency; Parquet, ORC, Grok, Avro, & CSV data formats; Pay only for the amount of data scanned. (Diagram labels: Amazon S3 data lake, Amazon Redshift data, Redshift Spectrum query engine.)
  • 17. Life of a Query
  • 18. Step 1: A query arrives over JDBC/ODBC: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY… (Diagram, repeated on slides 18 to 26: Amazon Redshift with compute nodes 1, 2, 3, 4, ... N; Amazon S3, exabyte-scale object storage; Data Catalog, Apache Hive compatible metastore.)
  • 19. Step 2: Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum
  • 20. Step 3: Query plan is sent to all compute nodes
  • 21. Step 4: Compute nodes obtain partition info from Data Catalog; dynamically prune partitions
  • 22. Step 5: Each compute node issues multiple requests to the Amazon Redshift Spectrum layer
  • 23. Step 6: Amazon Redshift Spectrum nodes scan your S3 data
  • 24. Step 7: Amazon Redshift Spectrum projects, filters, and aggregates
  • 25. Step 8: Final aggregations and joins with local Amazon Redshift tables done in-cluster
  • 26. Step 9: Result is sent back to client
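The query walk-through on slides 18 to 26 can be sketched as a tiny in-memory pipeline. Everything here is hypothetical: the `CATALOG` dictionary stands in for the Data Catalog and S3 objects, and the function names are invented for illustration. The sketch only mirrors the order of operations described on the slides (prune partitions, scan and partially aggregate per partition, merge in-cluster), not how Redshift Spectrum actually does it.

```python
from collections import Counter

# Hypothetical stand-in for the Data Catalog plus S3 objects:
# partition name -> rows stored under that partition.
CATALOG = {
    "dt=2018-01-01": [{"page": "/home"}, {"page": "/products"}],
    "dt=2018-01-02": [{"page": "/home"}, {"page": "/home"}],
    "dt=2018-01-03": [{"page": "/contact"}],
}


def prune_partitions(catalog, min_dt):
    """Step 4: keep only partitions whose date can satisfy the predicate."""
    return [name for name in catalog if name.split("=", 1)[1] >= min_dt]


def spectrum_scan(rows):
    """Steps 6 and 7: scan one partition, project 'page', aggregate partially."""
    return Counter(row["page"] for row in rows)


def run_query(catalog, min_dt):
    """Roughly: SELECT page, COUNT(*) ... WHERE dt >= min_dt GROUP BY page."""
    partitions = prune_partitions(catalog, min_dt)               # step 4
    partials = [spectrum_scan(catalog[p]) for p in partitions]   # steps 5 to 7
    final = Counter()
    for partial in partials:                                     # step 8
        final.update(partial)
    return dict(final), partitions                               # step 9


if __name__ == "__main__":
    result, scanned = run_query(CATALOG, "2018-01-02")
    print(result, scanned)
```

Pruning before scanning is what keeps the scanned data volume small, which matters given the slide's note that Spectrum charges by the amount of data scanned.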
  • 27. Recently Released Features
  • 28. Dense Compute Nodes (DC2): 2x performance at the same price as DC1; 3x more I/O with 30% better storage utilization than DC1. NVMe SSD; DDR4 memory; Intel E5-2686 v4 (Broadwell). “Amazon Redshift’s new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue.”
  • 29. Short Query Acceleration: Express Lane for Short Queries. Short queries do not get stuck behind long running queries; Higher throughput, less variability; Adapts to your workload; Transparent: it just works! (Chart: Average Wait Queue Time for Small Queries (<1 sec.))
  • 30. Short Query Acceleration: Express Lane for Short Queries. How it works: Machine learning predicts the runtime of queries; Short queries are routed to an express queue; Elastic SQA: resources dynamically dedicated to serve a burst of short queries; Enable it today on your AWS Management Console; Dynamic timeout based on your workload (coming soon). (Diagram labels: Analytics and BI / Dashboard tools, Amazon Redshift, Machine Learning Classifier.)
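The routing idea behind Short Query Acceleration can be sketched as follows. This is a deliberately simplified stand-in: the slides say only that machine learning predicts query runtime and that short queries go to an express queue. The `RuntimePredictor` class (a per-query-shape average of past runtimes), the one-second threshold, and the queue names are all assumptions for illustration.

```python
from collections import deque

SHORT_THRESHOLD_S = 1.0  # assumed cutoff for a "short" query, in seconds


class RuntimePredictor:
    """Toy replacement for the ML classifier: averages past runtimes per query shape."""

    def __init__(self, default_s=5.0):
        self.history = {}
        self.default_s = default_s  # prediction for shapes never seen before

    def observe(self, shape, runtime_s):
        """Record how long a query of this shape actually took."""
        self.history.setdefault(shape, []).append(runtime_s)

    def predict(self, shape):
        """Return the mean observed runtime, or the default for unseen shapes."""
        runs = self.history.get(shape)
        return sum(runs) / len(runs) if runs else self.default_s


def route(shape, predictor, express, default):
    """Append the query to the express queue if it is predicted to be short."""
    if predictor.predict(shape) < SHORT_THRESHOLD_S:
        express.append(shape)
        return "express"
    default.append(shape)
    return "default"


if __name__ == "__main__":
    predictor = RuntimePredictor()
    predictor.observe("dashboard_lookup", 0.2)
    predictor.observe("nightly_rollup", 120.0)
    express, default = deque(), deque()
    for shape in ["dashboard_lookup", "nightly_rollup", "never_seen"]:
        print(shape, "->", route(shape, predictor, express, default))
```

Sending unseen query shapes to the default queue is a conservative choice made for this sketch; the slides do not describe how unknown queries are handled.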
  • 31. Result-Set Caching: Sub-second repeat queries. How it works: (1) Queries go to the leader node. (2) If the cache contains the query result, it is returned with no processing. (3) If the query result is not in cache, it is executed, and the result is cached. Caching frees up the Amazon Redshift cluster, increasing performance for all queries. (Diagram labels: Analytics and BI / Dashboard tools, Amazon Redshift, Results cache mapping QUERY_ID to RESULT.)
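The three caching steps above map onto a short sketch. The `ResultCache` class and its use of the SQL text as the cache key are assumptions for illustration; the slide shows only a QUERY_ID to RESULT mapping. A real cache would also need to be invalidated when the underlying tables change, which this sketch leaves out.

```python
class ResultCache:
    """Toy leader-node cache: return a stored result or execute and store it."""

    def __init__(self, execute):
        self._execute = execute   # callable that actually runs the query
        self._results = {}        # cache key -> result
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        key = sql.strip()         # assumed key: the exact query text
        if key in self._results:  # step 2: cached result, no processing
            self.hits += 1
            return self._results[key]
        self.misses += 1          # step 3: execute, then cache the result
        result = self._execute(sql)
        self._results[key] = result
        return result


if __name__ == "__main__":
    executed = []

    def fake_execute(sql):
        """Hypothetical executor that records each call and returns a dummy row."""
        executed.append(sql)
        return [("row", len(executed))]

    cache = ResultCache(fake_execute)
    cache.query("SELECT COUNT(*) FROM sales")
    cache.query("SELECT COUNT(*) FROM sales")
    print(cache.hits, cache.misses, len(executed))
```

Because the repeat query never reaches the executor, cluster resources stay free for other work, which is the benefit the slide describes.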
  • 32. Commit Enhancements: 16% faster data ingestion and insertion; 40% faster data commits for busy clusters. (Chart: Total Commit Time by Month, Nov to Mar; ds2.8xlarge, cluster size 10 and up, us-west-2. Chart: Commit Duration Per Transaction for Busy Clusters, clusters with more than 90 backups a day; series p99, p95, p90, p50, and Linear (p99); labelled changes -50%, -30%, -30%, -20%.)
  • 33. Amazon Redshift is Self-Healing. Machine learning based prediction and remediation of degraded disks, nodes, clusters, and network issues. Ensures overall cluster and query performance
  • 34. Nested Data (coming soon). Analyze nested and semi-structured data in Amazon S3 with Spectrum; Allows easy ETL of nested data into Amazon Redshift using CTAS; Support for open file formats: Parquet, ORC, JSON, Ion, and Avro; Uses dot notation to extend your existing SQL. Sample data, s3data.clickStream: << { "session_time": "20171013 14:05:00", "clicks": [ {"page": "/home", "referrer": ""}, {"page": "/products", "referrer": "/home"} ] }, { "session_time": "20171013 14:06:00", "clicks": [ {"page": "/contact", "referrer": "/home"} ] } >> Example: find click frequency for links on "/home": SELECT c.page, COUNT(*) AS count FROM s3data.clickStream s, s.clicks c WHERE s.session_time > '2017-10-01 00:00:00' AND c.referrer = '/home' GROUP BY c.page;
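To make the semantics of the nested query concrete, here is a plain Python equivalent over the slide's sample data. It is a sketch of what the query computes, not of how Spectrum evaluates it; the `click_frequency` function name and the choice to parse `session_time` with `datetime.strptime` are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Sample data copied from the slide (s3data.clickStream).
SESSIONS = [
    {"session_time": "20171013 14:05:00",
     "clicks": [{"page": "/home", "referrer": ""},
                {"page": "/products", "referrer": "/home"}]},
    {"session_time": "20171013 14:06:00",
     "clicks": [{"page": "/contact", "referrer": "/home"}]},
]


def click_frequency(sessions, referrer, since):
    """Count clicks per page whose referrer matches, for sessions after `since`."""
    counts = Counter()
    for s in sessions:                                   # FROM s3data.clickStream s
        started = datetime.strptime(s["session_time"], "%Y%m%d %H:%M:%S")
        if started <= since:                             # WHERE s.session_time > ...
            continue
        for c in s["clicks"]:                            # , s.clicks c (unnest)
            if c["referrer"] == referrer:                # AND c.referrer = '/home'
                counts[c["page"]] += 1                   # GROUP BY c.page, COUNT(*)
    return dict(counts)


if __name__ == "__main__":
    print(click_frequency(SESSIONS, "/home", datetime(2017, 10, 1)))
```

The inner loop plays the role of `s.clicks c` in the FROM clause: each session row is expanded into one row per element of its nested `clicks` array before filtering and grouping.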
  • 35. Nested Data (coming soon): Improve query performance by analyzing nested data. (Diagram: an Orders table with columns OrderID, CustomerID, OrderTime, ShipMode; an OrderItems table with columns OrderID, ItemID, Quantity, Price and rows (5, 23, 10.00, 12.50), (8, 32, 1.00, 5.60), (5, 16, 1.00, 1.99), (8, 24, 5.00, 26.50); and an OrdersWithItems table in which the OrderItems rows appear as a nested column with ItemID, Quantity, Price.) To improve query performance, the new Orders table includes the OrdersWithItems as a nested column, eliminating join processing
  • 36. Summary
  • 37. Get Started with Amazon Redshift. Try Amazon Redshift; Get help with your Proof-of-Concept; Find out more: https://aws.amazon.com/redshift/ ; Read Amazon Redshift blog articles: https://aws.amazon.com/redshift/blog-posts/
  • 38. Thank you!