© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Modern Data Warehouse
SRV337
Greg Khairallah (gregkh@amazon.com)
Head of Business Development, Database, and Analytics, Amazon Web Services
Agenda
• Driving value from data
• Traditional analytics
• Data lakes extend the traditional approach
• Amazon Redshift & Amazon Redshift Spectrum
• Recently released features
Most Important: Driving Value from Data
Organizations that successfully generate business value from their data will outperform their peers. An Aberdeen survey found that organizations that implemented a data lake outperformed similar companies by 9% in organic revenue growth.*
Organic revenue growth: Leaders 24%, Followers 15%
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
Data Is Changing
Capture and store new data at PB-EB scale
Run new types of analytics in a cost-effective way:
• Machine learning
• Big data processing
• Real-time analytics
• Full-text search
Traditionally, Analytics Used to Look Like This
Diagram labels: OLTP, ERP, CRM, LOB; Data warehouse; Business intelligence
• Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc
• Large initial capex + $10K–$50K/TB/year
Data Lakes Extend the Traditional Approach
Diagram labels: OLTP, ERP, CRM, LOB; Devices, Web, Sensors, Social; Data warehouse (business intelligence); Data lake (big data processing, real-time, machine learning)
• Relational and non-relational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
Data Lakes from AWS
• Unmatched durability and availability at EB scale
• Best security, compliance, and audit capabilities
• Object-level controls for fine-grained access
• Fastest performance by retrieving subsets of data
• The most ways to bring data in
• 2x as many integrations with partners
• Analyze with broadest set of analytics & ML services
Diagram labels: Data lake on AWS; Analytics; Machine learning; Real-time data movement; On-premises data movement
AWS makes data management and analysis easy
Central Storage: Secure, cost-effective storage with life cycle policies
  Amazon S3
Data Ingestion: Get your data into Amazon S3 quickly and securely
  Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Snowball, AWS Database Migration Service
Catalog & Search: Access and search metadata
  AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
Processing & Analytics: Predictive and prescriptive analytics
  Amazon Athena, Amazon QuickSight, Amazon EMR, Amazon Redshift
Protect & Secure: Use entitlements to ensure data is secure and users’ identities are verified
  IAM, Amazon CloudWatch, AWS CloudTrail, AWS KMS
Archive: Extremely low-cost, durable storage
  Amazon Glacier, Glacier Select, and Vault Lock
Amazon Redshift
10x faster at 1/10th the cost
Fast: Delivers fast results for all types of workloads
Cost-effective: No upfront costs, start small, and pay as you go
Secure: Audit everything; encrypt data end-to-end; extensive certification and compliance
Integrated: Integrated with S3 data lakes, AWS services and third-party tools
Simple: Create and start using a data warehouse in minutes
Scalable: Gigabytes to petabytes to exabytes
Amazon Redshift Spectrum
Extend the data warehouse to your Amazon S3 data lake
Scale compute and storage separately
Join data across Amazon Redshift and Amazon S3
Exabyte-scale Redshift SQL queries against Amazon S3
Stable query performance and unlimited concurrency
Parquet, ORC, JSON, Grok, Avro, & CSV formats
Pay only for the amount of data scanned
Diagram labels: Redshift data; S3 data lake; Redshift Spectrum query engine
Data Lake Analytics with Amazon Redshift Spectrum
NUVIAD is a mobile marketing platform providing professional marketers, agencies, and local businesses with hyper-targeted analytics at petabyte scale
• Seamlessly analyzing open file formats directly in Amazon S3 to provide fresh, up-to-the-minute insights
• Unlimited analytics and query concurrency with Amazon Redshift
• Unlimited data capacity with Amazon S3
• 80% performance gain using Parquet data format
“Amazon Redshift Spectrum is a game changer for us. Reports that took minutes to produce are now delivered in seconds. We like the ability to scale compute on-demand to query Petabytes of data in Amazon S3 in various open file formats.”
– Rafi Ton, CEO, NUVIAD
Diagram labels: Data sources; Amazon S3; AWS Glue; Amazon Redshift; Redshift Spectrum; BI Tools
Life of a Spectrum Query
1. Query is submitted to Amazon Redshift over JDBC/ODBC:
   SELECT COUNT(*)
   FROM S3.EXT_TABLE
   GROUP BY…
2. Query is optimized and compiled at the leader node, which determines what gets run locally and what goes to Amazon Redshift Spectrum.
3. Query plan is sent to all compute nodes.
4. Compute nodes obtain partition info from the Data Catalog and dynamically prune partitions.
5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer.
6. Amazon Redshift Spectrum nodes scan your Amazon S3 data.
7. Amazon Redshift Spectrum projects, filters, joins, and aggregates.
8. Final aggregations and joins with local Amazon Redshift tables are done in-cluster.
9. Result is sent back to client.
Diagram labels: Amazon Redshift; nodes 1 2 3 4 ... N; Amazon S3 (exabyte-scale object storage); Data Catalog; Apache Hive Metastore
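The flow on this slide (prune partitions, scan and partially aggregate in the Spectrum layer, then finish the aggregation in the cluster) can be illustrated with a small in-memory sketch. This is only a toy model under stated assumptions: `S3_PARTITIONS`, `prune_partitions`, `spectrum_scan`, and `run_query` are hypothetical names, the data is invented, and nothing here calls Amazon Redshift or Amazon S3.

```python
from collections import Counter

# Hypothetical stand-in for a date-partitioned S3 table:
# partition value -> rows stored under that partition.
S3_PARTITIONS = {
    "2018-01-01": [{"page": "/home"}, {"page": "/products"}],
    "2018-01-02": [{"page": "/home"}, {"page": "/home"}],
    "2018-01-03": [{"page": "/contact"}],
}


def prune_partitions(partitions, min_date):
    """Partition pruning: keep only partitions that can satisfy the predicate."""
    return [p for p in sorted(partitions) if p >= min_date]


def spectrum_scan(partition):
    """Scan one partition and return a partial aggregate (count per page)."""
    return Counter(row["page"] for row in S3_PARTITIONS[partition])


def run_query(min_date):
    """Merge the partial aggregates into the final result for the client."""
    final = Counter()
    for partition in prune_partitions(S3_PARTITIONS, min_date):
        final.update(spectrum_scan(partition))
    return dict(final)


result = run_query("2018-01-02")
print(result)
```

Only the partitions that survive pruning are scanned, which is why a partition-friendly predicate reduces the amount of data read.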
Multiple Redshift Clusters Scaling Performance
Diagram labels: multiple Amazon Redshift clusters; nodes 1 2 3 4 ... N; Amazon S3 (exabyte-scale object storage); AWS Glue Data Catalog; Apache Hive Metastore
Amazon Redshift Spectrum – Current Support
File formats
• Avro, CSV, Grok, Ion,
JSON, ORC, Parquet,
RCFile, RegexSerDe,
SequenceFile, TextFile,
and TSV
Compression
• Gzip
• Snappy
• Bz2
• Brotli
Encryption
• SSE with AES256
• SSE KMS with default
key
Column types
• Numeric: bigint, int, smallint, float, double
and decimal
• Char/varchar/string
• Timestamp
• Boolean
• DATE type can be used only as a
partitioning key
Table type
• Non-partitioned table
(s3://mybucket/orders/..)
• Partitioned table
(s3://mybucket/orders/date=YYYY-MM-DD/..)
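As a rough illustration of the partitioned layout above, the following sketch builds a `date=YYYY-MM-DD` prefix and reads the partition value back from an object key. The helper names and the file name `part-0000.parquet` are hypothetical; only the `s3://mybucket/orders/date=.../` pattern comes from the slide.

```python
from datetime import date


def partition_prefix(base, day):
    """Build the prefix for one date partition under the table location."""
    return f"{base.rstrip('/')}/date={day.isoformat()}/"


def partition_value(key):
    """Return the date encoded in a key's date= segment, or None if absent."""
    for part in key.split("/"):
        if part.startswith("date="):
            return date.fromisoformat(part[len("date="):])
    return None


prefix = partition_prefix("s3://mybucket/orders", date(2018, 7, 26))
print(prefix)
print(partition_value(prefix + "part-0000.parquet"))
```

Encoding the partition key in the path is what lets a query engine skip whole prefixes without opening any files.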
Recently Released Features
Dense Compute Nodes (DC2)
2x performance as DC1 at the same price
3x more I/O with
Upgrade at no cost
30% better storage utilization
than DC1
“Amazon Redshift’s new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue.”
NVMe SSD DDR4 memory
Intel E5-2686 v4 (Broadwell)
Short Query Acceleration
Express Lane for Short Queries
• Machine learning predicts the
runtime of queries
• Short queries are routed to an
express queue
• Resources are dynamically
dedicated to short queries
• Enable it today from your
AWS Management Console
How it works (diagram labels): Analytics and BI / dashboard tools; Amazon Redshift; Machine learning classifier
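The routing idea on this slide can be sketched as follows. This is a toy model, not how Amazon Redshift implements it: `predict_runtime_s` is a hand-written heuristic standing in for the machine learning classifier, and `SHORT_QUERY_THRESHOLD_S` is a hypothetical cutoff, not a Redshift setting.

```python
SHORT_QUERY_THRESHOLD_S = 5.0  # hypothetical cutoff for "short"


def predict_runtime_s(query):
    """Toy stand-in for the classifier: estimate runtime from the query shape."""
    text = query.upper()
    estimate = 1.0
    estimate += 10.0 * text.count(" JOIN ")
    estimate += 5.0 if "GROUP BY" in text else 0.0
    return estimate


def route(queries):
    """Send predicted-short queries to the express queue, the rest to the default queue."""
    express, default = [], []
    for query in queries:
        if predict_runtime_s(query) <= SHORT_QUERY_THRESHOLD_S:
            express.append(query)
        else:
            default.append(query)
    return express, default


express, default = route([
    "SELECT COUNT(*) FROM orders",
    "SELECT * FROM orders o JOIN items i ON o.id = i.order_id",
    "SELECT page, COUNT(*) FROM clicks GROUP BY page",
])
print(express)
print(default)
```

The point of the sketch is the separation of concerns: a predictor estimates runtime before execution, and the router uses that estimate so short queries do not wait behind long ones.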
Result-Set Caching
Subsecond repeat queries
• Amazon Redshift customers can now serve 35% more queries on average,
using the same compute resources
• Tens of thousands of compute hours are freed up daily to serve the
remaining queries and data ingestion
• Transparent – it just works!
“With Amazon Redshift result caching, 20 percent of our
queries now complete in less than one second,” said
Greg Rokita, Executive Director for Technology, Edmunds
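A minimal sketch of the idea behind result-set caching, assuming a cache keyed by whitespace-normalized query text that is cleared when the underlying data changes. The `ResultCache` class and its executor are hypothetical illustrations; the slide does not describe how Amazon Redshift keys or invalidates its cache.

```python
class ResultCache:
    """Toy result-set cache keyed by normalized query text."""

    def __init__(self, execute):
        self._execute = execute  # function that actually runs a query
        self._results = {}
        self.executions = 0      # how many times the executor was called

    @staticmethod
    def _key(query):
        # Collapse whitespace so trivially different texts share one entry.
        return " ".join(query.split())

    def run(self, query):
        key = self._key(query)
        if key not in self._results:
            self.executions += 1
            self._results[key] = self._execute(query)
        return self._results[key]

    def invalidate(self):
        """Drop cached results, for example after the underlying data changes."""
        self._results.clear()


cache = ResultCache(execute=lambda query: 42)  # hypothetical executor
cache.run("SELECT COUNT(*) FROM orders")
cache.run("SELECT COUNT(*)   FROM orders")     # repeat query, served from cache
print(cache.executions)
```

A repeat query costs a dictionary lookup instead of a full execution, which is where the freed compute in the bullets above would come from.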
Commit Enhancements
50% faster data commits for busy clusters
16% faster data ingestion and insertion
Chart: Commit Duration Per Transaction for Busy Clusters (total commit time by month, Nov to Mar; series p99, p95, p90, p50, and a linear trend for p99)
Sample: ds2.8xlarge, cluster size 10 and up, us-west-2, clusters with more than 90 backups a day
Query Performance Improvements
• Faster hash joins
• Improvements to hash algorithm (Jan '18)
• Significant improvement in memory utilization (Feb '18)
• Cache line prefetching to improve join performance (Mar '18)
• Join-intensive workloads like TPC-H and TPC-DS show a performance improvement
ranging from 28% to 2x for several queries
• 64x reduction of memory footprint fleet wide for hash joins and aggregations.
Significant improvement to overall throughput
• Read and write queries can now hop WLM queues
Amazon Redshift Advisor
Advisor provides
automated
recommendations to help
you optimize database
performance and
decrease operating costs.
Shows up to seven
recommendations to help
you optimize your cluster.
Available via the
Amazon Redshift console
at no charge.
New Amazon CloudWatch metrics for easy visualization of
cluster performance
• Monitor the performance and
health of your Amazon Redshift
cluster with two new Amazon
CloudWatch metrics, Query
Throughput and Query Duration.
• Query Throughput measures the
average number of queries
completed per second. Query
Duration measures the average
time taken to complete a query.
By observing these metrics, you
can easily determine how your
cluster is performing at any time.
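The two definitions above reduce to simple arithmetic. The sketch below computes both from a list of query durations observed in one window; the function name, the sample durations, and the 60-second window are assumptions for illustration, and the code does not call Amazon CloudWatch.

```python
def query_metrics(durations_s, window_s):
    """Return (queries completed per second, average duration in seconds)."""
    if not durations_s:
        return 0.0, 0.0
    throughput = len(durations_s) / window_s
    avg_duration = sum(durations_s) / len(durations_s)
    return throughput, avg_duration


# Hypothetical sample: four queries completed within a 60-second window.
throughput, avg_duration = query_metrics([0.5, 1.5, 2.0, 4.0], window_s=60)
print(throughput, avg_duration)
```

Watching the two together is useful: falling throughput with rising duration suggests the cluster is saturated, while falling throughput with flat duration suggests less incoming work.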
Redshift Spectrum Enhancements
• Available in 14 AWS Regions
• Added support for processing scalar JSON and ION file formats in S3
• In addition to Parquet, ORC, Avro, CSV, Grok, RCFile, RegexSerDe,
OpenCSV, SequenceFile, TextFile, and TSV
• Support for DATE data type
• Support for IAM role-chaining to assume cross-account roles
• COPY from Parquet, ORC
Coming Soon: Nested Data Support
• Analyze nested and semi-structured data in Amazon S3 with Spectrum
• Allows easy ETL of nested data into Redshift using CTAS
• Support for open file formats: Parquet, ORC, JSON and Ion
• Uses dot notation to extend your existing SQL
Example: Find click frequency for links on "/home":
s3data.clickStream: <<
{ "session_time": "20171013 14:05:00",
  "clicks": [ {"page": "/home", "referrer": ""},
              {"page": "/products", "referrer": "/home"} ]
},
{ "session_time": "20171013 14:06:00",
  "clicks": [ {"page": "/contact", "referrer": "/home"} ]
} >>
SELECT c.page,
       COUNT(*) AS count
FROM s3data.clickStream s,
     s.clicks c
WHERE s.session_time > '2017-10-01 00:00:00'
  AND c.referrer = '/home'
GROUP BY c.page;
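To make the semantics of the nested query concrete, here is a plain-Python equivalent over the two sample records from the slide. It is a sketch of what the query computes, not of how Redshift Spectrum executes it; parsing `session_time` with the `%Y%m%d %H:%M:%S` format is an assumption based on the sample values.

```python
from collections import Counter
from datetime import datetime

# Sample records from the slide, as plain Python structures.
click_stream = [
    {"session_time": "20171013 14:05:00",
     "clicks": [{"page": "/home", "referrer": ""},
                {"page": "/products", "referrer": "/home"}]},
    {"session_time": "20171013 14:06:00",
     "clicks": [{"page": "/contact", "referrer": "/home"}]},
]

cutoff = datetime(2017, 10, 1)

# "FROM clickStream s, s.clicks c" yields one row per (session, click) pair;
# the two conditions mirror the WHERE clause, and Counter does the GROUP BY.
counts = Counter(
    c["page"]
    for s in click_stream
    if datetime.strptime(s["session_time"], "%Y%m%d %H:%M:%S") > cutoff
    for c in s["clicks"]
    if c["referrer"] == "/home"
)
print(dict(counts))
```

The nested `for` clauses play the role of the dot notation: the inner collection is unnested relative to its parent row, so no separate join key is needed.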
Coming Soon: Nested Data Support
Improve query performance by analyzing nested data
Before: two tables joined at query time

Orders
OrderID  CustomerID  OrderTime  ShipMode
5        23          10.00      12.50
8        32          1.00       5.60

OrderItems
OrderID  ItemID  Quantity  Price
5        23      10.00     12.50
8        32      1.00      5.60
5        16      1.00      1.99
8        24      5.00      26.50

After: one table with a nested column

OrdersWithItems
OrderID  CustomerID  OrderTime  ShipMode  OrderItems (ItemID, Quantity, Price)
5        23          10.00      12.50     (23, 10.00, 12.50), (16, 1.00, 1.99)
8        32          1.00       5.60      (32, 1.00, 5.60), (24, 5.00, 26.50)

To improve query performance, the new OrdersWithItems table includes OrderItems as a nested column, eliminating join processing.
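The difference between the two layouts can be sketched in a few lines. The example computes a per-order total (a hypothetical query, not one from the slide) first by matching orders against the flat OrderItems rows and then by reading the items nested inside each order. Item values are taken from the OrderItems table on the slide; the function names are illustrative.

```python
from collections import defaultdict
from decimal import Decimal as D

# Flat layout: OrderItems rows as (OrderID, ItemID, Quantity, Price).
order_items = [
    (5, 23, D("10.00"), D("12.50")),
    (8, 32, D("1.00"), D("5.60")),
    (5, 16, D("1.00"), D("1.99")),
    (8, 24, D("5.00"), D("26.50")),
]
order_ids = [5, 8]


def totals_with_join(order_ids, order_items):
    """Match every order against the items table at query time."""
    totals = defaultdict(D)
    for order_id in order_ids:
        for item_order_id, _item_id, qty, price in order_items:
            if item_order_id == order_id:
                totals[order_id] += qty * price
    return dict(totals)


# Nested layout: each order carries its own items, so no matching step is needed.
orders_with_items = {
    5: [(23, D("10.00"), D("12.50")), (16, D("1.00"), D("1.99"))],
    8: [(32, D("1.00"), D("5.60")), (24, D("5.00"), D("26.50"))],
}


def totals_nested(orders_with_items):
    """Sum quantity * price over the items stored inside each order."""
    return {
        order_id: sum((qty * price for _item_id, qty, price in items), D(0))
        for order_id, items in orders_with_items.items()
    }


print(totals_with_join(order_ids, order_items))
print(totals_nested(orders_with_items))
```

Both functions return the same totals; the nested version simply never has to search the items for a matching order, which is the join work the slide says is eliminated.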
AWS Big Data Blog—Amazon Redshift
Amazon Redshift Engineering’s Advanced Table Design Playbook
https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
- Zach Christopherson
Top 10 Performance Tuning Techniques for Amazon Redshift
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/
- Ian Meyers and Zach Christopherson
10 Best Practices for Amazon Redshift Spectrum
https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
- Po Hong and Peter Dalton
Get Started with Amazon Redshift
Find out more: https://aws.amazon.com/redshift/
Try Amazon Redshift
Get help with your Proof-of-Concept
Read Amazon Redshift blog articles:
https://aws.amazon.com/redshift/blog-posts/
Submit Session Feedback
1. Tap the Schedule icon.
2. Select the session you attended.
3. Tap Session Evaluation to submit your
feedback.
Thank you!

API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chicago AWS Summit

  • 1. Building a Modern Data Warehouse (SRV337). Greg Khairallah (gregkh@amazon.com), Head of Business Development, Database, and Analytics, Amazon Web Services. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. Agenda
      • Driving value from data
      • Traditional analytics
      • Data lakes extend the traditional approach
      • Amazon Redshift & Amazon Redshift Spectrum
      • Recently released features
  • 3. Most Important: Driving Value from Data. Organizations that successfully generate business value from their data will outperform their peers. An Aberdeen survey saw organizations that implemented a data lake outperforming similar companies by 9% in organic revenue growth.* Chart, organic revenue growth: Leaders 24%, Followers 15%. *Aberdeen: Angling for Insight in Today's Data Lake, Michael Lock, SVP Analytics and Business Intelligence.
  • 4. Data Is Changing. Capture and store new data at PB-EB scale. Do new types of analytics in a cost-effective way. New types of analytics:
      • Machine learning
      • Big data processing
      • Real-time analytics
      • Full-text search
  • 5. Traditionally, Analytics Used to Look Like This. Diagram: OLTP, ERP, CRM, and LOB sources feed a data warehouse, which serves business intelligence.
      • Relational data
      • TBs–PBs scale
      • Schema defined prior to data load
      • Operational reporting and ad hoc
      • Large initial capex + $10K–$50K/TB/year
  • 6. Data Lakes Extend the Traditional Approach. Diagram: OLTP, ERP, CRM, and LOB sources feed the data warehouse and business intelligence; devices, web, sensors, and social sources feed a data lake used for big data processing, real-time, and machine learning.
      • Relational and non-relational data
      • TBs–EBs scale
      • Diverse analytical engines
      • Low-cost storage & analytics
  • 7. Data Lakes from AWS. Diagram labels: Analytics, Machine learning, Real-time data movement, On-premises data movement, Data lake on AWS.
      • Unmatched durability and availability at EB scale
      • Best security, compliance, and audit capabilities
      • Object-level controls for fine-grain access
      • Fastest performance by retrieving subsets of data
      • The most ways to bring data in
      • 2x as many integrations with partners
      • Analyze with broadest set of analytics & ML services
  • 8. AWS makes data management and analysis easy
      • Central Storage: Secure, cost-effective storage with life cycle policies. Amazon S3
      • Data Ingestion: Get your data into Amazon S3 quickly and securely. Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Snowball, AWS Database Migration Service
      • Catalog & Search: Access and search metadata. AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
      • Processing & Analytics: Predictive and prescriptive analytics. Amazon Athena, Amazon QuickSight, Amazon EMR, Amazon Redshift
      • Protect & Secure: Use entitlements to ensure data is secure and users' identities are verified. IAM, Amazon CloudWatch, AWS CloudTrail, AWS KMS
      • Archive: Extremely low-cost, durable storage. Amazon Glacier, Glacier Select, and Vault Lock
  • 9. Amazon Redshift: 10x faster at 1/10th the cost
      • Fast: Delivers fast results for all types of workloads
      • Cost-effective: No upfront costs, start small, and pay as you go
      • Integrated: Integrated with S3 data lakes, AWS services, and third-party tools
      • Secure: Audit everything; encrypt data end-to-end; extensive certification and compliance
      • Simple: Create and start using a data warehouse in minutes
      • Scalable: Gigabytes to petabytes to exabytes
  • 10. Amazon Redshift Spectrum: Extend the data warehouse to your Amazon S3 data lake. Diagram: Redshift data and the S3 data lake, connected by the Redshift Spectrum query engine.
      • Scale compute and storage separately
      • Join data across Amazon Redshift and Amazon S3
      • Exabyte-scale Redshift SQL queries against Amazon S3
      • Stable query performance and unlimited concurrency
      • Parquet, ORC, JSON, Grok, Avro, & CSV formats
      • Pay only for the amount of data scanned
  • 11. Data Lake Analytics with Amazon Redshift Spectrum. NUVIAD is a mobile marketing platform providing professional marketers, agencies, and local businesses with hyper-targeted analytics at petabyte scale. "Amazon Redshift Spectrum is a game changer for us. Reports that took minutes to produce are now delivered in seconds. We like the ability to scale compute on-demand to query Petabytes of data in Amazon S3 in various open file formats." (Rafi Ton, CEO, NUVIAD). Diagram labels: Data sources, Amazon S3, AWS Glue, Amazon Redshift, Redshift Spectrum, BI Tools.
      • Seamlessly analyzing open file formats directly in Amazon S3 to provide fresh, up-to-the-minute insights
      • Unlimited analytics and query concurrency with Amazon Redshift
      • Unlimited data capacity with Amazon S3
      • 80% performance gain using Parquet data format
  • 12. Life of a Spectrum Query. Diagram: a client connects to Amazon Redshift over JDBC/ODBC; the cluster calls Spectrum nodes (1, 2, 3, 4, ... N) that read Amazon S3 (exabyte-scale object storage), using the Data Catalog or an Apache Hive Metastore.
      1. Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
      2. Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum.
      3. Query plan is sent to all compute nodes.
      4. Compute nodes obtain partition info from Data Catalog; dynamically prune partitions.
      5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer.
      6. Amazon Redshift Spectrum nodes scan your Amazon S3 data.
      7. Amazon Redshift Spectrum projects, filters, joins, and aggregates.
      8. Final aggregations and joins with local Amazon Redshift tables done in-cluster.
      9. Result is sent back to client.
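The flow on this slide (prune partitions, scan and partially aggregate in the Spectrum layer, then finish the aggregation in the cluster) can be illustrated with a toy model. This is only a sketch under stated assumptions: the in-memory `S3_PARTITIONS` dictionary, the `date=` partition key, and all three function names are hypothetical stand-ins, not Amazon Redshift APIs or its actual implementation.

```python
from collections import Counter

# Hypothetical data lake: partition key -> rows stored under that S3 prefix.
S3_PARTITIONS = {
    "date=2018-01-01": [{"page": "/home"}, {"page": "/products"}],
    "date=2018-01-02": [{"page": "/home"}, {"page": "/home"}],
    "date=2018-01-03": [{"page": "/contact"}],
}


def prune_partitions(partitions, min_date):
    """Keep only partitions whose date value satisfies the predicate."""
    return [p for p in partitions if p.split("=", 1)[1] >= min_date]


def spectrum_scan(partition):
    """Scan one partition and return a partial aggregate (count per page)."""
    return Counter(row["page"] for row in S3_PARTITIONS[partition])


def run_query(min_date):
    """Merge the partial aggregates, as the cluster does for the final result."""
    total = Counter()
    for partition in prune_partitions(sorted(S3_PARTITIONS), min_date):
        total += spectrum_scan(partition)
    return dict(total)


print(run_query("2018-01-02"))
```

Because pruning happens before any scan, the partition for 2018-01-01 is never read in this run, which is the point of the "pay only for the amount of data scanned" model described earlier in the deck.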
  • 13. Multiple Redshift Clusters: Scaling Performance. Diagram: several Amazon Redshift clusters share Spectrum nodes (1, 2, 3, 4, ... N) that read Amazon S3 (exabyte-scale object storage), using the AWS Glue Data Catalog or an Apache Hive Metastore.
  • 14. Amazon Redshift Spectrum: Current Support
      • File formats: Avro, CSV, Grok, Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV
      • Compression: Gzip, Snappy, Bz2, Brotli
      • Encryption: SSE with AES256; SSE KMS with default key
      • Column types: Numeric (bigint, int, smallint, float, double, and decimal); Char/varchar/string; Timestamp; Boolean; DATE type can be used only as a partitioning key
      • Table type: Non-partitioned table (s3://mybucket/orders/..); Partitioned table (s3://mybucket/orders/date=YYYY-MM-DD/..)
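The partitioned table layout on this slide encodes the partition value in the object path as a `key=value` segment. A minimal sketch of reading that value back out of a path follows; the function name `parse_partitions` and the file name `part-0000.parquet` are assumptions made for illustration, not part of any AWS API.

```python
def parse_partitions(s3_key):
    """Return the key=value partition segments found in an object path."""
    path = s3_key.split("://", 1)[-1]   # strip the s3:// scheme if present
    segments = path.split("/")[1:-1]    # drop the bucket and the file name
    return dict(seg.split("=", 1) for seg in segments if "=" in seg)


# Partitioned layout: the date is recoverable from the path alone.
print(parse_partitions("s3://mybucket/orders/date=2018-03-01/part-0000.parquet"))

# Non-partitioned layout: no partition segments are present.
print(parse_partitions("s3://mybucket/orders/part-0000.parquet"))
```

A query that filters on the partition column can use this path information to skip whole prefixes without opening any files.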
  • 15. Recently Released Features
  • 16. Dense Compute Nodes (DC2). Hardware: NVMe SSD, DDR4 memory, Intel E5-2686 v4 (Broadwell). "Amazon Redshift's new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue."
      • 2x performance as DC1 at the same price
      • 3x more I/O with 30% better storage utilization than DC1
      • Upgrade at no cost
  • 17. Short Query Acceleration: Express Lane for Short Queries. Diagram: analytics and BI / dashboard tools send queries to Amazon Redshift, where a machine learning classifier routes them. How it works:
      • Machine learning predicts the runtime of queries
      • Short queries are routed to an express queue
      • Resources are dynamically dedicated to short queries
      • Enable it today from your AWS Management Console
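The routing idea on this slide (predict a runtime, then send short queries to an express queue) can be sketched as follows. This is an illustration only: `predict_runtime` is a crude hand-written heuristic standing in for the machine learning classifier, and the `SHORT_QUERY_SECONDS` cutoff is a hypothetical value, not an Amazon Redshift setting.

```python
from collections import deque

SHORT_QUERY_SECONDS = 5.0  # hypothetical cutoff, not a Redshift parameter


def predict_runtime(query):
    """Stand-in for the ML classifier: guess a runtime from query text."""
    text = query.upper()
    estimate = 1.0
    estimate += 10.0 * text.count(" JOIN ")  # joins assumed to be expensive
    if " LIMIT " not in text:
        estimate += 3.0                       # unbounded results assumed slower
    return estimate


def route(queries):
    """Place each query in the express queue or the default queue."""
    express, default = deque(), deque()
    for q in queries:
        target = express if predict_runtime(q) <= SHORT_QUERY_SECONDS else default
        target.append(q)
    return list(express), list(default)


express, default = route([
    "SELECT * FROM sales LIMIT 10",
    "SELECT a.x FROM a JOIN b ON a.id = b.id",
])
print(express)
print(default)
```

Keeping the two queues separate is what lets a dashboard lookup finish without waiting behind a long-running join.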
  • 18. Result-Set Caching: Subsecond repeat queries. "With Amazon Redshift result caching, 20 percent of our queries now complete in less than one second," said Greg Rokita, Executive Director for Technology, Edmunds.
      • Amazon Redshift customers can now serve 35% more queries on average, using the same compute resources
      • Tens of thousands of compute hours are freed up daily to serve the remaining queries and data ingestion
      • Transparent: it just works!
  • 19. Commit Enhancements. Charts: Commit Duration Per Transaction for Busy Clusters, and Total Commit Time by Month (Nov, Jan, Mar); series p50, p90, p95, and p99, with a linear trend line for p99; scope: ds2.8xlarge, cluster size 10 and up, us-west-2, clusters with more than 90 backups a day.
      • 50% faster data commits for busy clusters
      • 16% faster data ingestion and insertion
  • 20. Query Performance Improvements
      • Faster hash joins: improvements to hash algorithm (Jan '18); significant improvement in memory utilization (Feb '18); cache line prefetching to improve join performance (Mar '18)
      • Join-intensive workloads like TPC-H and TPC-DS show a performance improvement ranging from 28% to 2x for several queries
      • 64x reduction of memory footprint fleet wide for hash joins and aggregations. Significant improvement to overall throughput
      • Read and write queries can now hop WLM queues
  • 21. Amazon Redshift Advisor. Advisor provides automated recommendations to help you optimize database performance and decrease operating costs. Shows up to seven recommendations to help you optimize your cluster. Available via the Amazon Redshift console at no charge.
  • 22. New Amazon CloudWatch metrics for easy visualization of cluster performance
      • Monitor the performance and health of your Amazon Redshift cluster with two new Amazon CloudWatch metrics, Query Throughput and Query Duration.
      • Query Throughput measures the average number of queries completed per second. Query Duration measures the average time taken to complete a query. By observing these metrics, you can easily determine how your cluster is performing at any time.
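The two definitions on this slide reduce to simple arithmetic over completed queries. The sketch below computes them from a list of start and end times; the function name `query_metrics`, the tuple layout, and the sample values are assumptions for illustration and do not reflect how CloudWatch collects or aggregates the metrics.

```python
def query_metrics(queries, window_seconds):
    """Return (throughput, duration) for completed queries in a time window.

    queries: list of (start, end) times in seconds for completed queries.
    throughput: queries completed per second over the window.
    duration: average seconds taken to complete a query.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    completed = len(queries)
    throughput = completed / window_seconds
    if completed == 0:
        return throughput, 0.0
    duration = sum(end - start for start, end in queries) / completed
    return throughput, duration


# Three hypothetical queries completed within a 60-second window.
throughput, duration = query_metrics([(0, 2), (1, 5), (10, 13)], 60)
print(throughput, duration)
```

Read together, the two numbers separate "the cluster is busy" (high throughput, steady duration) from "the cluster is struggling" (falling throughput, rising duration).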
  • 23. Redshift Spectrum Enhancements
      • Available in 14 AWS Regions
      • Added support for processing scalar JSON and Ion file formats in S3, in addition to Parquet, ORC, Avro, CSV, Grok, RCFile, RegexSerDe, OpenCSV, SequenceFile, TextFile, and TSV
      • Support for DATE data type
      • Support for IAM role-chaining to assume cross-account roles
      • COPY from Parquet, ORC
  • 24. Coming Soon: Nested Data Support
      • Analyze nested and semi-structured data in Amazon S3 with Spectrum
      • Allows easy ETL of nested data into Redshift using CTAS
      • Support for open file formats: Parquet, ORC, JSON, and Ion
      • Uses dot notation to extend your existing SQL
    Sample data, s3data.clickStream:
      << { "session_time": "20171013 14:05:00",
           "clicks": [ {"page": "/home", "referrer": ""},
                       {"page": "/products", "referrer": "/home"} ] },
         { "session_time": "20171013 14:06:00",
           "clicks": [ {"page": "/contact", "referrer": "/home"} ] } >>
    Example: Find click frequency for links on "/home":
      SELECT c.page, COUNT(*) AS count
      FROM s3data.clickStream s, s.clicks c
      WHERE s.session_time > '2017-10-01 00:00:00'
        AND c.referrer = '/home'
      GROUP BY c.page;
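What the example query on this slide computes can be traced in plain Python over the same sample records: each session's `clicks` array is unnested, filtered on `referrer`, and counted by `page`. This is a sketch of the query's semantics only, not of how Spectrum executes it; the function name `clicks_from` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# The two sample sessions shown on the slide.
click_stream = [
    {"session_time": "20171013 14:05:00",
     "clicks": [{"page": "/home", "referrer": ""},
                {"page": "/products", "referrer": "/home"}]},
    {"session_time": "20171013 14:06:00",
     "clicks": [{"page": "/contact", "referrer": "/home"}]},
]


def clicks_from(sessions, referrer, since):
    """Count clicks per page for the given referrer in sessions after `since`."""
    counts = Counter()
    for s in sessions:
        session_time = datetime.strptime(s["session_time"], "%Y%m%d %H:%M:%S")
        if session_time <= since:
            continue                      # WHERE s.session_time > since
        for c in s["clicks"]:             # FROM ... s, s.clicks c (unnest)
            if c["referrer"] == referrer:
                counts[c["page"]] += 1    # GROUP BY c.page, COUNT(*)
    return dict(counts)


print(clicks_from(click_stream, "/home", datetime(2017, 10, 1)))
# prints {'/products': 1, '/contact': 1}
```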
  • 25. Coming Soon: Nested Data Support. Improve query performance by analyzing nested data. To improve query performance, the new Orders table includes the OrdersWithItems as a nested column, eliminating join processing.
    Orders:
      OrderID  CustomerID  OrderTime  ShipMode
      5        23          10.00      12.50
      8        32          1.00       5.60
    OrderItems:
      OrderID  ItemID  Quantity  Price
      5        23      10.00     12.50
      8        32      1.00      5.60
      5        16      1.00      1.99
      8        24      5.00      26.50
    OrdersWithItems (Orders columns plus a nested OrderItems column):
      OrderID  CustomerID  OrderTime  ShipMode
      5        23          10.00      12.50
      8        32          1.00       5.60
      Nested OrderItems:
        ItemID  Quantity  Price
        23      10.00     12.50
        16      1.00      1.99
        32      1.00      5.60
        24      5.00      26.50
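The trade-off on this slide can be shown with the OrderItems values above: a flat layout needs a join at query time, while a nested layout stores each order's items with the order so the same total can be read in one pass. The sketch below is an assumption-laden illustration; the dictionary layout and the function names `totals_with_join`, `nest`, and `totals_nested` are hypothetical, and it says nothing about how Redshift stores nested columns.

```python
from decimal import Decimal as D

orders = [{"order_id": 5}, {"order_id": 8}]
order_items = [
    {"order_id": 5, "item_id": 23, "quantity": D("10.00"), "price": D("12.50")},
    {"order_id": 8, "item_id": 32, "quantity": D("1.00"), "price": D("5.60")},
    {"order_id": 5, "item_id": 16, "quantity": D("1.00"), "price": D("1.99")},
    {"order_id": 8, "item_id": 24, "quantity": D("5.00"), "price": D("26.50")},
]


def totals_with_join(orders, items):
    """Flat layout: match every order against every item at query time."""
    totals = {}
    for o in orders:
        for i in items:
            if i["order_id"] == o["order_id"]:
                key = o["order_id"]
                totals[key] = totals.get(key, D(0)) + i["quantity"] * i["price"]
    return totals


def nest(orders, items):
    """One-time ETL step: embed each order's items in the order record."""
    return [{**o, "items": [i for i in items if i["order_id"] == o["order_id"]]}
            for o in orders]


def totals_nested(nested_orders):
    """Nested layout: each order already carries its items, so no join."""
    return {o["order_id"]: sum((i["quantity"] * i["price"] for i in o["items"]), D(0))
            for o in nested_orders}


print(totals_with_join(orders, order_items))
print(totals_nested(nest(orders, order_items)))
```

Both functions return the same totals; the difference is that the matching work in `nest` is paid once at load time instead of on every query.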
  • 26. AWS Big Data Blog: Amazon Redshift
      • Amazon Redshift Engineering's Advanced Table Design Playbook, by Zach Christopherson: https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
      • Top 10 Performance Tuning Techniques for Amazon Redshift, by Ian Meyers and Zach Christopherson: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/
      • 10 Best Practices for Amazon Redshift Spectrum, by Po Hong and Peter Dalton: https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
  • 27. Get Started with Amazon Redshift
      • Find out more: https://aws.amazon.com/redshift/
      • Try Amazon Redshift
      • Get help with your Proof-of-Concept
      • Read Amazon Redshift blog articles: https://aws.amazon.com/redshift/blog-posts/
  • 28. Submit Session Feedback: 1. Tap the Schedule icon. 2. Select the session you attended. 3. Tap Session Evaluation to submit your feedback.
  • 29. Thank you!