SlideShare una empresa de Scribd logo
1 de 36
Descargar para leer sin conexión
Deep Dive: Amazon Redshift
Julien Simon"
Principal Technical Evangelist
julsimon@amazon.fr
@julsimon
Agenda
•  Architecture overview
•  Optimization
•  Schema / table design
•  Ingestion
•  Maintenance and database tuning
Architecture overview
Relational data warehouse"
Massively parallel; petabyte scale"
Fully managed"
HDD and SSD platforms"
$1,000/TB/year; starts at $0.25/hour
Amazon "
Redshift
a lot faster
a lot simpler
a lot cheaper
Selected Amazon Redshift customers
Amazon Redshift architecture
Leader node
•  SQL endpoint
•  Stores metadata
•  Coordinates query execution
Compute nodes
•  Local, columnar storage
•  Executes queries in parallel
•  Load, backup, restore via "
Amazon S3; load from "
Amazon DynamoDB, Amazon EMR, or SSH
Two hardware platforms
•  Optimized for data processing
•  DS2: HDD; scale from 2 TB to 2 PB
•  DC1: SSD; scale from 160 GB to 326 TB
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
S3/EMR/DynamoDB/SSH
JDBC/ODBC
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Leader
Node
A deeper look at compute node architecture
Each node contains multiple slices
•  DS2 – 2 slices on XL, 16 on 8 XL
•  DC1 – 2 slices on L, 32 on 8 XL
Each slice is allocated CPU and
table data
Each slice processes a piece of the
workload in parallel


https://docs.aws.amazon.com/fr_fr/redshift/latest/dg/c_high_level_system_architecture.html
The best way to look through large
amounts of data is to NOT look
through large amounts of data
Amazon Redshift dramatically reduces I/O
Data compression

Zone maps



ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
Calculating SUM(Amount) with "
row storage:
–  Need to read everything
–  Unnecessary I/O
ID Age State Amount
Amazon Redshift dramatically reduces I/O
Data compression

Zone maps



ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
Calculating SUM(Amount) with
column storage:
–  Only scan the necessary
blocks
ID Age State Amount
Amazon Redshift dramatically reduces I/O
Column storage
Data compression

Zone maps



 Columnar compression
–  Effective thanks to similar data 
–  Reduces storage requirements
–  Reduces I/O 
ID Age State Amount
analyze compression orders;
Table | Column | Encoding
--------+-------------+----------
orders | id | mostly32
orders | age | mostly32
orders | state | lzo
orders | amount | mostly32
https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html
https://docs.aws.amazon.com/fr_fr/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html 
Column storage: less I/O, more compression
Amazon Redshift dramatically reduces I/O
Column storage
Data compression

Zone maps



•  In-memory block metadata
•  Contains per-block MIN and MAX value
•  Effectively ignores blocks that don’t
contain data for a given query
•  Minimizes unnecessary I/O
ID Age State Amount
•  Zone maps work best
with sort keys J
Optimization: schema design"
"
Distribution"
Sorting"
Compression
Keep Your Columns as Narrow as Possible
•  Buffers allocated based on declared
column width
•  Wider than needed columns mean
memory is wasted
•  Fewer rows fit into memory; increased
likelihood of queries spilling to disk
•  Check
SVV_TABLE_INFO(max_varchar)
https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html
Goals of distribution
•  Distribute data evenly for parallel processing
•  Minimize data movement: co-located joins, localized aggregations
Distribution key
 All
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Full table data on first
slice of every node
Same key to same location
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Even
Round-robin 
distribution
Choosing a distribution style
Key
•  Large fact tables
•  Rapidly changing tables used in
joins
•  Localize columns used within
aggregations

All
•  Have slowly changing data
•  Reasonable size (i.e., few
millions but not 100s of millions
of rows)
•  No common distribution key for
frequent joins
•  Typical use case: joined
dimension table without a
common distribution key

Even
•  Tables not frequently joined or
aggregated
•  Large tables without
acceptable candidate keys
Table skew
•  If data is not spread evenly across slices, "
you have table skew
•  Workload is unbalanced: some nodes will word harder
than others
•  Query is as fast as the slowest slice…
https://docs.aws.amazon.com/redshift/latest/dg/c_analyzing-table-design.html
Goals of sorting
•  Physically sort data within blocks and throughout a table
•  Enable range scans to ignore blocks by leveraging zone
maps
•  Optimal SORTKEY is dependent on:
•  Query patterns
•  Data profile
•  Business requirements
COMPOUND
•  Most common
•  1st column in compound key
always used
•  Prefix / Time-series data
Choosing a SORTKEY
INTERLEAVED
•  No common filter criteria
•  Key columns are “equivalent”
•  No prefix / Non time-series data
•  Primarily as a query predicate (date, identifier, …)
•  Optionally, choose a column frequently used for aggregates
•  Optionally, choose same as distribution key column for most efficient
joins (merge join)
https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html
Compressing data
•  COPY automatically analyzes and compresses data when loading
into empty tables
•  CREATE TABLE AS supports automatic compression (Nov’16)
•  ANALYZE COMPRESSION estimates storage space savings (Nov’16)
•  Be careful: this locks the table!
•  Changing column encoding requires a table rebuild
•  Zstandard compression encoding, very good for large VARCHAR like
product descriptions, JSON, etc. (Jan’17)
https://aws.amazon.com/about-aws/whats-new/2016/11/amazon-redshift-announces-new-data-compression-connection-management-and-data-loading-
features/
https://aws.amazon.com/about-aws/whats-new/2017/01/amazon-redshift-now-supports-the-zstandard-high-data-compression-encoding-and-two-new-
aggregate-functions/
Automatic compression is a good thing (mostly)
“The definition of insanity is doing the same thing over and over again, but
expecting different results”.

- Albert Einstein 

If you have a regular ETL process and you use temporary tables or staging
tables, turn off automatic compression
•  COMPDUPDATE OFF in COPY command
•  Use ANALYZE COMPRESSION to determine the right encodings
•  Use CREATE TABLE … LIKE with the appropriate compression setting
Automatic compression is a good thing (mostly)
•  From the zone maps we know:
•  Which blocks contain the range
•  Which row offsets to scan
•  If SORTKEYs are much more compressed than other
columns:
•  Many rows per block 
•  Large row offset
•  Unnecessary I/O

Consider reducing/not using compression on SORTKEYs "
à less SORTKEYs per block"
à smaller offsets"
à finer granularity"
à less I/O
Optimization: ingestion
Amazon Redshift loading data overview
AWS CloudCorporate data center
Amazon
DynamoDB
Amazon S3
Data
volume
Amazon EMR
Amazon
RDS
Amazon
Redshift
Amazon
Glacier
logs/files
Source DBs
VPN
Connection
AWS Direct
Connect
Amazon S3
Multipart Upload
AWS Import/
Export
Amazon EC2
or on-premises
(using SSH)
Parallelism is a function of load files
Each slice can load one file at a time:
•  Streaming decompression
•  Parse
•  Distribute
•  Write

A single input file means "
only one slice is ingesting data

Realizing only partial cluster usage as 6.25% of slices are active 


0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
Maximize throughput with multiple files
Use at least as many input files as
there are slices in the cluster

With 16 input files, all slices are
working so you maximize
throughput

COPY continues to scale linearly
as you add nodes


0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
Optimization: performance tuning
Statistics
•  Amazon Redshift query optimizer relies on up-to-date
statistics
•  Statistics are necessary only for data that you are accessing
•  Updated stats are important on:
•  SORTKEY
•  DISTKEY
•  Columns in query predicates
ANALYZE tables regularly
•  After every single load for popular columns
•  Weekly for all columns
•  Look at SVV_TABLE_INFO(stats_off) for stale stats
•  Look at stl_alert_event_log for missing stats
•  Look for uncompressed columns

https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html 
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_ALERT_EVENT_LOG.html
VACUUM tables regularly
•  Weekly is a good target
•  Number of unsorted blocks is a good indicator
•  Look at SVV_TABLE_INFO(unsorted, empty)
•  Deep copy might be faster for high percent unsorted
•  20% unsorted or more : usually faster to deep copy
•  CREATE TABLE AS
https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html
WLM queue
•  Identify short/long-running queries
and prioritize them
•  Define multiple queues to route
queries appropriately
•  Default concurrency of 5
•  Use wlm_apex_hourly.sql to
tune WLM based on peak
concurrency requirements
Cluster status: commits and WLM
Commit queue
•  How long is your commit queue?"

•  Identify needless transactions
•  Group dependent statements
within a single transaction
•  Offload operational workloads
•  STL_COMMIT_STATS
Open source tools
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs

Admin scripts
Collection of utilities for running diagnostics on your cluster
Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established schema
with data already loaded
Tuning Your Workload
top_queries.sql
perf_alerts.sql
analyze-vacuum-schema.py
Must-read articles
Amazon Redshift Engineering’s Advanced Table Design Playbook
https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-
design-playbook-preamble-prerequisites-and-prioritization/ 

Tutorial: Tuning Table Design
https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables.html 

Top 10 Performance Tuning Techniques for Amazon Redshift
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-
amazon-redshift/ 

Interleaved Sort Keys
https://aws.amazon.com/blogs/aws/quickly-filter-data-in-amazon-redshift-using-
interleaved-sorting/
Julien Simon
julsimon@amazon.fr
@julsimon 
Your feedback 
is important to us!

Más contenido relacionado

La actualidad más candente

Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesAmazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftAmazon Web Services LATAM
 
Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Julien SIMON
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Amazon Web Services
 
AWS Webcast - Migrating to RDS Oracle
AWS Webcast - Migrating to RDS OracleAWS Webcast - Migrating to RDS Oracle
AWS Webcast - Migrating to RDS OracleAmazon Web Services
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration
AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration  AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration
AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration Amazon Web Services
 
Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)Julien SIMON
 
Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)Julien SIMON
 
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)Amazon Web Services
 
Optimizing costs with spot instances
Optimizing costs with spot instancesOptimizing costs with spot instances
Optimizing costs with spot instancesAmazon Web Services
 
Amazon EC2 Systems Manager (March 2017)
Amazon EC2 Systems Manager (March 2017)Amazon EC2 Systems Manager (March 2017)
Amazon EC2 Systems Manager (March 2017)Julien SIMON
 
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech TalksDeep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech TalksAmazon Web Services
 
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon AuroraNEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon AuroraAmazon Web Services
 
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...Amazon Web Services
 
Advanced data migration techniques for Amazon RDS
Advanced data migration techniques for Amazon RDSAdvanced data migration techniques for Amazon RDS
Advanced data migration techniques for Amazon RDSTom Laszewski
 

La actualidad más candente (20)

Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon Redshift
 
Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
 
AWS Webcast - Migrating to RDS Oracle
AWS Webcast - Migrating to RDS OracleAWS Webcast - Migrating to RDS Oracle
AWS Webcast - Migrating to RDS Oracle
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
 
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration
AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration  AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration
AWS Webcast - Amazon RDS for Oracle: Best Practices and Migration
 
Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)
 
Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)Advanced Task Scheduling with Amazon ECS (June 2017)
Advanced Task Scheduling with Amazon ECS (June 2017)
 
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
 
Optimizing costs with spot instances
Optimizing costs with spot instancesOptimizing costs with spot instances
Optimizing costs with spot instances
 
Amazon EC2 Systems Manager (March 2017)
Amazon EC2 Systems Manager (March 2017)Amazon EC2 Systems Manager (March 2017)
Amazon EC2 Systems Manager (March 2017)
 
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech TalksDeep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
 
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon AuroraNEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
 
Startup Showcase - Mojang
Startup Showcase - MojangStartup Showcase - Mojang
Startup Showcase - Mojang
 
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
Consolidate MySQL Shards Into Amazon Aurora Using AWS Database Migration Serv...
 
Advanced data migration techniques for Amazon RDS
Advanced data migration techniques for Amazon RDSAdvanced data migration techniques for Amazon RDS
Advanced data migration techniques for Amazon RDS
 

Destacado

Amazon AI (March 2017)
Amazon AI (March 2017)Amazon AI (March 2017)
Amazon AI (March 2017)Julien SIMON
 
AWS Security Best Practices (March 2017)
AWS Security Best Practices (March 2017)AWS Security Best Practices (March 2017)
AWS Security Best Practices (March 2017)Julien SIMON
 
Fascinating Tales of a Strange Tomorrow
Fascinating Tales of a Strange TomorrowFascinating Tales of a Strange Tomorrow
Fascinating Tales of a Strange TomorrowJulien SIMON
 
Deep Dive: Amazon Virtual Private Cloud (March 2017)
Deep Dive: Amazon Virtual Private Cloud (March 2017)Deep Dive: Amazon Virtual Private Cloud (March 2017)
Deep Dive: Amazon Virtual Private Cloud (March 2017)Julien SIMON
 
Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Julien SIMON
 
Advanced Task Scheduling with Amazon ECS
Advanced Task Scheduling with Amazon ECSAdvanced Task Scheduling with Amazon ECS
Advanced Task Scheduling with Amazon ECSJulien SIMON
 
Viadeo - Cost Driven Development
Viadeo - Cost Driven DevelopmentViadeo - Cost Driven Development
Viadeo - Cost Driven DevelopmentJulien SIMON
 
IoT: it's all about Data!
IoT: it's all about Data!IoT: it's all about Data!
IoT: it's all about Data!Julien SIMON
 
Building Serverless APIs (January 2017)
Building Serverless APIs (January 2017)Building Serverless APIs (January 2017)
Building Serverless APIs (January 2017)Julien SIMON
 
Building serverless apps with Node.js
Building serverless apps with Node.jsBuilding serverless apps with Node.js
Building serverless apps with Node.jsJulien SIMON
 
Amazon AI (February 2017)
Amazon AI (February 2017)Amazon AI (February 2017)
Amazon AI (February 2017)Julien SIMON
 
Hands-on with AWS IoT
Hands-on with AWS IoTHands-on with AWS IoT
Hands-on with AWS IoTJulien SIMON
 
Continuous Deployment with Amazon Web Services
Continuous Deployment with Amazon Web ServicesContinuous Deployment with Amazon Web Services
Continuous Deployment with Amazon Web ServicesJulien SIMON
 
Infrastructure as code with Amazon Web Services
Infrastructure as code with Amazon Web ServicesInfrastructure as code with Amazon Web Services
Infrastructure as code with Amazon Web ServicesJulien SIMON
 
The AWS DevOps combo (January 2017)
The AWS DevOps combo (January 2017)The AWS DevOps combo (January 2017)
The AWS DevOps combo (January 2017)Julien SIMON
 
Devops with Amazon Web Services (January 2017)
Devops with Amazon Web Services (January 2017)Devops with Amazon Web Services (January 2017)
Devops with Amazon Web Services (January 2017)Julien SIMON
 
Bonnes pratiques pour la gestion des opérations de sécurité AWS
Bonnes pratiques pour la gestion des opérations de sécurité AWSBonnes pratiques pour la gestion des opérations de sécurité AWS
Bonnes pratiques pour la gestion des opérations de sécurité AWSJulien SIMON
 
Attribution d'actions gratuites : nouvelle fiscalité !
Attribution d'actions gratuites : nouvelle fiscalité !Attribution d'actions gratuites : nouvelle fiscalité !
Attribution d'actions gratuites : nouvelle fiscalité !Vie Plus
 

Destacado (20)

Amazon AI (March 2017)
Amazon AI (March 2017)Amazon AI (March 2017)
Amazon AI (March 2017)
 
AWS Security Best Practices (March 2017)
AWS Security Best Practices (March 2017)AWS Security Best Practices (March 2017)
AWS Security Best Practices (March 2017)
 
Fascinating Tales of a Strange Tomorrow
Fascinating Tales of a Strange TomorrowFascinating Tales of a Strange Tomorrow
Fascinating Tales of a Strange Tomorrow
 
Deep Dive: Amazon Virtual Private Cloud (March 2017)
Deep Dive: Amazon Virtual Private Cloud (March 2017)Deep Dive: Amazon Virtual Private Cloud (March 2017)
Deep Dive: Amazon Virtual Private Cloud (March 2017)
 
Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)
 
Advanced Task Scheduling with Amazon ECS
Advanced Task Scheduling with Amazon ECSAdvanced Task Scheduling with Amazon ECS
Advanced Task Scheduling with Amazon ECS
 
Viadeo - Cost Driven Development
Viadeo - Cost Driven DevelopmentViadeo - Cost Driven Development
Viadeo - Cost Driven Development
 
IoT: it's all about Data!
IoT: it's all about Data!IoT: it's all about Data!
IoT: it's all about Data!
 
Building Serverless APIs (January 2017)
Building Serverless APIs (January 2017)Building Serverless APIs (January 2017)
Building Serverless APIs (January 2017)
 
Building serverless apps with Node.js
Building serverless apps with Node.jsBuilding serverless apps with Node.js
Building serverless apps with Node.js
 
Amazon AI (February 2017)
Amazon AI (February 2017)Amazon AI (February 2017)
Amazon AI (February 2017)
 
Hands-on with AWS IoT
Hands-on with AWS IoTHands-on with AWS IoT
Hands-on with AWS IoT
 
Continuous Deployment with Amazon Web Services
Continuous Deployment with Amazon Web ServicesContinuous Deployment with Amazon Web Services
Continuous Deployment with Amazon Web Services
 
Infrastructure as code with Amazon Web Services
Infrastructure as code with Amazon Web ServicesInfrastructure as code with Amazon Web Services
Infrastructure as code with Amazon Web Services
 
The AWS DevOps combo (January 2017)
The AWS DevOps combo (January 2017)The AWS DevOps combo (January 2017)
The AWS DevOps combo (January 2017)
 
Devops with Amazon Web Services (January 2017)
Devops with Amazon Web Services (January 2017)Devops with Amazon Web Services (January 2017)
Devops with Amazon Web Services (January 2017)
 
Bonnes pratiques pour la gestion des opérations de sécurité AWS
Bonnes pratiques pour la gestion des opérations de sécurité AWSBonnes pratiques pour la gestion des opérations de sécurité AWS
Bonnes pratiques pour la gestion des opérations de sécurité AWS
 
3 Benefits Of Hiring Marketing Contractors
3 Benefits Of Hiring Marketing Contractors3 Benefits Of Hiring Marketing Contractors
3 Benefits Of Hiring Marketing Contractors
 
Webinar Authority
Webinar AuthorityWebinar Authority
Webinar Authority
 
Attribution d'actions gratuites : nouvelle fiscalité !
Attribution d'actions gratuites : nouvelle fiscalité !Attribution d'actions gratuites : nouvelle fiscalité !
Attribution d'actions gratuites : nouvelle fiscalité !
 

Similar a Deep Dive: Amazon Redshift (March 2017)

AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftAWS Germany
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Amazon Web Services
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Amazon Web Services
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceAmazon Web Services
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 

Similar a Deep Dive: Amazon Redshift (March 2017) (20)

AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon Redshift
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak Performance
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 

Más de Julien SIMON

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...Julien SIMON
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)Julien SIMON
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Julien SIMON
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Julien SIMON
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Julien SIMON
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Julien SIMON
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Julien SIMON
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Julien SIMON
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Julien SIMON
 

Más de Julien SIMON (20)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)Optimize your Machine Learning Workloads on AWS (July 2019)
Optimize your Machine Learning Workloads on AWS (July 2019)
 
Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)Deep Learning on Amazon Sagemaker (July 2019)
Deep Learning on Amazon Sagemaker (July 2019)
 
Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)Automate your Amazon SageMaker Workflows (July 2019)
Automate your Amazon SageMaker Workflows (July 2019)
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 

Último

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Deep Dive: Amazon Redshift (March 2017)

  • 1. Deep Dive: Amazon Redshift Julien Simon" Principal Technical Evangelist julsimon@amazon.fr @julsimon
  • 2. Agenda •  Architecture overview •  Optimization •  Schema / table design •  Ingestion •  Maintenance and database tuning
  • 4. Relational data warehouse" Massively parallel; petabyte scale" Fully managed" HDD and SSD platforms" $1,000/TB/year; starts at $0.25/hour Amazon " Redshift a lot faster a lot simpler a lot cheaper
  • 6. Amazon Redshift architecture Leader node •  SQL endpoint •  Stores metadata •  Coordinates query execution Compute nodes •  Local, columnar storage •  Executes queries in parallel •  Load, backup, restore via " Amazon S3; load from " Amazon DynamoDB, Amazon EMR, or SSH Two hardware platforms •  Optimized for data processing •  DS2: HDD; scale from 2 TB to 2 PB •  DC1: SSD; scale from 160 GB to 326 TB 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores S3/EMR/DynamoDB/SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 7. A deeper look at compute node architecture Each node contains multiple slices •  DS2 – 2 slices on XL, 16 on 8 XL •  DC1 – 2 slices on L, 32 on 8 XL Each slice is allocated CPU and table data Each slice processes a piece of the workload in parallel https://docs.aws.amazon.com/fr_fr/redshift/latest/dg/c_high_level_system_architecture.html
  • 8. The best way to look through large amounts of data is to NOT look through large amounts of data
  • 9. Amazon Redshift dramatically reduces I/O Data compression Zone maps ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375 Calculating SUM(Amount) with " row storage: –  Need to read everything –  Unnecessary I/O ID Age State Amount
  • 10. Amazon Redshift dramatically reduces I/O Data compression Zone maps ID Age State Amount 123 20 CA 500 345 25 WA 250 678 40 FL 125 957 37 WA 375 Calculating SUM(Amount) with column storage: –  Only scan the necessary blocks ID Age State Amount
  • 11. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps Columnar compression –  Effective thanks to similar data –  Reduces storage requirements –  Reduces I/O ID Age State Amount analyze compression orders; Table | Column | Encoding --------+-------------+---------- orders | id | mostly32 orders | age | mostly32 orders | state | lzo orders | amount | mostly32 https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html
  • 13. Amazon Redshift dramatically reduces I/O Column storage Data compression Zone maps •  In-memory block metadata •  Contains per-block MIN and MAX value •  Effectively ignores blocks that don’t contain data for a given query •  Minimizes unnecessary I/O ID Age State Amount •  Zone maps work best with sort keys J
  • 15. Keep Your Columns as Narrow as Possible •  Buffers allocated based on declared column width •  Wider than needed columns mean memory is wasted •  Fewer rows fit into memory; increased likelihood of queries spilling to disk •  Check SVV_TABLE_INFO(max_varchar) https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html
  • 16. Goals of distribution •  Distribute data evenly for parallel processing •  Minimize data movement: co-located joins, localized aggregations Distribution key All Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Full table data on first slice of every node Same key to same location Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 Even Round-robin distribution
  • 17. Choosing a distribution style Key •  Large fact tables •  Rapidly changing tables used in joins •  Localize columns used within aggregations All •  Have slowly changing data •  Reasonable size (i.e., few millions but not 100s of millions of rows) •  No common distribution key for frequent joins •  Typical use case: joined dimension table without a common distribution key Even •  Tables not frequently joined or aggregated •  Large tables without acceptable candidate keys
  • 18. Table skew •  If data is not spread evenly across slices, " you have table skew •  Workload is unbalanced: some nodes will word harder than others •  Query is as fast as the slowest slice… https://docs.aws.amazon.com/redshift/latest/dg/c_analyzing-table-design.html
  • 19. Goals of sorting •  Physically sort data within blocks and throughout a table •  Enable range scans to ignore blocks by leveraging zone maps •  Optimal SORTKEY is dependent on: •  Query patterns •  Data profile •  Business requirements
  • 20. COMPOUND •  Most common •  1st column in compound key always used •  Prefix / Time-series data Choosing a SORTKEY INTERLEAVED •  No common filter criteria •  Key columns are “equivalent” •  No prefix / Non time-series data •  Primarily as a query predicate (date, identifier, …) •  Optionally, choose a column frequently used for aggregates •  Optionally, choose same as distribution key column for most efficient joins (merge join) https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html
  • 21. Compressing data •  COPY automatically analyzes and compresses data when loading into empty tables •  CREATE TABLE AS supports automatic compression (Nov’16) •  ANALYZE COMPRESSION estimates storage space savings (Nov’16) •  Be careful: this locks the table! •  Changing column encoding requires a table rebuild •  Zstandard compression encoding, very good for large VARCHAR like product descriptions, JSON, etc. (Jan’17) https://aws.amazon.com/about-aws/whats-new/2016/11/amazon-redshift-announces-new-data-compression-connection-management-and-data-loading- features/ https://aws.amazon.com/about-aws/whats-new/2017/01/amazon-redshift-now-supports-the-zstandard-high-data-compression-encoding-and-two-new- aggregate-functions/
  • 22. Automatic compression is a good thing (mostly) “The definition of insanity is doing the same thing over and over again, but expecting different results”. - Albert Einstein If you have a regular ETL process and you use temporary tables or staging tables, turn off automatic compression •  COMPDUPDATE OFF in COPY command •  Use ANALYZE COMPRESSION to determine the right encodings •  Use CREATE TABLE … LIKE with the appropriate compression setting
  • 23. Automatic compression is a good thing (mostly) •  From the zone maps we know: •  Which blocks contain the range •  Which row offsets to scan •  If SORTKEYs are much more compressed than other columns: •  Many rows per block •  Large row offset •  Unnecessary I/O Consider reducing/not using compression on SORTKEYs " à less SORTKEYs per block" à smaller offsets" à finer granularity" à less I/O
  • 25. Amazon Redshift loading data overview AWS CloudCorporate data center Amazon DynamoDB Amazon S3 Data volume Amazon EMR Amazon RDS Amazon Redshift Amazon Glacier logs/files Source DBs VPN Connection AWS Direct Connect Amazon S3 Multipart Upload AWS Import/ Export Amazon EC2 or on-premises (using SSH)
  • 26. Parallelism is a function of load files Each slice can load one file at a time: •  Streaming decompression •  Parse •  Distribute •  Write A single input file means " only one slice is ingesting data Realizing only partial cluster usage as 6.25% of slices are active 0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
  • 27. Maximize throughput with multiple files Use at least as many input files as there are slices in the cluster With 16 input files, all slices are working so you maximize throughput COPY continues to scale linearly as you add nodes 0 2 4 6 8 10 12 141 3 5 7 9 11 13 15
  • 29. Statistics •  Amazon Redshift query optimizer relies on up-to-date statistics •  Statistics are necessary only for data that you are accessing •  Updated stats are important on: •  SORTKEY •  DISTKEY •  Columns in query predicates
  • 30. ANALYZE tables regularly •  After every single load for popular columns •  Weekly for all columns •  Look at SVV_TABLE_INFO(stats_off) for stale stats •  Look at stl_alert_event_log for missing stats •  Look for uncompressed columns https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html https://docs.aws.amazon.com/redshift/latest/dg/r_STL_ALERT_EVENT_LOG.html
  • 31. VACUUM tables regularly •  Weekly is a good target •  Number of unsorted blocks is a good indicator •  Look at SVV_TABLE_INFO(unsorted, empty) •  Deep copy might be faster for high percent unsorted •  20% unsorted or more : usually faster to deep copy •  CREATE TABLE AS https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html
  • 32. WLM queue •  Identify short/long-running queries and prioritize them •  Define multiple queues to route queries appropriately •  Default concurrency of 5 •  Use wlm_apex_hourly.sql to tune WLM based on peak concurrency requirements Cluster status: commits and WLM Commit queue •  How long is your commit queue?" •  Identify needless transactions •  Group dependent statements within a single transaction •  Offload operational workloads •  STL_COMMIT_STATS
  • 33. Open source tools https://github.com/awslabs/amazon-redshift-utils https://github.com/awslabs/amazon-redshift-monitoring https://github.com/awslabs/amazon-redshift-udfs Admin scripts Collection of utilities for running diagnostics on your cluster Admin views Collection of utilities for managing your cluster, generating schema DDL, etc. ColumnEncodingUtility Gives you the ability to apply optimal column encoding to an established schema with data already loaded
  • 35. Must-read articles Amazon Redshift Engineering’s Advanced Table Design Playbook https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table- design-playbook-preamble-prerequisites-and-prioritization/ Tutorial: Tuning Table Design https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables.html Top 10 Performance Tuning Techniques for Amazon Redshift https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for- amazon-redshift/ Interleaved Sort Keys https://aws.amazon.com/blogs/aws/quickly-filter-data-in-amazon-redshift-using- interleaved-sorting/
  • 36. Julien Simon julsimon@amazon.fr @julsimon Your feedback is important to us!