Azure
SQL Data Warehouse
1982 I started working with computers
1988 I started my professional career in the computer industry
1996 I started working with SQL Server 6.0
1998 I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training to date
2010 I became a Microsoft MVP on Data Platform for the first time
I created SQL School Greece, www.sqlschool.gr
2012 I was named MCT Regional Lead by the Microsoft Learning Program
2013 I was certified as MCSE: Data Platform
I was certified as MCSE: Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
Antonios Chatzipavlis
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
A source of information about Microsoft SQL Server for Greek IT Professionals, DBAs, Developers, Information Workers, and also hobbyists who simply like SQL Server.
Help line: help@sqlschool.gr
What we are doing here
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources
Follow us on social media
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group
SELECT KNOWLEDGE FROM SQL SERVER
Azure SQL Data Warehouse SQLschool.gr GWAB Athens 2017
Presentation Content
• First Look at Azure SQL DW
• Designing for Azure SQL DW
• Loading Data into Azure SQL DW
• Querying and Tuning Azure SQL DW
First Look at Azure SQL Data Warehouse
What is Azure SQL Data Warehouse?
• A service in Microsoft Azure
• A PaaS offering
• A Massively Parallel Processing (MPP) system
• Distributed storage
• Distributed compute
SMP vs MPP
SMP: Symmetric Multiprocessing
MPP: Massively Parallel Processing
Data Warehousing Unit (DWU)
A measure of the underlying compute power of the database
Data Warehousing Unit: For Example
At 100 DWU: 3 tables loaded in 15 min; 20 minutes to run a report
At 500 DWU: 3 tables loaded in 3 min; 4 minutes to run a report
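The example figures are consistent with run time shrinking in proportion to the DWU level. The sketch below reproduces that arithmetic. It assumes perfectly linear scaling, which the slide's numbers suggest but which real workloads may not follow; the function name is illustrative only.

```python
# Worked arithmetic for the 100 DWU vs. 500 DWU example.
# Assumption: run time is inversely proportional to the DWU level.

def estimated_minutes(baseline_minutes, baseline_dwu, target_dwu):
    """Estimate run time at target_dwu from a measurement taken at baseline_dwu."""
    return baseline_minutes * baseline_dwu / target_dwu

# Measurements at 100 DWU taken from the slide: 15 min load, 20 min report.
load_at_500 = estimated_minutes(15, 100, 500)
report_at_500 = estimated_minutes(20, 100, 500)

print("load at 500 DWU (min):", load_at_500)
print("report at 500 DWU (min):", report_at_500)
```

Under that assumption the estimates come out at 3 and 4 minutes, matching the 500 DWU figures on the slide.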
Why Choose Cloud Over On-Premises DW?
• No large CAPEX is needed to get started
• No large OPEX is needed
• Storage and compute can be scaled up or down on demand
What Do You Pay For in This Service, and How?
• Storage
– Storage is billed per GB
– Standard or Premium Geo-Redundant
– No cost for storage transactions
– Outbound data transfer is billed
• Compute Power
– Compute is billed by DWUs
– Can go from 100 to 2000 DWU
– Billed per hour
When not in use, the compute power of the DW can be completely paused for maximum savings
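To make the effect of pausing concrete, here is a minimal cost sketch. Both rates are hypothetical placeholders, not Azure prices. It also assumes that compute cost grows linearly with the DWU level and that storage keeps being billed while compute is paused.

```python
# Hypothetical rates, in cents, chosen only to keep the arithmetic exact.
STORAGE_CENTS_PER_GB_MONTH = 10          # hypothetical, not a real price
COMPUTE_CENTS_PER_100_DWU_HOUR = 100     # hypothetical, not a real price

def monthly_cost_cents(storage_gb, dwu, compute_hours):
    """Storage is billed per GB; compute is billed per hour at the chosen DWU level."""
    storage = storage_gb * STORAGE_CENTS_PER_GB_MONTH
    compute = (dwu // 100) * COMPUTE_CENTS_PER_100_DWU_HOUR * compute_hours
    return storage + compute

# 1000 GB at 400 DWU: running all month vs. running 8 hours on 22 working days.
always_on = monthly_cost_cents(1000, 400, 24 * 30)
paused_off_hours = monthly_cost_cents(1000, 400, 8 * 22)

print("always on:", always_on / 100)
print("paused outside working hours:", paused_off_hours / 100)
```

With these placeholder rates, only the compute term shrinks when the warehouse is paused; the storage term stays the same.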
Provisioning Azure SQL Data Warehouse
1. Select a region
2. Select or create a server
3. Pick the origin of the data
4. Pick the DWU level
Methods of Provisioning
• Azure Portal
– Select New > Data + Storage
• PowerShell
– New-AzureRmSqlDatabase cmdlet
• T-SQL
– CREATE DATABASE command
Provision a Data Warehouse
DEMO
Designing for Azure SQL Data Warehouse
SQL Server != Azure SQL DW
An Azure SQL DW database requires design decisions that are different from those for SQL Server
Distribution Key
Determines how Azure SQL Data Warehouse spreads the data across multiple nodes
Azure SQL Data Warehouse uses up to 60 distributions when loading data into the system
Hash Distribution
RecordNo CustomerID InvoiceDate
1 1000 2017-04-21
2 1000 2017-04-22
3 2000 2017-04-22
4 3000 2017-04-22
5 4000 2017-04-22
Hashing by CustomerID
Round-Robin Distribution
RecordNo CustomerID InvoiceDate
1 1000 2017-04-21
2 1000 2017-04-22
3 2000 2017-04-22
4 3000 2017-04-22
5 4000 2017-04-22
Rows distributed to all nodes
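The difference between the two methods can be sketched on the five sample rows. This is a simplified model, not the service's implementation: CRC32 stands in for the real hash function, which the slides do not describe, and round-robin is modelled as a plain rotation over 60 distributions.

```python
import zlib
from collections import defaultdict

DISTRIBUTIONS = 60

# (RecordNo, CustomerID, InvoiceDate) rows from the slides.
rows = [
    (1, 1000, "2017-04-21"),
    (2, 1000, "2017-04-22"),
    (3, 2000, "2017-04-22"),
    (4, 3000, "2017-04-22"),
    (5, 4000, "2017-04-22"),
]

def hash_distribution(customer_id):
    # Stand-in hash: equal keys always map to the same distribution.
    return zlib.crc32(str(customer_id).encode()) % DISTRIBUTIONS

def assign_hash(rows):
    placement = defaultdict(list)
    for record_no, customer_id, _ in rows:
        placement[hash_distribution(customer_id)].append(record_no)
    return dict(placement)

def assign_round_robin(rows):
    # Rows are dealt out in turn, regardless of their content.
    placement = defaultdict(list)
    for position, (record_no, _, _) in enumerate(rows):
        placement[position % DISTRIBUTIONS].append(record_no)
    return dict(placement)

print("hash:", assign_hash(rows))
print("round-robin:", assign_round_robin(rows))
```

In the hash placement, records 1 and 2 share a distribution because they share CustomerID 1000, which is what lets joins and aggregations on that column avoid moving data. In the round-robin placement each record lands on its own distribution.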
Data Distribution Best Practice
Even Distribution
Odd (uneven) Distribution
Good Hash Key
• Distributes evenly
• Used for grouping
• Used as a join condition
• Is not updated
• Has more than 60 distinct values
Round-Robin will always provide a uniform distribution but not necessarily the best performance
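The "more than 60 distinct values" criterion can be illustrated with a small skew check. This is a sketch under assumptions: CRC32 is a stand-in for the service's hash function, and `skew_ratio` is an illustrative metric (largest distribution divided by the average), not a built-in measure.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60

def rows_per_distribution(keys):
    """Count how many rows each of the 60 distributions would receive."""
    counts = Counter(zlib.crc32(str(k).encode()) % DISTRIBUTIONS for k in keys)
    return [counts.get(d, 0) for d in range(DISTRIBUTIONS)]

def skew_ratio(keys):
    """Largest distribution relative to the average; 1.0 means perfectly even."""
    counts = rows_per_distribution(keys)
    return max(counts) / (sum(counts) / DISTRIBUTIONS)

# A key with only 10 distinct values vs. a key that is unique per row.
low_cardinality = [i % 10 for i in range(60_000)]
high_cardinality = list(range(60_000))

used = sum(1 for c in rows_per_distribution(low_cardinality) if c > 0)
print("distributions used by the 10-value key:", used)
print("skew, 10-value key:", skew_ratio(low_cardinality))
print("skew, unique key:", skew_ratio(high_cardinality))
```

A key with 10 distinct values can occupy at most 10 of the 60 distributions, so at least 50 stay empty and the busiest one holds at least six times the average.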
Data Types
• Use the smallest data type that will support your data
• Avoid defining all character columns with a large default length
• Define columns as VARCHAR instead of NVARCHAR if you don't need Unicode
The goal is not only to save space but also to move data as efficiently as possible
Some complex data types (xml, geography, etc.) are not supported yet
Table Types
• Clustered Columnstore
– Default table type
– High compression ratio
– Ideally segments of 1M rows
– No secondary indexes
• Heap
– No index on the data
– Fast load
– No compression
– Allows secondary indexes
• Clustered B-Tree
– Sorted index on the data
– Fast singleton lookup
– No compression
– Allows secondary indexes
Table Partitioning
Partitioning is very common in SQL Server data warehouses for three reasons:
1. Ease of loading and removing data from a partitioned table
2. Targeting specific partitions in table maintenance operations
3. Performance improvements due to partition elimination
A highly granular partitioning scheme can work in SQL Server but hurt performance in Azure SQL DW
60 distributions × 365 partitions = 21,900 data buckets
21,900 data buckets × ideal segment size (1M rows) = 21,900,000,000 rows
Lower granularity (week, month) can perform better, depending on how much data you have
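The bucket arithmetic can be repeated for coarser partitioning schemes. The sketch below assumes one year of data and the 1M-row ideal segment size quoted in the deck; the weekly and monthly partition counts are illustrative.

```python
DISTRIBUTIONS = 60
IDEAL_SEGMENT_ROWS = 1_000_000

def data_buckets(partitions):
    """Every partition is split across all 60 distributions."""
    return DISTRIBUTIONS * partitions

def rows_to_fill(partitions):
    """Rows needed for each bucket to hold one ideal-sized segment."""
    return data_buckets(partitions) * IDEAL_SEGMENT_ROWS

# One year of data partitioned by day, week, or month.
for label, partitions in [("daily", 365), ("weekly", 52), ("monthly", 12)]:
    print(label, data_buckets(partitions), rows_to_fill(partitions))
```

Daily partitioning needs 21.9 billion rows per year to fill every segment, while monthly partitioning needs 720 million, which is why coarser schemes suit smaller fact tables.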
How do we apply these principles to a Dimensional Model?
• Fact Tables
– Large ones are better as Columnstores
– Hash-distributed wherever possible, as long as the distribution is even
– Partitioned only if the table is large enough to fill up each segment
• Dimension Tables
– Can be Hash distributed, or Round-Robin if there is no clear candidate join key
– Columnstore for large dimensions
– Heap or Clustered Index for small dimensions
– Add secondary indexes for alternate join columns
– Partitioning not recommended
Analyzing distribution and data types for DW tables
DEMO
Loading Data into Azure SQL Data Warehouse
Loading an MPP System
The main principle of loading data into Azure SQL DW is to do as much work in parallel as possible
Data Warehouse Readers
DWU      100  200  300  400  500  600  1000  1200  1500  2000
Readers  8    16   24   32   40   48   60    60    60    60
Writers  60   60   60   60   60   60   60    60    60    60
Your DWUs have a direct impact on how fast you can load data in parallel
- Azure SQL Data Warehouse introduces the concept of Data Warehouse Readers.
- These are threads that read data in parallel and then pass it off to Writer threads.
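The readers row follows a simple pattern: 8 readers per 100 DWU, capped at 60. The sketch below encodes the table and checks that pattern against it. The formula is an observation about the figures on this slide, not a documented rule.

```python
# Readers and writers per DWU level, copied from the table.
READERS_BY_DWU = {
    100: 8, 200: 16, 300: 24, 400: 32, 500: 40,
    600: 48, 1000: 60, 1200: 60, 1500: 60, 2000: 60,
}
WRITERS = 60

def readers_from_pattern(dwu):
    """8 readers per 100 DWU, never more than the 60 writers."""
    return min(WRITERS, 8 * dwu // 100)

# Compare the observed pattern with every column of the table.
matches = all(readers_from_pattern(d) == r for d, r in READERS_BY_DWU.items())
print("pattern matches table:", matches)
print("first DWU level with 60 readers:",
      min(d for d, r in READERS_BY_DWU.items() if r == WRITERS))
```

Below DWU 1000 the readers are the limiting side of a parallel load, which is one reason a load can speed up after scaling the DWU level.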
Optimize Insert Batch Size
• Avoid the trickle insert pattern
– Ideal batch size is 1 million rows or more, inserted directly or from a file
• Avoid ordered data
– Data ordered by the distribution key can introduce hot spots that slow down the load operation
• Use temporary tables
– Stage and transform in a temporary heap table before moving to permanent storage
• Use the CREATE TABLE AS SELECT (CTAS) statement
– Fully parallel operation
– Minimally logged
– Can change distribution, table type, and partitioning
-- Stage the fact table in a round-robin temporary table using CTAS
CREATE TABLE #fact_tmp
WITH
(
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT *
FROM dbo.FactInternetSales;
User Resource Class
User resource classes are database roles that govern how many resources are given to a query
Class    Smallrc  Mediumrc     Largerc      Xlargerc
Default  8        16           24           32
Memory   100 MB   100-1600 MB  200-3200 MB  400-6400 MB
The lower end of each range corresponds to DWU100, the upper end to DWU2000
For fast, high-quality loads, create a user just for loading that uses a medium or large resource class
Loading Methods
• Single-client loading methods
– SSIS
– Azure Data Factory
– BCP
– Can add some parallel capabilities but are bottlenecked at the Control node
• Parallel-reader loading methods
– PolyBase
– Reads from Azure Blob Storage and loads the content into Azure SQL DW
– Bypasses the Control node and loads directly into the Compute nodes
Control Node
The Control Node receives connections and orchestrates the queries
The Compute Nodes do processing on the data and scale with the DWUs
Loading with SSIS
[Diagram: SSIS, Control Node]
Loading data with SSIS
DEMO
Loading with PolyBase
[Diagram: Control Node, Azure Blob Storage]
PolyBase can load data from UTF-8 delimited text files and popular Hadoop file formats (RC File, ORC, and Parquet)
Loading data with PolyBase
DEMO
Migration Utility
• Supports SQL Server 2012+ and Azure SQL Database
• Provides a migration report pointing out possible issues
• Assists with schema migration
• Assists with data migration
Using the Azure SQL DW migration utility
DEMO
Querying and Tuning Azure SQL Data Warehouse
Workload Management Principles
• User Resource Class
• Concurrency Model
• Transaction Size
Two Maximum Limits
• 1024 connections
• 32 concurrent queries
Concurrent Queries and Concurrency Slots
DWU    100  200  300  400  500  600  1000  1200  1500  2000
Slots  4    8    12   16   20   24   40    48    60    80
Examples
         Queries Executing  Queries Incoming  Queries Queued
DW200    7                  2                 1
DW1000   32                 2                 2
The above examples assume that each query consumes 1 concurrency slot
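Both examples follow from applying two limits together: the slot count for the DWU level and the cap of 32 concurrent queries. The sketch below is a simplified admission model written to reproduce the slide's numbers, not the service's actual scheduler; `admit` and `slots_per_query` are illustrative names.

```python
# Concurrency slots per DWU level, copied from the table.
SLOTS_BY_DWU = {
    100: 4, 200: 8, 300: 12, 400: 16, 500: 20,
    600: 24, 1000: 40, 1200: 48, 1500: 60, 2000: 80,
}
MAX_CONCURRENT_QUERIES = 32

def admit(dwu, executing, incoming, slots_per_query=1):
    """Return (executing, queued) after `incoming` queries arrive."""
    by_slots = SLOTS_BY_DWU[dwu] // slots_per_query
    capacity = min(by_slots, MAX_CONCURRENT_QUERIES)
    admitted = max(0, min(incoming, capacity - executing))
    return executing + admitted, incoming - admitted

# DW200: 8 slots, 7 in use, so one of the two new queries waits.
print("DW200:", admit(200, executing=7, incoming=2))
# DW1000: 40 slots, but the 32-query cap is already reached.
print("DW1000:", admit(1000, executing=32, incoming=2))
```

At DW200 the slot count is the binding limit; at DW1000 it is the 32-query cap, even though 8 slots are still free. Passing a larger `slots_per_query` models queries run under a bigger resource class.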
Resource Class and Concurrency Slots
Class Smallrc Mediumrc Largerc Xlargerc
DWUs 100-2000 100-2000 100-2000 100-2000
Slots 1 1-6 2-32 4-64
SELECT queries against system views, stats and other management commands do not use concurrency slots
Transaction Size Limits
DWU                100  200  300   400  500   600  1000  1200  1500   2000
GB / Distribution  1    1.5  2.25  3    3.75  4.5  7.5   9     11.25  15
A DW200 transaction doing equal work per distribution could consume 60 × 1.5 GB = 90 GB of space
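The same multiplication applies at every DWU level. A minimal sketch, using the per-distribution limits from the table and assuming all 60 distributions do equal work:

```python
DISTRIBUTIONS = 60

# Transaction size limit per distribution in GB, copied from the table.
GB_PER_DISTRIBUTION = {
    100: 1, 200: 1.5, 300: 2.25, 400: 3, 500: 3.75,
    600: 4.5, 1000: 7.5, 1200: 9, 1500: 11.25, 2000: 15,
}

def max_transaction_gb(dwu):
    """Upper bound when every distribution reaches its own limit."""
    return GB_PER_DISTRIBUTION[dwu] * DISTRIBUTIONS

for dwu in (100, 200, 1000, 2000):
    print(dwu, max_transaction_gb(dwu))
```

A skewed transaction hits the per-distribution limit on its busiest distribution first, so the totals computed here are upper bounds.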
Maintaining Statistics
• The service does not create or maintain statistics automatically
• Creating new statistics
– Sampled single-column statistics are a good start
– Multi-column statistics for joins involving multiple columns
– Focus on columns used in JOIN, GROUP BY, HAVING, and WHERE clauses
– Increase the sample size if necessary
• Updating existing statistics
– If new dates or dimension categories have been added
– If new data loads have completed
– If an UPDATE or DELETE changes the distribution of data
Index Defrag
• Heap
– Does not have a defrag option
• B-Tree Index
– Useful for removing low levels of fragmentation
• Columnstore
– Proactively compresses CLOSED rowgroups
• On a large table with heavy fragmentation, it is often faster to recreate the table with CREATE TABLE AS SELECT and swap it with the old one
Index Rebuild
• Heap
– Can be rebuilt to remove forward pointers
• B-Tree Index
– Will remove high levels of fragmentation
• Columnstore
– Can increase the density of segments
• Rebuilding an index is an OFFLINE operation in Azure SQL DW
Scaling Performance
• Increase the User Resource Class
– EXEC sp_addrolemember 'largerc', 'loaduser';
– A higher resource class gets more memory and CPU
– More concurrency slots per query, so fewer concurrent queries
– The highest role assigned takes precedence
• Increase the Data Warehouse Units
– ALTER DATABASE AWDW MODIFY (SERVICE_OBJECTIVE = 'DW1000');
– It is an OFFLINE operation
– Make sure there are no loads or transactions in progress
– Can also be done through the Azure Portal
Tracking Queries with Labels
-- User query: tag the statement with a label
SELECT SUM(Qty)
FROM dbo.FactInternetSales
OPTION (LABEL = 'mylabel');
-- Admin query: find the labeled request
SELECT *
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'mylabel';
Labeling a query and tracking its execution
DEMO
https://aka.ms/cc9cf1
Thank You
SELECT KNOWLEDGE FROM SQL SERVER

Más contenido relacionado

La actualidad más candente

Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 

La actualidad más candente (20)

Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 
Introduction to Hadoop and Spark - اسلاید کارگاه آموزش هدوپ و اسپارک شیراز
Introduction to Hadoop and Spark - اسلاید کارگاه آموزش هدوپ و اسپارک شیرازIntroduction to Hadoop and Spark - اسلاید کارگاه آموزش هدوپ و اسپارک شیراز
Introduction to Hadoop and Spark - اسلاید کارگاه آموزش هدوپ و اسپارک شیراز
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Big data on aws
Big data on awsBig data on aws
Big data on aws
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 

Destacado (10)

Introduction to azure document db
Introduction to azure document dbIntroduction to azure document db
Introduction to azure document db
 
Exploring sql server 2016
Exploring sql server 2016Exploring sql server 2016
Exploring sql server 2016
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Row level security
Row level securityRow level security
Row level security
 
Introduction to sql database on azure
Introduction to sql database on azureIntroduction to sql database on azure
Introduction to sql database on azure
 
Live Query Statistics & Query Store in SQL Server 2016
Live Query Statistics & Query Store in SQL Server 2016Live Query Statistics & Query Store in SQL Server 2016
Live Query Statistics & Query Store in SQL Server 2016
 
Introduction to Machine Learning on Azure
Introduction to Machine Learning on AzureIntroduction to Machine Learning on Azure
Introduction to Machine Learning on Azure
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
Dynamic data masking sql server 2016
Dynamic data masking sql server 2016Dynamic data masking sql server 2016
Dynamic data masking sql server 2016
 
Microsoft SQL Family and GDPR
Microsoft SQL Family and GDPRMicrosoft SQL Family and GDPR
Microsoft SQL Family and GDPR
 

Similar a Azure SQL Data Warehouse

Similar a Azure SQL Data Warehouse (20)

Optimiser votre infrastructure SQL Server avec Azure
Optimiser votre infrastructure SQL Server avec AzureOptimiser votre infrastructure SQL Server avec Azure
Optimiser votre infrastructure SQL Server avec Azure
 
Windows on AWS
Windows on AWSWindows on AWS
Windows on AWS
 
Reference for data migration pls choose and
Reference for data migration pls choose andReference for data migration pls choose and
Reference for data migration pls choose and
 
SQL to Azure Migrations
SQL to Azure MigrationsSQL to Azure Migrations
SQL to Azure Migrations
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
 
Sql server 2016 Discovery Day
Sql server 2016 Discovery DaySql server 2016 Discovery Day
Sql server 2016 Discovery Day
 
Azure SQL DWH
Azure SQL DWHAzure SQL DWH
Azure SQL DWH
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data Lake
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
 
AZURE Data Related Services
AZURE Data Related ServicesAZURE Data Related Services
AZURE Data Related Services
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
 
Azure fundamental -Introduction
Azure fundamental -IntroductionAzure fundamental -Introduction
Azure fundamental -Introduction
 
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
Analyzing StackExchange Data with Azure Data Lake (Tom Kerkhove @ Integration...
 
A to z for sql azure databases
A to z for sql azure databasesA to z for sql azure databases
A to z for sql azure databases
 
Azure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersAzure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginners
 
Introduction to Microsoft Azure
Introduction to Microsoft AzureIntroduction to Microsoft Azure
Introduction to Microsoft Azure
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
 

Más de Antonios Chatzipavlis

Más de Antonios Chatzipavlis (20)

Data virtualization using polybase
Data virtualization using polybaseData virtualization using polybase
Data virtualization using polybase
 
SQL server Backup Restore Revealed
SQL server Backup Restore RevealedSQL server Backup Restore Revealed
SQL server Backup Restore Revealed
 
Migrate SQL Workloads to Azure
Migrate SQL Workloads to AzureMigrate SQL Workloads to Azure
Migrate SQL Workloads to Azure
 
Machine Learning in SQL Server 2019
Machine Learning in SQL Server 2019Machine Learning in SQL Server 2019
Machine Learning in SQL Server 2019
 
Workload Management in SQL Server 2019
Workload Management in SQL Server 2019Workload Management in SQL Server 2019
Workload Management in SQL Server 2019
 
Loading Data into Azure SQL DW (Synapse Analytics)
Loading Data into Azure SQL DW (Synapse Analytics)Loading Data into Azure SQL DW (Synapse Analytics)
Loading Data into Azure SQL DW (Synapse Analytics)
 
Introduction to DAX Language
Introduction to DAX LanguageIntroduction to DAX Language
Introduction to DAX Language
 
Building diagnostic queries using DMVs and DMFs
Building diagnostic queries using DMVs and DMFs Building diagnostic queries using DMVs and DMFs
Building diagnostic queries using DMVs and DMFs
 
Exploring T-SQL Anti-Patterns
Exploring T-SQL Anti-Patterns Exploring T-SQL Anti-Patterns
Exploring T-SQL Anti-Patterns
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
 
Sqlschool 2017 recap - 2018 plans
Sqlschool 2017 recap - 2018 plansSqlschool 2017 recap - 2018 plans
Sqlschool 2017 recap - 2018 plans
 
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018 Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
Azure SQL Database for the SQL Server DBA - Azure Bootcamp Athens 2018
 
Statistics and Indexes Internals
Statistics and Indexes InternalsStatistics and Indexes Internals
Statistics and Indexes Internals
 
Implementing Mobile Reports in SQL Sserver 2016 Reporting Services
Implementing Mobile Reports in SQL Sserver 2016 Reporting ServicesImplementing Mobile Reports in SQL Sserver 2016 Reporting Services
Implementing Mobile Reports in SQL Sserver 2016 Reporting Services
 
Auditing Data Access in SQL Server
Auditing Data Access in SQL ServerAuditing Data Access in SQL Server
Auditing Data Access in SQL Server
 
Stretch db sql server 2016 (sn0028)
Stretch db   sql server 2016 (sn0028)Stretch db   sql server 2016 (sn0028)
Stretch db sql server 2016 (sn0028)
 
Troubleshooting sql server
Troubleshooting sql serverTroubleshooting sql server
Troubleshooting sql server
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

Azure SQL Data Warehouse

  • 1.
  • 3. 1982 I started working with computers 1988 I started my professional career in computers industry 1996 I started working with SQL Server 6.0 1998 I earned my first certification at Microsoft as Microsoft Certified Solution Developer (3rd in Greece) 1999 I started my career as Microsoft Certified Trainer (MCT) with more than 30.000 hours of training until now! 2010 I became for first time Microsoft MVP on Data Platform I created the SQL School Greece www.sqlschool.gr 2012 I became MCT Regional Lead by Microsoft Learning Program. 2013 I was certified as MCSE : Data Platform I was certified as MCSE : Business Intelligence 2016 I was certified as MCSE: Data Management & Analytics Antonios Chatzipavlis SQL Server Expert and Evangelist Data Platform MVP MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
  • 4. Μια πηγή ενημέρωσης για τον Microsoft SQL Server προς τους Έλληνες IT Professionals, DBAs, Developers, Information Workers αλλά και απλούς χομπίστες που απλά τους αρέσει ο SQL Server. Help line : help@sqlschool.gr • Articles about SQL Server • SQL Server News • SQL Nights • Webcasts • Downloads • Resources What we are doing here Follow us in socials fb/sqlschoolgr fb/groups/sqlschool @antoniosch @sqlschool yt/c/SqlschoolGr SQL School Greece group S E L E C T K N O W L E D G E F R O M S Q L S E R V E R
  • 5. Azure SQL Data Warehouse SQLschool.gr GWAB Athens 2017
      Presentation Content
      • First Look at Azure SQL DW
      • Designing for Azure SQL DW
      • Loading Data into Azure SQL DW
      • Querying and Tuning Azure SQL DW
  • 6. First Look at Azure SQL Data Warehouse
  • 7. What is Azure SQL Data Warehouse? A service in Microsoft Azure. It is a PaaS offering. It is a Massively Parallel Processing (MPP) system with distributed storage and distributed compute.
  • 8. SMP vs MPP: Symmetric Multiprocessing vs Massively Parallel Processing
  • 9. Data Warehousing Unit (DWU): a measure of the underlying compute power of the database
  • 10. Data Warehousing Unit, for example:
      100 DWU: 3 tables loaded in 15 minutes, 20 minutes to run a report
      500 DWU: 3 tables loaded in 3 minutes, 4 minutes to run a report
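
The figures on this slide are consistent with roughly linear scaling: five times the DWUs, one fifth of the time. The arithmetic can be sketched in Python as follows, assuming ideal linear scaling (real workloads may scale less cleanly). `estimate_minutes` is a hypothetical helper for illustration, not part of any Azure API.

```python
def estimate_minutes(base_minutes: float, base_dwu: int, target_dwu: int) -> float:
    """Estimate run time at target_dwu, assuming time is inversely proportional to DWU."""
    if base_dwu <= 0 or target_dwu <= 0:
        raise ValueError("DWU values must be positive")
    return base_minutes * base_dwu / target_dwu


if __name__ == "__main__":
    # Slide figures: load takes 15 minutes and the report 20 minutes at 100 DWU.
    print("load at 500 DWU:", estimate_minutes(15, 100, 500), "minutes")
    print("report at 500 DWU:", estimate_minutes(20, 100, 500), "minutes")
```

Under that assumption the estimates reproduce the 500 DWU column of the slide.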
  • 11. Why Choose Cloud Over On-Premises DW?
      • No large CAPEX is needed to get started
      • No large OPEX is needed
      • Storage and compute can be scaled up or down on demand
  • 12. What do you pay for in this service, and how?
      • Storage
          – Storage is billed by GB
          – Standard or Premium Geo Redundant
          – No cost for storage transactions
          – Outbound data transfer is billed
      • Compute Power
          – Compute is billed by DWUs
          – Can go from 100 to 2000
          – Billed per hour
      When not in use, the compute power of the DW can be completely paused for maximum savings
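
To see why pausing compute matters, the billing model above can be sketched as a small Python function. The rates used here are made-up placeholders in arbitrary currency units, not real Azure prices, and outbound data transfer is left out for brevity. `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(compute_rate_per_hour: int, hours_running: int,
                 storage_gb: int, storage_rate_per_gb: int) -> int:
    """Compute is billed per hour while running; storage is billed by GB regardless."""
    compute = compute_rate_per_hour * hours_running
    storage = storage_gb * storage_rate_per_gb
    return compute + storage


if __name__ == "__main__":
    # Placeholder rates (assumptions): 10 units per compute hour, 1 unit per GB.
    always_on = monthly_cost(10, 720, 1000, 1)       # running 24 x 30 hours
    paused_off_hours = monthly_cost(10, 200, 1000, 1)  # running about 200 hours
    print(always_on, paused_off_hours)
```

In this sketch storage is still charged while compute is paused; only the hourly compute term shrinks.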
  • 13. Provisioning Azure SQL Data Warehouse
      • Select a region
      • Select or create a server
      • Pick the origin of the data
      • Pick the DWU level
  • 14. Methods of Provisioning
      • Azure Portal
          – Select New > Data + Storage
      • PowerShell
          – New-AzureRmSqlDatabase cmdlet
      • T-SQL
          – CREATE DATABASE command
  • 15. DEMO: Provision a Data Warehouse
  • 16. Designing for Azure SQL Data Warehouse
  • 17. SQL Server != Azure SQL DW: an Azure SQL DW database requires design decisions that are different from those for SQL Server
  • 18. Distribution Key: determines how Azure SQL Data Warehouse spreads the data across multiple nodes. Azure SQL Data Warehouse uses up to 60 distributions when loading data into the system.
  • 19. Hash Distribution (hashing by CustomerID)
      RecordNo  CustomerID  InvoiceDate
      1         1000        2017-04-21
      2         1000        2017-04-22
      3         2000        2017-04-22
      4         3000        2017-04-22
      5         4000        2017-04-22
  • 20. Round-Robin Distribution (rows distributed to all nodes)
      RecordNo  CustomerID  InvoiceDate
      1         1000        2017-04-21
      2         1000        2017-04-22
      3         2000        2017-04-22
      4         3000        2017-04-22
      5         4000        2017-04-22
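
The difference between the two methods can be illustrated with a small Python simulation of the five sample rows. This is only a sketch: CRC32 stands in for the service's hash function, which the slides do not describe, and round-robin is assumed to start at distribution 0. The names `place`, `hash_distribution` and `round_robin_distribution` are hypothetical.

```python
import zlib
from collections import defaultdict

DISTRIBUTIONS = 60  # from the slides: the service uses up to 60 distributions

# (RecordNo, CustomerID, InvoiceDate), the sample rows from the slides
ROWS = [
    (1, 1000, "2017-04-21"),
    (2, 1000, "2017-04-22"),
    (3, 2000, "2017-04-22"),
    (4, 3000, "2017-04-22"),
    (5, 4000, "2017-04-22"),
]


def hash_distribution(customer_id: int) -> int:
    """Map a key to a distribution; CRC32 is an assumed stand-in hash."""
    return zlib.crc32(str(customer_id).encode("utf-8")) % DISTRIBUTIONS


def round_robin_distribution(row_index: int) -> int:
    """Assign rows to distributions in turn, ignoring their content."""
    return row_index % DISTRIBUTIONS


def place(rows, by_hash: bool) -> dict:
    """Return {distribution: [RecordNo, ...]} for the chosen method."""
    placement = defaultdict(list)
    for index, (record_no, customer_id, _invoice_date) in enumerate(rows):
        if by_hash:
            target = hash_distribution(customer_id)
        else:
            target = round_robin_distribution(index)
        placement[target].append(record_no)
    return dict(placement)


if __name__ == "__main__":
    print("hash:       ", place(ROWS, by_hash=True))
    print("round-robin:", place(ROWS, by_hash=False))
```

With hashing, records 1 and 2 share CustomerID 1000 and therefore always land in the same distribution, which is what makes joins and grouping on that key local. Round-robin spreads the same two records over different distributions.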
  • 21. Data distribution best practice: Even Distribution vs Odd Distribution
  • 22. Good Hash Key
      • Distributes evenly
      • Used for grouping
      • Used as join condition
      • Is not updated
      • Has more than 60 distinct values
      Round-Robin will always provide a uniform distribution but not necessarily the best performance
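
Two of these criteria, even distribution and more than 60 distinct values, can be checked on a sample of candidate key values before the table is created. The Python sketch below again uses CRC32 as an assumed stand-in for the real hash function; `skew_ratio` and `has_enough_distinct_values` are hypothetical helpers, and a ratio close to 1.0 means an even spread.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # from the slides


def distribution_counts(key_values) -> list:
    """Count how many rows each of the 60 distributions would receive."""
    counts = Counter(
        zlib.crc32(str(value).encode("utf-8")) % DISTRIBUTIONS
        for value in key_values
    )
    return [counts.get(d, 0) for d in range(DISTRIBUTIONS)]


def skew_ratio(key_values) -> float:
    """Largest distribution divided by the average distribution size."""
    counts = distribution_counts(key_values)
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one key value")
    return max(counts) / (total / DISTRIBUTIONS)


def has_enough_distinct_values(key_values) -> bool:
    """A key with 60 or fewer distinct values cannot fill every distribution."""
    return len(set(key_values)) > DISTRIBUTIONS


if __name__ == "__main__":
    good_key = list(range(100_000))   # many distinct values
    bad_key = [1] * 600               # a single value: everything in one place
    print(has_enough_distinct_values(good_key), skew_ratio(good_key))
    print(has_enough_distinct_values(bad_key), skew_ratio(bad_key))
```

A single-valued key yields the worst possible ratio of 60, because one distribution holds every row while the other 59 stay empty.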
  • 23. Data Types
      • Use the smallest data type that will support your data
      • Avoid defining all character columns with a large default length
      • Define columns as VARCHAR instead of NVARCHAR if you don’t need Unicode
      • The goal is not only to save space but also to move data as efficiently as possible
      • Some complex data types (xml, geography, etc.) are not supported yet
  • 24. Table Types
      • Clustered Columnstore: default table type; high compression ratio; ideally segments of 1M rows; no secondary indexes
      • Heap: no index on the data; fast load; no compression; allows secondary indexes
      • Clustered B-Tree: sorted index on the data; fast singleton lookup; no compression; allows secondary indexes
  • 25. Table Partitioning
      Partitioning is very common in SQL Server data warehouses for three reasons:
      1. Ease of loading data into and removing data from a partitioned table
      2. Targeting specific partitions in table maintenance operations
      3. Performance improvements due to partition elimination
      A highly granular partitioning scheme can work in SQL Server but hurt performance in Azure SQL DW:
      60 distributions x 365 partitions = 21,900 data buckets
      21,900 data buckets x ideal segment size (1M rows) = 21,900,000,000 rows
      Lower granularity (week, month) can perform better depending on how much data you have
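
The bucket arithmetic on this slide is easy to rerun for other partitioning schemes. A minimal Python sketch, using only the two constants given in the slides (60 distributions, 1M rows per ideal segment); `rows_needed` and `rows_per_bucket` are hypothetical helper names.

```python
DISTRIBUTIONS = 60              # from the slides
IDEAL_SEGMENT_ROWS = 1_000_000  # ideal columnstore segment size from the slides


def rows_needed(partitions: int) -> int:
    """Rows required so that every distribution/partition bucket holds a full segment."""
    return DISTRIBUTIONS * partitions * IDEAL_SEGMENT_ROWS


def rows_per_bucket(total_rows: int, partitions: int) -> float:
    """Average rows landing in each bucket for a table of total_rows."""
    return total_rows / (DISTRIBUTIONS * partitions)


if __name__ == "__main__":
    for label, partitions in (("daily", 365), ("weekly", 52), ("monthly", 12)):
        print(f"{label:8s} {partitions:4d} partitions -> {rows_needed(partitions):,} rows")
    # A table of 2.19 billion rows with daily partitions fills each bucket only 10%.
    print(rows_per_bucket(2_190_000_000, 365))
```

Monthly partitions need 720 million rows to fill every segment, compared with 21.9 billion for daily partitions, which is why coarser granularity often suits smaller fact tables.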
  • 26. How do we apply these principles to a Dimensional Model?
      • Fact Tables
          – Large ones are better as columnstores
          – Distributed by hash key wherever possible, as long as the distribution is even
          – Partitioned only if the table is large enough to fill up each segment
      • Dimension Tables
          – Can be hash distributed, or round-robin if there is no clear candidate join key
          – Columnstore for large dimensions
          – Heap or clustered index for small dimensions
          – Add secondary indexes for alternate join columns
          – Partitioning not recommended
  • 27. DEMO: Analyzing distribution and data types for DW tables
  • 28. Loading Data on Azure SQL Data Warehouse
  • 29. Loading an MPP System: the main principle of loading data into Azure SQL DW is to do as much work in parallel as possible
  • 30. Data Warehouse Readers
      DWU      100  200  300  400  500  600  1000  1200  1500  2000
      Readers    8   16   24   32   40   48    60    60    60    60
      Writers   60   60   60   60   60   60    60    60    60    60
      Your DWUs have a direct impact on how fast you can load data in parallel
      – Azure SQL Data Warehouse introduces the concept of Data Warehouse Readers.
      – These are threads that read data in parallel and then pass it off to Writer threads.
  • 31. Optimize Insert Batch Size
      • Avoid the trickle insert pattern
          – Ideal batch size is 1 million rows or more, inserted directly or loaded from a file
      • Avoid ordered data
          – Data ordered by distribution key can introduce hot spots that slow down the load operation
      • Use temporary tables
          – Stage and transform on a temporary heap table before moving to permanent storage
      • Use the CREATE TABLE AS SELECT statement
          – Fully parallel operation
          – It is minimally logged
          – It can change distribution, table type and partitioning
  • 32. CREATE TABLE #fact_tmp WITH ( DISTRIBUTION = ROUND_ROBIN ) AS SELECT * FROM dbo.FactInternetSales;
  • 33. User Resource Class
      Class    Smallrc  Mediumrc     Largerc      Xlargerc
      Default  8        16           24           32
      Memory   100 MB   100-1600 MB  200-3200 MB  400-6400 MB
      The lower end of each range corresponds to DWU100, the upper end to DWU2000
      User resource classes are database roles that govern how many resources are given to a query
      For fast, high-quality loads, create a user just for loading that uses a medium or large resource class
  • 34. Loading Methods
      • Single-client loading methods
          – SSIS
          – Azure Data Factory
          – BCP
          – Can add some parallel capabilities but are bottlenecked at the Control node
      • Parallel-reader loading methods
          – PolyBase
          – Reads from Azure Blob Storage and loads the content into Azure SQL DW
          – Bypasses the Control node and loads directly into the Compute nodes
  • 35. Control Node: the Control Node receives connections and orchestrates the queries. The Compute Nodes process the data and scale with the DWUs.
  • 36. Loading with SSIS (diagram: SSIS, Control Node)
  • 37. DEMO: Loading data with SSIS
  • 38. Loading with PolyBase (diagram: Control Node, Azure Blob Storage). PolyBase can load data from UTF-8 delimited text files and popular Hadoop file formats (RC file, ORC and Parquet).
  • 39. DEMO: Loading data with PolyBase
  • 40. Migration Utility
      • Supports SQL Server 2012+ and Azure SQL Database
      • Provides a migration report pointing out possible issues
      • Assists with schema migration
      • Assists with data migration
  • 41. DEMO: Using the Azure SQL DW migration utility
  • 42. Querying and Tuning Azure SQL Data Warehouse
  • 43. Workload Management Principles: User Resource Class, Concurrency Model, Transaction Size. Two maximum limits: 1024 connections and 32 concurrent queries.
  • 44. Concurrent Queries and Concurrency Slots
      DWU    100  200  300  400  500  600  1000  1200  1500  2000
      Slots    4    8   12   16   20   24    40    48    60    80
      Examples:
      DWU     Queries Executing  Queries Incoming  Queries Queued
      DW200   7                  2                 1
      DW1000  32                 2                 2
      The examples above assume that each query consumes 1 concurrency slot
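
The two examples follow from applying both limits at once: the slot count for the DWU level and the cap of 32 concurrent queries. A minimal Python sketch of that admission logic, using the slot table from the slide and assuming every query uses the same number of slots; `admit` is a hypothetical helper, not the service's actual scheduler.

```python
# Concurrency slots per DWU level, copied from the slide.
SLOTS_BY_DWU = {
    100: 4, 200: 8, 300: 12, 400: 16, 500: 20,
    600: 24, 1000: 40, 1200: 48, 1500: 60, 2000: 80,
}
MAX_CONCURRENT_QUERIES = 32  # limit from the workload management slide


def admit(dwu: int, executing: int, incoming: int, slots_per_query: int = 1):
    """Return (queries started, queries queued) for a batch of incoming queries."""
    free_slots = SLOTS_BY_DWU[dwu] - executing * slots_per_query
    free_query_capacity = MAX_CONCURRENT_QUERIES - executing
    started = max(0, min(incoming, free_slots // slots_per_query, free_query_capacity))
    return started, incoming - started


if __name__ == "__main__":
    print("DW200: ", admit(200, executing=7, incoming=2))
    print("DW1000:", admit(1000, executing=32, incoming=2))
```

At DW200 the limiting factor is slots (8 in total, 7 in use). At DW1000 there are still 8 free slots, but the 32-query cap is already reached, so both incoming queries wait.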
  • 45. Resource Class and Concurrency Slots
      Class  Smallrc   Mediumrc  Largerc   Xlargerc
      DWUs   100-2000  100-2000  100-2000  100-2000
      Slots  1         1-6       2-32      4-64
      SELECT queries against system views, stats and other management commands do not use concurrency slots
  • 46. Transaction Size Limits
      DWU                100  200  300   400  500   600  1000  1200  1500   2000
      GB / Distribution  1    1.5  2.25  3    3.75  4.5  7.5   9     11.25  15
      A DW200 transaction doing equal work per distribution could consume 60 x 1.5 GB = 90 GB of space
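
The same calculation for every DWU level can be sketched in Python from the table above. It assumes, as the slide does, that work is spread equally over all 60 distributions; `max_transaction_gb` is a hypothetical helper name.

```python
DISTRIBUTIONS = 60  # from the slides

# Transaction size limit in GB per distribution, copied from the slide.
GB_PER_DISTRIBUTION = {
    100: 1.0, 200: 1.5, 300: 2.25, 400: 3.0, 500: 3.75,
    600: 4.5, 1000: 7.5, 1200: 9.0, 1500: 11.25, 2000: 15.0,
}


def max_transaction_gb(dwu: int) -> float:
    """Total space one transaction could consume if all distributions do equal work."""
    return GB_PER_DISTRIBUTION[dwu] * DISTRIBUTIONS


if __name__ == "__main__":
    for dwu in sorted(GB_PER_DISTRIBUTION):
        print(f"DW{dwu}: {max_transaction_gb(dwu)} GB")
```

A skewed transaction hits the per-distribution limit earlier, so the total it can write is smaller than these figures.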
  • 47. Maintaining Statistics
      • The service does not create or maintain stats automatically
      • Creating new stats
          – Sampled single-column stats are a good start
          – Multi-column stats for joins involving multiple columns
          – Focus on columns used in JOIN, GROUP BY, HAVING and WHERE clauses
          – Increase the sample if necessary
      • Updating existing stats
          – If new dates or dimension categories are added
          – If new data loads have completed
          – If an UPDATE or DELETE changes the distribution of data
  • 48. Index Defrag
      • Heap
          – Does not have a defrag option
      • B-Tree Index
          – Useful for removing low levels of fragmentation
      • Columnstore
          – Proactively compresses CLOSED rowgroups
      • On a large table with heavy fragmentation it is often faster to recreate the table with CREATE TABLE AS SELECT and swap it with the old one
  • 49. Index Rebuild
      • Heap
          – Can be rebuilt to remove forward pointers
      • B-Tree Index
          – Will remove high levels of fragmentation
      • Columnstore
          – Can increase the density of segments
      • Rebuilding an index is an OFFLINE operation in Azure SQL DW
  • 50. Scaling Performance
      • Increase the User Resource Class
          – EXEC sp_addrolemember 'largerc', 'loaduser';
          – Higher resource class: more memory and CPU
          – More concurrency slots per query: fewer concurrent queries
          – The highest role assigned takes precedence
      • Increase the Data Warehouse Units
          – ALTER DATABASE AWDW MODIFY (SERVICE_OBJECTIVE = 'DW1000');
          – It is an OFFLINE operation
          – Make sure there are no loads or transactions in progress
          – Can also be done through the Azure Portal
  • 51. Tracking Queries with Labels
      User query: SELECT sum(Qty) FROM dbo.FactInternetSales OPTION (LABEL = 'mylabel');
      Admin query: SELECT * FROM sys.dm_pdw_exec_requests WHERE [label] = 'mylabel';
  • 52. DEMO: Labeling a query and tracking its execution
  • 54. Thank You. SELECT KNOWLEDGE FROM SQL SERVER