3. Antonios Chatzipavlis
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
1982 I started working with computers
1988 I started my professional career in the computer industry
1996 I started working with SQL Server 6.0
1998 I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training to date
2010 I became a Microsoft MVP on Data Platform for the first time
I created SQL School Greece, www.sqlschool.gr
2012 I became MCT Regional Lead for the Microsoft Learning Program
2013 I was certified as MCSE: Data Platform
I was certified as MCSE: Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
4. A source of information about Microsoft SQL Server for Greek IT Professionals, DBAs, Developers, Information Workers, and hobbyists who simply like SQL Server.
Help line: help@sqlschool.gr
What we are doing here
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources
Follow us on social media
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group
SELECT KNOWLEDGE FROM SQL SERVER
5. Azure SQL Data Warehouse SQLschool.gr GWAB Athens 2017
Presentation Content
• First Look on Azure SQL DW
• Designing for Azure SQL DW
• Loading Data on Azure SQL DW
• Querying and Tuning Azure SQL DW
6. First Look on Azure SQL Data Warehouse
7. What is Azure SQL Data Warehouse?
• A service in Microsoft Azure
• It’s a PaaS offering
• It’s a Massively Parallel Processing system
• Distributed Storage
• Distributed Compute
8. SMP vs MPP
Symmetric Multiprocessing (SMP)
Massively Parallel Processing (MPP)
9. Data Warehousing Unit
A measure of the underlying compute power of the database
10. Data Warehousing Unit
For Example
                       100 DWU    500 DWU
3 tables loaded in     15 min     3 min
Report runs in         20 min     4 min
11. Why Choose Cloud Over On-Premises DW?
• Doesn’t need large CAPEX to get started
• Doesn’t need large OPEX
• We can scale storage and compute up or down on demand
12. What do you pay for in this service, and how?
• Storage
– Storage is billed by GB
– Standard or Premium Geo Redundant
– No cost for storage transactions
– Outbound data transfer is billed
• Compute Power
– Compute is billed by DWUs
– Can range from 100 to 2000 DWUs
– Billed per hour
When not in use, the compute power of the DW can be completely paused for maximum savings
13. Provisioning Azure SQL Data Warehouse
• Select a Region
• Select or create a Server
• Pick the origin of the data
• Pick the DWU level
14. Methods of Provisioning
• Azure Portal
– Select New > Data + Storage
• PowerShell
– New-AzureRmSqlDatabase cmdlet
• T-SQL
– CREATE DATABASE command
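As a rough illustration of the T-SQL route, a statement along the following lines can be run while connected to the master database of the logical server. The database name MySqlDw, the DW100 service objective and the MAXSIZE value are placeholders chosen for this sketch, not values taken from the demo.

```sql
-- Run while connected to the master database of the logical server.
-- MySqlDw, DW100 and the 10 TB size cap are illustrative placeholders.
CREATE DATABASE MySqlDw
COLLATE SQL_Latin1_General_CP1_CI_AS
(
    EDITION = 'DataWarehouse',      -- marks the database as an Azure SQL DW
    SERVICE_OBJECTIVE = 'DW100',    -- starting DWU level
    MAXSIZE = 10240 GB              -- upper bound for stored data
);
```

Starting at a low DWU level keeps the hourly compute bill small; the level can be raised later without recreating the database.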
15. DEMO: Provision a Data Warehouse
16. Designing for Azure SQL Data Warehouse
17. SQL Server != Azure SQL DW
An Azure SQL DW database requires design decisions that are different from SQL Server
18. Distribution Key
Determines how Azure SQL Data Warehouse spreads the data across multiple nodes
Azure SQL Data Warehouse spreads data across 60 distributions when loading it into the system
20. Round-Robin Distribution
RecordNo CustomerID InvoiceDate
1 1000 2017-04-21
2 1000 2017-04-22
3 2000 2017-04-22
4 3000 2017-04-22
5 4000 2017-04-22
Rows distributed to all nodes
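A minimal sketch of a round-robin table holding the sample rows above. The table name dbo.InvoiceStage and the column types are assumptions made for illustration; only the column names come from the slide.

```sql
-- Hypothetical table for the sample rows; names and types are assumed.
CREATE TABLE dbo.InvoiceStage
(
    RecordNo    INT  NOT NULL,
    CustomerID  INT  NOT NULL,
    InvoiceDate DATE NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,  -- rows are dealt out evenly, ignoring column values
    HEAP                         -- no index, which keeps loads fast
);
```

Because no column drives the placement, the two rows for customer 1000 can land on different distributions, so a join or aggregation on CustomerID may need to move data between nodes at query time.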
21. Data Distribution best practice
Even Distribution
Odd Distribution
22. Good Hash Key
• Distributes evenly
• Used for grouping
• Used as a join condition
• Is not updated
• Has more than 60 distinct values
Round-Robin will always provide a uniform distribution, but not necessarily the best performance
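One way to try out a candidate hash key is to create the table distributed on that column and then look at how the rows spread out. The sketch below assumes a hypothetical fact table dbo.FactInvoice with CustomerID as the candidate key.

```sql
-- Hypothetical fact table distributed on the candidate key CustomerID.
CREATE TABLE dbo.FactInvoice
(
    RecordNo    INT  NOT NULL,
    CustomerID  INT  NOT NULL,
    InvoiceDate DATE NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX
);

-- After loading data, report rows and space used per distribution.
-- Large differences in row counts between distributions indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactInvoice');
```

If a few distributions hold most of the rows, the key is a poor fit and another column, or round-robin, is worth testing.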
23. Data Types
• Use the smallest data type that will support your data
• Avoid defining all character columns with a large default length
• Define columns as VARCHAR instead of NVARCHAR if you don’t need Unicode
The goal is not only to save space but also to move data as efficiently as possible
Some complex data types (xml, geography, etc.) are not supported yet
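A possible starting point for reviewing data types is a catalog query that lists Unicode columns and wide character columns. The 1000-byte threshold below is an arbitrary assumption for this sketch and should be adjusted to the data at hand.

```sql
-- List Unicode columns and wide character columns as review candidates.
-- The 1000-byte threshold is an arbitrary value chosen for illustration.
SELECT s.name       AS schema_name,
       t.name       AS table_name,
       c.name       AS column_name,
       ty.name      AS data_type,
       c.max_length AS max_length_bytes
FROM sys.tables  AS t
JOIN sys.schemas AS s  ON s.schema_id     = t.schema_id
JOIN sys.columns AS c  ON c.object_id     = t.object_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
WHERE ty.name IN ('nchar', 'nvarchar')
   OR (ty.name IN ('char', 'varchar') AND c.max_length >= 1000)
ORDER BY s.name, t.name, c.column_id;
```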
24. Table Types
Clustered Columnstore
• Default table type
• High compression ratio
• Ideally segments of 1M rows
• No secondary indexes
Heap
• No index on the data
• Fast load
• No compression
• Allows secondary indexes
Clustered B-Tree
• Sorted index on the data
• Fast singleton lookup
• No compression
• Allows secondary indexes
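The table type is chosen in the WITH clause of CREATE TABLE. The sketch below shows a clustered columnstore table and a clustered B-tree table with one secondary index; all table, column and index names are hypothetical. A heap is declared the same way with the HEAP keyword in place of the index option.

```sql
-- Clustered columnstore: the default when no index option is specified.
CREATE TABLE dbo.FactSalesColumnstore
(
    SaleKey    BIGINT NOT NULL,
    CustomerID INT    NOT NULL,
    Qty        INT    NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);

-- Clustered B-tree: rows are kept sorted on CustomerID for fast lookups.
CREATE TABLE dbo.DimCustomerBTree
(
    CustomerID   INT          NOT NULL,
    CustomerName VARCHAR(100) NOT NULL,
    City         VARCHAR(50)  NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED INDEX (CustomerID));

-- Secondary index for an alternate lookup or join column.
CREATE INDEX IX_DimCustomerBTree_City ON dbo.DimCustomerBTree (City);
```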
25. Table Partitioning
Partitioning is very common in SQL Server data warehouses for three reasons:
1. Ease of loading and removing data from a partitioned table
2. Targeting specific partitions in table maintenance operations
3. Performance improvements due to partition elimination
A highly granular partitioning scheme can work in SQL Server but hurt performance in Azure SQL DW:
60 Distributions × 365 Partitions = 21,900 Data Buckets
21,900 Data Buckets × Ideal Segment Size (1M Rows) = 21,900,000,000 Rows
Lower granularity (week, month) can perform better, depending on how much data you have
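To make the arithmetic concrete: with monthly instead of daily partitions, one year gives 60 × 12 = 720 buckets, so about 720 million rows are enough to fill 1M-row segments. The sketch below declares monthly boundaries on a hypothetical fact table; the table name, the columns and the integer yyyymmdd date key are assumptions for illustration.

```sql
-- Hypothetical fact table partitioned by month on an integer date key.
CREATE TABLE dbo.FactInvoiceMonthly
(
    InvoiceDateKey INT            NOT NULL,  -- yyyymmdd
    CustomerID     INT            NOT NULL,
    Amount         DECIMAL(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX,
    -- Three boundary values create four partitions;
    -- each partition is further split across the 60 distributions.
    PARTITION ( InvoiceDateKey RANGE RIGHT FOR VALUES (20170201, 20170301, 20170401) )
);
```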
26. How do we apply these principles to a Dimensional Model?
• Fact Tables
– Large ones are better as Columnstores
– Distributed by Hash key wherever possible, as long as the distribution is even
– Partitioned only if the table is large enough to fill up each segment
• Dimension Tables
– Can be Hash distributed, or Round-Robin if there is no clear candidate join key
– Columnstore for large dimensions
– Heap or Clustered Index for small dimensions
– Add secondary indexes for alternate join columns
– Partitioning not recommended
27. DEMO: Analyzing distribution and data types for DW tables
28. Loading Data on Azure SQL Data Warehouse
29. Loading an MPP System
The main principle of loading data into Azure SQL DW is to do as much work in parallel as possible
30. Data Warehouse Readers
DWU       100  200  300  400  500  600  1000  1200  1500  2000
Readers     8   16   24   32   40   48    60    60    60    60
Writers    60   60   60   60   60   60    60    60    60    60
Your DWUs have a direct impact on how fast you can load data in parallel
- Azure SQL Data Warehouse introduces the concept of Data Warehouse Readers.
- These are threads that read data in parallel and then pass it off to Writer threads.
31. Optimize Insert Batch Size
• Avoid the trickle insert pattern
– Ideal batch size is 1 million rows or more, inserted directly or loaded from a file
• Avoid Ordered Data
– Data ordered by distribution key can introduce hot spots that slow down the load operation
• Use Temporary Tables
– Stage and transform on a temporary Heap table before moving to permanent storage
• Use the CREATE TABLE AS SELECT (CTAS) statement
– Fully parallel operation
– It’s minimally logged
– It can change: distribution, table type, partitioning
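A minimal CTAS sketch, assuming a staging heap named dbo.InvoiceStage with the columns RecordNo, CustomerID and InvoiceDate already exists. All names here are hypothetical.

```sql
-- Copy staged rows into a new permanent table, changing the distribution
-- and the table type in a single parallel, minimally logged operation.
CREATE TABLE dbo.FactInvoiceFromStage
WITH
(
    DISTRIBUTION = HASH(CustomerID),   -- target distribution
    CLUSTERED COLUMNSTORE INDEX        -- target table type
)
AS
SELECT RecordNo, CustomerID, InvoiceDate
FROM dbo.InvoiceStage;
```

Transformations can be placed in the SELECT list, so the staged data is reshaped on the way into permanent storage instead of being updated afterwards.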
33. User Resource Class
User Resource Classes are database roles that govern how many resources are given to a query
Class     Smallrc   Mediumrc      Largerc       Xlargerc
Default   8         16            24            32
Memory    100 MB    100-1600 MB   200-3200 MB   400-6400 MB
The lower end of each range corresponds to DWU100, the upper end to DWU2000
For fast, high-quality loads, create a user just for loading that uses a medium or large resource class
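A sketch of setting up such a dedicated loading user. The login name LoadUser, the database name MySqlDw and the password placeholder are assumptions for illustration, and the CONTROL grant is one possible permission choice rather than a requirement stated on the slide.

```sql
-- Step 1: run in the master database. Replace the placeholder password.
CREATE LOGIN LoadUser WITH PASSWORD = '<choose a strong password>';

-- Step 2: run in the data warehouse database (MySqlDw is a placeholder name).
CREATE USER LoadUser FOR LOGIN LoadUser;
GRANT CONTROL ON DATABASE::[MySqlDw] TO LoadUser;

-- Step 3: put the user in a larger resource class so its loads get more memory.
EXEC sp_addrolemember 'mediumrc', 'LoadUser';
```

Keeping loads on their own user means interactive users stay in the default small resource class and are not affected by the change.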
34. Loading Methods
• Single-client loading methods
– SSIS
– Azure Data Factory
– BCP
– Can add some parallel capabilities but are bottlenecked at the Control node
• Parallel readers loading methods
– PolyBase
– Reads from Azure Blob Storage and loads the content into Azure SQL DW
– Bypasses the Control node and loads directly into the Compute nodes
35. Control Node
The Control Node receives connections and orchestrates the queries
The Compute Nodes do processing on the data and scale with the DWUs
36. Loading with SSIS
[Diagram: SSIS, Control Node]
37. DEMO: Loading data with SSIS
38. Loading with PolyBase
[Diagram: Control Node, Azure Blob Storage]
PolyBase can load data from UTF-8 delimited text files and popular Hadoop file formats (RC file, ORC and Parquet)
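A PolyBase load typically involves a credential, an external data source, a file format, an external table, and a final CTAS. The following is a sketch under stated assumptions: every object name is hypothetical, the files are assumed to be pipe-delimited text under /invoices/, and the values in angle brackets are placeholders that must be filled in.

```sql
-- A master key is needed once per database to protect the credential secret.
CREATE MASTER KEY;

-- Storage account key; the IDENTITY string is not used for authentication here.
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

-- Points at a blob container.
CREATE EXTERNAL DATA SOURCE InvoiceBlobStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

-- Describes the layout of the delimited text files.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = FALSE)
);

-- External table: metadata only, the data stays in blob storage.
CREATE EXTERNAL TABLE dbo.InvoiceExternal
(
    RecordNo    INT  NOT NULL,
    CustomerID  INT  NOT NULL,
    InvoiceDate DATE NOT NULL
)
WITH (
    LOCATION = '/invoices/',
    DATA_SOURCE = InvoiceBlobStorage,
    FILE_FORMAT = PipeDelimitedText
);

-- The actual load: read the files in parallel into a distributed table.
CREATE TABLE dbo.FactInvoiceLoaded
WITH (DISTRIBUTION = HASH(CustomerID), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT RecordNo, CustomerID, InvoiceDate
FROM dbo.InvoiceExternal;
```

Splitting the source data into several files lets more readers work at the same time.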
39. DEMO: Loading data with PolyBase
40. Migration Utility
• Supports SQL Server 2012+ and Azure SQL Database
• Provides a migration report pointing out possible issues
• Assists with schema migration
• Assists with data migration
41. DEMO: Using the Azure SQL DW migration utility
42. Querying and Tuning Azure SQL Data Warehouse
43. Workload Management Principles
• User Resource Class
• Concurrency Model
• Transaction Size
Two maximum limits:
• 1024 connections
• 32 concurrent queries
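To see how close a system is to these limits, the session and request DMVs can be queried. This is a sketch; it relies on queued requests being reported with the status 'Suspended'.

```sql
-- Open sessions, to compare against the connection limit.
SELECT COUNT(*) AS open_sessions
FROM sys.dm_pdw_exec_sessions
WHERE status <> 'Closed';

-- Requests currently executing versus waiting in the queue.
SELECT status, COUNT(*) AS request_count
FROM sys.dm_pdw_exec_requests
WHERE status IN ('Running', 'Suspended')
GROUP BY status;
```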
45. Resource Class and Concurrency Slots
Class   Smallrc    Mediumrc   Largerc    Xlargerc
DWUs    100-2000   100-2000   100-2000   100-2000
Slots   1          1-16       2-32       4-64
SELECT queries against system views, stats and other management commands do not use concurrency slots
46. Transaction Size Limits
DWU                 100   200   300    400   500    600   1000   1200   1500    2000
GB / Distribution   1     1.5   2.25   3     3.75   4.5   7.5    9      11.25   15
A DW200 transaction doing equal work per distribution could consume 60 × 1.5 GB = 90 GB of space
47. Maintaining Statistics
• The service does not create or maintain stats automatically
• Creating new stats
– Sampled single-column stats are a good start
– Multi-column stats for joins involving multiple columns
– Focus on columns used in JOIN, GROUP BY, HAVING and WHERE clauses
– Increase the sample if necessary
• Updating existing stats
– If new dates or dimension categories are added
– If new data loads have completed
– If an UPDATE or DELETE changes the distribution of data
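The statements below sketch these steps on dbo.FactInternetSales. The column names CustomerKey, OrderDateKey and ProductKey and the statistics names are assumptions for illustration.

```sql
-- Single-column statistics using the default sample.
CREATE STATISTICS stat_FactInternetSales_CustomerKey
ON dbo.FactInternetSales (CustomerKey);

-- A larger sample for a column where the default sample is not enough.
CREATE STATISTICS stat_FactInternetSales_OrderDateKey
ON dbo.FactInternetSales (OrderDateKey) WITH SAMPLE 20 PERCENT;

-- Multi-column statistics for joins or filters that use both columns.
CREATE STATISTICS stat_FactInternetSales_Product_Customer
ON dbo.FactInternetSales (ProductKey, CustomerKey);

-- After a load: refresh one statistics object, or all of them on the table.
UPDATE STATISTICS dbo.FactInternetSales (stat_FactInternetSales_OrderDateKey);
UPDATE STATISTICS dbo.FactInternetSales;
```

Updating only the date statistics after a daily load is often cheaper than refreshing every statistics object on a large table.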
48. Index Defrag
• Heap
– Does not have a defrag option
• B-Tree Index
– Useful for removing low levels of fragmentation
• Columnstore
– Proactively compresses CLOSED rowgroups
• On a large table with heavy fragmentation it is often faster to recreate the table with CREATE TABLE AS SELECT and swap it with the old one
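The recreate-and-swap approach can be sketched as follows, assuming a table dbo.FactInternetSales that should stay hash-distributed on a hypothetical CustomerKey column.

```sql
-- Build a fresh, fully compressed copy of the table.
CREATE TABLE dbo.FactInternetSales_rebuilt
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM dbo.FactInternetSales;

-- Swap the names so queries pick up the new copy.
RENAME OBJECT dbo.FactInternetSales TO FactInternetSales_old;
RENAME OBJECT dbo.FactInternetSales_rebuilt TO FactInternetSales;

-- Drop the old copy once the new table has been validated.
-- DROP TABLE dbo.FactInternetSales_old;
```

Statistics are not carried over by CTAS, so they need to be created again on the new table.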
49. Index Rebuild
• Heap
– Can be rebuilt to remove forward pointers
• B-Tree Index
– Will remove high levels of fragmentation
• Columnstore
– Can increase the density of segments
• Rebuilding an index is an OFFLINE operation in Azure SQL DW
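For reference, rebuild and reorganize are both issued through ALTER INDEX. In this sketch the table name dbo.FactInternetSales and the partition number are placeholders.

```sql
-- Rebuild every index on the table (offline; the table is locked meanwhile).
ALTER INDEX ALL ON dbo.FactInternetSales REBUILD;

-- Rebuild a single partition to shorten the offline window.
ALTER INDEX ALL ON dbo.FactInternetSales REBUILD PARTITION = 5;

-- Lighter-weight defragmentation.
ALTER INDEX ALL ON dbo.FactInternetSales REORGANIZE;
```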
50. Scaling Performance
• Increase the User Resource Class
– EXEC sp_addrolemember 'largerc', 'loaduser';
– Higher Resource Class – more memory and CPU
– More concurrency slots per query – fewer concurrent queries
– The highest role assigned takes precedence
• Increase the Data Warehouse Units
– ALTER DATABASE AWDW MODIFY (SERVICE_OBJECTIVE = 'DW1000');
– It is an OFFLINE operation
– Make sure there are no loads or transactions in progress
– Can also be done through the Azure Portal
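Before and after a scale operation, the current DWU level can be checked with a catalog query like the one below, run while connected to the master database of the logical server.

```sql
-- Lists each database on the server with its edition and current DWU level.
SELECT db.name AS database_name,
       ds.edition,
       ds.service_objective
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db
  ON ds.database_id = db.database_id;
```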
51. Tracking Queries with Labels
User Query
SELECT SUM(Qty)
FROM dbo.FactInternetSales
OPTION (LABEL = 'mylabel');
Admin Query
SELECT *
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'mylabel';
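Once the labeled request has been found, its distributed execution steps can be inspected as well. The sketch below joins the request DMV to sys.dm_pdw_request_steps, reusing the label value 'mylabel' from the slide.

```sql
-- Show each step of the labeled query, in execution order.
SELECT s.request_id,
       s.step_index,
       s.operation_type,
       s.status,
       s.total_elapsed_time,
       s.row_count
FROM sys.dm_pdw_request_steps AS s
JOIN sys.dm_pdw_exec_requests AS r
  ON s.request_id = r.request_id
WHERE r.[label] = 'mylabel'
ORDER BY s.request_id, s.step_index;
```

Steps with long elapsed times and data movement operation types are usually the first place to look when tuning.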
52. DEMO: Labeling a query and tracking its execution