This webinar by Volodymyr Trishyn (Senior Software Engineer, Consultant, GlobalLogic) was delivered at On Air webinar #15 on July 31, 2020.
Webinar agenda:
- SQL Database
- Azure SQL Data Warehouse
- Azure SQL Elastic Database Pool
- Geo-replication
- Distributed Transactions
- Transaction Isolation Level
- Table Partitioning
- Materialized View Pattern
More details and presentation: https://www.globallogic.com/ua/about/events/webinar-azure-sql/
6. Adjust performance and scale without downtime
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-technical-overview
7. Elastic pools to maximize resource utilization
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-technical-overview
8. Azure Synapse Analytics
• Stores data in relational tables with columnar storage.
• Leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data.
• Azure Synapse Analytics serves as a key component of a big data solution.
• Import big data into Azure Synapse Analytics with simple PolyBase T-SQL queries.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is
9. Failover groups and active geo-replication.
Azure SQL Database auto-failover groups (in preview) is a SQL Database feature designed to automatically manage the geo-replication relationship, connectivity, and failover at scale.
Because auto-failover groups involve multiple databases, they must be configured on the primary server.
Both the primary and secondary servers must be in the same subscription.
Auto-failover groups replicate all databases in the group to only one secondary server in a different region.
A failover switches all secondary databases in the group to the primary role. After the failover completes, the DNS record is automatically updated to redirect the endpoints to the new region.
Active geo-replication without auto-failover groups allows up to four secondaries in any region.
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-geo-replication-overview
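The failover behaviour described above can be pictured with a toy model: one group, exactly one secondary server, a listener name that always resolves to the current primary, and a failover that flips the roles for every database at once. This is a minimal sketch under those assumptions; the class, server names and group name are hypothetical, and the `<group-name>.database.windows.net` listener format is used only for illustration, not as a call into any Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverGroup:
    """Toy model of an auto-failover group (hypothetical, illustrative only)."""
    name: str
    primary: str      # server currently holding the primary role
    secondary: str    # the single secondary server in another region
    databases: list[str] = field(default_factory=list)
    dns: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The read-write listener always points at the current primary.
        self.dns[self.listener] = self.primary

    @property
    def listener(self) -> str:
        return f"{self.name}.database.windows.net"

    def failover(self) -> None:
        # All databases in the group switch roles together...
        self.primary, self.secondary = self.secondary, self.primary
        # ...and the DNS record is updated so clients reach the new region.
        self.dns[self.listener] = self.primary

    def resolve(self) -> str:
        return self.dns[self.listener]


fog = FailoverGroup("orders-fog", primary="srv-westeurope",
                    secondary="srv-northeurope", databases=["orders", "customers"])
print(fog.resolve())  # srv-westeurope
fog.failover()
print(fog.resolve())  # srv-northeurope
```

Because applications connect through the listener name rather than a server name, their connection strings do not change when the group fails over.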
11. Replication strategy
| Replication strategy | LRS | ZRS | GRS | RA-GRS |
|---|---|---|---|---|
| Data is replicated across multiple datacenters. | No | Yes | Yes | Yes |
| Data can be read from a secondary location as well as the primary location. | No | No | No | Yes |
| Designed to provide durability of objects over a given year. | at least 99.999999999% (11 9's) | at least 99.9999999999% (12 9's) | at least 99.99999999999999% (16 9's) | at least 99.99999999999999% (16 9's) |
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
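To make the durability figures more tangible, the sketch below turns "k nines" into an expected number of lost objects per year. It assumes (this is an interpretation, not a statement from the table) that k nines means each object survives a year with probability 1 - 10^-k and that losses are independent, so the expectation is simply the object count times the per-object loss probability. The 10 million object count is a hypothetical example.

```python
from fractions import Fraction


def loss_probability(nines: int) -> Fraction:
    """Per-object annual loss probability for a durability of `nines` nines.

    Assumption: k nines means survival probability 1 - 10**-k,
    e.g. 11 nines = 99.999999999%.
    """
    return Fraction(1, 10 ** nines)


def expected_losses(objects: int, nines: int) -> Fraction:
    # Linearity of expectation: N objects x per-object loss probability.
    return objects * loss_probability(nines)


for tier, nines in [("LRS", 11), ("ZRS", 12), ("GRS / RA-GRS", 16)]:
    lost = expected_losses(10_000_000, nines)
    print(f"{tier:13} expected objects lost per year: {float(lost):.0e}")
```

Under these assumptions, 10 million objects on LRS give 10^7 x 10^-11 = 10^-4 expected losses per year, i.e. roughly one lost object every 10,000 years on average.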
12. Distributed transactions
Elastic database transactions for SQL Database enable applications to make atomic changes to data stored in several different SQL databases.
The preview focuses on client-side development experiences in C# and .NET.
A server-side experience using T-SQL is planned for a later time.
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-transactions-overview
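Atomic changes across several databases are normally coordinated with a two-phase commit: every participant first prepares and votes, and only a unanimous "yes" lets the coordinator commit everywhere. The sketch below models that protocol in plain Python under simplifying assumptions (no crashes, no timeouts, no durable logs); `Participant` and the database names are hypothetical and this is not the .NET client API the slide refers to.

```python
class Participant:
    """Hypothetical stand-in for one SQL database in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: stage the changes and vote yes (True) or no (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (voting): collect a vote from every database.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (completion): everyone voted yes, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # A single "no" vote rolls back every participant, keeping the change atomic.
    for p in participants:
        p.rollback()
    return False


orders = Participant("OrdersDb")
billing = Participant("BillingDb", can_commit=False)
print(two_phase_commit([orders, billing]))  # False
print(orders.state, billing.state)          # aborted aborted
```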
14. In-Memory technologies in SQL Database.
In-Memory technologies are available in all databases in the Premium tier, including databases in Premium
elastic pools.
• In-Memory OLTP increases throughput and reduces latency for transaction processing. Typical scenarios include high-throughput transaction processing such as trading and gaming, data ingestion from events or IoT devices, caching, data load, and temporary table and table variable scenarios.
• Clustered columnstore indexes reduce your storage footprint (up to 10 times) and improve performance for reporting and analytics queries. You can use them with historical data in your operational database to archive and query up to 10 times more data.
• Nonclustered columnstore indexes for hybrid transactional/analytical processing (HTAP) help you gain real-time insights into your business by querying the operational database directly, without the need to run an expensive extract, transform, and load (ETL) process and wait for Azure Synapse Analytics to be populated.
• You can also combine a memory-optimized table with a columnstore index. This combination enables you to perform very fast transaction processing and to concurrently run analytics queries very quickly on the same data.
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-in-memory
15. Transactions per second
With In-Memory OLTP, Azure SQL Database can achieve 75,000 transactions per second (TPS) in a single database, an 11X performance improvement compared with traditional tables and stored procedures.
| Pricing tier | TPS for In-Memory OLTP | TPS for traditional tables | Performance gain |
|---|---|---|---|
| P15 | 75,000 | 6,800 | 11X |
| P2 | 8,900 | 1,000 | 9X |
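The "Performance gain" column is the ratio of the two TPS columns, rounded to a whole number. A quick check using only the figures from the table:

```python
# Figures copied from the table above (transactions per second).
benchmarks = {
    "P15": {"in_memory": 75_000, "traditional": 6_800},
    "P2": {"in_memory": 8_900, "traditional": 1_000},
}

for tier, tps in benchmarks.items():
    # Gain = In-Memory OLTP throughput / traditional-table throughput.
    gain = tps["in_memory"] / tps["traditional"]
    print(f"{tier}: {gain:.1f}x, rounded to {round(gain)}X")
```

75,000 / 6,800 is about 11.0 and 8,900 / 1,000 is 8.9, which round to the 11X and 9X shown.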
16. Table Partitioning.
• Creating the Partition Function
• Creating the Partition Scheme
• Creating the Partitioned Table
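```sql
-- 9 boundary values create 10 partitions.
-- RANGE LEFT: each boundary value belongs to the partition on its left
-- (partition 1 holds values <= 100000).
CREATE PARTITION FUNCTION PF_HASH_BY_VALUE (BIGINT)
AS RANGE LEFT
FOR VALUES (100000, 200000, 300000, 400000, 500000,
            600000, 700000, 800000, 900000);
```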
https://www.mssqltips.com/sqlservertip/3494/azure-sql-database--table-partitioning/
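The boundary semantics of `PF_HASH_BY_VALUE` can be checked offline. The sketch below mimics what T-SQL's `$PARTITION` function would return for a value: with RANGE LEFT a boundary value stays in the partition on its left, with RANGE RIGHT it moves to the partition on its right. It is an illustration of the mapping rule only, not code that runs against SQL Server.

```python
from bisect import bisect_left, bisect_right

# Boundary values from PF_HASH_BY_VALUE: 9 boundaries -> 10 partitions.
BOUNDARIES = [100000, 200000, 300000, 400000, 500000,
              600000, 700000, 800000, 900000]


def partition_number(value: int, boundaries=BOUNDARIES, range_type="LEFT") -> int:
    """Return the 1-based partition a value lands in (mimics $PARTITION)."""
    if range_type == "LEFT":
        # RANGE LEFT: count boundaries strictly below the value.
        return bisect_left(boundaries, value) + 1
    # RANGE RIGHT: count boundaries less than or equal to the value.
    return bisect_right(boundaries, value) + 1


print(partition_number(100000))                      # 1
print(partition_number(100001))                      # 2
print(partition_number(950000))                      # 10
print(partition_number(100000, range_type="RIGHT"))  # 2
```

The choice between LEFT and RIGHT matters only for rows whose value equals a boundary; everything else lands in the same partition either way.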
17. Materialized View pattern.
Materialized views, which contain only the data required by a query, allow applications to quickly obtain the information they need.
When the source data for the view changes, the view must be updated to include the new information. You can schedule this to happen automatically, or trigger it when the system detects a change to the original data.
https://docs.microsoft.com/en-us/azure/architecture/patterns/materialized-view
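A minimal in-memory sketch of the pattern: the view keeps only the precomputed per-customer totals a query needs, and it can be kept current either by applying each change as it happens or by a scheduled full rebuild. The order data and the `MaterializedView` class are hypothetical, chosen only to show both refresh strategies.

```python
from collections import defaultdict

# Hypothetical source data: (customer, order amount) rows.
orders = [("alice", 30), ("bob", 20), ("alice", 50)]


def build_view(rows):
    """Full rebuild: precompute the total order value per customer."""
    totals = defaultdict(int)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


class MaterializedView:
    def __init__(self, source):
        self.source = source
        self.data = build_view(source)

    def on_insert(self, row):
        # Change-driven refresh: update the source and the view incrementally.
        self.source.append(row)
        customer, amount = row
        self.data[customer] = self.data.get(customer, 0) + amount

    def refresh(self):
        # Scheduled refresh: rebuild the whole view from the source.
        self.data = build_view(self.source)


view = MaterializedView(orders)
print(view.data["alice"])  # 80
view.on_insert(("bob", 5))
print(view.data["bob"])    # 25
```

Incremental updates keep the view fresh cheaply but must handle every kind of change correctly; a scheduled rebuild is simpler but leaves the view stale between runs.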
18. Materialized Views vs Views.
Materialized Views
• Store results, not queries
• Require physical storage
• Updated automatically (in the MS SQL case)
• Shorter execution time
Views
• Store queries, not data
• No physical storage required
• Always reflect current data (the query runs on every read)
• Longer execution time
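The difference between the two lists can be demonstrated with Python's built-in `sqlite3` module. SQLite has no materialized views, so a snapshot table created with `CREATE TABLE ... AS SELECT` stands in for one here (an assumption made for the demo): the view re-runs its query and sees new rows immediately, while the stored results stay stale until refreshed. The `sales` data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('EU', 100), ('US', 200);

    -- A view stores only the query; it is evaluated on every read.
    CREATE VIEW v_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Stand-in for a materialized view: results are stored physically.
    CREATE TABLE mv_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

conn.execute("INSERT INTO sales VALUES ('EU', 50)")

view_eu = conn.execute("SELECT total FROM v_totals WHERE region = 'EU'").fetchone()[0]
mv_eu = conn.execute("SELECT total FROM mv_totals WHERE region = 'EU'").fetchone()[0]
print(view_eu, mv_eu)  # 150 100: the stored results are stale


def refresh_mv(conn):
    # Manual refresh; SQL Server maintains indexed views automatically instead.
    conn.executescript("""
        DELETE FROM mv_totals;
        INSERT INTO mv_totals
            SELECT region, SUM(amount) FROM sales GROUP BY region;
    """)
```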
19. Extensive monitoring and alerting capabilities
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-technical-overview
20. Scale out databases with the shard map manager
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-scale-shard-map-management
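A shard map records which database (shard) owns which range of sharding-key values, so an application can route each request to the right database. The sketch below is a toy range shard map with half-open [low, high) ranges and overlap checks; it is loosely modeled on the idea behind the Elastic Database client library's range shard maps but is not that library's API, and the shard names are hypothetical.

```python
from bisect import bisect_right


class RangeShardMap:
    """Toy range shard map: [low, high) key ranges -> shard database names."""

    def __init__(self) -> None:
        self._lows: list[int] = []
        self._ranges: list[tuple[int, int, str]] = []

    def add_mapping(self, low: int, high: int, shard: str) -> None:
        if low >= high:
            raise ValueError("empty range")
        i = bisect_right(self._lows, low)
        # Reject overlaps with the neighbouring ranges on either side.
        if i > 0 and self._ranges[i - 1][1] > low:
            raise ValueError("overlapping range")
        if i < len(self._ranges) and self._ranges[i][0] < high:
            raise ValueError("overlapping range")
        self._lows.insert(i, low)
        self._ranges.insert(i, (low, high, shard))

    def lookup(self, key: int) -> str:
        # Data-dependent routing: find the last range starting at or below key.
        i = bisect_right(self._lows, key) - 1
        if i >= 0:
            low, high, shard = self._ranges[i]
            if key < high:
                return shard
        raise KeyError(f"no shard mapped for key {key}")


shards = RangeShardMap()
shards.add_mapping(0, 1000, "tenants_db_0")
shards.add_mapping(1000, 2000, "tenants_db_1")
print(shards.lookup(42))    # tenants_db_0
print(shards.lookup(1500))  # tenants_db_1
```

Scaling out then amounts to adding a database and a new mapping, or splitting an existing range and moving its rows.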
24. Azure Synapse Analytics.
SQL Server, in this case the Parallel Data Warehouse appliance version, is tested and scales up to hundreds of terabytes.
For anything in the petabyte range, the solution is not only technical but also architectural: you must plan for backup, disaster recovery, maintenance, etc.
Most likely, you will need to distribute your data across multiple systems to reach the petabyte range.
But it depends a lot on the specific workload.
25. MPP Architecture components.
Azure Synapse Analytics leverages a scale-out architecture to distribute computational processing of data across multiple nodes.
The unit of scale is an abstraction of compute power known as a data warehouse unit (DWU).
Azure Synapse Analytics separates compute from storage, which enables you to scale compute independently of the data in your system.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture
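The scale-out idea can be sketched as scatter-gather: data sits in a fixed set of 60 distributions (the number a dedicated SQL pool uses), each "compute node" computes a partial result over the distributions it handles, and a "control node" combines the partials. In this sketch the thread pool stands in for compute nodes and placement is simple modulo assignment; it shows the shape of MPP execution, not real Synapse internals or real speed-ups (Python threads do not run CPU-bound work in parallel).

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DISTRIBUTIONS = 60  # fixed, independent of how much compute is provisioned


def distribute(rows, n=NUM_DISTRIBUTIONS):
    """Spread rows over n distributions (simple modulo placement for the demo)."""
    buckets = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        buckets[i % n].append(row)
    return buckets


def partial_sum(bucket):
    # Work each compute node does independently on its distributions.
    return sum(bucket)


def mpp_sum(rows, workers=4):
    buckets = distribute(rows)
    # More or fewer "compute nodes" process the same 60 distributions,
    # mirroring how scaling compute leaves the stored data untouched.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, buckets))
    # The "control node" combines the partial results.
    return sum(partials)


print(mpp_sum(range(1, 1001)))  # 500500
```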
26. Hash-distributed tables.
A hash-distributed table can deliver the highest query performance for joins and aggregations on large tables.
• Each row belongs to one distribution.
• A deterministic hash algorithm assigns each row to one distribution.
• The number of table rows per distribution varies, as shown by the different sizes of tables.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture
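The key property is determinism: the same distribution-column value always lands in the same distribution, so rows that join or aggregate on that column are already co-located. The sketch uses MD5 purely as an arbitrary deterministic stand-in (Synapse's internal hash function is not described here) and hypothetical sales rows keyed by customer.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # fixed number of distributions in a dedicated SQL pool


def distribution_for(key: str, n: int = NUM_DISTRIBUTIONS) -> int:
    # Deterministic: the same key always maps to the same distribution.
    # MD5 is an arbitrary stand-in, not Synapse's internal hash function.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


# Hypothetical sales rows hash-distributed on customer_id.
sales = [("cust-1", 10), ("cust-2", 5), ("cust-1", 7), ("cust-3", 2)]
placement = Counter(distribution_for(customer) for customer, _ in sales)

# Both cust-1 rows share a distribution, so GROUP BY customer_id runs locally;
# the uneven counts show why rows per distribution vary (and can skew).
print(placement)
```

Picking a distribution column with many distinct, evenly used values keeps the per-distribution row counts close and avoids skew.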
27. Round-robin distributed tables.
A round-robin table is the simplest table to create and delivers fast performance when used as a staging table
for loads.
A round-robin distributed table distributes data evenly across the table but without any further optimization.
A distribution is first chosen at random and then buffers of rows are assigned to distributions sequentially.
Loading data into a round-robin table is quick, but query performance is often better with hash-distributed tables.
Joins on round-robin tables require reshuffling data, which takes additional time.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture
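The loading rule on this slide (pick a random starting distribution, then hand out buffers of rows to distributions in turn) can be sketched directly. The buffer size of 3 rows is a hypothetical value chosen for the demo; no column value is consulted, which is exactly why a later join has to reshuffle rows.

```python
import random

NUM_DISTRIBUTIONS = 60


def round_robin_load(rows, buffer_size=3, n=NUM_DISTRIBUTIONS, seed=None):
    """Assign buffers of rows to distributions in turn, starting at a random one."""
    rng = random.Random(seed)
    distributions = [[] for _ in range(n)]
    current = rng.randrange(n)  # first distribution chosen at random
    for start in range(0, len(rows), buffer_size):
        distributions[current].extend(rows[start:start + buffer_size])
        current = (current + 1) % n  # next buffer goes to the next distribution
    return distributions


# 360 rows in buffers of 3 -> 120 buffers -> exactly 2 buffers per distribution.
dists = round_robin_load(list(range(360)), buffer_size=3, seed=42)
print({len(d) for d in dists})  # {6}
```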
28. Replicated Tables.
A replicated table provides the fastest query performance for small tables.
A table that is replicated caches a full copy of the table on each compute node. Consequently, replicating a table removes the need to transfer data among compute nodes before a join or aggregation.
Replicated tables are best utilized with small tables.
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/massively-parallel-processing-mpp-architecture
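Why replication removes data movement: if every compute node holds its own full copy of a small dimension table, each node can join its share of the large fact table locally. The product and sales data below are hypothetical, and the two-node layout is only for illustration.

```python
# Hypothetical small dimension table: product_id -> category.
products = {1: "books", 2: "games", 3: "music"}

# Fact rows already spread across two compute nodes: (product_id, amount).
nodes = [
    [(1, 12.0), (3, 4.5)],  # node 0
    [(2, 30.0), (1, 8.0)],  # node 1
]

# Replicated table: every node caches a full copy of the dimension table.
replicas = [dict(products) for _ in nodes]


def local_join(fact_rows, dim_copy):
    # Each node joins against its local copy; no rows cross node boundaries.
    return [(dim_copy[pid], amount) for pid, amount in fact_rows]


joined = [row for node, dim in zip(nodes, replicas) for row in local_join(node, dim)]
print(joined)
```

The trade-off is that every copy must be stored and refreshed on each node, which is why replication pays off only for small tables.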
29. Hub and spokes for thousands of BI users
Use the hub and spoke architecture to achieve the scale you want, at the price and performance level you decide. This architecture also provides resource isolation and advanced security features.
https://azure.microsoft.com/en-us/services/sql-data-warehouse
30. Auto-scale Azure Synapse Analytics for optimized usage
With Azure Functions, you can take full advantage of the elasticity of Azure Synapse Analytics by auto-scaling it and optimizing your cost.
https://azure.microsoft.com/en-us/services/sql-data-warehouse
31. Big-data integration for batch processing of unstructured data.
With Azure Data Factory, you can easily integrate your on-premises and cloud data applications with your data warehouse.
Customers can choose between the leading open-source solution (HDInsight) and .NET (Azure Data Lake) to process unstructured data and load terabytes of results into Azure Synapse Analytics using PolyBase.
https://azure.microsoft.com/en-us/services/sql-data-warehouse
32. Load data from Azure Blob Storage.
• Create external tables
• Load the data into your data warehouse
• Create statistics on newly loaded data
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase