Compression: a hidden Gem for IO heavy Databases
The limiting factor in most database systems is the ability to read and write data to the IO subsystem.
We're still using storage layouts and methodologies in SQL Server that are a reflection of old spinning media in times gone by.
Until major changes are made to the internal storage layouts, we have "some" hope with options such as data compression, sparse columns and filtered indexes, which not only save space on disk, but also reflect a saving in memory.
In this session we will go over the IO savings technologies presented in SQL Server, and discuss how implementing some of these will assist in your operational performance goals.
Presenter: Charley Hanania, MVP
Charley is Principal Consultant at QS2 AG in Switzerland and has consulted to organisations of all sizes during his extensive career in Database and Platform Consulting.
He's been focussed on SQL Server since v4.2 on OS/2 and with over 15 years of experience in IT he's supported companies in the areas of DB training, development, architecture & administration throughout Europe, America & Australasia.
Communities are Charley's passion and he became active in database communities in the mid 90's, participating in heterogeneous database user groups in Australia. He continues to lead an active role through community events such as Database Days, the European PASS Conference, PASS & the Swiss PASS Chapter.
3. December 2013 Meeting Agenda
Meeting Agenda:
15:30 - 15:45 :
15:45 - 16:00 :
16:00 - 17:15 :
17:15 - 17:45 :
17:45 - 19:00 :
19:00 -
Arrival and Registration
Introduction & Welcome
HDInsight: Introduction to Hadoop on Windows Azure
Break & Refreshments
Compression: a hidden Gem for IO heavy Databases
End / Drinks
4. Session 1:
HDInsight: Introduction to Hadoop on Windows Azure
Speaker: Scott Klein – Microsoft Corp
Apache Hadoop is open source software that was born out of the
large web properties need to process massive amounts of data. Now
Hadoop is becoming the de facto platform for big data storage and
processing in the enterprise. A modern Hadoop-based data platform
is a combination of multiple open source projects brought together
tested and integrated to create an enterprise-grade platform.
In this session we review the Hadoop projects required in a Windows
Hadoop platform and how it integrates with Microsoft SQL Server
Database Platform analysis tools. We drill down into Hadoop
integration and implementation Windows, Windows Azure and
integration with the SQL Server Data Platform. We will also take a
detailed look at Hadoop in Windows Azure, called HDInsight, and
how it provides the enterprise processing required with todays data
and information.
5. Session 2:
Compression: a hidden Gem for IO heavy Databases
Speaker: Charley Hanania, MVP
The limiting factor in most database systems is the ability to read and
write data to the IO subsystem.
We're still using storage layouts and methodologies in SQL Server
that are a reflection of old spinning media in times gone by.
Until major changes are made to the internal storage layouts, we
have "some" hope with options such as data compression, sparse
columns and filtered indexes, which not only save space on disk, but
also reflect a saving in memory.
In this session we will go over the IO savings technologies presented
in SQL Server, and discuss how implementing some of these will
assist in your operational performance goals.
6. Virtual Chapter Meetings – December
VIRTUAL CHAPTER
MEETING
TOPIC
Performance
Dec 5 – 14:00 (GMT-05:00)
Natural Born Killers, Performance Issues to
Avoid – presented by Richard Douglas
Application Development
Dec 6 – 12:00pm (GMT-07:00)
Isn’t That Spatial: Spatial Data for the App
Dev – presented by Jason Horner
Virtualization
Dec 11 – 10:00am (GMT – 07:00)
Virtualization Q&A with Brent Ozar
Business Intelligence
Dec 17 – 12:00 (GMT+12:00)
Inferred Dimension Members within MDS
and SISS – presented by Reza Rad
Big Data
Dec 17 – 14:00 (GMT -05:00)
Building a Big Data Architecture –
presented by Joey D’Antoni
Global Russian
Dec 18
SQL Server Data Quality Services –
presented by Константин Хомяков
7. PASS Virtual Chapters Listing
Check out the sqlpass.org for more information on all the Virtual Chapters including:
•
•
•
•
•
•
•
•
•
•
•
7
Application Development
Azure
Big Data
Book Readers
Business Analytics
Business Intelligence
Data Architecture
DBA
DBA Fundamentals
Healthcare
Master DataData Quality
•
•
•
•
•
•
•
•
•
•
•
Performance
Oracle
Powershell
Professional Development
Security
Virtualization
Women in Technology
Virtual PASS Portuguese
Global Chinese
Global Spanish
Global Russian
8. JOIN US for our second annual event to get the best learning for
analyzing, managing, and sharing business information and
insights through the Microsoft Data Platform of technologies.
9. Stay Involved!
•
Sign up for a free membership today at sqlpass.org
•
•
•
•
Linked In: Professional Association for SQL Server
Facebook: Professional Association for SQL Server Group
Twitter: @SQLPASS
The PASS Blog: sqlpass.org
11. Compression
a hidden Gem for IO heavy Databases
Charley Hanania :: QS2 AG – Quality Software Solutions
B.Sc (Computing), MCP, MCDBA, MCITP, MCTS, MCT, Microsoft MVP: SQL Server
Principal Consultant - Senior Database Specialist
12. Presentation Summary
The limiting factor in most database systems is the ability to
read and write data to the IO subsystem.
We're still using storage layouts and methodologies in SQL
Server that are a reflection of old spinning media in times
gone by.
Until major changes are made to the internal storage layouts,
we have "some" hope with options such as data
compression, sparse columns and filtered indexes, which
not only save space on disk, but also reflect a saving in
memory.
In this session we will go over the IO savings technologies
presented in SQL Server, and discuss how implementing some
of these will assist in your operational performance goals.
13. The idea behind compression
•
Save on storage costs
•
•
SAN based storage
Backup storage
• Improve Read Performance
•
•
IO is generally the worst performer in Database infrastructure
Lower IO & Higher CPU = reduced transactional turnaround*
* In most cases
14. Vardecimal Implementation
• Introduced in SQL Server 2005 Service Pack 2 (SP2)
• Marked as deprecated shortly thereafter in preference to
Row and Page Compression
• Applied to a whole table
• Stored using scientific notation
•
(sign) * mantissa * 10exponent
15. Row Compression Implementation
• Only changes the physical storage format of the data
that is associated with a data type.
• Uses variable-length storage format for numeric types.
• Stores fixed character strings using variable-length
format by not storing the blank characters.
•
NULL and 0 values across all data types are optimized and take no bytes.
16. Page Compression Implementation
• Does the following:
1.
2.
3.
4.
Row compression
Prefix compression*
Dictionary compression
Unicode Compression
Prefix Compression:
• Value is identified that can be used
to reduce the storage space for the
values in each column.
• Row is created to identify the prefix.
• Repeated prefix values in the column
are replaced by a reference
*at the byte level, page compression is data type agnostic
Images from
SQL BOL
17. Page Compression Implementation
• Does the following:
1.
2.
3.
4.
Row compression
Prefix compression
Dictionary compression
Unicode Compression
Dictionary Compression:
• Searches for repeated values
anywhere on the page, and stores
them in the CI area
• Is not restricted to one column as
per prefix
CI = compression information structure
18. Page Compression Implementation
• Does the following:
1.
2.
3.
4.
Row compression
Prefix compression
Dictionary compression
Unicode Compression
Unicode Compression:
• Works on Row and Page compressed data
• Reduces length of datatype storage due to underutilised storage (double byte) capability.
• NCHAR & NVARCHAR Data can see savings of
up to 50%*
• Does not work for off row or in nvarchar(max)
* 50% for data that is wholly single byte
20. Sparse Columns
• A type of column storage where a NULL takes no storage
at all
• normal NULL columns require space to indicate that they are
NULL
• The data is stored internally in a different form
• The Keyword SPARSE is used in the column creation
definition
• Sparse columns can also be used as a column set, which
then allows you to return data that is only in sparse
columns
• Best used when you have many columns that are mostly
going to be empty/NULL, otherwise slightly innefficient.
21. A look into how to implement & digest sparse columns
Experiment
22. Filtered Indexes
• An index with a WHERE clause!
• Can be used with Index Intersection etc in the Query
Optimiser
• Allows statistics to be more focused as the dataset within
the index is distilled
• Compression Capable
• Overhead is in the Administration
23. Simple example of efficiency introduced through a Filtered Index
Experiment
24. Links and Resources
• Storing Decimal Data As Variable Length
•
Mark Rasmussen
• The Anatomy of Vardecimals
• SQL Server Diagnostic Information Queries
•
Glenn Berry
• Pro SQL Server 2012 Relational Database Design and
Implementation
• Microsoft SQL Server 2008 Internals
• SQL Server 2014 : Native backup encryption & compression
•
Aaron Bertrand
• Tonight’s T-SQL Code (http://bit.ly/1eiTOwT)
The first column actually takes up zero bytes, as it matches the stored column prefix exactly – a saving of four bytes!All the others store a single byte that identifies how many bytes to use from the stored column prefix, as well as the differential bytes coming afterwards.