In-memory features are the most promising trend in high-performance data processing. Columnstore indexes are one such feature, and even with their restrictions, they can speed up your queries many times over. How do you get more from this feature? In which situations should you use it? Which internal mechanisms make that possible? This session answers these questions.
3. About me
Denis Reznik
Kiev, Ukraine
Database Architect at The Frayman Group
Microsoft MVP
Community enthusiast
4. Agenda
Columnar storage
Creation of Columnstore index
Usage scenarios and limitations
Performance accelerators
Columnstore Storage internals
Columnstore Execution mode internals
Columnstore index maintenance
Columnstore Future (actually Present :)
5. Row Store and Column Store
In row store, data is stored tuple by tuple.
In column store, data is stored column by column.
6. Row Store and Column Store
Most queries do not process all the attributes of a particular relation.
[Figure: table columns id, name, address, city, state, age]
SELECT c.Name, c.Address
FROM Customers c
WHERE c.City = 'Sofia'
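To make the point concrete, here is a minimal Python sketch of the two layouts. It is a toy model, not how SQL Server stores data: the sample rows are made up, and the helper names (`query_row_store`, `query_column_store`) are hypothetical. It only counts how many attribute values each layout has to read for the query above.

```python
# Toy comparison of row-store and column-store layouts (illustrative only).
columns = ("id", "name", "address", "city", "state", "age")
rows = [
    # Made-up sample data, one tuple per customer.
    (1, "Ana", "1 Main St", "Sofia", "Sofia-grad", 31),
    (2, "Boris", "2 Oak Ave", "Varna", "Varna", 45),
    (3, "Vera", "3 Elm Rd", "Sofia", "Sofia-grad", 28),
]

# Column store: one list per attribute instead of one tuple per row.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def query_row_store(rows):
    """Every full tuple is read, even though only 3 attributes are used."""
    values_read = sum(len(row) for row in rows)
    result = [(r[1], r[2]) for r in rows if r[3] == "Sofia"]
    return result, values_read


def query_column_store(store):
    """Only the city, name and address columns are read."""
    needed = ("city", "name", "address")
    values_read = sum(len(store[c]) for c in needed)
    result = [
        (name, address)
        for name, address, city in zip(store["name"], store["address"], store["city"])
        if city == "Sofia"
    ]
    return result, values_read


row_result, row_values_read = query_row_store(rows)
col_result, col_values_read = query_column_store(column_store)
```

Both layouts return the same rows, but in this toy model the column layout reads half as many values because the id, state and age columns are never touched.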
8. Usage scenarios and limitations
The primary focus of columnstore indexes is data warehouse (DW) databases
In SQL Server 2012, columnstore indexes are read-only
Supported operators and data types are limited
10. How Are These Performance Gains Achieved?
Two complementary technologies:
Storage
Data is stored in a compressed columnar format (stored by column) instead of row store format (stored by row).
New “batch mode” execution
Vector-based query execution capability
Data can then be processed in batches rather than row by row
Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O
11. Compression
Patented VertiPaq algorithms
So there is no public information about exactly how the data is compressed
But some details are known:
Dictionary encoding
Run Length encoding
Bit-Vector encoding
…
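As an illustration of the first two encodings named above, here is a minimal Python sketch of textbook dictionary encoding followed by run-length encoding. It is not the VertiPaq implementation, whose details are not public; the function names and the sample column are assumptions made for this example.

```python
def dictionary_encode(values):
    """Replace each value with a small integer id from a per-column dictionary."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids


def run_length_encode(ids):
    """Collapse runs of equal ids into [id, run_length] pairs."""
    runs = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return runs


def decode(dictionary, runs):
    """Reverse both encodings to recover the original column values."""
    reverse = {i: value for value, i in dictionary.items()}
    return [reverse[i] for i, length in runs for _ in range(length)]


# Made-up sample column with few distinct values and long runs.
city = ["Sofia", "Sofia", "Sofia", "Kiev", "Kiev", "Sofia"]
dictionary, ids = dictionary_encode(city)
runs = run_length_encode(ids)
```

Columns with few distinct values and long runs of repeats compress best under this scheme, which is one reason the order of rows within a segment can matter.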
16. Memory management
• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
• A columnstore segment is the unit of data transfer between disk and memory
[Figure: segments of columns T.C1 to T.C4 on disk; only the T.C2 and T.C4 segments needed by the query are fetched into memory]
SELECT C2, SUM(C4)
FROM T
GROUP BY C2;
17. Batch mode processing
Batch object
bitmap of qualifying rows
Column vectors
Process ~1000 rows at a time
Vector operators implemented
Greatly reduced CPU time (7 to 40X)
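The batch object described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not SQL Server's internal structure: the `Batch` class, the helper functions, and the sample table are all hypothetical. It shows the key idea that a filter only updates the bitmap of qualifying rows, while later operators work on whole column vectors.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """A batch: one vector per column plus a bitmap of qualifying rows."""
    columns: dict
    qualifying: list = field(default_factory=list)

    def __post_init__(self):
        # Initially every row in the batch qualifies.
        if not self.qualifying:
            row_count = len(next(iter(self.columns.values())))
            self.qualifying = [True] * row_count


def filter_batch(batch, column, predicate):
    """Vector filter: rows are not copied, only the bitmap is updated."""
    vector = batch.columns[column]
    batch.qualifying = [q and predicate(v) for q, v in zip(batch.qualifying, vector)]
    return batch


def sum_batch(batch, column):
    """Vector aggregate over the rows still marked as qualifying."""
    return sum(v for q, v in zip(batch.qualifying, batch.columns[column]) if q)


def batches(columns, batch_size):
    """Split column vectors into batches of at most batch_size rows."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, batch_size):
        yield Batch({name: vec[start:start + batch_size] for name, vec in columns.items()})


# Made-up table: 2500 rows, processed about 1000 rows at a time.
table = {"C2": [i % 3 for i in range(2500)], "C4": list(range(2500))}
total = 0
batch_count = 0
for batch in batches(table, 1000):
    filter_batch(batch, "C2", lambda v: v == 0)
    total += sum_batch(batch, "C4")
    batch_count += 1
```

Per-row overhead (operator calls, bookkeeping) is paid once per batch rather than once per row, which is where the CPU savings come from in this style of execution.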
18. Segment Elimination
• Segment (rowgroup) = 1 million row chunk
• Min, Max kept for each column in a segment
• Scans can skip segments based on this info
column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131     (skipped)
1          2           20120115     20120215
1          3           20120201     20120228
select Date, count(*)
from dbo.Purchase
where Date >= '20120201'
group by Date
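The skip decision for this query can be sketched in a few lines of Python, using the per-segment min and max values from the slide. The function name `segments_to_scan` is hypothetical, and the sketch covers only a `>=` predicate on a single column.

```python
# Per-segment metadata for the Date column, using the values from the slide.
segments = [
    {"segment_id": 1, "min_data_id": 20120101, "max_data_id": 20120131},
    {"segment_id": 2, "min_data_id": 20120115, "max_data_id": 20120215},
    {"segment_id": 3, "min_data_id": 20120201, "max_data_id": 20120228},
]


def segments_to_scan(segments, lower_bound):
    """Keep only segments that may contain values >= lower_bound."""
    return [s["segment_id"] for s in segments if s["max_data_id"] >= lower_bound]


# WHERE Date >= '20120201'
to_scan = segments_to_scan(segments, 20120201)
skipped = [s["segment_id"] for s in segments if s["segment_id"] not in to_scan]
```

Segment 1 is skipped because its maximum (20120131) is below the filter's lower bound. Segment 2 must still be scanned even though some of its rows fall outside the range, since the metadata only proves that a segment cannot match, never that all of it does.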
20. Maintaining Data in a Columnstore Index
Once built, the table becomes “read-only”, and INSERT/UPDATE/DELETE/MERGE are no longer allowed
ALTER INDEX REBUILD / REORGANIZE are not allowed
How can I modify index data?
Drop the columnstore index / make modifications / re-create the columnstore index
UNION ALL (but be sure to validate performance)
Partition switches (IN and OUT)
21. Columnstore Index Future
Actually, it has already arrived
Columnstore indexes can be clustered (in SQL Server 2014)
Clustered columnstore indexes are updatable (in SQL Server 2014)
Updated data (deltas) is stored in a rowstore until a segment can be created
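The delta mechanism in the last point can be modeled with a small Python sketch. It is a toy under stated assumptions: the class `ToyClusteredColumnstore` is hypothetical, the segment size is tiny for demonstration, and the conversion happens synchronously on insert, which is a simplification of what the engine does.

```python
class ToyClusteredColumnstore:
    """Toy model: new rows land in a rowstore delta, then become column segments."""

    def __init__(self, rows_per_segment):
        self.rows_per_segment = rows_per_segment  # hypothetical, tiny for the demo
        self.delta_rows = []  # rowstore part: a list of row dicts
        self.segments = []    # columnar part: each segment maps column -> values

    def insert(self, row):
        self.delta_rows.append(row)
        if len(self.delta_rows) >= self.rows_per_segment:
            self._compress_delta()

    def _compress_delta(self):
        """Pivot the accumulated rows into one columnar segment."""
        column_names = self.delta_rows[0].keys()
        segment = {c: [r[c] for r in self.delta_rows] for c in column_names}
        self.segments.append(segment)
        self.delta_rows = []

    def scan(self, column):
        """A scan must read both the compressed segments and the delta rows."""
        for segment in self.segments:
            yield from segment[column]
        for row in self.delta_rows:
            yield row[column]


index = ToyClusteredColumnstore(rows_per_segment=4)
for i in range(10):
    index.insert({"id": i, "amount": i * 10})
```

After ten inserts the toy index holds two full segments and two rows still waiting in the delta, and a scan of either column returns all ten values.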