Statistics and Indexes Internals

Statistics
and
Index Internals
Athens May 31, 2017

PresenterInfo
1982 I started working with computers
1988 I started my professional career in computers industry
1996 I started working with SQL Server 6.0
1998 I earned my first certification at Microsoft as
Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as Microsoft Certified Trainer (MCT) with
more than 30.000 hours of training until now!
2010 I became for first time Microsoft MVP on Data Platform
I created the SQL School Greece www.sqlschool.gr
2012 I became MCT Regional Lead by Microsoft Learning Program.
2013 I was certified as MCSE : Data Platform
I was certified as MCSE : Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
Antonios
Chatzipavlis
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA,
MCSA, MCTS, MCAD, MCP, OCA, ITIL-F

SQLschool.gr
Μια πηγή ενημέρωσης για τον Microsoft SQL Server προς τους Έλληνες
IT Professionals, DBAs, Developers, Information Workers αλλά και
απλούς χομπίστες που απλά τους αρέσει ο SQL Server.
Help line : help@sqlschool.gr
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources
What we are doing here Follow us in socials
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group
SELECT KNOWLEDGE
FROM SQL SERVER

▪ Sign up for a free membership today at sqlpass.org.
▪ Linked In: http://www.sqlpass.org/linkedin
▪ Facebook: http://www.sqlpass.org/facebook
▪ Twitter: @SQLPASS
▪ PASS: http://www.sqlpass.org

PresentationContent
• Statistics and Cardinality Estimation
• Index Internals

Statistics and Cardinality Estimation
• Cost-Based Optimization
• Predicate Selectivity
• Inspecting Statistics
• Cardinality Estimation
• Creating Statistics
• Updating Statistics
• Filtered Statistics

• Cost-based optimization selects the query execution plan
with the lowest estimated cost for execution
• Statistics are a critical part of cost-based optimization
- Statistics provide information about data distribution in a column or group
of columns
- Columns in tables and indexes might have statistics
Cost-Based Optimization

• Predicates
- Expressions which evaluate to true or false
- Found in joins, WHERE and HAVING clauses
• Predicate Selectivity
- The number of rows from a table that meet a predicate
- High selectivity = small percentage of rows returned
- Low selectivity = large percentage of rows returned
• Predicate Selectivity in Query Optimization
- Selectivity used to sequence join operations and choose between
scan/seek
Predicate Selectivity

• Statistics header
- Statistics metadata
• Density vector
- Measure of uniqueness
• Histogram
- Data distribution, in up to 200 steps
• DBCC SHOW_STATISTICS
Inspecting Statistics

• In SQL Server, cardinality estimation attempts to predict
the number of rows returned by a query or query
operator
• SQL Server 2014 and SQL Server 2016 include rewritten
cardinality estimation logic
• Several factors can cause poor cardinality estimation:
- Out-of-date or missing statistics
- Functions in predicates
- Table variables
Cardinality Estimation

• Automatic creation:
• When AUTO_CREATE_STATS = ON
• Single column statistics only
• Names start _WA…
• Manual creation:
• CREATE STATISTICS
• sp_createstats
Creating Statistics

• Automatic update:
• AUTO_UPDATE_STATISTICS = ON
• Automatic update threshold
• AUTO_UPDATE_STATS_ASYNC option
• Manual update:
• UPDATE STATISTICS command
• sp_updatestats
• Updating statistics causes query plans to be recompiled
Updating Statistics

• CREATE STATISTICS using a WHERE clause
• Some limitations:
- Limited benefit from AUTO_UPDATE_STATISTICS
- Limited to simple comparison logic
- Cannot be created on all data/column types
- Cannot be created on indexed views
• Histogram has finer resolution
Filtered Statistics

DEMO
Analyzing Cardinality
Estimation

Index Internals
• Heap Internals
• Index Structure
• Picking the Right Index Key
• Single Column and Multicolumn Indexes
• Filtered Indexes
• The Query Optimizer’s Choice of Indexes
• Data Modification Internals
• Index Fragmentation
• Index Column Order
• Identify and Create Missing Indexes

• A table without a clustered index is a heap.
• IAM pages contain pointers to extents containing heap
data.
- The IAM is the only link between pages in a heap; row data is unordered.
• Heap rows are identified by a row identifier (RID).
- Nonclustered indexes on a heap contain RID pointers
Heap Internals

• B-trees:
- Self-balancing tree data structure
- One root node; many non-leaf nodes; many leaf nodes
- Clustered and nonclustered indexes are b-trees
• Index key:
- Defines the order of data in an index
• Clustered index:
- Root node and non-leaf nodes contain index pages; leaf nodes contain data pages
- Index order = table data order
• Nonclustered index:
- All nodes contain index pages
• XML and spatial data types have special indexes
Index Structure

• Clustered index key criteria:
- Unique
- Non-nullable
- Narrow
- Static
- Ever-increasing
• Nonclustered index criteria:
- Frequently used predicates
- Join columns
- Aggregate queries
- Avoid redundant/duplicate indexes, wide keys, indexes serving only one query for systems with
a lot of data change
Picking the Right Index Key

• Single column index:
- Each predicate column has its own index
- Less performance gain, but more reusable
• Multicolumn index:
- All predicate columns are included in the key of the index
- Greatest performance gain, but limited reusability
Single Column and Multicolumn Indexes

• Nonclustered index with a filter predicate:
- Filter predicate defined with a WHERE clause in the index definition
• Better performance than a whole-table index:
- Finer-grained statistics
- Smaller size
• Suitable when table data is queried in clearly identifiable
subsets
• Some restrictions apply
Filtered Indexes

• Data access methods for indexes
• Table scan
• Clustered index access
• Nonclustered index seek on heap
• Nonclustered index seek on non-heap
• Covering nonclustered index
• Predicate SARGability:
• WHERE <column> <operator> <value>
• <column> must appear alone
• Leading string wildcards prevent index seek
The Query Optimizer’s Choice of Indexes

• Delete:
- Key reference removed from leaf-level page
- A page with no references is removed from the b-tree
• Insert:
- New key reference added to either:
- Existing free space on a page
- New page
• Update:
- A delete followed by an insert
Data Modification Internals

Index Fragmentation
Unfragmented Index
Externally Fragmented Index
Internally Fragmented Index

• SELECT performance:
- Only the first column has a histogram
- Use the most selective column as the first column, but consider how the
index will be used
• INSERT performance:
- Consider the effect column order will have on INSERT performance for a
clustered multicolumn index
Index Column Order

• Query Plans
• Query Store
• Database Engine Tuning Advisor
• Missing index DMVs
• Tools will not:
- Suggest clustered indexes
- Suggest modifications to existing indexes
- Analyze column order in multicolumn indexes
- Suggest filtered indexes
Identify and Create Missing Indexes

DEMO
Picking the Right Index Key

Statistics and Indexes Internals

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Statistics and Indexes Internals

Similar a Statistics and Indexes Internals (20)

Más de Antonios Chatzipavlis

Más de Antonios Chatzipavlis (20)

Último

Último (20)

Statistics and Indexes Internals