The document provides biographical information about Antonios Chatzipavlis, a SQL Server expert and evangelist. It then summarizes his presentation on statistics and index internals in SQL Server, which covers topics like cardinality estimation, inspecting and updating statistics, index structure and types, and identifying missing indexes. The presentation includes demonstrations of analyzing cardinality estimation and picking the right index key.
4. PresenterInfo
1982 I started working with computers
1988 I started my professional career in computers industry
1996 I started working with SQL Server 6.0
1998 I earned my first certification at Microsoft as
Microsoft Certified Solution Developer (3rd in Greece)
1999 I started my career as Microsoft Certified Trainer (MCT) with
more than 30.000 hours of training until now!
2010 I became for first time Microsoft MVP on Data Platform
I created the SQL School Greece www.sqlschool.gr
2012 I became MCT Regional Lead by Microsoft Learning Program.
2013 I was certified as MCSE : Data Platform
I was certified as MCSE : Business Intelligence
2016 I was certified as MCSE: Data Management & Analytics
Antonios
Chatzipavlis
SQL Server Expert and Evangelist
Data Platform MVP
MCT, MCSE, MCITP, MCPD, MCSD, MCDBA,
MCSA, MCTS, MCAD, MCP, OCA, ITIL-F
5. SQLschool.gr
Μια πηγή ενημέρωσης για τον Microsoft SQL Server προς τους Έλληνες
IT Professionals, DBAs, Developers, Information Workers αλλά και
απλούς χομπίστες που απλά τους αρέσει ο SQL Server.
Help line : help@sqlschool.gr
• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources
What we are doing here Follow us in socials
fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group
SELECT KNOWLEDGE
FROM SQL SERVER
6. ▪ Sign up for a free membership today at sqlpass.org.
▪ Linked In: http://www.sqlpass.org/linkedin
▪ Facebook: http://www.sqlpass.org/facebook
▪ Twitter: @SQLPASS
▪ PASS: http://www.sqlpass.org
10. • Cost-based optimization selects the query execution plan
with the lowest estimated cost for execution
• Statistics are a critical part of cost-based optimization
- Statistics provide information about data distribution in a column or group
of columns
- Columns in tables and indexes might have statistics
Cost-Based Optimization
11. • Predicates
- Expressions which evaluate to true or false
- Found in joins, WHERE and HAVING clauses
• Predicate Selectivity
- The number of rows from a table that meet a predicate
- High selectivity = small percentage of rows returned
- Low selectivity = large percentage of rows returned
• Predicate Selectivity in Query Optimization
- Selectivity used to sequence join operations and choose between
scan/seek
Predicate Selectivity
12. • Statistics header
- Statistics metadata
• Density vector
- Measure of uniqueness
• Histogram
- Data distribution, in up to 200 steps
• DBCC SHOW_STATISTICS
Inspecting Statistics
13. • In SQL Server, cardinality estimation attempts to predict
the number of rows returned by a query or query
operator
• SQL Server 2014 and SQL Server 2016 include rewritten
cardinality estimation logic
• Several factors can cause poor cardinality estimation:
- Out-of-date or missing statistics
- Functions in predicates
- Table variables
Cardinality Estimation
14. • Automatic creation:
• When AUTO_CREATE_STATS = ON
• Single column statistics only
• Names start _WA…
• Manual creation:
• CREATE STATISTICS
• sp_createstats
Creating Statistics
16. • CREATE STATISTICS using a WHERE clause
• Some limitations:
- Limited benefit from AUTO_UPDATE_STATISTICS
- Limited to simple comparison logic
- Cannot be created on all data/column types
- Cannot be created on indexed views
• Histogram has finer resolution
Filtered Statistics
18. Index Internals
• Heap Internals
• Index Structure
• Picking the Right Index Key
• Single Column and Multicolumn Indexes
• Filtered Indexes
• The Query Optimizer’s Choice of Indexes
• Data Modification Internals
• Index Fragmentation
• Index Column Order
• Identify and Create Missing Indexes
19. • A table without a clustered index is a heap.
• IAM pages contain pointers to extents containing heap
data.
- The IAM is the only link between pages in a heap; row data is unordered.
• Heap rows are identified by a row identifier (RID).
- Nonclustered indexes on a heap contain RID pointers
Heap Internals
20. • B-trees:
- Self-balancing tree data structure
- One root node; many non-leaf nodes; many leaf nodes
- Clustered and nonclustered indexes are b-trees
• Index key:
- Defines the order of data in an index
• Clustered index:
- Root node and non-leaf nodes contain index pages; leaf nodes contain data pages
- Index order = table data order
• Nonclustered index:
- All nodes contain index pages
• XML and spatial data types have special indexes
Index Structure
21. • Clustered index key criteria:
- Unique
- Non-nullable
- Narrow
- Static
- Ever-increasing
• Nonclustered index criteria:
- Frequently used predicates
- Join columns
- Aggregate queries
- Avoid redundant/duplicate indexes, wide keys, indexes serving only one query for systems with
a lot of data change
Picking the Right Index Key
22. • Single column index:
- Each predicate column has its own index
- Less performance gain, but more reusable
• Multicolumn index:
- All predicate columns are included in the key of the index
- Greatest performance gain, but limited reusability
Single Column and Multicolumn Indexes
23. • Nonclustered index with a filter predicate:
- Filter predicate defined with a WHERE clause in the index definition
• Better performance than a whole-table index:
- Finer-grained statistics
- Smaller size
• Suitable when table data is queried in clearly identifiable
subsets
• Some restrictions apply
Filtered Indexes
24. • Data access methods for indexes
• Table scan
• Clustered index access
• Nonclustered index seek on heap
• Nonclustered index seek on non-heap
• Covering nonclustered index
• Predicate SARGability:
• WHERE <column> <operator> <value>
• <column> must appear alone
• Leading string wildcards prevent index seek
The Query Optimizer’s Choice of Indexes
25. • Delete:
- Key reference removed from leaf-level page
- A page with no references is removed from the b-tree
• Insert:
- New key reference added to either:
- Existing free space on a page
- New page
• Update:
- A delete followed by an insert
Data Modification Internals
27. • SELECT performance:
- Only the first column has a histogram
- Use the most selective column as the first column, but consider how the
index will be used
• INSERT performance:
- Consider the effect column order will have on INSERT performance for a
clustered multicolumn index
Index Column Order
28. • Query Plans
• Query Store
• Database Engine Tuning Advisor
• Missing index DMVs
• Tools will not:
- Suggest clustered indexes
- Suggest modifications to existing indexes
- Analyze column order in multicolumn indexes
- Suggest filtered indexes
Identify and Create Missing Indexes