3. WHAT WE’RE GOING TO TALK ABOUT
• Why indexes?
• How indexes are used
• DB Storage Basics
• Index Structure
• Clustered vs. nonclustered indexes
• Included columns, fill factor and covering indexes
11. TYPES OF INDEXES
• Clustered
• Nonclustered
• Heaps
• Column Store
• XML
• Spatial
• Full-Text
12. HOW DO INDEXES WORK?
In order to understand the magic of indexes, we need to understand how SQL Server stores data.
13. HOW SQL SERVER STORES INFORMATION
Pages
• Used to store just about everything.
• Page Size: 8 KB.
• Page Header: Metadata about the page (page number, type, etc.).
• Offset Array: Pointers to where each row begins.
[Diagram: a page with the Page Header at the top, the Records in the middle and the Offset Array at the end.]
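Because every page is 8 KB, a table's size can be read straight off its page count. A minimal sketch, assuming a table named dbo.Posts (as in the Stack Overflow sample database used later in the deck) exists in the current database:

```sql
-- Assumes dbo.Posts exists in the current database.
-- One row per index and partition: index_id 0 = heap, 1 = clustered, >1 = nonclustered.
SELECT ps.index_id,
       ps.row_count,
       ps.used_page_count,
       ps.used_page_count * 8 AS used_kb   -- each page is 8 KB
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.Posts')
ORDER BY ps.index_id;
```

Reading sys.dm_db_partition_stats needs VIEW DATABASE STATE permission.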
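14. HOW SQL SERVER STORES INFORMATION
[Diagram: the same page shown three times as rows are added. Each copy has a page header; the first holds Row 1, the second Rows 1 and 2, the third Rows 1 to 3. The offset array at the end of each page grows backwards with the rows: 1; 2 1; 3 2 1.]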
18. HOW ARE INDEXES STORED
CREATE NONCLUSTERED INDEX IX_Posts_OwnerUserId
ON Posts (OwnerUserId);
SELECT OwnerUserId, Title
FROM Posts
WHERE OwnerUserId = 160;
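[Diagram: B-tree for IX_Posts_OwnerUserId. The root page covers keys 1-400; the next level splits it into 1-200 and 201-400, then into 1-100, 101-200, 201-300 and 301-400; the leaf pages hold 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350 and 351-400. A seek for 160 reads one page per level: 1-400, 1-200, 101-200, 151-200.]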
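One way to watch the B-tree walk is to count the pages a query touches. A sketch, assuming the Stack Overflow sample schema (dbo.Posts with OwnerUserId and Title) and that IX_Posts_OwnerUserId has been created. If the optimizer picks a seek, "logical reads" in the Messages output should be roughly one page per index level plus the reads needed to fetch Title for each matching row:

```sql
-- Assumes dbo.Posts (OwnerUserId, Title) and the IX_Posts_OwnerUserId index.
SET STATISTICS IO ON;   -- report 8 KB pages read per table in the Messages output

SELECT OwnerUserId, Title
FROM dbo.Posts
WHERE OwnerUserId = 160;

SET STATISTICS IO OFF;
```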
19. CLUSTERED INDEXES
• What is a table, Alex?
• Only one per table!
• The key can be more than one column.
• Must be unique (SQL Server will help if it isn’t by adding a hidden 4-byte uniquifier to rows with duplicate keys).
• Can’t be more than 900 bytes combined.
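The rules above in one statement. A minimal sketch on a hypothetical dbo.OrderLines table (the table and index names are illustrative, not from the demo):

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.OrderLines
(
    OrderId    int NOT NULL,
    LineNumber int NOT NULL,
    ProductId  int NOT NULL,
    Quantity   int NOT NULL
);

-- Only one clustered index per table: it defines the order the rows are stored in.
-- The key spans two columns (two 4-byte ints = 8 bytes, far below the 900-byte limit).
CREATE UNIQUE CLUSTERED INDEX CX_OrderLines
    ON dbo.OrderLines (OrderId, LineNumber);
```

Leaving out UNIQUE would still succeed; SQL Server would then add the uniquifier to duplicate keys. A second CREATE CLUSTERED INDEX on the same table fails, because the table already has one.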
21. NONCLUSTERED INDEXES
• Subset of data optimized for searching.
• Key can be more than one column.
• Can’t be more than 900 bytes combined (1,700 bytes starting with SQL Server 2016).
• Doesn’t have to be unique.
• Max of 999 per table.
(If I find 999 indexes on your table I will find you…)
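A sketch of a non-unique, two-column nonclustered key, plus a quick check of how close a table is to the 999 limit. It assumes the Stack Overflow sample schema, where dbo.Posts also has a CreationDate column; the index name is illustrative:

```sql
-- Assumes dbo.Posts has OwnerUserId and CreationDate (Stack Overflow sample schema).
-- Non-unique: many posts share the same OwnerUserId.
CREATE NONCLUSTERED INDEX IX_Posts_OwnerUserId_CreationDate
    ON dbo.Posts (OwnerUserId, CreationDate);

-- How many nonclustered indexes does the table already have? (The limit is 999.)
SELECT COUNT(*) AS nonclustered_index_count
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Posts')
  AND type_desc = N'NONCLUSTERED';
```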
24. DEMO REVIEW
• Key lookups are slow. Use included columns (INCLUDE) to avoid them.
• Put columns in the index key if you intend to search on them.
• Column order in the index key matters!
• Suggested indexes may not be the
best way to go.
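The first three review points can be combined in one covering index. A sketch, assuming dbo.Posts has OwnerUserId, CreationDate and Title as in the Stack Overflow sample schema; the index name is hypothetical:

```sql
-- Key columns are what we search and sort on. Because OwnerUserId comes first,
-- this index can seek on OwnerUserId alone or on OwnerUserId + CreationDate,
-- but not on CreationDate alone.
-- INCLUDE stores Title only at the leaf level, so the query below is covered
-- and needs no key lookup into the clustered index.
CREATE NONCLUSTERED INDEX IX_Posts_OwnerUserId_CreationDate_Incl_Title
    ON dbo.Posts (OwnerUserId, CreationDate)
    INCLUDE (Title);

SELECT OwnerUserId, CreationDate, Title
FROM dbo.Posts
WHERE OwnerUserId = 160
ORDER BY CreationDate;   -- rows come back already in key order, so no sort is needed
```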
25. SORRY, YOUR PAGE IS IN ANOTHER CASTLE
[Diagram: a page split. Two pages, each with a page header and an offset array, holding rows with Ids 2, 3, 6, 7, 12 and 14.]
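Page splits leave pages out of logical order, and one way to see their after-effects is logical fragmentation. A sketch, assuming dbo.Posts exists in the current database:

```sql
-- LIMITED mode reads only the pages above the leaf level, making it the cheapest scan.
-- avg_fragmentation_in_percent is the share of leaf pages that are out of logical order.
SELECT i.name AS index_name,
       ps.index_id,
       ps.page_count,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Posts'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```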
26. FILL FACTOR
• Minimizes page splits by leaving empty
space in the page.
• Is a number 0 to 100.
• A value of 80 means that 80% of the
page will be filled with data.
• 0 and 100 both mean that 100% of the
page will be filled with data.
• Only applies to leaf pages (unless PAD_INDEX is ON).
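Fill factor is applied only when an index is created or rebuilt; SQL Server does not keep the free space topped up as rows are inserted afterwards. A sketch, assuming dbo.Posts and an existing index named IX_Posts_OwnerUserId:

```sql
-- Assumes dbo.Posts and the IX_Posts_OwnerUserId index already exist.
-- Rebuild leaving 20% free space in each leaf page; PAD_INDEX = ON applies
-- the same fill factor to the intermediate-level pages as well.
ALTER INDEX IX_Posts_OwnerUserId ON dbo.Posts
    REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);

-- fill_factor = 0 means the default was used (pages filled completely).
SELECT name, fill_factor, is_padded
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Posts');
```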
27. DISADVANTAGES OF INDEXES
• Increases storage.
• Creates duplicate copies of data.
• Inserts, updates and deletes are slower.
• The database has to write each change to the table and to all of the related indexes.
• Additional maintenance.
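To weigh those costs against the benefits, compare how often each index is read with how often it is written. A sketch, assuming dbo.Posts in the current database; note these counters are reset when the SQL Server instance restarts:

```sql
-- An index with many user_updates but few seeks/scans/lookups costs writes
-- without paying them back in faster reads.
SELECT i.name AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.Posts');
```

NULL counters (from the LEFT JOIN) mean the index has not been used since the counters were last reset.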
You're working on an app. You spent the last two months on it, day and night, and finally got it delivered. The users are shocked, like they got hit with a bolt of lightning. (Slide: Doc with shock. Comic bubble: "Great Scott!") It's really slow: dog slow, 1955 slow. You walk into the DBA's office and get an earful (Slide: Ozar with pan). He starts mumbling something about there being no indexes. You run out of his office ashamed, with a healthy fear for your life.
Doc Slide
Clustered: With a clustered index, one or more columns are selected as the key columns for the index. These columns are used to sort and store the data in the table. The column(s) used as the key columns for a clustered index are selected based on the most frequently used data path to the records in the table. Only one clustered index per table.

Nonclustered: In a nonclustered index, columns are selected and sorted based on their values. These columns contain a reference to the clustered index or heap location of the data they are related to. There can be many nonclustered indexes on a table, in fact up to 999.

Heap: In a heap, the first row added to the index is the first record in the table, the second row is the second record in the table, the third row is the third record in the table, and so on. There is nothing in the data that is used to specify the order in which the data has been added. The data and records are in the table without any particular order.

Column Store: Column store indexes are completely new to SQL Server 2012. Traditionally, indexes are stored in row-based organization, also known as row store. This form of storage is extremely efficient when one row or a small range is requested. When a large range or all rows are returned, this organization can become inefficient. The column store index favors the return of large ranges of rows by storing data in column-wise organization. When you create a column store index, you typically include all the columns in a table. This ensures that all columns are included in the enhanced performance benefits of the column store organization. In a column store index, instead of storing all of the columns for a record together, each column is stored separately with all of the other rows in an index. The benefit of this type of index is that only the columns and rows required for a query need to be read. In data warehousing scenarios, often less than 15 percent of the columns in an index are needed for the results of a query. Column store indexes do have a few restrictions on them when compared to other indexes. To begin with, data modifications, such as those through INSERT, UPDATE, and DELETE statements, are disallowed. For this reason, column store indexes are ideally suited to large data warehouses where the data does not change frequently. They also take significantly longer to create; at the time of this writing, they average two to three times longer than the time to create a similar nonclustered index.

XML: For every node in an XML document an entry is made in the XML index. This information is persisted in internal tables that SQL Server can use to determine whether the XML document contains the data that is being queried. Creating and maintaining XML indexes can be quite costly. Every time the index is updated, it needs to shred all of the nodes of the XML document into the XML index. The larger the XML document, the more costly this process will be. However, if data in an XML column will be queried often, the cost of creating and maintaining an XML index can be offset quickly by removing the need to shred all of the XML documents at runtime.

Spatial: Spatial indexes dissect the spatial information that is provided into a four-level representation of the data. This representation allows SQL Server to plot out the spatial information, both geometry and geography, in the record to determine where rows overlap and the proximity of one point to another point. There are a few restrictions that exist with spatial indexes. The main restriction is that spatial indexes must be created on tables that have primary keys. Without a primary key, the spatial index creation will not succeed. Spatial index creation is restricted from using parallel processing, and only a single spatial index can be built at a time. Also, spatial indexes cannot be used on indexed views.

Jason Strate, Ted Krueger. Expert Performance Indexing for SQL Server 2012 (Kindle Locations 461-472). Apress.

Full-Text: Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase. A full-text query returns any documents that contain at least one match (also known as a hit). A match occurs when a target document contains all the terms specified in the full-text query, and meets any other search conditions, such as the distance between the matching terms.

http://technet.microsoft.com/en-us/library/ms142571.aspx
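The column store notes translate into very little syntax. A sketch on a hypothetical dbo.SalesFact table (table, column and index names are illustrative). In SQL Server 2012 and 2014, a table with a nonclustered column store index is read-only until the index is dropped or disabled:

```sql
-- Hypothetical fact table for illustration only.
CREATE TABLE dbo.SalesFact
(
    OrderDateKey int   NOT NULL,
    ProductKey   int   NOT NULL,
    Quantity     int   NOT NULL,
    SalesAmount  money NOT NULL
);

-- Include all the columns, as the notes suggest.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesFact
    ON dbo.SalesFact (OrderDateKey, ProductKey, Quantity, SalesAmount);

-- A typical warehouse query: only the ProductKey and SalesAmount columns need to be read.
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.SalesFact
GROUP BY ProductKey;
```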
Page Header: The header is 96 bytes and contains meta-information about the page, such as the page number, the owning object, and type of page. At the end of the page is the offset array.

Offset Array: The offset array uses 2 bytes per row and provides pointers to the byte location of the start of rows on the page. Reading from the end of the page backwards, the offset can be used to identify the starting position of every row, sometimes referred to as a slot, on the page.

Jason Strate, Ted Krueger. Expert Performance Indexing for SQL Server 2012 (Kindle Locations 879-881). Apress.
The first page of the B-tree is an index page and is often referred to as the root level of the index. As an index page, the root level contains key values and page addresses for the next pages in the index. Depending on the size of the index, the next level of the index may be data pages or additional index pages. If the number of index rows required to sort all of the rows on the data pages exceeds the space available, then the root page will be followed by another level of index pages. Additional levels of index pages in a B-tree are referred to as intermediate levels. In many cases, indexes built with a B-tree structure will not require more than one or two intermediate levels. Even with a wide indexing key, millions to billions of rows can be sorted with just a few levels.

The next level of pages below the root and intermediate levels of the indexes, referred to as the non-leaf levels, is the leaf level (see Figure 2-9). The leaf level contains all of the data pages for the index. The data pages are where all of the key values and the non-key values for the row are stored. Non-key values are never stored on the index pages.

Another differentiator between heaps and B-trees is the ability within the index levels to perform sequential page reads. Pages contain previous page and next page properties in the page headers. With index and data pages, these properties are populated and can be used to traverse the B-tree to find the next requested row from the B-tree without returning to the root level of the index.

Jason Strate, Ted Krueger. Expert Performance Indexing for SQL Server 2012 (Kindle Locations 1081-1088). Apress.
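The root, intermediate and leaf levels can be listed for a real table. A sketch, assuming dbo.Posts in the current database; DETAILED mode reads every page, so it can be slow on large tables:

```sql
-- One row per index level: index_level 0 is the leaf, the highest index_level is the root.
SELECT i.name AS index_name,
       ps.index_depth,
       ps.index_level,
       ps.page_count,
       ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Posts'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.index_id, ps.index_level DESC;   -- root first, leaf last
```

Even on a large table, index_depth typically stays small, which is the "just a few levels" point above.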
To understand how indexes make your query fast, you must understand how SQL Server runs a query. SQL Server creates an execution plan when the query first runs. It saves that plan in the plan cache, so if you run the query again, it doesn't have to create the plan; it can reuse the one in the cache.
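Plan reuse can be observed directly. A sketch that lists cached plans whose text mentions Posts and how many times each has been reused; it needs VIEW SERVER STATE permission, and the search string is only an example:

```sql
SELECT cp.usecounts,   -- how many times the cached plan has been used
       cp.objtype,     -- e.g. Adhoc, Prepared, Proc
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%Posts%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%'   -- skip this query itself
ORDER BY cp.usecounts DESC;
```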