SQL Server Data Indexing
Clustered Tables vs Heap Tables
Heap Tables
• If a table has no indexes, or has only nonclustered indexes, it is called a heap.
An age-old question is whether a table must have a clustered index. The
answer is no, but in most cases it is a good idea to have a clustered index on the
table so that the data is stored in a specific order.
Clustered Tables
• As the name suggests, these tables have a clustered index. Data is stored in a
specific order based on the clustered index key.
Clustered Tables vs Heap Tables
HEAP
• Data is not stored in any particular
order
• Specific data cannot be retrieved
quickly unless there are also
nonclustered indexes.
• Data pages are not linked, so
sequential access needs to refer back
to the index allocation map (IAM)
pages
• Since there is no clustered index,
additional time is not needed to
maintain the index
• Since there is no clustered index, there
is not the need for additional space to
store the clustered index tree
• These tables have an index_id value of 0
in the sys.indexes catalog view
Clustered Table
• Data is stored in order based on the
clustered index key
• Data can be retrieved quickly based on the
clustered index key, if the query uses the
indexed columns
• Data pages are linked for faster sequential
access
• Additional time is needed to maintain
clustered index based on INSERTS,
UPDATES and DELETES
• Additional space is needed to store
clustered index tree
• These tables have an index_id value of 1 in
the sys.indexes catalog view
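The index_id values in the comparison above can be checked directly in the catalog. The following is a minimal sketch that assumes only the standard sys.tables, sys.schemas and sys.indexes catalog views; it lists each user table in the current database and whether it is stored as a heap or as a clustered table.

```sql
-- List every user table with its storage type.
-- index_id = 0 means heap, index_id = 1 means clustered index.
SELECT s.name      AS schema_name,
       t.name      AS table_name,
       i.index_id,
       i.type_desc AS storage_type   -- e.g. HEAP or CLUSTERED
FROM sys.tables  AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.indexes AS i ON i.object_id = t.object_id
WHERE i.index_id IN (0, 1)           -- skip nonclustered indexes (index_id >= 2)
ORDER BY s.name, t.name;
```

Each table returns exactly one row, because a table is either a heap or has a single clustered index.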
Types of Indexes
• Clustered index
• Nonclustered index
• Unique index
• Filtered index
• Covering index
• Columnstore index
• Non-Key Index Columns
• Implied indexes
Created by some constraints
i. Primary Key
ii. Unique
• Full-text index
A special type of token-based functional index that is built and maintained by
the Microsoft Full-Text Engine for SQL Server. It provides efficient support for
sophisticated word searches in character string data.
• Spatial index
A spatial index provides the ability to perform certain operations more
efficiently on spatial objects (spatial data) in a column of the geometry data
type.
SQL Server Index Basics
Clustered Index
• The top-most node of this tree is called
the "root node"
• The bottom level of the nodes is called
"leaf nodes"
• Any index level between the root node
and leaf node is called an "intermediate
level"
• The leaf nodes contain the data pages of
the table in the case of a clustered index.
• The root and intermediate nodes
contain index pages holding an index
row.
• Each index row contains a key value and
pointer to intermediate level pages of
the B-tree or leaf level of the index.
• The pages in each level of the index are
linked in a doubly-linked list.
Clustered Index
[Figure: clustered index B-tree. Root: Abby, Bob, Carol, Dave. Next level: Abby, Ada, Andy, Ann. Leaf level, which is also the data: Ada, Alan, Amanda, Amy]
• A clustered index sorts and stores the data rows of the table or view in order
based on the clustered index key.
• The clustered index is implemented as a B-tree index structure that supports fast
retrieval of the rows, based on their clustered index key values.
The basic syntax to create a clustered index is
CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column);
• A clustered index stores the data for the table based on the columns defined in the
create index statement. As such, only one clustered index can be defined for the
table because the data can only be stored and sorted one way per table.
Nonclustered Index
• Index Leaf Nodes and Corresponding Table Data
• Each index entry consists of the indexed columns (the key) and refers to the
corresponding table row (via a row identifier, or RID, in SQL Server).
• Unlike the index, the table data in this case is stored in a heap structure and is
not sorted at all.
• There is neither a relationship between the rows stored in the same table block
nor any connection between the blocks.
Nonclustered Index
[Figure: nonclustered index B-tree. Root: Abby, Bob, Carol, Dave. Next level: Abby, Ada, Andy, Ann. Leaf node: Ada, Alan, Amanda, Amy, pointing to the unsorted rows in the database (Amy, Ada, Amanda, Alan)]
• A nonclustered index can be defined on a table or view with a clustered index
or on a heap.
• Each index row in the nonclustered index contains the nonclustered key value
and a row locator
The basic syntax for a nonclustered index is
CREATE INDEX Index_Name ON Schema.TableName(Column);
• SQL Server supports up to 999 nonclustered indexes per table.
CLUSTERED VS. NONCLUSTERED INDEXES
• Clustered index: a SQL Server index that sorts and stores data
rows in a table, based on key values.
• Nonclustered index: a SQL Server index which contains a key
value and a pointer to the data in the heap or clustered index.
• The difference between clustered and nonclustered SQL
Server indexes is that
• a clustered index controls the physical order of the data pages.
• The data pages of a clustered index will always include all the columns
in the table, even if you only create the index on one column.
• The column(s) you specify as key columns affect how the pages are
stored in the B-tree index structure
• A nonclustered index does not affect the ordering and storing of the
data
Clustered and Nonclustered Indexes Interact
• Clustered indexes are always unique
– If you don’t specify unique when creating them, SQL Server may
add a “uniqueifier” to the index key
• Only used when there actually is a duplicate
• Adds 4 bytes to the key
• The clustering key is used in nonclustered indexes
– This allows SQL Server to go directly to the record from the
nonclustered index
– If there is no clustered index, a record identifier will be used instead
Leaf node of a clustered index on EmployeeID:
1 Jones John
2 Smith Mary
3 Adams Mark
4 Douglas Susan
Leaf node of a nonclustered index on LastName:
Adams 3
Douglas 4
Jones 1
Smith 2
Clustered and Nonclustered Indexes Interact
(continued)
• Another reason to keep the clustering key small!
• Consider the following query:
SELECT LastName, FirstName
FROM Employee
WHERE LastName = 'Douglas'
• When SQL Server uses the nonclustered index, it
– Traverses the nonclustered index until it finds the desired key
– Picks up the associated clustering key
– Traverses the clustered index to find the data
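The lookup steps above can be tried on a small copy of the Employee example. This is a minimal sketch: the table definition, column types and index names are assumptions made for illustration, not taken from the slides.

```sql
-- Hypothetical table matching the slide's Employee example.
CREATE TABLE dbo.Employee
(
    EmployeeID int          NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    FirstName  nvarchar(50) NOT NULL
);

-- Clustered index: the leaf level holds the full data rows, ordered by EmployeeID.
CREATE UNIQUE CLUSTERED INDEX CIX_Employee_EmployeeID
    ON dbo.Employee (EmployeeID);

-- Nonclustered index: each leaf entry holds LastName plus the clustering key (EmployeeID).
CREATE NONCLUSTERED INDEX IX_Employee_LastName
    ON dbo.Employee (LastName);

INSERT INTO dbo.Employee (EmployeeID, LastName, FirstName)
VALUES (1, N'Jones', N'John'), (2, N'Smith', N'Mary'),
       (3, N'Adams', N'Mark'), (4, N'Douglas', N'Susan');

-- FirstName is not stored in IX_Employee_LastName, so if the optimizer picks that
-- index it must follow the clustering key into the clustered index (a key lookup).
SELECT LastName, FirstName
FROM dbo.Employee
WHERE LastName = N'Douglas';
```

On a table this small the optimizer may simply scan the clustered index; the key lookup usually shows up in the execution plan once the table is large enough for a seek on LastName to be cheaper.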
Deciding what indexes go where?
• Indexes speed access, but are costly to maintain
– Almost every update to a table requires altering both the data pages
and every index.
• All inserts and deletions affect all indexes
• Many updates will affect nonclustered indexes
• Sometimes less is more
– Not creating an index may sometimes be best
• Does the transaction's code have a WHERE clause? Which columns are used?
Is a sort required?
• Selectivity
– Indexes, particularly non-clustered indexes, are primarily beneficial in
situations where there is a reasonably HIGH LEVEL of Selectivity within
the index.
• % of values in column that are unique
• Higher percentage of unique values, the higher the selectivity
– If 80% of parts are either ‘red’ or ‘green’ not very selective
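Selectivity as defined above can be measured before deciding on an index. The sketch below is illustrative only: the dbo.Parts table, its Color column and the sample rows are hypothetical.

```sql
-- Hypothetical table used only to illustrate the calculation.
CREATE TABLE dbo.Parts
(
    PartID int IDENTITY(1,1) PRIMARY KEY,
    Color  varchar(20) NOT NULL
);

INSERT INTO dbo.Parts (Color)
VALUES ('red'), ('red'), ('green'), ('green'), ('blue');

-- Selectivity = distinct values / total rows; closer to 1.0 is more selective.
SELECT COUNT(DISTINCT Color)                             AS distinct_values,
       COUNT(*)                                          AS total_rows,
       COUNT(DISTINCT Color) * 1.0 / NULLIF(COUNT(*), 0) AS selectivity
FROM dbo.Parts;

-- Distribution per value: shows whether a few values dominate the column.
SELECT Color,
       COUNT(*)                                 AS row_count,
       COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS pct_of_rows
FROM dbo.Parts
GROUP BY Color
ORDER BY row_count DESC;
```

With the sample rows, 'red' and 'green' together account for 80% of the table, which is the low-selectivity situation described in the last bullet.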
Choosing Clustered Index
• Only one per table! - Choose wisely
• By default, a primary key creates a clustered index
– Do you really want your primary key to be the clustered index?
– Option: CREATE TABLE myfooExample
(column1 int IDENTITY
PRIMARY KEY NONCLUSTERED,
column2 ….
)
– Changing clustered index can be costly
• How long? Do I have enough space?
Clustered Indexes Pros & Cons
• Pros
– Clustered indexes best for queries where columns in question will
frequently be the subject of
• RANGE query (e.g., between)
• Group by with max, min, count
– Search can go straight to particular point in data and just keep reading
sequentially from there.
– Clustered indexes helpful with order by based on clustered key
Clustered Indexes Pros & Cons
• The Cons – two situations
– Don’t put the clustered index on a column just because it seems the thing to do
(e.g., the primary key default)
– Lots of inserts in non-sequential order
• Constant page splits, which affect data pages as well as index pages
• Choose a clustered key whose values are inserted sequentially
• Don’t use a clustered index at all perhaps?
Limits to indexes
There are a few limits to indexes.
• There can be only one clustered index per table.
• SQL Server supports up to 999 nonclustered indexes per table.
• An index key, clustered or nonclustered, can be a maximum of 16 columns and
900 bytes. (SQL Server 2016 and later allow up to 32 key columns, and up to
1,700 bytes for a nonclustered index key.)
These are limits, not goals. Every index you create will take up space in your
database. The index will also need to be modified when inserts, updates, and
deletes are performed. This will lead to CPU and disk overhead, so craft indexes
carefully and test them thoroughly.
PRIMARY KEY AS A CLUSTERED INDEX
• Primary key: a constraint to enforce uniqueness in a table. The primary key
columns cannot hold NULL values.
• In SQL Server, when you create a primary key on a table, if a clustered index
is not defined and a nonclustered index is not specified, a unique clustered
index is created to enforce the constraint.
• However, there is no guarantee that this is the best choice for a clustered
index for that table.
• Make sure you are carefully considering this in your indexing strategy.
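One way to act on this advice is to declare the primary key as nonclustered and give the clustered index to another column. The sketch below assumes a hypothetical dbo.Orders table whose queries mostly range over OrderDate; the names and the choice of column are illustrative.

```sql
-- Hypothetical table: surrogate primary key, but most queries range over OrderDate.
CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    OrderDate  date              NOT NULL,
    CustomerID int               NOT NULL,
    -- Enforce the primary key with a unique NONCLUSTERED index
    -- instead of the default clustered one.
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID)
);

-- Spend the single clustered index on the column used for range queries.
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
    ON dbo.Orders (OrderDate);
```

Because OrderDate is not unique, SQL Server may add a uniqueifier to duplicate key values, which is part of the trade-off to weigh.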
Unique Index
• An index that ensures the uniqueness of each value in the indexed column.
• If the index is a composite, the uniqueness is enforced across the columns as a whole,
not on the individual columns.
• For example, if you were to create a unique index on the FirstName and LastName
columns in a table, the names together must be unique, but the
individual names can be duplicated.
• A unique index is automatically created when you define a primary key or unique
constraint:
• Primary key: When you define a primary key constraint on one or more
columns, SQL Server automatically creates a unique, clustered index if a
clustered index does not already exist on the table or view. However, you can
override the default behavior and define a unique, nonclustered index on the
primary key.
• Unique: When you define a unique constraint, SQL Server automatically creates
a unique, nonclustered index. You can specify that a unique clustered index be
created if a clustered index does not already exist on the table.
• A unique index ensures that the index key contains no duplicate values.
Both clustered and nonclustered indexes can be unique.
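The composite case can be sketched as follows. The dbo.Person table and the index name are hypothetical and exist only to show where uniqueness is enforced.

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.Person
(
    PersonID  int IDENTITY(1,1) PRIMARY KEY,  -- creates a unique clustered index by default
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL
);

-- Uniqueness is enforced on the (FirstName, LastName) pair, not on each column.
CREATE UNIQUE NONCLUSTERED INDEX UX_Person_FirstName_LastName
    ON dbo.Person (FirstName, LastName);

INSERT INTO dbo.Person (FirstName, LastName) VALUES (N'Mary', N'Smith');
INSERT INTO dbo.Person (FirstName, LastName) VALUES (N'Mary', N'Jones');  -- OK: FirstName repeats
INSERT INTO dbo.Person (FirstName, LastName) VALUES (N'John', N'Smith');  -- OK: LastName repeats

-- The next statement would fail with a duplicate key error,
-- because the pair (Mary, Smith) already exists:
-- INSERT INTO dbo.Person (FirstName, LastName) VALUES (N'Mary', N'Smith');
```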
Filtered index
• An optimized nonclustered index, especially suited to cover queries that select from a
well-defined subset of data.
• SQL Server 2008 introduced filtered indexes: a filtered index is an index with a WHERE clause
• Filtered indexes can provide the following advantages over full-table indexes:
• Improved query performance and plan quality
• Reduced index maintenance costs
• Reduced index storage costs
A well-designed filtered index improves query performance and execution plan quality
because it is smaller than a full-table nonclustered index and has filtered statistics
An index is maintained only when data manipulation language (DML) statements affect
the data in the index. A filtered index reduces index maintenance costs compared with a
full-table nonclustered index because it is smaller and is only maintained when the data
in the index is changed.
Creating a filtered index can reduce disk storage for nonclustered indexes when a full-table
index is not necessary.
Filtered index
Design Considerations
• When a column only has a small number of relevant values for queries, you can create a
filtered index on the subset of values. For example, when the values in a column are mostly
NULL and the query selects only from the non-NULL values, you can create a filtered index for
the non-NULL data rows. The resulting index will be smaller and cost less to maintain than a
full-table nonclustered index defined on the same key columns.
• When a table has heterogeneous data rows, you can create a filtered index for one or more
categories of data. This can improve the performance of queries on these data rows by narrowing
the focus of a query to a specific area of the table. Again, the resulting index will be smaller and
cost less to maintain than a full-table nonclustered index.
CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate
ON Production.BillOfMaterials (ComponentID, StartDate)
WHERE EndDate IS NOT NULL;
To ensure that a filtered index is used in a SQL query, name it in an index hint:
SELECT ComponentID, StartDate FROM Production.BillOfMaterials
WITH ( INDEX ( FIBillOfMaterialsWithEndDate ) )
WHERE EndDate IN ('20000825', '20000908', '20000918');
Covering Indexes
• When a nonclustered index includes all the data requested in a query (both the items
in the SELECT list and the WHERE clause), it is called a covering index
• With a covering index, there is no need to access the actual data pages
– Only the leaf nodes of the nonclustered index are accessed
– For example, your query might retrieve the FirstName, LastName and DOB columns from a
table, based on a value in the ContactID column. You can create a covering index that
includes all three columns.
• Because the leaf node of a clustered index is the data itself, a clustered index covers all
queries
Leaf node of a nonclustered index on LastName, FirstName, Birthdate
Adams Mark 1/14/1956 3
Douglas Susan 12/12/1947 4
Jones John 4/15/1967 1
Smith Mary 7/14/1970 2
The last column is EmployeeID.
Remember that the clustering key
is always included in a
nonclustered index.
Non-Key Index Columns
• SQL Server 2005 and later allow you to include columns in a non-clustered
index that are not part of the key
– Allows the index to cover more queries
– Included columns only appear in the leaf level of the index
– Up to 1,023 additional columns
– Can include data types that cannot be key columns
• Except text, ntext, and image data types
• Syntax
CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
• Example
CREATE NONCLUSTERED INDEX NameRegion_IDX
ON Employees(LastName)
INCLUDE (Region)
KEY VS. NONKEY COLUMNS
• Key columns: the columns specified to create a clustered or nonclustered index.
• Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.
• The basic syntax to create a nonclustered index with nonkey columns is:
• CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE
(ColumnA, ColumnB);
• A column cannot be both a key and a non-key. It is either a key column or a non-
key, included column.
• The difference lies in where the data about the column is stored in the B-tree.
Clustered and nonclustered key columns are stored at every level of the index –
the columns appear on the leaf and all intermediate levels. A nonkey column will
only be stored at the leaf level, however.
• There are benefits to using non-key columns.
• Columns can be accessed with an index scan.
• Data types not allowed in key columns are allowed in nonkey columns. All data
types but text, ntext, and image are allowed.
• Included columns do not count against the 900 byte index key limit enforced by
SQL Server.
COVERING INDEXES EXAMPLES
The query we want to use is
SELECT ProductID, Name, ProductNumber, Color
FROM dbo.Products
WHERE Color = 'Black';
The first index is nonclustered, with two key columns:
CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name,
ProductNumber);
The second is also nonclustered, with two key columns and three nonkey
columns:
CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON
dbo.Products(Name, ProductNumber)
INCLUDE (Color, Class, Style);
In this case, the first index would not be a covering index for that query. The
second index would be a covering index for that specific query.
Column Store Index Basic
There are two types of storage available in the database: RowStore and ColumnStore.
In RowStore, data rows are placed sequentially on a page, while in ColumnStore values
from a single column, but from multiple rows, are stored adjacently. A ColumnStore
Index works using ColumnStore storage.
In SQL Server 2012, we cannot perform DML (INSERT, UPDATE, DELETE) operations on a
table having a nonclustered ColumnStore Index, because the index puts the table in
read-only mode. Later versions relax this: SQL Server 2014 added updatable clustered
ColumnStore indexes, and SQL Server 2016 made nonclustered ColumnStore indexes updatable.
So this feature is a good fit for a Data Warehouse, where most operations
are read only.
Creating Column Store Index
Creating a ColumnStore Index is the same as creating a NonClustered Index except we
need to add the ColumnStore keyword as shown below.
The syntax of a ColumnStore Index is:
CREATE NONCLUSTERED COLUMNSTORE INDEX Index_Name ON Table_Name
(Column1, Column2, ... Column N)
Example:
-- Creating Non - CLustered ColumnStore Index on 3 Columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person] ON
[dbo].[Test_Person]([FirstName], [MiddleName], [LastName]);
• In this example, the query cost when using the ColumnStore index is about 4 times
lower than with the traditional nonclustered index. The gain depends on the data
and on the query.
Fill Factor
• When you create an index the fill factor option indicates how full the
leaf level pages are when the index is created or rebuilt.
• Valid values are 0 to 100.
• A fill factor of 0 means that all of the leaf level pages are full.
• If data is always inserted at the end of the table, then the fill factor could be
between 90 to 100 percent since the data will never be inserted into the
middle of a page.
• If the data can be inserted anywhere in the table then a fill factor of 60 to 80
percent could be appropriate based on the INSERT, UPDATE and DELETE
activity.
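The fill factor is set with the FILLFACTOR index option. The following is a minimal sketch; the dbo.Customer table, the index name and the chosen percentages are assumptions for illustration, not recommendations.

```sql
-- Hypothetical table and index names.
CREATE TABLE dbo.Customer
(
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    LastName   nvarchar(50) NOT NULL
);

-- Leave about 20% free space in each leaf page when the index is built.
CREATE NONCLUSTERED INDEX IX_Customer_LastName
    ON dbo.Customer (LastName)
    WITH (FILLFACTOR = 80);

-- Fill factor is applied only at create or rebuild time, so change it with a rebuild.
ALTER INDEX IX_Customer_LastName
    ON dbo.Customer
    REBUILD WITH (FILLFACTOR = 90);

-- Check the fill factor stored for each index on the table (0 means the server default).
SELECT name, fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Customer');
```

A lower fill factor trades extra space and more pages to read for fewer page splits, so it is worth testing against the actual insert pattern.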
Introduction of sql server indexing
How SQL Server Indexes Work
B-Tree Index Data Structure
• SQL Server indexes are based on B-trees
– Special records called nodes that allow keyed access to data
– Two kinds of nodes are special
• Root
• Leaf
[Figure: B-tree showing the root node, an intermediate node level, the leaf node level, and the data pages]
• If there are enough records, intermediate levels may be added as well.
• Clustered index leaf-level pages contain the data in the table.
• Nonclustered index leaf-level pages contain the key value and a pointer to the
data row in the clustered index or heap.
SQL Server B-Tree Rules
• Root and intermediate nodes point only to other nodes
• Only leaf nodes point to data
• The number of nodes between the root and any leaf is the same for all leaves
• A B+tree node can hold more than one key. In fact, a node typically stores thousands
of keys, so the branching factor of a B+tree is very large.
• B-trees are always sorted
• The tree will be maintained during insertion, deletion, and updating so that these rules are
met
– When records are inserted or updated, nodes may split
– When records are deleted, nodes may be collapsed
• B+trees have all the key values in their leaf nodes. All the leaf nodes of a B+tree are at
the same height, which implies that every index lookup will take same number of B+tree
lookups to find a value.
• Within a B+tree all leaf nodes are linked together in a linked list, left to right, and since
the values at the leaf nodes are sorted, range lookups are very efficient.
What Is a Node?
• A page that contains key and pointer pairs
[Figure: a node page holding a list of (Key, Pointer) pairs]
Splitting a B-Tree Node
[Figure: a three-level B-tree above the data rows (DB)
Root (Level 0): Abby, Bob, Carol, Dave
Node (Level 1): Abby, Ada, Andy, Ann
Leaf (Level 2): Ada, Alan, Amanda, Amy]
Let’s Add Alice
• Step 1: Split the leaf node
[Figure: the full leaf (Ada, Alan, Amanda, Amy) must take Alice in sorted order (Ada, Alan, Alice, Amanda, Amy), so it splits]
Adding Alice
• Step 2: Split the next level up
[Figure: the Level 1 node would now hold Abby, Ada, Amanda, Andy, Ann, one entry too many, so it splits as well]
Adding Alice (continued)
• Split the root
[Figure: the leaves are now (Ada, Alan, Alice) and (Amanda, Amy), the Level 1 nodes are (Abby, Ada, Amanda) and (Andy, Ann), and the root splits into (Abby, Andy, Bob) and (Carol, Dave)]
Adding Alice (continued)
• When the root splits, the tree grows another level
[Figure: the tree after the root split, above the data rows (DB)
Root (Level 0): Abby, Carol
Node (Level 1): (Abby, Andy, Bob) and (Carol, Dave)
Node (Level 2): (Abby, Ada, Amanda) and (Andy, Ann)
Leaf (Level 3): (Ada, Alan, Alice) and (Amanda, Amy)]
Page splits cause fragmentation
• Two types of fragmentation
– Data pages in a clustered table
– Index pages in all indexes
• Fragmentation happens because these pages must be kept in order
• Data page fragmentation happens when a new record must be added to a page that is full
– Consider an Employee table with a clustered index on LastName, FirstName
– A new employee, Peter Dent, is hired
Before the insert (one extent):
Page: Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike
Page: Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry ...
Data Page Fragmentation
After the insert (two extents):
Extent 1, page: Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dent, Peter
Extent 1, page: Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry ...
Extent 2, new page: Dulles, Kelly; Edom, Mike ...
Index Fragmentation
• Index page fragmentation occurs when a new key-pointer pair must be
added to an index page that is full
– Consider an Employee table with a nonclustered index on Social
Security Number
• Employee 048-12-9875 is added
Before the insert (one extent), each entry is a key plus a pointer:
Page: 036-11-9987; 036-33-9874; 038-87-8373; 046-11-9987; 048-33-9874; 052-87-8373
Page: 116-11-9987; 116-33-9874 ...
Page: 124-11-9987; 124-33-9874; 125-87-8373
Index Fragmentation
(continued)
After the insert (two extents):
Extent 1, page: 036-11-9987; 036-33-9874; 038-87-8373; 046-11-9987; 048-12-9875
Extent 1, page: 116-11-9987; 116-33-9874 ...
Extent 1, page: 124-11-9987; 124-33-9874; 125-87-8373
Extent 2, new page: 048-33-9874; 052-87-8373 ...
How B+tree Indexes Impact Performance
Why use B+tree?
• B+tree is used for an obvious reason and that is speed.
• As we know that there are space limitations when it comes to memory, and not
all of the data can reside in memory, and hence majority of the data has to be
saved on disk.
• Disk as we know is a lot slower as compared to memory because it has
moving parts.
• So if there were no tree structure to do the lookup, then to find a value in a
database, the DBMS would have to do a sequential scan of all the records.
• Now imagine a data size of a billion rows, and you can clearly see that
sequential scan is going to take very long.
• But with a B+tree, it's possible to store a billion key values (with pointers to a
billion rows) at a height of 3, 4 or 5, so that every key lookup out of the billion
keys takes only a handful of disk accesses, which is a huge saving.
The worked example below shows how effective a B+tree index is: more than 16 million
key values can be stored in a B+tree of height 1, and every key value can be reached
in exactly 2 lookups.
How is B+tree structured?
• B+trees are normally structured in such a
way that the size of a node is chosen
according to the page size.
• Why? Because whenever data is accessed
on disk, instead of reading a few bits, a
whole page of data is read, because that is
much cheaper.
• Let us look at an example. Consider InnoDB, whose page size is 16KB,
and suppose we have an index on an integer column of size 4 bytes.
• So a node can contain at most 16 * 1024 / 4 = 4096 keys, and a node can have at
most 4097 children.
• So for a B+tree of height 1, the root node has 4096 keys and the nodes at height 1
(the leaf nodes) have 4096 * 4097 = 16781312 key values.
• So the size of the index values has a direct bearing on performance!
How important is the size of the index values?
As can be seen from the above example, the size of the index values plays a very
important role for the following reasons:
• The longer the index values, the fewer values fit in a node,
and hence the greater the height of the B+tree.
• The greater the height of the tree, the more disk accesses are needed.
• The more disk accesses, the lower the performance.
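The arithmetic in the InnoDB example can be reproduced as a quick check. This is a minimal sketch in T-SQL with illustrative variable names; like the slide, it ignores pointer and page-header overhead, so the results are upper bounds.

```sql
DECLARE @page_bytes int = 16 * 1024;  -- 16KB page, as in the InnoDB example
DECLARE @key_bytes  int = 4;          -- 4-byte integer key

DECLARE @keys_per_node bigint = @page_bytes / @key_bytes;

SELECT @keys_per_node                        AS keys_per_node,         -- 4096
       @keys_per_node + 1                    AS max_children,          -- 4097
       @keys_per_node * (@keys_per_node + 1) AS leaf_keys_at_height_1; -- 16781312
```

Setting @page_bytes to 8192, the SQL Server page size, gives 2048 keys per node and 2048 * 2049 = 4196352 leaf keys at height 1. Doubling @key_bytes halves the keys per node, which is the effect the bullets above describe.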
Index Design
Index design should take into account a number of considerations.
• For tables that are heavily updated, use as few columns as possible in the
index, and don’t over-index the tables.
• If a table contains a lot of data but data modifications are low, use as many
indexes as necessary to improve query performance
• For clustered indexes, try to keep the length of the indexed columns as short
as possible. Ideally, try to implement your clustered indexes on unique
columns that do not permit null values.
• The uniqueness of values in a column affects index performance. In general,
the more duplicate values you have in a column, the more poorly the index
performs.
• In addition, indexes are automatically updated when the data rows themselves
are updated, which can lead to additional overhead and can affect
performance.
• Because a clustered index determines how the data is stored and sorted, be sure
to carefully determine the best column for it.
• The number of columns in the clustered (or non clustered) index can have
significant performance implications with heavy INSERT, UPDATE and DELETE
activity in your database.
• For composite indexes, take into consideration the order of the columns in the
index definition. Columns that will be used in comparison expressions in the
WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first.
• You can also index computed columns if they meet certain requirements. For
example, the expression used to generate the values must be deterministic
(which means it always returns the same result for a specified set of inputs).
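The column-order advice for composite indexes can be sketched as follows. The dbo.Contacts table and the index name are hypothetical; only the WHERE FirstName = 'Charlie' predicate comes from the text above.

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.Contacts
(
    ContactID int IDENTITY(1,1) PRIMARY KEY,
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL,
    City      nvarchar(50) NULL
);

-- FirstName leads because it is the column compared in the WHERE clause.
CREATE NONCLUSTERED INDEX IX_Contacts_FirstName_LastName
    ON dbo.Contacts (FirstName, LastName);

-- Can seek on the index: the predicate is on the leading key column.
SELECT ContactID, FirstName, LastName
FROM dbo.Contacts
WHERE FirstName = N'Charlie';

-- Cannot seek on the leading column: a predicate on LastName alone
-- typically results in a scan of this index (or of the table).
SELECT ContactID, FirstName, LastName
FROM dbo.Contacts
WHERE LastName = N'Brown';
```

Comparing the execution plans of the two queries on a populated table is a simple way to confirm which column order suits the workload.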
Identifying Fragmentation vs. page splits
DBCC SHOWCONTIG
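In SQL Server 2005 and later, the sys.dm_db_index_physical_stats function reports similar fragmentation information to DBCC SHOWCONTIG. The following is a minimal sketch that scans every index in the current database in 'LIMITED' mode; the column selection and ordering are illustrative choices.

```sql
-- Fragmentation and page counts for every index in the current database.
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS schema_name,
       OBJECT_NAME(ps.object_id)        AS table_name,
       i.name                           AS index_name,   -- NULL for a heap
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Indexes with very few pages often show high fragmentation percentages that have little practical effect, so page_count is worth reading alongside the percentage.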
Resolving Fragmentation
Heap Tables:
• For heap tables this is not as easy. The following are different options you can
take to resolve the fragmentation:
• Create a clustered index
• Create a new table and insert data from the heap table into the new table
based on some sort order
• Export the data, truncate the table and import the data back into the table
Clustered Tables:
• Resolving the fragmentation for a clustered table can be done easily by
rebuilding or reorganizing your clustered index. This was shown in this
previous tip: SQL Server 2000 to 2005 Crosswalk - Index Rebuilds.
DBCC DBREINDEX
DBCC INDEXDEFRAG
( { database_name | database_id | 0 }
, { table_name | table_id}
, { index_name | index_id }
)
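In SQL Server 2005 and later, ALTER INDEX covers the same two operations as the DBCC commands above. A minimal sketch, assuming a hypothetical index IX_Employee_LastName on a dbo.Employee table:

```sql
-- Hypothetical index and table names.
-- Reorganize: defragments the leaf level in place (the counterpart of DBCC INDEXDEFRAG).
ALTER INDEX IX_Employee_LastName ON dbo.Employee REORGANIZE;

-- Rebuild: drops and re-creates the index (the counterpart of DBCC DBREINDEX).
ALTER INDEX IX_Employee_LastName ON dbo.Employee REBUILD;

-- Rebuild every index on the table in one statement.
ALTER INDEX ALL ON dbo.Employee REBUILD;
```

Reorganizing is the lighter operation and suits moderate fragmentation, while a rebuild is heavier but also reapplies the fill factor.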
Introduction of sql server indexing
Mahabubur Rahaman
Senior Database Architect
Orion Informatics Ltd

Más contenido relacionado

La actualidad más candente (20)

SQL Constraints
SQL ConstraintsSQL Constraints
SQL Constraints
 
sql function(ppt)
sql function(ppt)sql function(ppt)
sql function(ppt)
 
The Relational Model
The Relational ModelThe Relational Model
The Relational Model
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 
MS Sql Server: Creating Views
MS Sql Server: Creating ViewsMS Sql Server: Creating Views
MS Sql Server: Creating Views
 
SQL Queries
SQL QueriesSQL Queries
SQL Queries
 
Integrity constraints in dbms
Integrity constraints in dbmsIntegrity constraints in dbms
Integrity constraints in dbms
 
MySQL Views
MySQL ViewsMySQL Views
MySQL Views
 
Introduction to-sql
Introduction to-sqlIntroduction to-sql
Introduction to-sql
 
Sql Constraints
Sql ConstraintsSql Constraints
Sql Constraints
 
Joins in SQL
Joins in SQLJoins in SQL
Joins in SQL
 
SQL Joins and Query Optimization
SQL Joins and Query OptimizationSQL Joins and Query Optimization
SQL Joins and Query Optimization
 
Sql joins
Sql joinsSql joins
Sql joins
 
Trigger
TriggerTrigger
Trigger
 
SQL Commands
SQL Commands SQL Commands
SQL Commands
 
Database index by Reema Gajjar
Database index by Reema GajjarDatabase index by Reema Gajjar
Database index by Reema Gajjar
 
Sql joins
Sql joinsSql joins
Sql joins
 
Chapter 3 stored procedures
Chapter 3 stored proceduresChapter 3 stored procedures
Chapter 3 stored procedures
 
1 - Introduction to PL/SQL
1 - Introduction to PL/SQL1 - Introduction to PL/SQL
1 - Introduction to PL/SQL
 
5. stored procedure and functions
5. stored procedure and functions5. stored procedure and functions
5. stored procedure and functions
 

Similar a Introduction of sql server indexing

9. index and index organized table
9. index and index organized table9. index and index organized table
9. index and index organized tableAmrit Kaur
 
Database Indexes
Database IndexesDatabase Indexes
Database IndexesSperasoft
 
Introduction to NOSQL quadrants
Introduction to NOSQL quadrantsIntroduction to NOSQL quadrants
Introduction to NOSQL quadrantsViswanath J
 
dotnetMALAGA - Sql query tuning guidelines
dotnetMALAGA - Sql query tuning guidelinesdotnetMALAGA - Sql query tuning guidelines
dotnetMALAGA - Sql query tuning guidelinesJavier García Magna
 
We Don't Need Roads: A Developers Look Into SQL Server Indexes
We Don't Need Roads: A Developers Look Into SQL Server IndexesWe Don't Need Roads: A Developers Look Into SQL Server Indexes
We Don't Need Roads: A Developers Look Into SQL Server IndexesRichie Rump
 
SQLDay2013_Denny Cherry - Table indexing for the .NET Developer
SQLDay2013_Denny Cherry - Table indexing for the .NET DeveloperSQLDay2013_Denny Cherry - Table indexing for the .NET Developer
SQLDay2013_Denny Cherry - Table indexing for the .NET DeveloperPolish SQL Server User Group
 
SAG_Indexing and Query Optimization
SAG_Indexing and Query OptimizationSAG_Indexing and Query Optimization
SAG_Indexing and Query OptimizationVaibhav Jain
 
No sql or Not only SQL
No sql or Not only SQLNo sql or Not only SQL
No sql or Not only SQLAjay Jha
 
Sql server ___________session_17(indexes)
Sql server  ___________session_17(indexes)Sql server  ___________session_17(indexes)
Sql server ___________session_17(indexes)Ehtisham Ali
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashingAmi Ranjit
 
Query Optimization in SQL Server
Query Optimization in SQL ServerQuery Optimization in SQL Server
Query Optimization in SQL ServerRajesh Gunasundaram
 
Sql query performance analysis
Sql query performance analysisSql query performance analysis
Sql query performance analysisRiteshkiit
 
Database indexing techniques
Database indexing techniquesDatabase indexing techniques
Database indexing techniquesahmadmughal0312
 
SQL Joins Basic and Fundamentals
Introduction of sql server indexing

  • 1. SQL Server Data Indexing
  • 2. Clustered Tables vs Heap Tables • Heap table: a table that has no indexes, or only nonclustered indexes, is called a heap. • Clustered table: as the name suggests, a table that has a clustered index. Its data is stored in a specific order based on the clustered index key. • An age-old question is whether a table must have a clustered index. The answer is no, but in most cases it is a good idea to have one so that the data is stored in a specific order.
  • 3. Clustered Tables vs Heap Tables: Heap • Data is not stored in any particular order • Specific rows cannot be retrieved quickly unless there are also nonclustered indexes • Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages • Since there is no clustered index, no additional time is needed to maintain it • Since there is no clustered index, no additional space is needed to store a clustered index tree • These tables have an index_id value of 0 in the sys.indexes catalog view
  • 4. Clustered Tables vs Heap Tables: Clustered Table • Data is stored in order based on the clustered index key • Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns • Data pages are linked for faster sequential access • Additional time is needed to maintain the clustered index on INSERTs, UPDATEs and DELETEs • Additional space is needed to store the clustered index tree • These tables have an index_id value of 1 in the sys.indexes catalog view
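The index_id values mentioned on the two slides above can be checked directly in the catalog. The query below is a minimal sketch that lists each user table with its storage type; the column aliases are illustrative choices, not part of the original deck.

```sql
-- List every user table and whether it is stored as a heap or with a clustered index.
-- index_id = 0 is the heap row; index_id = 1 is the clustered index row.
SELECT
    s.name      AS schema_name,
    t.name      AS table_name,
    i.index_id,
    i.type_desc AS storage_type   -- e.g. HEAP or CLUSTERED
FROM sys.tables  AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.indexes AS i ON i.object_id = t.object_id
WHERE i.index_id IN (0, 1)        -- every table has exactly one of these two rows
ORDER BY s.name, t.name;
```

Adding `AND i.index_id = 0` to the WHERE clause narrows the list to heaps only, which is a quick way to find tables that may deserve a clustered index.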
  • 5. Types of Indexes • Clustered index • Nonclustered index • Unique index • Filtered index
  • 6. Types of Indexes (continued) • Covering index • Columnstore index • Non-key (included) index columns • Implied indexes, created by some constraints: i. Primary Key ii. Unique
  • 7. Types of Indexes (continued) • Full-text index: a special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data. • Spatial index: provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type.
  • 9. Clustered Index • The top-most node of the index B-tree is called the "root node" • The bottom level of nodes is called the "leaf nodes" • Any index level between the root node and the leaf nodes is called an "intermediate level" • In a clustered index, the leaf nodes contain the data pages of the table. • The root and intermediate nodes contain index pages holding index rows. • Each index row contains a key value and a pointer to either an intermediate-level page of the B-tree or a leaf-level page of the index. • The pages in each level of the index are linked in a doubly-linked list.
  • 10. Clustered Index [Figure: B-tree with root entries Abby, Bob, Carol, Dave; the leaf nodes are the table's data rows] • A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. • The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. • The basic syntax to create a clustered index is: CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column); • A clustered index stores the data for the table based on the columns defined in the create index statement. As such, only one clustered index can be defined for the table, because the data can only be stored and sorted one way per table.
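As a concrete instance of the syntax above, the following sketch creates a small table and then clusters it. The table dbo.Employee, its columns, and the index name are assumptions made for illustration only.

```sql
-- Hypothetical demo table; it starts life as a heap (no clustered index).
CREATE TABLE dbo.Employee
(
    EmployeeID int          NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    FirstName  nvarchar(50) NOT NULL
);

-- After this statement the rows are stored in EmployeeID order
-- and the table is no longer a heap.
CREATE CLUSTERED INDEX CIX_Employee_EmployeeID
    ON dbo.Employee (EmployeeID);

-- A second clustered index on the same table is rejected,
-- because the rows can only be stored in one order:
-- CREATE CLUSTERED INDEX CIX_Employee_LastName ON dbo.Employee (LastName);
```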
  • 11. Nonclustered Index • Index Leaf Nodes and Corresponding Table Data • Each index entry consists of the indexed columns (the key) and a reference to the corresponding table row (a row identifier, or RID, when the table is a heap). • Unlike the index, the table data in a heap is not sorted at all. • In a heap there is neither a relationship between the rows stored in the same page nor any connection between the pages.
  • 12. Nonclustered Index [Figure: B-tree with root entries Abby, Bob, Carol, Dave; the sorted leaf nodes point to unsorted rows in the database] • A nonclustered index can be defined on a table or view with a clustered index or on a heap. • Each index row in the nonclustered index contains the nonclustered key value and a row locator. • The basic syntax for a nonclustered index is: CREATE INDEX Index_Name ON Schema.TableName(Column); • SQL Server supports up to 999 nonclustered indexes per table.
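The nonclustered syntax above can be sketched the same way. The table dbo.Contact and the index name below are hypothetical and exist only for this example.

```sql
-- Hypothetical table with a clustered primary key on ContactID.
CREATE TABLE dbo.Contact
(
    ContactID int          NOT NULL PRIMARY KEY CLUSTERED,
    LastName  nvarchar(50) NOT NULL,
    FirstName nvarchar(50) NOT NULL
);

-- NONCLUSTERED is the default for CREATE INDEX, so the keyword is optional.
CREATE NONCLUSTERED INDEX IX_Contact_LastName
    ON dbo.Contact (LastName);

-- A query that filters on the indexed column and can use the new index.
SELECT ContactID, LastName
FROM dbo.Contact
WHERE LastName = N'Douglas';
```

Because the table has a clustered index, the row locator stored in IX_Contact_LastName is the clustering key (ContactID) rather than a RID.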
  • 13. CLUSTERED VS. NONCLUSTERED INDEXES • Clustered index: a SQL Server index that sorts and stores data rows in a table, based on key values. • Nonclustered index: a SQL Server index which contains a key value and a pointer to the data in the heap or clustered index. • The difference between clustered and nonclustered SQL Server indexes is that • a clustered index controls the physical order of the data pages. • The data pages of a clustered index will always include all the columns in the table, even if you only create the index on one column. • The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure • A nonclustered index does not affect the ordering and storing of the data
  • 14. Clustered and Nonclustered Indexes Interact • Clustered indexes are always unique – If you don’t specify unique when creating them, SQL Server may add a “uniquifier” to the index key • Only used when there actually is a duplicate • Adds 4 bytes to the key • The clustering key is used in nonclustered indexes – This allows SQL Server to go directly to the record from the nonclustered index – If there is no clustered index, a record identifier will be used instead • Leaf node of a clustered index on EmployeeID: (1, Jones, John), (2, Smith, Mary), (3, Adams, Mark), (4, Douglas, Susan) • Leaf node of a nonclustered index on LastName: (Adams, 3), (Douglas, 4), (Jones, 1), (Smith, 2)
  • 15. Clustered and Nonclustered Indexes Interact (continued) • Another reason to keep the clustering key small! • Consider the following query: SELECT LastName, FirstName FROM Employee WHERE LastName = 'Douglas' • When SQL Server uses the nonclustered index, it – Traverses the nonclustered index until it finds the desired key – Picks up the associated clustering key – Traverses the clustered index to find the data
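The lookup path described on this slide can be reproduced with a small script. This is a sketch under assumptions: the table name dbo.EmployeeLookupDemo and both index names are invented here, and the four rows are the ones shown in the slide 14 figure.

```sql
-- Hypothetical table mirroring the Employee example from the slides.
CREATE TABLE dbo.EmployeeLookupDemo
(
    EmployeeID int          NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    FirstName  nvarchar(50) NOT NULL
);

-- Clustering key: EmployeeID. Every nonclustered index row will carry it.
CREATE UNIQUE CLUSTERED INDEX CIX_EmployeeLookupDemo_EmployeeID
    ON dbo.EmployeeLookupDemo (EmployeeID);

-- Nonclustered index on LastName; its leaf rows are (LastName, EmployeeID).
CREATE NONCLUSTERED INDEX IX_EmployeeLookupDemo_LastName
    ON dbo.EmployeeLookupDemo (LastName);

INSERT INTO dbo.EmployeeLookupDemo (EmployeeID, LastName, FirstName)
VALUES (1, N'Jones', N'John'),
       (2, N'Smith', N'Mary'),
       (3, N'Adams', N'Mark'),
       (4, N'Douglas', N'Susan');

-- FirstName is not in the nonclustered index, so using that index means:
-- seek on LastName, pick up EmployeeID, then look up the row in the clustered index.
SELECT LastName, FirstName
FROM dbo.EmployeeLookupDemo
WHERE LastName = N'Douglas';
```

On a table this small the optimizer may simply scan the clustered index instead. The actual execution plan shows which path was chosen.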
  • 16. Deciding what indexes go where? • Indexes speed access, but are costly to maintain – Almost every update to a table requires altering both the data pages and the indexes. • All inserts and deletions affect all indexes • Many updates will affect non-clustered indexes • Sometimes less is more – Not creating an index may sometimes be best • Does the code for the transaction have a WHERE clause? Which columns are used? Is a sort required?
  • 17. Deciding what indexes go where? • Selectivity – Indexes, particularly non-clustered indexes, are primarily beneficial in situations where there is a reasonably HIGH LEVEL of selectivity within the index. • Selectivity is the percentage of values in a column that are unique • The higher the percentage of unique values, the higher the selectivity – If 80% of parts are either ‘red’ or ‘green’, the color column is not very selective
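Selectivity as defined above can be estimated with a simple aggregate. The sketch below uses a hypothetical dbo.Parts table with made-up rows, purely to show the calculation.

```sql
-- Hypothetical parts table with a low-selectivity Color column.
CREATE TABLE dbo.Parts
(
    PartID int         NOT NULL PRIMARY KEY,
    Color  varchar(20) NOT NULL
);

INSERT INTO dbo.Parts (PartID, Color)
VALUES (1, 'red'), (2, 'red'), (3, 'green'), (4, 'green'), (5, 'blue');

-- Selectivity = distinct values / total rows; values closer to 1 are more selective.
-- NULLIF guards against division by zero on an empty table.
SELECT
    COUNT(DISTINCT Color)  * 1.0 / NULLIF(COUNT(*), 0) AS color_selectivity,   -- 3 distinct values over 5 rows
    COUNT(DISTINCT PartID) * 1.0 / NULLIF(COUNT(*), 0) AS partid_selectivity   -- 5 distinct values over 5 rows
FROM dbo.Parts;
```

Here PartID is fully selective while Color is not, which matches the slide's point that a column dominated by a few values is a weak index candidate on its own.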
  • 18. Choosing Clustered Index • Only one per table! - Choose wisely • By default, a primary key creates a clustered index – Do you really want your primary key to be the clustered index? – Option: CREATE TABLE myfooExample (column1 int IDENTITY PRIMARY KEY NONCLUSTERED, column2 …. ) – Changing the clustered index can be costly • How long will it take? Do I have enough space?
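The option on this slide can be written out in full as follows. This is a sketch only: dbo.OrderDemo, its columns, and the choice of OrderDate as the clustering key are assumptions for illustration, not a recommendation from the deck.

```sql
-- Hypothetical table whose primary key is enforced by a NONCLUSTERED unique index,
-- leaving the single clustered index free for another column.
CREATE TABLE dbo.OrderDemo
(
    OrderID    int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_OrderDemo PRIMARY KEY NONCLUSTERED,
    OrderDate  date NOT NULL,
    CustomerID int  NOT NULL
);

-- Cluster on the column used for range queries instead of on the primary key.
CREATE CLUSTERED INDEX CIX_OrderDemo_OrderDate
    ON dbo.OrderDemo (OrderDate);
```

Choosing the clustered index at table creation time avoids the costly rebuild that the slide warns about when the clustered index is changed later.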
  • 19. Clustered Indexes Pros & Cons • Pros – Clustered indexes best for queries where columns in question will frequently be the subject of • RANGE query (e.g., between) • Group by with max, min, count – Search can go straight to particular point in data and just keep reading sequentially from there. – Clustered indexes helpful with order by based on clustered key
  • 20. Clustered Indexes Pros & Cons • The Cons – two situations – Don’t put the clustered index on a column just because it seems the thing to do (e.g., the primary key default) – Lots of inserts in non-sequential order • Constant page splits, involving data pages as well as index pages • Choose a clustered key that will be inserted sequentially • Or perhaps don’t use a clustered index at all?
  • 21. Limits to indexes There are a few limits to indexes. • There can be only one clustered index per table. • SQL Server supports up to 999 nonclustered indexes per table. • An index – clustered or nonclustered – can be a maximum of 16 key columns and 900 bytes (SQL Server 2016 and later raise this to 32 key columns, and to 1,700 bytes for nonclustered indexes). These are limits, not goals. Every index you create will take up space in your database. The index will also need to be modified when inserts, updates, and deletes are performed. This will lead to CPU and disk overhead, so craft indexes carefully and test them thoroughly.
  • 22. PRIMARY KEY AS A CLUSTERED INDEX • Primary key: a constraint to enforce uniqueness in a table. The primary key columns cannot hold NULL values. • In SQL Server, when you create a primary key on a table, if a clustered index is not defined and a nonclustered index is not specified, a unique clustered index is created to enforce the constraint. • However, there is no guarantee that this is the best choice for a clustered index for that table. • Make sure you are carefully considering this in your indexing strategy.
  • 23. Unique Index • An index that ensures the uniqueness of each value in the indexed column. • If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. • For example, • if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated. • A unique index is automatically created when you define a primary key or unique constraint: • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key. • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table. • A unique index ensures that the index key contains no duplicate values. Both clustered and nonclustered indexes can be unique.
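The composite-uniqueness rule above can be illustrated with a short script. The table dbo.PersonDemo and the index name are hypothetical.

```sql
-- Hypothetical table; the primary key gets its own unique clustered index by default.
CREATE TABLE dbo.PersonDemo
(
    PersonID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL
);

-- Uniqueness is enforced on the (FirstName, LastName) pair, not on each column.
CREATE UNIQUE NONCLUSTERED INDEX UX_PersonDemo_FirstName_LastName
    ON dbo.PersonDemo (FirstName, LastName);

-- These succeed: individual names repeat, but each pair is distinct.
INSERT INTO dbo.PersonDemo (FirstName, LastName)
VALUES (N'John', N'Smith'),
       (N'John', N'Jones'),
       (N'Mary', N'Smith');

-- This would fail with a duplicate-key error, because the pair already exists:
-- INSERT INTO dbo.PersonDemo (FirstName, LastName) VALUES (N'John', N'Smith');
```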
  • 24. Filtered index • An optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. • SQL Server 2008 introduced filtered indexes, which are indexes with a WHERE clause. • Filtered indexes can provide the following advantages over full-table indexes: • Improved query performance and plan quality: a well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. • Reduced index maintenance costs: an index is maintained only when data manipulation language (DML) statements affect the data in the index. A filtered index reduces index maintenance costs compared with a full-table nonclustered index because it is smaller and is only maintained when the data in the index is changed. • Reduced index storage costs: creating a filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary.
  • 25. Filtered index Design Considerations • When a column only has a small number of relevant values for queries, you can create a filtered index on the subset of values. For example, when the values in a column are mostly NULL and the query selects only from the non-NULL values, you can create a filtered index for the non-NULL data rows. The resulting index will be smaller and cost less to maintain than a full-table nonclustered index defined on the same key columns. • When a table has heterogeneous data rows, you can create a filtered index for one or more categories of data. This can improve the performance of queries on these data rows by narrowing the focus of a query to a specific area of the table. Again, the resulting index will be smaller and cost less to maintain than a full-table nonclustered index. • Example of a filtered index on the non-NULL rows: CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate ON Production.BillOfMaterials (ComponentID, StartDate) WHERE EndDate IS NOT NULL; • To ensure that a filtered index is used in a SQL query, name it in an index hint: SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH ( INDEX ( FIBillOfMaterialsWithEndDate ) ) WHERE EndDate IN ('20000825', '20000908', '20000918');
  • 26. Covering Indexes • When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index • With a covering index, there is no need to access the actual data pages – Only the leaf nodes of the nonclustered index are accessed – For example, your query might retrieve the FirstName, LastName and DOB columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns. • Because the leaf node of a clustered index is the data itself, a clustered index covers all queries • Leaf node of a nonclustered index on LastName, FirstName, Birthdate: (Adams, Mark, 1/14/1956, 3), (Douglas, Susan, 12/12/1947, 4), (Jones, John, 4/15/1967, 1), (Smith, Mary, 7/14/1970, 2). The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.
  • 27. Non-Key Index Columns • SQL Server 2005 and later allow you to include columns in a non-clustered index that are not part of the key – Allows the index to cover more queries – Included columns only appear in the leaf level of the index – Up to 1,023 additional columns – Can include data types that cannot be key columns • Except text, ntext, and image data types • Syntax CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name ON <object> ( column [ ASC | DESC ] [ ,...n ] ) [ INCLUDE ( column_name [ ,...n ] ) ] • Example CREATE NONCLUSTERED INDEX NameRegion_IDX ON Employees(LastName) INCLUDE (Region)
  • 28. KEY VS. NONKEY COLUMNS • Key columns: the columns specified to create a clustered or nonclustered index. • Nonkey columns: columns added to the INCLUDE clause of a nonclustered index. • The basic syntax to create a nonclustered index with nonkey columns is: CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE (ColumnA, ColumnB); • A column cannot be both a key and a nonkey column. It is either a key column or a nonkey, included column. • The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index: they appear on the leaf and all intermediate levels. A nonkey column, however, is stored only at the leaf level. • There are benefits to using nonkey columns. • Columns can be accessed with an index scan. • Data types not allowed in key columns are allowed in nonkey columns. All data types but text, ntext, and image are allowed. • Included columns do not count against the 900 byte index key limit enforced by SQL Server.
  • 29. COVERING INDEXES EXAMPLES The query we want to use is: SELECT ProductID, Name, ProductNumber, Color FROM dbo.Products WHERE Color = 'Black'; The first index is nonclustered, with two key columns: CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber); The second is also nonclustered, with two key columns and three nonkey columns: CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber) INCLUDE (Color, Class, Style); In this case, the first index would not be a covering index for that query, because it does not contain Color. The second index would be a covering index for that specific query, assuming ProductID is the clustering key and is therefore carried in every nonclustered index.
  • 30. Column Store Index Basic There are two types of storage available in the database: RowStore and ColumnStore. In RowStore, data rows are placed sequentially on a page, while in ColumnStore, values from a single column but from multiple rows are stored adjacently. A ColumnStore index uses ColumnStore storage. In SQL Server 2012, where the feature was introduced, we cannot perform DML (INSERT, UPDATE, DELETE) operations on a table having a ColumnStore index, because the index puts the data in read-only mode; later versions relax this restriction. The feature is therefore a good fit for a data warehouse, where most operations are read only.
  • 31. Creating Column Store Index Creating a ColumnStore index is the same as creating a nonclustered index, except that we need to add the COLUMNSTORE keyword as shown below. The syntax of a ColumnStore index is: CREATE NONCLUSTERED COLUMNSTORE INDEX Index_Name ON Table_Name (Column1, Column2, ... Column N) Example (a nonclustered ColumnStore index on 3 columns): CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person] ON [dbo].[Test_Person] ([FirstName], [MiddleName], [LastName]) • In the example shown, the query cost when using the ColumnStore index was about 4 times lower than with the traditional nonclustered index.
  • 32. Fill Factor • When you create an index, the fill factor option indicates how full the leaf-level pages are when the index is created or rebuilt. • Valid values are 0 to 100. • A fill factor of 0 means that all of the leaf-level pages are full (the same as 100). • If data is always inserted at the end of the table, then the fill factor could be between 90 and 100 percent, since data will never be inserted into the middle of a page. • If data can be inserted anywhere in the table, then a fill factor of 60 to 80 percent could be appropriate, depending on the INSERT, UPDATE and DELETE activity.
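The fill factor is set with the FILLFACTOR index option. The following is a minimal sketch: the table dbo.FillFactorDemo and the index name are hypothetical, and the values 80 and 95 are arbitrary picks from the ranges on the slide.

```sql
-- Hypothetical table used only to demonstrate the FILLFACTOR option.
CREATE TABLE dbo.FillFactorDemo
(
    ID      int          NOT NULL,
    Payload varchar(100) NOT NULL
);

-- Leave 20% free space in each leaf page when the index is built,
-- for a table that receives inserts throughout the key range.
CREATE CLUSTERED INDEX CIX_FillFactorDemo_ID
    ON dbo.FillFactorDemo (ID)
    WITH (FILLFACTOR = 80);

-- The fill factor is applied again on rebuild; here it is raised to 95,
-- as might suit a table where new rows always go at the end.
ALTER INDEX CIX_FillFactorDemo_ID
    ON dbo.FillFactorDemo
    REBUILD WITH (FILLFACTOR = 95);

-- Check the stored setting (0 means the server default).
SELECT name, fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.FillFactorDemo');
```

The fill factor only takes effect when the index is created or rebuilt; pages fill up again as rows are inserted afterwards.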
  • 34. How SQL Server Indexes Work
  • 35. B-Tree Index Data Structure • SQL Server indexes are based on B-trees – Special records called nodes that allow keyed access to data – Two kinds of nodes are special • Root • Leaf [Figure: B-tree showing the root node, an intermediate node, leaf nodes, and the data pages] • If there are enough records, intermediate levels may be added as well. • Clustered index leaf-level pages contain the data in the table. • Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap.
  • 36. SQL Server B-Tree Rules • Root and intermediate nodes point only to other nodes • Only leaf nodes point to data • The number of nodes between the root and any leaf is the same for all leaves • A B+tree node can hold more than one key; in fact a node typically stores thousands of keys, so the branching factor of a B+tree is very large. • B-trees are always sorted • The tree is maintained during insertion, deletion, and updating so that these rules are met – When records are inserted or updated, nodes may split – When records are deleted, nodes may be collapsed • B+trees have all the key values in their leaf nodes. All the leaf nodes of a B+tree are at the same height, which implies that every index lookup takes the same number of B+tree lookups to find a value. • Within a B+tree all leaf nodes are linked together in a linked list, left to right, and since the values at the leaf nodes are sorted, range lookups are very efficient.
  • 37. What Is a Node? • A page that contains key and pointer pairs [Figure: a node drawn as a row of key/pointer pairs]
  • 38. Splitting a B-Tree Node [Figure: B-tree with a root (level 0), a node level (level 1), and a leaf level (level 2) above the data rows (DB); one full leaf page holds Ada, Alan, Amanda, Amy]
  • 39. Let’s Add Alice • Step 1: Split the leaf node [Figure: the full leaf page is split so that Alice can be inserted between Alan and Amanda]
  • 40. Adding Alice • Step 2: Split the next level up [Figure: the level-1 node receives an entry for the new leaf page, overflows, and is split in turn]
  • 41. Adding Alice (continued) • Split the root [Figure: the root receives an entry for the new level-1 node, overflows, and is split]
  • 42. Adding Alice (continued) • When the root splits, the tree grows another level [Figure: the resulting tree has a root (level 0), two node levels (levels 1 and 2), and a leaf level (level 3) above the data rows (DB)]
  • 43. Page splits cause fragmentation • Two types of fragmentation – Data pages in a clustered table – Index pages in all indexes • Fragmentation happens because these pages must be kept in order • Data page fragmentation happens when a new record must be added to a page that is full – Consider an Employee table with a clustered index on LastName, FirstName – A new employee, Peter Dent, is hired [Figure: one extent of full data pages holding, in order: Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike; Farly, Lee; Frank, Joe; Ollen, Carol; Oppus, Larry; ...]
  • 44. Data Page Fragmentation [Figure: after the page split, Dent, Peter follows David, Sue in the original extent, while Dulles, Kelly; Edom, Mike; ... have moved to a page in a second extent]
  • 45. Index Fragmentation • Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full – Consider an Employee table with a nonclustered index on Social Security Number • Employee 048-12-9875 is added [Figure: one extent of full index pages holding key-pointer pairs in order: 036-11-9987, 036-33-9874, 038-87-8373, 046-11-9987, 048-33-9874, 052-87-8373, 116-11-9987, 116-33-9874, ..., 124-11-9987, 124-33-9874, 125-87-8373]
  • 46. Index Fragmentation (continued) [Figure: after the page split, 048-12-9875 follows 046-11-9987 in the original page, while 048-33-9874, 052-87-8373, ... have moved to a page in a second extent]
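The fragmentation produced by page splits like the ones above can be measured with the sys.dm_db_index_physical_stats function. The query below is a sketch for the current database; the index and table named in the commented maintenance statements are hypothetical, and the deck itself does not prescribe any thresholds.

```sql
-- Report fragmentation for every index (and heap) in the current database.
-- 'LIMITED' is the cheapest scan mode; it reads only the pages above the leaf level.
SELECT
    OBJECT_NAME(ps.object_id)        AS table_name,
    i.name                           AS index_name,      -- NULL for a heap
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;

-- Two ways to remove fragmentation from a specific index (names are hypothetical):
-- ALTER INDEX IX_Employee_SSN ON dbo.Employee REORGANIZE;  -- lighter, reorders leaf pages in place
-- ALTER INDEX IX_Employee_SSN ON dbo.Employee REBUILD;     -- heavier, builds the index afresh
```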
  • 48. How B+tree Indexes Impact Performance
  • 49. Why use B+tree? • A B+tree is used for one obvious reason: speed. • Memory is limited, so not all of the data can reside in memory, and the majority of it has to be saved on disk. • Disk is a lot slower than memory because it has moving parts. • If there were no tree structure for lookups, then to find a value in a database the DBMS would have to do a sequential scan of all the records. • Now imagine a data size of a billion rows, and you can clearly see that a sequential scan is going to take very long. • But with a B+tree, it is possible to store a billion key values (with pointers to a billion rows) at a height of 3, 4 or 5, so that every key lookup out of the billion keys takes 3, 4 or 5 disk accesses, which is a huge saving.
  • 50. How is B+tree structured? • B+trees are normally structured so that the size of a node matches the page size. • Why? Because whenever data is accessed on disk, a whole page is read rather than a few bits, since that is much cheaper. • As an example, consider InnoDB, whose page size is 16KB, • and suppose we have an index on an integer column of size 4 bytes. • Ignoring the space taken by child pointers and page headers, a node can contain at most 16 * 1024 / 4 = 4096 keys, and a node can have at most 4097 children. • So for a B+tree of height 1, the root node has 4096 keys and the 4097 nodes at height 1 (the leaf nodes) hold 4096 * 4097 = 16781312 key values. • This goes to show the effectiveness of a B+tree index: more than 16 million key values can be stored in a B+tree of height 1, and every key value can be reached in exactly 2 page reads.
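The arithmetic on this slide can be checked with a short Python sketch. It keeps the slide's simplification that a page holds nothing but keys; the function names are hypothetical and exist only for this illustration.

```python
PAGE_SIZE = 16 * 1024  # bytes, the InnoDB page size used in the slide
KEY_SIZE = 4           # bytes, an integer key


def max_keys_per_node(page_size, key_size):
    # Simplification from the slide: the whole page is filled with keys.
    return page_size // key_size


def leaf_key_capacity(keys_per_node, height):
    # The root is at height 0; each extra level multiplies the node count
    # by the number of children per node (keys_per_node + 1).
    return keys_per_node * (keys_per_node + 1) ** height


keys = max_keys_per_node(PAGE_SIZE, KEY_SIZE)
print(keys)                        # 4096
print(leaf_key_capacity(keys, 1))  # 16781312
```

A real page also stores child pointers and header data, so actual fan-out is lower than this upper bound.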
  • 51. How important is the size of the index values? As the example above shows, the size of the index values has a direct bearing on performance, for the following reasons: • The longer the index key, the fewer values fit in a node, and hence the taller the B+tree. • The taller the tree, the more disk accesses are needed. • The more disk accesses, the lower the performance.
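To make the effect of key size concrete, the sketch below estimates the tree height needed for a billion keys at three key sizes. It assumes the same 16KB page and the same keys-only simplification as the example above; `min_height` is a hypothetical helper, and the 64-byte and 256-byte key sizes are chosen only for illustration.

```python
PAGE_SIZE = 16 * 1024   # bytes, as in the slide's example
ROWS = 1_000_000_000    # the "billion rows" case


def min_height(rows, key_size, page_size=PAGE_SIZE):
    """Smallest height whose leaf level can hold `rows` keys (root = height 0)."""
    keys = page_size // key_size  # keys per node, ignoring pointers and headers
    height = 0
    while keys * (keys + 1) ** height < rows:
        height += 1
    return height


heights = {size: min_height(ROWS, size) for size in (4, 64, 256)}
print(heights)  # {4: 2, 64: 3, 256: 4}
```

Each extra level is one more page read per lookup, so under these assumptions a 256-byte key costs two more reads per lookup than a 4-byte key.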
  • 52. Index Design Index design should take into account a number of considerations. • For tables that are heavily updated, use as few columns as possible in the index, and don't over-index the tables. • If a table contains a lot of data but is rarely modified, use as many indexes as necessary to improve query performance. • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, implement your clustered indexes on unique columns that do not permit null values. • The uniqueness of values in a column affects index performance. In general, the more duplicate values a column has, the more poorly an index on it performs.
  • 53. Index Design • Indexes are automatically updated when the data rows themselves are updated, which adds overhead and can affect performance. • Because the clustered index determines how rows are stored and sorted, carefully determine the best column for it. • The number of columns in a clustered (or nonclustered) index can have significant performance implications under heavy INSERT, UPDATE and DELETE activity. • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. • You can also index computed columns if they meet certain requirements. For example, the expression used to generate the values must be deterministic (which means it always returns the same result for a specified set of inputs).
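Why column order matters in a composite index can be illustrated with a sorted list of tuples standing in for the index. This is only an analogy, not how SQL Server evaluates predicates: the `index` contents, `seek_first_name`, and `scan_last_name` are hypothetical names made up for the sketch. A filter on the leading column can binary-search to a contiguous range, while a filter on the second column alone has to examine every entry.

```python
import bisect

# Hypothetical composite index on (first_name, last_name), kept in key order.
index = sorted([
    ("Charlie", "Young"),
    ("Alice", "Zhou"),
    ("Charlie", "Abbot"),
    ("Bob", "Young"),
])


def seek_first_name(idx, first):
    # Leading column: entries with this first name are contiguous,
    # so two binary searches bound the matching range.
    lo = bisect.bisect_left(idx, (first,))
    hi = bisect.bisect_left(idx, (first + "\x00",))
    return idx[lo:hi]


def scan_last_name(idx, last):
    # Non-leading column: matches are scattered, so every entry is checked.
    return [row for row in idx if row[1] == last]


print(seek_first_name(index, "Charlie"))
print(scan_last_name(index, "Young"))
```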
  • 54. Identifying Fragmentation vs. page splits DBCC SHOWCONTIG Page 283
  • 55. Resolving Fragmentation Heap Tables: • Resolving fragmentation in a heap table is not as easy as in a clustered table. The options are: • Create a clustered index • Create a new table and insert the data from the heap table into it in some sort order • Export the data, truncate the table and import the data back into the table Clustered Tables: • Fragmentation in a clustered table can be resolved easily by rebuilding or reorganizing the clustered index. This was shown in this previous tip: SQL Server 2000 to 2005 Crosswalk - Index Rebuilds. DBCC DBREINDEX DBCC INDEXDEFRAG ( { database_name | database_id | 0 } , { table_name | table_id} , { index_name | index_id } ) • Both DBCC commands are deprecated from SQL Server 2005 onward in favor of ALTER INDEX ... REBUILD and ALTER INDEX ... REORGANIZE.
  • 57. Mahabubur Rahaman Senior Database Architect Orion Informatics Ltd