Guide to SQL Server Table Indexes – Part 4

In the first part of this article series, you learned about the basics of SQL Server database table’s indexes, the difference between the clustered and non-clustered indexes, and how the leaf nodes, non-leaf nodes, and heaps associated with data storage. In the second and third part, you learned about the different types of indexes available in SQL Server, and what are these indexes advantages and dis-advantages.

In this article, you will learn about index design and columns, candidates for index key columns, index placement, and pad index and the fill-factor option.


Index design

Well-designed indexes are an important part of database optimisation because indexes help to reduce query response time while improving the overall performance of query without affecting the design of underlying table/tables. However, index design is a very complex, time consuming and error prone process, even for databases and workloads with moderate complexity, and during the Index design process, you make decisions on index key columns, non-key columns, and index storage locations. You should carefully consider included columns as key columns, because they depend on how the data is accessed by users of databases and database applications.

General Guidelines for Index Design

In addition to the primary key and unique constraint indexes created automatically during table creation, the good index candidates include:

  • The higher the level of selectivity in a column, the more likely it is as a key column candidate.
  • Foreign keys are good candidates for non-clustered indexes because they are mostly used in join operations.
  • Columns used in joining tables are good candidates for non-clustered indexes because it will improve query performance significantly in join operations by reducing the time required in locating the required rows in each of the join tables.
  • Columns searched for specific values or ranges of values are good candidates for non-clustered index or filtered index.  Filtered indexes are especially useful to cover those queries that frequently needs access to well-defined subset of data.
  • Columns used in sort order are good candidates for non-clustered or clustered index.
  • Columns queried for data returned organized as groups.
  • Columns used in LIKE comparisons.
  • You can include computed columns as key columns, but with the following restrictions:
    • The value must be persisted
    • The column result must be deterministic (always return the same result).
    • The column cannot generate a text, ntext, or image data type value.

SET options are an additional concern. SET options are connection-specific. When working with computed columns, you must set the following options in the connection creating the index, and any connection modifying the column data. If the options are not set correctly, errors can occur when you create the column or modify key column values.

SET Option

Setting

ANSI_NULLS

ON

ANSI_PADDING

ON

ANSI_WARNINGS

ON

ARITHABORT

ON

CONCAT_NULL_YIELDS_NULL

ON

QUOTED_IDENTIFIER

ON

NUMERIC_ROUNDABORT

OFF

Index placement

The physical location of the data pages that make up the index is referred to as the index placement. Unless you specify otherwise, the database engine writes any index you create to the filegroup containing its base table (see below):

Description: C:\Users\PMNSER~1\AppData\Local\Temp\SNAGHTMLa1f4ad.PNG

You can specify to create a clustered or non-clustered index on a different filegroup. When you create a clustered index, the clustered index and the table are saved in the same physical location. Rebuilding clustered index on an alternate filegroup also saves table on the same alternate filegroup.

When you create a non-clustered index, the storage hierarchy of a non-clustered index is not physically related to the table on which it is based. You can store a table and any or all of its non-clustered indexes in different filegroups. If the table has multiple non-clustered indexes, you can create them in different filegroups.

The leaf level of the non-clustered index contains pointers to the data pages of the table, either in a heap or a clustered index. By placing the index on a different filegroup, you have more detailed control over disk usage and object placement, and potentially better performance for queries based on the index due to lower resource conflict between the table and the index.

Partitioned table and its partitioned indexes are the exception to this rule. You can spread the partitions for a partitioned table across multiple filegroups, which spreads the partitioned indexes for those partitions across the same filegroups.

Performance increases when you place the table and its indexes on different physical hard drives. Performance also increases more when searching the table and indexes on different hard drives connected to different disk controllers, minimizing conflicts between read and write requests to disk. This is a scalable design, which makes it possible to support the data requirements of large enterprise data applications.

For detail about index placement and how to place them on different filegroup, see extensive set of documentation at the Microsoft " Placing Indexes on Filegroups” web page.

Padding indexes

When you create or rebuild indexes, you can use padding indexes to affect the space left in the pages making up the index. A padding index leaves space for users to enter data without having to modify the index’s physical structure. You determine if you need to support padding when you create or recreate the index. The data in the index nodes doesn’t change for a read-only table or stable look-up table, so there is no reason to pad indexes created on those tables. The situation is different in an OLTP database or with an active, dynamic table.

When the database engine needs to add a row to a full page, the database engine has to split that page to make additional room and keep the index hierarchy intact. Splitting pages is resource intensive. Depending on index size and other activity in the database, the process can result in a noticeable performance loss. To prevent splits, or at least reduce the need for them, you pad the index. When you pad an index, you specify a fill factor value, which specifies how much space to leave available. The fill factor can be set to 0, or to a percentage between 1 and 100. The effect of fill factor values is described in the following table.

Value

Leaf pages are filled

0

At least one slot left open on nonleaf pages (default setting)

1-99

To the specified percentage and nonleaf node pages leave at least one slot open, unless the index is padded, and then the percentage applies to both leaf and nonleaf pages

100

At least one slot left open on nonleaf pages

When you pad the index, you leave space open throughout the index hierarchy.  If you do not pad the index, SQL Server leaves room only in the leaf node pages. In an active table where data change frequently, you must fill the index and specify a low fill factor. If you rarely modify the table, you should specify a higher fill factor or 0. Operationally, a fill factor of 0 works the same as a fill factor of 100. The padding is applied only when creating or rebuilding the index. The padding is not maintained during normal operations.

Continue to Part-5:  

In part-5, you will learn about index operations, index related dynamic management views and functions, and methods to defragment and indexes.

]]>

Leave a comment

Your email address will not be published.