Tips on Optimizing SQL Server Non-Clustered Indexes
Non-clustered indexes are best for queries:
- That return few rows (including just one row) and where the index has good selectivity (generally above 95%).
- That retrieve small ranges of data (not large ranges). Clustered indexes perform better for large range queries.
- Where both the WHERE clause and the ORDER BY clause are both specified for the same column in a query. This way, the non-clustered index pulls double duty. It helps to speed up accessing the records, and it also speeds up the sorting of the records (because the returned data is already sorted).
- That use JOINs (although clustered indexes are better if they can be used).
- When the column or columns to be indexed are very wide. While wide indexes are never a good thing, if you have no choice, a non-clustered index will have overall less overhead than a clustered index on a wide index.
[6.5, 7.0, 2000, 2005] Updated 1-6-2006
If a column in a table is not at least 95% unique, then the query optimizer may not use a non-clustered index based on that column. Because of this, don’t add non-clustered indexes to columns that aren’t at least 95% unique. For example, a column with “yes” or “no” as the data won’t be at least 95% unique. [6.5, 7.0, 2000] Updated 1-6-2006
If your table needs a clustered index, be sure it is added to the table before you add any non-clustered indexes. If you don’t, when you add a clustered index to your table, all of the pre-existing non-clustered indexes will have to be rebuilt (which is done automatically when the clustered index is built), putting an unnecessary strain on your server. [6.5, 7.0, 2000, 2005] Updated 1-6-2006
To determine the selectivity on an index on a given table, run this command: DBCC SHOW_STATISTICS (table_name, index_name). The higher the selectivity of an index, the greater the likelihood it will be used by the query optimizer. [6.5, 7.0, 2000, 2005] Updated 1-6-2006
When deciding whether or not to add a non-clustered index to a column of a table, it is useful to first find out how selective it is. By this, what we want to know is the ratio of unique rows to total rows (based on a specific column) found in the table. Generally, if a column is not more than 95% unique, then the Query Optimizer might not use the index. If this is the case, then adding the non-clustered index may be a waste of disk space. If fact, adding a non-clustered index that is never used will hurt a table’s performance.
Another useful reason to determine the selectivity of a column is to decide what is the best order to position indexes in a composite index. This is because you will get the best performance out of a composite index if the columns are arranged so that the most selective is the first one, the next most selective, the second one, and so on.
So how do you determine the selectivity of a column? One way is to run the following script on any column you are considering for a non-clustered index. This example script is designed to be used with the Northwind database, so you will need to modify it appropriately for your use.
–Finds the Degree of Selectivity for a Specific Column in a Row
Declare @total_unique float
Declare @total_rows float
Declare @selectivity_ratio float
SELECT @total_unique = 0
SELECT @total_rows = 0
SELECT @selectivity_ratio = 0
–Finds the Total Number of Unique Rows in a Table
–Be sure to replace OrderID below with the name of your column
–Be sure to replace [Order Details] below with your table name
SELECT @total_unique = (SELECT COUNT(DISTINCT OrderID) FROM [Order Details])
–Calculates Total Number of Rows in Table
–Be sure to replace [Order Details] below with your table name
SELECT @total_rows = (SELECT COUNT(*) FROM [Order Details])
–Calculates Selectivity Ratio for a Specific Column
SELECT @selectivity_ratio = ROUND((SELECT @total_unique/@total_rows),2,2)
SELECT @selectivity_ratio as ‘Selectivity Ratio’
The results in this case is 38%, which means that adding a non-clustered index to the OrderID column of the Order Details table in the Northwind database is probably not a very good idea. [6.5, 7.0, 2000, 2005] Updated 1-6-2006
In some cases, even though a column (or columns of a composite index) has a non-clustered index, the Query Optimizer may not use it (even though it should), instead performing a table scan (if the table is a heap) or a clustered index scan (if there is a clustered index). This, of course, can produce unwanted performance problems.
This particular problem can occur when there is a data correlation between the order of the rows in the table, and the order of the non-clustered index entries. This can occur when there is correlation between the clustered index and the non-clustered index. For example, the clustered index may be created on a date column, and the non-clustered index might be created on an invoice number column. If this is the case, then there is a correlation (or direct relationship) between the increasing dates and the increasing invoice numbers found in each row.
The reason this problem occurs is because the Query Optimizer assumes there is no correlation, and it makes its optimization decisions based on this assumption.
If you run into this problem, there are three potential resolutions to this problem:
· If possible, reorder the non-clustered index column (assuming a composite index) so that the column with the highest cardinality is the first column in the composite index.
· Create covering indexes.
· Add index hints to your queries to overrule the Query Optimizer.
[7.0, 2000] Updated 11-1-2005
When you think of page splits, you normally only think of clustered indexes. This is because clustered indexes enforce the physical order of the index, and page splitting can be a problem if the clustered index is based on a non-incrementing column. But what has this to do with non-clustered indexes? While non-clustered indexes use a clustered index (assuming the table is not a heap) as their key, most people don’t realize that non-clustered indexes can suffer from page splitting, and because of this, need to have an appropriate fillfactor and pad_index set for them.
Here’s an example of how non-clustered indexes can experience page splits. Let’s say you have a table that has a clustered index on it, such as customer number. Let’s also say that you have a non-clustered index on the zip code column. As you can quite well imagine, the data in the zip code column will have no relation to the customer number and will be more or less random, and data will have to be inserted into the zip code index randomly. Like clustered index pages, non-clustered index pages can experience page splitting.
So just as with clustered indexes, non-clustered indexes need to have an appropriate fillfactor and pad_index, and also be rebuilt on a periodic basis. [7.0, 2000, 2005] Updated 11-1-2005