How to Create Nonkey Columns in Nonclustered Indexes in SQL Server 2005

Let’s say for example that we have a table that contains these columns. (This example has been simplified to make it easier to follow.)

Primary_Key

Customer_No

Customer_Name

Customer_Address

Let’s assume that the Primary_Key column is a monotonically increasing integer and that the last three columns are VARCHAR columns. Let’s also assume that there is a clustered index on the Primary_Key column, and that there are no nonclustered indexes on the table.

Now, let’s assume we run the following query:

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Primary_Key = 1001

When the query optimizer analyzes this query, it will know that the Primary_Key column is the key to a clustered index. So the clustered index will (most likely) be used to locate the Primary_Key of 1001 very quickly using a clustered index lookup. In addition, since the other three columns are part of the clustered index, the values for these three columns are immediately available and can be immediately displayed after the query finishes executing. Keep in mind that all the columns in a clustered index are stored at the leaf level of the clustered index, so SQL Server, in this example, does not have to look elsewhere to find the data to be returned.

Now, let’s take a look at the following query:

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Customer_No = ‘ABC123’

In this case, when this query is analyzed by the query optimizer, there is no index on the column Customer_No. Because of this, SQL Server will perform a clustered index scan to look for the designated record to return. Since this column is not indexed or unique, every row in the table will have to be scanned until the record is found. Once it is found, the rest of the record (because it is a clustered index and each row’s data is part of the leaf index), is immediately available and the data is returned.

If we need to perform the above query often, it would be to our advantage to add a nonclustered index to the Customer_No column. This way, when we run a query like the one above, instead of performing a time-consuming clustered index scan, it can use the nonclustered index on Customer_No to perform an index lookup, which is much faster than a clustered table scan. But is this all we need to do to optimize the above query? Actually, we have forgotten something. If we run the query, let’s look at it again below, something else has to happen before our data is returned.

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Customer_No = ‘ABC123’

When this query runs, it will use the nonclustered index on the Customer_No column to quickly identify the record using an index lookup. But unlike a clustered index, a nonclustered index only contains the data stored in the index, which is, in this case, only the Customer_No column. On the other hand, our query wants to return three columns: Customer_No, Customer_Name, and Customer_Address, not just key Customer_No. Because of this, SQL Server will now have to perform another step before it can return our data. Once it has located the correct row in the nonclustered index, then SQL Server must then look up the values of the other two columns from the clustered index, where this data is stored. This is called a bookmark lookup.

As you can imagine, performing a bookmark lookup takes additional SQL Server resources and time to perform, hurting performance. Is there anyway to avoid a bookmark lookup? If you have been a DBA for very long, you are probably familiar with the concept of a covering index. If you are not, a covering index is a nonclustered index that includes not only the value used to perform the lookup (the Customer_No) in our example, but also the other data columns needed to “cover” the query. In this case, the Customer_Name and Customer_Address columns.

So instead of creating a nonclustered index with Customer_No as the only column in the index (which is what we did before), we create a nonclustered index that has three columns in it, including Customer_No, Customer_Name, and Customer_Address. This is called a covering index because when the query is run and uses Customer_No to locate the row, the remaining data needed to fulfill the query exists in the index and can be immediately returned and no bookmark lookup has to be performed, as we saw in the previous example.

Covering indexes, if used judiciously, can be used to speed up many types of commonly run queries. But covering indexes have some limitations. For example, they are limited to a maximum of 16 columns; they have a 900-character maximum width; certain datatypes cannot be included in them; and adding additional columns to an index makes the index wider, which in turn requires more disk I/O to read and write the rows, potentially hurting performance, especially if the table is subject to many INSERTs, UPDATEs, and DELETEs.

In the past, DBAs have learned how to live with these drawbacks. Like most all performance tuning suggestions, covering indexes have their pros and cons, and the DBA must use his or her experience to determine how they should best be implemented.

The behavior I have just described is true for all versions of SQL Server, but in SQL Server 2005, we now have a new feature called “nonkey column nonclustered indexes.” This new feature is a variation of the standard covering index we have been talking about up to this point, but with some advantages over covering indexes.

Let’s go back to our example query.

SELECT Customer_No, Customer_Name, Customer_Address
FROM Table_Name
WHERE Customer_No = ‘ABC123’

Let’s assume, as before, that there is a clustered index on Primary_Key. Now, instead of creating a covering index as described above, what we can do is create a nonclustered index on Customer_No, but instead of adding Customer_Name and Customer_Address as additional key columns to the index, we instead add Customer_Name and Customer_Address as nonkey columns to the nonclustered index.

What does this mean? This is very much like our covering index, except that only the column Customer_No is a part of the actual key, and Customer_Name and Customer_Address tag along with the key, but they are not key columns, but nonkey columns. By doing this, we gain the exact same advantage of a covering index in that we don’t have to perform a bookmark lookup to retrieve the Customer_Name and Customer_Address columns, as these columns are part of the index.

Now you maybe asking yourself, what’s the difference between covering indexes and nonkey column indexes, as they both produce the same results? Good question, and now I’m going to tell you why you might want to consider a nonkey column nonclustered index over using a covering index. The benefits include:

All data types are supported, except text, ntext, and image. So you have more datatype options than a covering index.
The maximum number of columns is 1,023, not 16 as with covering indexes.
Because the actual index is narrower, the key is more efficient and can offer better performance over a covering index where all of the columns are part of the key.

Yes, these are small differences, but lots of small differences add up and can help boost the performance of your SQL Server. In fact, you may want to review all of your currently existing covering indexes and change them to nonkey column nonclustered indexes to see if your query performance increases. Depending on the circumstances, performance could indeed increase.

Nonkey columns are created using the INCLUDE clause of the CREATE INDEX statement. For example:

CREATE INDEX IX_Table ON Table (Key_Index_Column) INCLUDE (Column1, Column2, Column3)

In the above case, a nonclustered index with one key column is created with three nonkey columns.

So, what are the downsides to using nonkey column nonclustered indexes? Actually, they are the same for covering indexes, and include:

More space is required to store indexes with nonkey columns. Nonkey column data is stored at both the leaf level of the index and in the table itself.
Larger indexes mean fewer rows can fit on a page, potentially increasing disk I/O.
Index maintenance is increased for data modifications, potentially hurting performance if nonkey columns are large and the database experiences a high level of data modifications.

While the above are also true for covering indexes, they can be slightly more problematic with nonkey column nonclustered indexes because they have the potential of using up more space than covering indexes. So you have to be careful about weighing the pros and cons of the nonkey column nonclustered index, just as you always have had to do with covering indexes.

To learn more about how nonkey column nonclustered indexes work, first identify a current covering index in your database, along with a query that runs against it frequently, and of course takes full advantage of the covering index. Use Profiler to capture performance statistics for the query for a day. Then modify the currently existing covering index to make it a nonkey column nonclustered index. Now, use Profiler again to capture the performance statistics for the same query for a day. Then compare the two and see which indexing option works best for you.

To learn more about the pros and cons of nonkey column nonclustered indexes, see Books Online.

]]>

Leave a comment Cancel reply