Tips on Optimizing SQL Server Clustered Indexes

As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases–such as an identity column, or some other column where the value is increasing–and is unique. In many cases, the primary key is the ideal column for a clustered index. 

If you have any experience with performance tuning SQL Server 6.5, you may have heard that is not a good idea to add a clustered index to a column that monotonically increases because it can cause a “hotspot” on the disk that can cause performance problems. That advice is true in SQL Server 6.5.

Normally, “hotspots” aren’t generally a problem. You would have to have over 1,000 transactions a second before a “hotspot” were to negatively affect performance. In fact, a “hotspot” can be beneficial under these circumstances because it eliminates page splits.

Here’s why. If you are inserting new rows into a table that has a clustered index as its primary key, and the key monotonically increases, this means that each INSERT will physically occur one after another on the disk. Because of this, page splits won’t occur during INSERTs, which in itself saves overhead. This is because SQL Server has the ability to determine if data being inserted into a table has a monotonically increasing sequence, and won’t perform page splits when this happens.

If you are inserting a lot of rows into a heap (a table without a clustered index), data is not inserted in any particular order onto data pages, whether the data is monotonically or not monotonically increasing. This results in SQL Server having to work harder (more reads) to access the data when requested from disk. On the other hand, if a clustered index is added to a table, data is inserted sequentially on data pages, and generally less disk I/O is required to retrieve the data when requested from disk.

If data is inserted into a clustered index in more or less random order, data is often inserted randomly into physical data pages, which is similar to the problem of inserting data into a heap.

So again, often, the best overall recommendation is to add a clustered index to a column that monotonically increases (assuming there is a column that does so). This is especially true if the table is subject to many INSERTS, UPDATES, and DELETES. But if a table is subject to few data modification, but to many SELECT statements, then this advice is less useful, and other options for taking full advantage of the clustered index should be considered. Read the other tips on this page to learn of situations where you should place the clustered index on other columns.  [7.0, 2000, 2005] Updated 12-6-2005

*****

Here are some more good reasons to add a clustered index to every table.

Keep in mind that a clustered index physically orders the data in a table based on a single or composite column. Second, the data in a heap (a table without a clustered index) is not stored in any particular physical order.

Whenever you need to query the column or columns used for the clustered index, SQL Server has the ability to sequentially read the data in a clustered index an extent (8 data pages, or 64K) at a time. This makes it very easy for the disk subsystem to read the data quickly from disk, especially if there is a lot of data to be retrieved.

But if a heap is used, even if you add a non-clustered index on an appropriate column or columns, because the data is not physically ordered (unless you are using a covering index), SQL Server has to read the data from disk randomly using 8K pages. This creates a lot of extra work for the disk subsystem to retrieve the same data, hurting performance.

Another disadvantage of a heap is that when you rebuild indexes to reduce fragmentation, heaps are not defragmented, because they are not indexes. This means that over time, a heap will become more and more fragmented, further hurting performance. Adding a clustered index will insure that the table can be defragmented when indexes are rebuilt.

These are just several of many reasons why a clustered index should be added to virtually all tables. [6.5, 7.0, 2000, 2005] Updated 12-6-2005

*****

Since you can only create one clustered index per table, take extra time to carefully consider how it will be used. Consider the type of queries that will be used against the table, and make an educated guess as to which query (the most common one run against the table, perhaps) is the most critical, and if this query will benefit from having a clustered index. [6.5, 7.0, 2000, 2005] Updated 12-6-2005

Continues…

Leave a comment

Your email address will not be published.