SQL Server Index Maintenance Performance Tuning for Large Tables

A couple of remarks about this very first version:

  • ALTER INDEX … REORGANIZE is always single-threaded. No matter how many processors your box has, this operation will only ever use one of them, which can turn the maintenance of indexes on larger tables into a patience play.
  • ALTER INDEX … REBUILD can use parallelism and can therefore run much faster, but it also consumes more resources.

For further explanations, see: Configuring Parallel Index Operations
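To illustrate the difference: a rebuild accepts an explicit degree of parallelism, while REORGANIZE offers no such option at all (the table and index names below are placeholders):

    -- REORGANIZE: always single-threaded, no MAXDOP option exists
    ALTER INDEX IX_BigTable_Key ON dbo.BigTable REORGANIZE;

    -- REBUILD: can run in parallel; here capped at four processors
    ALTER INDEX IX_BigTable_Key ON dbo.BigTable
        REBUILD WITH (MAXDOP = 4, SORT_IN_TEMPDB = ON);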

Okay, this doesn’t really require much research, and to a certain degree it explains why our runtimes are in the hours, even when the biggest indexes are “only” reorganised.

However, it was quite striking to find out that a significant portion of this runtime could be directly attributed to sys.dm_db_index_physical_stats itself. Although we were aware that even in the LIMITED mode we used, the pages above the leaf level had to be scanned, we simply didn’t expect this to take so long. In our case it could take ~30 minutes before the function returned a result. That is half of the time we have in our daily maintenance window. Most of these 30 minutes were needed to determine the fragmentation of the one big table; for the other tables the function only took 3 to 4 minutes, which is completely acceptable.

Once we found this out, the next step was to try to find a way to make sys.dm_db_index_physical_stats run faster. One of the obvious thoughts that came to mind was to filter out this big table from sys.dm_db_index_physical_stats.
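A minimal version of the kind of call we were timing looks like this (LIMITED mode scans only the levels above the leaf; passing NULL for object_id and index_id means every index in the database is processed):

    SELECT  [object_id],
            index_id,
            avg_fragmentation_in_percent,
            page_count
    FROM    sys.dm_db_index_physical_stats
                (DB_ID(), NULL, NULL, NULL, 'LIMITED')
    WHERE   index_id > 0;   -- skip heaps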

In order to avoid hardcoding anything into the maintenance procedure, we decided to go for a separate table that stores the object_id and the index_id along with a datetime of the last maintenance. This table was then used in a WHERE NOT EXISTS clause to make sure that the big table was not considered in sys.dm_db_index_physical_stats. However, that attempt was not successful: the runtime didn’t go down as expected. While searching for an explanation for this behaviour we came across a blog entry by Paul Randal: Inside sys.dm_db_index_physical_stats. As you can read there, this DMF does not support “predicate push-down”. In our case this means that irrespective of our smart WHERE NOT EXISTS clause, the fragmentation is determined first and the WHERE clause is applied afterwards. An even clearer description of this behaviour is found here:

Most dynamic management views support “predicate push-down,” where the only data processed is that which matches the predicate in the WHERE clause. However, sys.dm_db_index_physical_stats is a function, not a view, so it can’t do this. This means you have to manually filter and only ask the function to process those indexes you know have the potential to be fragmented and may require rebuilding or reorganizing.

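Our first attempt looked roughly like the following sketch (dbo.IndexMaintenance is an illustrative name for our tracking table). Because the DMF does not push the predicate down, the WHERE NOT EXISTS is applied only after every index, including the big table, has already been scanned:

    SELECT  FRAG.[object_id],
            FRAG.index_id,
            FRAG.avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats
                (DB_ID(), NULL, NULL, NULL, 'LIMITED') FRAG
    WHERE   NOT EXISTS
            (SELECT 1
             FROM   dbo.IndexMaintenance IM     -- hypothetical tracking table
             WHERE  IM.[object_id] = FRAG.[object_id]
               AND  IM.index_id    = FRAG.index_id);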
This invalidated our attempt immediately. Next we came up with this brilliant idea:

    SELECT
        OBJECT_SCHEMA_NAME(FRAG.[object_id]) + '.' + OBJECT_NAME(FRAG.[object_id]),
        FRAG.avg_fragmentation_in_percent
    FROM
        sys.indexes SIX
    CROSS APPLY sys.dm_db_index_physical_stats (
            DB_ID(),        --use the currently connected database
            SIX.object_id,  --Parameter for object_id.
            SIX.index_id,   --Parameter for index_id.
            0,              --Parameter for partition_number.
            DEFAULT         --Scanning mode. Default to "LIMITED", which is good enough
        ) FRAG
    WHERE
        SIX.object_id <> OBJECT_ID('...big Table...')
    ORDER BY
        FRAG.avg_fragmentation_in_percent DESC;

The motivation here was to filter via sys.indexes in the WHERE clause, thereby effectively ruling out the big table, while at the same time feeding the DMF with the object_id and index_id from sys.indexes. Unfortunately, this idea was only brilliant until we tried out the query for the first time:

Msg 413, Level 16, State 1, Line 1
Correlated parameters or sub-queries are not supported by the inline function “sys.dm_db_index_physical_stats”.

Well. At this point we had run out of ideas and decided to use a hybrid approach, in which we use a separate table that stores the object_id, index_id, page_count, and the datetime of the last maintenance for all indexes of the database.
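The tracking table for this hybrid approach could be defined roughly like this (all names are illustrative, not the actual ones from our procedure):

    CREATE TABLE dbo.IndexMaintenance
    (
        [object_id]        int       NOT NULL,
        index_id           int       NOT NULL,
        page_count         bigint    NOT NULL,
        last_maintenance   datetime  NULL,     -- NULL = never maintained
        CONSTRAINT PK_IndexMaintenance
            PRIMARY KEY CLUSTERED ([object_id], index_id)
    );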

For tables with fewer than 1,000,000 pages we loop through this new table and call sys.dm_db_index_physical_stats with the object_id and index_id taken from the table as parameters, and then either rebuild that index or not, depending on the level of fragmentation. If there is still enough time left in the maintenance window, we retrieve another row from the table to start the next operation; otherwise we exit the procedure.
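The per-index loop for the smaller tables can be sketched like this (table name and the fragmentation threshold are illustrative, and the check against the remaining maintenance window is omitted for brevity). The key point is that each call to the DMF targets exactly one index, so the big table is never scanned:

    DECLARE @object_id int, @index_id int, @frag float, @sql nvarchar(max);

    DECLARE small_indexes CURSOR LOCAL FAST_FORWARD FOR
        SELECT [object_id], index_id
        FROM   dbo.IndexMaintenance          -- hypothetical tracking table
        WHERE  page_count < 1000000
        ORDER BY last_maintenance;           -- least recently maintained first

    OPEN small_indexes;
    FETCH NEXT FROM small_indexes INTO @object_id, @index_id;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- one targeted call per index: only this index is scanned
        SELECT @frag = avg_fragmentation_in_percent
        FROM   sys.dm_db_index_physical_stats
                   (DB_ID(), @object_id, @index_id, NULL, 'LIMITED')
        WHERE  alloc_unit_type_desc = 'IN_ROW_DATA';

        IF @frag > 30.0   -- illustrative threshold
        BEGIN
            SELECT @sql = N'ALTER INDEX ' + QUOTENAME(SIX.name)
                        + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
                        + N'.' + QUOTENAME(OBJECT_NAME(@object_id))
                        + N' REBUILD;'
            FROM   sys.indexes SIX
            WHERE  SIX.[object_id] = @object_id
              AND  SIX.index_id    = @index_id;

            EXEC (@sql);

            UPDATE dbo.IndexMaintenance
            SET    last_maintenance = GETDATE()
            WHERE  [object_id] = @object_id AND index_id = @index_id;
        END;

        FETCH NEXT FROM small_indexes INTO @object_id, @index_id;
    END;

    CLOSE small_indexes;
    DEALLOCATE small_indexes;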

For tables with more than 1,000,000 pages we rebuild one single index on each maintenance day. For these indexes we do not determine the level of fragmentation upfront, but rebuild them right away. In the current rotation, each of these indexes comes up every three weeks; we have observed their fragmentation carefully and can safely assume that enough fragmentation accumulates over three weeks to justify a rebuild.
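Selecting the one big index to rebuild on a given day then reduces to picking the least recently maintained entry (again with illustrative names; note that no call to the DMF is needed here at all):

    SELECT TOP (1)
           [object_id], index_id
    FROM   dbo.IndexMaintenance            -- hypothetical tracking table
    WHERE  page_count >= 1000000
    ORDER BY last_maintenance;             -- NULL (never maintained) sorts first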

We also decided to go for a rebuild instead of a reorganise. This might be “too much” at times, but the option to get parallelism and thereby reduce runtime was convincing for us. However, to avoid introducing new bottlenecks by using up all available processors during index operations, we determine the number of processors upfront and use only half of them in the ALTER INDEX statement. This still leaves enough resources available for other processes that might kick in during index maintenance. So far this strategy has worked out quite well for us, but you never know. Things might be different with the next billion rows.
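Determining the degree of parallelism from the processor count can be sketched as follows (a minimal version with placeholder index and table names; options such as ONLINE and any error handling are left out):

    DECLARE @maxdop int, @sql nvarchar(max);

    -- use half of the available processors, but at least one
    SELECT @maxdop = CASE WHEN cpu_count / 2 < 1 THEN 1
                          ELSE cpu_count / 2 END
    FROM   sys.dm_os_sys_info;

    SET @sql = N'ALTER INDEX IX_BigTable_Key ON dbo.BigTable'   -- placeholder names
             + N' REBUILD WITH (MAXDOP = '
             + CAST(@maxdop AS nvarchar(10)) + N');';

    EXEC (@sql);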

Here are some more links on this topic.


