SQL Server Index Maintenance Performance Tuning for Large Tables

A couple of remarks about this very first version:

  • ALTER INDEX…REORGANIZE is single-threaded. No matter how many processors your box has, this operation will only ever use one of them, which can turn the maintenance of indexes on larger tables into a test of patience.
  • ALTER INDEX…REBUILD can use parallelism and can therefore run much faster, but it uses more resources.
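To illustrate the two options side by side (the index and table names here are made up for the example, not ours):

```sql
-- REORGANIZE: always single-threaded, but online and light on resources
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REORGANIZE;

-- REBUILD: may use parallelism; cap it explicitly with MAXDOP if desired
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD WITH (MAXDOP = 4);
```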

For further explanations, see: Configuring Parallel Index Operations

Okay, this doesn't really require much research, and to a certain degree it explains why our runtimes are in the hours, even though the biggest indexes are "only"

However, it was quite striking to find out that a significant portion of this runtime could be attributed directly to sys.dm_db_index_physical_stats itself. Although we were aware that even in the LIMITED mode we used, the pages above the leaf level have to be scanned, we simply didn't expect it to take that long. In our case it could take ~30 minutes before the function returned a result, which is half of the time we have in our daily maintenance window. Most of those 30 minutes were needed to determine the fragmentation of the one big table; for the other tables the function took only 3–4 minutes, which is completely acceptable.
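A quick way to see where the time goes is to run the DMF over the whole database and watch the elapsed times; the LIMITED mode shown here is the one we used:

```sql
SET STATISTICS TIME ON;

-- LIMITED mode still has to scan the pages above the leaf level
SELECT [object_id], index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE index_id > 0;  -- skip heaps

SET STATISTICS TIME OFF;
```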

Once we found this out, the next step was to try to make sys.dm_db_index_physical_stats run faster. One of the obvious thoughts that came to mind was to filter the big table out of the call.

In order to avoid hardcoding anything into the maintenance procedure, we decided to go for a separate table that stores the object_id and the index_id along with a datetime of the last maintenance. This table was then used in a WHERE NOT EXISTS clause to make sure the big table was not considered by sys.dm_db_index_physical_stats. However, that wasn't really crowned with success: the runtime didn't go down as expected. While searching for an explanation for this behaviour we came across a blog entry by Paul Randal: Inside sys.dm_db_index_physical_stats. As you can read there, this DMF does not support "predicate push-down". In our case this means that, irrespective of our smart WHERE NOT EXISTS clause, the fragmentation is determined first and the WHERE clause is applied afterwards. An even clearer description of this behaviour is found here:

Most dynamic
management views support “predicate push-down,” where the only data processed
is that which matches the predicate in the WHERE clause. However,
sys.dm_db_index_physical_stats is a function, not a view, so it can’t do this.
This means you have to manually filter and only ask the function to process
those indexes you know have the potential to be fragmented and may require
rebuilding or reorganizing.
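For reference, our failed filtering attempt looked roughly like this (the tracking table dbo.IndexMaintenanceLog and its columns are illustrative names, not the actual ones):

```sql
SELECT FRAG.[object_id], FRAG.index_id, FRAG.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') FRAG
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.IndexMaintenanceLog IML
    WHERE IML.[object_id] = FRAG.[object_id]
      AND IML.index_id = FRAG.index_id
);
-- The DMF still scans every index first; the WHERE clause only filters the result.
```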

This invalidated our WHERE NOT EXISTS attempt immediately. Next we came up with this brilliant idea:

    SELECT
        OBJECT_SCHEMA_NAME(FRAG.[object_id]) + '.' + OBJECT_NAME(FRAG.[object_id]),
        FRAG.index_id,
        FRAG.avg_fragmentation_in_percent
    FROM sys.indexes SIX
    CROSS APPLY sys.dm_db_index_physical_stats(
        DB_ID(),        --use the currently connected database
        SIX.object_id,  --Parameter for object_id.
        SIX.index_id,   --Parameter for index_id.
        0,              --Parameter for partition_number.
        DEFAULT         --Scanning mode. Default to "LIMITED", which is good enough
    ) FRAG
    WHERE
        SIX.object_id <> OBJECT_ID('...big Table...')
    ORDER BY
        FRAG.avg_fragmentation_in_percent DESC;

The motivation here was to filter via sys.indexes in the WHERE clause, thereby effectively ruling out the big table, while at the same time feeding the DMF with the object_id and index_id from sys.indexes. Unfortunately, this idea was only brilliant until we tried out the query for the first time:

Msg 413, Level 16, State 1, Line 1
Correlated parameters or sub-queries are not supported by the inline function
Msg 413, Level 16, State 1, Line 1
Correlated parameters or sub-queries are not supported by the inline function

Well. We had run out of ideas and decided on a hybrid approach: a separate table that stores the object_id, index_id, page_count, and datetime of the last maintenance for all indexes of the database.

For tables with fewer than 1,000,000 pages, we loop through this new table, call sys.dm_db_index_physical_stats with the object_id and index_id taken from the table as parameters, and then either rebuild that index or not, depending on the level of fragmentation. If there is still enough time left in the maintenance window, we retrieve another row from the table and start the next operation; otherwise we exit the procedure.
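A condensed sketch of that loop follows; the table name dbo.IndexMaintenanceLog, the 30 % threshold, and the window length are illustrative, and the real procedure contains more bookkeeping:

```sql
DECLARE @object_id int, @index_id int, @frag float;
DECLARE @deadline datetime = DATEADD(MINUTE, 60, GETDATE());  -- end of window

WHILE GETDATE() < @deadline
BEGIN
    SET @object_id = NULL;

    -- least recently maintained small index first
    SELECT TOP (1) @object_id = [object_id], @index_id = index_id
    FROM dbo.IndexMaintenanceLog
    WHERE page_count < 1000000
    ORDER BY last_maintenance ASC;

    IF @object_id IS NULL BREAK;  -- nothing left to do

    -- targeted call: only this one index is scanned
    SELECT @frag = avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), @object_id, @index_id, 0, 'LIMITED');

    IF @frag > 30.0
        EXEC ('ALTER INDEX ... REBUILD');  -- built dynamically from the ids in reality

    UPDATE dbo.IndexMaintenanceLog
    SET last_maintenance = GETDATE()
    WHERE [object_id] = @object_id AND index_id = @index_id;
END;
```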

For tables with more than 1,000,000 pages, we rebuild one single index on each maintenance day. For these indexes we do not determine the level of fragmentation upfront but rebuild them right away. In the current setup, each of these indexes gets its turn every three weeks; we have observed their fragmentation carefully and can safely assume that enough fragmentation accumulates over three weeks to justify a rebuild.
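Selecting the day's big index then reduces to picking the least recently maintained entry above the page threshold (again with the illustrative table name from above):

```sql
SELECT TOP (1) [object_id], index_id
FROM dbo.IndexMaintenanceLog
WHERE page_count >= 1000000
ORDER BY last_maintenance ASC;  -- the index whose turn it is today
```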

We have also decided to go for a rebuild instead of a reorganise. This might be "too much" at times, but the option to get parallelism and thereby reduce the runtime was convincing for us. However, to avoid introducing new bottlenecks by using up all available processors during index operations, we determine the number of processors upfront and use only half of them in the ALTER INDEX statement. This leaves enough resources available to other processes that might kick in during the index maintenance. So far this strategy has worked out quite well for us, but you never know: things might be different with the next billion rows.
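The half-the-processors rule can be sketched like this; since MAXDOP does not accept a variable directly, the statement is built dynamically (index and table names are placeholders):

```sql
DECLARE @maxdop int;

-- cpu_count: number of logical processors visible to SQL Server
SELECT @maxdop = cpu_count / 2 FROM sys.dm_os_sys_info;
IF @maxdop < 1 SET @maxdop = 1;

DECLARE @sql nvarchar(max) =
    N'ALTER INDEX IX_BigTable_Col ON dbo.BigTable REBUILD WITH (MAXDOP = '
    + CAST(@maxdop AS nvarchar(10)) + N');';

EXEC sys.sp_executesql @sql;
```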
