SQL Server Index Maintenance Performance Tuning for Large Tables

A couple of remarks about this very first version:

  • ALTER INDEX…REORGANIZE is single-threaded. No matter how many processors your box has, this operation will only ever use one of them, which can turn index maintenance on larger tables into an exercise in patience.
  • ALTER INDEX…REBUILD can use parallelism and can therefore run much faster, but it uses more resources (see the sketch below).

For further explanations, see: Configuring Parallel Index Operations
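
For illustration, the two statements side by side; index and table names are placeholders:

-- REORGANIZE: always single-threaded, but online and interruptible
ALTER INDEX IX_BigTable_Date ON dbo.BigTable REORGANIZE;

-- REBUILD: may use parallelism; MAXDOP caps the number of processors used
ALTER INDEX IX_BigTable_Date ON dbo.BigTable REBUILD WITH (MAXDOP = 4);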

Okay, this doesn’t really require much research, and to a certain degree it explains why our runtimes are in the hours when the biggest indexes are “only” reorganised.

However, it was quite striking to find out that a significant portion of this runtime could be attributed directly to sys.dm_db_index_physical_stats itself. Although we were aware that, even in the LIMITED mode we used, the pages above the leaf level have to be scanned, we simply didn’t expect it to take that long. In our case it could take ~30 minutes before the function returned a result, which is half of the time we have in our daily maintenance window. Most of those 30 minutes were spent determining the fragmentation of the one big table; for the other tables the function only took 3–4 minutes, which is completely acceptable.
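
For reference, this is the kind of database-wide LIMITED scan we were timing; the three NULLs ask for all objects, all indexes, and all partitions:

SELECT [object_id], index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED');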

Once we had found this out, the next step was to look for a way to make sys.dm_db_index_physical_stats run faster. One of the obvious thoughts that came to mind was to filter the big table out of sys.dm_db_index_physical_stats.
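
In sketch form, the attempt looked roughly like this; dbo.IndexMaintenance is an illustrative name for the tracking table described in the next paragraph:

SELECT FRAG.[object_id], FRAG.index_id,
       FRAG.avg_fragmentation_in_percent, FRAG.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') FRAG
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.IndexMaintenance IM      -- hypothetical tracking table
    WHERE IM.[object_id] = FRAG.[object_id]
      AND IM.index_id = FRAG.index_id -- hoped-for effect: skip the big table
);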

In order to avoid hardcoding anything in the maintenance procedure, we decided to go for a separate table that stores the object_id and index_id along with a datetime of the last maintenance. This table was then used in a WHERE NOT EXISTS clause to make sure that the big table was not considered by sys.dm_db_index_physical_stats. However, that attempt wasn’t successful: the runtime didn’t go down as expected. While searching for an explanation for this behaviour, we came across a blog entry by Paul Randal: Inside sys.dm_db_index_physical_stats. As you can read there, this DMF does not support “predicate push-down”. In our case this means that, irrespective of our smart WHERE NOT EXISTS clause, the fragmentation is determined first and the WHERE clause is applied afterwards. An even clearer description of this behaviour is found here:

Most dynamic management views support “predicate push-down,” where the only data processed is that which matches the predicate in the WHERE clause. However, sys.dm_db_index_physical_stats is a function, not a view, so it can’t do this. This means you have to manually filter and only ask the function to process those indexes you know have the potential to be fragmented and may require rebuilding or reorganizing.

This invalidated our attempt immediately. Next we came up with this brilliant idea:

SELECT
    OBJECT_SCHEMA_NAME(FRAG.[object_id]) + '.' + OBJECT_NAME(FRAG.[object_id]),
    SIX.[name],
    FRAG.avg_fragmentation_in_percent,
    FRAG.page_count
FROM
    sys.indexes SIX
    CROSS APPLY
    sys.dm_db_index_physical_stats
    (
        DB_ID(),        --use the currently connected database
        SIX.object_id,  --Parameter for object_id.
        SIX.index_id,   --Parameter for index_id.
        0,              --Parameter for partition_number.
        DEFAULT         --Scanning mode. Default to "LIMITED", which is good enough
    ) FRAG
WHERE
    SIX.object_id <> OBJECT_ID('...big Table...')
ORDER BY
    FRAG.avg_fragmentation_in_percent DESC;

The motivation here was to filter via sys.indexes in the WHERE clause, thereby effectively ruling out the big table, while at the same time feeding the DMF with the object_id and index_id from sys.indexes. Unfortunately, the idea was only brilliant until we ran the query for the first time:

Msg 413, Level 16, State 1, Line 1
Correlated parameters or sub-queries are not supported by the inline function "sys.dm_db_index_physical_stats".
Msg 413, Level 16, State 1, Line 1
Correlated parameters or sub-queries are not supported by the inline function "sys.dm_db_index_physical_stats".

Well. We had run out of ideas and decided on a hybrid approach, in which we use a separate table that stores the object_id, index_id, page_count, and datetime of the last maintenance for all indexes in the database.

For tables with fewer than 1,000,000 pages we loop through this new table, call sys.dm_db_index_physical_stats with the object_id and index_id taken from the table as parameters, and then either rebuild that index or not, depending on the level of fragmentation. If there is still enough time left in the maintenance window, we retrieve another row from the table and start the next operation; otherwise we exit the procedure.
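
A condensed sketch of that loop, again using dbo.IndexMaintenance as an illustrative name for the tracking table and an illustrative fragmentation threshold; the real procedure additionally checks the time left in the maintenance window:

DECLARE @object_id int, @index_id int, @frag float, @sql nvarchar(max);

DECLARE small_indexes CURSOR FOR
    SELECT [object_id], index_id
    FROM dbo.IndexMaintenance        -- hypothetical tracking table
    WHERE page_count < 1000000
    ORDER BY last_maintenance;       -- least recently maintained first

OPEN small_indexes;
FETCH NEXT FROM small_indexes INTO @object_id, @index_id;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Single-index call: much cheaper than one database-wide scan
    SELECT @frag = avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), @object_id, @index_id, NULL, 'LIMITED');

    IF @frag > 30.0    -- illustrative threshold
    BEGIN
        SELECT @sql = N'ALTER INDEX ' + QUOTENAME(i.[name])
                    + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id))
                    + N'.' + QUOTENAME(OBJECT_NAME(@object_id)) + N' REBUILD;'
        FROM sys.indexes i
        WHERE i.[object_id] = @object_id AND i.index_id = @index_id;

        EXEC sys.sp_executesql @sql;

        UPDATE dbo.IndexMaintenance
        SET last_maintenance = GETDATE()
        WHERE [object_id] = @object_id AND index_id = @index_id;
    END;

    -- The real procedure exits here once the maintenance window is used up
    FETCH NEXT FROM small_indexes INTO @object_id, @index_id;
END;

CLOSE small_indexes;
DEALLOCATE small_indexes;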

For tables with more than 1,000,000 pages we rebuild one single index on each maintenance day. For these indexes we do not determine the level of fragmentation upfront, but rebuild them right away. In the current setup each of these indexes is treated once every three weeks, and having observed their fragmentation carefully, we can safely assume that enough fragmentation accumulates over those three weeks to justify a rebuild.
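
The selection for the big indexes then reduces to picking the one that has waited longest; the rebuild itself is the same dynamic ALTER INDEX as in the previous sketch, just without any fragmentation check:

DECLARE @object_id int, @index_id int;

SELECT TOP (1) @object_id = [object_id], @index_id = index_id
FROM dbo.IndexMaintenance        -- hypothetical tracking table
WHERE page_count >= 1000000
ORDER BY last_maintenance;       -- least recently rebuilt first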

We have also decided to go for a rebuild instead of a reorganise. This might be “too much” at times, but the option to get parallelism, and therefore reduce runtime, was convincing for us. However, to avoid introducing new bottlenecks by using up all available processors during index operations, we determine the number of processors upfront and only use half of them in the ALTER INDEX statement. This still leaves enough resources for other processes that might kick in during the index maintenance. So far this strategy has worked out quite well for us, but you never know. Things might be different with the next billion rows.
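
One possible way to do that MAXDOP calculation, reading the processor count from sys.dm_os_sys_info; a sketch, with placeholder index and table names:

DECLARE @maxdop int, @sql nvarchar(max);

-- Use half of the available processors, but at least one
SELECT @maxdop = (cpu_count + 1) / 2
FROM sys.dm_os_sys_info;

SET @sql = N'ALTER INDEX IX_BigTable_Date ON dbo.BigTable REBUILD'
         + N' WITH (MAXDOP = ' + CAST(@maxdop AS nvarchar(10)) + N');';

EXEC sys.sp_executesql @sql;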

Here are some more links on this topic.

http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/e53f959d-ddb0-44dc-981c-bcf8131dc8ed

http://www.sqlmag.com/blogs/sql-server-questions-answered/sql-server-questions-answered/tabid/1977/entryid/12869/Default.aspx

26 Responses to “SQL Server Index Maintenance Performance Tuning for Large Tables”

  1. Hi,

    Nice article.

    You can discover a lot more about improving SQL performance via DMVs in this new book http://www.manning.com/stirk. It contains more than 100 scripts to identify problems, and offers a wide range of solutions.

    Chapter 11 contains various scripts for automating index maintenance.

    Chapters 1 and 3 (indexes) can be downloaded for free. Chapter 1 includes scripts for:

    A simple monitor
    Finding your slowest queries
    Find your missing indexes
    Identifying what SQL is running now
    Quickly find a cached plan

    Thanks
    Ian

  2. Good article, but have you got the scripts for your table schemas?

  3. Interesting article, and it was interesting to learn about the lack of predicate push-down.

    One thing that can help with index rebuild times is using the SORT_IN_TEMPDB option (assuming that tempdb is on its own array). We have a 2 billion row table that I performed a test on: using this option, I got the rebuild of 14 indexes down from 15 hours to 6.5 hours.

    Additionally, with very large tables it’s also worth considering partitioning them. Apart from the other benefits that partitioning brings, if the data being loaded into the table is naturally partitioned by date, then you only need to worry about re-indexing a single partition (unless you get a lot of modifications to old data). Plus, the sys.dm_db_index_physical_stats function allows you to specify the partition number, which speeds that up a lot. As a result, I’m currently able to re-index 14 indexes of a 2 billion row table every night (because I’m only rebuilding the current-month partition).
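
    As a sketch of both points, with illustrative names and an illustrative partition number:

    -- Spill the rebuild's sort into tempdb (helps when tempdb has its own array)
    ALTER INDEX IX_BigTable_Date ON dbo.BigTable
    REBUILD WITH (SORT_IN_TEMPDB = ON);

    -- Rebuild only one partition, e.g. the current-month partition number 42
    ALTER INDEX IX_BigTable_Date ON dbo.BigTable
    REBUILD PARTITION = 42;

    -- The DMF can be pointed at a single partition as well
    SELECT avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats
         (DB_ID(), OBJECT_ID('dbo.BigTable'), 1, 42, 'LIMITED');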

    • Hi Karl,
      yes, we had partitioning in mind right from the start, but there was no real need for it so far. Performance is still good, and only recently has the business made up its mind on a data retention policy. As it stands now, we can throw away most rows after one month, so again no urgent need for partitioning.

  4. I got to learn new things from this article. However, I am using the script given below for checking index fragmentation, and it is working fine for me. It checks the fragmentation of large tables only (you can define “large” in the context of your environment):

    CREATE TABLE #ROW_COUNT
    (
        name varchar(2000),
        rows int,
        reserved varchar(2000),
        data varchar(2000),
        index_size varchar(2000),
        unused varchar(2000)
    )

    SET NOCOUNT ON

    -- Collect sp_spaceused output for every user table
    DECLARE @tablename varchar(2000)
    DECLARE fetch_table_name CURSOR FOR
        SELECT name FROM sys.objects WHERE type = 'U'
    OPEN fetch_table_name
    FETCH NEXT FROM fetch_table_name INTO @tablename
    WHILE @@fetch_status = 0
    BEGIN
        INSERT INTO #ROW_COUNT (name, rows, reserved, data, index_size, unused)
        EXEC sp_spaceused @tablename
        FETCH NEXT FROM fetch_table_name INTO @tablename
    END
    CLOSE fetch_table_name
    DEALLOCATE fetch_table_name

    -- Strip the trailing ' KB' from the data column so it can be compared numerically
    SELECT name, rows, CONVERT(int, LEFT(data, CHARINDEX(' ', data))) data
    INTO #DATA_CONVERTED
    FROM #ROW_COUNT

    SELECT object_name(a.object_id) TableName, b.name Index_Name,
           a.avg_fragmentation_in_percent, a.page_count,
           c.rows Number_Of_Rows_In_Table, c.data Amount_Of_Data_In_Table_In_KB
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) a,  -- DB_ID() rather than a hardcoded database id
         sys.indexes b, #DATA_CONVERTED c
    WHERE (a.index_id = b.index_id AND a.object_id = b.object_id)
      AND object_name(a.object_id) = c.name
      AND a.index_type_desc <> 'HEAP'
      AND a.avg_fragmentation_in_percent > 70
      AND c.rows > 5000
      AND c.data > 25600      -- i.e. more than 25 MB of data
      AND b.type_desc <> 'HEAP'

    SET NOCOUNT OFF
    DROP TABLE #ROW_COUNT
    DROP TABLE #DATA_CONVERTED

  5. We went through a similar thought and development process and arrived at a simple conclusion: in most cases it took just about as long to determine the level of fragmentation as it did to just run the ALTER INDEX REORGANIZE/REBUILD. So during our regular maintenance we simply run the ALTER INDEX on most tables, using boundaries similar to 100 < #pages < [Your Max Limit]. The process cycles through the tables, keeps track of what it does, and then resumes each night at the place where it stopped the night before.
    We still debate REORGANIZE vs. REBUILD. Each has advantages and disadvantages. :)

  6. I’ve noticed that if you have a newer server, like a ProLiant DL380 G5 or newer with 32 GB of RAM, two quad-core CPUs and an OK I/O configuration, you can have a silent maintenance window, because no one will notice that it’s running.

    We run 24×7 and no one ever complains about SQL during maintenance times. What you can also do is raise the fragmentation threshold to 20% or so in order to hit fewer indexes, but run maintenance more often.

  7. You should try Ola Hallengren’s IndexOptimize tool and specify a time limit in seconds if you only have limited time for this task:
    Ola.hallengren.com

  8. Hi Frank,

    Nice and useful article.
    Only one comment: 2005 accepts databases in 2000 compatibility mode (as you already know). My suggestion is to mention that, in that case, this doesn’t work.

    Cheers,

    Luis

  9. This is very well written, and I very much appreciate the process from start to finish and your reasoning along the way.

    One approach you could take is to run the “sys.dm_db_index_physical_stats” process and store the results in a table.
    You could start this before your maintenance window and then have your index job check the table.
    The “sys.dm_db_index_physical_stats” process can really run anytime: the day before, the hour before,…
    The fact that it takes ~30 min to run isn’t the issue; you would just have to start it earlier and have the table persist, which in at least one scenario you already have.
    This is just one additional approach, not any better than yours.

    Unfortunately, partitioning is only supported in Enterprise edition and I am forced to stick with Standard :(. Unless someone knows something I don’t.
    I have even tossed around the idea of using tables by year and views with triggers to control which table an insert/update goes to. A mock-up seems to work well; I just can’t get the company to test with real data.

    • Hi Matt,
      I must confess, I hadn’t thought about running sys.dm_db_index_physical_stats as a separate step outside the maintenance window and then “just” picking up those results during the maintenance window. I guess that’ll give me food for thought now. Thanks!

  10. Frank,

    Thanks for a ‘real-world’ article. It was thought-provoking, since I have a similarly large table in our system. My question is this: you state in the article that ‘we determine the number of processors upfront, and only use half of them in the ALTER INDEX statement’. I’m not sure how you did that. Is that outside the scope of the article, or can you give us a hint about how you accomplished it? The implication was that you forced the ALTER INDEX to use only a subset of the CPUs. How was that done?

    Willem

  11. Hi everyone.

    I have read an interesting article somewhere recently about using sys.dm_db_index_operational_stats to indirectly determine fragmentation in very large tables.

  12. I’d argue that indexes on tables with lots of inserts and deletes are not well maintained with just REORGANIZE or REBUILD. I made a huge script that drops and creates each of the indexes on these tables once every two weeks. I run REBUILD on ALL and update statistics on indexes twice a week.

    These jobs run in under 2 hours on all my dbs, and the indexes are always being used, and never get too fragmented.

    Soon I will have a script that drops and creates all indexes which I can run every 2 or 3 months.

    • Hi Colin,
      I’m not sure I understand you fully. We had the idea of not bothering with detecting fragmentation on big tables at all and going straight for a drop and (re)create, but we never really investigated it further when we implemented the current approach. Since then I have thought about it as well, and it is on my to-do list for when we have reached our year-end change freeze. I guess that’ll go into a follow-up article to this one.

      How big are your tables? And how many rows are inserted or deleted on an average day?

  13. This is one of the best answers I have read online so far: just useful information, very well presented. I found another good collection of articles on indexes in SQL Server on the internet;
    please check out this link:
    http://mindstick.com/Articles/f7e21f40-d2f3-485a-b2aa-53093bac5ff7/?Index%20in%20SQL%20Server

    Thanks

  14. Thanks for the article Frank – nice to see your approach on this.

    We have a similar, although slightly larger, issue than the 30 minute duration you talked about for the DMF: we have a 2 TB database, on a DL380 G6 (12 cores) with 72 GB of RAM, where running the DMF in LIMITED mode takes 9+ hours. SAMPLED was killed after 26 hours, and as for DETAILED, let’s not even go there (36+ hours). A large amount of data load activity takes place on the server (it is a performance data repository).

    Given it is 2 TB in size, we don’t just want to reindex it all, but we do note that performance degrades if maintenance fails.

    Is there any other metric persisted, even if it is somewhat out-of-date by the time we do maintenance on the indexes, or do we have to actively ask SQL Server to find out the fragmentation of an index by reviewing the structure (i.e. either DBCC SHOWCONTIG or the DMF)?

    I’d like to avoid reindexing 200 GB tables if they don’t need it, but we can’t afford 10 hours just to find out whether we should do maintenance.

    • One thing that comes to mind might be to look into partitioning, to split such beasts into manageable units. Apart from that… I honestly don’t know. Sometimes I think SQL Server is more sensitive to outdated statistics than it is to fragmentation. We had several cases where updating statistics fixed time-outs on the client while fragmentation stayed the same.

      I think this is more of a black art than a science. :-)

  15. Frank, I too ran into a similar issue. We currently defrag smaller tables throughout the day based on the number of pages and the level of fragmentation. We had found that it took longer to determine fragmentation than to actually reorganize or rebuild the indexes. I ran across a post by Paul Randal that describes how to create a wrapper function for sys.dm_db_index_physical_stats that can be used with CROSS APPLY. So we now determine the page count of the indexes using sys.dm_db_partition_stats and then cross apply to the wrapper function. It’s been working like a charm for about 6 months now and has dramatically reduced our maintenance time for these small indexes. Check out the link below for more info. Hope it helps!

    http://www.sqlskills.com/blogs/paul/post/Indexes-From-Every-Angle-Using-CROSS-APPLY-with-sysdm_db_index_physical_stats.aspx
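
    For reference, a minimal version of such a wrapper along the lines of Paul’s post; dbo.IndexPhysicalStats is an illustrative name:

    -- A multi-statement TVF may be fed correlated parameters,
    -- unlike the DMF itself (which raises Msg 413 under CROSS APPLY)
    CREATE FUNCTION dbo.IndexPhysicalStats
        (@object_id int, @index_id int)
    RETURNS @result TABLE
        (avg_fragmentation_in_percent float, page_count bigint)
    AS
    BEGIN
        INSERT INTO @result
        SELECT avg_fragmentation_in_percent, page_count
        FROM sys.dm_db_index_physical_stats
             (DB_ID(), @object_id, @index_id, NULL, 'LIMITED');
        RETURN;
    END;
    GO

    -- Now the correlation works:
    SELECT i.[name], ips.avg_fragmentation_in_percent, ips.page_count
    FROM sys.indexes i
    CROSS APPLY dbo.IndexPhysicalStats(i.[object_id], i.index_id) ips
    WHERE ips.page_count > 100;    -- illustrative filter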
