Performance Tuning Re-indexing and Update Statistics – A Case Study

Recently we started experiencing a strange issue in our production reporting environment: the Re-indexing and Update Statistics operation suddenly began taking more than 2 days to complete, which caused blocking in the database and, in turn, degraded application performance.

Reporting Server Configuration Details:
SQL Server Version: SQL Server 2005 Enterprise Edition
Service Pack: 3
Windows Version: Windows Server 2003 R2 Enterprise Edition
Number of CPUs: 12
RAM: 16 GB

To troubleshoot this issue, I first separated the Re-indexing and Update Statistics operations.


Index Re-building:

Originally, the Re-indexing operation was performed using the DBCC DBREINDEX command. When executed against the database, this command drops and re-creates every index on every table, which takes a considerable amount of time and puts a heavy load on the server's hardware resources.
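
For reference, the old job looked roughly like this (a sketch only; the original job step may have differed, but it rebuilt every index on every table, here driven through sp_MSforeachtable):

-- Old approach (sketch): rebuild all indexes on every table, regardless of fragmentation.
EXEC sp_MSforeachtable 'DBCC DBREINDEX (''?'')'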

As per the best practice recommended by Microsoft, there is no need to re-index each and every index in the database. If the fragmentation level of a particular index is less than 30%, we can simply reorganize it; if it is greater than 30%, we should rebuild the index.

First we need to understand the difference between the rebuild and reorganize operations. An index rebuild drops the existing index and re-creates it, whereas an index reorganize physically reorders the leaf-level pages of the index in place. One thing to remember is that the rebuild operation is a very resource-intensive task.
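
To illustrate the two operations on a single index (the index and table names below are placeholders for illustration, not objects from the actual database):

-- Reorganize: defragments the leaf level in place; a comparatively lightweight, online operation.
ALTER INDEX [IX_Sample] ON [dbo].[SampleTable] REORGANIZE;

-- Rebuild: drops and re-creates the index; far more resource intensive.
ALTER INDEX [IX_Sample] ON [dbo].[SampleTable] REBUILD;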

 Based upon this, I decided to modify the Re-indexing task as follows:

-- Ensure a USE <database name> statement has been executed first.
SET NOCOUNT ON;
DECLARE @objectid int;
DECLARE @indexid int;
DECLARE @partitioncount bigint;
DECLARE @schemaname nvarchar(130);
DECLARE @objectname nvarchar(130);
DECLARE @indexname nvarchar(130);
DECLARE @partitionnum bigint;
DECLARE @partitions bigint;
DECLARE @frag float;
DECLARE @command nvarchar(4000);
-- Conditionally select tables and indexes from the sys.dm_db_index_physical_stats function
-- and convert object and index IDs to names.
SELECT
    object_id AS objectid,
    index_id AS indexid,
    partition_number AS partitionnum,
    avg_fragmentation_in_percent AS frag
INTO #work_to_do
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 10.0 AND index_id > 0;
-- Declare the cursor for the list of partitions to be processed.
DECLARE partitions CURSOR FOR SELECT * FROM #work_to_do;
-- Open the cursor.
OPEN partitions;
-- Loop through the partitions.
WHILE (1=1)
    BEGIN;
        FETCH NEXT
           FROM partitions
           INTO @objectid, @indexid, @partitionnum, @frag;
        IF @@FETCH_STATUS < 0 BREAK;
        SELECT @objectname = QUOTENAME(o.name), @schemaname = QUOTENAME(s.name)
        FROM sys.objects AS o
        JOIN sys.schemas as s ON s.schema_id = o.schema_id
        WHERE o.object_id = @objectid;
        SELECT @indexname = QUOTENAME(name)
        FROM sys.indexes
        WHERE  object_id = @objectid AND index_id = @indexid;
        SELECT @partitioncount = count (*)
        FROM sys.partitions
        WHERE object_id = @objectid AND index_id = @indexid;
-- 30 is an arbitrary decision point at which to switch between reorganizing and rebuilding.
        IF @frag < 30.0
              SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REORGANIZE';
        IF @frag >= 30.0
            SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REBUILD';
        IF @partitioncount > 1
            SET @command = @command + N' PARTITION=' + CAST(@partitionnum AS nvarchar(10));
        EXEC (@command);
        --PRINT N'Executed: ' + @command;
    END;
-- Close and deallocate the cursor.
CLOSE partitions;
DEALLOCATE partitions;
-- Drop the temporary table.
DROP TABLE #work_to_do;
GO

Once the above index rebuilding logic was implemented, the execution time dropped drastically from around 10-12 hours to a mere 2 hours and 38 minutes, which is an excellent improvement in performance.


Update Statistics:

The Update Statistics operation was performed using the below syntax:

EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'

The above command updates each and every statistic on every table in the database with a FULLSCAN; sp_MSforeachtable substitutes each table name for the ? placeholder in turn.
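
To see exactly what will be executed against each table, the generated statement can be previewed first by printing it instead of running it (a quick sketch; sp_MSforeachtable is an undocumented system procedure):

-- Preview the statement generated for each table;
-- the ? placeholder is replaced with the qualified table name.
EXEC sp_MSforeachtable 'PRINT ''UPDATE STATISTICS ? WITH FULLSCAN'''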

During the execution of the original FULLSCAN command, I noticed that it was taking a very long time at a particular stage, and that a column of the NTEXT data type was the primary cause. Let us first understand how NTEXT can impair performance.

Let us first create a simple table using the below T-SQL:

create table comment
(
comment_text NTEXT
)

After the table is created, let's insert a few records into it using the below T-SQL:

insert comment
select 'ABC'
insert comment
select 'PQR'
insert comment
select 'XYZ'

Now let us check the Statistics IO count of the table named Comment:

set statistics io on
select * from comment
set statistics io off

The Statistics IO count is as follows:

logical reads 1,
physical reads 0,
read-ahead reads 0,
lob logical reads 6,
lob physical reads 0,
lob read-ahead reads 0

As you can see from the above output, although the value of Logical Reads is only 1, the value of LOB Logical Reads is 6, which is comparatively high. This is expected behaviour: NTEXT data is not stored in-row with the rest of the table but in a separate large object (LOB) structure, and reading that structure is what makes an Update Statistics operation with FULLSCAN so slow on such columns. On a large table, an Update Statistics with FULLSCAN involving LOB data can take days to complete.
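
To confirm where the NTEXT data actually lives, the page counts for the table can be inspected as well (a minimal sketch using sys.dm_db_partition_stats, which is available from SQL Server 2005 onwards):

-- In-row pages versus LOB pages used by the comment table:
-- the NTEXT values are kept on LOB pages, not on the in-row data pages.
SELECT index_id, in_row_data_page_count, lob_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('comment')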

Also keep in mind that for a LOB column, when the statistics are created, the first 100 bytes and the last 100 bytes of the value are read to generate the statistics. If we do not specify a sample rate for the UPDATE STATISTICS command, SQL Server calculates the sample rate based on the number of rows and the number of pages used by the table. The number of LOB pages is not considered, so it is possible that when we run UPDATE STATISTICS a large volume of data is sampled and 200 bytes of every LOB row are read, which generates large I/O requests and increases memory consumption.

Therefore, whenever a table contains columns of the TEXT, NTEXT, IMAGE or VARBINARY data types, it is best to perform the UPDATE STATISTICS operation without a FULLSCAN. The two statement forms this boils down to are sketched below, and I updated the T-SQL for the UPDATE STATISTICS operation accordingly, splitting it into two parts:
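
For a single statistic, the two forms look like this (the table and statistic names are placeholders for illustration):

-- Statistics that do not involve NTEXT or IMAGE columns: read everything.
UPDATE STATISTICS [SampleTable] [SampleStat] WITH FULLSCAN

-- Statistics whose columns include NTEXT or IMAGE: let SQL Server choose the sample rate.
UPDATE STATISTICS [SampleTable] [SampleStat]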

Update Statistics without NTEXT and Image Data Type:

SET NOCOUNT ON
DECLARE @columnname VARCHAR(MAX)
DECLARE @tablename SYSNAME
DECLARE @statsname SYSNAME
DECLARE @sql NVARCHAR(4000)
DECLARE @NAME VARCHAR(MAX)
declare @i INT
declare @j INT
create table #temp
(
tablename varchar(1000),
statsname varchar(1000),
columnname varchar(1000)
)
-- Collect statistics whose columns are NOT of the NTEXT or IMAGE data types.
insert #temp(tablename,statsname,columnname)
SELECT DISTINCT
OBJECT_NAME(s.[object_id]),
s.name AS StatName,
c.name
FROM sys.stats s
JOIN sys.stats_columns sc ON sc.[object_id] = s.[object_id] AND sc.stats_id = s.stats_id
JOIN sys.columns c ON c.[object_id] = sc.[object_id] AND c.column_id = sc.column_id
JOIN INFORMATION_SCHEMA.COLUMNS D ON D.[TABLE_NAME] = OBJECT_NAME(s.[object_id]) AND D.[COLUMN_NAME] = c.[name]
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
AND D.DATA_TYPE NOT IN ('NTEXT','IMAGE')
create table #temp1
(
id int identity(1,1),
tablename varchar(8000),
statsname varchar(8000),
columnname varchar(8000)
)
-- Build one row per statistic, with its columns concatenated into a single list.
insert #temp1(tablename,statsname,columnname)
select tablename,statsname,stuff(
(
    select ','+ [columnname] from #temp
    where tablename = t.tablename and statsname = t.statsname
    for XML path('')
),1,1,'')
from (select distinct tablename,statsname from #temp )t
SELECT @i=1
SELECT @j=MAX(ID) FROM #temp1
WHILE(@I<=@J)
BEGIN
SELECT @statsname = statsname from #temp1 where id = @i
SELECT @tablename = tablename from #temp1 where id = @i
SELECT @columnname = columnname from #temp1 where id = @i
SET @sql = N'UPDATE STATISTICS '+QUOTENAME(@tablename)+space(1)+QUOTENAME(@statsname)+space(1)+'WITH FULLSCAN'
--PRINT @sql
EXEC sp_executesql @sql
SET @i = @i+1
END
DROP TABLE #temp
DROP TABLE #temp1


Update Statistics with NTEXT and Image Data Type:

SET NOCOUNT ON
DECLARE @columnname VARCHAR(MAX)
DECLARE @tablename SYSNAME
DECLARE @statsname SYSNAME
DECLARE @sql NVARCHAR(4000)
DECLARE @NAME VARCHAR(MAX)
declare @i INT
declare @j INT
create table #temp
(
tablename varchar(1000),
statsname varchar(1000),
columnname varchar(1000)
)
-- Collect statistics whose columns are of the NTEXT or IMAGE data types.
insert #temp(tablename,statsname,columnname)
SELECT DISTINCT
OBJECT_NAME(s.[object_id]),
s.name AS StatName,
c.name
FROM sys.stats s
JOIN sys.stats_columns sc ON sc.[object_id] = s.[object_id] AND sc.stats_id = s.stats_id
JOIN sys.columns c ON c.[object_id] = sc.[object_id] AND c.column_id = sc.column_id
JOIN INFORMATION_SCHEMA.COLUMNS D ON D.[TABLE_NAME] = OBJECT_NAME(s.[object_id]) AND D.[COLUMN_NAME] = c.[name]
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
AND D.DATA_TYPE IN ('NTEXT','IMAGE')
create table #temp1
(
id int identity(1,1),
tablename varchar(8000),
statsname varchar(8000),
columnname varchar(8000)
)
-- Build one row per statistic, with its columns concatenated into a single list.
insert #temp1(tablename,statsname,columnname)
select tablename,statsname,stuff(
(
    select ','+ [columnname] from #temp
    where tablename = t.tablename and statsname = t.statsname
    for XML path('')
),1,1,'')
from (select distinct tablename,statsname from #temp )t
SELECT @i=1
SELECT @j=MAX(ID) FROM #temp1
WHILE(@I<=@J)
BEGIN
SELECT @statsname = statsname from #temp1 where id = @i
SELECT @tablename = tablename from #temp1 where id = @i
SELECT @columnname = columnname from #temp1 where id = @i
SET @sql = N'UPDATE STATISTICS '+QUOTENAME(@tablename)+space(1)+QUOTENAME(@statsname)
--PRINT @sql
EXEC sp_executesql @sql
SET @i = @i+1
END
DROP TABLE #temp
DROP TABLE #temp1
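
Once both parts have run, a quick check of the last update date confirms that the statistics were actually touched (a sketch; STATS_DATE and sys.stats are available on SQL Server 2005):

-- Show when each statistic on the user tables was last updated.
SELECT OBJECT_NAME(s.[object_id]) AS tablename,
       s.name AS statsname,
       STATS_DATE(s.[object_id], s.stats_id) AS last_updated
FROM sys.stats s
WHERE OBJECTPROPERTY(s.[object_id],'IsUserTable') = 1
ORDER BY last_updated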

This is just one approach to fine-tuning the Re-indexing and Update Statistics operations in a production reporting environment. It has now been almost 2 months since the above logic was put in place, and things are working absolutely fine. Please let me know if you have any comments or suggestions.





7 Responses to “Performance Tuning Re-indexing and Update Statistics – A Case Study”

  1. Contrary to what SQL Server Books Online has to say about the low resource utilization of index reorganizing, my research from several different sources shows the opposite.

    I recommend that you read 2 articles and decide for yourself the impact index reorganization is having on your server. The first article specifically refers to the requirements of mirroring:
    > http://blogs.msdn.com/b/timchapman/archive/2012/09/28/index-rebuild-vs-reorganize-the-transaction-log-edition.aspx

    The first article’s summary:
    “…. For large, heavily fragmented indexes the reorganization operation is much less efficient than the index rebuild operation. This is important to know because if you’re using a technology that relies on reading from the transaction log, such as mirroring, transactional replication/CDC then performing a reorg is going to generate a lot more activity that needs to be sorted through…..”

    The results of the first article’s test show that index reorganization utilized 7500% more log file records than index rebuilding did.

    The second article refers to the Books Online guidelines for using reorg or rebuild:
    > http://www.sqlskills.com/blogs/paul/post/Where-do-the-Books-Online-index-fragmentation-thresholds-come-from.aspx

    The second article was written by Paul Randal, a former Microsoft contractor who wrote the DBCC CHECKDB command and many other database diagnostic tools. He is considered the world’s leading authority on SQL Server database index structures and database integrity. He says plainly that the reorg vs. rebuild guidelines in Books Online are made up.

    My research and testing shows that, in general, performance improvements can be achieved by rebuilding indexes with fragmentation exceeding 20% and greater than 1000 pages in size.

    Just to make sure we understand each other, this is the command that I use to rebuild indexes:
    ALTER INDEX [] ON [dbo].[]
    REBUILD PARTITION = ALL WITH
    (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = ON,
    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
    ONLINE = ON, SORT_IN_TEMPDB = ON, FILLFACTOR = 100 )
    except that I have developed a proprietary fill factor calculation algorithm that dynamically determines and optimizes index fill factors, almost always below 100% full. I can’t share it with you. I can tell you that when fill factors are optimized, the necessity for index rebuilding is dramatically reduced.
    A fill factor designation describes the amount of free space left in every page of an index when it is created or rebuilt. Since INSERT and UPDATE commands can cause page splits, having fill factors of less than 100% can help to prevent page splits, which can be very expensive in terms of additional I/O and slow your system down.
    On maintaining index statistics:
    FULLSCAN causes the scan of an entire index when an index is created or rebuilt. The “sp_updatestats” stored procedure (without FULLSCAN) samples the data in the index rather than read all of the data. There is some debate as to which produces the best results. Apparently, either one might produce a better result depending upon the queries that use the index. “sp_updatestats” executes more quickly than a full statistics rebuild operation using FULLSCAN.

    Hope this helps,

    Lee Crain

  2. Nice article, appreciate your efforts to prepare something like this and help others to know about it.
    Good Job SS.

  3. Pity you had to suffer the first problem because there has been an example of rebuild/reorg of indexes based on fragmentation in Books Online for some time. Great article.

  4. Hi,

    I hope readers are aware that an index rebuild automatically gives you the best statistics (fullscan), so don’t run UPDATE STATS immediately after an index rebuild.

    Secondly, it makes sense to update stats based on an intelligent sampling algorithm. You can see one in section 10.4 of the book “DMVs in Action”: http://www.amazon.com/SQL-Server-DMVs-Action-Management/dp/1935182730

    Thanks
    Ian

  5. I was also working on the same task and it was quite interesting.

  6. Good article. The link you mentioned at the end, is it a tool for SQL performance monitoring as well?

  7. Highly energetic blog, I enjoyed that a lot. Will
    there be a part 2?

