SQL Server UPDATE STATISTICS – SQL Server Performance

You can add statistics on columns that don’t have statistics in order to boost query performance. For example, you can create statistics for columns in a composite index other than the first column (which is the default for AUTO CREATE STATISTICS), and for columns that are not indexed.

The SQL Server Query Optimizer has the ability to use statistics on columns that are not indexed in order to create more optimal execution plans. Doing so can increase performance without having the overhead of indexes. In fact, sometimes it is more efficient to use column statistics instead of an index to optimize query performance. The only way to know this for sure is to make controlled experiments and see what works best for your situation.

Generally, column statistics, if needed by the Query Optimizer, are generated automatically by SQL Server when queries are being optimized by the Query Optimizer. For example, column statistics are automatically created when there are currently no statistics available for the column being used as a search argument in a WHERE clause or a JOIN clause. There are two cases when SQL Server will not automatically create column statistics, even when they fit the criteria described above. These include: when the costs needed to create the column statistics are larger than the costs of the query plan; and when SQL Server is too busy.

Another way to identify non-indexed columns to add statistics to is to use the Index Wizard. It can identify columns that can benefit from column statistics, and add them for you automatically if you wish.

You use the CREATE STATISTICS command to manually create statistics on columns.

*****

One way to tell if the statistics for a column are missing or outdated, or if the accuracy of current statistics is not ideal (they may not be ideal if the sample size of the statistics are not high enough), is to run a graphical execution plan of a query as an estimated plan and as a real plan. In other words, you will run both an estimated and an actual graphical execution plan using Query Analyzer, and then compare the results.

When you run an estimated graphical execution plan for a query, you get results, such as the estimated row count returned. When you run an actual graphical execution plan for a query, you get the actual row count returned. If the statistics for the column(s) used by the query are current and accurate, then the estimated row count and the actual row count should be very similar. If they are wildly different, this can mean three things. Either the statistics are missing, out of date, or not accurate enough.

First, check to see if the statistics are missing. If they are, they can easily be added. Second, if statistics do exist, then update them to see if this resolves the problem. Third, if the first two suggestions don’t work, then the statistics may not be accurate enough.

By default, when a table is less than 8 MB, the sample used to create the column statistics is the entire table. So for tables of less than 8 MB, in theory the statistics created can’t be any better than they are. So if your table is less than 8 MB, and updating the statistics doesn’t resolve the issue of a large difference between the estimated and actual row count, then nothing can. In a case like this, your only option, assuming there is a performance problem, is to use a hint to override what the Query Optimizer is doing wrong.

But if the table in question is greater than 8 MB, and when SQL Server automatically creates statistics for a table, it does not examine every row. Instead, it takes a sample of the available rows. This sample may or may not be good enough to produce accurate enough statistics. One way to get better statistics is to manually update the statistics for the table using the UPDATE STATISTICs command. This command has two options that allow you to get a better sampling of the rows in your table.

First, you can use the FULLSCAN option. This will force every row in the table or index to be fully examined and used to build the column statistics. As you might guess, using this option can increase the time it takes to update the statistics, which could hurt performance elsewhere on your server, especially if the table is huge.

As an alternative to the FULLSCAN option, you can use the SAMPLE option. This allows you to control how large a sample of rows to sample. For example, you might specify that only 50 percent of the rows are to be sampled. This is less than what a FULLSCAN will produce, but more than what a default sample will produce, offering a compromise in accuracy and resources needed to perform the sample.

If you find that the statistics for one or more of our tables is not accurate enough for your needs, you may need to schedule running the UPDATE STATISTICS command at regular intervals using an appropriate sampling level. Only through testing will you know what sampling level is best for your situation.

*****

Statistics are only maintained on the first column of a composite index. Because of this, SQL Server’s Query Optimizer might not make as good use of composite indexes as it could. If you suspect that a composite index is not being used as you expected it would be, you will want first to verify that the index is not being used by examining the query’s execution plan. If you see that it is not being used, and it should be, consider adding a hint to force the index to be used, then see if using the index really helps performance. If it does, keep the hint. If it does not help performance, drop the hint and the composite index that is not being used, assuming that it is not needed by another query.

*****

If you want to find out if an index has had its indexes updated or not, you can use the DBCC SHOW_STATISTICS command, like this:

DBCC SHOW_STATISTICS (table_name , index_name)

This command will not only tell you when statistics were last updated, but also provide you information on the kind of statistics that has been collected for the index you are examining

*****

If you have a performance issue with a query, the very first step you should take before analyzing it is to update the statistics on all the tables in your database. This is because many query performance issues are due to outdated statistics, and updating them before troubleshooting query performance may save you a lot of time. If the query is still giving you trouble after updating the statistics, then you can begin your troubleshooting.

*****

According to a SQL Server tuning expert at Microsoft (one of their best), it was suggested to me that if your SQL Server maintenance window allows for it, that you should update statistics for all tables and for all databases every night. This is because the auto-update option is less than perfect, and accurate and up-to-date statistics is so important to overall performance of SQL Server. Of course, if this is not possible because of time constraints, then you will have to perform this step less often, or even not at all, and rely on the auto-update feature. But if you do have this window, you should take advantage of it.

*****

If you upgrade to SQL Server 2000 from SQL Server 7.0, you should update all of the statistics in all your databases using either UPDATE STATISTICS or sp_updatestats. This is because statistics are not automatically updated during the conversion process.

If you don’t manually update the statistics, then you may find that many of the queries are running slowly because of incorrect statistics, and incorrectly blame SQL Server 2000 as being slower than SQL Server 7.0, which is just not the case.

Continues…

Leave a comment Cancel reply