Understanding SQL Server Query Optimization Statistics

In this article, you will learn what query optimisation statistics are and how SQL Server query optimizer uses statistics. You will also learn how to create and manage statistics, and what are the different ways to determine when statistics were last updated in SQL Server.

What are Query Optimization Statistics?

The term statistics refers specifically to distribution statistics. Distribution statistics describe index key values, specifically:

Selectivity – Refers to how many rows are identified by a key value. A unique index has high selectivity. A key with many duplicates has low selectivity.

Distribution – Used when estimating how efficient an index will be in retrieving data associated with a key value or range.

The query optimization statistics are simply a form of dynamic metadata that contain statistical information about the distribution of values in one or more columns of a table or indexed view. Statistics describe index key values, are maintained for index columns, and are used by SQL Server when deciding on the most appropriate indexes to use when running queries. Statistics helps to estimate the cardinality, or number of rows, in the query result, and generally helps query optimizer in making better decisions. For example, if there are only a dozen rows in a table, then there is no reason to go to an index for a search, it is always better to do a full table scan. But if that same table grows to one million rows, then you probably be better off using the index. But if you query this table column that only has very few unique values ​​(for example, it might be a column of "sex" that only contains "M" or "F"), then actually a full table scan would be a better choice because query optimiser would need to retrieve the block anyway to build the result set. Now say that the table is 99% "M", and only 1% "F", then full table scan would be used in one case, and index seeks in the other. In the same table, same query, potentially four different query plans based on the contents of the table. These things are the "statistics" and that are individual to each database – even two databases with identical table and index structure will have different statistics.

SQL Server query optimizer uses statistics to create query plans that improve query performance. For most queries, the query optimizer already generates the necessary statistics for a high quality query plan; in a few cases, you need to create additional statistics or modify the query design for best results.

By default, SQL Server maintains statistics for index columns. You can also keep statistics on selected non-indexed columns, if desired. Statistics on non-indexed columns count toward the limit of non-clustered indexes (249) that you can have on a table.

Database-wide statistics options

There are three options that relate to statistics. These are:

AUTO_CREATE_STATISTICS – When ON, SQL Server automatically creates missing statistics needed for query optimization.

AUTO_UPDATE_STATISTICS – When ON, SQL Server automatically updates statistics, as needed by the query optimizer for optimal query performance.

AUTO_UPDATE_STATISTICS_ASYNC – When ON, SQL Server query optimiser uses asynchronous statistics updates.

You use the SET command and configure either option as ON or OFF. Both AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are set ON by default. You use database properties and the ALTER DATABASE command to maintain the settings and change the database default for all connections. The following is the general syntax for ALTER DATABASE command to set these database statistics options:

ALTER DATABASE database_name
SET
  AUTO_CREATE_STATISTICS { ON | OFF }
| AUTO_UPDATE_STATISTICS { ON | OFF }
| AUTO_UPDATE_STATISTICS_ASYNC { ON | OFF }
    

You can manage the automatic update statistics setting for an individual table or index (enable or disable) with sp_autostats, using the syntax:

sp_autostats table_name [[,] ‘ON | OFF’ [[,] index_name]

To view the setting for the Ops.Employee table, you can run:

sp_autostats Ops.Employee

If you specify an index name, you can set the update option for a selected index.

Creating statistics

The SQL Server query optimizer starts creating statistics automatically if it determines that the index statistics are missing or out of date and the database or table is configured to automatically create statistics. If you configured a database for manual statistics updates, the query optimizer (through execution plan, if shown) prompts you whether the statistics are missing or out of date.

By default, SQL Server uses a sample of rows available to update the statistics automatically. The sample size is determined dynamically based on the number of rows in the table or view. SQL Server uses a full scan to create statistics when:

• An index is initially created

• CREATE STATISTICS is executed with the FULLSCAN option

• UPDATE STATISTICS is executed.

• The table size is less than 8 MB.

Automatic statistics updates prevent out-of-date statistics from impairing the performance of the query optimizer.

Manually Creating Statistics

You can choose to manually create statistics for columns for which they are not created automatically by using CREATE STATISTICS statement.

The following is the syntax of CREATE STATISTICS command:

        CREATE STATISTICS statistics_name
ON { table|view } (column_list)
[WITH
[[ FULLSCAN | SAMPLE number PERCENT | ROWS
| STATS_STREAM = stats_stream ] [ , ] ] [ NORECOMPUTE ]]
    

You must specify a statistics_name. You can create statistics for a table or view and, if desired, limit the statistics for specific columns. You can perform the scan of full source or limit the number of rows processed with the FULLSCAN and SAMPLE options. Use NORECOMPUTE force manual statistics updates. Otherwise, the statistics set are updated automatically. The statistics are available to the query optimizer, even if there are no indexes on the base table or view.

Updating Statistics

SQL Server automatically updates statistics when they become obsolete, unless you have disabled the updates using ALTER DATABASE statement by setting AUTO_UPDATE_STATISTICS option to OFF. The query optimizer initiates these updates as soon as it determines which statistics are outdated. For example, SQL Server updates statistics when added or updated rows represent a significant percentage of the total rows in a table. You may have to manually update statistics when:

• An index was created on an empty table that was populated later.

• You ran TRUNCATE TABLE on a table and later repopulated the table.

• You want a full scan of the table for generating statistics.

• Static statistics (NORECOMPILE) were generated.

• Automatic update is disabled.

You can use the UPDATE STATISTICS statement to manually update statistics using the following syntax:

        UPDATE STATISTICS table | view [index | (statistics_name)]
[WITH [FULLSCAN | RESAMPLE | SAMPLE number PERCENT | ROWS
[[,] ALL | COLUMNS | INDEX]] [[,] NORECOMPUTE]]
    

You must specify a table or indexed view name. The UPDATE STATISTICS statement does not support non-indexed views. If not specified index names or statistics, all table indexes are updated. You specify the RESAMPLE option to use the original sampling ratio in the query optimizer. If the query optimizer determines that the original relationship is no longer valid, the sampling is corrected dynamically. You can choose to update:

• ALL – Update all statistics.

• COLUMNS – Update column statistics only.

• INDEX – Update index statistics only.

These are mutually exclusive options.

Determining When Statistics were last updated in SQL Server

There are two ways to access the last modified date of a statistic. These are:

• Through the header information using DBCC SHOW_STATISTICS.

• Through STATS_DATE() function and sys.stats system catalog view.

1) Through the header information using DBCC SHOW_STATISTICS

According to Microsoft Books Online, DBCC SHOW_STATISTICS returns the header, histogram, and vector density based on the data stored in the statistics object. The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name (see below):

        USE [<Database_Name>]
GO
DBCC SHOW_STATISTICS (table_or_indexed_view_name
                     ,index_or_statistics_name_coloumn_name')
[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]
< option > :: =
    STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM
    

Example: Returning all statistics information

The following example returns all statistics information for the IXC_ServerChecks_Results_LoadID index of the ServerChecks.Results table.

USE [SQLComplianceDW]
GO
DBCC SHOW_STATISTICS('ServerChecks.Results', IXC_ServerChecks_Results_LoadID);
    

The first result set of the DBCC SHOW_STATISTICS command returns the header information, including when the statistics were last updated. To only return the header information about the statistic, you executed DBCC SHOW_STATISTICS with STATS_HEADER option.

 USE [SQLComplianceDW]
GO
DBCC SHOW_STATISTICS('ServerChecks.Results'
  ,IXC_ServerChecks_Results_LoadID) WITH STAT_HEADER;
    

Note that STAT_HEADER, HISTOGRAM and DENSITY_VECTOR options are only available in SQL Server 2005 and later releases.

As you can see from above example, DBCC SHOW_STATISTICS returns statistics information for the index name, statistics name, or column name of the specified table or in indexed view, however, if you only want to see the statistics update date for all statistics objects that exists for the tables, indexes, and indexed views in the database, you query sys.stats and use STATS_DATE() function.

2) Through STATS_DATE() function and sys.stats system catalog view

According to SQL Server Books Online, sys.stats system catalog view is the best way to see each statistics object information that exists for the tables, indexes, and indexed views in the database. This catalog view exists in SQL Server 2005 and later. You can use this system catalog view with STATS_DATE() function, to view most recent update date for each statistics object that exists for the tables, indexes, and indexed views in the database. This function accepts two parameters, that is, object_id, stats_id. To determine date when the statistics where last updated, you execute sys.stats system catalog view with STATS_DATE() function, as follows:

 SELECT OBJECT_NAME(object_id) AS [ObjectName]
 ,[name] AS [StatisticName]
 ,STATS_DATE([object_id], [stats_id]) AS [StatisticUpdateDate]
FROM sys.stats
    

As you can see, this query returns date of the most recent update for statistics on a table or indexed view.

Further Readings:

• Read the following article from Kimberly L. Tripp (Blogg|Twitter), which discusses STATISTICS_NORECOMPUTE and why would anyone want to use it. I particularly found this article very informative.

• Microsoft Reference: Using Statistics to Improve Query Performance.

]]>

Leave a comment

Your email address will not be published.