Understanding SQL Server Query Optimization Statistics

In this
article, you will learn what query optimisation statistics are and how SQL
Server query optimizer uses statistics. You will also learn how to create and
manage statistics, and what are the different ways to determine when statistics
were last updated in SQL Server.

What
are Query Optimization Statistics?

The
term statistics refers specifically to distribution statistics. Distribution statistics describe
index key values, specifically:


Selectivity – Refers to how many
rows are identified by a key value. A unique index has high selectivity. A key
with many duplicates has low selectivity.


Distribution – Used when estimating how efficient an index will be in retrieving
data associated with a key value or range.

The
query optimization statistics are simply a form of dynamic metadata that contain statistical information about the distribution of
values in one or more columns of a table or indexed view. Statistics describe index key values, are maintained for index columns,
and are used by SQL Server when deciding on the most appropriate indexes to use
when running queries. Statistics helps to estimate the cardinality, or number of rows, in the
query result, and generally helps query optimizer in making better decisions.
For example, if there are only a dozen rows in a table, then there is no reason
to go to an index for a search, it is always better to do a full table scan.
But if that same table grows to one million rows, then you probably be better
off using the index. But if you query this table column that only has very few
unique values ​​(for example, it might
be a column of "sex" that only contains "M" or
"F"), then actually a full table scan would be a better choice
because query optimiser would need to retrieve the block anyway to build the
result set. Now say that the table is 99% "M", and only 1%
"F", then full table scan would be used in one case, and index seeks
in the other. In the same table, same query, potentially four different query
plans based on the contents of the table. These things are the
"statistics" and that are individual to each database – even two
databases with identical table and index structure will have different
statistics.

SQL
Server query optimizer uses statistics to create query plans that improve query
performance. For most queries, the query optimizer already generates the necessary
statistics for a high quality query plan; in a few cases, you need to create
additional statistics or modify the query design for best results.

By
default, SQL Server maintains statistics for index columns. You can also keep
statistics on selected non-indexed columns, if desired. Statistics on
non-indexed columns count toward the limit of non-clustered indexes (249) that
you can have on a table.

Database-wide
statistics options

There
are three options that relate to statistics. These are:


AUTO_CREATE_STATISTICS
– When ON, SQL Server automatically creates missing statistics needed for query
optimization.


AUTO_UPDATE_STATISTICS
– When ON, SQL Server automatically updates statistics, as needed by the query
optimizer for optimal query performance.


AUTO_UPDATE_STATISTICS_ASYNC
– When ON, SQL Server query optimiser uses asynchronous statistics updates.

You
use the SET command and configure either option
as ON or OFF. Both AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are set ON
by default. You use database properties and the ALTER
DATABASE command to maintain the settings and change the database default for
all connections. The following is the general syntax for ALTER
DATABASE
command to set these database
statistics options:

ALTER DATABASE database_name

SET

  AUTO_CREATE_STATISTICS { ON | OFF } 
| AUTO_UPDATE_STATISTICS { ON | OFF } 
| AUTO_UPDATE_STATISTICS_ASYNC { ON | OFF }
    

You
can manage the automatic update statistics setting for an individual table or
index (enable or disable) with sp_autostats, using the syntax:

sp_autostats
table_name [[,] ‘ON | OFF’ [[,] index_name]

To
view the setting for the Ops.Employee table, you can run:

sp_autostats Ops.Employee

If
you specify an index name, you can set the update option for a selected index.

Creating
statistics

The SQL
Server query optimizer starts creating statistics automatically if it
determines that the index statistics are missing or out of date and the
database or table is configured to automatically create statistics. If you
configured a database for manual statistics updates, the query optimizer
(through execution
plan
, if shown) prompts you whether the statistics
are missing or out of date.

By default,
SQL Server uses a sample of rows available to update the statistics
automatically. The sample size is determined dynamically based on the number of
rows in the table or view. SQL Server uses a full scan to create statistics
when:


An
index is initially created


CREATE
STATISTICS is
executed with the FULLSCAN option


UPDATE
STATISTICS is
executed.


The
table size is less than 8 MB.

Automatic
statistics updates prevent out-of-date statistics from impairing the performance
of the query optimizer.

Manually
Creating Statistics

You can
choose to manually create statistics for columns for which they are not created
automatically by using CREATE
STATISTICS
statement.

The following
is the syntax of CREATE
STATISTICS
command:

        CREATE STATISTICS statistics_name

ON { table|view } (column_list)

[WITH

[[ FULLSCAN | SAMPLE number PERCENT | ROWS

| STATS_STREAM = stats_stream ] [ , ] ] [ NORECOMPUTE ]] 

    

You must
specify a statistics_name. You can create statistics for a table or view and,
if desired, limit the statistics for specific columns. You can perform the scan
of full source or limit the number of rows processed with the FULLSCAN
and SAMPLE
options. Use NORECOMPUTE
force manual statistics updates. Otherwise, the statistics set are updated
automatically. The statistics are available to the query optimizer, even if
there are no indexes on the base table or view.

Updating Statistics

SQL Server
automatically updates statistics when they become obsolete, unless you have
disabled the updates using ALTER DATABASE statement by setting AUTO_UPDATE_STATISTICS option to OFF. The query
optimizer initiates these updates as soon as it determines which statistics are
outdated. For example, SQL Server updates statistics when added or updated rows
represent a significant percentage of the total rows in a table. You may have
to manually update statistics when:


An
index was created on an empty table that was populated later.


You
ran TRUNCATE
TABLE on
a table and later repopulated the table.


You
want a full scan of the table for generating statistics.


Static
statistics (NORECOMPILE)
were generated.


Automatic
update is disabled.

You can use
the UPDATE
STATISTICS
statement to
manually update statistics using the following syntax:

        UPDATE STATISTICS table | view [index | (statistics_name)]

[WITH [FULLSCAN | RESAMPLE | SAMPLE number PERCENT | ROWS

[[,] ALL | COLUMNS | INDEX]] [[,] NORECOMPUTE]]
    

You must
specify a table or indexed view name. The UPDATE
STATISTICS
statement does not support non-indexed
views. If not specified index names or statistics, all table indexes are
updated. You specify the RESAMPLE option to use the original sampling ratio in
the query optimizer. If the query optimizer determines that the original
relationship is no longer valid, the sampling is corrected dynamically. You can
choose to update:


ALL
– Update all statistics.


COLUMNS
– Update column statistics only.


INDEX
– Update index statistics only.

These are
mutually exclusive options.

Determining
When Statistics were last updated in SQL Server

There
are two ways to access the last modified date of a statistic. These are:


Through
the header information using DBCC
SHOW_STATISTICS
.


Through
STATS_DATE()
function and sys.stats
system catalog view.

1)
Through
the header information using
DBCC SHOW_STATISTICS

According
to Microsoft Books Online, DBCC
SHOW_STATISTICS
returns the header, histogram, and
vector density based on the data stored in the statistics object. The syntax
lets you specify a table or indexed view along with a target index name,
statistics name, or column name (see below):

        USE [<Database_Name>]

GO

DBCC SHOW_STATISTICS (table_or_indexed_view_name

                     ,index_or_statistics_name_coloumn_name')

[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]

< option > :: =

    STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM


    

Example:
Returning all statistics information

The
following example returns all statistics information for the
IXC_ServerChecks_Results_LoadID index of the ServerChecks.Results table.

USE [SQLComplianceDW]

GO

DBCC SHOW_STATISTICS('ServerChecks.Results', IXC_ServerChecks_Results_LoadID);
    

The
first result set of the DBCC
SHOW_STATISTICS
command returns the header
information, including when the statistics were last updated. To only return
the header information about the statistic, you executed DBCC
SHOW_STATISTICS
with STATS_HEADER option.

 USE [SQLComplianceDW]

GO

DBCC SHOW_STATISTICS('ServerChecks.Results'

  ,IXC_ServerChecks_Results_LoadID) WITH STAT_HEADER;
    

Note that STAT_HEADER, HISTOGRAM and
DENSITY_VECTOR options are only available in SQL Server 2005 and later
releases.

As
you can see from above example, DBCC
SHOW_STATISTICS
returns statistics information for
the index name, statistics name, or column name of the specified table or in
indexed view, however, if you only want to see the statistics update date for
all statistics objects that exists for the tables, indexes, and indexed views
in the database, you query sys.stats and use STATS_DATE()
function.

2)
Through
STATS_DATE() function and sys.stats system catalog
view

According
to SQL Server Books Online, sys.stats system catalog view is the best way to
see each statistics object information that exists for the tables, indexes, and
indexed views in the database. This catalog view exists in SQL Server 2005 and
later. You can use this system catalog view with STATS_DATE()
function, to view most recent update date for each statistics object that
exists for the tables, indexes, and indexed views in the database. This
function accepts two parameters, that is, object_id, stats_id. To determine
date when the statistics where last updated, you execute sys.stats
system catalog view with STATS_DATE()
function, as follows:

 SELECT OBJECT_NAME(object_id) AS [ObjectName]

 ,[name] AS [StatisticName]

 ,STATS_DATE([object_id], [stats_id]) AS [StatisticUpdateDate]

FROM sys.stats
    

As
you can see, this query returns date of the most recent update for statistics
on a table or indexed view.

Further
Readings:


Read the following article from Kimberly
L. Tripp
(Blogg|Twitter),
which discusses STATISTICS_NORECOMPUTE and why would anyone want to use it. I
particularly found this article very informative.


Microsoft Reference: Using
Statistics to Improve Query Performance
.




Array

2 Responses to “Understanding SQL Server Query Optimization Statistics”

  1. You may also want to mention this new DMF with regards to statistics information:

    [SQL]
    SELECT OBJECT_SCHEMA_NAME([s].[object_id]) AS “SchemaName”
    , OBJECT_NAME([s].[object_id]) AS “TableName”
    , [s].[stats_id] AS “Statistic_ID”
    , [s].[name] AS “Statistic”
    , [sp].[last_updated] AS “Last_Updated”
    , [sp].[rows]
    , [sp].[rows_sampled]
    , [sp].[unfiltered_rows]
    , [sp].[modification_counter] AS “Modifications”
    , case when [sp].[last_updated] IS null then NULL
    else ‘UPDATE STATISTICS ‘ + quotename(object_schema_name(s.OBJECT_ID)) + ‘.’ + quotename(object_name(s.OBJECT_ID)) + ‘ ‘ + quotename(name) + ‘ /* WITH RESAMPLE */ ;’ + ‘ /* Previous Stats date [' + ISNULL(convert(varchar(23), STATS_DATE(OBJECT_ID, stats_id), 121), ' NULL ') + '] */’
    end as UpdStatsCmd
    , case when [sp].[last_updated] IS null then NULL
    else ‘DROP STATISTICS ‘ + quotename(object_schema_name(s.OBJECT_ID)) + ‘.’ + quotename(object_name(s.OBJECT_ID)) + ‘.’ + quotename(name) + ‘ ; ‘ + ‘ /* Previous Stats date [' + ISNULL(convert(varchar(23), STATS_DATE(OBJECT_ID, stats_id), 121), ' NULL ') + '] */’
    end as DelStatsCmd

    FROM [sys].[stats] AS [s]
    OUTER APPLY sys.dm_db_stats_properties([s].[object_id], [s].[stats_id]) AS [sp]

    order by SchemaName, TableName, Statistic_ID ;
    [/SQL]

  2. Hi Basit,

    Nice article!

    In the section that handles updating statistics i missed some clarification on when to update your stats :)

    Deciding on just by stats_date, sysindexes/rowmodctr or dm_db_stats_properties means you could be updating stats that have valid histograms.

    Any ideas about that?

    greetz,
    Theo

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |