How SQL Server Determines an Execution Plan Using Available Indexes and Statistics
As a SQL Server DBA, one of my tasks is to monitor the performance of our databases. Our primary database holds the data of over 18 million people. A web-based application queries the database with a variety of search options. When examining the performance of these queries, most performed very well. But on the other hand, a few queries have resulted in a time-out.
One of the first steps in analyzing the time out problem was to find the search pattern used in the slow running queries. I found that the fields queried were well indexed and often gave a good response time. But when I ran a trace of the execution plans used in these queries, I found that an available index was often not used.
This led me to investigate how SQL Server determines an execution plan for a query. By better understanding the principles behind them, I was able to improve the structure of my indexes and improve the overall performance of the troublesome queries.
When SQL Server executes a query, it will use an execution plan if one already exists. If there is no preexisting plan, SQL Server follows a series of steps to create a plan. The objective of this is to find an acceptable plan, not necessarily the optimal plan. The effort to find the best possible plan might take longer than to execute an acceptable, but not optimal plan. To come to this plan, the following steps are taken by the Query Optimizer. Each step might produce one or more plans. Each plan is assigned a total cost for execution, and SQL Server uses the plan with the lowest costs.
First, SQL Server checks if a trivial plan is available. If so, it will go ahead with this plan. For example, with an INSERT statement using the VALUES clause, there is only one possible execution.
If no trivial plan is available, SQL will try to simplify the query, so that a trivial plan will be available. This will not result in a new plan, but helps SQL to analyze the query. At this point SQL server will load any statistics that will help it in the cost-based process that follows. This cost-based process has three steps.
First is the transaction processing phase. In this phase, SQL Server picks out plans for simple queries typical of transaction processing databases. If a plan is formed with a total cost below a threshold, it will use this plan.
If a cheap plan can not be found, the Quick Plan phase starts. In this phase, SQL includes choices that are often useful in more complex queries. This might include use of indexes and nested loop joins. Again, if a cheap enough plan is found, SQL Server will use this plan.
The last phase is the Full Optimization Phase. In this phase, SQL Server compares every possible execution plan and then goes with the cheapest one. This phase has a time limit. When the time limit is reached, SQL Server goes ahead with the cheapest plan found up until that moment.
This process — except for the trivial plan — is based solely on statistics. Bad statistics lead to bad execution plans. If an index is placed on a small table, the optimal plan might be to ignore the index and use a table scan. If the statistics are not updated while the table grows significantly, SQL Server will still assume that the table is small and use the table scan, even if this is no longer the optimal plan.
In most cases it is wise to let SQL Server automatically update statistics. To turn on the automatic updates for all statistics in a database, if not already on, execute the following statement:
EXEC sp_dboption ‘MyDatabase’, ‘ auto update statistics’, ‘true’
To turn on the automatic updates for all statistics on a specific table, such as Clients, execute the following statement:
EXEC sp_autostats Clients, ‘ON’
To manually update the statistics on specific table, such as Clients, execute the following command:
UPDATE STATISTICS Clients
Other options for updating statistics can be found on this website.
How Statistics Work
Keeping statistics up to date is crucial for optimal performance of a database. In some cases, however, an available execution plan is not ideal for the given situation. To understand when and why this happens, it is important to understand how statistics work.
The following command displays the statistics for a given index (index IX1a on table Clients)
DBCC SHOW_STATISTICS (Clients, IX1a)
This gives all statistical information on this index. Among the date and time of the last update and it’s density, it returns samples of the index. This sample is taken at random from the database and gives the following information:
Figure 1: DBCC SHOW_STATISTICS output