Understanding SQL Server Query Optimization – Part 1
In the first of this four part series I will give an overview of the tools and techniques of query optimization in SQL Server
Most of the activity of the database server is related to queries. Because of this, query optimization is often a major concern. SQL Server tries to optimize query performance itself, but its effectiveness in this may be influenced by several problems. Indexes and poorly designed queries can lead to unacceptable performance, no matter how well you design your database. Key issues related to query optimization include designing indexes, statistics, and index fragmentation. Statistics describe index key values, are maintained for index columns, and are used by SQL Server to decide on the most appropriate index to use when running queries.
SQL Server Query optimizer
The query optimizer of SQL Server is a component that is responsible for optimizing the execution plan of the query. Key concerns in selecting an optimal execution plan are I/O processes needed and CPU resources required for query processing, and execution time required for query to return the results to the user. The result of the analysis performed by the query optimizer is an execution plan that acts as a step-by-step guide to process the query.
During the optimization process, the query optimizer considers several execution plans and selects the optimal plan of this set. The plan it selects is often neither the fastest nor the less intense, but it provides good performance while reducing resource requirements.
SQL Server stores the execution plan in the procedure cache. If appropriate, SQL Server caches both a serial execution plan and parallel execution plan. This means that the query optimizer can select an existing execution plan in the cache, if appropriate, instead of generating a new execution plan if the query is executed again. If the underlying tables, indexes or statistics change between each execution, the execution plan is recompiled before being reused. The steps of the query processing process are:
1. Parsing – Checks query for syntax errors. If it passes the syntax check, the syntax is broken into component parts that are recognizable to the database engine.
2. Standardization – Transforms query format, preparing it for optimization. Redundant clauses are removed as part of this process.
3. Query optimization – Query optimizer selects the most appropriate execution plan.
4. Compilation – After selecting an execution plan, SQL Server compiles the query into executable code based on that plan.
To determine the best execution plan, the query optimizer takes into account several factors, including:
• Indexes – The query optimizer first determines if there are any indexes on the source tables or views. If not, the database engine retrieves data by scanning the entire table. The database engine begins scanning at the physical beginning of the table, and scans through the full table page-by-page and row-by-row to look for qualifying data. Then extracts and returns rows meeting the selection criteria. The process is improved when indexes are present. However, If there is an appropriate index available, SQL Server uses it to locate the data. If source tables have multiple indexes then query optimiser determines which indexes or columns might minimize I/O activity by reducing the number of rows that must be examined while processing the query.
• Column statistics – If statistic updates are enabled, the query optimizer creates or updates column statistics as necessary.
• Join strategies – The query optimizer chooses the best join strategy (how to process a union/join), and the order of conducting join operations.
The query optimizer considers the data structures and data characteristics. This includes issues such as the size of the database file and its placement and index density. Index density refers to the percentage of duplicate rows in an index. Density is not necessarily constant along index, and because of this, the query optimizer may decide to use a table scan when retrieving some values, but use an index scan for other records. If query search conditions include a dense (duplicate) column value, it uses a table scan.
The query governor sets the query cost limit, which prevents long-running queries consume an excessive amount of query resources. You can set the cost limit the query to a value in seconds. SQL Server determines the limits of cost based on the value entered. Set the query governor to 0 disables the query governor. You can set the query governor through:
• Server properties
You can set the limit for all connections using the sp_configure system-stored procedure and server properties. You can set the limit for the current connection only using the following:
Overriding the query optimizer
You can fine-tune query performance in this situation by specifying optimizer hints in your query. You should avoid using optimizer hints unless you believe that the query optimizer is to choose less efficient execution plans and have no other solution available. You can enter optimizer hints as:
• Table hints – Specify indexes used, force a table scan, and set locking options.
• Join hints – Mutually exclusive choices that set the join type.
• Query hints – Specified as part of the OPTION clause and affecting all query operators.
If using optimizer hints, verify that you are, in fact, getting an improved performance by testing the query. Do not be surprised if performance does not improve. Often it is found that the query optimizer selects appropriate execution plans and that the problem is more in how the query is written and the resources (such as indexes) that are available to the query optimizer. If you use hints, you should check query performance periodically because the optimizer hints prevent the query optimizer dynamically adjust to changing conditions.
Key points to remember for indexes and queries
You support queries and query optimization by creating indexes to reduce input/output (I/O) requirements. Key points include:
• Use non-clustered indexes to support queries. This gives the query optimizer the information it needs without having to access data pages.
• Use included columns for columns that cannot be specified as key columns. This provides a more direct access to the data in included columns.
• Indexes must include all of the columns on which the query depends, as either key or non-key columns. This includes columns in the result set as well as those used for sorting, aggregation, and limiting the result set. It’s often appropriate to have all columns in some tables indexed through composite indexes.
• Provide aggregated data values for queries over indexed views. This provides better performance than the aggregate calculation when processing the query.
You should monitor the relationship between row sizes and index key sizes. If the index key size approaches the size of the row, the index offers little or no performance benefit. A clustered index, which includes a wide selection of columns in the table, could be an appropriate choice.