SQL Server Quantitative Performance Analysis
NOTE: See bottom of page to access all of the available articles.
Quantitative models for the SQL Server internal query cost formulas and the true measured query costs for basic SQL operations are presented. The internal cost formulas are used by SQL Server’s Query Optimizer to determine the execution plan based on existing indexes and statistical information on the distribution of data.
Understanding the quantitative model for the cost formulas employed by SQL Server enables developers and administrators to know why a particular plan is chosen over another and why an index is used or not used. The model also demonstrates the importance of populating test databases with the correct cardinality, rather than sheer size, so that execution plans match those in the production database. The true cost formulas enable developers to make architecture and coding decisions critical to performance early in the development process and provide a solid basis from which to set performance goals.
This series of articles is still in draft form. Readers are invited to submit questions and comments to the author. The knowledge level required is very advanced. The reader should be familiar with SQL Server tables and index organizational structures, execution plans, query hints, and SQL Server internals from technical articles on the MSDN web site, Kalen Delaney’s Inside SQL Server 2000, and other sources.
This series of articles is intended to benefit database architects and developers who need to predict application performance in the early phases of development, so as to minimize the re-architecture and re-writing necessary to meet performance goals. Standard performance tuning methodologies are usually sufficient when the production database environment is available. However, database administrators with unusually difficult performance issues that resist brute-force hardware and conventional index tuning approaches may also find the information presented here useful.
The purpose of this series of articles is to establish a quantitative model for predicting SQL Server performance. Every application can be characterized by a set of queries. Each query can have one or more possible execution plans. The execution plan consists of component operations. The cost of each component operation depends on a number of factors, including the number of rows involved. The query optimizer picks the execution plan with either the lowest cost or the fastest execution time, based on estimates for the number of rows involved and the cost structure of the component operations.
The set of characteristic queries comes from the application and its usage. So a quantitative performance model begins with understanding how the optimal execution plan is determined. The SQL Server execution plan cost formulas are explored in some detail to this end. The execution plan cost formulas do not necessarily represent the true cost of a query. The true cost of the fundamental SQL operations needs to be measured. In addition, the expected processor and platform dependencies are addressed.
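The seek-versus-scan decision described above can be sketched with a toy cost model. The structure (a fixed first-page I/O cost, a per-page I/O cost, and a per-row CPU cost) mirrors the kind of formulas the articles explore, but the constants below are hypothetical placeholders, not SQL Server's actual internal cost values.

```python
# Toy plan-cost model: cost = I/O component + CPU component.
# All constants are illustrative placeholders, NOT SQL Server's
# actual internal cost values.

IO_FIRST_PAGE = 0.003125   # hypothetical cost to read the first page
IO_PER_PAGE   = 0.00074    # hypothetical cost per additional page
CPU_PER_ROW   = 0.0000011  # hypothetical CPU cost per row touched

def table_scan_cost(pages, rows):
    """A scan touches every page and evaluates every row."""
    return IO_FIRST_PAGE + (pages - 1) * IO_PER_PAGE + rows * CPU_PER_ROW

def index_seek_cost(rows_returned, pages_touched):
    """A seek reads only the pages holding qualifying rows."""
    return IO_FIRST_PAGE + (pages_touched - 1) * IO_PER_PAGE + rows_returned * CPU_PER_ROW

# The optimizer-style decision: pick the cheaper alternative.
scan = table_scan_cost(pages=10_000, rows=1_000_000)
seek = index_seek_cost(rows_returned=50, pages_touched=3)
plan = "Index Seek" if seek < scan else "Table Scan"
print(plan)
```

With a highly selective predicate the seek is orders of magnitude cheaper; as the number of qualifying rows grows, the seek cost rises until the scan wins, which is exactly the crossover behavior the cost formulas are meant to predict.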
A survey of the actual measured costs for the basic SQL queries shows where query and lock hints can be employed for substantial performance gains with merge and hash joins. The more consequential implications concern how the database architecture, table structure, queries, and indexes must be designed together, not one after the other.
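Why join-type choice (and hence a join hint) matters so much can be sketched with the same kind of toy arithmetic: a loop join pays a seek per outer row, while a hash join pays a fixed build cost plus a small per-row cost on both inputs. The constants are again hypothetical placeholders, not measured SQL Server values.

```python
# Toy comparison of loop-join vs hash-join cost scaling.
# Constants are illustrative placeholders, not SQL Server's actual values.

SEEK_COST  = 0.0032      # hypothetical cost of one inner-side index seek
ROW_COST   = 0.0000011   # hypothetical per-row CPU cost
HASH_SETUP = 0.018       # hypothetical fixed cost to build the hash table

def loop_join_cost(outer_rows):
    # One inner seek per outer row: linear growth with a steep slope.
    return outer_rows * (SEEK_COST + ROW_COST)

def hash_join_cost(outer_rows, inner_rows):
    # Fixed build cost plus a small per-row cost on both inputs.
    return HASH_SETUP + (outer_rows + inner_rows) * ROW_COST * 3

for n in (10, 1_000, 100_000):
    lj, hj = loop_join_cost(n), hash_join_cost(n, inner_rows=100_000)
    print(n, "loop wins" if lj < hj else "hash wins")
```

At small outer-row counts the loop join wins; at large counts the hash join wins by a wide margin. When the optimizer's row estimate is wrong, it lands on the wrong side of this crossover, which is where a hint can recover the performance gap.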
The emphasis in this series of articles is on quantitative analysis instead of being qualitative or descriptive in nature. This necessitates heavy use of data and formulas. It is not a simple matter to internalize so much information. The benefit of a quantitative model is that we can predict which queries will cause performance issues from the execution plan and row count rather than only after extensive testing.
So far, performance and cost measurements have been conducted on only one type of query at a time. There is no guarantee that these results can be extended to production environments, where many queries run simultaneously. The results collected so far are still deemed sufficiently useful. It can be argued that the chaotic nature of a production environment makes it difficult to predict performance. Fortunately, the business case for a major database project should never hinge on whether performance will be 1,000 queries per second or 990, but it may hinge on the difference between 1,000 and 100. Even an order-of-magnitude estimate makes the quantitative model useful. Some information is better than no information until after the development funds are spent.
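The order-of-magnitude estimate referred to above is simple arithmetic: divide the available CPU capacity by the measured CPU cost per query. The numbers below are hypothetical; the point is the magnitude of the answer, not its precision.

```python
# Back-of-envelope throughput estimate from a per-query cost measurement.
# All inputs are hypothetical example values.

cpu_cores = 4
cpu_seconds_per_query = 0.0008   # hypothetical measured CPU cost of the query
target_utilization = 0.7         # leave headroom for the production query mix

max_qps = cpu_cores * target_utilization / cpu_seconds_per_query
print(round(max_qps))  # prints 3500
```

An estimate of this kind cannot distinguish 1,000 queries per second from 990, but it reliably distinguishes 1,000 from 100, which is the distinction a business case depends on.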
Click on the appropriate links below to view each article. New articles will be added as they are completed.
The utilities, tools and scripts used to examine execution plan costs.
Cost formulas employed by SQL Server in evaluating query costs. Part I: Index Seek, Bookmark Lookup and Table Scans. Part II: Loop, Hash and Merge Joins. Part III: Insert, Update and Delete
Cost definitions, processor and platform dependencies, runtime conditions, test programs, and data population scripts.
The measured costs for basic SELECT queries on Pentium III, Pentium III Xeon, and (Pentium 4) Xeon server platforms.
Measured costs for basic INSERT, UPDATE and DELETE queries on server platforms in comparison to the execution plan formula costs.
A brief discussion of calculating database performance.
Useful items that have not been fitted into the grand scheme of things.
Joe Chang is a freelance consultant specializing in SQL Server, database architecture, design, performance tuning, and scalability analysis. Joe has more than 12 years of experience in software development, including performance and scalability analysis, for microprocessors, server systems, and database applications. The materials and tools in this series of articles are available as a 1- or 2-day onsite training course.
Article series published with express written permission. Copyright 2002 Joe Chang