An Exclusive Interview with David Gugick, SQL Server Performance Expert and Software Developer
Once you have identified poorly performing Transact-SQL code, what is the best way to determine what the problem(s) are and how to resolve them?
I like to do a couple of things:
- Understand what the query has to do in order to perform its function (tables accessed and the Transact-SQL code used to perform that access).
- Know how often the query is executed.
- Understand how much information is returned to the client.
- Determine if the development environment is sufficient.
- Know how SQL Server is going about running the query (execution plan).
Let’s address each item briefly. First, I need to understand what the query is supposed to do. For example, a query might return the number of customer orders within a date range. Once I understand this, I review the current Transact-SQL code with a copy of the database design next to me. Is the Transact-SQL code designed correctly? If not, it needs to be changed. If it is, is it designed in the most efficient way, given the database design? In other words, am I using the correct tables, am I accessing any unnecessary tables, would it help to add another table, do I access extraneous columns, is the WHERE clause optimizable by SQL Server, is the result set sorted, and if so, does it need to be? These are some of the questions I’d ask myself when first reviewing a query. I may make changes at this stage.
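To make the customer-order example concrete, a sketch of such a query might look like the following (the table and column names here are hypothetical, not from the interview):

```sql
-- Hypothetical schema: Orders(OrderID, CustomerID, OrderDate, ...)
-- Count a customer's orders in a date range. The WHERE clause
-- compares OrderDate directly to parameters, so SQL Server can use
-- an index on (CustomerID, OrderDate) if one exists.
SELECT COUNT(*) AS OrderCount
FROM Orders
WHERE CustomerID = @CustomerID
  AND OrderDate >= @StartDate
  AND OrderDate <  @EndDate
```

Note that wrapping OrderDate in a function (for example, comparing on YEAR(OrderDate)) would prevent SQL Server from seeking an index on that column.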
Next, I want to know how often the query is executed on the server. I may have to estimate this if we are strictly in development mode, or I may be able to use Coefficient to see the result from the production environment. I like to know this because I want to use my time as efficiently as possible. During query tuning, a query that is executed once a day and is not performing well should probably take a back seat to tuning one that is executed 1,000 times a day.
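On SQL Server 2005 and later (this feature did not exist in the SQL Server 2000 era this interview references), the plan cache exposes rough execution counts through a dynamic management view; a hedged sketch:

```sql
-- Rough execution counts from the plan cache (SQL Server 2005+).
-- Counts reset when a plan leaves the cache, so treat these as
-- lower bounds rather than exact totals.
SELECT TOP 10
    qs.execution_count,
    SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.execution_count DESC;
```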
Next, I want to know how much information is likely to be returned to the client. A query that returns 100,000 rows will generate a large number of reads, whereas a query that inserts a single row in a table should generate very few. This is true whether a query is tuned properly or not. I use this information to gauge my overall impression of how well the query is tuned as I create each new iteration for testing. It’s also important because queries that generate large result sets, even when tuned properly, may cause strain on a database server. This is why some companies employ read-only copies of databases for reporting purposes. Knowing what the query returns can also help you decide when and where the query should be executed in the first place.
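One simple way to keep result sets small, sketched here against the same hypothetical Orders table, is to return only the columns and rows the client actually needs:

```sql
-- SELECT * over 100,000 rows generates far more reads and network
-- traffic than a targeted query. Limit both the column list and the
-- row count to what the client will actually use.
SELECT TOP 100 OrderID, OrderDate, TotalDue
FROM Orders
WHERE CustomerID = @CustomerID
ORDER BY OrderDate DESC;
```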
Next, I examine my test environment. Am I using performance statistics generated from a different server than the one on which I’m testing? Ideally, the environments should be as similar in terms of table sizes and data distribution as possible. Many times development servers contain only a small subset of production data. This can make development go more quickly, but can prevent you from seeing problems that show up only when larger data sets are used. Since I’m primarily interested in reducing the number of reads, I’m not as concerned right now with overall CPU and duration. It’s likely that the development server runs on less powerful hardware than production databases, and this will show up as slower execution times. However, with similar data sets, the number of reads should be similar.
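Because reads, unlike duration, are comparable across hardware, the per-statement read counts are the useful metric to capture. SQL Server reports them with SET STATISTICS IO; for example:

```sql
-- Report logical and physical reads per table for each statement
-- that follows, in the Messages output.
SET STATISTICS IO ON;
SELECT COUNT(*) FROM Orders WHERE CustomerID = @CustomerID;
SET STATISTICS IO OFF;
-- Output resembles:
-- Table 'Orders'. Scan count 1, logical reads ..., physical reads ...
```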
Once I’ve completed these preliminary steps, which can be done quickly if I have a good understanding of the underlying database and the design of the query, I need to know how SQL Server is processing the query. I can examine the query’s execution plan to see what methods SQL Server is using to generate the results. In particular, I want to see if there are any Table Scans or Index Scans. Scans of this type are potentially the most damaging to a server’s performance.
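The execution plan can be viewed graphically in the query tool, or as text from a script, which makes scan operators easy to spot:

```sql
-- Display the estimated execution plan as text instead of running
-- the query; look for Table Scan or Index Scan operators in the
-- output, as opposed to the cheaper Index Seek.
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(*) FROM Orders WHERE CustomerID = @CustomerID;
GO
SET SHOWPLAN_TEXT OFF;
GO
```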
I check to see if the columns in the WHERE and JOIN clauses have corresponding indexes available to them. SQL Server best optimizes queries when the proper indexes are in place. If an index is missing, I check to see whether the missing columns can be added to an existing index, or gauge whether a new one should be created. Is the table’s clustered index on the right set of columns? If not, I may propose a change to the clustered index. If an index is already in place, but SQL Server is not using it, I examine the query to see why. Is it because too many rows from the table are returned to the client? Am I using non-optimizable SQL criteria? For example, SQL Server may not be able to optimize WHERE <col> LIKE ‘%XYZ%’ or WHERE <col> != ‘ABC’. I may also try different ways to access the same data. For example, I may use table variables on SQL Server 2000 instead of temp tables, temp tables instead of cursors, or break up large queries into smaller, more optimizable ones.
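A brief sketch of two of these points, using hypothetical table names: a non-optimizable predicate rewritten into an optimizable one, and a table variable used in place of a temp table on SQL Server 2000.

```sql
-- Non-optimizable: the leading wildcard forces a scan.
-- SELECT CustomerID FROM Customers WHERE LastName LIKE '%son%'

-- Optimizable: a leading constant lets SQL Server seek an index
-- on LastName.
SELECT CustomerID, LastName
FROM Customers
WHERE LastName LIKE 'John%';

-- SQL Server 2000: a table variable as a lighter-weight alternative
-- to a temp table for small intermediate result sets.
DECLARE @Recent TABLE (OrderID int PRIMARY KEY, OrderDate datetime);
INSERT INTO @Recent (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderDate >= @StartDate;
```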
Armed with the knowledge from these items, I make modifications to the query and re-analyze. I continue this iterative process until the query is executing as well as is expected. Since index changes can have far-reaching effects well beyond the query being tuned, it’s important to retest the entire database when these changes are complete.