An Introduction to SQL Server Query Tuning
During a recent interview to find a new job before my contract ran out, I was asked a question that took me by surprise. The interviewer simply asked me what steps I took to decide which stored procedures needed optimizing and what steps I used to optimize those queries. What took me by surprise was not the question; it was the realization that I did not have a “formal” plan of attack to determine stored procedures that needed to be optimized or a plan to actually optimize them. After the interview, I decided to formalize the basic steps I used to find stored procedures to optimize, and the steps I took whenever I decided a query needed to be optimized. After jotting down some notes, I decided that I would turn this “plan” into an article so others would know what basic steps are needed to optimize a query.
As you can guess, the first step you need to do is to identify what queries are in need of tuning. When you boil it down, there are actually two types of queries in a SQL Server installation: pre-deployment queries and post-deployment queries. It goes without saying that all new (pre-deployment) queries need to be optimized fully before you even schedule them for deployment. However, those queries that have already been deployed, even those you have already optimize, should be looked at again as the database environment changes over time to determine if they have developed new problems.
Most companies will usually only work on post-deployment queries when they create a problem and are placed on a “bug” list of some kind. Yes, you can find problem queries that way, but if you wait until they are on a “bug” list, you tend to only have the minimum time needed to “fix” the problem and get the application back up and running. If you are lucky, you will have time to incorporate any new performance tuning tips you learned since you last worked on the query as you fix the new problem.
This ad-hoc way of tuning queries may or may not allow you to identify and correct major performance issues with your stored procedures because it deals with only those that are bad enough to gain major attention. A better plan would allow you to not only work on problem queries as they arise, but also to obtain lists of queries that show performance problems before they make your “bug” list. These queries can then be optimized as you have time or as you enhance them for new application releases. Using the power of the SQL Profiler and running a simple trace to capture the performance of your stored procedures can easily obtain these new query lists. This trace is simple to create, and depending on your installation, should be run during your production time frame off and on for a few days to obtain a good sampling of data.
Creating Query Lists with a SQL Profiler Trace To create a lists of queries, you will need to create and run a trace capturing all the Stored Procedure events, as well as the following data columns:
This trace may create a large amount of data, so it might be easier to create the trace so it will place the information to a table so you can use a query to view the data. If can find a way to filter the trace in order to limit the data output, I would suggest that you do so. I usually start my stored procedure names with ‘spXXXX’ so I can filter by object name. But as you look into your particular situation, you can usually come up with something to use to filter some of the unneeded data out to lessen the impact of the trace and the amount of data you have to sift through.
Now that you have trace data saved to a table. Create several stored procedures that you can use to identify the worst performing procedures by looking at duration (duration is displayed as milliseconds), SP:CacheMiss, SP:Recompile, Reads, and Writes. Create lists with the largest durations, the most Reads, the most Writes, and stored procedures that include the SP:CacheMiss and the SP:Recompile events. It might take you a few times to learn what is considered excessive Reads and Writes in your database, but if you see a stored procedure that it way above the average, it might give you somewhere to start. The other lists are easier to define for problem queries and can be worked on right away.
Now that you have your hit lists of pre-production queries, long running queries, queries with excessive reads and writes, and queries that are recompiling, how do you proceed to optimize them?
Long Running and New Queries: For long running queries and queries that you have just created, the first thing you need to do is to produce an Execution Plan for that query. Using the Execution Plan you should:
- Look for anything that shows up in red. Query Analyzer will color icons red if it determines there are certain problems. Most of the time, red means that the statistics for the indexes being used by a part of the query are out-of-date, but it can mean other things. Review any icons that show up red and determine how to fix the issues.
- Look at how you are obtaining the data from the tables. Is the query doing a Table Scan? Can you do anything to change the Index or Clustered Index Scans into Index or Clustered Index Seeks? Can you rework your query to use the Clustered Index instead of a Non-Clustered Index? Simply moving the base data retrieval from a Table Scan to an Index or Clustered Index seek will usually solve the problem of a slow-running query. This is a very quick way to improve the performance of a majority of the problem queries you will encounter.
- Look at the Cost of each query segment. A simple thing to remember is that the Cost of each segment roughly equates to the percentage of time it takes that segment to complete. Look at the larger percentages and see if you can optimize that segment of code any. The result may still mean that the segment remains the most costly, but the goal is to optimize that segment so it runs faster than before, not necessarily so that it becomes the least costly segment.