What the IT Decision-Maker Needs to Know About SQL Server Performance Tuning
Identifying Problems in Application Code
Trying to find poorly performing routines in half a million lines of source code is a daunting prospect, but it's fairly easy to come up with the top 10 worst-performing queries in a running application.
Take a quick, simple look with SQL Profiler, a tool that comes with SQL Server. Capture the events Stored Procedures: RPC:Completed and TSQL: SQL:BatchCompleted. Throw in the Exceptions event to make sure your applications aren't generating errors that have gone undetected. For metrics, capture Duration, CPU, Reads, Writes, and TextData (the text of the code being executed when these measurements are taken). Let Profiler run during a period of high database usage and log the results to a file. After you've captured enough data, sort it by Duration and note the statements that take the longest to run. You can sort by the other metrics too, but you will probably find that the statements with the longest duration are also the ones with the most CPU usage, reads, and so on.
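One convenient way to do that sorting is to pull the saved trace file into a table with the built-in `fn_trace_gettable` function and query it. A minimal sketch (the file path and table name here are placeholders, not from the original trace):

```sql
-- Load the saved Profiler trace file into a table for analysis.
-- (C:\Traces\capture.trc and TraceResults are hypothetical names.)
SELECT TextData, Duration, CPU, Reads, Writes
INTO   TraceResults
FROM   ::fn_trace_gettable('C:\Traces\capture.trc', DEFAULT)

-- The 10 statements that took the longest to run
SELECT TOP 10 TextData, Duration, CPU, Reads, Writes
FROM   TraceResults
ORDER BY Duration DESC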
When you find the poorly performing queries, you can use the ShowPlan and Statistics IO options in SQL Query Analyzer to find out why the statements are inefficient and how they can be fixed. Interpreting the output takes some experience, but there are some simple things to look for. ShowPlan helps out by highlighting certain problems in red. Read the graphical plan from right to left, top to bottom, following the data flow. The percentages refer to the share of the query's total cost that occurs in each step. Look for the heavy hitters. Hover your mouse over a graphic to pop up a window of detailed information.
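These options can also be switched on with T-SQL before running the suspect statement:

```sql
-- Turn on diagnostics in Query Analyzer before running the query:
SET STATISTICS IO ON      -- reports logical/physical reads per table
SET STATISTICS TIME ON    -- reports parse/compile and execution times
GO
-- ...run the suspect query here...

-- Or show the estimated plan as text without executing the statement:
SET SHOWPLAN_TEXT ON
GO
```

A table scan reported against a large table in the Statistics IO output (a high read count) is the telltale sign of the missing-index problem described next.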
The most common problem is that the query does not have an appropriate index to use and is using a table or index scan on a huge table. This is the condition we described above where it is reading the whole encyclopedia from A to Z. The fix is simple and the results can be dramatic. Put an index on the column or columns that are being used in the JOIN or WHERE clause of the highlighted statement. This often requires experience and/or experimentation to find the best column(s) to index.
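As a sketch of the fix, suppose a hypothetical query joins Orders to Customers on CustomerID and filters on OrderDate, and the plan shows a scan of the whole Orders table (these table and column names are illustrative, not from any particular application):

```sql
-- Index the columns used in the JOIN and WHERE clauses so
-- SQL Server can seek directly to the rows instead of scanning.
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)
CREATE INDEX IX_Orders_OrderDate  ON Orders (OrderDate)
```

Re-run the query afterward and compare the ShowPlan output: the table scan should be replaced by an index seek, and the read counts should drop sharply.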
SQL Server locking problems often result from inexperienced developers who do not realize that they can and should change the default locking behavior for certain statements. Locking problems often go undetected during the development and testing of a database application because it is difficult and expensive to generate testing scenarios to simulate dozens or hundreds of users hitting a database simultaneously. The problems usually start to show up a few weeks after roll-out as the amount of data and the number of users increase.
Locking is a complex issue, but it is necessary to understand a little bit about it in order to understand how it can be fixed. The crux of the issue is the contention between processes that only read the data in a table and those that modify it. SQL Server's default locking behavior is conservative in this regard. By default, any number of reading processes can read the table at the same time, but no process can change data while it is being read. The other side of the coin is that no process can read the data while another process is changing it. This assures the data consistency that is necessary in some kinds of database operations.
For example, if one process were updating the Accounts table to reflect a transfer of funds from one account to another, and a second process was summing the total of funds in all accounts, it would be critical to enforce this kind of consistency. Otherwise, the second process might read the data after the first process had debited one account but before it had credited the other. The total of funds would be in error. But in many other cases, the processes reading the table do not need an absolutely consistent view of the data.
For example, if someone is opening a report that summarizes last month's sales figures, why lock the tables involved? Even something that is presumed to be up to the minute often doesn't need absolute consistency. A librarian might want to see if a particular book has been checked out. The chances that its status will change in the 10 milliseconds it takes the query to run are very slim, and it doesn't matter much even if it does. In these kinds of queries, you may safely override SQL Server's default behavior to avoid these locks. This can be done several ways, but for a variety of reasons a locking hint in the query is most often the best method. The developer can add the table hint WITH (NOLOCK), a synonym for WITH (READUNCOMMITTED), to queries of this type to eliminate the shared locks altogether.
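A minimal sketch of such a hinted query (the Sales table and its columns are hypothetical):

```sql
-- Report query that tolerates a dirty read: the NOLOCK hint means it
-- takes no shared locks, so it neither blocks writers nor waits on them.
SELECT SUM(Amount)
FROM   Sales WITH (NOLOCK)
WHERE  SaleDate >= '20050101' AND SaleDate < '20050201'
```

The trade-off is that the query may see uncommitted changes that are later rolled back, which is exactly why it is safe for reports and status checks but not for the funds-transfer scenario above.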
This is especially important when batch processes update a table. Users can only type or click mouse buttons so fast, but an automated process can issue hundreds or thousands of inserts, updates, or deletes per second. There can be a huge performance bottleneck when other processes need to read the table involved. And locks are a lot like freeway traffic at rush hour. A slight delay anywhere causes traffic to back up behind it very quickly.
Although SQL Profiler and System Monitor can capture some locking information, they are less useful for diagnosing locking problems than they are in other areas. Microsoft has more useful utilities available for free download, though the documentation is thin. As a place to start, take a look at "How to Monitor SQL Server 2000 Blocking" on the Microsoft web site.
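For a quick spot check you don't need a separate utility at all; a sketch of the standard approach on SQL Server 2000, using the system tables that ship with the product:

```sql
-- Any nonzero "blocked" value is the spid of the process holding
-- the lock that this process is waiting on.
SELECT spid, blocked, waittime, lastwaittype, cmd
FROM   master..sysprocesses
WHERE  blocked <> 0

-- sp_who2 shows a similar picture, including a BlkBy column.
EXEC sp_who2
```

If the same blocking spid keeps turning up, `DBCC INPUTBUFFER(<spid>)` will show the last statement it issued, which usually points straight at the query that needs a locking hint or a shorter transaction.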
To repeat what we said above: a hardware bottleneck does not always mean that your hardware is inadequate. Inefficient application code also manifests itself in hardware bottlenecks. Too many table scans will keep your disks spinning, and bad application design or poorly written SQL code can waste memory and CPU cycles.
Since we will be talking about monitoring tools in this section, here are some general considerations to keep in mind. Monitoring puts its own load on the server and, depending on how much you are monitoring, it can be a big load. You should not run System Monitor or Profiler from the SQL Server machine itself. You should run them on another machine and have them write their logs there also. Don’t try to monitor everything at once. A few carefully selected counters, such as the ones described below, should be enough to start with.
The Standard Edition of SQL Server can only use 2GB of memory, no matter how much you have on the server. Keep in mind that memory is cheap, but SQL Server Enterprise Edition is not, nor are the advanced versions of the server operating system that may be required to utilize all the memory you buy. Before starting down this road, make sure you understand the true cost of upgrading memory. That being said, get as much memory as you can afford. SQL Server loves memory, and it is often the cheapest way to get the performance boost you need.
If you have SQL Enterprise Edition running on Windows Advanced Server or Datacenter Edition, with 4GB or more of RAM, you need to configure special memory management options at the operating system level. Otherwise you should leave SQL Server’s default memory settings alone, unless you are sure you know exactly what you are doing.
How Do You Know If You Need More Memory?
The simplest measure is to look at the SQL Server: Buffer Manager: Buffer cache hit ratio counter in System Monitor (a tool that comes with Windows and is already installed on your server). Log a typical 24-hour period so you get the big picture. The average cache hit ratio should be in the high nineties: 99% is OK, 95% is not. The Memory: Pages/sec counter is also important. It is normal for this reading to spike very high for brief periods, but it should average 20 or below over a typical 24-hour period.
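The same counter is exposed inside SQL Server 2000 through the sysperfinfo system table, so you can check it from Query Analyzer. Note that the ratio must be computed from the raw counter and its matching base counter:

```sql
-- Buffer cache hit ratio as a percentage, from inside SQL Server.
SELECT 100.0 * r.cntr_value / b.cntr_value AS [Buffer cache hit ratio %]
FROM   master..sysperfinfo r
JOIN   master..sysperfinfo b
       ON b.object_name = r.object_name
WHERE  r.counter_name = 'Buffer cache hit ratio'
  AND  b.counter_name = 'Buffer cache hit ratio base'
```

Keep in mind this is an instantaneous reading; the 24-hour System Monitor log remains the better basis for a buying decision.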
Log the following counters in System Monitor for a typical 24-hour period. If Processor: % Processor Time frequently exceeds 80% for several minutes at a time, and/or System: Processor Queue Length (total) exceeds an average of 2 per processor (i.e., a total of 4 on a 2-CPU server) for several minutes at a time, you may want to add more or faster processors. Keep in mind that the size of the L2 cache is sometimes more important than raw processor speed, so get as much as possible when you upgrade.
Disk I/O can be measured with System Monitor also. Take measurements for each individual physical array; the totals for all disks mean very little. If Physical Disk: % Disk Time averages over 60% for several minutes at a time, and/or Physical Disk: Avg. Disk Queue Length exceeds 2 per physical disk in an array (i.e., a 5-disk RAID array is good for a queue length of 10), you probably could use more or faster disk arrays.
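SQL Server 2000 also keeps its own per-file I/O statistics, which can tell you which database file, and therefore which array, is actually the bottleneck. A sketch ('MyDatabase' and file id 1 are placeholders for your own database and file):

```sql
-- IoStallMS is the total time, in milliseconds, that users have
-- waited on I/O against this file since the server started.
SELECT DbId, FileId, NumberReads, NumberWrites, IoStallMS
FROM   ::fn_virtualfilestats(DB_ID('MyDatabase'), 1)
```

A file with a disproportionately high IoStallMS relative to its read and write counts is the one whose array deserves attention first.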
Disk configuration on a database server is a complex subject. There aren’t any quick generalities that are useful here. If System Monitor indicates a serious I/O problem, you probably need to consult with an expert. However, remember that the disks may be spinning because your application is not adequately indexed. Take care of that first. The disk problem might disappear.
Most of the default database configuration options that SQL Server uses aren’t going to give you serious performance problems. If an inexperienced person has been tinkering with database options, you may need to readjust them. Auto Shrink is one to watch out for.
The Auto Shrink database option should be off. If yours is the rare case where the database needs to be shrunk periodically, do it manually or schedule a job to do it in off-peak hours. This option is usually turned on for the wrong reasons, and it can seriously degrade performance.
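Checking and correcting the setting takes two statements ('MyDatabase' is a placeholder for your database name):

```sql
-- 1 means Auto Shrink is on, 0 means it is off.
SELECT DATABASEPROPERTYEX('MyDatabase', 'IsAutoShrink')

ALTER DATABASE MyDatabase SET AUTO_SHRINK OFF

-- If a periodic shrink really is needed, run it off-peak instead:
-- DBCC SHRINKDATABASE (MyDatabase, 10)   -- leave 10% free space
```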
We hope this has helped you better understand the issues involved in improving SQL Server performance. It may not have been easy reading, but we did cover a lot of ground in a short time. If you have further questions or would like clarification on some point we covered, I would be glad to help. If you have comments or suggestions to improve this background paper, I would like to hear those too.
Copyright 2005 by the author.