SQL Server Hardware Performance Checklist

Physical Disk: Avg. Disk Queue Length

Besides watching the Physical Disk: % Disk Time counter, you will also want to watch the Avg. Disk Queue Length counter as well. If it exceeds 2 for continuous periods (over 10 minutes or so during your 24 hour monitoring period) for each disk drive in an array, then you may have an I/O bottleneck for that array. Like the Physical Disk: % Disk Time counter, if this happens occasionally in your 24 hour monitoring period, I wouldn’t worry too much, but if it happens often, then I would start looking into finding ways to increase the I/O performance on the server, as described previously.

You will need to calculate this figure because Performance Monitor does not know how many physical drives are in your array. For example, if you have an array of 6 physical disks, and the Avg. Disk Queue Length is 10 for a particular array, then the actual Avg. Disk Queue Length for each drive is 1.66 (10/6=1.66), which is well within the recommended 2 per physical disk.

Before using this counter under NT 4.0, be sure to manually turn it on by going to the NT Command Prompt and entering the following: “diskperf -y”, and then rebooting your server. This is required to turn on the disk counters on for the first time under Windows NT 4.0. If you are running Windows 2000, this counter is turned on by default.

Use both the % Disk Time and the Avg. Disk Queue Length counters together to help you decide if your server is experiencing an I/O bottleneck. For example, if you see many time periods where the % Disk Time is over 55% and when the Avg. Disk Queue Length counter is over 2 per physical disk, you can be confident the server is having a I/O bottleneck.

Processor: % Processor Time

The Processor Object: % Processor Time counter, is available for each CPU (instance), and measures the utilization of each individual CPU. This same counter is also available for all of the CPUs (total). This is the key counter to watch for CPU utilization. If the % Total Processor Time (total) counter exceeds 80% for continuous periods (over 10 minutes or so during your 24 hour monitoring period), then you may have a CPU bottleneck. If these busy periods are only occur occasionally, and you think you can live with them, that’s OK. But if they occur often, you may want to consider reducing the load on the server, getting faster CPUs, getting more CPUs, or getting CPUs that have a larger on-board L2 cache. 

System: Processor Queue Length

Along with the Processor: % Processor Time counter, you will also want to monitor the Processor Queue Length counter. If it exceeds 2 per CPU for continuous periods (over 10 minutes or so during your 24 hour monitoring period), then you probably have a CPU bottleneck. For example, if you have 4 CPUs in your server, the Processor Queue Length should not exceed a total of 8 for the entire server.

If the Processor Queue Length regularly exceeds the recommended maximum, but the CPU utilization is not correspondingly as high (which is typical), then consider reducing the SQL Server “max worker threads” configuration setting. It is possible the reason that the Processor Queue Length is high is because there are an excess number of worker threads waiting to take their turn. By reducing the number of “maximum worker threads”, what you are doing is forcing thread pooling to kick in (if it hasn’t already), or to take greater advantage of thread pooling.

Use both the Processor Queue Length and the % Total Process Time counters together to determine if you have a CPU bottleneck. If both indicators are exceeding their recommended amounts during the same continuous time periods, you can be assured there is a CPU bottleneck.

SQL Server Buffer: Buffer Cache Hit Ratio

This SQL Server Buffer: Buffer Cache Hit Ratio counter indicates how often SQL Server goes to the buffer, not the hard disk, to get data. In OLTP applications, this ratio should exceed 90%, and ideally be over 99%. If your buffer cache hit ratio is lower than 90%, you need to go out and buy more RAM today. If the ratio is between 90% and 99%, then you should seriously consider purchasing more RAM, as the closer you get to 99%, the faster your SQL Server will perform. In some cases, if your database is very large, you may not be able to get close to 99%, even if you put the maximum amount of RAM in your server. All you can do is add as much as you can, and then live with the consequences.

In OLAP applications, the ratio can be much less because of the nature of how OLAP works. In any case, more RAM should increase the performance of SQL Server.

SQL Server General: User Connections

Since the number of users using SQL Server affects its performance, you may want to keep an eye on the SQL Server General Statistics Object: User Connections counter. This shows the number of user connections, not the number of users, that currently are connected to SQL Server.

If this counter exceeds 255, then you may want to boost the SQL Server configuration setting, “Maximum Worker Threads” to a figure higher than the default setting of 255. If the number of connections exceeds the number of available worker threads, then SQL Server will begin to share worker threads, which can hurt performance. The setting for “Maximum Worker Threads” should be higher than the maximum number of user connections your server ever reaches.

Where to Go From Here

While there are a lot more counters than the ones you find on this page, these cover the key counters that you need to monitor during your Performance Audit. Once you have completed your Performance Monitor analysis, use the recommendations presented here, and later in this article series, to make the necessary changes to get your SQL Server performing as it should.

View the Next Article in the Series

Return to Previous Article in the Series

]]>

Leave a comment

Your email address will not be published.