As you measure CPU activity with CPU counters, keep in mind that the following processes are those that use the most CPU resources in SQL Server:
- Context switching: This occurs when threads are switched between CPUs in your SQL Server, and excessive context switching eats up CPU resources. In some cases, context switching can be reduced by turning on Lightweight Pooling. See this website for more details on this option.
- Recompilation: Excessive SP recompilation wastes CPU resources and slows SQL Server. See this website for ways to reduce SP recompilation.
- Sorting: At some point, data may have be be sorted. But to reduce the load sorting can have on SQL Server, only sort the minimum amount of data that needs to be sorted, or consider pushing off the sorting function onto the client.
- Hashing: Hashing often occurs when JOINs are used. They are also used in UNIONs. Often, but not always, a HASH JOIN can be modified so that is runs as a LOOP JOIN instead, which reduces CPU activity and speeds performance. See this website for more details on LOOP JOINs.
If you want to reduce CPU activity, you will need to reduce one or more of the above SQL Server functions.
Measuring the CPU activity of your SQL Server is a key way to identify potential CPU bottlenecks. The Process Object: % Processor Time counter is available for each CPU (instance) in your server, and measures the utilization of each individual CPU. If you have more than a single CPU, or if you have a single hyperthreading CPU, then watching this counter for specific CPUs is not of much value. Instead, monitor the _Total instance, which provides you with the total overall CPU utilization for your server, which is a good overall indicator of how busy your server is.
As an alternative to the _Total instance, you can also use the System Object: % Total Processor Time counter, which measures the average of all the CPUs in your server.
Generally speaking, if the total CPU utilization of your server exceeds 80% for continuous periods (over 10 minutes or so), then you may have a CPU bottleneck. Some potential solutions to a CPU bottleneck are to reduce the server load (tune those queries), get faster CPUs, or get more CPUs.
Another valuable indicator of CPU performance is the System Object: Processor Queue Length. If the Processor Queue Length exceeds 2 per CPU for continuous periods (over 10 minutes or so), then you probably have a CPU bottleneck. For example, if you have 4 CPUs in your server, the Processor Queue Length should not exceed a total of 8 for the entire server.
If the Processor Queue Length regularly exceeds the recommended maximum, but the CPU utilization is not correspondingly as high (which is typical), then consider reducing the SQL Server “max worker threads” configuration setting. It is possible the reason that the Processor Queue Length is high is because there are an excess number of worker threads waiting to take their turn. By reducing the number of “maximum worker threads”, what you are doing is forcing thread pooling to kick in (if it hasn’t already), or to take greater advantage of thread pooling.
Use both the Processor Queue Length and the % Total Process Time counters together to determine if you have a CPU bottleneck. If both indicators are exceeding their recommended amounts during the same continuous time periods, you can be assured there is a CPU bottleneck.
If the System Object: % Total Processor Time counter in your multiple CPU server regularly runs over 80% or so, then you may want to start monitoring the System: Context Switches/Sec counter. This counter measures how often NT Server switches between threads. Heavy context switching hurts performance and should be minimized. If the System: Context Switches/Sec counter nears 8000, then you should consider using NT Windows Fibers. Fibers are subcomponents of threads that perform similarly to threads. The advantage of using them is that they have less overhead when being switched between CPUs in a multiple CPU server.
This benefit does not show up unless the server’s CPUs are running at near maximum, or the Performance Monitor System: Context Switching/Sec counter nears 8000 switches per second for continual periods (over 10 minutes), so don’t select this option unless your server’s CPUs are nearly maxed out. You will also want to carefully test (before and after) the effect of this setting on your server’s performance, because you may not always get the results you expect when making this change.