Tips for Using SQL Server Performance Monitor Counters

A latch is in essence a “lightweight lock”. From a technical perspective, a latch is a lightweight, short-term synchronization object (for those who like technical jargon). A latch acts like a lock, in that its purpose is to prevent data from changing unexpectedly. For example, when a row of data is being moved from the buffer to the SQL Server storage engine, a latch is used by SQL Server during this move (which is very quick indeed) to prevent the data in the row from being changed during this very short time period. This not only applies to rows of data, but to index information as well, as it is retrieved by SQL Server.

Just like a lock, a latch can prevent SQL Server from accessing rows in a database, which can hurt performance. Because of this, you want to minimize latch time.

SQL Server provides three different ways to measure latch activity. They include:

  • Average Latch Wait Time (ms): The wait time (in milliseconds) for latch requests that have to wait. Note here that this is a measurement for only those latches whose requests had to wait. In many cases, there is no wait. So keep in mind that this figure only applies for those latches that had to wait, not all latches.
  • Latch Waits/sec: This is the number of latch requests that could not be granted immediately. In other words, these are the amount of latches, in a one second period, that had to wait. So these are the latches measured by Average Latch Wait Time (ms).
  • Total Latch Wait Time (ms): This is the total latch wait time (in milliseconds) for latch requests in the last second. In essence, this is the two above numbers multiplied appropriately for the most recent second.

When reading these figures, be sure you have read the scale on Performance Monitor correctly. The scale can change from counter to counter, and this is can be confusing if you don’t compare apples to apples.

Based on my experience, the Average Latch Wait Time (ms) counter will remain fairly constant over time, while you may see huge fluctuations in the other two counters, depending on what SQL Server is doing.

Because each server is somewhat different, latch activity is different on each server. Tt is a good idea to get baseline numbers for each of these counters for your typical workload. This will allow you to compare “typical” latch activity against what is happening right now, letting you know if latch activity is higher or lower than “typical”.

If latch activity is higher than expected, this often indicates one of two potential problems. First, it may mean your SQL Server could use more memory. If latch activity is high, check to see what your buffer cache hit ratio is. If it is below 99%, your server could probably benefit from more RAM. If the hit ratio is above 99%, then it could be the I/O system that is contributing to the problem, and a faster I/O system might benefit your server’s performance.

If you really like to get your hands dirty, here are a couple of commands you might want to experiment with to learn more about latching behavior of your software.

SELECT * FROM SYSPROCESSES WHERE waittime>0 and spid>50

This query will display currently existing SPIDs that are waiting, along with the waittype, waittime, lastwaittype, and waitresource. The lastwaittype and waitresource tells you what your latch type, and the waitresource will tell you what object the SPID is waiting on. When you run it, you may not get any results because there are no waiting occuring at the time you ran the query. But if you run the query over and over, you will eventually get some results.

DBCC SQLPerf (waitstats, clear)      –clears stats
DBCC SQLPerf (waitstats)      –give you stats as of the last clear (or SQL Server service restart)

This query displays the current latches (among other stuff), along with their Wait Type and Wait Time. You may first want to clear the stats, then run DBCC SQLPerf (waitstats) periodically over a short time period to see what latches are taking the most time.

Thanks to these forum members who contributed to this tip: josephobrien, rortloff, harryarchibald.  

*****

SQL Server performs faster and with less resources if it can retrieve data from the buffer cache instead of reading it from disk. In some cases, memory intensive operations can force data pages out of the cache before they ideally should be flushed out. This can occur if the buffer cache is not large enough and the memory intensive operation needs more buffer space to work with. When this happens, the data pages that were flushed out to make extra room must again be read from disk, hurting performance.

There are three different SQL Server counters that you can watch to help determine if your SQL Server is experiencing such a problem.

  • SQL Server Buffer Mgr: Page Life Expectancy: This performance monitor counter tells you, on average, how long data pages are staying in the buffer. If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory in order to boost performance.
  • SQL Server Buffer Mgr: Lazy Writes/Sec: This counter tracks how many time a second that the Lazy Writer process is moving dirty pages from the buffer to disk in order to free up buffer space. Generally speaking, this should not be a high value, say more than 20 per second or so. Ideally, it should be close to zero. If it is zero, this indicates that your SQL Server’s buffer cache is plenty big and SQL Server doesn’t have to free up dirty pages, instead waiting for this to occur during regular checkpoints. If this value is high, then a need for more memory is indicated.
  • SQL Server Buffer Mgr: Checkpoint Pages/Sec: When a checkpoint occurs, all dirty pages are written to disk. This is a normal procedure and will cause this counter to rise during the checkpoint process. What you don’t want to see is a high value for this counter over time. This can indicate that the checkpoint process is running more often than it should, which can use up valuable server resources. If this has a high figure (and this will vary from server to server), consider adding more RAM to reduce how often the checkpoint occurs, or consider increasing the “recovery interval” SQL Server configuration setting.

These performance monitor counters should be considered advanced and only used to “refine” a potential diagnosis of “not enough memory” for your SQL Server.

]]>

Leave a comment

Your email address will not be published.