Tips for Using SQL Server Performance Monitor Counters
If your users are complaining that they have to wait for their transactions to complete, you may want to find out if object locking on the server is contributing to this problem. To do this, use the SQL Server Locks Object: Average Wait Time (ms). You can use this counter to measure the average wait time of a variety of locks, including: database, extent, Key, Page, RID, and table.
As the DBA, you have to decide what an acceptable average wait time is. One way to do this is to watch this counter over time for each of the lock types, finding average values for each type of lock. Then use these average values as a point of reference. For example, if the average wait time in milliseconds of RID (row) locks is 500, then you might consider any value over 500 as potentially a problem, especially if the value is a lot higher than 500, and extends over long periods of time.
If you can identify one or more types of locks causing transaction delays, then you will want to investigate further to see if you can identify what specific transactions are causing the locking. The Profiler is the best tool for this detailed analysis of locking issues.
While table scans are a fact of life, and sometimes faster than index seeks, generally it is better to have fewer table scans than more. To find out how many table scans your server is performing, use the SQL Server Access Methods Object: Full Scans/sec. Note that this counter is for an entire server, not just a single database. One thing you will notice with this counter is that there often appears to a pattern of scans occurring periodically. In many cases, these are table scans SQL Server is performing on a regular basis for internal use.
What you want to look for are the random table scans that represent your application. If you see what you consider to be an inordinate number of table scans, then break out the Profiler and Index Tuning Wizard to help you determine exactly what is causing them, and if adding any indexes can help reduce the table scans. Of course, SQL may just be doing its job well, and performing table scans instead of using indexes because it is just plain more efficient. But you won’t know unless you look and see what is really happening under the covers.
If you suspect that your backup or restore operations are running at sub-optimal speeds, you can help verify this by using the SQL Server Backup Device Object: Device Throughput Bytes/sec. This counter will give you a good feel for how fast your backups are performing. You will also want to use the Physical Disk Object: Avg. Disk Queue Length counter to help collaborate your suspicions. Most likely, if your are having backup or restore performance issues, it is because of an I/O bottleneck.
As the DBA, it will be your job to determine the I/O bottlenecks you may be experiencing and dealing with them appropriately. For example, the cause of slow backups or restores could be something as simple as a DTS job that is running at the same time, and could be fixed by rescheduled the job.
If you are using transactional replication, you may want to monitor the latency that it takes the Log Reader to move transactions from a database’s transaction log until it moves it to the distribution database, and also to monitor the latency it takes the Distributor Agent to move transactions from the distribution database to the subscriber database. The total of these two figures is the amount of time it takes a transaction to get from the publication database to the subscriber database.
The counters for these two processes are the: SQL Server Replication LogReader: Delivery Latency counter and the SQL Server Replication Dist.: Delivery Latency counter.
If you see a significant increase in the latency for either of these processes, this should be a signal to you to find out what new or different has happened to cause the increased latency.
A key counter to watch is the SQL Server Buffer Manager Object: Buffer Cache Hit Ratio. This indicates how often SQL Server goes to the buffer, not the hard disk, to get data. The higher this ratio, the less often SQL Server has to go to the hard disk to fetch data, and performance overall is boosted.
Unlike many of the other counters available for monitoring SQL Server, this counter averages the Buffer Cache Hit Ratio from the time the last instance of SQL Server was restarted. In other words, this counter is not a real-time measurement, but an average of all the days since SQL Server was last restarted. Because of this, if you really want to get an accurate record of what is happening in your Buffer Cache right now, you must stop and restart the SQL Server service, then letting SQL Server run several hours of normal activity before you check this figure (in order to get a good reading).
If you have not restarted SQL Server lately, then the Buffer Cache Hit Ratio figure you see may not be accurate for what is occurring now in your SQL Server, and it is possible that although your Buffer Cache Hit Ratio looks good, it may really, in fact, not be good, because of the way this counter averages this ratio over time.
In OLTP applications, this ratio should exceed 90-95%. If it doesn’t, then you need to add more RAM to your server to increase performance.
In OLAP applications, the ratio could be much less because of the nature of how OLAP works. In any case, more RAM should increase the performance of SQL Server OLAP activity.
Consider watching these two counters: SQLServer:Memory Manager: Total Server Memory (KB) and SQLServer:Memory Manager: Target Server Memory (KB). The first counter, SQLServer:Memory Manager: Total Server Memory (KB), tells you how much the mssqlserver service is currently using. This includes the total of the buffers committed to the SQL Server BPool and the OS buffers of the type “OS in Use”.
The second counter, SQLServer:Memory Manager: Target Server Memory (KB), tells you how much memory SQL Server would like to have in order to operate efficiently. This is based on the number of buffers reserved by SQL Server when it is first started up.
If, over time, the SQLServer:Memory Manager: Total Server Memory (KB) counter is less than the SQLServer:Memory Manager: Target Server Memory (KB) counter, then this means that SQL Server has enough memory to run efficiently. On the other hand, if the SQLServer:Memory Manager: Total Server Memory (KB) counter is more or equal than the SQLServer:Memory Manager: Target Server Memory (KB) counter, this indicates that SQL Server may be under memory pressure and could use access to more physical memory.
SQL Server performs faster and with less resources if it can retrieve data from the buffer cache instead of reading it from disk. In some cases, memory intensive operations can force data pages out of the cache before they ideally should be flushed out. This can occur if the buffer cache is not large enough and the memory intensive operation needs more buffer space to work with. When this happens, the data pages that were flushed out to make extra room must again be read from disk, hurting performance.
There are three different SQL Server counters that you can watch to help determine if your SQL Server is experiencing such a problem.
- SQL Server Buffer Mgr: Page Life Expectancy: This performance monitor counter tells you, on average, how long data pages are staying in the buffer. If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory in order to boost performance.
- SQL Server Buffer Mgr: Lazy Writes/Sec: This counter tracks how many time a second that the Lazy Writer process is moving dirty pages from the buffer to disk in order to free up buffer space. Generally speaking, this should not be a high value, say more than 20 per second or so. Ideally, it should be close to zero. If it is zero, this indicates that your SQL Server’s buffer cache is plenty big and SQL Server doesn’t have to free up dirty pages, instead waiting for this to occur during regular checkpoints. If this value is high, then a need for more memory is indicated.
- SQL Server Buffer Mgr: Checkpoint Pages/Sec: When a checkpoint occurs, all dirty pages are written to disk. This is a normal procedure and will cause this counter to rise during the checkpoint process. What you don’t want to see is a high value for this counter over time. This can indicate that the checkpoint process is running more often than it should, which can use up valuable server resources. If this has a high figure (and this will vary from server to server), consider adding more RAM to reduce how often the checkpoint occurs, or consider increasing the “recovery interval” SQL Server configuration setting.
These performance monitor counters should be considered advanced and only used to “refine” a potential diagnosis of “not enough memory” for your SQL Server.