This document is the first in a series of four covering subjects such as baselining (this article), monitoring, stress testing, and the interpretation of performance behavior (articles in the pipeline).
These will not be theoretical essays, but lessons from the field. Think of them as a collection of topics that I found seldom, if ever, covered in other articles. Where possible, I want to give concrete answers to questions like:
What set of counters must be logged?
What should my monitoring architecture look like?
Is it bad to have a value x for counter y?
It is my intention to give you a list of tips — building upon BOL, not repeating it — which will be useful when you establish your monitoring environment from which your baselines will be produced. Storing and analyzing logged information over an extended period of time is the take-off point of any baselining scenario.
In a second article, I will go into more detail on how to set up online and analytical system monitoring and alerting. A third article will cover dos and don'ts on how to set up a stress test environment. The last article will discuss how to interpret the results of our monitoring processes.
For all four articles, I’ll stick with Microsoft out-of-the-box tools, like Performance Monitor (PerfMon) and SQL Profiler, tools everyone is familiar with. You’ll be surprised to see the real power that comes with them.
I don’t pretend for this list to be complete, or the way to go. Some of these tips are just nice-to-know; others are indispensable in my eyes. Most important for me is that you don’t lose time figuring out the things I have already spent time on.
In this article, I won’t bore you with an exposition on what baselining means and why you should monitor. I’m convinced everybody can find some definition on the Internet, and I hope everyone is convinced that baselining and monitoring is essential for all your production SQL Server systems.
When I am called in for a 'performance' intervention, I notice all too often, once on site, that no decent logging of system behavior is available, which makes troubleshooting performance issues much harder. I'll give you some practical tips you can use to set up monitoring. Baselines will be produced from these monitoring results.
Remember: To count is to know.
This document applies to SQL Server 2000 as well as SQL Server 2005, unless specifically mentioned otherwise.
Tip 1: Set Up Your Monitoring System
We will be concentrating on two Microsoft out-of-the-box tools: Performance Monitor (PerfMon) and SQL Profiler.
Of course, at some point in time these tools will show their limitations, and others will come into the picture. But for the scope of this set of articles, and for a first-time intervention at the site of a client with performance issues, PerfMon and SQL Profiler will often be your only companions.
These tools also have a number of advantages compared to third-party monitoring tools:
They are widely accepted in Microsoft environments, which is what we are focusing on. Clients are often not keen on installing and running unfamiliar software on their production environment.
Third-party monitoring tools often include features I'm not interested in (e.g., page splits in msdb), but that nevertheless generate additional overhead.
Third-party monitoring tools often lack flexibility, e.g., in what can be done with logged data.
I prefer the data to be accessible, so I can integrate it into my own tools the way I want, rather than have it locked in some vendor's proprietary black box. But flexibility has its price.
To use PerfMon and SQL Profiler, you should ideally use a dedicated monitoring server with sufficient disk capacity, so as to spare resources on your production servers (count on more than 10 GB per monitored system, assuming you have archiving in place). This way, one central system can trace multiple remote systems and store the data centrally.
Collect PerfMon Counters
One exception to this rule is the way PerfMon should collect its data. To avoid making expensive DCOM calls to your production database system, you can run PerfMon locally on the DB server you want to monitor. If you think this generates too much overhead on your system, check all counters with Instance Name = smlogsvc (the service responsible for PerfMon logging). My experience tells me it's not as bad as some people say.
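To gauge that overhead yourself, you can log a few Process counters against the smlogsvc instance. The counter paths below are a sketch in the standard counter-file notation understood by PerfMon and logman; add or drop counters to taste:

```
\Process(smlogsvc)\% Processor Time
\Process(smlogsvc)\IO Data Bytes/sec
\Process(smlogsvc)\Working Set
```

If these counters stay low during a representative workload, local collection is a safe choice.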
In a first phase, you can keep the default interval of 15 seconds. A detailed set of counters to include is listed in tip 5. Results are best redirected to your monitoring server, to limit disk I/O pressure on the production system; in that case, keep an eye on the network counters (bandwidth).
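As a sketch, such a counter log can be set up from the command line with logman (shipped with Windows XP and Windows Server 2003). The log name, counter file, and UNC path below are hypothetical placeholders, not prescribed values:

```
rem Create a counter log sampling every 15 seconds, written in binary
rem format to the central monitoring server (names and path are examples).
logman create counter BaselineLog -cf counters.txt -si 15 -f bin -o \\MonServer\PerfLogs\SRV01\baseline

rem Start collecting.
logman start BaselineLog
```

The binary format keeps the log files compact; they can later be opened in PerfMon or converted (e.g., to CSV) for analysis on the monitoring server.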