SQL Server Monitoring in 8 Steps: Lessons From the Field

“To count is to know.”

SQL Server database systems are more than ever before being chosen as the preferred backend database solution for large business’s critical systems. And SQL Server richly deserves this status. As a result of this, more users than ever before are blocked in their daily activity when the database is not available, or in a bad shape. It’s your responsibility as a DBA to know the health of your SQL Server so you can take proactive action to minimize chances anything bad happens to your system. But don’t panic. If you have the right tools in place, your system will tell you how it feels and will warn you well before it reaches the alarm phase. Through its performance behavior, you can spot potential problems not yet visible to the end-user. If, for example, for no reason response times of a certain query slow down from 100 to 500 ms, the end-user will not be alarmed, but you should.

I’m not only addressing those DBA’s among you who are responsible for SQL Server systems with thousands of users connected. If your system goes down with only 10 users connected, they’ll haunt you for not delivering what they expect you to.

I don’t want to frighten you or put unnecessary pressure on you, because I consider being a SQL Server DBA is probably the greatest jobs there is — next to being a professional football player — but you’d better feel comfortable with the system you are responsible for. This can only be achieved if your system tells you what is going on inside. That is why you should have reliable monitoring tools in place, starting even before going live with a new piece of software. You will have to convince your manager that investing in monitoring this early in a project’s lifecycle is no luxury.

This document is the second in a series of four where I want to share my findings on baselining, monitoring, stress testing and performance tuning with you. This article builds on the foundations of my previous article that you can find here.

In this article, I want to give you a screenplay of how to set up a monitoring environment step-by-step; starting with how you can retrieve valuable information about your system in a ‘nice and fast” way, to a more advanced scenario suitable for long-term monitoring. However, it is my intension to keep things as simple as possible. This article has a lot of code you can use to start playing with monitoring yourself right away without the need of complicated external tools or extra hardware. Also, in this article, I won’t describe fancy third party tools, but instead concentrate on Microsoft out-of-the-box tools, like Performance Monitor (PerfMon), SQL Profiler, and Excel; tools everyone should be familiar with.

Step1: Nice-and-fast: Master.dbo.sysperfinfo

The master.dbo.sysperfinfo system table contains all the SQL Server internal performance counters that are also displayed by PerfMon. Using sysperfinfo has a number of pros, but also some cons. Pros:

Sysperfinfo is available online without the need to set up a monitoring environment. When you expect to see fluctuations in your data, but they are not visible on your regular monitoring system because the polling interval is not tight enough, you can start a script retrieving performance data from sysperfinfo at a smaller polling interval for a short period of time. (see script 1.1)
As opposed to Perfmon, no OS permissions are required to use sysperfinfo; you only need read access to the master database. OS permissions are sometimes hard to get when you don’t get along very well with your company’s sysadmin team or when system support is outsourced. If you are a developer and the bad guys from the operations DBA team refuse you access to their monitoring system (or they don’t have any!) but you need detailed SQL Server statistics to tune your queries, you might ask them polite to get read access to the master database. Good luck!

Cons:

Sysperfinfo covers only SQL Server counters. There are no system counters like CPU, memory, or disk I/O statistics available.
Most counters are cumulative. If you need to store them for long-term analysis afterwards, a staging environment is required. (see scripts 1.2-1.4)

Script 1.1 shows you how to retrieve the ‘Log Bytes Flushed/sec’ counter from sysperfinfo over a 3 minute period using a 5 second interval. Something you can not do for all counter objects for a longer period of time. (You will need to replace the test database name I used in the following script.)

–Script 1.1:
CREATE TABLE #Writes(ts DATETIME, LogBytesFlushes BIGINT)
DECLARE @lbf INT, @lbfold INT, @starttime DATETIME
SET @starttime = getdate() SELECT @lbfold=cntr_value FROM master..sysperfinfo WHERE counter_name = ‘Log Bytes Flushed/sec’ AND instance_name = ‘test’
WAITFOR DELAY ’00:00:05′ WHILE getdate() < dateadd(mi,3,@starttime)
BEGIN
SELECT @lbf=cntr_value FROM master..sysperfinfo WHERE counter_name = ‘Log Bytes Flushed/sec’ AND instance_name = ‘test’
INSERT INTO #writes (ts,LogBytesFlushes) VALUES (getdate(), @lbf – @lbfold)
SET @lbfold = @lbf
WAITFOR DELAY ’00:00:05′
END SELECT * FROM #writes
Script 1.2 creates all tables required for your staging environment.
–Script 1.2:
CREATE TABLE dbo.wt_current_sysperfinfo (
InsertTime DATETIME NULL CONSTRAINT DF_wt_current_sysperfinfo_InsertTime DEFAULT (getdate()),
object_name NCHAR (128) NOT NULL ,
counter_name NCHAR (128) NOT NULL ,
instance_name NCHAR (128) NULL ,
cntr_value INT NOT NULL ,
cntr_type INT NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE dbo.wt_previous_sysperfinfo (
InsertTime DATETIME NULL CONSTRAINT DF_wt_previous_sysperfinfo_InsertTime DEFAULT (getdate()),
object_name NCHAR (128) NOT NULL ,
counter_name NCHAR (128) NOT NULL ,
instance_name NCHAR (128) NULL ,
cntr_value INT NOT NULL ,
cntr_type INT NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE dbo.tbl_log_sysperfinfo(
InsertTime datetime NULL CONSTRAINT DF_tbl_log_sysperfinfo_InsertTime DEFAULT (getdate()),
[Batch Requests/sec] FLOAT NULL,
[Buffer cache hit ratio] FLOAT NULL,
[Page life expectancy] FLOAT NULL,
[User Connections] INT NULL
) ON [PRIMARY] GO

Continues…

Leave a comment Cancel reply