Creating Your Own Custom Data Collections
In my last couple of articles (http://sql-server-performance.com/articles/per/Management_Data_Warehouse_p1.aspx and http://www.sql-server-performance.com/articles/per/System_Data_Collection_Reports_p1.aspx) I discussed the Management Data Warehouse (MDW) a new feature that collects and manages statistics that came out with SQL Server 2008. In those articles I described how to install the components that make up the MDW, the system data collections, and the history reports that are available with the MDW. In this article I’m going to expand on the MDW topic and discuss how you can extend the data collection capabilities of the MDW by creating your own custom data collections.
What is the MDW?
For a quick review for those that have not read my other articles let me describe the basic MDW concept. The MDW is a database that contains data populated and managed by different Data Collections. A Data Collection defines the specific data that will be stored in the MDW, the method of collecting that data, and a schedule when the data will be collected and purged. For each Data Collection a series of SSIS packages and SQL Agent jobs are built to collect and manage the data related to the collection. The MDW is more than just a database it is the whole process of collecting, managing and stored data that can be used to monitor your SQL Server environment. Also included with the MDW are some Reporting Services reports that provide a wealth of information related to the data collected using the system data collections.
Defining your own Custom Data Collection
When you think of defining your own custom Data Collection you probably envision creating a collection using point and click within SQL Server Management Studio (SSMS). But using the SSMS GUI tool to create a custom Data Collection is not possible. For some reason Microsoft only allows you to define a custom Data Collection using a T-SQL script. Once the Data Collection is defined some modifications can be done using the GUI interface within SSMS.
To create your own custom Data Collection you will need to first build a script. You can write these scripts by hand if you’d like. Or you can take the cloning approach and modify a script built from a system Data Collection to meet your custom data collection needs. SSMS allows you to script out the system data collections, as well as any custom data collection.
To demonstrate how to collect your own data using the Data Collection process I need to first identify some data to collect. For the purpose of this article I want to show the growth rate for tables, based on row counts over time. In order to do this I would need to collect the number of rows in each table and in each database daily over time. Also let me add an additional requirement that my data is will be periodically purged. I want to retain row counts for the most recent 6 months (180 days).
The first step to collecting this information is to determine which data collector type I will use to collect this information. SQL Server 2008 provides 4 different data collector types, but basically only three make any sense to use. They are: T-SQL Query, SQL Trace and Performance Counters. The T-SQL Query collector type is used to collect statistics using a T-SQL batch. SQL Trace allows you to collect SQL Server Profiler trace information. Using the Performance Counter collector type you gather operating system performance counters like those you can see when you use System Monitor (PERFMON).
For my demo of creating a user defined Data Collection I’m going to use the T-SQL Query data collector type. This collector type allows me to write a T-SQL SELECT statement to define the information I want to collect. Below is the T-SQL statement I plan to have my custom data collection use:
SELECT OBJECT_NAME(o.object_id) ObjectName
, SCHEMA_NAME(o.schema_id) SchemaName
, SUM(p.Rows) NumOfRows
FROM sys.partitions p
JOIN sys.objects o ON p.object_id = o.object_id
WHERE Index_ID < 2 AND o.type = ‘U’
GROUP BY SCHEMA_NAME(o.schema_id), OBJECT_NAME(o.object_id)
ORDER BY SCHEMA_NAME(o.schema_id), OBJECT_NAME(o.object_id);
This script will return the number of rows for each table in whatever database context it is run. By incorporating this T-SQL statement into a Data Collection it will be able to gather row counts for every table in all my databases.
The next step is either to code a script by hand that uses this T-SQL code to create my Data Collection, or clone an existing system Data Collection that uses a T-SQL Query Data Collector type. As luck would have it the “Disk Usage” system Data Collection uses the “T-SQL Query“ Data Collector type. To script this system Data Collection I would just right click on the Data Collection and use the “CREATE TO…” menu item to generate a script.