Query Execution Statistics

In my last article I talked about tracking performance of TSQL statements (http://www.sql-server-performance.com/articles/per/tsql_statement_performance_p1.aspx).   In that article I discussed a number of different ways to obtain performance information for your query statement or set of statements.  One of my methods I mentioned to obtain query statistics was to use the “sys.dm_exec_query_statistics” dynamic management view (DMV).   In this article I’m going expand on the usage of the “sys.dm_exec_query_stats” DMV.  I will also be exploring other DMVs and related dynamic management functions (DMFs) that can be used to gather additional performance information that SQL Server 2005 maintains.

Understanding DMV statistics
Before I start discussing how to gather performance statistics I want to review some basic information about DMV statistics.  With SQL Server 2005, Microsoft introduced a number of system views called DMVs.  These views allow you to probe SQL Server to determine the health, diagnosis problems, or review operational information of a SQL Server instance. 

DMV statistics are collected while SQL Server is running, and are reset every time SQL Server starts.  Statistics for certain DMV can also be reset when you drop and re-create their components.  This is true for objects like stored procedures, and tables.  For other DMV’s information can be reset by running DBCC commands.

When you use any DMV you need to keep in mind how long SQL Server has been collecting DMV information to determine how useful the data returned from the DMV might be.  If SQL Server has only been up for a short period of time then you might not want to use some DMV statistics because they don’t represent a relative sampling of the true work load that an instance might encounter.   Also SQL Server can only maintain so much information, so some information can be lost while SQL Server performs management activities.  So if SQL Server has been up for a long period of time there is a potential that some statistical information has been overwritten.

Therefore any time you use a DMV keep these points in mind while you review the information returned via SQL Server 2005 DMVs. Only make database or application code changes when you are confident the information obtained from the DMVs is accurate and complete.

Using  sys.dm_exec_query_stats DMV Statistics
The sys.dm_exec_query_stats DMV returns query statistics for individual TSQL statements within an instance of SQL Server 2005.  This DMV returns a number of useful statistics for TSQL statements run on an instance like, execution count, amount of CPU used, the number of I/O’s used, etc.   You can read SQL Server Books Online for more detailed information about all the available columns returned when you run this DMV.

When you select information from the sys.dm_exec_query_stats DMV it does return a lot of statistics, but it doesn’t give you the text of the SQL query associated with the statistics displayed.  To get the text of the SQL query associated with the statistics you need to join the sys.dm_exec_query_stats DVM record set with the columns returned from the sys.dm_exec_sql_text  dynamic management function (DMF), like so:

SELECT * FROM
  sys.dm_exec_query_stats
  CROSS APPLY
  sys.dm_exec_sql_text(sql_handle)

Here I used the CROSS APPLY operator to join the sys.dm_exec_query_stats DMV with the sys.dm_exec_sql_text  table-valued DMF.  The CROSS APPLY operator allows you to join a set with a function.  This operator uses a column value from  each row in the set on the left, and then uses that value as input to the function on the right.  A row is populated into the final result set containing columns from the left and columns returned from the function on the right any time the DMF function returns any information.    In this example the CROSS APPLY operator joins the row from the sys.dm_exec_query_stats with the output from the sys.dm_exec_sql_text DMF using the “sql_handle” value from a sys.dm_exec_query_stats row.  

Identifying Statements within a Batch
The SQL query text returned from sys.dm_exec_sql_text DMF could be a stored procedure, a single TSQL statement, or a series of statements.  This DMF returns what is commonly referred to as a TSQL batch.  So in reality the above SELECT statement only joins the statistics from a single statement (sql_handle), with the SQL batch it is associated with.  Therefore you need to do a little more to get the actual statement for which the query statistics apply.   To get the single statement associated with the query stats you need use the “offset” column values of sys.dm_exec_query_stats DMF.  The code below takes those offset values to return the single TSQL statement associated with each set of SQL query statistics:

SELECT SUBSTRING(text,
        — starting value for substring
        CASE WHEN statement_start_offset = 0
             OR statement_start_offset IS NULL
               THEN 1
               ELSE statement_start_offset/2 + 1 END,
        — ending value for substring
        CASE WHEN statement_end_offset = 0
          OR statement_end_offset = -1
          OR statement_end_offset IS NULL
               THEN LEN(text)
               ELSE statement_end_offset/2 END –
                   CASE WHEN statement_start_offset = 0
                          OR statement_start_offset IS NULL
                               THEN 1
                               ELSE statement_start_offset/2  END + 1
        )  AS TSQL,
        *
FROM sys.dm_exec_query_stats 
   CROSS APPLY sys.dm_exec_sql_text(sql_handle);

Here I used the “statement_start_offset” column value to determine where the first character in the TSQL statement would be in the SQL query text.  For a single statement batch this value is normally 1, but for multi statement batches this value would something greater than 1 for all statement but the first statement in the batch.  The “statement_end_offset” column is used to calculate the last character of the TSQL statement in the SQL query text string returned from the sys.dm_exec_sql_text DMF.

Determining DBID for Statement Stats
If you run either one of the TSQL statement above you will see that these TSQL statement returns query statistics, but yet it doesn’t consistently show you which “dbid” the query might be executed against.  It does show some DBIDs, but not all.  In a single database application environment that that information is not important.  But if you have a single instance of SQL Server supporting multiple applications you most likely will have many databases.  When this is the case you need to be able to identify the database for each statement, so you can then tune the queries with the worst performance related statistics for a given database.
To obtain the “dbid” for all rows returned from the sys.dm_exec_query_stats you need to use an additional DMV.   The “dbid” can be found by using information in the cached plan that is associated with the TSQL statement.  Here is an example of how to obtain those missing “dbid” for those TSQL statements by using cached plan information:

SELECT pa.value AS dbid,
       COALESCE(DB_NAME(st.dbid),
                DB_NAME(CAST(pa.value as int))+’*') AS DBNAME,
       SUBSTRING(text,
        — starting value for substring
        CASE WHEN statement_start_offset = 0
             OR statement_start_offset IS NULL
               THEN 1
               ELSE statement_start_offset/2 + 1 END,
        — ending value for substring
        CASE WHEN statement_end_offset = 0
          OR statement_end_offset = -1
          OR statement_end_offset IS NULL
               THEN LEN(text)
               ELSE statement_end_offset/2 END –
                   CASE WHEN statement_start_offset = 0
                          OR statement_start_offset IS NULL
                               THEN 1
                               ELSE statement_start_offset/2  END + 1
        )  AS TSQL,
        *
FROM sys.dm_exec_query_stats 
   CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
   OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) pa
    WHERE attribute = ‘dbid’   ;

In the above query I used the “dbid” attribute of the cached plan to identify the missing “dbid”.  The attributes of a plan are obtained by using the “sys.dm_exec_plan_attributes” DMF.  Each cached plan has many attributes so I obtained only the “dbid” attribute information by constraining the information when the attribute column is equal to the value “dbid”.  Notice I used the “OUTER APPLY” operator to join the query statistics information to the results of the sys.dm_exec_plan_attribute DMF.  The “OUTER APPLY” operator returns all the columns on the left side even if they don’t return a value from the function on the right.  So even if the “plan_handle” from the query statistics does not find a plan attribute the statistics will be returned.    I do this because in some cases plans are not cached.

If you inspect my code above you will see that I use the COALESCE function to determine what I will display for a DBNAME.  If the “dbid” is returned from the sys.dm_exec_sql_text  table-valued DMF, then I use that “dbid” to determine the database name using the DB_NAME function, otherwise I use the “dbid” value from the sys.dm_exec_plan_attribute DMF to get the database name.  Note that any time the sys.dm_exec_plan_attribute “dbid” is used to obtain the database name I append an “*” next to the name.  I do to identify database names that where obtained by using the cache plan “dbid” attribute.  Note that you can’t guarantee that the “dbid” from the cached plan is the actual database that is the TSQL statement was processed against.  The plan attribute “dbid” value is the database that was being “used” at the time the SQL query text was compiled into a cached plan.  Most likely this will be the appropriate database, but if you are using a three part naming convention for database objects then there is a possibility that you are using one database while executing SQL commands against a different database. 

There is one last thing that should be mentioned regarding the dbid’s returned using the above query, and that is the DB_NAME function might return a “NULL” value.  This happens when the dbid value is 32767.  This database id number is associated with a system database that is commonly called the “Resource” database.  This “Resource” database is not that well known, and it is an actual database that does exist on your system but you can’t see in SQL Server Management Studio.  It can be found by browsing your DATA directory and looking for mdf and ldf files that start with “mssqlsystemreource”.  The Resource database contains all the compiled system stored procedures and functions.  So to completely identify all the possible database names I’ve modified the above code to stuff in “Resource” when either one of the DB_NAME function calls return a NULL value:

SELECT pa.value AS dbid,
       COALESCE(DB_NAME(st.dbid),
                DB_NAME(CAST(pa.value as int))+’*',
                ‘Resource’) AS DBNAME,
       SUBSTRING(text,
        — starting value for substring
        CASE WHEN statement_start_offset = 0
             OR statement_start_offset IS NULL
               THEN 1
               ELSE statement_start_offset/2 + 1 END,
        — ending value for substring
        CASE WHEN statement_end_offset = 0
          OR statement_end_offset = -1
          OR statement_end_offset IS NULL
               THEN LEN(text)
               ELSE statement_end_offset/2 END –
                   CASE WHEN statement_start_offset = 0
                          OR statement_start_offset IS NULL
                               THEN 1
                               ELSE statement_start_offset/2  END + 1
        )  AS TSQL,
        *
FROM sys.dm_exec_query_stats 
   CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
   OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) pa
    WHERE attribute = ‘dbid’   ;

Meaningful Statistics
So far each one of my queries displayed lots of different statistics.  These are great, but when you are doing performance analysis you want to be more specific, and order the query results to quickly identify the worst performing queries.  In this section I will refine the above code to provide only the top offenders of CPU usage, IO, and Duration.

The “total_worker_time” column of the sys.dm_exec_query_stats identifies the total CPU used by the statement.  So we can use this column to determine the worst CPU usage TSQL statements.  Since this column is an accumulative column that summarizes the CPU used each time the statement was executed we need to perform a calculation to determine the average CPU used for each statement execution.  You can use the code below to find the TOP 10 commands that use the most CPU for a single statement execution:

SELECT TOP 10 COALESCE(DB_NAME(st.dbid),
                DB_NAME(CAST(pa.value as int))+’*',
                ‘Resource’) AS DBNAME,
       SUBSTRING(text,
        — starting value for substring
        CASE WHEN statement_start_offset = 0
             OR statement_start_offset IS NULL
               THEN 1
               ELSE statement_start_offset/2 + 1 END,
        — ending value for substring
        CASE WHEN statement_end_offset = 0
          OR statement_end_offset = -1
          OR statement_end_offset IS NULL
               THEN LEN(text)
               ELSE statement_end_offset/2 END –
                   CASE WHEN statement_start_offset = 0
                          OR statement_start_offset IS NULL
                               THEN 1
                               ELSE statement_start_offset/2  END + 1
        )  AS TSQL,
        SUBSTRING(CONVERT(CHAR(23),
                  DATEADD(ms,(total_worker_time/execution_count)/1000,0),
                          121),
                  12,23)  AVG_CPU_MS
FROM sys.dm_exec_query_stats 
   CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
   OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) pa
    WHERE attribute = ‘dbid’ 
   ORDER BY AVG_CPU_MS DESC ;

The sys.dm_exec_query_stats DMF provides a number of different columns related to I/O statistics.  You can use these columns to determine which TSQL statements used the most I/O.  Here is some code that will show you the commands that used the most logical read I/O per statement execution:

SELECT TOP 10 COALESCE(DB_NAME(st.dbid),
                DB_NAME(CAST(pa.value as int))+’*',
                ‘Resource’) AS DBNAME,
       SUBSTRING(text,
        — starting value for substring
        CASE WHEN statement_start_offset = 0
             OR statement_start_offset IS NULL
               THEN 1
               ELSE statement_start_offset/2 + 1 END,
        — ending value for substring
        CASE WHEN statement_end_offset = 0
          OR statement_end_offset = -1
          OR statement_end_offset IS NULL
               THEN LEN(text)
               ELSE statement_end_offset/2 END –
                   CASE WHEN statement_start_offset = 0
                          OR statement_start_offset IS NULL
                               THEN 1
                               ELSE statement_start_offset/2  END + 1
        )  AS TSQL,
       total_logical_reads/execution_count AS AVG_LOGICAL_READS
FROM sys.dm_exec_query_stats 
   CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
   OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) pa
    WHERE attribute = ‘dbid’ 
   ORDER BY AVG_LOGICAL_READS DESC ;

For my last meaningful statistical gathering query let me show you how to find the TOP 10 commands that took the longest time to run.  By longest time I mean duration.  To do this I use the “total_elapsed_time” column of the sys.dm_exec_query_stats DMV.  Here is the code:

 SELECT TOP 10 COALESCE(DB_NAME(st.dbid),
                DB_NAME(CAST(pa.value as int))+’*',
                ‘Resource’) AS DBNAME,
       SUBSTRING(text,
        — starting value for substring
        CASE WHEN statement_start_offset = 0
             OR statement_start_offset IS NULL
               THEN 1
               ELSE statement_start_offset/2 + 1 END,
        — ending value for substring
        CASE WHEN statement_end_offset = 0
          OR statement_end_offset = -1
          OR statement_end_offset IS NULL
               THEN LEN(text)
               ELSE statement_end_offset/2 END –
                   CASE WHEN statement_start_offset = 0
                          OR statement_start_offset IS NULL
                               THEN 1
                               ELSE statement_start_offset/2  END + 1
        )  AS TSQL,
               SUBSTRING(CONVERT(CHAR(23),
                  DATEADD(ms,(total_elapsed_time/execution_count)/1000,0),
                          121),
                  12,23) AS AVG_ELAPSED_TIME
FROM sys.dm_exec_query_stats 
   CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
   OUTER APPLY sys.dm_exec_plan_attributes(plan_handle) pa
    WHERE attribute = ‘dbid’ 
   ORDER BY AVG_ELAPSED_TIME DESC ;

Conclusion
SQL Server 2005 provides you with a number of new DMVs and DMFs to help you with gathering information about what is going on with SQL Server.  The sys.dm_exec_query_stats DMV is a lovely little gem that allows you to return statistics for TSQL statements.  By using this DMV and joining it with other DMV’s and DMFs you can develop a number of different  TSQL queries to quickly identify your worsst performing  application queries.  I would suggest you take the information I’ve given you in this article and develop yourself a set of TSQL performance monitoring queries to monitor and develop performance baselines for your SQL Server 2005 environment.




Related Articles :

  • No Related Articles Found

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |