Monitoring Transactional Replication – The Distribution Queue

Note : The scripts for this article can be downloaded


here
.

Monitoring
the distribution queue is an important link in the chain of transactions
delivery over a replicated topology. A chain that includes the log reader agent
scanning the published database’s transaction log for commands to be moved to
the distribution database from which a distribution agent delivers the commands
to the subscription database typically located on the subscriber server.

In an
upcoming articles I will be examining other key links in that chain but
for now let’s focus on the distribution database and see what information we
can get there and how we can use that information for monitoring transactional
replication.

The bellow SQL
statement returns valuable information on the current status of the distribution
database in terms of commands delivery. It allows us to see the number of commands
delivered and the number of commands not yet delivered at the article level per
each subscriber.

USE distribution;

SET NOCOUNT ON;

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT DISTINCT

       @@SERVERNAME

      –,a.article_id

      ,a.Article

      ,p.Publication

      ,SUBSTRING(agents.[name], 16, 35) AS [Name]

      –,s.agent_id

      ,s.UndelivCmdsInDistDB

      ,s.DelivCmdsInDistDB 

–,UndelivCmdsInDistDB + DelivCmdsInDistDB  AS TaotalTrans

FROM dbo.MSdistribution_status AS s

INNER JOIN dbo.MSdistribution_agents AS agents ON agents.[id] = s.agent_id

INNER JOIN dbo.MSpublications AS p ON p.publication = agents.publication

INNER JOIN dbo.MSarticles AS a ON a.article_id = s.article_id and p.publication_id = a.publication_id

WHERE 1=1 AND s.UndelivCmdsInDistDB <> 0 AND agents.subscriber_db NOT LIKE ‘virtual’

–AND p.Publisher_db = ”

–AND a.Article LIKE  ”

–AND p.Publication IN (”)

–AND s.agent_id NOT IN ()

ORDER BY s.UndelivCmdsInDistDB DESC

OPTION(RECOMPILE);

This allows us to see if a specific article is slow on delivery per a
given subscriber or to all subscribers (if are more than one subscriber defined).

The MonitorReplicationArticleCommands
stored procedure is built on top of the above SELECT statement and is designed
to be executed by a monitoring tool.

The
procedure accepts three input parameters: @article, @publication and @threshold
and returns the number of commands that had crossed the predefined threshold for
the specific article and had not yet been delivered to the subscriber.

Before implementing
the procedure by your monitoring tool it is recommended that you are familiar
with the environment so that false alarms are minimized.

I recommend
starting with a single article that is highly active, that is an article that gets
frequent modifications (INSERT, UPDATE or DELETE). This allows you to come up
with a figure that can be used as a threshold indicating the number of commands
that you do not expect to be reached at the distribution queue and having
reached that threshold you will know something is not working as expected.

If you find
it hard to come up with a correct representing figure I suggest you gather some
statistics first by running the procedure in the form of INSERT…EXEC every 5
minutes or so for a week or two and save the results to a table which will
allow you to see the MIN, MAX and AVG number of transactions in the queue and
also you will be able to see if the pick numbers are related to a specific time
in the day.

Once you
figure out your correct threshold you can execute the stored procedure every 5
minutes or so from your monitoring tool while passing the @threshold parameter
a value that you do not expect to be reached as follows:

EXEC distribution.dbo.MonitorReplicationArticleCommands @article = N’Spot’, @publication = N’FX’, @threshold = 3000;

The above
example is based on the fact that under normal conditions the number of commands
in the queue for the given article is around 1000. Since the article is
frequently modified If there is a problem it is only a matter of a short time
until the threshold is crossed and some action is triggered. I do not use a
lower threshold to avoid false alarms.

Having done
that you now have the infrastructure needed to be notified if a specific
article has a queue in the distribution database exceeding your predefined
threshold.

If the
procedure does not return an empty result set a notification email should be
sent and you should perform a manual check to understand the reason for the
growth in queue or alternately, automate that check by executing a set of
actions from your monitoring tool.

There are various
reasons for the queue to grow and I may delve into that in greater depth in the
future but for now I will mention the most common reasons I think of at this
time.

1. A wait
for resource (a blocking lock) at the subscriber slows down or prevents the
commands to be applied at the subscription database.

2. The
network connectivity between the distributor and the subscriber is slow or down
for some time.

3. A high
number of DML commands applied at the publisher within a short time period.

Note that
even if you publish several articles but only monitor the most suitable
candidate in this way, many of the problems detected via this article can be
relevant for the entire publication. In the above three reasons that I mention,
number 2 is a common source of failure for all articles.

To support
this query and other queries I added several indexes on tables MSdistribution_history,
MSpublications and MSarticles in the distribution database (see
attached script Distribution_Indexes.sql).

Note that I
do not add indexes on MSrepl_commands table even though we can sure
benefit from some indexes on that table while reading data but these typically
lead to deadlocks when modifying data.

In addition
to our discussion so far I provide a stored procedure named MonitorReplicationCommands
(built on the code used in sp_tablesize)
that returns the number of rows and size in MB of a any given table that is
passed as a parameter to the @table input variable received by the procedure.

It allows
you to monitor MSrepl_commands table which is the primary table along
with MSrepl_transactions that serve as the distribution queue for commands to
be delivered to subscribers.

Monitoring
the size of this table provides a more generic view on the queue and can be
used in the same manner as the procedure MonitorReplicationArticleCommands
but at the entire queue level instead of the table level.




Array

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |