Monitoring Transactional Replication – The Distribution Queue
Before implementing the procedure by your monitoring tool it is recommended that you are familiar with the environment so that false alarms are minimized.
I recommend starting with a single article that is highly active, that is an article that gets frequent modifications (INSERT, UPDATE or DELETE). This allows you to come up with a figure that can be used as a threshold indicating the number of commands that you do not expect to be reached at the distribution queue and having reached that threshold you will know something is not working as expected.
If you find it hard to come up with a correct representing figure I suggest you gather some statistics first by running the procedure in the form of INSERT…EXEC every 5 minutes or so for a week or two and save the results to a table which will allow you to see the MIN, MAX and AVG number of transactions in the queue and also you will be able to see if the pick numbers are related to a specific time in the day.
Once you figure out your correct threshold you can execute the stored procedure every 5 minutes or so from your monitoring tool while passing the @threshold parameter a value that you do not expect to be reached as follows:
EXEC distribution.dbo.MonitorReplicationArticleCommands @article = N’Spot’, @publication = N’FX’, @threshold = 3000;
The above example is based on the fact that under normal conditions the number of commands in the queue for the given article is around 1000. Since the article is frequently modified If there is a problem it is only a matter of a short time until the threshold is crossed and some action is triggered. I do not use a lower threshold to avoid false alarms.
Having done that you now have the infrastructure needed to be notified if a specific article has a queue in the distribution database exceeding your predefined threshold.
If the procedure does not return an empty result set a notification email should be sent and you should perform a manual check to understand the reason for the growth in queue or alternately, automate that check by executing a set of actions from your monitoring tool.
There are various reasons for the queue to grow and I may delve into that in greater depth in the future but for now I will mention the most common reasons I think of at this time.
1. A wait for resource (a blocking lock) at the subscriber slows down or prevents the commands to be applied at the subscription database.
2. The network connectivity between the distributor and the subscriber is slow or down for some time.
3. A high number of DML commands applied at the publisher within a short time period.
Note that even if you publish several articles but only monitor the most suitable candidate in this way, many of the problems detected via this article can be relevant for the entire publication. In the above three reasons that I mention, number 2 is a common source of failure for all articles.
To support this query and other queries I added several indexes on tables MSdistribution_history, MSpublications and MSarticles in the distribution database (see attached script Distribution_Indexes.sql).
Note that I do not add indexes on MSrepl_commands table even though we can sure benefit from some indexes on that table while reading data but these typically lead to deadlocks when modifying data.
In addition to our discussion so far I provide a stored procedure named MonitorReplicationCommands (built on the code used in sp_tablesize) that returns the number of rows and size in MB of a any given table that is passed as a parameter to the @table input variable received by the procedure.
It allows you to monitor MSrepl_commands table which is the primary table along with MSrepl_transactions that serve as the distribution queue for commands to be delivered to subscribers.
Monitoring the size of this table provides a more generic view on the queue and can be used in the same manner as the procedure MonitorReplicationArticleCommands but at the entire queue level instead of the table level.
Pages: 1 2