Monitor SQL Server Replication Jobs

The Replication
infrastructure in SQL Server is implemented using SQL Server Agent to execute the various components involved in the form of a
job (e.g. LogReader agent job, Distribution agent job, Merge agent job)

SQL Server jobs
execute a binary executable file which is basically C++ code.

You can download all the scripts for this article here

SQL Server Job Schedules

By default
each job has only one schedule that is set to Start automatically when SQL
Server Agent starts
. This schedule ensures that when ever the SQL Server
Agent service is started all the replication components are also put into
action. This is OK and makes sense but there is one problem with this default configuration
that needs improvement  -  if for any reason one of the components fails it remains
down in a stopped state.  

Unless you
monitor the status of each component you will typically get to know about such
a failure from a customer complaint as a result of missing data or data that is
not up to date at the subscriber level. Furthermore, having any of these
components in a stopped state can lead to more severe problems if not corrected
within a short time.

The action
required to improve on this default settings is in fact very simple. Adding a
second schedule that is set as a Daily Reoccurring schedule which runs
every 1 minute does the trick. SQL Server Agent’s scheduler module knows how to
handle overlapping schedules so if the job is already being executed by another
schedule it will not get executed again at the same time. So, in the event of a
failure the failed job remains down for at most 60 seconds.

Many DBAs are
not aware of this capability and so search for more complex solutions such as
having an additional dedicated job running an external code in VBS or another
scripting language that detects replication jobs in a stopped state and starts
them but there is no need to seek such external solutions when what is needed
can be accomplished by T-SQL code.

SQL Server Jobs Status

In addition
to the 1 minute schedule we also want to ensure that key components in the
replication are enabled so I can search for those components by their Category,
and set their status to enabled in case they are disabled, by executing the
stored procedure MonitorEnableReplicationAgents.

The jobs that
I typically have handled are listed below but you may want to extend this, so
below is the query to return all jobs along with their category.

SELECT
category_id, name FROM
msdb.dbo.syscategories
ORDER BY
category_id;

  • Distribution
    Cleanup
  • LogReader Agent
  • Distribution Agent

Snapshot Agent Jobs

By default
when a publication is created, a snapshot agent job also gets created with a
daily schedule. I see more organizations where the snapshot agent job does not
need to be executed automatically by the SQL Server Agent  scheduler than organizations
who   need a new snapshot generated automatically. To assure this setting is in
place I created the stored procedure MonitorSnapshotAgentsSchedules which
disables snapshot agent jobs and also deletes the job schedule.

It is worth
mentioning that when the publication property immediate_sync is turned
off then the snapshot files are not created when the Snapshot agent is executed
by the job. You control this property when the publication is created with a
parameter called @immediate_sync passed to sp_addpublication
and for an existing publication you can use sp_changepublication.

Implementation

The
scripts assume the existence of a database named PerfDB.

Steps:

  1. Run the scripts to
    create the stored procedures in the PerfDB database.
  2. Create a job that
    executes the stored procedures every hour.

-- Verify that the 1_Minute schedule exists.
EXEC PerfDB.dbo.MonitorReplicationAgentsSchedules @CategoryId = 10; /* Distribution */ 
EXEC PerfDB.dbo.MonitorReplicationAgentsSchedules @CategoryId = 13; /* LogReader     */
 
 
-- Verify all replication agents are enabled.
EXEC PerfDB.dbo.MonitorEnableReplicationAgents @CategoryId = 10; /* Distribution */ 
EXEC PerfDB.dbo.MonitorEnableReplicationAgents @CategoryId = 13; /* LogReader   */
EXEC PerfDB.dbo.MonitorEnableReplicationAgents @CategoryId = 11; /* Distribution clean up   */
 
 
-- Verify that Snapshot agents are disabled and have no schedule
EXEC PerfDB.dbo.MonitorSnapshotAgentsSchedules;

Want to read
more of about replication? Check at my replication posts
at my blog.




Array

One Response to “Monitor SQL Server Replication Jobs”

  1. Hi Yaniv,
    Good article.
    Wouldn’t it be right to filter out schedules for snapshot publication jobs?

    SELECT distinct sc.schedule_id, sc.name
    FROM dbo.MSpublications p
    INNER JOIN MSsnapshot_agents sa on sa.publisher_id = p.publisher_id and sa.publisher_db = p.publisher_db
    INNER JOIN msdb.dbo.sysjobs j on j.job_id = sa.job_id
    INNER JOIN msdb..sysjobschedules s ON s.job_id = j.job_id
    INNER JOIN msdb..sysschedules sc ON sc.schedule_id = s.schedule_id
    WHERE j.category_id = 15 and p.publication_type 1 –exclude snapshot publications;

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |