Transactional Replication – Unable to Replicate Data onto the Subscriber – A Practical Example

Scenario: A couple of days back, I got a call from one of my users saying that the Data is not being replicated properly onto the Production Reporting Server from the Production OLTP server.

We have Transactional Replication scheduled for every 15 minutes between our OLTP and the Reporting server.

Approach:

I started investigating this issue by first logging on to the Production OLTP server and checking the following parameters:

  1. I explored the Activity Monitor in SSMS and checked whether there is any sleeping process which has consumed the majority of the CPU or I/O but I found no issues here.
  2. I then examined the long running queries using the Standard Reports present in SSMS and found that the CPU utilization for those queries was not very high.

  3. I then looked at the status of the Replication Log Reader agent and found that it was capturing the Replicated Commands. The message viewed in the Log Reader agent was as below:
  4. 4 transactions with 75 commands were generated. 
  5. After performing Step 3, I looked at the synchronization status of the subscriber and found that there was an issue with it. Below is the message which was shown in the synchronization status:
  6. The process is running and is waiting for a response from the server. 
  7. I then looked for any blocking processes on the OLTP server where the publisher database is hosted. To do this, I executed the below T-SQL query against the master database on the OLTP server.
  8. Select * from sysprocesses
  9. After executing this query, I found that under the column named blocked all the values were 0 which meant that there was no blocking on the OLTP server.
  10. I then investigated whether the distib.exe file on the OLTP server was causing any performance issues but noted that were no issues with it.
  11. In addition, there were no issues with respect to the N/W speed. Both the OLTP as and Reporting servers were located in the same Data Center.

After performing all the above steps, I was 100% sure that there were no issues on the OLTP server and the issue was almost certainly on the Reporting server.

I then logged on to the Reporting Server and performed the following checks:

  1. I first checked the synchronization status of the subscriber and found that it wasn’t receiving any data and the message shown was:
  2. The process is
    running and is waiting for a response from the server.
  3. I then explored the Activity Monitor and found that there were a few processes in which were sleeping and whose CPU as well as IO utilization were high, I therefore kill these processes.
  4. After performing the step 2, I then had a look at the Long Running queries report under the Standard Reports in SSMS and found no issues.
  5. I then decided to have a look at the blocking by executing the below query against the master database.
  6. Select * from sysprocesses
  7. Once the above query was executed, I noticed that under the blocked column there were three values named 65,61 and 60 which were the SP IDs of the blocking processes. On further investigation it was found that these were the system processes. I decided to KILL them by issuing the following command against the master database.
    KILL 60
    KILL 61
    KILL 65
    
  8. After executing this query, I again executed the below query against the master database to check whether there was still any blocking.
  9. Select * from sysprocesses
  10. Once the above query got executed under the column named blocked all the values were 0 which indicated that there was no blocking.
  11. I then had a look at the synchronization status of the subscriber and found that the data had started synchronizing well. The message viewed were as follows:
  12. 110 Transactions with 2295 Commands were delivered.
    57 Transactions with 8767 commands were delivered.
  13. I then had a look at the Replication Activity Monitor and found that the latency period shown was 8 hours and 57 minutes which meant that the data last was replicated 8 hours and 57 minutes ago. Once the synchronization started well it just took around 50 minutes for both the servers to get back into sync.

Thanks for taking the time to read this, please let me know if you have any questions or comments. My sincere thanks to my colleagues Arti Samkaria, Sharanjeet Singh Sansoya as well as Khushboo Jatav.




Related Articles :

4 Responses to “Transactional Replication – Unable to Replicate Data onto the Subscriber – A Practical Example”

  1. Is it worth to kill sleeping processes unless they were participating in blocking(open tran,etc)?

  2. Nice case study.

    Would it be worth to take a look at the transactions at subscriber to check as to why they are blocked. Were they insert statements and if so asking why subscribers are getting updated in transactional replication.

    Capturing SQL Text by using DMV or using Activity to check queries which were blocked would have been helpful.

    I know in situations like this the first thing anybody/dba would think of restoring replication as soon as possible but I think it would be worth checking as to why this issue was caused.(Root Cause Analysis).

    Good case study. :)

  3. Hi Satnam,

    Its a usefull article. Small correction you have missed the numbering. Under OLTP points, the point 2 should be 3 i believe.

    Thanks

  4. Was really helpful.

    Thanks bro :)

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |