Transactional Replication – Violation of Primary Key Constraint

Recently, I encountered a case where the Transactional Replication between our Production OLTP server named XYZ and Reporting database server named PQR went out of sync. Both the servers were located in the same Data Centre in the USA. The Reporting database was used by the users in Canada for fetching the data from the SSRS reports. At 06:30 AM IST, I was notified regarding this issue. Canada Business hours normally start at around 1800 hrs IST. Considering the fact that I have around 12 hours to fix this issue, my main aim was to fix it not by setting up the Replication procedure again but to do a deep investigation and find an alternative and a convenient way to fix it.

I started by opening the Replication Monitor, the error message was

Violation of Primary Key Constraint in the table ABC

Just FYI, the similar error message would also be shown in the View Synchronization Agent status tab . In the Replication Monitor along with the error message, I noticed that the command_id and xact_seqno for the culprit command was 1 and 0x00040C1F00002931001900000000 respectively.

Based upon the command_id and xact_seqno as discussed above, my next aim was to find the exact command due to which the Replication procedure was failing.

Against the distribution database on the publisher instance, I executed the below T-SQL to find the article id and the publication database id (in the Replication topology each replicated table is referred as an article).

select *
 from msrepl_commands (nolock)
 where command_id = 1 and xact_seqno =
0x00040C1F00002931001900000000

Once the above T-SQL query was executed, I received the corresponding database id in the output message as 3 and the article id of 2291.

The next step was to find the exact table name which was having an issue on the subscriber. In order to do that, I executed the below T-SQL query against the database nameddistribution on the Production OLTP Server (Publisher).

select * from dbo.MSarticles
 where article_id IN (SELECT Article_id from
MSrepl_commands
 where xact_seqno = 0x00040C1F00002931001900000000)
 and publication_id =3

xact_seqnoandpublication_idis the one which we have obtained by executing the T-SQL commands as discussed above which were 0x00040C1F000029310019 and 3 respectively.

When the above T-SQL query was executed, the output message showed the table name to be astbl_ABC. At this stage, I was sure thattbl_ABC was the table which was having an issue.

After identifying the table name the next challenge was to find the Replicated command which was having an issue to get propagated onto the subscriber.

In order to find the Replicated command, I executed the below T-SQL query against the distribution database on the Production OLTP Server.

sp_browsereplcmds
@article_id=2291,
@command_id=1,
@xact_seqno_start='0x00040C1F000029310019',
@xact_seqno_end='0x00040C1F000029310019',
@publisher_database_id=3
    

Each and every input parameter specified in the sp_browsereplcmdssystem stored procedure has been obtained from the steps which we have discussed above. As soon as the above T-SQL query was executed, I received the following command:

{CALL [sp_MSIns_dbotbl_ABC] (91962)}

 Once the above T-SQL was executed, I noticed a very strange thing. One of the records in the table named tbl_ABC  which was already present on the subscriber was present on the publisher as well and the system was trying to re-enter the same entry into the table again on the subscriber.

Select * from tbl_abc where LogID=91962

Where LogID was the Primary Key column.

On the subscriber, I decided to delete this particular record. Before deleting the record, I took the backup of the table on the subscriber as,

Select * into tbl_abc_backup_20130830 from tbl_abc

Once the backup of the table was taken successfully after executing the above command, I deleted the particular record from the table on the subscriber as below.

delete from tbl_abc where LogID=91962

Once the above T-SQL was executed the Transactional Replication procedure started working fine again. In the View Synchronization Status tab present under the Replication tab in the Object Explorer, I noticed that the Replicated Commands were being delivered correctly.

This was one such approach that we used to ensure that we brought the Replication Procedure on track successfully before the business started. I have seen many SQL DBA’s preferring setting up the entire procedure again but that is not an ideal approach especially when the database size is huge and setting up Replication takes a long long time. Please do let me know if you have any alternative solution/suggestions regarding the same.

]]>

Leave a comment

Your email address will not be published.