Recently we had an issue in our Production environment, on the Reporting Database Server the Transactional Replication procedure was being blocked due to execution of some heavy T-SQL queries and as a result business users were unable to view the latest data.
During our investigation, I had a looked at the Synchronisation status of the subscriber and I received the below error message:
The process is running and is waiting for a response from the server.
On further investigation I noticed that due to the execution of a resource-intensive T-SQL query, the database was blocking.
Service Centre Operation Manager (the monitoring utility we use to monitor the servers) was not capturing an alert message. Based upon this, I decided to write a T-SQL script which will execute every 30 minutes and check if the Transactional Replication is functioning properly and alert a particular user or a team.
During the investigation of the case, I have noticed that whenever the system shows the error message as :
The process is running and is waiting for a response from the server
it keeps on executing and at a later stage it throws a message as Query Timeout Expired which is stored in sysreplicationalerts table in the msdb database on the publisher server.
I queried this table to find the list of all the records ehere the Replication Process was halted using the below T-SQL:
select row_number() over(order by alert_id),[time],subscriber, alert_error_text from msdb..sysreplicationalerts where datepart(yyyy,[time]) = datepart(yyyy,getdate()) and datepart(mm,[time]) = datepart(mm,getdate()) and datepart(dd,[time]) = datepart(dd,getdate()) and alert_error_text like '%query timeout%'
I first created 2 temporary tables named #replicationalerts and #subscriber_alert_message which will be used for conditional processing.
The schema for #replicationalerts temporary table is as follows:
create table #replicationalerts ( replication_alert_id int, [time] datetime, subscriber varchar(100), alert_error_message varchar(1000) )
The schema for #subscriber_alert_message temporary table is as follows:
create table #subscriber_alert_message ( subscriber_name varchar(1000), [time] datetime, alert_message varchar(1000) )
The data in a replication procedure which has been halted is transferred into the #replicationalerts temporary table using the below T-SQL.
insert #replicationalerts select row_number() over(order by alert_id),[time],subscriber, alert_error_text from msdb..sysreplicationalerts where datepart(yyyy,[time]) = datepart(yyyy,getdate()) and datepart(mm,[time]) = datepart(mm,getdate()) and datepart(dd,[time]) = datepart(dd,getdate()) and alert_error_text like '%query timeout%'
Once this is done then I declared two local variables which will later be used in the conditional processing:
Declare @i int Declare @max_count int set @i = 1 select @max_count = max(replication_alert_id) from #replicationalerts
Whilst the value of the variable @i is less than or equal to the value of the variable @max_count, the fetching of records continues:
while(@i<=@max_count) begin select @subscribername=subscriber from #replicationalerts where replication_alert_id = @i select @patindexname=patindex('%\%',@subscribername) select @count = @patindexname-1 select @alerterrormessage = alert_error_message from #replicationalerts where replication_alert_id = @i select @alert_message_patindexname = patindex('%.%',@alerterrormessage) select @count_alert_message = @alert_message_patindexname -1 select @length_alert_message = len(alert_error_message) from #replicationalerts where replication_alert_id = @i select @final_count = @length_alert_message-@count_alert_message select @left_alert_error_message = right(@alerterrormessage,@final_count-1) from #replicationalerts select @time = [time] from #replicationalerts where replication_alert_id = @i insert #subscriber_alert_message select 'Subscriber Server Name', ,@time,@left_alert_error_message set @i = @i+1 end
I then created a temporary table named #alert_message to hold the text of the alert message to be displayed to the users.
create table #alert_message ( row_id int, subscriber_name varchar(1000), alert_message varchar(1000), [time] datetime )
The data in the temporary table was inserted using the #subscriber_alert_message, keeping in mind that only last 30 minutes data needs to be retrieved for which we have used the datediff function as shown below:
insert #alert_message select rank() over(order by subscriber_name,alert_message), subscriber_name, alert_message, [time] from #subscriber_alert_message where datediff(mi,[time],getdate())<=30
I declared two local variables named @j and @max_count as shown below:
Declare @j int Declare @max_count int set @j =1 select @max_count = max(row_id) from #alert_message while(@j<=@max_count) begin select @mail_subject = ‘Subscriber Server Name’+space(1)+'Replication Has Failed' from #alert_message where row_id = @j select @message = alert_message from #alert_message where row_id = @j select @mail_body = ' Hi Team,<br><br> The Transactional Replication between the OLTP and Reporting Server is not happening properly.<br><br> The reason for failure is:'+convert(varchar(100),@message)+'.'+'<br><br> The time when the failure occured is:'+convert(varchar(100),@time)+'.'+'<br><br> Regards,<br><br> DBA Support Team ' exec msdb..sp_send_dbmail @profile_name = 'DB Mail' ,@recipients ='semi comma seperated email address of the recipients' ,@subject = @mail_subject ,@body = @mail_body ,@body_format = 'HTML' ,@importance = 'High' set @j = @j+1 end drop table #alert_message drop table #replicationalerts drop table #subscriber_alert_message end
This is just one such approach which I used on my production environment to alert the users in case if the transactional replication gets halted. If there are any suggestions or approaches then please let me know.
]]>