SQL Server Performance

Log Shipping fails some times

Discussion in 'General DBA Questions' started by 909, Mar 13, 2007.

  1. 909 New Member

    Hopefully someone can help out here. I am by no means a DBA but have to figure out this issue. We have a 2K cluster doing log shipping. The logs are shipped every minute because of the critical nature of the data (financial) so we don't want to lose anything in the process. What happens every now and again, is we get a failure, then a subsequent alert. Below is the event info.<br /><br />Event: 17055<br />Category (2)<br /><br />18204 :<br />BackupDiskFile:<img src='/community/emoticons/emotion-3.gif' alt=':O' />penMedia: Backup device 'E:program FilesMicrosoft SQL ServerMSSQLBACKUPXXXXX.TRN' failed to open. Operating system error = 32(The process cannot access the file because it is being used by another process.).<br /><br /><br />For more information, see Help and Support Center at<a target="_blank" href=http://go.microsoft.com/fwlink/events.asp.>http://go.microsoft.com/fwlink/events.asp.</a><br /><br />I have been unsuccessful in finding the locking process because after some time, the shipping will continue and we have thus far ruled out AV. We have also ruled out backup software because there seems to be no specific time frame as to when these occur. They usually occur about 2 days apart or sometimes up to 1 week. <br /><br />Hopefully someone here can shed some light on this because i'm going crazy trying to figure it out.<br /><br />TIA
  2. MohammedU New Member

    How often you run the log backup? Why you want to ship the logs every minute? I don't think any one runs the log backup every minute?

    Increase the log shipping interval and see what happens...

    MohammedU.
    Moderator
    SQL-Server-Performance.com
  3. 909 New Member

    I have a little more information as I dig deeper into this which may or may not help. I'm not the DBA here, but the engineer. It has become a political issue of me having to prove that this is not a MS Win issue, and the only way to do this seems to be proving it's a SQL issue with the scheduling or execution. So on with what I found...<br /><br />There are 3 DB's that ship logs. It looks like all 3 are executed from the same maintenance schedule. Event viewer shows db #1 restoring 1 second after each minute, db #2 restoring around 4 seconds after the minute, then db #3 restoring around 6-8 seconds after the minute mark. An interesting note is that I have been unable to locate a failure of a restore on db #1. The file locks are all on db #2 and db #3 exclusively. Again, i'm no DBA so i'm looking at this from an engineer point of view, so my conclusions may be innacurate but I have little/no input from the DBA here so i'm looking to you guys for insight.<br /><br />My belief at this point is that the restore time is not sufficient between db #1 and the following restore jobs to allow a complete backup and transfer. I believe the file lock noted in the events comes from the backup or transfer process still working on the shipped log. I could be way off base here so please educate me on how the process works.<br /><br />If the above is a correct assumption, I don't believe that the shipping jobs can be scheduled in seconds, only minutes. If it is possible to say job #1 kicks off on the minute, job #2 kicks off 20 seconds later, then job #3 runs at the 40 second mark. All 3 of these backup/restore processes is all happening within the first 10 seconds of every minute, then nothing for 50 seconds. Is there a way to do this? I know in the olden days of batch jobs, the sleep command could be added between jobs to allow for a completion window and I don't believe this is the case in the SQL scheduler. Please educate me....<br /><br />Also, the 1 minute schedule was brought about by someone (not mentioning names here) telling the application owner that we can replicate every minute and in the event of a complete failure, we would lose 1 minute of data at the maximum, so they went with that. Believe that if this would log ship every 10ms, they would have done it. <br /><br />With that being said and the politics being accounted for, if I were to increase the log shipping interval, and there were to be a failure of any kind, it would be my a$$ to prove that it didn't have anything to do with the failure. Getting everyone to agree to an increase would be like signing my own pink slip for sure. I believe that if this can be broken up to run over the whole minute, instead of the first 10 seconds, it would relieve the issue. Another log that gets added to the fire is that our monitoring utiility also reports that the DB is out of sync for 1500 seconds due to a restore failure. I cannot see any such issue in EV, but if it's in print (email) and people read it, it must be true. It would be much better to resolve the issue completely, and when all else fails, then it's time to disable the failure logging [<img src='/community/emoticons/emotion-2.gif' alt=':D' />]
  4. MohammedU New Member

    Sorry! to hear all this but we recomend to resolve the issues based on the information provided....
    For this type of issue you guys need to figure it out what is causing the problem irrespective of your area of interest...

    It may be configuration error or it could be SQL/OS bug...

    Good luck...


    MohammedU.
    Moderator
    SQL-Server-Performance.com

Share This Page