SQL Server Performance

Clean up SQL backups - with dependencies

Discussion in 'Contribute Your SQL Server Scripts' started by merrillaldrich, Dec 11, 2006.

  1. merrillaldrich New Member

    I recently had a need to create a utility for managing backup files, and thought I would post it both to share and to see if anyone had suggestions for it.

    Problem: the conventional method of managing backups uses a "sliding window" to delete backup files when they reach a certain age. There are two issues with that, especially for a larger database, because it does not take into account the dependencies between the backup files (i.e. transaction logs do no good if you have no full backup; differentials, likewise). First, if a backup fails and the failure is not immediately addressed, the "sliding window" that goes along removing files can create hole in your backup sequence, by removing a file on which others depend. Second, it can take unnecessary drive space to store the backups, because the system will not differentiate a file that has been superseded by other backups from one that is needed for the current recovery sequence.

    For our organization, I developed a small .NET utility that is capable of "cleaning out" a folder of old SQL backups while taking into account their dependencies, leaving only the files that are actually needed to recover to the state indicated in the last file present.

    Some preparation is required to do this, mainly in regard to how the backups are named. We settled on a simple naming convention that would make it easy to identify each type of file and when it was created:

    databasename_fullBackup_yyyymmddhhmm.bak
    databasename_diffBackup_yyyymmddhhmm.bak
    databasename_tlogBackup_yyyymmddhhmm.trn

    Our backup scripts were then set up this way:

    Full backup:


    declare @dbase varchar(2000);
    declare @folder varchar(4000);
    declare @backupdate char(12);
    declare @file varchar(4000);
    declare @name varchar(2000);

    set @dbase = 'mydatabase';
    set @folder = 'E:SQL_BAKmydatabase';

    set @backupdate =
    left(
    replace(
    replace(
    replace(
    convert( varchar(50), getdate(), 126 ), 'T', '')
    , ':', '')
    , '-', '')
    , 12 );

    set @file = @folder + '' + @dbase + '_fullBackup_' + @backupdate + '.bak';
    set @name = @dbase + '-Full Database Backup';

    print getdate();
    print 'Backing up ' + @dbase + ' to ' + @file;
    print '';

    BACKUP DATABASE @dbase
    TO DISK = @file
    WITH NAME = @name;
    GO

    Differential:


    declare @dbase varchar(2000);
    declare @folder varchar(4000);
    declare @backupdate char(12);
    declare @file varchar(4000);
    declare @name varchar(2000);

    set @dbase = 'mydatabase';
    set @folder = 'E:SQL_BAKmydatabase';

    set @backupdate =
    left(
    replace(
    replace(
    replace(
    convert( varchar(50), getdate(), 126 ), 'T', '')
    , ':', '')
    , '-', '')
    , 12 );

    set @file = @folder + '' + @dbase + '_diffBackup_' + @backupdate + '.bak';
    set @name = @dbase + '-Differential Database Backup';

    print getdate();
    print 'Backing up ' + @dbase + ' to ' + @file;
    print '';

    BACKUP DATABASE @dbase
    TO DISK = @file
    WITH DIFFERENTIAL, NAME = @name;
    GO

    Transaction log:


    declare @dbase varchar(2000);
    declare @folder varchar(4000);
    declare @backupdate char(12);
    declare @file varchar(4000);
    declare @name varchar(2000);

    set @dbase = 'mydatabase';
    set @folder = 'E:SQL_BAKmydatabase';

    set @backupdate =
    left(
    replace(
    replace(
    replace(
    convert( varchar(50), getdate(), 126 ), 'T', '')
    , ':', '')
    , '-', '')
    , 12 );

    set @file = @folder + '' + @dbase + '_tlogBackup_' + @backupdate + '.trn';
    set @name = @dbase + '-Trans Log Backup';

    print getdate();
    print 'Backing up log for ' + @dbase + ' to ' + @file;
    print '';

    BACKUP LOG @dbase
    TO DISK = @file
    WITH NAME = @name;
    GO

    With that naming convention in place, its pretty simple to create a program that can remove the files that are redundant from the backup sequence.

    There is one additional feature we've implemented, which you may not need should you try this method: we have a separate routine that creates compressed copies of each backup file and ships them off to another server; the code sample below includes a check against the set of compressed backup files as an additional condition before deleting them. You could remove that from the process if you elect to try this.

    In summary, the code below does this:

    1.Make a custom log called "RemoveStaleSQLBackups" if it does not exist (that log, afterward, records the actions of the utility and any errors, on the assumption this will run as a Windows Scheduled task or regularly via SQL Server Agent)

    2.Remove any transaction log backups from a specified folder, where there is a later full or differential backup for the database in the folder (and the file is present in our compressed backups folder)

    3.Remove any differential backups where there is a later full backup or later differential backup (and the file is present in our compressed backups folder)

    4.Remove any full backups where there is a later full backup (and the file is present in our compressed backups folder)

    As the backup process proceeds over days or weeks, this has the effect of automatically removing the old backups, not based on date, but instead based on the fact that a new backup file is present making the old one(s) unnecessary.

    This is .NET Framework 1.1 code which we run on Windows Server 2003. It's a console application; in our setup its called "RemoveStaleSQLBackups," invoked as:



    RemoveStaleSQLBackups dbname backupFolder zippedBackupFolder



    Imports System.IO, System.collections

    Module Module1

    ' This utility is free; you may use, modify, or redistribute it as you like. However,
    ' it is provided AS IS with no warranty whatsoever. Please review the code
    ' carefully to be sure it meets your needs, and test it.
    ' 11 Dec 2006 Merrill Aldrich

    ' 15 Dec 2006 Modified to keep only the last DIFF backup, rather than a sequence of DIFFs

    Sub Main(ByVal cmdArgs() As String)

    If cmdArgs.Length = 3 Then
    Try
    Dim sbm As New SQLBackupManager(cmdArgs(0), cmdArgs(1), cmdArgs(2))
    Console.WriteLine("Removing stale SQL Backups for " & cmdArgs(0) & " from " & cmdArgs(1))
    sbm.clean()
    Console.WriteLine("Cleanup finished.")
    Catch ex As Exception
    Console.WriteLine(ex.ToString)
    End Try
    Else
    Console.WriteLine("Usage: RemoveStaleSQLBackups dbname backupFolder zippedBackupFolder")
    End If

    End Sub

    Private Class SQLBackupManager
    ' Class to manage removal of redundant SQL Server backup files
    ' If any file (log, differential or full backup) is made redundant
    ' by a later file in the backup sequence, then remove the redundant file.
    ' e.g. Diff Backups are superceded by Full Backups or later differentials;
    ' Trans Log Backups are superceded by both.

    Private db, backupf, compf As String
    Private log As EventLog

    Public Sub New(ByVal dbname As String, ByVal backupfolder As String, ByVal compfolder As String)

    ' Attempt to construct, if passed directories are accessible

    If Not Directory.Exists(backupfolder) Or Not Directory.Exists(compfolder) Then
    Throw New Exception("The specified backup folder or compressed backup folder was not found.")
    End If

    db = dbname
    backupf = backupfolder
    compf = compfolder

    ' Configure logging
    If Not EventLog.SourceExists("RemoveStaleSQLBackupsSource") Then
    EventLog.CreateEventSource("RemoveStaleSQLBackupsSource", "RemoveStaleSQLBackups Log")
    End If
    log = New EventLog
    log.Source = "RemoveStaleSQLBackupsSource"

    End Sub

    Public Sub clean()

    Debug.WriteLine("Beginning clean up of SQL Backups for " & db)
    log.WriteEntry("Beginning clean up of SQL Backups for " & db)

    ' Get lists of files according to type
    Dim di As DirectoryInfo = New DirectoryInfo(backupf)
    Dim tlogBackups As FileInfo() = di.GetFiles(db & "_tlogbackup_????????????.trn")
    Dim diffBackups As FileInfo() = di.GetFiles(db & "_diffbackup_????????????.bak")
    Dim fullbackups As FileInfo() = di.GetFiles(db & "_fullbackup_????????????.bak")

    ' Sort the lists of files (name format makes them sort chronologically)
    Dim fc As fileInfoComparer = New fileInfoComparer

    Array.Sort(tlogBackups, fc)
    Array.Sort(diffBackups, fc)
    Array.Sort(fullbackups, fc)

    ' Examine each file; if the file is made redundant by a later backup set member, remove it:
    Dim redundant As Boolean = False

    For Each f As FileInfo In tlogBackups
    Debug.WriteLine(f.Name)
    'If the last full or differential is after this tlog,
    'and we have a zipped version, this tlog is safe to delete

    redundant = False

    If fullbackups.GetLength(0) > 0 Then
    If compareFileDates(f, fullbackups(fullbackups.GetUpperBound(0))) = -1 Then
    redundant = True
    End If
    End If

    If diffBackups.GetLength(0) > 0 Then
    If compareFileDates(f, diffBackups(diffBackups.GetUpperBound(0))) = -1 Then
    redundant = True
    End If
    End If

    If redundant Then
    deleteIfZipped(f)
    Else
    Debug.WriteLine(" KEEP")
    End If
    Next

    For Each f As FileInfo In diffBackups
    Debug.WriteLine(f.Name)
    'If there exists either a full or differential backup after this one, and
    'we have a zipped version of this f this one

    redundant = False

    If fullbackups.GetLength(0) > 0 Then
    If compareFileDates(f, fullbackups(fullbackups.GetUpperBound(0))) = -1 Then
    redundant = True
    End If
    End If

    If diffBackups.GetLength(0) > 1 Then
    If compareFileDates(f, diffBackups(diffBackups.GetUpperBound(0))) = -1 Then
    redundant = True
    End If
    End If

    If redundant Then
    deleteIfZipped(f)
    Else
    Debug.WriteLine(" KEEP")
    End If
    Next

    For Each f As FileInfo In fullbackups
    Debug.WriteLine(f.Name)
    'If the last full backup is after this one (and there are multiple full backups)
    'then this one is safe to delete
    If compareFileDates(f, fullbackups(fullbackups.GetUpperBound(0))) = -1 Then
    deleteIfZipped(f)
    Else
    Debug.WriteLine(" KEEP")
    End If
    Next

    Debug.WriteLine("End clean up of SQL Backups for " & db)
    log.WriteEntry("End clean up of SQL Backups for " & db)

    End Sub

    Private Function compareFileDates(ByRef x As FileInfo, ByRef y As FileInfo)

    ' Get the last 12 characters of the file names (the dates) and compare them
    Dim xname As String = Path.GetFileNameWithoutExtension(x.Name)
    Dim yname As String = Path.GetFileNameWithoutExtension(y.Name)

    Dim xdate As String = xname.Substring(xname.Length - 12, 12)
    Dim ydate As String = yname.Substring(yname.Length - 12, 12)

    Return String.Compare(xdate, ydate)

    End Function

    Private Sub deleteIfZipped(ByVal f As FileInfo)
    ' Check to see that there's a compressed version of file f; if so, delete f
    If File.Exists(Path.Combine(compf, f.Name) & ".gz") Then
    Debug.WriteLine(" DELETE")
    Try
    f.Delete()
    log.WriteEntry("Deleted file " & f.FullName)
    Catch ex As Exception
    ' If delete failed, print the error, but continue with next task
    Debug.WriteLine("Error deleting " & f.Name & vbNewLine & ex.ToString)
    log.WriteEntry("Error deleting " & f.FullName & vbNewLine & _
    ex.ToString, EventLogEntryType.Error)
    End Try
    Else
    Debug.WriteLine(" Could DELETE, but there's no GZIP version. KEEP")
    log.WriteEntry(f.FullName & " was eligible to be deleted, but there was no zipped file " & _
    Path.Combine(compf, f.Name) & ".gz", EventLogEntryType.Warning)
    End If

    End Sub

    Private Class fileInfoComparer
    Implements IComparer

    ' Class required to sort arrays of fileinfo objects by file name

    Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements IComparer.Compare

    ' Compare the file names of two files, case insensitively

    Return String.Compare(CType(x, FileInfo).Name, CType(y, FileInfo).Name, True)

    End Function

    End Class

    End Class

    End Module


Share This Page