Spotlight on ApexSQL Data Diff – Server-based database comparison tool

  • Introduction
  • ApexSQL Data Diff
  • Installation, System Requirements, supported SQL Server versions
  • Using ApexSQL Data Diff in Data Mode
  • The Command Line Interface
  • Summary
  • Conclusion
  • Introduction

    Synchronizing data between database in non-production environments is one of the tasks I find myself performing quite regularly for a variety of reasons. While you surely can write your own tool for that purpose based on freely available code snippets, I fail to see the point as to why I should reinvent the wheel, when there are industry-strength tools available. ApexSQL Data Diff is one of the few tools in this niche market. And a strong competitor in this market it is as we will see.

    ApexSQL Data Diff

    The remainder of this extended spotlight will cover installation of ApexSQL Data Diff and a review of its key features.

    Installation, System Requirements, supported SQL Server versions

    Software Prerequisite:

    Microsoft SQL Server versions 7.0, 2000, 2005 or 2008

    Required Components
    not included with Installation:

    Microsoft ActiveX Data Access Objects Library (ADO) (MDAC) 2.8 or greater
    Microsoft .NET Framework 2.0 SP1
    Microsoft Windows Script 5.5 or greater

    Operating System:

    Windows 2000
    Windows XP
    Windows 2003
    Windows Vista
    Windows 2008
    Compatible with 64-bit OS versions.

    CPU:

    Pentium Class Processor 600 MHz or higher.

    RAM:

    512 MB RAM (256 MB minimum).

    Hard Disk Drive Space:

    40 MB Free Disk Space (100 MB Preferred).

    As you can see are the system requirements for ApexSQL Data Diff quite moderate and almost any modern PC should be able to run the application. You can download the software as a 30-days trial version from the ApexSQL homepage. The download is just about 3.9 MB in size. For this spotlight I have used the latest publicly available version of ApexSQL Data Diff which is 2008.02. You can find this version on the product homepage. Note that the software prerequisites mentioned above are not included in the download package. If they aren’t already installed on your machine, you need to download and install them manually.

    Installation of ApexSQL Data Diff is no big deal. It is completely wizard-driven, straightforward, and takes only a few minutes, so I won’t need to get into any details here about this process. Once the installation is completed, you are ready to run and explore ApexSQL Data Diff.

    Using ApexSQL Data Diff in Data Mode

    When you start ApexSQL Data Diff for the first time, the project wizard pops up and you can create your first Data Diff project right away.

    First thing to specify in a project is which databases you want to compare. These “databases” can either be live databases or backups of a database. The rest on this page of the wizard is self-explaining for anyone using SQL Server on a regular basis. You provide the servername, select the database on that server (or select a backup file in case you want to analyze a backup). This is basically everything you need to provide to Data Diff in order to run a project.

    It is worth noting that you can decide whether to include “special” data types such as uniqueidentifier, Blobs (text, image, varchar(MAX), and varbinary(MAX)) in a comparison or not.

    One of the most interesting features when using ApexSQL Data Diff in Data Mode is its ability to map tables and columns within these tables. To explain what “mapping” in ApexSQL Data Diff terms means, let’s consider the following scenario:

    Typically you compare almost identical databases. Meaning you have a development database that is a more or less exact copy of the production database. You then make your changes to the development database and deploy them after a testing phase to the production database. In this case ApexSQL Data blahDiff can automatically map tables since it assumes tables of the same name to have the same meaning. That is, a table “Orders” is very likely to have the same meaning in both databases. This automatic mapping is done by default.

    Now, when table names in Source and Destination do not match, that can, amongst other reasons, either mean that the tables do not exist in one of these databases or that tables with the same meaning do exist in both database, but they differ in their names.

    Look at the screenshot above. The upper – empty – grid displays mapped table objects. The lower grid displays unmapped objects. Let’s have a closer look at this lower grid:

    As you can see there are a few table objects in the Source schema and 1 in the Destination schema. Usually ApexSQL Data Diff would pickup these tables because they match in their name, but I’ve cheated a bit here, because I have created the tempdb.dbo.nums2 table after the initial comparison to demonstrate the process of mapping an object.

    To map both tables, just select its counterpart in either the Source or Destination schema name column.

    Once this is done, just click on the Map button to establish the connection.

    ApexSQL Data Diff instantly moves the mapped table pair to the mapped objects grid.

    In an ideal world we could now leave things as they are as we assume the table name typo to be the only difference. Given this holds true, ApexSQL Data Diff would do the rest for us and map the columns in Source and Destination based on their names. Unfortunately it isn’t an ideal world in which we can rely on the column names to be identical in both tables. Before getting into any trouble, we can take action here, and instruct ApexSQL Data Diff which columns to be matched with each other.

    When you scroll the horizontal scrollbar a bit to the right, you will find the “Columns” column. From the dropdown box you can open the dialog in which you can map columns:

    Just select the columns to be mapped from the dropdown box and you’re done.

    Now, the rationale behind the whole mapping thing is to tell ApexSQL Data Diff what data in which columns of which tables to compare during the comparison process. As I have demonstrated you can either accept the defaults in which tables and columns within those tables are mapped based on their names, or you can take over full control and map tables and the columns within those tables exactly according to your needs.

    Next you can choose what data in what object types is to be compared. You can select base tables and indexed views here.

    Once you have completed the wizard, you can instruct ApexSQL Data Diff to compare source and destination of your project and display its findings in the main grid of the application.

    As you can see from the above screenshot, there are quite a few rows missing in the destination. To start the synchronization, just click on the “Synchronize” icon in the Outlook bar or choose synchronize from the context menu of the main grid.

    This invokes the Synchronization Wizard that guides you through the rest of the process.

    First step in this wizard is to decide upon the direction of the synchronization. By default this is taken from the overall project setting, but as you can see you can reverse this direction by clicking on the “Reverse” button.

    When you opt to display the “Advanced options” the next page you see lets you decide on whether to include SQL Server transaction handling in your synchronization script or not. ApexSQL Data Diff also lets you specify a maximum number of rows per batch in case you’re going to perform larger synchronizations with many rows and you want to give the transaction log some room to breathe.

    As the next step you can specify if there are any special script that should be executed before and/or after the synchronization or not. I’m tempted to say that in most cases you don’t need this option, but it’s good to know that it’s there.

    Another difference when running ApexSQL Data Diff in Data Mode is shown in the screenshot above. You can choose how certain constraints should be handled by the application. Interesting and useful is also the option to apply different date formats when you need to work with data stored in (small)datetime columns in SQL Server. To avoid any issues when synchronizing data between databases with different settings you should always use an unambiguous date format, such as the one shown in the screenshot above.

    You continue with some optional script elements, such as header or comments. Here I just accept the default settings.

    Probably the most important decision you need to make is to decide upon which rows should be synchronized at all.

    Here’s a brief explanation of each possible type:

    • Different Records: These are rows in table pairs that are different in at least one of their column pair values.
    • Missing Records: These are rows in table pairs that only exist in the Source part of the comparison, but not in the Destination.
    • Additional Records: These are rows in table pairs that only exist in the Destination part of the comparison, but not in the Source.

    Each one is individually selectable depending on your needs.

    Last step of the wizard is to specify how to process the output. You can choose to open the script in an editor (either the built-in one or any other of your choice), to write the output to a file, or to execute it directly against the database. In my case I decided to open the script in the built-in editor.

    From this editor you can inspect the output of the script and execute it against the destination database of your project once you’re fine with this output.

    Coming back to my running example, I have decided to execute the script against the Destination database. In my case there are some 1000 rows to synchronize, so the whole process doesn’t take too long. Upon completion, ApexSQL Data Diff informs you about the successful execution:

    Reporting is also part of ApexSQL Data Diff. The available options and setting can be seen in the above screenshot. The output is a HTML file that be viewed with a web browser.

    Using the options of ApexSQL Data Diff you can customize many aspectes of the application to suit your needs, such as what to display as author in the script header, or if you want to display some “legal” information as part of the header.

    Some lines above I mentioned that you can use any editor of your choice to view a script generated by ApexSQL Data Diff. Here on this options tab you can actually specify which editor to use. If you’re using another editor not mentioned in the options, just specify the path to that executable, pass along the necessary arguments and this editor will then be opened.

    On the last options tab you can tweak the user interface if you’re not happy with the default settings. To me this is more of a nice-to-have, but still, if you want, you can.

    The Command Line Interface

    ApexSQL Data Diff hosts, just like any other of their tools, a feature-rich and powerful Command Line Interface (CLI). This CLI can be used to automate any task by creating a batch file that can be executed via the Windows Scheduler. The CLI is certainly a pretty complex topic in and of itself and takes some time to get acquitant with. However, in its most simple, yet probably also most versatile form, you can create your project with the ApexSQL Data Diff GUI with all its options and settings, save the project and just pass the whole project file to the CLI via the /argfile switch.

    Here are some examples on how to use the CLI (taken from the ApexSQL Data Diff Online Help):

    • Compare selected objects of two databases in data mode and create SQL synchronization script:
      ApexSQLDiff /m:data /s1:CS /d1:pubs /s2:CI /d2:pubs2 /of:sync.sql /i:profile_data.xml
    • Compare selected objects of two databases in data mode and create HTML report:
      ApexSQLDiff /m:data /s1:CS /d1:pubs /s2:CI /d2:pubs2 /ot:html /of:comp_report.html /i:profile_data.xml
    • Compare selected objects of two databases in data mode and create VB synchronization script:
      ApexSQLDiff /m:data /s1:CS /d1:pubs /s2:CI /d2:pubs2 /ot:vbs /of:sync.vbs /i:profile_data.xml

    Summary

    ApexSQL Data Diff provides the following key features:

    1. Compare and synchronize data differences between SQL Server data sources (Live or backup).
    2. Automated execution of comparison processes via a powerful Command Line Interface.

    Advantages

    1. A very rich set of options and setting from which you can choose.
    2. Economically priced.
    3. Moderate system requirements.
    4. Overall easy to use.

    Disadvantages

    1. The reporting features are basic. Though you can export to XML, I would like to see some more options like Exporting to PDF or XLS.

    Conclusion

    ApexSQL Data Diff is easy to use, fast, and reliable. It helps you designing a smooth transition of data from one source to another destination. If you have a need for such a tool (and, as a matter of course, you do have such a need, if you are just half-way professionally managing your SQL Server environment), make sure to stop on the products’ homepage and give the product a try.

    ]]>

    Leave a comment

    Your email address will not be published.