Compare SQL Server Databases with sp_CompareDB
Editor’s Note: Not only has this author produced a very useful stored procedure to compare SQL Server database structure and data, his code provides valuable insights in to how to write advanced Transact-SQL code. This article include much of the thought process used when the author created the sp_CompareDB stored procedure. Even if you don’t understand all of the coding, you can still use this powerful stored procedure in your day-to-day work. Instructions for creating and running the stored procedure are at the bottom of this article.
I bet almost every DBA has at times desperately wanted to be able to compare data and structure between database tables, or even whole databases for that matter. Wouldn’t it be nice to have a stored procedure that uses only two parameters — the name of the first database and the name of the second database — that compares structures of the databases and the data in them, and tells you the differences between them?
While it is relatively simple to compare database or table structures, either programmatically or manually, comparing data is not so easy. Unlike comparing structures, comparing data between tables or databases is one that you can’t easily do manually. I’m not saying that nobody likes this sort of intellectual work, comparing table-by-table, column-by-column in bulky queries using JOINs and NOT IN. I’m saying it’s usually physically impossible to achieve if the databases are not diminutive, and mistakes are unavoidable.
What Does It Take to Write Such a Stored Procedure?
Well, let’s compare data structures and data between two databases programmatically. If we let ourselves think a bit about the implementation, we will come to a quick solution: create a cycle through all tables and compose SELECTs to obtain the differences. But when we start to put this primitive into practice, we run into a long series of “Oops!” Let me list just some of them I came upon while creating my SP to compare database structures and data.
- Tables might or might not contain a unique key. It’s much easier to recognize differences in data and to expose them if there is a unique key for a table. First, I thought that since an absence of a unique key is a violation of 1NF, I could just skip such tables in the SP I was writing, and these tables could just be compared manually. However, after using the SP to compare various databases, I came to the conclusion that in the real world many databases do have tables without unique keys, so, it is better not to ignore them.
For tables that have a unique key, my SP takes a look first for a primary key to compare and show differences. If there is no a primary key, then the first of unique constraint or unique index is used.
- Before data can be compared, comparing the structures is a first step. Otherwise, you will compare apples and oranges. If you have this first step, then it is better to make it valuable and produce a clear report about the structural differences. So my SP does show structural differences like:
Column phone: in db1 – char(12), in db2 – char(14)
Column Location not in db2
Tables with different structures are not used for comparing data. However, if the tables contain columns of the same type and different length (like “phone” in the example above), or if the tables have compatible data types (have the same type in syscolumns – char and nchar, varchar and nvarchar, etc.) they can have their data compared.
- Datatypes text, ntext and image cannot be used in a predicate. In fact, it’s easily avoidable. You can use:
in predicates for text and ntext data and just:
for image data, supposing that if an image value was modified then its length was changed for sure.
- Some columns in some tables store nulls. Even if you set ANSI_NULLS OFF, you cannot get TRUE when comparing two nulls. It’s possible if you compare two local variables or a database column value and a local variable, but comparing two columns with nulls will always return UNKNOWN. Anyway, you cannot use this SQL Server extension to the ANSI standard (SET ANSI_NULLS OFF) if you assume the comparison of databases when at least one of them resides on a linked server. In this case, SET ANSI_NULLS, as well as SET ANSI_WARNINGS, must be set to ON.
- It’s apparently the case that comparing SELECTs must be executed dynamically, because they are generated on the fly. The problem is that some tables contain too many columns and those columns sometimes have very long names. I prefer self-explanatory names, but then I ran into an error message “Incorrect syntax near …” because the SELECT could not fit into the maximum of 8000 bytes, I was frustrated. There was no way to go ahead, but to limit the number of columns checked. The solution here involves a trick. I saved the strings containing predicates in a temporary table and then in a dynamic script I declared as many local variables with type varchar(8000) as necessary to cover all strings in the temporary table. Then I executed them as:
Thus, there is no restriction on the number of such strings and, therefore, there are no restrictions (except that SQL Server establishes) on the number of columns or name length in the tables of compared databases.
- To compare two databases, at least one of them, by definition, must have 3-storey-qualified tables, because you cannot have two current databases. Therefore, the database name must be hard-coded in the script. To avoid this and make the script easy-to-run, we might use a so-called macro-substitution and make the script dynamic. (i.e., the SP is dynamic T-SQL code containing dynamic T-SQL code).