Site sponsored by: Idera Try Idera’s new SQL admin toolset
SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Quiz
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Peformance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

SQL Server 2008 - Worth the Wait

SQL Server’s first significant upgrade in three years features a number of envelope-pushing enhancements and improvements. Which will have the greatest impact on SQL administration and development? More...
Latest Articles

Slowly Changing Dimensions in SQL Server 2005
Audit Data Modifications
SQL Server 2008’s Management Data Warehouse
Same Report but Different Methods in SQL Server Reporting Services ...

More     
 
Latest FAQ's

How to Integrate Performance Monitor and SQL Profiler
SSIS Lookups are Case Sensitive
Convert Number to Words in SSRS
After installing SP2 on SQL Server 2005 x64, when trying to ...

More     
   
Latest Software Reviews

SQL Server DBA Dashboard
SwisSQL DBChangeManager
SQLMesh - SQL Server Search Tool
SoftTreeTech SQL Assistant

More     

articles >> developer >> How to Delete SQL Server Database Duplicate ...

How to Delete SQL Server Database Duplicate Rows from a Table Using a Specified Column List and No Temp Tables

By : David VanDeSompele
Jul 06, 2004
Printer friendly

As much as we try and prevent it, duplicate data still finds its way into corporate data systems, but the good news is that purging duplicate rows can be a relatively simple task.

Prevention, through the use of good match code algorithms and constraints, will stop most duplicate data from entering your databases, but there are times in which duplicate data will still land in your data tables. Merging data from disparate systems is one way that duplicate data can enter a table. Inheriting a poorly designed database is another way in which you may receive duplicate data. (This author has inherited several poorly data systems.) The good news is that there is a relatively easy method of resolving the problem, but first a bit of history.

Years ago I worked for Microrim Incorporated, makers of R:BASE, a desktop relational database system. Among its many unique features, R:BASE had an extension to the SQL DELETE syntax as follows: DELETE DUPLICATES FROM tablename. This syntax would delete all duplicate rows from a table, leaving only one instance of each row. The operation was thorough, but very slow because the database engine had to compare every column value for a given row with every column value for every other row in the table. To help improve performance, an enhancement was later added to the DELETE syntax: DELETE DUPLICATES FROM tablename USING columnlist.

There are many obvious benefits gained by allowing developers to specify the list of columns to use for duplicate comparison, performance being the greatest. For example, suppose you have a table called CustomerTable and you determine that identical data in the columns CompanyName, FirstName, LastName and Address1 constitutes a duplicate.

Build a multi-column index using these four columns and then run the delete command: DELETE DUPLICATES FROM CustomerTable USING CompanyName, FirstName, LastName, Address1. When I started working with SQL Server I discovered, much to my dismay, that it had no comparable feature. We can, however, simulate the same action using some TRANSACT-SQL.

This script uses a CURSOR in conjunction with a GROUP BY colname HAVING COUNT(*) clause to build the list of duplicates. Choosing the list of columns to compare for duplicates is all that is required. We will use the same column list mentioned in the previous example: CompanyName, FirstName, LastName and Address1. In other words, if the combined value of CompanyName, FirstName, LastName and Address is identical in two or more rows, remove all but one copy of the row. I have included the COUNT function in the select list to capture the number of duplicates for a given row set. We will need this value for the DELETE clause. Here is our complete DECLARE CURSOR statement:

DECLARE DelDupe CURSOR FOR

SELECT COUNT(*) AS Amount,

vchFirstName,

vchLastName,

vchAddress1

FROM TestTable

GROUP BY vchFirstName,vchLastName,vchAddress1

HAVING COUNT(*) > 1

Now that we’ve defined the duplicate row criteria and built the data set we’ll use the ROWCOUNT function and build a dynamic SQL DELETE statement.

As a reminder, our goal here is to retain one instance of each duplicate row, removing all other occurrences. The count function was included to capture the number of times a given data set repeated. If we subtract 1 from the count and then set the ROWCOUNT to this value then we will have the number of rows to delete for a given set of duplicates. We next need to convert our ROWCOUNT value to a character data type so that we can build it into our dynamic SQL statement. Here is the code to do that:

SELECT @iCount = @iCount - 1

SELECT @chCount = CONVERT(char(3),@iCount)

Now we can build our dynamic ROWCOUNT and DELETE statement:

SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +

'DELETE TestTable ' +

' WHERE vchCompanyName = ' + CHAR(34) + @vchCompanyName + CHAR(34) +

' AND vchFirstName = ' + CHAR(34) + @vchFirstName + CHAR(34) +

' AND vchLastName = ' + CHAR (34) + @vchLastName + CHAR(34) +

' AND vchAddress1 = ' + CHAR(34) + @vchAddress1 + CHAR(34)

And finally we can execute the statement:

EXEC sp_executesql @nvchCommand

The rest of the script is standard error checking and all the necessary code for looping through the CURSOR.


    Next Page>>    








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views