Site sponsored by: Idera The gold standard of SQL Server performance monitoring & diagnostics.
SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Quiz
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

Write for Us

Share your SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

Recover Data Using Database Snapshots
Analyze and Fix Index Fragmentation in SQL Server 2008
Powerful Geographical Visualisations made easy with SQL 2008 Spatial (Part 2) ...
Backup User Databases Using a Maintenance Plan

More     
 
Latest FAQ's

How to alter a User Defined Data Type?
How to unzip a File in SSIS?
How to view previous query plans?
ALTER TABLE SWITCH statement failed because the object '%.*ls' is not ...

More     
   
Latest Software Reviews

Spotlight on ApexSQL Doc 2008
ApexSQL Enforce
Embarcadero Change Manager
SQL Server DBA Dashboard

More     

articles >> general dba >> How to Identify and Delete Duplicate SQL ...

How to Identify and Delete Duplicate SQL Server Records

By : Randy Dyess
Jul 26, 2003

Recently, I was asked to help someone clean up their database after they had double loaded an import file. The problem they were having in identifying and deleting the duplicate information was the fact that a timestamp is applied to each row of data as it is inserted into the table. While the rest of the row of data was duplicated, the timestamp made the row unique. It was this uniqueness that caused the simple methods of determining and deleting duplicate data to fail. They needed a way to delete data from a table in which they determine the criteria of what made the data duplicate.

After helping them out with their problem, I decided to write a short article to show the simple solution I came up with to delete the duplicate data from a table, even if that data is considered unique by SQL Server. I know there are many ways to delete duplicate data, but bear with me as I explain my way. As always, if you have another way, great write it up and let us know about it. If not, look over these scripts and see if you can use them to create your own method.

Before I get into the example that actually deals with the described problem, I am going to start by showing a method to delete simple duplicate data for those who may be new to SQL Server and do not know how to clean up duplicate data from a table.

 

/**********************************************
Example of a simple duplicate data delete script.
**********************************************/

/**********************************************
Set up test environment
**********************************************/
SET NOCOUNT ON

--Create test table
IF OBJECT_ID('tDupData') IS NOT NULL
DROP TABLE tDupData
GO

CREATE TABLE tDupData
(
lngCompanyID INTEGER 
,strCompanyName VARCHAR(20)
,strAddress VARCHAR(10)
,dtmModified DATETIME
)

--Create test data
INSERT INTO tDupData VALUES (1,'CompanyOne','Address1','01/15/2003')
INSERT INTO tDupData VALUES (2,'CompanyTwo','Address2','01/15/2003')
INSERT INTO tDupData VALUES (3,'CompanyThree','Address3','01/15/2003')
INSERT INTO tDupData VALUES (2,'CompanyTwo','Address','01/16/2003') 
INSERT INTO tDupData VALUES (3,'CompanyThree','Address','01/16/2003')

-- Dup Data 
INSERT INTO tDupData VALUES (1,'CompanyOne','Address1','01/15/2003') 
GO

/**********************************************
Finish set up
**********************************************/

/**********************************************
Simple duplicate data
**********************************************/

--Create temp table to hold duplicate data
CREATE TABLE #tempduplicatedata
(
lngCompanyID INTEGER 
,strCompanyName VARCHAR(20)
,strAddress VARCHAR(10)
,dtmModified DATETIME
)

--Identify and save dup data into temp table
INSERT INTO #tempduplicatedata
SELECT * FROM tDupData
GROUP BY lngCompanyID,strCompanyName,strAddress, dtmModified
HAVING COUNT(*) > 1

--Confirm number of dup rows
SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'

--Delete dup from original table
DELETE FROM tDupData 
FROM tDupData
INNER JOIN #tempduplicatedata
ON  tDupData.lngCompanyID = #tempduplicatedata.lngCompanyID
AND tDupData.strCompanyName = #tempduplicatedata.strCompanyName
AND tDupData.strAddress = #tempduplicatedata.strAddress
AND tDupData.dtmModified = #tempduplicatedata.dtmModified

--Insert the delete data back
INSERT INTO tDupData
SELECT * FROM #tempduplicatedata

--Check for dup data.
SELECT * FROM tDupData
GROUP BY lngCompanyID,strCompanyName,strAddress,dtmModified
HAVING COUNT(*) > 1

--Check table
SELECT * FROM tDupData

--Drop temp table
DROP TABLE #tempduplicatedata

--drop test table
IF OBJECT_ID('tDupData') IS NOT NULL
DROP TABLE tDupData
GO

As you can see, it is not hard to delete data that is duplicated across all columns of a table. What is harder to do is to delete data that you consider duplicate based on your business rules while SQL Server considers it unique data. This usually happens when one or more columns contain different data, but your business rules have determined that because the main columns of the table are the same, you have duplicate data. This usually happens when you have a problem during a data load and data is loaded multiple times generating new timestamps or identity values for each row. The identify value or the data field will cause uniqueness in the data and the simple delete method will fail.


    Next Page>>    








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 1999-2008 by T10 Media. All rights reserved