SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Training
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
SQL Azure
Developer
General DBA
ASP.NET / ADO.NET
SQL Azure

USEFUL SITES :

ASP.NET Tutorials
Windows and SQL Azure Tutorials
Cloud Hosting Magazine
SharePoint Tutorials
Windows Server Help

Write for Us

Share your SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server

More     
 
Latest FAQ's

Add Node to A SQL Server failover Cluster failed with invalid ...
SQL Server Destination remote server error
Setting Up Data And Log Files For SQL Server
Will Check Constraints Improve Database Performance?

More     
   
Latest Software Reviews

dbForge Review
Spotlight on ApexSQL Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Data Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Doc 2008

More     

articles >> general dba >> How to Identify and Delete Duplicate SQL ...

How to Identify and Delete Duplicate SQL Server Records

By : Randy Dyess
Jul 26, 2003

Page 2 / 2

/**********************************************
Example of a complex duplicate data delete script.
**********************************************/

/**********************************************
Set up test environment
**********************************************/
SET NOCOUNT ON

--Create test table
IF OBJECT_ID('tDupData') IS NOT NULL
DROP TABLE tDupData
GO

CREATE TABLE tDupData
(
lngCompanyID INTEGER 
,strCompanyName VARCHAR(20)
,strAddress VARCHAR(10)
,dtmModified DATETIME
)

--Create test data
INSERT INTO tDupData VALUES (1,'CompanyOne','Address1','01/15/2003')
INSERT INTO tDupData VALUES (2,'CompanyTwo','Address2','01/15/2003')
INSERT INTO tDupData VALUES (3,'CompanyThree','Address3','01/15/2003')
INSERT INTO tDupData VALUES (1,'CompanyOne','Address1','01/15/2003')
-- Simple Dup Data and complex dup data
INSERT INTO tDupData VALUES (2,'CompanyTwo','Address','01/16/2003')
-- complex dup data
INSERT INTO tDupData VALUES (3,'CompanyThree','Address','01/16/2003')
-- complex dup data
GO

/**********************************************
Finish set up
**********************************************/

/**********************************************
Complex duplicate data
**********************************************/

--Clean table out to include only one row per company
--Create temp table to hold duplicate data
CREATE TABLE #tempduplicatedata
(
lngCompanyID INTEGER 
,strCompanyName VARCHAR(20)
,strAddress VARCHAR(10)
,dtmModified DATETIME
)

--Clean out simple duplicate data first
--Identify and save dup data into temp table
INSERT INTO #tempduplicatedata
SELECT * FROM tDupData
GROUP BY lngCompanyID,strCompanyName,strAddress, dtmModified
HAVING COUNT(*) > 1

--Confirm number of dup rows
SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'

--Delete dup from original table
DELETE FROM tDupData 
FROM tDupData
INNER JOIN #tempduplicatedata
ON  tDupData.lngCompanyID = #tempduplicatedata.lngCompanyID
AND tDupData.strCompanyName = #tempduplicatedata.strCompanyName
AND tDupData.strAddress = #tempduplicatedata.strAddress
AND tDupData.dtmModified = #tempduplicatedata.dtmModified

--Insert the delete data back
INSERT INTO tDupData
SELECT * FROM #tempduplicatedata

--Check for dup data.
SELECT * FROM tDupData
GROUP BY lngCompanyID,strCompanyName,strAddress,dtmModified
HAVING COUNT(*) > 1

--Clean out temp table
TRUNCATE TABLE #tempduplicatedata

--Identify and save dup data into temp table
INSERT INTO #tempduplicatedata (lngCompanyID,strCompanyName)
SELECT lngCompanyID,strCompanyName FROM tDupData
GROUP BY lngCompanyID,strCompanyName
HAVING COUNT(*) > 1

--Confirm number of dup rows
SELECT @@ROWCOUNT AS 'Number of Duplicate Rows'

--Update temp table to add strAddress and dtmModified
UPDATE #tempduplicatedata
SET strAddress = tDupData.strAddress
,dtmModified = tDupData.dtmModified
FROM #tempduplicatedata
INNER JOIN tDupData
ON #tempduplicatedata.lngCompanyID = tDupData.lngCompanyID
AND #tempduplicatedata.strCompanyName = tDupData.strCompanyName


--Delete dup from original table
DELETE FROM tDupData 
FROM tDupData
INNER JOIN #tempduplicatedata
ON  tDupData.lngCompanyID = #tempduplicatedata.lngCompanyID
AND tDupData.strCompanyName = #tempduplicatedata.strCompanyName
AND tDupData.strAddress = #tempduplicatedata.strAddress
AND tDupData.dtmModified = #tempduplicatedata.dtmModified

--Verify original table only has three rows of data
SELECT * FROM tDupData

--Drop temp table
DROP TABLE #tempduplicatedata

--drop test table
IF OBJECT_ID('tDupData') IS NOT NULL
DROP TABLE tDupData
GO

This is a little more complicated than the simple duplicate data delete script, but easy to figure out once you see it. A word of caution here, you should investigate any child tables before you delete data from a table in order to prevent creating orphan rows. You can ether delete the data from the child tables first or update them to reflect the identity key value of the data row in the main table you are going to keep. The choice will be determined by your situation and any operating standards you may have.

While having to clean up duplicate data is not something you should have to do every day, the processes you learn from playing with these two scripts should give you a starting point the next time you find duplicate information in your database.

 

Copyright 2003 by Randy Dyess, All Rights Reserved

Make sure you purchase your copy of Transact-SQL Language Reference Guide from my website www.TransactSQL.Comtoday to learn more about Transact-SQL by reviewing the more than 1200 examples contained within the book


<< Prev Page         








C# Help and Tutorials | PHP MySQL Tutorial | Sharepoint Tutorial | Azure Tutorial | Cloud Hosting Magazine | ASP.NET Tutorials | Windows Server Help | Windows Phone Pro | Silverlight Ace | Visual Studio Tutorials | Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | SQL Server Training Videos | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 2010 Jude O'Kelly. All rights reserved