avoiding duplicates | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

avoiding duplicates

Dear sir/madam, I have got a table which contains about 30,000 rows including duplicate rows. Now, I’ve been asked to remove the duplicate records from this table and I should not allow any duplicates in the future. How to get around this problem?? Please mail me the solution to: [email protected] Regards,
Rajendar ok
You can remove the duplicate records from your table by something like the following: BEGIN TRAN
SELECT DISTINCT * FROM myTable INTO #temptable
DELETE FROM myTable
INSERT INTO myTable SELECT * FROM #temptable
DROP #temptable
COMMIT TRAN Next you have to define a Primary Key on the table, this will prevent duplicates from being inserted.
If you have an identity field or some other unique value in table you can do this: DELETE
FROM table_name
WHERE EXISTS
(
SELECT NULL
FROM table_name a WITH (NOLOCK)
WHERE table_name.unique_column = a.unique_column
HAVING table_name.id_column + < MAX(a.id_column)
) unique_column – is the column you need to be DISTINCT.
id_column – is an identity column.
You can change the MAX() function to your needs. Bambola.
I just had the same issue and used the info below which is from an article at
http://www.devguru.com/features/tutorials/AdvancedSQL/advancedSQL.html it worked great. I set it up as a stored procedure which uses a view to return the duplicates Finding Duplicates – I modified these to my data and set it up as a view
If we group records together by certain identifying fields we can then use a Count function to extract those that are duplicated. This query utilizes a sub-query and, based on their first and last names, returns a list of DelegateIDs for those delegates that have duplicate records. SELECT Delegate.DelegateID, Delegate.FirstName, Delegate.LastName
FROM Delegate INNER JOIN
(SELECT FirstName, LastName
FROM Delegate
GROUP BY FirstName, LastName
HAVING Count(DelegateID) > 1
) AS Duplicates
ON Delegate.FirstName = Duplicates.FirstName
AND Delegate.LastName = Duplicates.LastName Removing Duplicates
The following query removes all duplicates from the delegate table, leaving only the originals of the duplicated record. Here, we define a record as having a duplicate if there is another record with matching FirstName and LastName fields. This query relies on a view being set up that returns a list of duplicate delegates. The view is called DupDels. DELETE FROM Delegate
WHERE DelegateID =
ANY ( SELECT DupDels.DelegateID
FROM DupDels LEFT JOIN
(SELECT Min(DelegateID) AS DelegateID, FirstName, LastName
FROM DupDels GROUP BY FirstName, LastName) AS FirstDup
ON DupDels.DelegateID = FirstDup.DelegateID
WHERE FirstDup.DelegateID IS NULL
) This worked out great for me and I hope it helps. MacDaddio

I just had the same issue and used the info below which is from an article at
http://www.devguru.com/features/tutorials/AdvancedSQL/advancedSQL.html it worked great. I set it up as a stored procedure which uses a view to return the duplicates Finding Duplicates – I modified these to my data and set it up as a view
If we group records together by certain identifying fields we can then use a Count function to extract those that are duplicated. This query utilizes a sub-query and, based on their first and last names, returns a list of DelegateIDs for those delegates that have duplicate records. SELECT Delegate.DelegateID, Delegate.FirstName, Delegate.LastName
FROM Delegate INNER JOIN
(SELECT FirstName, LastName
FROM Delegate
GROUP BY FirstName, LastName
HAVING Count(DelegateID) > 1
) AS Duplicates
ON Delegate.FirstName = Duplicates.FirstName
AND Delegate.LastName = Duplicates.LastName Removing Duplicates
The following query removes all duplicates from the delegate table, leaving only the originals of the duplicated record. Here, we define a record as having a duplicate if there is another record with matching FirstName and LastName fields. This query relies on a view being set up that returns a list of duplicate delegates. The view is called DupDels. DELETE FROM Delegate
WHERE DelegateID =
ANY ( SELECT DupDels.DelegateID
FROM DupDels LEFT JOIN
(SELECT Min(DelegateID) AS DelegateID, FirstName, LastName
FROM DupDels GROUP BY FirstName, LastName) AS FirstDup
ON DupDels.DelegateID = FirstDup.DelegateID
WHERE FirstDup.DelegateID IS NULL
) This worked out great for me and I hope it helps. MacDaddio

]]>