Delete Duplicate Records! | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Delete Duplicate Records!

Hi all,
I don’t know why is this not deleting the duplicate records? FYI… I can delete using cursor. Any help avoiding cursor would be really appreciated. Thanks Dan –Find the duplicate records
SELECT
MAX(empid) AS EmpID,
fName,
lName,
Title
FROM
Employee
GROUP BY
fName,
lName,
Title
HAVING
COUNT(*) > 1 ORDER BY EmpID
— Delete duplicate records
DELETE FROM
emp1
FROM
Employee AS emp1
INNER JOIN
(
SELECT
MAX(empid) AS EmpID,
fName,
lName,
Title
FROM
Employee
GROUP BY
fName,
lName,
Title
HAVING
COUNT(*) > 1 ) AS emp2
ON(
emp1.fName = emp2.fName
AND
emp1.lName = emp2.lName
AND
emp1.Title = emp2.Title
AND
emp1.EmpID <> emp2.EmpID
)

Check: http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=3385 HTH Luis Martin
Moderator
SQL-Server-Performance.com Although nature commences with reason and ends in experience it is necessary for us to do the opposite, that is to commence with experience and from this to proceed to investigate the reason.
Leonardo Da Vinci Nunca esperes el reconocimiento de tus hijos, eso ocurrirá luego de tu muerte
All postings are provided “AS IS” with no warranties for accuracy.
Looks like there are a lot of information about deleting duplicates from table but didn’t get the exact answer that I was looking for. I do not want to use the cursor, I can’t add identity column in table because it is a huge table and can’t change schema of the table. Also it’s almost impossible to choose all columns to delete records and add that deleted records back(if I choose only few columns, it will update original table with null on other columns) as mentioned in the link that Satya had provided. If i add identity column and delete the duplicate records, can i drop that identity column? because I can’t change the schema of the table. Sorry, I was just trying to optimize the script because I am using this script inside the Stored Procedure(executes by sql job). Thanks for the reply LuisMartin.
Dan

Als refer the methods suggested by nr here
http://sqlteam.com/forums/topic.asp?TOPIC_ID=6256 Madhivanan Failing to plan is Planning to fail
Yes, you can drop that identity column.
Thanks everyone for your valuable time. I am exploring the different methods now to see which one is more efficient. So far I have got only cursor and while loop working but none of the others are working. Thanks Dan
Check this
http://www.sql-server-performance.com/rd_delete_duplicates.asp —————————————-
http://spaces.msn.com/members/dineshasanka

Thanks for the link dineshasanka but it will be very complicated to include all the columns in select to delete and add back the data as mentioned in the script because the table has more than 50 columns. The given example was just a sample of data. If i don’t select all the columns, it will only add the data for selected columns and rest of the row or (column value) will be null. And also some of the columns in two duplicate rows are not identical duplicates so I can’t GROUP BY and get the duplicates. I’ve tried this also. Dan

Finally, i got it guys(Not actual code). Thanks all for your help. SELECT DISTINCT EmployeeID, fName, lName
INTO #tmpEmp_To_Del
FROMEmployee(nolock)
GROUP BY EmployeeID, fName, lName
HAVING count(*) > 1 –Declare local variables
DECLARE @rowcount int,
@EmployeeID int,
@fName varchar(30),
@lName varchar(30) WHILE (SELECT count(1) FROM #tmpEmp_To_Del ) > 0
BEGIN
SELECT TOP 1 @EmployeeID= EmployeeID,
@fName = fName,
@lName = lName
FROM #tmpEmp_To_Del SELECT @rowcount = COUNT(*)-1
FROM Employee e (NOLOCK)
WHERE(e.Employeeid = @EmployeeID
ANDe.fName = @fName
ANDe.lName = @lName) SET ROWCOUNT @rowcount — Deleting the duplicate records
DELETE e
FROM Employee e (NOLOCK)
WHERE(e.Employeeid = @EmployeeID
ANDe.fName = @fName
ANDe.lName = @lName)
SET ROWCOUNT 0 DELETE #tmpEmp_To_Del
WHEREEmployeeid = @EmployeeID
ANDfName = @fName
ANDlName = @lName)
END
]]>