SQL Server Performance Forum – Threads Archive

deleting duplicates

I need to write a query that will delete duplicates from a table. The table has one primary keys and there are many duplicate rows for different primary keys. How do I delete them. e.g., the table looks like studentid student_name student_subjects
———- ————- —————–
1 Mike Maths
2 Vista History
3 Mike Maths
4 Susie Geography
5 Prus French
6 Susie Geography
7 Abacu English
8 Collin Moral
The output of my query shold look like : studentid student_name student_subjects
———- ————- —————–
1 Mike Maths
2 Vista History
4 Susie Geography
5 Prus French
7 Abacu English
8 Collin Moral Thanks, StarWarsBigBang

declare @table table
(
studentid int,
student_name varchar(20),
student_subjectsvarchar(20)
) insert into @table
select1, ‘Mike’, ‘Maths’ union all
select2, ‘Vista’, ‘History’ union all
select3, ‘Mike’, ‘Maths’ union all
select4, ‘Susie’, ‘Geography’ union all
select5, ‘Prus’, ‘French’ union all
select6, ‘Susie’, ‘Geography’ union all
select7, ‘Abacu’, ‘English’ union all
select8, ‘Collin’, ‘Moral’ delete t
[email protected] t
inner join
(
selectstudent_name, student_subjects, studentid = min(studentid)
[email protected]
group by student_name, student_subjects
having count(*) > 1
) k
ont.student_name= k.student_name
andt.student_subjects= k.student_subjects
andt.studentid<> k.studentid select*
[email protected] /*
studentid student_name student_subjects
———– ——————– ——————–
1 Mike Maths
2 Vista History
4 Susie Geography
5 Prus French
7 Abacu English
8 Collin Moral
*/ KH

I wanted a query in general……………my actual table has several thousand records and I can’t use this method. Thanks, StarWarsBigBang

it is in general. delete t
fromyourtable t
inner join
(
selectstudent_name, student_subjects, studentid = min(studentid)
fromyourtable
group by student_name, student_subjects
having count(*) > 1
) k
ont.student_name= k.student_name
andt.student_subjects= k.student_subjects
andt.studentid<> k.studentid KH

since it is SQL Server 2005 read about RANK () Functions in SQL SERVER BOL. I think ROW_NUMBER() can do the trick.
run the following query in Adventureworks db and see the result and relate to your scenario. SELECT ROW_NUMBER() OVER
(PARTITION BY ProductID ORDER BY ProductID, LocationID) as SequenceNo, *
FROM Production.ProductInventory; Madhu

What is the isssue with khtan’s approach? It works fine. BUT, If you don’t care which record it deletes as long as there are no duplicates then you can use this easier approach too. Use DatabaseName
GO create table student(studentid int
, student_name varchar(25)
, student_subjects varchar(20)) insert into student(studentid, student_name, student_subjects)
values (1, ‘Mike’, ‘Maths’)
insert into student(studentid, student_name, student_subjects)
values (2, ‘Vista’, ‘History’)
insert into student(studentid, student_name, student_subjects)
values (3, ‘Mike’, ‘Maths’)
insert into student(studentid, student_name, student_subjects)
values (4, ‘Susie’, ‘Geography’)
insert into student(studentid, student_name, student_subjects)
values (5, ‘Prus’, ‘French’)
insert into student(studentid, student_name, student_subjects)
values (6, ‘Susie’, ‘Geography’)
insert into student(studentid, student_name, student_subjects)
values (7, ‘Collin’, ‘Moral’)
GO –This is the delete query
DELETE FROM dbo.student
WHERE [studentid] IN
(
SELECTa.studentid
FROMdbo.student a,
dbo.student b WHERE a.studentid != b.studentid
AND a.student_name = b.student_name
AND a.student_subjects = b.student_subjects
AND a.studentid < b.studentid
) GO
drop table student
Thanks, Name
———
Dilli Grg (1 row(s) affected)

]]>