SQL Server Performance Forum – Threads Archive

Complex Deduplication (Best Practices)

I am trying to develop a quick and easy method to find duplicates based on a specific field. For simplicity, say I have two records that contain the same data except for the primary key and the date. When I deduplicate these records, I want to keep the record with the latest date. (I'm running SQL Server 2000, and the database is non-relational.) Any ideas will help. Thanks.

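-- "table", "column" and "date" are placeholders for your own table and column names.
-- Deletes every row for which a newer row with the same "column" value exists.
delete a
from table a
join table b on a.column = b.column
where a.date < b.date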
You need an index starting with "column" for good performance.
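
To make the suggestion above a bit more concrete, here is a rough sketch of the same self-join delete. All object names are hypothetical, since the original post does not give any: dbo.Records is the table, Id is the primary key, DupKey is the field that defines a duplicate, and RecordDate is the date column. The tie-break on Id is an extra assumption that goes beyond the original answer. It only matters if two duplicates can share the same date, in which case the plain date comparison would leave both rows in place.

```sql
-- Hypothetical names: dbo.Records, Id (primary key),
-- DupKey (duplicate-defining field), RecordDate (date column).

-- Index leading with the duplicate-defining column, as recommended above.
CREATE INDEX IX_Records_DupKey_RecordDate
    ON dbo.Records (DupKey, RecordDate)

-- Preview the rows that would be removed before deleting anything.
-- DISTINCT is needed because a row can match several newer duplicates.
SELECT DISTINCT a.Id, a.DupKey, a.RecordDate
FROM dbo.Records a
JOIN dbo.Records b ON a.DupKey = b.DupKey
WHERE a.RecordDate < b.RecordDate
   OR (a.RecordDate = b.RecordDate AND a.Id < b.Id)

-- Delete every row that has a newer duplicate. When the dates are
-- equal, the row with the higher Id is kept (assumed tie-break).
DELETE a
FROM dbo.Records a
JOIN dbo.Records b ON a.DupKey = b.DupKey
WHERE a.RecordDate < b.RecordDate
   OR (a.RecordDate = b.RecordDate AND a.Id < b.Id)
```

Rows whose DupKey is NULL never match each other in the join, so they are left untouched. It is worth running the SELECT first and checking the row count before running the DELETE.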
