SQL Server Performance

Script to delete duplicate records in a table.

Discussion in 'Getting Started' started by salmasyed, Jan 21, 2009.

  1. salmasyed New Member

    Hi,
    Can anyone suggest script to delete duplicate recods in a duplicate
    Thanks,
    Salma
  2. arunyadav Member

    Hello Salma,
    There colud be two cases:
    Case1: No promary key is defined and whole record is duplicate
    SELECT distinct C1,C2,C3,...,Cn INTO #TEMPDATA FROM TABLENAME
    TRUNCATE TABLE TABLE_NAME
    INSERT INTO TABLE_NAME (C1,C2,C3,...,Cn) AS
    SELECT C1,C2,C3,...,Cn FROM #TEMPDATA
    CASE 2: PRIMARY KEY IS DEFINED AND THE ONLY COLUMN WHICH IS HAVING DIFFERENT VALUE FOR EACH DUPLICATE RECORD.
    DELETE FROM TABLE_NAME WHERE C1 NOT IN (SELECT MAX(C1) FROM TABLE_NAME GROUP BY C1,C2,C3,...,Cn)
    C1 is the primary key.
    ====================
    you can also refer to the below article:
    http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx
    Arun
  3. salmasyed New Member

    Thanks for the script Arun..
    But, I need to delete duplicate records without using temporary table and my table doesn't have primary key defined as well.
    Can you please suggest how to delete within table without using primary key constraint and temporary table?
    Thanks!!
  4. Adriaan New Member

    Is this a class-room exercise? These are the two recommended methods.
    I could offer another suggestion involving an identity column, but if you're not even "allowed" to use a temp table then you are in a very peculiar situation.
    There is yet another option, but it's worse than the others, and it involves looping through a cursor - for which there is absolutely no point.
    I would not consider having a table without a primary key (or at least a unique constraint). Add an identity column if there is no candidate key.
  5. madhuottapalam New Member

    As eveyone mentioned there are many methods. you may try CTE method. and also another method would be creating a temp table with a unique index with IGNORE Duplicate option.create
    table #tt(col int ,name char(10))insert
    #tt select 1,'aaa'insert
    #tt select 1,'aaa'insert
    #tt select 1,'aaa'insert
    #tt select 2,'bb'insert
    #tt select 2,'bb'insert
    #tt select 2,'bb'insert
    #tt select 3,'cc'insert
    #tt select 3,'cc';
    with tt (SequenceNo,col,name) as(SELECT
    ROW_NUMBER() OVER(PARTITION BY Col,name ORDER BY col,name) as SequenceNo, *from #tt
    )delete
    from tt where sequenceno<>1select
    *From #tt
    Madhu
  6. FrankKalis Moderator

    If you are on SQL Server 2005 and above I would got with Madhu's suggestion, but I suspect this to be a classroom exercise anyway. [;)]
  7. RedDevils New Member

    check this blog....www.sqlity.blogspot.com
    -Abhijit, MCP
  8. kvdba New Member

    DELETE FROM MyTable WHERE ID NOT IN ( SELECT MAX(ID) FROM MyTable GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn2)

Share This Page