SQL Server Performance

Update/delete large table with duplicated records

Discussion in 'T-SQL Performance Tuning for Developers' started by little, Mar 21, 2003.

  1. little New Member

    Hello all.
    I wanted to flag duplicated records in
    a large table (more than 100,000 records),
    so I tried to use the following script.

    The problem is that it takes a very long
    time. Any ideas on how to optimize it?

    Is there another way to flag or delete duplicated
    records?

    Thanks

    UPDATE Large_table
    SET field_to_Flag = 1
    WHERE EXISTS (
        SELECT NULL
        FROM Large_table b
        WHERE b.Field1 = Large_table.Field1
          AND b.Field2 = Large_table.Field2
          AND b.Field3 = Large_table.Field3
        GROUP BY b.Field1, b.Field2, b.Field3
        HAVING Large_table.id < MAX(b.id)
    )
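    The flagging query above can be sketched end-to-end with an in-memory SQLite database via Python's sqlite3 (the table, column names, and sample rows are assumed for illustration; the correlated EXISTS is simplified to "a later copy with the same three fields exists", which flags the same rows as the GROUP BY/HAVING form):

```python
# Minimal sketch, assuming the schema from the post: flag every row that
# has a duplicate (same Field1..Field3) with a greater id.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Large_table (
        id INTEGER PRIMARY KEY,
        Field1 TEXT, Field2 TEXT, Field3 TEXT,
        field_to_Flag INTEGER DEFAULT 0
    );
    INSERT INTO Large_table (Field1, Field2, Field3) VALUES
        ('a', 'b', 'c'),   -- id 1: older duplicate, should be flagged
        ('a', 'b', 'c'),   -- id 2: newest copy, stays unflagged
        ('x', 'y', 'z');   -- id 3: unique, stays unflagged
""")

# Flag a row when a later row with the same three fields exists.
con.execute("""
    UPDATE Large_table
    SET field_to_Flag = 1
    WHERE EXISTS (
        SELECT 1
        FROM Large_table b
        WHERE b.Field1 = Large_table.Field1
          AND b.Field2 = Large_table.Field2
          AND b.Field3 = Large_table.Field3
          AND b.id > Large_table.id
    )
""")

flags = [row[1] for row in con.execute(
    "SELECT id, field_to_Flag FROM Large_table ORDER BY id")]
print(flags)  # [1, 0, 0]
```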

  2. bradmcgehee New Member

    Have you viewed an execution plan of the query to see what is going on internally? For example, perhaps adding one or more indexes might help this query. An execution plan will tell you if indexes are being used or not.

    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  3. little New Member

    This table does not have any indexes at all.
    I just added a unique identifier field.
    Could you be more explicit about your
    solution?
  4. little New Member

    One more detail:
    if I add indexes to this table,
    the main (third-party) software won't work.
  5. Chappy New Member

    I'm not sure I understand why the app wouldn't work. Adding a non-constrained index should be transparent to any application using the table, unless the table is being recreated, etc. Also, the index is there to improve performance, not to prevent duplicates. If you can't create any indexes at all, I'd guess you're going to struggle to improve performance much.

    Alternatively, perhaps you could create a brand new table with the same schema, with an 'ignore duplicates' index on Field1,Field2,Field3 and then do a select into from your existing table into the new one. This would be ok for a one-off process but not really something I would want to do regularly.
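    The copy-to-a-new-table idea can be sketched with SQLite via Python's sqlite3: a UNIQUE index on (Field1, Field2, Field3) plus INSERT OR IGNORE plays the role of a SQL Server 'ignore duplicates' (IGNORE_DUP_KEY) index. Table and column names are assumed as in the thread.

```python
# Sketch of the select-into-a-deduplicated-table approach, assuming the
# thread's schema; duplicate rows are silently dropped by INSERT OR IGNORE.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Large_table (
        id INTEGER PRIMARY KEY,
        Field1 TEXT, Field2 TEXT, Field3 TEXT
    );
    INSERT INTO Large_table (Field1, Field2, Field3) VALUES
        ('a', 'b', 'c'), ('a', 'b', 'c'), ('x', 'y', 'z');

    -- New table with the same shape and a uniqueness guarantee.
    CREATE TABLE Deduped (
        id INTEGER PRIMARY KEY,
        Field1 TEXT, Field2 TEXT, Field3 TEXT
    );
    CREATE UNIQUE INDEX ix_dedup ON Deduped (Field1, Field2, Field3);

    -- The second ('a', 'b', 'c') row violates the index and is ignored.
    INSERT OR IGNORE INTO Deduped (id, Field1, Field2, Field3)
        SELECT id, Field1, Field2, Field3 FROM Large_table;
""")

count = con.execute("SELECT COUNT(*) FROM Deduped").fetchone()[0]
print(count)  # 2
```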
  6. bradmcgehee New Member

    Without indexes, the only option SQL Server has is to perform table scans, which often hurt performance. Like Chappy says, I don't see how adding an index would prevent third-party software from working. We are always adding indexes to databases accessed by third-party software, and I have never seen a problem. Can you be more specific?

    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  7. testertest New Member

    I have had more luck with large tables by not using EXISTS in the WHERE clause!

    This type of query tends to run more efficiently and quickly.

    UPDATE LARGE_TABLE
    SET field_to_flag = 1
    FROM LARGE_TABLE
    JOIN (
        SELECT Field1, Field2, Field3, MAX(id) AS ID
        FROM LARGE_TABLE
        GROUP BY Field1, Field2, Field3
    ) AS A
        ON  A.Field1 = LARGE_TABLE.Field1
        AND A.Field2 = LARGE_TABLE.Field2
        AND A.Field3 = LARGE_TABLE.Field3
    WHERE LARGE_TABLE.id <> A.ID
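    The derived-table approach keeps the newest (MAX(id)) row of each (Field1, Field2, Field3) group and flags the rest. UPDATE ... FROM is T-SQL-specific syntax, so this sketch (SQLite via Python's sqlite3, names and sample rows assumed) expresses the same logic with NOT IN against the grouped subquery:

```python
# Sketch of the keep-the-max-id-per-group logic, assuming the thread's
# schema: every row whose id is not its group's MAX(id) gets flagged.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Large_table (
        id INTEGER PRIMARY KEY,
        Field1 TEXT, Field2 TEXT, Field3 TEXT,
        field_to_flag INTEGER DEFAULT 0
    );
    INSERT INTO Large_table (Field1, Field2, Field3) VALUES
        ('a', 'b', 'c'), ('a', 'b', 'c'), ('x', 'y', 'z');
""")

con.execute("""
    UPDATE Large_table
    SET field_to_flag = 1
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM Large_table
        GROUP BY Field1, Field2, Field3
    )
""")

flags = [r[0] for r in con.execute(
    "SELECT field_to_flag FROM Large_table ORDER BY id")]
print(flags)  # [1, 0, 0]
```

    Note that groups with a single row are never flagged: their only id is the group maximum.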
  8. dtipton New Member

    To identify duplicate rows:

    SELECT field1, field2, field3, COUNT(*)
    FROM large_table
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1

    If you only have a few, you can run this query and then take corrective action on a row-by-row basis.
  9. thomas New Member

    Or if you want to know which values are duplicated, run

    select column1 from table
    group by column1
    having count(column1) > 1

    Tom Pullen
    DBA, Oxfam GB
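    The GROUP BY / HAVING pattern from the last two replies can be run as-is against a small SQLite table via Python's sqlite3, here over the thread's three fields (names and sample data assumed) to list which combinations occur more than once:

```python
# Sketch: list duplicated (field1, field2, field3) combinations with
# their counts, assuming the schema discussed in the thread.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE large_table (
        id INTEGER PRIMARY KEY,
        field1 TEXT, field2 TEXT, field3 TEXT
    );
    INSERT INTO large_table (field1, field2, field3) VALUES
        ('a', 'b', 'c'), ('a', 'b', 'c'), ('x', 'y', 'z');
""")

dupes = con.execute("""
    SELECT field1, field2, field3, COUNT(*) AS n
    FROM large_table
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('a', 'b', 'c', 2)]
```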
