SQL Server Performance

avoiding duplicates

Discussion in 'T-SQL Performance Tuning for Developers' started by erajendar, Apr 4, 2003.

  1. erajendar New Member

    Dear sir/madam,

    I have got a table which contains about 30,000 rows including duplicate rows.

    Now, I've been asked to remove the duplicate records from this table and I should not allow any duplicates in the future.

    How to get around this problem??

    Please mail me the solution to: erajendar@rediffmail.com



    Regards,
    Rajendar

    ok
  2. Jacco New Member

    You can remove the duplicate records from your table by something like the following:

    BEGIN TRAN
    SELECT DISTINCT * FROM myTable INTO #temptable
    DELETE FROM myTable
    INSERT INTO myTable SELECT * FROM #temptable
    DROP #temptable
    COMMIT TRAN

    Next you have to define a Primary Key on the table, this will prevent duplicates from being inserted.
  3. bambola New Member

    If you have an identity field or some other unique value in table you can do this:

    DELETE
    FROM table_name
    WHERE EXISTS
    (
    SELECT NULL
    FROM table_name a WITH (NOLOCK)
    WHERE table_name.unique_column = a.unique_column
    HAVING table_name.id_column + < MAX(a.id_column)
    )

    unique_column - is the column you need to be DISTINCT.
    id_column - is an identity column.
    You can change the MAX() function to your needs.

    Bambola.
  4. MacDaddio New Member

    I just had the same issue and used the info below which is from an article at
    http://www.devguru.com/features/tutorials/AdvancedSQL/advancedSQL.html

    it worked great. I set it up as a stored procedure which uses a view to return the duplicates

    Finding Duplicates - I modified these to my data and set it up as a view
    If we group records together by certain identifying fields we can then use a Count function to extract those that are duplicated. This query utilizes a sub-query and, based on their first and last names, returns a list of DelegateIDs for those delegates that have duplicate records.

    SELECT Delegate.DelegateID, Delegate.FirstName, Delegate.LastName
    FROM Delegate INNER JOIN
    (SELECT FirstName, LastName
    FROM Delegate
    GROUP BY FirstName, LastName
    HAVING Count(DelegateID) > 1
    ) AS Duplicates
    ON Delegate.FirstName = Duplicates.FirstName
    AND Delegate.LastName = Duplicates.LastName

    Removing Duplicates
    The following query removes all duplicates from the delegate table, leaving only the originals of the duplicated record. Here, we define a record as having a duplicate if there is another record with matching FirstName and LastName fields. This query relies on a view being set up that returns a list of duplicate delegates. The view is called DupDels.

    DELETE FROM Delegate
    WHERE DelegateID =
    ANY ( SELECT DupDels.DelegateID
    FROM DupDels LEFT JOIN
    (SELECT Min(DelegateID) AS DelegateID, FirstName, LastName
    FROM DupDels GROUP BY FirstName, LastName) AS FirstDup
    ON DupDels.DelegateID = FirstDup.DelegateID
    WHERE FirstDup.DelegateID IS NULL
    )

    This worked out great for me and I hope it helps.

    MacDaddio
  5. MacDaddio New Member

    I just had the same issue and used the info below which is from an article at
    http://www.devguru.com/features/tutorials/AdvancedSQL/advancedSQL.html

    it worked great. I set it up as a stored procedure which uses a view to return the duplicates

    Finding Duplicates - I modified these to my data and set it up as a view
    If we group records together by certain identifying fields we can then use a Count function to extract those that are duplicated. This query utilizes a sub-query and, based on their first and last names, returns a list of DelegateIDs for those delegates that have duplicate records.

    SELECT Delegate.DelegateID, Delegate.FirstName, Delegate.LastName
    FROM Delegate INNER JOIN
    (SELECT FirstName, LastName
    FROM Delegate
    GROUP BY FirstName, LastName
    HAVING Count(DelegateID) > 1
    ) AS Duplicates
    ON Delegate.FirstName = Duplicates.FirstName
    AND Delegate.LastName = Duplicates.LastName

    Removing Duplicates
    The following query removes all duplicates from the delegate table, leaving only the originals of the duplicated record. Here, we define a record as having a duplicate if there is another record with matching FirstName and LastName fields. This query relies on a view being set up that returns a list of duplicate delegates. The view is called DupDels.

    DELETE FROM Delegate
    WHERE DelegateID =
    ANY ( SELECT DupDels.DelegateID
    FROM DupDels LEFT JOIN
    (SELECT Min(DelegateID) AS DelegateID, FirstName, LastName
    FROM DupDels GROUP BY FirstName, LastName) AS FirstDup
    ON DupDels.DelegateID = FirstDup.DelegateID
    WHERE FirstDup.DelegateID IS NULL
    )

    This worked out great for me and I hope it helps.

    MacDaddio

Share This Page