SQL Server Performance

Find and update duplicate rows

Discussion in 'General Developer Questions' started by SanetteWessels, Aug 28, 2003.

  1. SanetteWessels New Member

    I have a table with the following layout:

    CausalID BucketID Factor PromotionNumber Datestamp
    1 10 0.14 ABC 10 Aug 2003
    2 11 1.00 BCD 11 Aug 2003
    3 11 1.00 BCD 12 Aug 2003
    4 12 0.25 EFG 12 Aug 2003
    5 10 0.80 ABC 11 Aug 2003

    I need to update the factor of CausalID's 1 and 2 to 0.00, because they have duplicate rows according to the BucketID and PromotionNumber. The oldest record will be updated. There will never be more than 2 rows per PromotionNumber duplicated. Please note that the table has more than a 100 million records.

    Any help on this will be appreciated.

    Thanks in advance.

    Sanette

    SQL Developer
    JustEnough Software Corporation

    quote:Walking on water and coding of a spec is easy, as long as both are frozen - A friend
  2. gaurav_bindlish New Member

    U want to update one of 1 or 2 or both? Also for the given data, 1 & 2 are different records. However 1 & 5 are the same.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  3. SanetteWessels New Member

    Correct, I want to update both records 1 and 2. For every duplicated row (1,2,3 and 5), the oldest record per duplicate must be updated. Hope this makes sense.

    Sanette

    SQL Developer
    JustEnough Software Corporation

    quote:Walking on water and coding of a spec is easy, as long as both are frozen - A friend
  4. Twan New Member

    assuming your table name is x

    updatex1
    set factor = 0.0
    fromx x1
    inner join x x2
    onx1.bucketid = x2.bucketid
    andx1.promotionnumber = x2.promotionnumber
    andx1.datestamp < x2.datestamp

    This would ideally want an index on bucketid, promotionnumber and datestamp

    Cheers
    Twan
  5. bambola New Member

    declare @t table (CausalID int, BucketID int, Factor float, PromotionNumber char(3), Datestamp datetime)

    insert into @t
    select 1, 10, 0.14, 'ABC', '10 Aug 2003'
    union all select 2, 11, 1.00, 'BCD', '11 Aug 2003'
    union all select 3, 11, 1.00, 'BCD', '12 Aug 2003'
    union all select 4, 12, 0.25, 'EFG', '12 Aug 2003'
    union all select 5, 10, 0.80, 'ABC', '11 Aug 2003'

    update @t
    set Factor = 0
    from @t a
    where exists
    ( select null
    from @t b
    where a.BucketID = b.BucketID
    and a.Datestamp < b.Datestamp
    )
    select * from @t

    Bambola.
  6. SanetteWessels New Member

    Thanks Twan! It works like a bomb! [<img src='/community/emoticons/emotion-2.gif' alt=':D' />]<br /><br />Sanette<br /><br />SQL Developer<br />JustEnough Software Corporation<br /><i><font size="1">Walking on water and coding of a spec is easy, as long as both are frozen - A friend</font id="size1"></i>

Share This Page