Hello, friends. You have been always such a great help. Please, help me with another problem. I have a table1(value1,value2,value3,date). Combination of (value1,value2) is a primary key, but it wasn#%92t set up. I have duplication of combination (value1,value2) records with value3 and date being different. I would like to keep only one record and delete the second one with the earlier date. Table1 look like it: a b c date a b c1 date1 --should be deleted a1 b1 c date a1 b1 c1 date1 --should be deleted a2 b2 c date a2 b2 c1 date ----should be deleted Thanks again for your help. Alex.
How do you know which one of the records should be deleted. Is it random, or is there some logic you can apply to value3 and date to select the appropriate record? MeanOldDBA derrickleggett@hotmail.com When life gives you a lemon, fire the DBA.
From the sampe data, it looks as if (value1, value2, value3) could also be the primary key. In any case, you need a field that is guaranteed to have a different value on different rows with the same (value1, value2) key, in order to remove the duplicates in one go. You could add an IDENTITY column for that purpose - I've named it NEW_ID in the query below. You can drop de NEW_ID afterwards if you want to. Do this on a copy of the data first and check the results, before committing. DELETE FROM table1 WHERE EXISTS (SELECT TMP.value1 FROM table1 AS TMP WHERE TMP.value1 = table1.value1 AND TMP.value2 = table1.value2 AND TMP.NEW_ID <> table1.NEW_ID AND TMP.date <= table1.date)
Derrick, I think the point was to delete all records that have the same (value1, value2) key, keeping only the one with the earliest date value. You need the unique identifier to avoid deleting all rows with (value1, value2) that have the earliest date value.
Yeah, I just read that. Sorry about missing that one the first time through. MeanOldDBA derrickleggett@hotmail.com When life gives you a lemon, fire the DBA.
U Can Try This Script Also. If u dont want to create identity column in Your table. /* Drop Table TmpA Create Table TmpA (Id Varchar(2),Id1 Varchar(2),Col1 Varchar(2),Dat DateTime) Insert Into TmpA Select 'a', 'b', 'c', GetDate()-1 Insert Into TmpA Select 'a', 'b', 'c1', GetDate()-2 --should be deleted Insert Into TmpA Select 'a1', 'b1', 'c', GetDate()-3 Insert Into TmpA Select 'a1', 'b1', 'c1', GetDate()-4 --should be deleted Insert Into TmpA Select 'a2', 'b2', 'c', GetDate()-5 Insert Into TmpA Select 'a2', 'b2', 'c1', GetDate()-5 ----should be deleted */ --Select Id,ID1,Sum(1) From TmpA Group By Id,ID1 Having Sum(1)>1 --select * from TmpA Declare @ID Varchar(2) Declare @ID1 Varchar(2) Declare @IDx Varchar(2) Declare @IDx1 Varchar(2) Declare @IsFirstRecord Varchar(1) Declare Xyz Cursor For Select Id,ID1 From TmpA Group By Id,ID1 Having Sum(1)>1 OPEN Xyz --Gets The Primary Key Field To Filter the Duplicate Records FETCH NEXT FROM Xyz Into @ID,@ID1 WHILE @@FETCH_STATUS = 0 Begin Select @IsFirstRecord=1 -- 1 Indicates First Record / 0 - Dup records Print @ID + @ID1 Print @IsFirstRecord --Generate Statement 2 Delet Data Declare Abc Cursor OPTIMISTIC For Select ID,ID1 From TMPA Where Id=@ID And ID1= @ID1 OPEN abc FETCH NEXT FROM abc Into @Idx,@Idx1 WHILE @@FETCH_STATUS = 0 BEGIN If @IsFirstRecord = 1 Begin Set @IsFirstRecord=0 FETCH NEXT FROM abc Into @Idx,@Idx1 End Else Begin Delete From TmpA Where Current Of Abc FETCH NEXT FROM abc Into @Idx,@Idx1 Print @Idx + @Idx1 Print 'Del' End END CLOSE abc DEALLOCATE abc --Gets The Next Primary Key Field To Filter the Duplicate Records FETCH NEXT FROM Xyz Into @ID,@ID1 End CLOSE Xyz DEALLOCATE Xyz Thanks Sandy
Sandy, From the looks of it, your script gets the job done, and without adding a field. But as I said, if there is another field value3 that makes (value1, value2, value3) a unique key, then you could compare TMP.value3 <> table1.value3 and get the same results - for much less programming. If the table holds a significant number of rows then this is probably a lot faster too. I use cursors only when there is no other way around it.
If I understand this correct, there is absolutely no need for a cursor. Something like this should also get the job done. I assume Adriaan meant the same. Create Table TmpA (Id Varchar(2),Id1 Varchar(2),Col1 Varchar(2),Dat DateTime) Insert Into TmpA Select 'a', 'b', 'c', GetDate()-1 Insert Into TmpA Select 'a', 'b', 'c1', GetDate()-2 --should be deleted Insert Into TmpA Select 'a1', 'b1', 'c', GetDate()-3 Insert Into TmpA Select 'a1', 'b1', 'c1', GetDate()-4 --should be deleted Insert Into TmpA Select 'a2', 'b2', 'c', GetDate()-5 Insert Into TmpA Select 'a2', 'b2', 'c1', GetDate()-6 ----should be deleted -- the 5 in Sandy's script is likely to be a typo, right? SELECT * FROM tmpA DELETE FROM tmpA WHERE EXISTS (SELECT * FROM tmpA AS T WHERE T.id = tmpA.id AND T.id1 = tmpA.Id1 AND T.Dat < tmpA.Dat) SELECT * FROM tmpA drop table tmpa Id Id1 Col1 Dat ---- ---- ---- ------------------------------------------------------ a b c 2004-09-14 15:42:17.697 a b c1 2004-09-13 15:42:17.707 a1 b1 c 2004-09-12 15:42:17.707 a1 b1 c1 2004-09-11 15:42:17.707 a2 b2 c 2004-09-10 15:42:17.707 a2 b2 c1 2004-09-09 15:42:17.707 (6 row(s) affected) (3 row(s) affected) Id Id1 Col1 Dat ---- ---- ---- ------------------------------------------------------ a b c 2004-09-14 15:42:17.697 a1 b1 c 2004-09-12 15:42:17.707 a2 b2 c 2004-09-10 15:42:17.707 (3 row(s) affected) -- --Frank http://www.insidesql.de
Frank,<br /><br />Basically correct, but if there are two records with the same (id, id1) key that have the same date value, then neither one is deleted. That's why you need both the third field for the key and the >= operator (instead of ><img src='/community/emoticons/emotion-5.gif' alt='' />.<br /><br />Adriaan
quote: I have a table1(value1,value2,value3,date). Combination of (value1,value2) is a primary key, but it wasn#%92t set up. I have duplication of combination (value1,value2) records with value3 and date being different. I would like to keep only one record and delete the second one with the earlier date. One might assume from reading this, that your point cannot happen. Otherwise the question was imprecise. -- --Frank http://www.insidesql.de
Cannot happen? The problem is that we don't know if his real data contains duplicates on (Id, Id1, Dat). If that is the case, then neither of those rows will get deleted (because of the < operator). And then he still has duplicate rows for (Id, Id1) ...
Adriaan,<br />have you read what I quoted from the original question and marked in bold?<br />From what I read there, I assumed that the combination of id, id1 <b>can</b> have duplicated, while the combination of id, id1 and dat <b>cannot</b> have duplicates. <br />Of course, I might be wrong on this [<img src='/community/emoticons/emotion-1.gif' alt='' />]<br /><br />-----------------------<br />--Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />-----------------------<br />
Frank & Adrian, Thanks for the Inputs,Which helped me a lot. But I Used Cursor Just Bcoz , If Alex wants to add his some more where clauses Or Conditions . Thanks Sandy
Frank, As you said, the original question wasn't really clear - he says "value3 and date being different", which suggests that only the combination of all four fields (value1, value2, value3, date) is unique. We just don't know for sure if (value1, value2, date) is unique. If it is, then we can use your query.
Yeah, who knows in the end. ..and the ultimate answer is 42, iirc. Anyway, alex should now have some ideas how to deal with the problem. How do you say "Friede, Freude, Eierkuchen" in Dutch? ----------------------- --Frank http://www.insidesql.de -----------------------
"Vrede, vreugde, eierkoeken" - hm, has a nice ring to it! There has to be a similar saying in Dutch, just cannot think of one at the moment (you know I can't think straight when I'm angry [<img src='/community/emoticons/emotion-6.gif' alt='' />!][}<img src='/community/emoticons/emotion-1.gif' alt='' />][<img src='/community/emoticons/emotion-5.gif' alt='' />][<img src='/community/emoticons/emotion-2.gif' alt='' />]).
Hopefully it wasn't me who upset you [<img src='/community/emoticons/emotion-2.gif' alt='' />]<br />Anyway, I like the sequence of smilies [<img src='/community/emoticons/emotion-1.gif' alt='' />]<br /><br />Oh my, doing time waiting to go home today...<br /><br />-----------------------<br />--Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />-----------------------<br />
Hey, here's a new one from my smilies collection<br /><img src='http://www.insidesql.de/images/PcPunch.gif' border='0' /><br /><br />-----------------------<br />--Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />-----------------------<br />
Hi FrankKalis, I used the following query select t1.* from tmpa t1,tmpa t2 where t1.id=t2.id and t1.id1 = t2.Id1 and t2.dat<t1.dat which produced the required result Which is best in performance? Madhivanan