SQL Server Performance

DELETE causing problems!

Discussion in 'General DBA Questions' started by bfarr23, Apr 4, 2005.

  1. bfarr23 New Member

    I ran DELETE FROM ....... from Query Analyzer.

    It was supposed to delete about 80 million rows.

    The connection got broken with the error below:

    "ConnectionCheckForData"...
    "General Network Error."
    "Connection Broken".

    I looked in Current Processes and there is a process there. If I click on this process, it is the same DELETE statement that I ran above. But the process is "sleeping" with an open transaction count of 2, and there are about 1,000 I/Os per second taking place now. The command column says "KILLED/ROLLBACK".

    I cannot kill this process, and I need to.

    Why is this rolling back? (Which I assume it's doing.)
    I just ran DELETE FROM...

    What should I do here?
    HELP!


  2. bfarr23 New Member

    Maybe stop and start SQL Server with a "no recovery" option?
    Does this exist?

    Looking for that now.

    I will wait a while for someone to post first.

  3. derrickleggett New Member

    How long has it been rolling back? Your best bet is to just let this finish rolling back, even though I'm sure you don't want to hear that. If you lost the connection in the middle of the transaction and it hadn't committed, it did exactly what it should have done.

    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
  4. bfarr23 New Member

    I ran

    delete from ... where day > ......
    (80 million rows)

    So if SQL Server had deleted 70 million rows and the connection broke, it has to roll back all 70 million?

    It's been running for about 4 hours now.

    I just did a SHUTDOWN WITH NOWAIT and restarted the server, but when SQL Server came back up the I/Os were still around 1,000/sec. Profiler shows Transactions: TransactionLog, so I assume it's still rolling back, even though no SPID shows a command of "ROLLBACK" now.

    I did not think SQL Server would do this with the recovery model set to "simple" (but I guess it does).

    I guess I have to wait this out. It's really frustrating. You should be able to start SQL Server up with a NORECOVERY option, or at least have some way to stop this rollback.

  5. derrickleggett New Member

    Remember what the simple recovery model is for. You still have to maintain transactional consistency across any given transaction context. The log files assist in this by maintaining each transaction until it is committed. This allows you to:

    1. Ensure transactional consistency across multiple statements.
    2. Roll back any given transaction.
    3. Recover from errors such as broken connections.

    The only thing the simple recovery model does for you is spare you from storing each individual row and item of a transaction for recovery purposes. It stores only the minimal requirements to maintain transactional consistency. It's all part of that great ACID concept.

    Trace flag 3608 will skip automatic recovery on all databases at startup. I would NOT recommend using this, though.

    Is this a production database?

    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
  6. bfarr23 New Member

    Yeah, a production database. I know, I know: why delete 80 million rows in one command?
    I should have broken that delete up.

    I did not know that that one DELETE statement was one BIG transaction. I figured it would commit after so many deletes, etc.

    I am letting this roll back still.
    I hope that "Profiler shows Transactions: TransactionLog"
    means it is rolling back, as I have no other indication.

    I/Os are still high.
  7. bfarr23 New Member

    Autocommit transactions
    This is the default mode for SQL Server. Each individual Transact-SQL statement is committed when it completes. You do not have to specify any statements to control transactions.


    Wow. That 80-million-row delete. I wonder how far it got before the broken connection! That's probably recorded in a system table somewhere.

    Crap.

    It took about 7 hours before the connection was broken, so I guess 7 hours to roll back at the same efficiency.



  8. derrickleggett New Member

    It's usually not the same efficiency. Generally, you're looking at about a 150% timeframe. Depending on how the database files are set up, it can take even longer.

    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
  9. derrickleggett New Member

    quote:i am letting this rollback still.
    i hope that this --> Profiler show Transactions: TransactionLog.
    means it is rolling back. As I have no other indication.

    You should be okay. Does the sysprocesses table still show it as "rollback"? If so, it's still churning away.

    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
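
    A quick way to make that check from Query Analyzer might look like this (a sketch, assuming SQL Server 2000's master..sysprocesses view; the LIKE filter on cmd is illustrative):

    ```sql
    -- Check whether the rollback is still running (SQL Server 2000).
    -- A spid whose cmd column reads 'KILLED/ROLLBACK' is still rolling back;
    -- open_tran and physical_io give a rough sense of activity.
    SELECT spid, cmd, status, open_tran, physical_io
    FROM master..sysprocesses
    WHERE cmd LIKE 'KILLED%'
    ```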
  10. bfarr23 New Member

    It finally rolled back.

    I just deleted 200,000 records and it took 10 minutes!
    So the 80-million-row delete will take me ~80 hours!

    Is there no faster way to do this?


    delete from xxx where day < (getdate() - 360)

  11. Adriaan New Member

    If there are no tables at the receiving end of a cascading relationship with your xxx table, then you could copy the records that are to remain into a temporary table, TRUNCATE TABLE XXX, and then INSERT INTO XXX SELECT * FROM your temp table.

    But you would probably need to take the database offline, take a full database backup beforehand, etc., etc.
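
    A sketch of that approach (table and column names are placeholders; it assumes no foreign keys reference XXX, since TRUNCATE TABLE fails on a table referenced by a foreign key):

    ```sql
    -- Copy the rows to keep; SELECT INTO is minimally logged in simple recovery
    SELECT * INTO #keep
    FROM XXX
    WHERE day >= DATEADD(d, -360, GETDATE())

    -- TRUNCATE deallocates whole pages and logs very little
    TRUNCATE TABLE XXX

    -- Put the keepers back
    INSERT INTO XXX SELECT * FROM #keep

    DROP TABLE #keep
    ```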
  12. bfarr23 New Member

    There are 190 million rows in this table. I need to delete exactly half of them.

    I bet your idea would take a long time also.
    Maybe not 80 hours, though.

    Why is this taking so long? This is a fast DB server.
  13. Adriaan New Member

    Your criteria may be part of the problem, so store the value in a variable:

    DECLARE @DT DATETIME
    SET @DT = getdate() - 360

    DELETE FROM XXX WHERE XXX.day < @DT

    ... which may not improve much, so you could try repeating the DELETE with a fixed rowcount, instead of a straight DELETE:

    SET ROWCOUNT 1000
    WHILE EXISTS (SELECT <keycol> FROM XXX WHERE XXX.day < @DT)
    BEGIN
        DELETE FROM XXX WHERE XXX.day < @DT
    END
    SET ROWCOUNT 0

    Also, does your date column have an index? And is that date column actually a DATETIME column?
  14. RGKN New Member

    Does the table have many indexes on it? It may be quicker to drop the non-clustered indexes not used for the delete and then recreate them afterwards, because when you delete rows from a table, every index on it has to be updated as well.

    Regards,

    Robert.

    PS As Adriaan suggested, you could insert all that you want to keep into a new table, drop the old one, sp_rename the new table, and recreate your indexes. However, if you have foreign keys in other tables, then referential integrity may be broken.

  15. RGKN New Member

    Oh yes, make sure you take a backup first...
  16. Adriaan New Member

    quote:Originally posted by Adriaan
    SET @DT = getdate() - 360
    Hm - am I getting sloppy or what? Of course that should read:
    SET @DT = DATEADD(d, -360, GETDATE())
  17. mmarovic Active Member

    Adriaan, why do you think a variable should be used? I mean for the original query, not the batched one.

    About indexes: drop all the indexes you can, then create a clustered index on day.
    After the delete, drop the index on day and recreate the indexes as they were before. When dropping, drop the clustered one last; when creating, create the clustered index first.

    I assume you agreed on downtime to complete this task.

    If you are deleting more rows than will stay in the table, you may consider copying the rows into a new table, creating the indexes there, then dropping the old table and renaming the new one.

    Even if you do that and there are still millions of rows that stay, do it in batches using Adriaan's technique.

    [Edited] Sorry for duplicating others' recommendations; it looks like we all answered at the same time.
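
    That ordering might look like this (all index, table, and column names here are hypothetical):

    ```sql
    -- Drop non-clustered indexes first, the clustered index last
    DROP INDEX XXX.IX_XXX_OtherCol
    DROP INDEX XXX.CIX_XXX_ID          -- clustered, dropped last

    -- Cluster on day so the DELETE works through a contiguous range
    CREATE CLUSTERED INDEX CIX_XXX_Day ON XXX (day)

    -- ... run the (batched) DELETE here ...

    DROP INDEX XXX.CIX_XXX_Day

    -- Recreate: the clustered index first, then the non-clustered ones
    CREATE CLUSTERED INDEX CIX_XXX_ID ON XXX (ID)
    CREATE NONCLUSTERED INDEX IX_XXX_OtherCol ON XXX (OtherCol)
    ```

    Dropping the clustered index first would force every non-clustered index to be rebuilt against the heap, and creating it last would rebuild them all again, which is why the ordering matters.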
  18. bfarr23 New Member

    Cool.

    Yes, a delete from a clustered index also causes an update to the B-tree for the non-clustered indexes! Gotta delete the non-clustered indexes.

    I don't want to drop the clustered index, as that will take some time to add back.

    I am just deleting these 5 million rows at a time (slow but sure).

    Good idea on the variable; that should save some CPU cycles.

    Yes, I am doing this in production! (No choice.)

    At least now the DB is under control. I will run DBCC CHECKDB shortly to ensure nothing got messed up while I bounced services, ran the shutdown command, and failed over to the other node in the cluster.
  19. Adriaan New Member

    mmarovic,

    You should, whenever possible, avoid doing calculations in your WHERE clause - especially calculations on a column, as there is no way an index can be used for the comparison.

    Anyway, I ran a batch with the two variations of the same query on a non-indexed date field, and the GETDATE() criterion took 50.07% of the query cost for the batch - so the difference is probably marginal. Then again, marginal can become very noticeable once you're covering millions of records.

    I also just ran a test with the DATEADD around the GETDATE(), and then the query cost evens out to 50% each - that's weird. (And yes, I did run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE before each query.)

    It turns out, if you look at the execution plan, that the estimated row count for the XXX.Day <= GETDATE() criterion equals the number of rows in the table, whereas for XXX.Day <= DATEADD(d, -360, GETDATE()) the estimated row count is about one-third of the number of rows - exactly the same as for the XXX.Day <= @DT1 criterion.

    You'll just have to play with these factors, but using the variable provided the optimum solution in each variation of my test. As long as you have a calculated criterion value that isn't based on a column value in each row, use a variable.
  20. mmarovic Active Member

    Adriaan, I assumed there is an index on the day column. In this case there is just one condition, and you can force the index if a bad execution plan is used, so there is no problem with the variable. However, I have had problems with variables when we had multiple joins and a lot of conditions in the WHERE clause. Since a variable was used, the query optimizer took its magic-numbers approach and chose a less efficient execution plan, because it was not able to figure out that the narrower criterion was on the date column (the condition was dateColumn BETWEEN @Start AND @End). To make things worse, it was a query dynamically produced by a "Query builder" that is part of an application I support, which allows uneducated users to create custom queries, so I was not able to put hints inside. We solved the problem by changing the Query builder's logic: instead of calculating variables in the SQL batch produced, the dates were calculated on the client and inserted into the query string as constants. That way we now have queries that are up to 120 times faster.

    bfarr23: sorry for the digression.
  21. Adriaan New Member

    mmarovic,

    That is what we would do a lot of the time in the client app that we manage: build the whole query, not just feed parameters to a stored procedure. But depending on how the client app is distributed/published, it can be very annoying when you need to make changes - just think of an Access front-end where the whole query string is built up in VBA code and you need to correct an error in one of the JOINs. Stored procedures can be managed more efficiently in that sort of scenario.
  22. FrankKalis Moderator

    quote:Originally posted by bfarr23
    i am just deleting these 5 million rows at a time. (slow but sure).

    I might have missed this information, but how many rows does this table actually have? I mean, if you delete 80 million rows from a table containing 81 million rows, it's faster to export the remaining rows, drop the table and recreate it, and finally import the rows back in.

    --
    Frank
    http://www.insidesql.de
  23. mmarovic Active Member

    quote:That is what we would do a lot of the time in the client app that we manage: build the whole query, not just feed parameters to a stored procedure. But depending on how the client app is distributed/published it can be very annoying when you need to make changes - just think about an Access front-end where the whole query string is built up in VBA code, and you need to correct an error in one of the JOINs. Stored procedures can be managed more efficiently in that sort of scenario.
    I agree about sp advantages, our policy is to use sp only, but that app has its own specific requirements and was developed before current rules were in place.

    Anyway, the point I wanted to make is that scalar-expression overhead is usually irrelevant compared to I/O costs (join algorithm, indexes used, bookmark lookups if any...).
  24. RGKN New Member

    A couple of things.

    Frank, he wants to delete "exactly half" of the rows in the table.

    BFarr, how's the delete going? One thing has occurred to me (you may or may not get any improvement in performance from it). Do you have a unique ID on the table? Is the clustered index on this column? If so, you may be able to select a list of the IDs with the dates you wish to delete (before dropping the date field's index). Then you can select a subset of these (1,000) using ROWCOUNT into another table and delete from the original table, sort of like so...

    DECLARE @date DATETIME, @row_no INT, @row_no2 INT

    SET @date = DATEADD(d, -360, GETDATE()) -- your cut-off date
    SET @row_no = 1 -- so that the WHILE test returns true the first time
    SET @row_no2 = 1

    -- Create a table to hold a list of all records to delete
    CREATE TABLE #Records_For_Del
    (Record_ID INT NOT NULL)

    -- Create a table to hold a batch of 1000 records
    CREATE TABLE #Delete_Records
    (Record_ID INT NOT NULL)

    -- Populate the list of records for deletion
    INSERT #Records_For_Del (Record_ID)
    SELECT Record_ID FROM Your_Table
    WHERE date < @date

    -- This is where you drop your indexes on the main table. Also, if you find it
    -- runs quicker, you may be able to create an index on the first temporary table.

    -- Ensure only 1000 records are dealt with at a time
    SET ROWCOUNT 1000
    WHILE @row_no > 0
    BEGIN
        INSERT #Delete_Records (Record_ID)
        SELECT Record_ID FROM #Records_For_Del

        SET @row_no = @@ROWCOUNT -- when no more records are inserted, this stops the loop
        SET @row_no2 = 1 -- needs to be reset, otherwise the next WHILE loop never runs

        WHILE @row_no2 > 0
        BEGIN
            -- Delete the batch from your main table
            DELETE Your_Table
            FROM Your_Table yt
            INNER JOIN #Delete_Records dr
                ON yt.Record_ID = dr.Record_ID

            SET @row_no2 = @@ROWCOUNT
        END

        -- Remove the batch from the list of identified records so that the loop
        -- does not redo the same ones and get stuck
        DELETE #Records_For_Del
        FROM #Records_For_Del sfd
        INNER JOIN #Delete_Records dr
            ON sfd.Record_ID = dr.Record_ID

        -- Empty the batch table so that it can be repopulated
        TRUNCATE TABLE #Delete_Records
    END
    SET ROWCOUNT 0

    Anyway, I posted this because I am doing something similar at the moment. It may not be quicker, so test it first.

    The reason for using a temp table with a subset is that the join for the deletion is then against a much smaller table, and therefore the delete may run faster. Any thoughts, chaps?

    Regards,

    Robert.

  25. FrankKalis Moderator

    quote:
    Frank, he is wanting to delete "exactly half" of the rows in the table.
    Hmpf, I must have missed that information when reading this thread the first time...

    --
    Frank
    SQL Server MVP
    http://www.insidesql.de
  26. steve.haran New Member

    Hi,

    I recently 'archived' our 150 GB database, removing about 50% of it. I had exactly the same issue, with the delete taking 80 hours plus, even when breaking it up into smaller deletes. The solution we chose in the end was:

    Create a new table
    INSERT the required rows into the new table
    DROP the old table
    SP_RENAME the new table.

    The above took considerably less time (a couple of hours rather than 80+).

    quote:Originally posted by bfarr23
    there are 190 million rows in this table. I need to delete exactly 1/2 of them.
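
    A sketch of those steps (hypothetical names; SELECT INTO is used here to create and fill the new table in one minimally logged step, and it assumes no foreign keys reference the old table):

    ```sql
    -- 1. Create and populate the new table with only the rows to keep
    SELECT * INTO XXX_new
    FROM XXX
    WHERE day >= DATEADD(d, -360, GETDATE())

    -- 2. Swap the tables
    DROP TABLE XXX
    EXEC sp_rename 'XXX_new', 'XXX'

    -- 3. Recreate indexes, constraints, and permissions on the new XXX
    ```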
  27. satya Moderator

    You can do this if you can afford downtime on the application database; if not, the only way is to schedule the delete operation during low-traffic hours.

    Satya SKJ
    Contributing Editor & Forums Moderator
    http://www.SQL-Server-Performance.Com
    This posting is provided "AS IS" with no rights, for the sake of knowledge sharing.
