SQL Server Performance

Long running delete

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by rklimes, Feb 6, 2008.

  1. rklimes New Member

    I tried to run a delete on a table, based on rows in another table, for archiving, but the statement ran for over 3 hours and eventually failed because it ran out of space in the transaction log. I am deleting 13.5 million rows from a table that has 22 million rows in total and is 48 columns wide. This is the statement I used:
    delete from <database>.dbo.<table>
    where id in (select id from <database_arch>.dbo.<table>)
    When I select on the same criteria (select * from <database>.dbo.<table> where id in (select id from <database_arch>.dbo.<table>)), the results are returned in 22 and a half minutes. The size of the data being deleted is 4.6 GB, but while the delete was running the transaction log grew to over 60 GB before running out of space.
    How much overhead is there for statements written to the transaction log? I suppose I could change the recovery model to simple to avoid the log running out of space, but why would it grow that big to begin with? And why did it take so long?
    Would using a join instead of a subquery increase performance? Any comments to help me understand what happened would be great. Thanks.




  2. Luis Martin Moderator

    That is what happens when you use the full recovery model.
    Change to simple to do the delete, and back to full afterwards.
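    If you go that route, the switch itself is just a couple of statements (sketch only, using the <database> placeholder from the first post). Note that a single huge DELETE is still one transaction, so batching it as suggested below is what actually keeps the log small, and the log backup chain is broken until a full or differential backup is taken after switching back:
    ALTER DATABASE <database> SET RECOVERY SIMPLE;
    -- run the (batched) delete here
    ALTER DATABASE <database> SET RECOVERY FULL;
    -- take a full or differential backup to restart the log backup chain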
  3. jagblue New Member

    Or run the delete in batches, committing after every 1,000 deletes or so, along the lines of the sketch below.
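    A rough sketch of that pattern in SQL Server 2005 syntax (same placeholders as above; the batch size of 1,000 is arbitrary):
    DECLARE @rc int;
    SET @rc = 1;
    WHILE @rc > 0
    BEGIN
        BEGIN TRAN;
        -- delete one small batch per transaction
        DELETE TOP (1000) FROM <database>.dbo.<table>
        WHERE id IN (SELECT id FROM <database_arch>.dbo.<table>);
        SET @rc = @@ROWCOUNT;
        COMMIT TRAN;
    END
    Each batch commits on its own, so the log can be truncated between batches (by log backups, or automatically under simple recovery) instead of growing for the whole delete.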
  4. gfalk New Member

    Deleting in batches, changing the recovery model, etc., as stated above, are all useful... mass deletes hurt because they are processed row by row.
    Another tip: drop the indexes on the table beforehand so the DELETE doesn't have to update them... you'd pretty much have to rebuild them afterwards anyway, so you're not losing much.
    Alternatively, use SELECT INTO to copy the rows you want to keep into a scratch table, truncate or outright drop the original, copy the data back (or rename the scratch table), and finally recreate the indexes. A rough sketch follows below.
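    Purely as a sketch of that last approach (the _keep table name is made up here, and this assumes the table can be taken offline, has no foreign keys referencing it, and that identity columns, constraints and triggers get handled separately):
    -- keep only the rows that are NOT in the archive table
    SELECT t.*
    INTO <database>.dbo.<table>_keep
    FROM <database>.dbo.<table> t
    WHERE NOT EXISTS (SELECT 1 FROM <database_arch>.dbo.<table> a WHERE a.id = t.id);

    TRUNCATE TABLE <database>.dbo.<table>;

    -- copy the kept rows back and drop the scratch table
    INSERT INTO <database>.dbo.<table>
    SELECT * FROM <database>.dbo.<table>_keep;

    DROP TABLE <database>.dbo.<table>_keep;
    -- then recreate / rebuild the indexes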


  5. vbkenya New Member

    1. I don't think using a join would make much difference (if any) to the performance of your delete operation. Check the estimated execution plans for the subquery and join versions. You might be surprised to find that they are not very different.
    2. After the temporary change to the simple recovery model, it would help to do smaller delete operations on the table. Try something like:
    CREATE PROC DeleteTable
    @MaxID int
    AS
    DELETE from <database>.dbo.<table>
    WHERE id in (select id from <database_arch>.dbo.<table> WHERE id < @MaxID)
    RETURN
    You would then call the procedure many times over with increasing values of the @MaxID parameter:
    EXEC DeleteTable 500000
    GO
    EXEC DeleteTable 1000000
    GO
    EXEC DeleteTable 1500000
    GO
    EXEC DeleteTable 2000000
    GO
  6. satya Moderator

    Do not change recovery model!
    Do not drop indexes!
    Deleting in batches is the best way to go, and intermittent transaction log backups will also keep the transaction log size under control. In this case you could run it as a scheduled job for better performance, since running it from a client connection will add network latency.
    If you drop the indexes and run the delete it will run quickly, but think about recreating those indexes on that number of rows: during that time the table(s) will not be available to the application, and you must make sure to reindex once the process is finished so that the best plan gets cached (see the sketch below).
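    To illustrate that reindex step only (same placeholder names as above; a sketch, not a tested script), on SQL Server 2005 a full rebuild once the archiving delete has finished could be as simple as:
    -- rebuild all indexes on the table after the delete completes
    ALTER INDEX ALL ON <database>.dbo.<table> REBUILD;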
    (Nathan, good to see you here again)
  7. rklimes New Member

    Thanks for all the suggestions. What I have decided to do is batch deletions of 10,000 rows at a time, stopping every million to do a transaction log backup. Would the following query be sufficient?
    declare @i int
    set @i = 0
    while @i < 1000000
    BEGIN
        delete top (10000) from <database>.dbo.<table>
        where id in (select id from <database_arch>.dbo.<table>)

        IF @@rowcount < 10000 BREAK

        set @i = @i + 10000
    END
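    (The log backup between passes isn't shown in the loop above; the idea is just to run something like the following, with a placeholder path, after each pass of the loop, and then rerun the loop until it deletes nothing:)
    BACKUP LOG <database> TO DISK = N'E:\Backups\<database>_log.trn';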
  8. satya Moderator

    Sounds good, if the current transaction log can cope with such a load. Use DBCC SQLPERF(LOGSPACE) to see how the space is being utilized on the log side.
