SQL Server Performance

update table without transaction log?

Discussion in 'Performance Tuning for DBAs' started by ralph, Sep 2, 2005.

  1. ralph New Member

    Hi out there. :)

    I searched Books Online, this forum, and some of the rest of the internet without finding a hint. Maybe someone here can help me with an answer.

    There is a database that gets truncated each night and filled with new data (bulk insert). After that, some SQL statements fill other tables with their results (so people save time the next work day, as they do not have to wait for these standard queries).

    The database contains ~390,000,000 rows and uses ~10 GB. It is only used to create some statistics. Even loss of the whole db wouldn't drive anyone mad.

    The first of the above mentioned steps is:

    alter table [files] add ext char(3) NULL
    update [files] set ext = left(ext_long,3) from [files]

    It takes 7 minutes to complete -- as it runs nightly, this is OK by me. But unfortunately it also writes some 10 GB of transaction log. :-( I do not really have that much space, and I guess it could also improve performance if the log weren't written.

    The recovery model is 'simple'. Is there any way to disable transaction logging completely for a specific database? I really do not need it here. I guess you cannot disable the log for just one transaction, because the log would be inconsistent afterwards?

    Or, is there a way to update the table in blocks of, let's say, one million rows, and shrink the log after each block? The main table [files] will not be edited by anyone else.

    Thanks in advance for your tips. :)

    Ralph
  2. Adriaan New Member

    If it's not a production database with lots of data entry, why not set a maximum size for the log file?

    Not sure if that would work with all recovery models.
  3. FrankKalis Moderator

    You surely can update in batches


    SET ROWCOUNT < whatever >
    UPDATE...
    BACKUP...
    < whatever is needed >

    Updating in smaller batches rather than a single large transaction should keep the log from growing that much under simple recovery.
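    A minimal sketch of that pattern, assuming the [files] table and the ext/ext_long columns from the first post; the database name is a placeholder, and the WHERE clause is only a guess at how each batch could skip rows that are already done:

    SET ROWCOUNT 1000000
    UPDATE [files] SET ext = LEFT(ext_long, 3) WHERE ext IS NULL
    BACKUP LOG [SD-FEP] WITH TRUNCATE_ONLY  -- free the log space used by the batch
    SET ROWCOUNT 0  -- reset so later statements are not limited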

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    I support PASS Deutschland e.V. (http://www.sqlpass.de)
  4. mmarovic Active Member

    Why don't you create the table with the additional nullable column in place, then bulk insert?
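    A rough sketch of that approach; the column definitions and file paths here are made up for illustration (only ext, ext_long, zeit and Groesse actually appear in this thread):

    CREATE TABLE [files] (
        ext_long varchar(255) NULL,
        zeit     datetime     NULL,
        Groesse  int          NULL,
        ext      char(3)      NULL  -- nullable column included at creation time, no ALTER TABLE afterwards
    )
    -- a format file would tell BULK INSERT that the data file carries no value for ext
    BULK INSERT [files] FROM 'D:\import\files.txt' WITH (FORMATFILE = 'D:\import\files.fmt')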
  5. ghemant Moderator

    Hi,
    please refer to these topics:

    http://vyaskn.tripod.com/track_sql_database_file_growth.htm
    http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=8175
    http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=8032
    http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=6338
    http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=4398

    :)

    Regards

    Hemantgiri S. Goswami
    ghemant@gmail.com
    "Humans don't have Caliber to PASS TIME, Time itself Pass or Fail Humans" - by Hemant Goswami
  6. satya Moderator

    You can switch the recovery model to BULK_LOGGED when you want to insert the data in bulk batches.
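    In SQL Server 2000 that is a one-line change (the database name here is assumed):

    ALTER DATABASE [SD-FEP] SET RECOVERY BULK_LOGGED
    -- ... run the bulk load ...
    ALTER DATABASE [SD-FEP] SET RECOVERY SIMPLE  -- switch back afterwards if desired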

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  7. mmarovic Active Member

    Following up on satya's suggestion, from BOL:
    quote:Bulk-Logged Recovery
    The Bulk-Logged Recovery model provides protection against media failure combined with the best performance and minimal log space usage for certain large-scale or bulk copy operations. These operations are minimally logged:

    SELECT INTO.


    Bulk load operations (bcp and BULK INSERT).


    CREATE INDEX (including indexed views).


    text and image operations (WRITETEXT and UPDATETEXT).
    In a Bulk-Logged Recovery model, the data loss exposure for these bulk copy operations is greater than in the Full Recovery model. While the bulk copy operations are fully logged under the Full Recovery model, they are minimally logged and cannot be controlled on an operation-by-operation basis under the Bulk-Logged Recovery model. Under the Bulk-Logged Recovery model, a damaged data file can result in having to redo work manually.

    In addition, the Bulk-Logged Recovery model only allows the database to be recovered to the end of a transaction log backup when the log backup contains bulk changes. Point-in-time recovery is not supported.

    In Microsoft® SQL Server™ 2000, you can switch between full and bulk-logged recovery models easily. It is not necessary to perform a full database backup after bulk copy operations complete under the Bulk-Logged Recovery model. Transaction log backups under this model capture both the log and the results of any bulk operations performed since the last backup.

    The backup strategy for bulk-logged recovery consists of:

    Database backups.


    Differential backups (optional).


    Log backups.
    Backing up a log that contains bulk-logged operations requires access to all data files in the database. If the data files are not accessible, the final transaction log cannot be backed up and all committed operations in that log will be lost.

    To recover in the event of media failure

    Back up the currently active transaction log. For more information, see Transaction Log Backups.


    Restore the most recent full database backup.


    If differential backups exist, restore the most recent one.


    Apply in sequence all transaction log backups created since the most recent differential or full database backup.


    Manually redo all changes since the most recent log backup.


    Important If the active transaction log is lost (for example, due to hardware failure on the disk containing the transaction log files), all transactions in that log are lost. To prevent loss of the active transaction log, place the transaction log files on mirrored disks.
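    A minimal sketch of that recovery sequence, with placeholder database and backup file names:

    -- 1. back up the tail of the log (possible only if the data files are still accessible)
    BACKUP LOG StatsDb TO DISK = 'D:\backup\StatsDb_tail.trn' WITH NO_TRUNCATE
    -- 2. restore the last full backup, then the last differential, without recovering yet
    RESTORE DATABASE StatsDb FROM DISK = 'D:\backup\StatsDb_full.bak' WITH NORECOVERY
    RESTORE DATABASE StatsDb FROM DISK = 'D:\backup\StatsDb_diff.bak' WITH NORECOVERY
    -- 3. apply the log backups in sequence, recovering on the last one
    RESTORE LOG StatsDb FROM DISK = 'D:\backup\StatsDb_tail.trn' WITH RECOVERY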
  8. Akthar New Member

  9. mmarovic Active Member

    quote:Truncating the transaction log:

    BACKUP LOG { database_name | @database_name_var }
    {
    [ WITH
    { NO_LOG | TRUNCATE_ONLY } ]
    }

    quote:NO_LOG | TRUNCATE_ONLY

    Removes the inactive part of the log without making a backup copy of it and truncates the log. This option frees space. Specifying a backup device is unnecessary because the log backup is not saved. NO_LOG and TRUNCATE_ONLY are synonyms.

    After backing up the log using either NO_LOG or TRUNCATE_ONLY, the changes recorded in the log are not recoverable. For recovery purposes, immediately execute BACKUP DATABASE.
    To summarize, you can combine the bulk_logged recovery model with BACKUP LOG ... WITH TRUNCATE_ONLY.
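    Putting those two pieces together, a sketch (the database name is assumed):

    ALTER DATABASE [SD-FEP] SET RECOVERY BULK_LOGGED
    -- ... bulk insert / SELECT INTO / CREATE INDEX ...
    BACKUP LOG [SD-FEP] WITH TRUNCATE_ONLY  -- frees log space; nothing is actually backed up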
  10. ralph New Member

    Wow! Thanks for your answers; I'll read them carefully. :))

    @Adriaan: Normally the log sizes are restricted to avoid filling the log volume completely. I'd expect each DB to fail if this happened. I set the size to unrestricted to find out how much additional log space is needed.

    @mmarovic: It took me a moment, but I think that now I understand your idea: I could import the same source column into two different target columns; the values will be cut to a length of 3 automatically. I'll try that immediately!
  11. mmarovic Active Member

    quote: @mmarovic: It took me a moment, but I think that now I understand your idea: I could import the same source column into two different target columns; the values will be cut to a length of 3 automatically. I'll try that immediately!
    Actually I didn't read carefully enough [:I] Now I think that you don't need a new column at all; you can always
    select cast(ext_long as varchar(3)) as ext from [files]
    By the way, when you perform huge inserts/updates/deletes, follow Frank's suggestion.
  12. satya Moderator

    Even though the recovery model is SIMPLE, growing the Tlog still takes work; so during a large delete or insert you can run the BACKUP LOG ... WITH TRUNCATE_ONLY option to clear out the logical (inactive) portion of the Tlog. Make sure to take a full database backup right after the process is completed, otherwise you may have trouble recovering from any issues that occur after the process.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  13. ralph New Member

    I'm back with the first result. :D

    I made the 'transform data' task import the source column twice. This makes the import run only ~1 minute longer (20 => 21 minutes), while I save the time used for the 'update' statement. And -- the main thing -- as the import itself is not being logged, the complete tlog for all steps is << 500 MB. Yeah! :))

    After importing the data, several statements like this one:

    insert into [Files.ExtByYear] select ext, year(zeit) as Jahr, count(*) as Menge, sum(Groesse) as Summe from files group by year(zeit), ext order by jahr desc

    will run, most of which use the 'ext' column. I found that I cannot improve performance significantly by adding an index on ext (creating the index takes more time than the total time saved for the later queries). But I am afraid of slowing down the whole thing if the three-character extension is calculated for each query. Anyway... I can try. :D

    @satya: I am not sure whether I'll be able to 'schedule' the log backup within the 7 minute time frame. Additionally, backing up the (half-full) log might take a while, so I will probably not be able to shrink the log before it is full. Well, I didn't try -- that's just what I would expect.
  14. satya Moderator

    If you use the WITH TRUNCATE_ONLY clause then it will take less than a minute; to proceed with this you need to make sure to perform a full database backup afterwards, and in general it is not recommended in a production environment.

    If you perform the Tlog backup frequently, say every 5 minutes, then SQL Server keeps each individual backup small and quick.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  15. mmarovic Active Member

    It may help if you make either real or computed columns for year(zeit) and substring(ext_long, 1, 3) and an index on both columns. Also, if you can afford to occasionally lose the whole database, you can go with the bulk_logged recovery model in combination with backup log ... with truncate_only.
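    A sketch of the computed-column variant; the column and index names here are just illustrative, and indexing a computed column requires the usual session settings such as ARITHABORT ON:

    ALTER TABLE [files] ADD jahr AS YEAR(zeit), ext3 AS SUBSTRING(ext_long, 1, 3)
    CREATE INDEX ix_files_ext3_jahr ON [files] (ext3, jahr)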
  16. ralph New Member

    Actually I do not need a backup at all. The database is truncated every night before the new data gets imported, and it doesn't even matter if we lose the db during the daytime, because we'd only lose some statistics about file usage.

    The db's purpose is to get some anonymized statistics about the files which employees put on network drives. As this amount is growing rapidly, we're trying to find ways to deal with it. The first thing is to find out where the space goes.

  17. ralph New Member

    quote:Originally posted by FrankKalis

    You surely can update in batches


    SET ROWCOUNT < whatever >
    UPDATE...
    BACKUP...
    < whatever is needed >

    Updating in smaller batches rather than a single large transaction should keep the log from growing that much under simple recovery.

    I tried to use this but I did not find a way to address the next block in a second step.

    I mean:


    SET ROWCOUNT 1000000
    UPDATE myStuff
    BACKUP LOG ... WITH TRUNCATE_ONLY

    will modify my first million rows. How can I reach the other millions of rows?
  18. mmarovic Active Member

    Try:
    set noCount on
    --
    set rowCount 10000
    --
    While 1=1 begin
        --
        update ...
        --
        if @@rowCount = 0 break
        --
    end --while
    --
    set rowCount 0
    --
    set noCount off
    [edited] Fixed a typo and added the SET NOCOUNT commands.

    Better to set ROWCOUNT to a smaller value like 10000.
  19. dtipton New Member

    Sounds like you have your answer, but I have a question concerning all of the suggestions to truncate the log.

    If you are running one massive update statement:

    update [files] set ext = left(ext_long,3) from [files]

    How would truncating the log help?

    Unless you are committing the updated rows along the way (see the ROWCOUNT example above), wouldn't the update be seen as one transaction, so that none of the work could be truncated from the log until it completes (commits) or a rollback is performed?

    Is that correct or am I missing something?

  20. mmarovic Active Member

    That's correct; hence updating in batches will keep the transaction log size under control while the updates execute.
  21. ralph New Member

    Thanks again for your kind help. :)

    I'm experimenting with this right now.
  22. ralph New Member

    Sorry to bug you again... :-(

    For a test I created a table 'DropMe' with two columns a (char(10)) and b (char(3)), filled 40 rows of a with 'aaaaaaaaaa' and ran these lines:



    use [SD-FEP]
    set noCount on
    set rowCount 10

    While 1=1 begin
    update DropMe set b = left(a,3)
    if @@rowCount = 0 break

    --
    end --while
    set rowCount 0
    set noCount off


    It didn't stop by itself, so after about half a minute I interrupted it. The result is that the first 10 rows now contain 'aaa', but the rest are still empty. What am I doing wrong?
  23. mmarovic Active Member

    You have to add criteria in the where clause that will distinguish between rows that were already updated and rows that were not. There is another technique for doing a batch update in case there are no such criteria, or you don't have an appropriate index to make it fast enough:
    declare @last int
    declare @buffer int
    declare @start int
    --
    select @last = max(filesPK) from [files]
    --
    set @start = 1
    set @buffer = 10000
    --
    while @start <= @last begin
        --
        update [files]
        ...
        where filesPK between @start and @start + @buffer - 1
        --
        set @start = @start + @buffer
        --
    end --while
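    For reference, a filled-in version of the same sketch, reusing the UPDATE from the first post and assuming filesPK is an int primary key on [files]:

    declare @last int
    declare @buffer int
    declare @start int
    --
    select @start = min(filesPK), @last = max(filesPK) from [files]
    set @buffer = 10000
    --
    while @start <= @last begin
        update [files]
        set ext = left(ext_long, 3)
        where filesPK between @start and @start + @buffer - 1
        --
        set @start = @start + @buffer
    end --while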

  24. Adriaan New Member

    Ralph,

    If you use an UPDATE statement on a table, you don't need a WHILE loop to cover all rows. UPDATE affects all records that would be returned by the SELECT variation of the same query:

    So this:
    UPDATE MyTable SET Col = 'aaaa'

    ... will update the same rows that you see returned by this:
    SELECT Col FROM MyTable

    And don't ever use a WHILE 1=1 loop!
  25. mmarovic Active Member

    Adriaan, the WHILE loop is used to split the update into batches. Take a look at the earlier posts in the thread.
  26. Adriaan New Member

    Mirko,

    Thanks for the explanation. Still, the suggestion stands: never use WHILE 1=1.

    Use an EXISTS clause, like this:

    SET ROWCOUNT 40
    WHILE EXISTS (SELECT * FROM DropMe WHERE b <> left(a,3) OR b IS NULL)
    BEGIN
        UPDATE DropMe SET b = left(a,3) WHERE b <> left(a,3) OR b IS NULL
    END
    SET ROWCOUNT 0
  27. mmarovic Active Member

    About 1=1, I'll just say I am not a purist; I used it to avoid a @rowCount variable assignment, because it is not possible to use @@rowCount before the first iteration.

    The EXISTS check works, but it significantly slows down each iteration, since no sargable condition is possible.

    I believe the solution using the pk (hopefully an identity column) should be much faster, even if there are a lot of holes in the identity column values. If the pk is not an identity there is another, similar solution, but first let's have ralph tell us whether he tested the solutions we proposed.

    [Edited] The solution with the identity primary key can be improved if the first iteration starts with the min(identityColumn) value.
  28. Adriaan New Member

    I wouldn't be surprised if in the end the straight update query resulted in fewer locking issues than running it in batches. Remember, UPDATE queries can take quite a lot of time to finish, so a series of UPDATE queries will take even longer in total.
  29. mmarovic Active Member

    Yes, batched updates are slower, but the problem we are trying to solve here is keeping the transaction log size under control during the update. If we don't batch the updates, we have one long-running transaction that will not allow the log to be truncated on checkpoint before it is completed.
  30. Adriaan New Member

    I wouldn't worry too much about the performance with non-sargable criteria. If the batch size is not too big, the query stops when enough records have satisfied the criteria, and the EXISTS already stops after finding a single match.

    But okay, the last EXISTS check will take longer as no record will satisfy the criteria. Again, I wouldn't worry too much.
  31. mmarovic Active Member

    quote: I wouldn't worry too much about the performance with non-sargable criteria. If the batch size is not too big, the query stops when enough records have satisfied the criteria, and the EXISTS already stops after finding a single match.
    The problem is that the table is big, and after each batch more and more rows have to be scanned until a qualifying one is found. For the first few batches (maybe even 100 or so, depending on the batch size) the impact will be small, but after a significant number of rows have been updated... as you said, at the end the whole table (or the index, if there is a covering one) will be scanned.
  32. Adriaan New Member

    ... then my main point must be: why does he need to run this update query in the first place?

    If it's a query that he needs to run on a regular basis, then either there's something wrong with his data model (by definition, a column that copies the first three characters of another column is redundant), or something is missing in the data entry part of his system (in which case, remedy it with a trigger for INSERT and UPDATE).
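    A rough sketch of such a trigger, reusing the column names from earlier posts (filesPK as the key is an assumption):

    CREATE TRIGGER trg_files_ext ON [files]
    FOR INSERT, UPDATE
    AS
    IF UPDATE(ext_long)
        UPDATE f
        SET ext = LEFT(i.ext_long, 3)
        FROM [files] f
        INNER JOIN inserted i ON f.filesPK = i.filesPK

    Note that BULK INSERT does not fire triggers unless the FIRE_TRIGGERS option is specified, so for a nightly bulk load the trigger alone would not be enough.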
  33. mmarovic Active Member

    I agree with your conclusion, as you can see from a couple of my previous posts in this thread.
  34. Adriaan New Member

    Well, it's Friday afternoon and I got into this thread only on Page 2. There's a half-naked girl on Page 3, at least if you're in England. [:D]
  35. mmarovic Active Member

    Ok, let's make page 3. [;)]
  36. Adriaan New Member

    Be there or be square!
