SQL Server Performance

Update performance

Discussion in 'T-SQL Performance Tuning for Developers' started by Slart, Sep 2, 2003.

  1. Slart New Member

    Hi,<br /><br />I'm a newly appointed SQL DBA (I'm a systems engineer, our previous DBA left, I'm being given the chance to fill his shoes). I am trying to make an update script run fast and have not been able to achieve acceptable results. I am soliciting advice on how to achieve best performance. I hope someone can help.<br /><br />I two tables (they happen to be in different databases). In the Source table I have Name and ID (a varchar(11)). In the Destination table I have the same fields, but ID is not populated.<br /><br />For every record in the destination table I am trying to populate it's ID based on the values in the source table.<br /><br />I have freshly clustered indexes on the ID field of both tables. The tables have been moved to separate disks (via a filegroup). The Source table has 673M rows, the destination table as 207M rows.<br /><br />The hardware is a 8-way, 700MHz server with 32G of ram. The data lies on raid-0 arrays (this is development data). This is sql 2000 on windows 2003 server.<br /><br />The first thing I tried was an update statement like this:<br /><br />Use Destination_Datbase<br />updateDestination<br />set Destination.Name=Source.Name<br />from Destination, Source.NameIDxref Source<br />where Destination.ID = Source.NameIDxref.ID<br /><br />I expected it to run several hours. My boss keeps harping on me that we cant just let things run forever without an idea of when it will complete. After 43 hours I cancelled it (it took 15 hours to roll back the transaction <img src='/community/emoticons/emotion-1.gif' alt=':)' />).<br /><br />When that didnt work I decided to write a t-sql script that looped through the records. I've been playing with this all day. The best performance I have been able to achieve is 1.8 million rows/hr (which gives this a 5 day runtime). I need this to exceed 5M rows/hr.<br /><br />There is something odd that I notice: If I get rid of the "update" command, the script that runs 1.8 million/hr increases to 16 million/hr. So apparently the update is the majority of the work here. I'm only updating one field.<br /><br />So as it stands the update is my slowest component by a factor of 10. Is there any way to increase the performance of the update? (BTW, the update does not modify any fields that are a component of the clustered index).<br /><br />Even though I know I"m going to get pelted with "row by row are you crazy" replies, I'm open to any suggestions.<br /><br />Thanks in advance<br /><br />Steve Sadler<br />Newbie DBA<br /><br /><br /><br />
  2. Twan New Member

    Hi Steve,

    - You have an index on source.id, source.name?
    - How many columns and what datatypee are in table source?
    - What indexes are there on destination?
    - Are there any null names in the source table?
    - Are the names in the desitation table all null at the moment?
    - was the destination table created with a large amount of free space?


    Is it feasible to bcp or dts the data out of a join between the two tables and then bcp it back into the destination table?
    Is it feasible to create a second destination table populate it using a join between destination and source, index the second destination table and then drop destination and rename the second destination table

    If neither of those are feasible then you could do something like the following NOTE that this may not be that efficient since it will have to look up the name is null, which will get slower and slower as more and more rows are updated in the destination table

    use destination_db
    go

    declare @rowcount
    set @rowcount = 10000

    set rowcount 10000
    while @rowcount = 10000
    begin
    begin tran

    update d
    set d.name = s.name
    from destination d,
    source_db.dbo.nameidxref s
    where d.id = s.id
    and d.name is null

    set @rowcount = @@rowcount
    commit tran
    end
  3. SanetteWessels New Member

    I am confused...

    > In the Destination table I have the same fields, but ID is not populated.

    Why then is the name field updated and not the id.

    My suggestion would be:
    1. Create an index on the name
    and then 2.

    UPDATE Destination
    SET d.ID = s.ID
    FROM Destination d
    INNER JOIN Source s ON d.Name = s.Name


    Sanette

    SQL Developer
    JustEnough Software Corporation
    Walking on water and coding of a spec is easy, as long as both are frozen - A friend
  4. gaurav_bindlish New Member

    What is the location of the transaction log file for destination database? Is it on a different physical disk / array? Where is the destination table located? On the same server or different server?

    Is there a reason for not upgrading to SP3 for SQL Server?

    As suggested by Twan, I would recommend updating in baches of 100,000. Keeps the Transaction Log file size small and makes the update faster.

    The use of READ UNCOMMITTED for source table and keeping destination in SIMPLE RECOVERY mode will also help.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  5. Twan New Member

    I assumed (probably incorrectly) that the initial question was supposed to talk about trying to populate name, since that is what the example showed... <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />Cheers<br />Twan
  6. Slart New Member

    Thanks to all of your for your time and experience.<br /><br />To answer a couple of questions:<br /><br />&gt;&gt; In the Destination table I have the same fields, but ID is not populated.<br />&gt; Why then is the name field updated and not the id.<br /><br />My apoligies, that was incorrect. ID is populated in both tables, I was trying to get Name into Destination by cross referencing ID to Name in the source.<br /><br />&gt; What is the location of the transaction log file for destination database? Is it on a different physical disk / array? <br /><br />The destination databaese's log is on a different physical array than that databases's data, however it does reside on the same physical array as one of the datafiles that the source database sits on.<br /><br />&gt; Where is the destination table located? On the same server or different server?<br /><br />Same server.<br /><br />&gt; Is there a reason for not upgrading to SP3 for SQL Server?<br /><br />My error. It is running SP3A, without the MS03-031 security rollup.<br /><br />&gt; The use of READ UNCOMMITTED for source table and keeping destination in SIMPLE RECOVERY mode will also help.<br /><br />I have both DBs are set for simple recovery. I did not specify READ UNCOMMITTED. Are you referring to the command "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED", or is there another location to set this per database?<br /><br />&gt; - You have an index on source.id, source.name? <br />&gt; - What indexes are there on destination?<br /><br />The source and destination both have a clustered index on ID. There is no other index on either.<br /><br />&gt; How many columns and what datatypee are in table source?<br /><br />There are 11 fields, mostly varchar. The average length (based on data size / row count) is 150 bytes/row. BTW our previous DBA designed the table so I'm not at fault <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />&gt; - Are there any null names in the source table?<br /><br />Yes, a few.<br /><br />&gt; - Are the names in the desitation table all null at the moment?<br /><br />They started null (I added this field to the table just for this process), but I have since updated 20M million (of the 207M). <br /><br />&gt; - was the destination table created with a large amount of free space?<br /><br />The clustered index on the destination has a fill factor=100. The reason I did this was because there aren't going to be any inserts, just updates to a single field in each. Was this a mistake?<br /><br />The destination table has 45GB of data, and is in a file that has another 45GB free.<br /><br /><br />Please pardon my ignorance, I have a lot to learn and I'm learning as fast as possible. I have already learned quite a bit from these replies alone. I will try running them in smaller batches and look up BCP & DTS.<br /><br />Thanks again for your help. I am learning a lot from this discussion.<br /><br />Steve Sadler<br />
  7. Slart New Member

    Hey twan,<br /><br />In your sample code, is the purpose of Setting @rowcount=10000, then setting @rowcount=@@rowcount, to cause it to continue to loop until the update command processes less than 10,000 rows? Ingenious. <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br /><br />I had to look up @@rowcount and "set rowcount" to understand, but it is very cool..<br />
  8. gaurav_bindlish New Member

    There is no need for starting Transaction explicitly in the batch update code as update command itself starts an implicit transaction.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  9. Twan New Member

    Hi Slart,

    fillfactor=100 for the clustered index means that any addition of data causing a row to expand will require a page split, which is a relatively expensive operation... The bottom level of a clustered index is the datapages themselves, so 100 means also fill the dataages to 100%.

    If you are just building this table first off, then I'd suggest exporting the desired data using dts, and then import it back in, finally adding the index.

    If the updates that you want to keep making to the table expand varchar column or set null values to non null values, then allow some slack in the clustered index to avoid page splitting


    Yeah the purpose of setting @rowcount to 10000 and then while @rowcount = 10000 is to avoid that one last iteration of the loop where there no longer are any rows to process... A lot of people use while @rowcount > 0... If you have nulls in the source table then you'd have to add 'and s.name is not null' to avoid an infinite loop...

    I prefer explicit transactions to relying on implicit ones, in older versions I used to name the transaction with a string something like '<table> <n> rows updated' which used to let me keep track of where the batch was at, but I could no longer find the transaction name in SQL2K, so I stopped doing that...

    I thought that an implicit transaction would be started by the first update call and then not committed until either a commit tran or until the end of the batch... The explicit transaction means that you can kill the job when you are sick of it, and run it again the next night. It would carry one where it left off rather than start from scratch...

    In this case DTSBCP sounds like a more attractive option, since you could even bcp into a new table to allow the old destination table to still be accessed while you're creating the new one...

    Cheers
    Twan
  10. gaurav_bindlish New Member

    Twan,

    you'll need to commit every time only in the cases where IMPLICIT_TRANSACTIONS setting is ON for the connection. If this is not set, every update will be comitted as soon as it completes execution and ypu'll still have the flexibility of killing the process in the middle of completion.

    I agree with TWAN on the comments on fill factor. When data is being inserted in table, if all the data is not being inserted in the last page, it is not advisable to keep fill factor 100 as that will lead to both external and internal fragmentation. For details on the same refer tohttp://www.transactsql.com/Articles/fragmentation.html.

    HTH.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  11. Twan New Member


    ah ok, I see the default is OFF for OLEDB or ODBC unless ANSI_DEFAULTS is set. I think that I would still use explicit transactions as it makes the intention clear...

    must admit that the default behaviour is different thatn what I expected... I expected that if you had a procedure with say two updates and the second one fails, then the procedure rolls back, but judging from this only the second statement would roll back unless transaction were explicitly defined or implicit_transactions was on...

    Cheers
    Twan
  12. gaurav_bindlish New Member

    That's correct!

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  13. Slart New Member

    Guys,

    Thanks again for the help. I am learning quite a bit. I ended up using an update similar to what twan posted. It processed 14x faster than the cursor I had written, and it also gave me the ability to measure progress & stop/start if needed. Thanks!


    One question about fillfactor: I am updating a column that I defined as char(25) (as opposed to varchar). Currently it has null values. Does sql save space on fields that have null values even when they are of the char datatype?

    In this case is there a reason not to do fillfactor=100?

    Thanks,

    Steve Sadler
  14. gaurav_bindlish New Member

    Depends on the ANSI_NULLS settings. If its ON, space is saved. If OFF, no space is reserved. Fillfactor is not ment to leave spaces for columns. It is meant to leave free space on data pages so that additional rows can be inserted.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  15. Slart New Member

    If it seems like my question keeps changing, it's because I finished the first update and now I'm doing another, and the rules are different.

    I will not be inserting any rows to this database (this is actually source data for another database). It does currently have nulls in a char(25) field, and I will be filling them with 25 byte strings. Ansi_nulls is set to

    Is there a way to preallocate the space for those 25 bytes?

    I have to reindex the data again (to a different index key) before I do this second update. Considering I wont be adding any row, but I will be adding space to each row (and taking into account your mentioning that this is not the purpose of fillfactor), will a lower fillfactor do me any good at all?

    Thanks again,

    Steve


  16. Slart New Member

    BTW Ansi_Nulls is set to on.
  17. gaurav_bindlish New Member

    What do you mean by preallocation? If ANSI_NULLS is ON, the space is there. There is a good article in INSIDE SQL SERVER 2000 about reading the content of a page.

    Having a lower fill factor when the space used by a row is incresing definately helps. if there is space on the page, the rows can expand and so there will be less splitting of pages and also the internal and external fragmentation will be less.

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
  18. Slart New Member

    Thanks again for your (everybody's) help. I think I'm good to go.

    Steve Sadler

Share This Page