SQL Server Performance

Updating a Range

Discussion in 'Performance Tuning for DBAs' started by larrykl, Dec 11, 2004.

  1. larrykl New Member

    Hi All,

    I need to do an update on a large number of records in a table. The records to update are determined by the value of a 'groupID' column in the table. Records are updated at varying times during the day.

    The problem I have is that one of these groups has gotten vary large, so the update query operates on a large number of records now. I believe this is causing locking/blocking problems. I figure that if I can break up the update into chunks, it will allow other updates on the same table to occur. In that same table, I have another column called 'rank'. This column has a unique value from 1 to N in it for every record in the group. There are over 5000 records in the large group. My new update query updates this large table in chunks now based on "rank>0 and rank<=500", "rank>500 and rank<=1000", etc. There are now 11 chunks. With this change, the query is executed 11 times but is twice as slow as the single query that updates all of the records in one shot. That's not bad but that brings me to my question:

    Will putting an index on the column "rank" help this query perform better? The rank column changes frequently. After I do the above mentioned updates whether it be in chunks or all at once, I then re-order the rank column. What can I do to help performance now that "rank" will be in the where clause of my chunky version?

    simplified example (old):

    update scoring set totalpts=(select num_correct from results where results.userid=scoring.userid) where groupid=@group_id

    simplified example (new):

    while @i<=@num_reps

    begin

    update scoring set totalpts=(select num_correct from results where results.userid=scoring.userid) where groupid=@group_id and rank>(@i-1)*500 and rank<=(@i*500)

    set @i=@i+1

    end

    Thanks in advance!

    Larry
  2. thomas New Member

    Why don't you just limit it with rowcount? Forget the where clause on rank, and the need for a new index, do it more simply, and you can chunk it by your rowcount setting. Something along the lines of (and this may not be 100% correct, but you'll get my drift..)

    declare @counter int
    set @counter = 0

    set rowcount 5000 -- chose to be a sensible chunk size

    while @counter < (select count(*) from totalpts where groupid=@group_id)

    begin

    begin tran --explicit transaction stops the log blowing too big if it's all done in one

    update scoring set totalpts=(select num_correct from results where results.userid=scoring.userid) where groupid=@group_id

    commit tran

    set @counter = @counter + 5000

    end

    set rowcount 0


    Hopefully this will do it in 5000 row batches until all your rows have been updated.

    HTH

    Tom Pullen
    DBA, Oxfam GB
  3. larrykl New Member

    That looks like a possibility but your implementation would continually update only the first 5000 each time, wouldn't it? There is nothing there to exclude the rows that have already been operated on.
  4. thomas New Member

    yes you may have to add an

    AND totalpts<>(select.. )

    to the update statement to exclude the ones which have already been done, which may slow it down. Might be worth a try though?

    Tom Pullen
    DBA, Oxfam GB
  5. Twan New Member

    Hi ya,

    yeah having an index on the combination ( groupid, rank, userid) will help. It does mean taking a hit when you update the rank column, but that's the price I guess...

    also rather than subselecting you could try joining

    update scoring
    set totalpts=num_correct
    from scoring
    inner join results
    on results.userid=scoring.userid
    where groupid=@group_id
    and rank>(@i-1)*500 and rank<=(@i*500)

    this would miss out those with no results at all, unless you use a left outer join

    Cheers
    Twan
  6. larrykl New Member

    Thanks for the tips!

    I think the extra query would hurt performance in the rowcount method.
    Twan, would a clustered index be in order here?
  7. Adriaan New Member

    Do you really need to store the total? You can always run a query to retrieve the current total based on the groupid (unless the underlying data gets archived at some point). SELECT queries do run quicker than UPDATE queries ...
  8. Twan New Member

    no I wouldn't use a clustered index for this, since this will prevent the possible use of an index scan, and a covered non-clustered index is the fastest way to select data

    Cheers
    Twan
  9. larrykl New Member

    Hi Adriaan,

    My example was just a simplified version showing how I planned to use the rank column. The actual query is a bit more complex.
  10. Adriaan New Member

    Larry,

    I can see how things can get complicated once you start using a rank column. My question was: why store an aggregate value, when you can derive it? Especially as the update action takes a lot of time.

    If you transform your update query to a select statement you may see it takes a lot less time than the update version.
  11. larrykl New Member

    Hi Adriaan,
    I follow you now. That's a good point. It brings up some interesting ideas as the architecture of my system has changed.

    I had a page that was being dynamically generated in ASP and calling an SP to return the data. I figured at that time, for performance reasons, it would be better to do the update periodically to get the total and that the display page which is accessed quite frequently would then simply call the the SP which ultimately did a select with no computation to display the data including totals, etc.

    For performance reasons, I changed my design to dynamically generate the same page to a static file and then display that static file until the data was updated again, essentially caching the data to a file. With this scenario, I guess I may be able to avoid doing the update now since I could generate the total once when the static file is created. The problem is that the rank column is generated based on the total. I use the rank column for paging through records.

    It's a bit more complicated than that, I need to know weekly totals and a running total for each week. I think minimizing my updates would definitely be helpful though.
  12. thomas New Member

    Updates can be such a pain in the arse, especially if they're affecting a huge number of rows, like Adriaan indicated, they're best avoided at all costs. The benefits of hindsight, of course, are not available after systems have been designed, built and gone live, of course..

    Tom Pullen
    DBA, Oxfam GB

Share This Page