SQL Server Performance Forum – Threads Archive
Parallel processing for updating the same table

Hi all,
In my DTS package there is one task that updates a large table from a smaller table. Currently this task takes a very long time, which is a real pain, so I am thinking of splitting it into two tasks and running the two updates simultaneously. Here's my approach: I am updating column 'Lcol1' in table L from column 'Scol1' in table S, with a join on Lcol2 and Scol2.

1. Move half the data from table S into a temporary table with the same structure as S (Task 1).
2. On completion of Task 1, start two tasks: one updates table L from table S, the other from the temporary table.
3. Truncate the temporary table.

I am specifying WITH (NOLOCK) in both of the parallel update tasks. I need inputs on whether this is feasible, or whether an UPDLOCK will prevent one query from running so that the second query has to wait.
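For reference, the two parallel tasks described above would look roughly like this (the names L, S, Lcol1/Lcol2 and Scol1/Scol2 come from the post; the temp-table name S_half is made up for illustration):

```sql
-- Task A: update L from the rows left in S
UPDATE L
SET    L.Lcol1 = S.Scol1
FROM   L
       INNER JOIN S WITH (NOLOCK)
           ON L.Lcol2 = S.Scol2;

-- Task B (running at the same time): update L from the half copied out
UPDATE L
SET    L.Lcol1 = T.Scol1
FROM   L
       INNER JOIN S_half AS T WITH (NOLOCK)
           ON L.Lcol2 = T.Scol2;
```

Note that WITH (NOLOCK) only affects the reads from S and S_half; both statements still take exclusive locks on the L rows they modify, which is exactly where the two tasks can block each other.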
Any other alternative suggestions are also welcome.
Are you updating columns that are referenced by foreign key constraints? Or covered by indexes?
No, there is nothing like that… and I am updating the same column, not two different columns…
Should also have asked whether column being updated is (part of) primary key. You say S is smaller than L – just a handful of rows, or lots and lots? If lots and lots, then is the Scol2 lookup column unique? and is it indexed? Same questions for Lcol2: is it unique? is it indexed?
The columns being updated are not part of the primary key and are not even unique…
S has around 11 million rows, whereas L has around 88 million rows…
Lcol2 and Scol2 are indexed for a faster join…
You haven’t answered the question whether the Scol2 column on S is unique. If it isn’t, then each row in L will be updated once for each match in S, and you will end up with unpredictable results. Are you already filtering out rows that already have the "new" value?

Other than that, updating 88 million rows will just take time. You might gain some time by doing this in smaller batches, for instance updating the first 10,000 rows, then rows 10,001 – 20,000, and so on. Another approach would be to copy the L rows with the new value for the column into a temp table, then truncate L and insert the data back from the temp table.

Is this a production system? If it is still in development, you might consider moving the changing column to a child table, either 1:n so you have a history of the changes, or 1:1 so you can delete and insert the data on a small table.
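The copy/truncate/reinsert alternative might be sketched like this, assuming for illustration that L consists only of the two columns named in this thread (a real version would carry all of L's columns and should run inside a transaction; like the straight UPDATE, it also assumes each Lcol2 value matches at most one row in S):

```sql
-- Build a staging copy of L with the new Lcol1 values applied
SELECT COALESCE(S.Scol1, L.Lcol1) AS Lcol1,  -- keep the old value where there is no match
       L.Lcol2
INTO   L_staging
FROM   L
       LEFT JOIN S ON L.Lcol2 = S.Scol2;

-- Replace the contents of L; TRUNCATE is minimally logged
TRUNCATE TABLE L;

INSERT INTO L (Lcol1, Lcol2)
SELECT Lcol1, Lcol2
FROM   L_staging;

DROP TABLE L_staging;
```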
Scol2 is not unique… rather, the combination of Scol1 and Scol2 is unique…
I have tried batch updates, but they did not give me much of a performance boost…
Actually my doubt is whether there will be any kind of locking that prevents one task from updating while the other is running…

Moreover, I am updating the same column in the same table, but from two different tasks…
If (SCol1, SCol2) is unique, then the join should probably be on both fields, assuming that this composite key corresponds to the current (LCol1, LCol2) entries in table L. Were you testing multiple tasks doing batches? The idea is to use one loop for the batch updates, and forget about using multiple DTS tasks.
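A single-loop batch update could be sketched as follows. SET ROWCOUNT caps the rows touched per UPDATE on SQL Server 2000-era systems (which DTS implies); the 10,000 batch size and the not-yet-updated filter are illustrative:

```sql
SET ROWCOUNT 10000;              -- cap each batch at 10,000 rows

WHILE 1 = 1
BEGIN
    UPDATE L
    SET    L.Lcol1 = S.Scol1
    FROM   L
           INNER JOIN S ON L.Lcol2 = S.Scol2
    WHERE  L.Lcol1 <> S.Scol1;   -- skip rows that already have the new value

    IF @@ROWCOUNT = 0 BREAK;     -- nothing left to update
END

SET ROWCOUNT 0;                  -- restore the default
```

Each batch is its own transaction, so the log stays small and locks are released between batches, instead of one giant transaction holding locks across all 88 million rows.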
But don’t you think that updating with two parallel tasks will be even faster than updating in a loop…?
Two update tasks running in parallel on the same table? Sounds like it will cause a lot of overhead. When SQL can concentrate on one task at a time, it can assign double the resources to this one task, instead of half, and not have the overhead. 
Somebody help me out on explaining why smaller batches will be better … I’ll put my foot in my mouth when it comes to log file growth and such matters.