SQL Server Performance Forum – Threads Archive
Bulk Update Performance
In one of our applications, we need to update a huge dataset (approx. 10 million rows) across multiple columns. We do not have many indexes on the table, and we don't use an explicit Transaction (it may not look right, but we really don't need one). On good hardware (4 CPUs, 4 GB RAM) the operation takes approx. 10-15 hrs to complete. Are there any ways to reduce this time by half? Thanks for the help.
One reason is index fragmentation, which makes your update slow; a lack of useful indexes can be another. First run UPDATE STATISTICS on this table, then do the update by splitting the 10 million rows into separate batches; that will reduce the time. Thanks. Rushendra
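As a sketch of the first suggestion (the table name `dbo.BigTable` is hypothetical, standing in for the poster's table):

```sql
-- Refresh statistics before the big update so the optimizer has
-- current row counts and distributions (hypothetical table name).
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;
```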
Run the update in small batches, e.g. 10,000 rows at a time. Also, if point-in-time recovery of the data is not that important (assuming so from the fact that you don't use transactions), change the recovery model, or back up the transaction log after each batch. Gaurav
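A minimal sketch of the batching approach, assuming a SQL Server 2000-era system where SET ROWCOUNT limits rows affected per statement (on later versions UPDATE TOP (10000) is preferred). All table, column, and database names here are hypothetical:

```sql
-- Cap each UPDATE at 10,000 rows (SQL 2000-era technique).
SET ROWCOUNT 10000;

DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    UPDATE dbo.BigTable
    SET    Col1 = NewCol1,   -- hypothetical columns being updated
           Processed = 1     -- flag so the next batch picks different rows
    WHERE  Processed = 0;

    SET @rows = @@ROWCOUNT;

    -- Keep the log from ballooning between batches.
    BACKUP LOG MyDb TO DISK = 'D:\backup\mydb.trn';
END

SET ROWCOUNT 0;  -- reset to unlimited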
I think Gaurav is right. Setting the recovery model to simple should have a big effect, since it cuts I/O by quite a large degree (no logging). If you choose to back up the transaction log instead, use BACKUP; the DUMP command still works but has been deprecated.
Sorry chappy, it's just that even though the SQL 6.5 days are gone, the terminology I use in discussions is still influenced by the old versions. Thanks for the correction! Gaurav
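The recovery-model change suggested above might look like this on SQL Server 2000 or later (the database name `MyDb` and backup path are hypothetical):

```sql
-- Switch to SIMPLE recovery for the duration of the bulk update.
ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- ... run the bulk update here ...

-- Switch back when done.
ALTER DATABASE MyDb SET RECOVERY FULL;

-- Take a full backup afterwards to restart the log backup chain.
BACKUP DATABASE MyDb TO DISK = 'D:\backup\mydb_full.bak';
```

Note that switching back to FULL does not by itself re-establish point-in-time recovery; the full backup afterwards is what restarts the log chain.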
quote:Originally posted by ukothari …..We do not have too many indexes on the table and we don’t use Transaction……..Are there any ways to reduce this time by half?
- If you are updating any columns used by any of your few indexes, remove the indexes first and then recreate them after your updates.
- Remove CHECK constraints if you are sure the incoming data is ‘clean’.
- Using or not using an explicit TRANSACTION does not affect the performance.
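A sketch of the first two bullets, with hypothetical index and constraint names (SQL 2000-style DROP INDEX syntax):

```sql
-- 1. Drop secondary indexes on the columns being updated.
DROP INDEX dbo.BigTable.IX_BigTable_Col1;

-- 2. Temporarily disable a CHECK constraint if the incoming data is known clean.
ALTER TABLE dbo.BigTable NOCHECK CONSTRAINT CK_BigTable_Col1;

-- ... run the bulk update here ...

-- 3. Recreate / re-enable afterwards.
CREATE INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1);
ALTER TABLE dbo.BigTable WITH CHECK CHECK CONSTRAINT CK_BigTable_Col1;
```

Re-enabling with WITH CHECK makes SQL Server validate existing rows so the constraint is trusted again, at the cost of a scan.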
I am inclined to think a 10M-row update should take 1-2 hrs, assuming there are few indexes, the updates are not to the primary key or to unique indexes referenced by foreign keys, and the updates are not to FK columns.
This is based on a >500 MHz CPU and 200K CPU cycles per row (10M rows x 200K cycles = 2x10^12 cycles, which is about 67 minutes at 500 MHz).
Breaking it into batches would make it more manageable, but may or may not improve the overall time.
Estimate the total size of the 10M rows in this table and make sure the data file has more than that much free space, and that the log has about the same amount of space preallocated (unless you already know how much log space this operation takes).
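Preallocating can be done with ALTER DATABASE ... MODIFY FILE so the update doesn't stall on repeated autogrow events. The database name, logical file names, and sizes below are hypothetical placeholders:

```sql
-- Grow data and log files up front (hypothetical names/sizes);
-- pick sizes based on your estimate of the 10M-row footprint.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Data, SIZE = 20000MB);

ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_Log, SIZE = 20000MB);
```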
I am also concerned about a parallel execution plan, and about whether you are on a Xeon system.
If the execution plan shows parallel execution, try OPTION (MAXDOP 1).
If you are on a Xeon system, disable Hyper-Threading.
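The MAXDOP hint is applied per statement, e.g. (hypothetical table/column names):

```sql
-- Force a serial plan for this statement only; parallel UPDATE plans
-- can suffer from coordination overhead on large row counts.
UPDATE dbo.BigTable
SET    Col1 = NewCol1
OPTION (MAXDOP 1);
```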
Everyone suggested very valid software/design solutions to the problem. Perhaps there could be an I/O bottleneck as well? I would implement the changes suggested above first, then check the disk counters in PerfMon to see if there is a problem.
Just a minor clarification:
I believe there is always logging going on regardless of the recovery model (Full, Simple, Bulk-Logged); with the Simple model, however, the inactive portion of the log is reused after each checkpoint.
Hence instead of "no logging" it would be more correct to say "minimally logged."