Is it known that a transaction speeds things up? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Is it known that a transaction speeds things up?

We’re developing a conversion program to load data from a file-based version of our software to the new SQL-based version. The program is run with no one else accessing the database and with most if not all indexes on the various tables removed. (They get added at the end.) The first action is to make sure that there aren’t any rows in any of the target tables, so that we’re starting with a clean slate. The program processes 100 rows from the main table in the old system, generating data in memory for a number of related tables as well, then (using ODBC) inserts the rows into each table. Some tables will have more than one row per main-table row added. The sequence of adding data to the various tables is important because we need to learn the auto-assigned identity values and put them into foreign key columns (parents have to be added before children). This process is repeated until all the data has been added. If the process fails at any point, we abandon everything. It would have been silly to put a transaction around the whole exercise, as the log would grow quite large, and it’s easy enough to truncate all the tables (or delete all the rows if something prevents truncate table from working) at startup. There didn’t seem to be much value to using a transaction for each group of rows. ODBC’s autocommit is on by default; we weren’t doing anything explicit related to transactions (other than to have ODBC autocommit turned on). Someone suggested that we should expect to get better performance if we placed begin / commit calls around the part of the code that writes the data, and our testing showed that in fact this is the case. I was surprised by the extent of the speedup, as I would have thought that the writing of the new rows would dwarf the transaction overhead. The biggest chunk of overhead related to transactions (I’d think) would be writing the data needed to support rollback to the log, and that happens regardless of whether there are many individual transactions or one big one. Given that everything being done is an insert, there’s not much "before-image" data being written to the log at all. Is this a known optimization? I don’t remember seeing mention of it before, and I’ve read a lot. Of course, in most cases you want to use transactions to protect the data, but in this context it simply wasn’t necessary as we think it’s right to re-start completely so we could avoid having to add a lot of logic to the conversion that would let it skip work already (possibly only partly) done. (Any bug that caused a failure could have resulted in bad data being written to SQL, so that’s another reason to start over each time.) Would we have seen as big a speedup if everything being done was an update to an existing row? There, the data needed to roll back would be the previous values (a lot more information). Any comments would be appreciated. J. Merrill / Analytical Software Corp.
Hmmm. Auto commit causes each individual update insert to be commited. The transaction only commits at the end (when commit is issued). So presumably, you’re seeing a bulk commit being quicker that lots of individual ones. Seems to make some kind of sense. But am just postulating…
if you a block of write ops, insert, update delete that operate on a few rows each
each write op is an implicit transactions meaning it gets a write to the log before the next step
if you bracket the group with BEGIN TRAN/ COMMIT TRAN
the entire group gets one write to the log (maybe broken up) if you have a slow log disk, then this really helps
try grouping 10-1000 rows together, or until the log write hits 64K, after that there is limited gains see my papers (some where on this site) on detailing all of this