SQL Server Performance

Updates: fastest way to update tons of rows.

Discussion in 'Performance Tuning for DBAs' started by Sparky, Oct 28, 2004.

  1. Sparky New Member

    I am getting slow times (3-7 days) to build a 2-3 million row table that is very wide (120+ fields). It has a surrogate int clustered index and various text/int fields. About monthly, I assemble this table from about 7 tables of relational data. The result is a flat 2D table, or flat file, that is used for list processing and export to external companies.

    I need to improve the speed of this operation, so I am redesigning the build process. I am toying with various methods. Which one I use depends on the efficiency of updates and inserts on very wide tables.

    1) I have noticed large slowdowns when attempting multi-row inserts from selects on joins across multiple tables. Should I split the inserts by source table (reducing joins) or batch-process the insert (10,000 recs at a time)?

    2) When doing random updates, is it faster to do single-row updates or multi-row updates from large select statements on wide tables?

    3) In a client-server scenario, it typically doesn't matter if you call a sproc that updates all the fields in a row, even if you only change half of them. But in my case, should I break single-row updates into pieces, including only the fields being updated?

    Basically, I am wondering about the dynamics of inserting into and updating wide tables. I am tempted to push the business rules out to a client app to save load on the SQL machine. But even if I do, insert/update efficiency is key.

    Thanks in Advance
  2. Chappy New Member

    Is it necessary to build the table into denormalised form? Maybe it's possible to create a view over the 7 tables and export that, saving a lot of disk write activity.

    Check your data types carefully; a byte saved here and there will improve page utilisation.

    Check the execution plan of your insert. Could any indices be added to help?
    Also, it's often useful to drop indices on large tables before a repopulation, and rebuild them after the operation.
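
    A minimal sketch of that drop-and-rebuild pattern, assuming a hypothetical nonclustered index `IX_WideExport_Region` on a hypothetical export table `WideExport`:

    ```sql
    -- Drop nonclustered indexes before the bulk load so each insert
    -- only has to write the data pages (names here are made up).
    DROP INDEX WideExport.IX_WideExport_Region

    -- ... repopulate WideExport here ...

    -- Rebuild afterwards in one sorted pass, which is usually far cheaper
    -- than maintaining the index row by row during the load.
    CREATE INDEX IX_WideExport_Region ON WideExport (Region)
    ```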

    You ask whether you should 'split the inserts by source tables'. Would this involve an initial insert from one table, then subsequent updates from the others? I would go for the batch insert option; I see no gain in doing an initial insert and then subsequent updates unless the criteria of the update are bogging down the query plan. It's difficult to recommend a specific strategy; a lot depends on your design, I think.

  3. Sparky New Member

    Chappy, good question. The data DOES need to be denormalized. We run numerous counts and computations off this table. It is more efficient to run reports on one table after all the business logic and conversions have been applied than to re-convert every time. Denormalized is faster for reporting. I know this is counter-intuitive, but it is closer to OLAP than OLTP, even though it is flat, not a cube. It doubly serves as an export table.

    Concerning inserts, one batch insert of 1000 records is always faster than 1000 single inserts, correct? I just want to make sure SQL doesn't behave unexpectedly with really wide tables. Right now I am doing the business logic in sprocs and inserting one record at a time. I wonder if I am getting a bunch of successive page splits due to the single-record inserts. Is there a way to easily see the KB per row?
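
    One rough way to get batched inserts is to loop over ranges of the surrogate key so each INSERT...SELECT covers a chunk rather than a single row. All table and column names below are invented for illustration:

    ```sql
    -- Hypothetical sketch: load in 10,000-row batches keyed on the surrogate id.
    DECLARE @lo int, @hi int, @max int
    SELECT @lo = 1, @max = MAX(CustomerID) FROM SourceCustomers   -- names are made up

    WHILE @lo <= @max
    BEGIN
        SET @hi = @lo + 9999

        INSERT INTO WideExport (CustomerID, Col1, Col2 /* ... */)
        SELECT c.CustomerID, c.Col1, o.Col2 /* ... */
        FROM SourceCustomers c
        JOIN SourceOrders o ON o.CustomerID = c.CustomerID
        WHERE c.CustomerID BETWEEN @lo AND @hi

        SET @lo = @hi + 1
    END
    ```

    Each batch is one set-based statement, so logging and lock acquisition are amortised over 10,000 rows instead of paid per row.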

  4. oneilldo New Member

    This SQL should show you the maximum number of bytes per row. Obviously, if you are using variable-length columns the results may not be entirely accurate.
    It might be worth looking into creating smaller tables (say, each a quarter of the width), then joining these four narrower tables back together on the key with a SELECT INTO to form the wide table that you require.

    -- Declared bytes per row for each user table (SQL 2000 system tables);
    -- variable-length columns count at their maximum length.
    SELECT so.name,
           SUM(sc.length)        AS NumBytes,
           8060 / SUM(sc.length) AS NumRowsPerPage  -- 8060 = max in-row bytes per 8 KB page
    FROM   sysobjects so
           JOIN syscolumns sc ON sc.id = so.id
    WHERE  so.type = 'u'
    GROUP BY so.name
    ORDER BY NumBytes DESC

  5. vaxman New Member

    It seems to me the problem is not creating the new table, but the query you use to generate it.

    3 days = 259,200 seconds, which works out to roughly 7.7 rows/second for 2 million rows. Even under the worst conditions, indexes, etc., SQL should be able to handle 1000+ inserts per second.

    Consider building a denormalized table of primary keys rather than data values then build your wide table from that in one go, or build it in stages where you can optimize individual queries.
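
    That keys-first staging approach could look roughly like this; the table and column names are invented, and the real build would have many more columns:

    ```sql
    -- Stage 1: a narrow table holding only the matched primary keys.
    -- This is where the expensive multi-table join logic runs, over small rows.
    SELECT c.CustomerID, o.OrderID, a.AddressID
    INTO   #Keys
    FROM   Customers c
    JOIN   Orders o    ON o.CustomerID = c.CustomerID
    JOIN   Addresses a ON a.CustomerID = c.CustomerID

    -- Stage 2: build the wide table from the key map in one go; each join
    -- is now a straight key lookup and can be tuned independently.
    SELECT k.CustomerID, c.Name, o.Total, a.City /* ... 120+ columns ... */
    INTO   WideExport
    FROM   #Keys k
    JOIN   Customers c ON c.CustomerID = k.CustomerID
    JOIN   Orders o    ON o.OrderID    = k.OrderID
    JOIN   Addresses a ON a.AddressID  = k.AddressID
    ```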

  6. Sparky New Member

    Thanks, oneilldo. I ran the query only to find I was mistaken about the table width. It turns out the table is narrower than some other tables. Though there are 120+ columns, they are fairly narrow, some taking only 1-2 bytes. This leads me to look elsewhere.

    Vaxman, I think you are right. I have found some indexing problems. As well, I think I am pushing too many business rules into one sproc. The rules are quite complicated, and I think I would be better off putting them on another box in a client application. Thanks to both of you.

    The only question that remains is the most efficient way to do bulk updates. I have posed this question in a SQL developer forum with no success, so I will ask it here. When connecting to SQL with a client app, is there a faster way to do lots of updates other than one record at a time through an update sproc?

    For example, I could use a disconnected recordset: I select the rows to update into a disconnected recordset, update them on the client, and then call UpdateBatch to complete the update. (In reality, this is a cursor with an ADO wrapper, and I have no idea how efficient it is. I don't know whether ADO really treats this like a SQL batch update or not.) Is this, or some other method, faster when updating a large quantity of data in a table?
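
    One alternative to calling an update sproc once per row is to stage the changed rows on the server (for example via a bulk load) and apply them with a single set-based UPDATE. A sketch, with all names hypothetical:

    ```sql
    -- #Staged holds the client's changed rows, e.g. bulk-loaded from a file.
    -- One UPDATE...FROM then applies every change in a single statement,
    -- instead of one round trip and one sproc execution per row.
    UPDATE w
    SET    w.Status  = s.Status,
           w.Segment = s.Segment
    FROM   WideExport w
    JOIN   #Staged s ON s.CustomerID = w.CustomerID
    ```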

    Thanks in advance.

  7. royv New Member

    In this situation I would write 10,000 records of data to a file from the client and then bulk insert the file into a temp table via a sproc, then insert into the destination table from the temp table. The reason for the temp table is that you may need to massage the data before inserting it into the destination table.
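
    A minimal sketch of that pattern; the file path, table, and column names are invented, and the field/row terminators would need to match however the client writes the file:

    ```sql
    -- Land the client-written file in a temp table first.
    CREATE TABLE #Incoming (CustomerID int, Col1 varchar(50), Col2 int)

    BULK INSERT #Incoming
    FROM 'C:\export\batch_0001.txt'                        -- hypothetical path
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')

    -- Massage/apply business rules against #Incoming here, then
    -- move the batch into the destination in one set-based insert.
    INSERT INTO WideExport (CustomerID, Col1, Col2)
    SELECT CustomerID, Col1, Col2 FROM #Incoming
    ```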

    "How do you expect to beat me when I am forever?"
