SQL Server Performance

Merge performance on Very Large Tables

Discussion in 'SQL Server 2008 T-SQL Performance Tuning' started by zerolan, Jan 14, 2011.

  1. zerolan New Member


    To all you DB/SQL experts out there - I hope you can shed some light on my problems.

    I'll try to keep it short.
    The DB has one very large table: about 60m rows, 11GB of data, 2.9GB of indexes.
    Each day new data has to be merged into that table. I do this in 64MB batches, which I bulk insert
    into a temporary table and then merge into the large target table.
    Each batch contains roughly half a million rows, some of which are new to the DB (resulting in an insert)
    and some of which already exist (resulting in an update).
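    For reference, the load step looks roughly like this (just a sketch, not my literal script; the file path, options and exact column list here are placeholders):

    -- staging table mirroring the import file
    CREATE TABLE #tmpimport
    (
        URLExLo   bigint        NOT NULL,
        URLExHi   bigint        NOT NULL,
        FullURL   varchar(1024) NOT NULL,
        [File]    varchar(512)  NULL,
        Domain    varchar(1024) NOT NULL,
        TimesSeen bigint        NOT NULL
    );

    -- one ~64MB batch file, roughly 500k rows
    BULK INSERT #tmpimport
    FROM 'D:\import\batch.dat'
    WITH (TABLOCK, BATCHSIZE = 500000);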

    I do the operation above via a MERGE joining on 2 bigint columns (indexed in both the temporary table and the large target table).
    The issue is simply that the time the merge statement takes to complete grows as the DB grows, and I don't mean
    by a little bit: it grows a LOT.
    Starting with an empty DB, a merge as described above takes about 11 seconds. Now, at the size described above, it takes
    over 10 minutes(!). I'm aware I can't keep insert times constant, but this is a vast drop. I have already put the table onto its own filegroup spanning
    4 disks; no real difference.
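    The filegroup setup, for reference (database name, logical file names and paths are made up here):

    ALTER DATABASE MyDb ADD FILEGROUP fgFullURLs;
    ALTER DATABASE MyDb ADD FILE
        (NAME = FullURLs1, FILENAME = 'E:\data\FullURLs1.ndf')
        TO FILEGROUP fgFullURLs;
    -- ...plus one file per remaining disk; the table was then rebuilt ON fgFullURLs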

    I found out that it seems to be the inserts of new rows within the merge that cause the performance issue.
    When I run the same batch twice, or any other batch which I know has already been processed, it only takes about 2 minutes (as all records already exist
    in the db and therefore only updates need to be performed).
    I also set the database files to autogrow (by 10%) so the table doesn't constantly grow in 1MB steps, and the recovery model is set to simple to keep the logfile small (both shown as statements below).
    I also checked the query plan: no table scans are done, but the plan looks very complex for a fairly simple merge.
    The three most expensive operators are "Table Spool", "RID Lookup" and "Index Seek".
    It's running on a fairly beefy machine (3GHz, 8GB RAM).
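    The autogrow and recovery settings mentioned above, as statements ('MyDb' and the logical file name are placeholders):

    ALTER DATABASE MyDb SET RECOVERY SIMPLE;
    ALTER DATABASE MyDb MODIFY FILE (NAME = FullURLs1, FILEGROWTH = 10%);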

    Am I doing something fundamentally wrong, or do I simply have to accept that SQL Server becomes very slow at this size? (I rather believe it's the former.)
    I mean, adding 500k new rows to a 60m-row table shouldn't take 11 minutes; it's only about 60MB of data to add.

    Any ideas?


    I look forward to your responses
    ZL

  2. FrankKalis Moderator

    Welcome to the forum!
    You're correct, it should almost never take that long.
    Can you please post the table structure & indexes and the statement you are using? The mention of RID Lookup and Table Spool suggests that there is more going on than a simple merge on two bigint indexes.
  3. zerolan New Member

    Hi Frank,
    Thank you for your reply!
    The table has 10 columns: bigint, int, datetime, datetime, varchar(1024), varchar(512), bigint,
    bigint (called URLEExLo), bigint (called URLEExHi), bigint.
    The clustered primary key is on the int identity column and is not really used. 2 of the bigints (URLEExLo and URLEExHi) are indexed individually and form the key
    used in the ON clause of the merge. Those 2 bigints (something like a checksum which comes in with the data) are the key used to identify
    whether the data already exists and needs an update, or whether a new row needs to be inserted.
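    Put together, the structure is roughly this (a sketch; the names of the columns I didn't spell out are taken from the merge below or guessed):

    CREATE TABLE FullURLs
    (
        TimesSeen  bigint        NOT NULL,
        Id         int           IDENTITY NOT NULL,   -- clustered PK, not really used
        SeenFirst  datetime      NOT NULL,
        SeenLast   datetime      NOT NULL,
        URL        varchar(1024) NOT NULL,
        [File]     varchar(512)  NULL,
        DomainExLo bigint        NOT NULL,
        URLEExLo   bigint        NOT NULL,
        URLEExHi   bigint        NOT NULL,
        DomainExHi bigint        NOT NULL,
        CONSTRAINT PK_FullURLs PRIMARY KEY CLUSTERED (Id)
    );
    -- the two merge-key columns, indexed individually
    CREATE INDEX IX_FullURLs_URLEExLo ON FullURLs (URLEExLo);
    CREATE INDEX IX_FullURLs_URLEExHi ON FullURLs (URLEExHi);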
    Yes, you are so right, I was thinking the same: the query plan looks way too complex imo for a merge like this one:
    merge FullURLs as target
    using #tmpimport as source
       on source.URLExLo = target.URLEExLo and source.URLExHi = target.URLEExHi
    when matched then
       update set SeenLast = CURRENT_TIMESTAMP, target.TimesSeen = target.TimesSeen + source.TimesSeen
    when not matched then
       insert ( TimesSeen, SeenFirst, SeenLast, URL, [File], URLEExLo, URLEExHi, DomainExLo, DomainExHi )
       values ( 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, source.FullURL, source.[File], source.URLExLo, source.URLExHi, fnMD5ExLo( fnMD5Ex(Domain) ), fnMD5ExHi( fnMD5Ex(Domain) ) )
    output $action into #tempresult;
    This is the original query: fairly fast if it only updates (although that is also getting slower with table size), but when the inserts hit, the whole thing goes downhill, time-wise.
    Thanks!
    ZL
  4. satya Moderator

  5. Misccold New Member

    @satya what does that have to do with anything? He/she is talking about a MERGE INTO query.
    What's the solution? My merges are too slow as well.
  6. satya Moderator

    I was referring to the best-practice setup for merge replication in such a highly volatile database. However, if it is the INSERT operation that takes a long time, that is the expected behaviour there: merge replication creates triggers on published tables, so the insert and the replication trigger fire at the same time.
    Have a look in Books Online at sp_mergecleanupmetadata to see if that makes things any better.
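    A quick way to check whether any such replication triggers exist on the table (object name taken from the posts above):

    -- lists all triggers defined on the target table
    SELECT name, is_disabled
    FROM sys.triggers
    WHERE parent_id = OBJECT_ID('dbo.FullURLs');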
  7. FrankKalis Moderator

    I might be wrong, but the OP does not mention replication at all. [:)]
  8. Misccold New Member

    Please let's focus on the problem at hand... MERGE INTO queries get really slow; what are the "tricks" to optimize them? If the fields you merge on are indexed, is that a good thing or a bad thing... You know, that sort of stuff.
  9. Misccold New Member

    The doc is here: http://technet.microsoft.com/en-us/library/cc879317.aspx
    • Create an index on the join columns in the source table that is unique and covering.
    • Create a unique clustered index on the join columns in the target table.
    What do they mean by "join columns"? Do they just mean all the columns used in the USING, or the ones in the ON?
    I don't understand.
  10. satya Moderator

    My bad, apologies [:)]... incorrect assumption.
    Anyway, have you had a chance to look at the execution plan to see how and which indexes the query is using? The key to MERGE performance is the indexing and any join hints. See what BOL says:
    "Because the MERGE statement performs a full table scan of both the source and target tables, I/O performance can be affected when using the TOP clause to modify a large table by creating multiple batches. In this scenario, it is important to ensure that all successive batches target new rows. For more information, see Optimizing MERGE Statement Performance."
    Also have a look at whether any triggers are fired, as they are active within INSERT/DELETE operations.
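    On the "join columns" question above: those are the columns in the ON clause of the MERGE. For the tables posted earlier, the guideline would translate to something like this (a sketch only; since the target already has its clustered PK on the identity column, a unique composite nonclustered index would have to do there, assuming the pair really is unique):

    -- source: unique, covering index on the ON-clause columns
    CREATE UNIQUE INDEX IX_tmpimport_URLEx
        ON #tmpimport (URLExLo, URLExHi)
        INCLUDE (FullURL, [File], TimesSeen, Domain);

    -- target: one composite index on the pair instead of two single-column indexes
    CREATE UNIQUE INDEX IX_FullURLs_URLEEx
        ON FullURLs (URLEExLo, URLEExHi);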
