SQL Server Performance

Replicate an update statement that touches many records

Discussion in 'SQL Server 2005 Replication' started by WingSzeto, Jan 17, 2008.

  1. WingSzeto Member

    We are using SQL 2K5 sp2 with push transactional replication method. The publisher and distribution db are on the same server. Recently we have to issue an update statement to update a date field, 'StartDate', with a value of '1-1-2008' in a tableA. Let say this table has 300,000 records and we need to update 250,000 of these records. StartDate is not an indexed field and the TableA only has a primary key. The update statement is something like that:
    update TableA set StartDate = '1-1-2008' where StartDate is null
    The outcome of this update statement is that it ran very fast on the publishing database but it took at least 2 or more hours to replicate the statements over to the subscriber database. During this time, the synchronized status is indicating " delivering transactions" but the delivered transaction per sec counter in my performance monitor shows zero . I don't know how long it took because I ran out of patience since it blocked other replicated transcations. I ended up removing that article, TableA and did a snapshot for it to address this problem.
    I believe if the update statement touched 250,000 records, the update statement being used in the replicated database would be 250,000 statements which it would use the primary key to do the update. Why is it so slow in the subscriber database? Is it because the StartDate field is not an index field so the replication process needs to use a table scan to deterimine the primary key for each of the 250,000 statements? It doesn't seem to make sense but I can't find a good reason for it. Please advice.
    wingman
  2. satya Moderator

    Is this a first push of data on the replication?
    WHat does execution plan states?
  3. WingSzeto Member

    No, tableA has been involved in replication process for over 1 year.
    The execution plan shows it is a clustered index scan on tableA.
  4. satya Moderator

    As per the rowcount they are too many to retrieve and also you might see estimated rowcount on the plan, my guess it may be due to the column queried in the WHERE clause may not be selective enough.
    By default a clustered index scan is generally faster than a standard table scan, as not all records in the table always have to be searched when a clustered index scan is run, unlike a standard table scan. Generally, the only thing you can do to change a clustered index scan to a clustered index seek is to rewrite the query so that it is more restrictive and fewer rows are returned
  5. WingSzeto Member

    Yes, I follow your idea. So if someone issued an update statement like I have shown without thinking about restricting the query or thinking intelliently, would the outcome be like what I experienced, very slow update-replication on the subscriber database?
    Actually I also want to clarify what exactly is happening in the replication process in this scenerio. When a statement like 'Update tableA set createdate = '1/1/08' issued at the publishing database, this same command will eventually be translated to "update tableA set createdate = '1/1/08' where pkc = @c1" (I am simplied this a little) because the update stored procedure already exists in the subscriber database. Since createdate is not an index field, it will trigger a clustered index scan to find the pkc. Since I have 250,000 of such update statements coming over to the subscriber database, this clustered index scan happens for every statement and that's why it took so long. Am I right or you have a different explanation?
    wingman
  6. ndinakar Member

    The clustered index scan is probably not because of your query but perhaps because of outdated statistics. See if you can reindex the tables. That might help.
  7. clarkbaker1964 New Member

    Statistics will be updated automatically if that database option is set.
    most likely you are missing an index for this column.
    generally it is good to have a "covering" index to assist the clusted index in finding the specific leaf nodes.
  8. satya Moderator

    It is recommended to run intermittent UPDATE STATISTICS on frequently used tables even though when you have AUTO UPDATE STATS set on that database.

Share This Page