SQL Server Performance

Long running queries on a subscriber server slowing down the whole system

Discussion in 'SQL Server 2005 Replication' started by WingSzeto, Jan 14, 2011.

  1. WingSzeto Member

    We are using SQL 2008 as a publisher and have three subscriber servers, One is SQL 2005 and the other is SQL 2008. All SQL servers are standard version. We are using push transactional replication.
    The SQL 2005 subscriber is destinated for our Microsoft Access reporting application and the SQL 2008 subscriber is for our web application reporting. The SQL 2008 publishing server is our main OL . Recently the SQL 2005 subscriber had a few reports generated by the MS Access application created huge IO and ran for a long time (> 10 mins). They are the same reports with very slight different criteria. For some reason, these long running queries not only affected its server, the main server (publishing server) and the SQL 2008 subscriber also came to a haul which affects our web application servers use. This problem becomes a crisis.
    Obviously I will need to address this query first which I will definitely do. I am very curious of why the heavy activity on one subscriber server would affect others. Any input on this is very much appreciated.
  2. satya Moderator

    What is the snapshot schedule between the replication instances?
    Are you able to find out which process is long running and do you see any error or warning specific to this behaviour?
  3. WingSzeto Member

    Snapshot runs once a day at 3:30 am. But normally we don't have any snapshot to be run at all. If it does, it would be for a new table replication so I don't think it is my issue at all.
    Yes, I know which query is causing the problem and I am planning to address it. My original question is that I understand the long running query would affect the subscriber server (say C, SQL 2005) where the queries are running on, but why it would also affects the other subscriber (Say B, SQL 2008) and the publisher (Say A, SQL 2008). Let me clarify the scenerio here.
    SQL Server A (publisher/distributor) is our main OLTP which serves our Web Application. Server B(subscriber) is our main reporting server which serves the Web application reporting requests. Server C (subscriber) is another reporting server which serves our internal MS Access 2007 users. In my situation, server C is very busy due to long running queries from our Access users, I fully expect Access users would experience long reponses time. What I am puzzling with is why both server A and B also come to a haul. By the way, These long running queries also create huge IO.
    I checked the latency by looking at the Tracer Tokens in Replication Monitor, when the long running queries happens, the Distributor to Subscriber latency on server C will increase quite largely (between 15 sec and 1 min) which I can understand. The Publisher to Distributor latency (between 3 and 10 secs) on server C seems to be alright. It increases slightly but not in any alarming fashion. The same latency category on other servers is well within 3 to 5 secs.
    Can you explain why the activity on server C would affect server A and B? The only tie I can link them together is replication but I don't know how it would. Please give me some pointers.
  4. satya Moderator

    Are the publisher & subscriber on same physical server or each of them are physical seperated?
  5. WingSzeto Member

    The publisher db and distrbution db are on the same physical server and each subscriber is on its own physical server.
  6. satya Moderator

    As I guessed it right that publisher/distributor on same physical box causing resource hog, in this case the subscriber is on other box where the network connectvitiy is also important to ensure the same query works in optimized way. As referred above the underlying indexes & statistics of tables will have the knock on affect if they haven't been managed properly.
  7. WingSzeto Member

    just want to clarify that before the long running query happened, everything was fine. Publisher and distributor, and all subscribers perform efficiently. As soon as the long running queries were running on the SQL 2005 subscriber server for couple minutes, then all hell broke lose. Since each subscriber has its own distribution agent, if the 2005 subscriber server was super busy which affected IO and network, would it just slow down the distribution agent for that particular subscriber but not the other distribution agent(s) and log reader agent? As long as the log reader agent is not busy or having huge latency, why would one distribution agent affecting other distribution agent?
  8. satya Moderator

    Because everything is in loop, as the long running query waiting to finish causing others having similar affect too.
    As you said earlier what is that long running query cuaisng such an issue?
  9. WingSzeto Member

    Are there any settings in replication/agent jobs that I scan adjust to miminize or isolate this effect?
    I found out that the long running query is due to a different query exection plan. When I checked the query plan of that long running query yesterday, the plan showed using SEEK on the big tables compared using 'SCAN' during the problem time a week ago. So the long running query is no longer an issue but the problem now is why the plan changes. I do update statistics with full scan on all big tables every morning and updated statistics again during the problem time. I don't mean to change the subject on this thread but if you have a quick idea, I like to know how to make the optimizer to reverse back to the seek operation from a scan operation on a specific query in a production environment on the fly. Let assume I can't make change to the source query until two hours later.
  10. satya Moderator

    As you have identified the query, is it possible to see what indexes SEEK is used or if you beleive it is incorrect, You can specificy the hints for that query.
  11. RamJaddu Member

    Start trace / profiler check any blocking going on ... some times adding few missing index would sort out the problem.
  12. satya Moderator

    Dont you think running PROFILER on top of current performance problem will add some overhead, in this case why not take help of DMVs to obtain resource information.

Share This Page