SQL Server Performance

overnight performance degradation (indexes?)

Discussion in 'Performance Tuning for DBAs' started by rkinderDBA, Apr 7, 2006.

  1. rkinderDBA New Member

    I am wondering if anyone has ever seen index fragmentation cause a huge performance problem overnight.
    We experienced a strange situation where one morning (after some firmware updates) the performance of a given database was brought to its knees. Nothing had changed in the app or the database.
    The firmware updates were then backed out and nothing improved.
    We went ahead and rebuilt the indexes, and soon thereafter performance improved. The problem is that, in addition to this, some reboots occurred while changes were being attempted on the SAN, which was experiencing the high I/O.

    While some of the indexes were pretty fragmented, they were equally fragmented the days before, and we did not experience a gradual decline.

    We are now unable to reproduce it, but we are still struggling for an answer.
    Any suggestions or thoughts?
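
    For reference, a quick way to quantify how fragmented an index actually is (so a "good" day can be compared against a "bad" one) is DBCC SHOWCONTIG, the standard tool on SQL Server 2000/2005. This is a minimal sketch; the table name dbo.Orders is a placeholder:

    ```sql
    -- Scan density and logical fragmentation for one table's indexes.
    -- Run on comparable days and keep the output as a baseline.
    DBCC SHOWCONTIG ('dbo.Orders') WITH ALL_INDEXES, TABLERESULTS;

    -- On SQL Server 2005 the DMV equivalent gives the same information:
    SELECT OBJECT_NAME(object_id) AS table_name,
           index_id,
           avg_fragmentation_in_percent,
           page_count
    FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED')
    ORDER BY avg_fragmentation_in_percent DESC;
    ```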

  2. satya Moderator

  3. rkinderDBA New Member

    Because production utilizes a SAN, there is no equivalent test system.

    I do understand that fragmentation will degrade performance, but could it happen overnight with only the same fractional increase that had been occurring before?

  4. satya Moderator

    Have you checked for any spurious activity during this upgrade, such as a sharp increase in SQL Server activity, or any virus or spyware? ... that helps too.

    Satya SKJ
    Microsoft SQL Server MVP
    Contributing Editor & Forums Moderator
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  5. joechang New Member

    While index fragmentation can have an effect, there is no solid evidence provided here that the cause is index fragmentation.

    Firmware does occasionally have this effect as well, but I will set that aside for now.

    The most common cause of a sudden change in behavior is a change in the execution plan.
    I suggest you take Profiler traces and parse them for average CPU and duration for the top procedures.
    If one or a few show drastic changes, then that is the probable cause.

    You can use my SQL Clone tool to snapshot the database statistics on a daily basis,
    then bring up the execution plans for the problem queries on good and bad days.
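
    As a sketch of parsing a saved Profiler trace for average CPU and duration, assuming SQL Server 2005's fn_trace_gettable and a server-side trace file (the path below is a placeholder):

    ```sql
    -- Aggregate a saved trace: average CPU and duration per statement text.
    -- Duration is in microseconds on SQL Server 2005 (milliseconds on 2000).
    SELECT TOP 20
           CAST(TextData AS nvarchar(256)) AS query_text,
           COUNT(*)      AS executions,
           AVG(CPU)      AS avg_cpu_ms,
           AVG(Duration) AS avg_duration
    FROM ::fn_trace_gettable('C:\traces\mytrace.trc', DEFAULT)
    WHERE EventClass IN (10, 12)   -- RPC:Completed, SQL:BatchCompleted
    GROUP BY CAST(TextData AS nvarchar(256))
    ORDER BY AVG(Duration) DESC;
    ```

    Comparing this output between a good day and the bad day shows quickly whether one or two procedures account for the change.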
  6. seanlo@jdg.co.za New Member

    I second joe's thoughts.

    I have, however, seen a database go to its knees when something goes wrong with one index.
    Even defragging the index did not help; only dropping and re-creating the index fixed it.

    Some sort of index corruption, but in your case that is highly unlikely.

    What SAN are you using? Did anyone check what was happening on the SAN at the time? If it is a CX of some sort, Analyzer can help if you have it installed on the SAN.
  7. rkinderDBA New Member

    I did look into the virus software, defragging tools, etc. and we have found nothing yet.

    We did run CHECKDB and CHECKTABLE on all tables (which is where I would expect any index corruption to be reported) and found nothing, so I was hoping to rule that out based on those results.
    I also have a brief Profiler trace from the time of the problem, in which I can see that the indexes are being used (in the execution plan).

    The stored procedures were recompiled close to the time of the index rebuilds, so I can look at this. How or what would go wrong with the execution plans out of the blue?

    It is an EMC Clariion SAN. I am new to the SAN and I have to admit that I do not know what CX refers to.

    Some are looking into confirming that the high disk I/O and network traffic was coming from SQL Server, but to some extent that is an assumption, because they only monitored the LUN associated with SQL Server.
    I will pull another trace and compare the numbers with what I have from the time of the problem to see whether the execution plans had a problem. Any insight on how this happens would be appreciated.
    Thanks very much for all of your responses!!
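
    For anyone following along, the corruption checks mentioned above look like this on SQL Server 2000/2005 (database and table names are placeholders):

    ```sql
    -- Check allocation and structural integrity of the whole database.
    DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS;

    -- Or check a single table and its indexes.
    DBCC CHECKTABLE ('dbo.Orders') WITH NO_INFOMSGS;
    ```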
  8. joechang New Member

    Every query can have many possible execution plans.
    Suppose there are two plans that are close in estimated cost, A and B.
    A is a known good plan; B has a slightly higher cost by the formulas used by the optimizer.
    In fact the true cost of B is much higher than the true cost of plan A,
    and a slight shift in statistics causes the optimizer to go with B instead of A.
    Hence your problem.

    If I recall correctly, an index defrag does not update stats (?);
    an index rebuild does.

    Always compare the estimated row count with the actual row count. If there is a consistently large discrepancy (>3x), consider increasing the sampling percentage.
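
    One way to see the estimated-vs-actual comparison Joe describes, assuming SQL Server 2000/2005 (the procedure, parameter, and table names are placeholders):

    ```sql
    -- Show estimated vs. actual row counts for each operator in the plan.
    SET STATISTICS PROFILE ON;
    EXEC dbo.usp_MyProblemProc @SomeParam = 42;  -- placeholder procedure
    SET STATISTICS PROFILE OFF;
    -- Compare the Rows (actual) and EstimateRows columns in the output.

    -- If estimates are consistently off by >3x, resample the statistics:
    UPDATE STATISTICS dbo.Orders WITH SAMPLE 50 PERCENT;
    ```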
  9. rkinderDBA New Member

    I understand what you are saying, but I have to ask: why would this happen overnight and not happen repeatedly, if that is really the case? Auto-update statistics is on; I queried for the last time statistics were updated on all tables, and nothing was updated between the good and the bad performance. And would it continually pick plan B, since it has a much higher true cost? (The same SPs are called over and over all day.)
    I'm not doubting the information at all, I'm just trying to fit it into what happened, one time... overnight...
    Also, after the indexes were rebuilt, one of us went to shrink the log; after waiting a very long time it was cancelled, and the next day the same operation finished in seconds (as we would expect). The difference was a reboot: the reboot that occurred after the hardware upgrade had not yet happened the first time.

    Also, I see no differences (I only sampled a couple of plans) between now and the problematic period, as far as the execution plans go.

    Thanks again...
  10. mmarovic Active Member

    Do you regularly shrink the database log or the database itself?
  11. rkinderDBA New Member

    No, this was just because of the log growth from doing the reindex.
    No, and um, no. :)
  12. joechang New Member

    More than likely the problem you described is execution-plan related,
    but there are multiple possible causes.
    Another has to do with parameter sniffing.
    MS claims this is good, but I maintain that you do not know whether it is good or bad.
    Anyway, look up the MS articles that talk about parameter sniffing.
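
    To illustrate, here is the classic workaround from that era: copying the parameter into a local variable so the optimizer cannot sniff its value and instead uses the column's average density. Procedure, table, and column names are placeholders:

    ```sql
    -- Sniffed version: the plan is compiled for the first @CustomerId seen,
    -- which may be atypical.
    CREATE PROCEDURE dbo.usp_OrdersByCustomer @CustomerId int
    AS
        SELECT OrderId, OrderDate
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId;
    GO

    -- Workaround: copy the parameter into a local variable; the optimizer
    -- then optimizes for an "average" value instead of the sniffed one.
    CREATE PROCEDURE dbo.usp_OrdersByCustomer_NoSniff @CustomerId int
    AS
        DECLARE @LocalId int;
        SET @LocalId = @CustomerId;
        SELECT OrderId, OrderDate
        FROM dbo.Orders
        WHERE CustomerId = @LocalId;
    GO
    ```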
  13. rkinderDBA New Member

    Wouldn't my execution plans vary, then, from the trace during the time of the problem to the execution plans I see now in test? I've only compared a couple of plans, and I suppose it could have been any one of them causing the rest to slow down while waiting on resources.

    I'm trying to think of a way to prove this theory, and I would think it would be comparing execution plans (assuming I had all SP calls in the duration of my trace, which is unlikely).

  14. ramkumar.mu New Member

    If you have a large number of inserts/updates on your table, then fill factor might be a factor affecting performance. A 90% fill factor (the default value) will not be sufficient if the table is subject to frequent and heavy inserts.

    If you do many inserts on your table, try reindexing with a lower fill factor, such as 80 or 70.
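
    A sketch of such a rebuild using SQL Server 2000-era syntax (table and index names are placeholders):

    ```sql
    -- Rebuild all indexes on a table with an 80% fill factor, leaving 20%
    -- free space per leaf page to absorb inserts between rebuilds.
    DBCC DBREINDEX ('dbo.Orders', '', 80);

    -- Or rebuild a single index:
    DBCC DBREINDEX ('dbo.Orders', 'IX_Orders_CustomerId', 80);
    ```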


    "It is easy to write code for a spec and walk in water, provided, both are freezed..."
  15. druer New Member

    "It happened overnight"... Is it possible that a large data insert/update/delete occurred that particular day that normally wouldn't, and that could explain the sudden change?
  16. MichaelB Member

    I have also seen performance go south for some major procs our system uses. The first thing I would have done would have been an UPDATE STATISTICS WITH FULLSCAN, which usually fixes sudden performance issues when index fragmentation doesn't look like the problem.
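
    That suggestion looks like this in practice (the table name is a placeholder; sp_updatestats is the sampled, database-wide alternative):

    ```sql
    -- Rebuild statistics for one table by reading every row, not a sample.
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

    -- Or refresh (sampled) statistics for every table in the database:
    EXEC sp_updatestats;
    ```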


    Michael Berry
  17. rkinderDBA New Member

    Thanks, I'll look into trying that should it ever happen again.
    Also, in response to the earlier message about large data inserts or modifications: no, we did check for that and found nothing out of the ordinary.
    At this point we are still attributing it to an issue with the firmware. We did take parameter sniffing and execution plans into account, but we actually have some 'robots' that hit the database continually with the same parameters, and those would have been the first thing to hit it after a restart.
    Thanks again.
  18. cmdr_skywalker New Member

    I am not sure exactly what happened, but sometimes when you upgrade the firmware, some vendors' hardware tries to re-initialize attached hardware during the BIOS POST. It may be that during this process a partition re-initialization introduced the high I/O (or fragmented your files in the process, including your server's index/data files). As a rule, don't upgrade the firmware unless you really need its fixes and you have a backup.

    May the Almighty God bless us all!
  19. cmdr_skywalker New Member

    Or your OS's HAL components are trying to adjust to the new firmware and are causing the high I/O.

    May the Almighty God bless us all!
  20. derrickleggett New Member

    Just an FYI. :) If you have a Clariion, you have a CX-series SAN from EMC; they are the same thing.

    I've seen several times where a bad statistic can cause all kinds of issues. Sometimes this is related to a fragmentation issue; sometimes it's because of rapid growth in the table's data size. Sometimes you only have to cross one percent for performance to go south. For general troubleshooting, it is sometimes good to just update the statistics and see what happens. If that works, keep narrowing down the tables until you find the exact ones causing the issue. You can then look at adjusting fill values. On one database we rolled out new, we actually set up a job to update the statistics every 30 minutes on five tables. They were growing extremely rapidly, and we had all kinds of issues until we implemented the scheduled updates.

    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
