SQL Server Performance

Why does Update Stats with SAMPLE take longer than FULLSCAN?

Discussion in 'SQL Server 2005 Performance Tuning for DBAs' started by DBADave, Jan 8, 2009.

  1. DBADave New Member

    We've been running UPDATE STATISTICS WITH FULLSCAN daily against databases that total 280GB. The small databases are no problem, but the largest database takes over 3 hours, and the total time for all databases ranges from 4 to 4.5 hours. We decided to switch to UPDATE STATISTICS WITH SAMPLE = 40, thinking this would cut the Update Stats time dramatically. That wasn't the case. So far, tests on two different servers showed SAMPLE = 40 taking longer than FULLSCAN. Why would this run longer than FULLSCAN? FULLSCAN is supposed to be the same as SAMPLE = 100.
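    For reference, the two forms I'm comparing look roughly like this (table name is just a placeholder):

        -- Current daily job: full scan of the table when building statistics
        UPDATE STATISTICS dbo.SomeLargeTable WITH FULLSCAN;

        -- What we switched to, expecting it to be faster
        UPDATE STATISTICS dbo.SomeLargeTable WITH SAMPLE 40 PERCENT;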

    Thanks, Dave

  2. gurucb New Member

    Do you have a clustered index on all tables? I remember we had this issue long ago, and somebody suggested creating clustered indexes on the tables to solve the problem.
  3. Elisabeth Redei New Member

    Hi,
    FULLSCAN is the same as SAMPLE 100 PERCENT, with the difference that with a SAMPLE rate SQL Server 2005 might adjust the rate upwards (to a sample rate that is high enough to create useful statistics), and if the requested rate produces more values than are needed, it "tries to match the requested sample amount". I'm not 100% sure how it works in 2005, but in 2000 it would actually revert to FULLSCAN if the sample rate exceeded a certain threshold (for performance reasons). You can check which sample rate you ended up with using DBCC SHOW_STATISTICS ... WITH STAT_HEADER.
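    For example (substitute your own table and index/statistics name):

        -- The STAT_HEADER row shows Rows and Rows Sampled;
        -- compare the two to see the sample rate you actually got.
        DBCC SHOW_STATISTICS ('dbo.YourTable', YourIndexName) WITH STAT_HEADER;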
    Now in your case, the explanation probably lies in how the samples are gathered. The thing is, when you use a sample rate, SQL Server needs to pick an access path (heap/clustered index/non-clustered index), and it will try to find one that is NOT physically sorted on the first column in the statistics. It does this to get a more random sample and thus more accurate statistics. For FULLSCAN this is not an issue, so the lowest-cost access path is chosen.
    /Elisabeth
  4. DBADave New Member

    Well, that stinks. I will run some tests using STAT_HEADER, but it must be reverting back to FULLSCAN, and all of the internal checks for SAMPLE only add to the overall time. I was hoping to find a way to speed up UPDATE STATISTICS because eventually one of the databases in this application will be over 1TB. One thought I had yesterday was to modify our update stats script to check for indexes that are not used (rough sketch below). Those indexes would be added to an exclusion table and would then be skipped during the actual update stats process. If after a specific period of time (weeks) the indexes are still not being used, I will drop them, assuming the vendor agrees. Any other suggestions are appreciated.
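    Something along these lines is what I had in mind for finding the unused indexes (rough, untested sketch; the usage DMV only counts activity since the last SQL Server restart, so it would have to be checked over a long uptime):

        -- Nonclustered indexes with no seeks/scans/lookups recorded in this database
        SELECT OBJECT_NAME(i.object_id) AS table_name,
               i.name                   AS index_name
        FROM sys.indexes AS i
        LEFT JOIN sys.dm_db_index_usage_stats AS u
               ON u.object_id   = i.object_id
              AND u.index_id    = i.index_id
              AND u.database_id = DB_ID()
        WHERE i.index_id > 1                                   -- skip heaps and clustered indexes
          AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
          AND (u.index_id IS NULL
               OR u.user_seeks + u.user_scans + u.user_lookups = 0);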
    I'm not sure if clustered indexes are being used on all of the tables. I haven't checked. It's a vendor application (accounting system) so there's not much I can do in terms of adding clustered indexes.
    Thanks
  5. Elisabeth Redei New Member

    Hi,
    Yes, I guess if you are close to it, it would do :) Perhaps it's the distance between me and your database, but I think the way it works is rather clever (from a statistics accuracy point of view), though perhaps there is some intelligence missing in terms of deciding when to revert to FULLSCAN.
    Try to identify which indexes/tables are involved, look at whether clustered indexes are there or not, and check the sample rate. It would be interesting to know what correlations you can find.
    /Elisabeth
  6. DBADave New Member

    We write the start time and duration of update statistics per table to a logging table, so from that list I can guess which indexes are taking the longest. Here are today's top durations: 3 hours 17 minutes, 35 minutes, 27 minutes, 24 minutes, 16 minutes, 15 minutes and 14 minutes.
    That's 5.5 hours for 7 tables in 18 databases. The table that took over three hours is 68GB with the following breakdown.
    Rows = 25,227,589
    HEAP - 65.5GB RowModCtr = 167,651,746
    Non-clustered Index #2 - 853MB RowModCtr = 612,182
    Non-clustered Index #3 - 831MB RowModCtr = 1,647
    Non-clustered Index #4 - 651MB RowModCtr = 1,647
    PK Non-clustered Index #1 - 379MB RowModCtr = 697,715
    One important item to note: Auto Update Statistics and Auto Create Statistics are OFF. This was a vendor requirement. They said keeping these options ON has caused performance problems, but I suspect that is based on pre-2005 behavior. In any case this is their requirement, so there is not much I can do unless I want to void their support of the application by changing these settings.
    Should I only run Update Statistics if RowModCtr is over a certain amount, or is that not a good idea?
    Thanks, Dave
  7. gurucb New Member

    From Books Online:
    The SQL Server 2005 Database Engine makes sure that a minimum number of values are sampled to guarantee useful statistics. If the PERCENT, ROWS, or number option causes too few rows to be sampled, the Database Engine automatically corrects the sampling based on the number of existing rows in the table or view. At least 1,000 data pages, approximately, are sampled. If the PERCENT, ROWS, or number option creates more values than are needed for a useful sample, the Database Engine tries to match the requested sample amount. However, because samples are taken by scanning all data pages, the actual sample size may not be exactly the same as the amount specified.
    SQL Server scans more than the specified sample to create representative randomness in the data. If the table is a heap (with no clustered index), the scanning becomes random due to fragmentation of the table, and SQL Server may not be able to do read-ahead reads; but if there is a clustered index (in this case there seems to be none), read-ahead reads can be used.
    Yes, updating statistics based on rowmodctr is good, and you may also want to check colmodctr in SQL Server 2005 and asynchronous auto update statistics (of course, only if the vendor agrees).
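    A rough sketch of that kind of check against the legacy sys.sysindexes view (in 2005 rowmodctr is only an approximation, and the 1000-row threshold here is just an example):

        -- Tables/indexes with the most modifications since their statistics were last updated
        SELECT OBJECT_NAME(si.id) AS table_name,
               si.name            AS index_name,
               si.rowcnt,
               si.rowmodctr
        FROM sys.sysindexes AS si
        WHERE OBJECTPROPERTY(si.id, 'IsUserTable') = 1
          AND si.rowmodctr > 1000
        ORDER BY si.rowmodctr DESC;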
  8. Elisabeth Redei New Member

    Hi Dave,
    For any system with any significant workload, a mix of auto statistics and "manual" UPDATE STATISTICS is usually the best solution. I have yet to see a system where manual updates are necessary for ALL indexes.
    Assuming some reasonable UPDATE/INSERT/DELETE activity, the reasons why autoupdate might not work so well are:
    1. For large tables, autoupdate might not kick in frequently enough (the threshold is rowmodctr = 500 + 20% of total rows if the table has more than 500 rows)
    2. Again for large tables, the default sample rate might not be enough, because it gets relatively lower as the table grows
    3. You have "skewed" data in the column (e.g. LastName: Smith vs. Redei)
    4. The autoupdate process interrupts your user activity, causing performance problems. This can be alleviated in 2005 by turning on asynchronous updating with ALTER DATABASE ... AUTO_UPDATE_STATISTICS_ASYNC (see the snippet below this list). This is a quite common problem.
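    For point 4, the change itself is just the following (database name is a placeholder; the ASYNC option only has an effect when AUTO_UPDATE_STATISTICS itself is ON):

        -- Statistics still update automatically, but queries no longer wait for the update to finish
        ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;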
    So the vendor should be able to specify for which of the above reasons autoupdate should be turned off. If it is for the reason mentioned in 4, you can, as mentioned, get around it in 2005.
    I am surprised to hear they have turned off Autocreate statistics. It kicks in when you use a column in a search argument that doesn't have an index or statistics on it; SQL Server might then create statistics on the fly to improve performance. I don't think I have ever heard of this causing a performance problem. In 2000, you would identify your "missing indexes" by looking for statistics with names LIKE '_WA%' to find columns with autocreated statistics.
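    In 2005 you can list any auto-created statistics directly (with Autocreate turned off, as in your case, this should return nothing):

        -- Auto-created column statistics; their names typically start with _WA_Sys
        SELECT OBJECT_NAME(s.object_id) AS table_name,
               s.name                   AS stats_name
        FROM sys.stats AS s
        WHERE s.auto_created = 1
        ORDER BY table_name;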
    So finally, in answer to your question :p yes, compare rowmodctr/rowcnt to find your biggest movers and shakers, and also try to identify whether you have tables that match the descriptions in the list above. And I think your vendor should be able to give you an explanation as to why autoupdate cannot be used for ANY index.
    :)
    /Elisabeth


  9. DBADave New Member

    I agree with you completely about leaving AUTO UPDATE and AUTO CREATE turned on, but the vendor has insisted it should be off. They say they have seen performance problems due to statistics being created and updated while the application is being used, causing delays in their application. Needless to say, I am very skeptical and told them it sounds like they are basing their decisions on older versions of SQL Server, and that 2005 tuned statistics management to the point where this should not be an issue in over 97% of all SQL Server systems. Perhaps they are in the 3%, but I doubt it.
    I'm going to approach them again about using AUTO_UPDATE_STATISTICS_ASYNC, because they do contradict themselves by having one document indicating this option can be used if necessary (as a last resort), but I need it in writing that they will not discontinue support if I turn it on. All other documents from the vendor say to manage statistics manually.
    I will also inquire about adding clustered indexes where they do not exist (only on the larger tables), but I am not hopeful they will agree.
    Thanks, Dave
