SQL Server Performance

Timeouts

Discussion in 'Performance Tuning for DBAs' started by rerichards, Feb 8, 2005.

  1. rerichards New Member

    I have a table with a clustered primary key index with a 90% fillfactor and a second composite (two column) non-clustered index. The table has less than 300,000 rows. We have a separate archive table to keep the row count under 300,000.

    The table receives far more inserts and updates than reads.

    Performance monitor counters for the following are all well within accepted allowances:
    PhysicalDisk - %Disk Time - _Total
    PhysicalDisk - Current Disk Queue Length - _Total
    System - % Total Processor Time
    System - Processor Queue Length
    SQLServer: Access Methods - Page Splits/Sec
    SQLServer: Memory Manager - Target Server Memory
    SQLServer: Memory Manager - Total Server Memory
    SQLServer: Locks - Average Wait Time ms - _Total (always returns 0, never a lock)

    DBCC TraceOn(3605,1204,-1) - (no indication of dead locks)

    Daily, we delete from the table to an archive table, perform a transaction log backup, and then run a maintenance plan to shrink the database and reorganize the data and index pages.

    If we do not perform this daily routine, the application services communicating with the database time out at 30 seconds. As long as we perform this daily routine, in running a duration trace in Profiler, normal statement duration times for these inserts and updates are in the milliseconds. The 30 second command time out seems more than adequate.

    When we do not perform this daily routine we see the following in Profiler:
    1. Only the inserts are failing (timing out at 30 seconds), but not all. A service that fails on an insert, can turn right around and perform a successful insert in milliseconds the next time around. The same service that timed out when inserting can turn right around and successfully perform an update in milliseconds.

    2. Only inserts are failing (timing out). Updates do no fail.

    3. There is no gradual buildup in duration time. Every duration is at milliseconds and then all of a sudden a service times out inserting at 30 seconds.

    4. The rows size being inserted rarely exceeds 1MB and those that time out are often only several hundred bytes in size, if even that.

    5. The data files are set to automatically grow at 10%. The data file is 6GB and does not show any growth or even near the threshold of growing when the inserts are timing out (remember that some inserts continue to succeed in milliseconds and all updates are succeeding in milliseconds).

    6. The log files are set to automatically grow at 10%. The log file is 215MB and a transaction log backup is performed every 2 hours. Occasionally it does grow, but most of the time by the time the two hour backup comes around, it is around 50-60% used, or 40-50% free space (remember that some inserts continue to succeed in milliseconds and all updates are succeeding in milliseconds).

    With that lengthy background, do you have any insight on what may be at issue? When the timeouts occur we stop the application services, archive (delete some records from the table receiving the inserts), perform a transaction log backup, and then optimize by shrinking the database files and reorganize the data and index pages. Then we restart the application services and things proceed without any insert timeouts.
  2. mmarovic Active Member

    quote:The rows size being inserted rarely exceeds 1MB and those that time out are often only several hundred bytes in size, if even that.
    You mean one row more then 1 MB!? In that case you must have text/image column(s).

    I have a few ideas how to improve performance, but I don't see anything that can cause such a long insert.
  3. Chappy New Member

    Whilst shrinking the data files will optimise for size, I wouldnt recommend it. Since its inevitable that the database will grow once again beyond the shrinked size, its better to just not shrink it. This will save sql server from having to autogrow the data file shortly after you perform your shrink.

    If data is regularly being inserted and then archived out of the database, leave the database unshrinken. Youll soon find an optimal size where the database does not need to grow as frequently.

    Run an sql profiler and monitor data file growth. From what you say I suspect the timeout could possibly be due to the autogrow process. Alternatively you might have some badly chosen clustered indices which are requiring the data to be reorganised on disk in order to satisfy an insert, check this also.

    Rows have a max size of 8k. If you suspect the row size is 1MB, it is likely you are using BLOB fields as mmarovic says, but the BLOBS are stored external to the actual row data.

  4. rerichards New Member

    quote:Originally posted by Chappy
    Alternatively you might have some badly chosen clustered indices which are requiring the data to be reorganised on disk in order to satisfy an insert, check this also.

    How might I check this out, i.e., badly chosen clustered indices?
  5. joechang New Member

    how much free space is in the 6GB data file, even if its not near the threshold, once you get to less than 30% free space, it can be a pain to find free space, especially if you have blob fields.
    1. make sure the data file has atleast 30% free space
    2. make sure the log has free space,

    per chappy, the autogrow could cause the 30sec timeouts,
    also, are you really inserting 1MB objects?
    how long does the T-Log backup take?
  6. rerichards New Member

    quote:Originally posted by joechang

    per chappy, the autogrow could cause the 30sec timeouts,
    also, are you really inserting 1MB objects?
    how long does the T-Log backup take?

    1. Regarding the autogrow causing the timeouts, that is what I expected until I saw that updates were successful in milliseconds, and never failed, and there were intermittent insert successes, even though the number of insert timeouts far exceeded the number of insert successes. I thought during auto grow all events were suspended? Correct or not correct?

    2. Yes, there is a column of text data type.

    3. The T-Log backup usually takes 60-90 seconds. But the last time we had this timeout episode it definitely did not occur during a T-Log backup.

  7. Chappy New Member

    Ok really were just stabbing in the dark with this. To find out what is causing the timeouts is quite easy, just run an SQL Profile and watch activity on the server during a timeout

    If this doesnt show anything conclusive, run perfmon and monitor cpu, network traffic, disk queue length etc. Using perfmon and profiler you will get to the bottom of the problem
  8. mmarovic Active Member

    Have you kept default recovery interval? Do you use full or bulk logged recovery model? Do you use writetext for filling in text columns? Is your clustered primary key on identity column?
  9. rerichards New Member

    quote:Originally posted by mmarovic

    Have you kept default recovery interval? Do you use full or bulk logged recovery model? Do you use writetext for filling in text columns? Is your clustered primary key on identity column?

    Answers to your questions:
    1. Have you kept default recovery interval? Yes (0)
    2. Do you use full or bulk logged recovery model? Full, nightly full backups with t-log backup every 2 hours.
    3. Do you use writetext for filling in text columns? Yes
    4. Is your clustered primary key on identity column? Yes
  10. Luis Martin Moderator

    It seems that what Chappy said in his last post was missing.

    I stronly suggest to do what he recommend.


    Luis Martin
    Moderator
    SQL-Server-Performance.com

    One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important
    Bertrand Russell


    All postings are provided “AS IS” with no warranties for accuracy.



  11. mmarovic Active Member

    AS Chappy said wa are really stabbing in the dark. Do you have your blobs in row?
  12. rerichards New Member

    quote:Originally posted by mmarovic

    AS Chappy said wa are really stabbing in the dark. Do you have your blobs in row?

    I do not seem to understand what you mean by "blobs in row"? The text column does reside on the same data file [Primary] as the rest of the table.

    This is the DDL:
    CREATE TABLE [dbo].[IFSMessages] (
    [MessageID] [int] IDENTITY (1, 1) NOT NULL ,
    [DocumentID] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [MessageTypeID] [int] NOT NULL ,
    [MessageState] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [ActionType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [CreateDate] [datetime] NOT NULL ,
    [MessageBody] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [LastUpdated] [datetime] NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[IFSMessages] WITH NOCHECK ADD
    CONSTRAINT [PK_IFSMessages] PRIMARY KEY CLUSTERED
    (
    [MessageID]
    ) WITH FILLFACTOR = 90 ON [PRIMARY]
    GO

    CREATE INDEX [IX_CreateDate_MessageTypeID] ON [dbo].[IFSMessages](
    [CreateDate], [MessageTypeID]) WITH FILLFACTOR = 90 ON [PRIMARY]
    GO
  13. mmarovic Active Member

    Actually I should already know that you don't have text in row since you are inserting > 1 MB text. Not having myself experience with text columns, I tried to find articles about that topic but without luck. I tried to find if your timeout may be related to inserting large amount of texts at the moment data are flushed to the disk (after checkpoint). But it was really wild guess and I haven't found article that would support such theory.

    *

    If you do not have much updates on columns other then MessageBody you should put fillFactor = 0 on pk. The only reason to put some free space on identity column I can imagine is when you have frequent updates changing the size of row (e.g. MessageState length changes from 5 to 45).

    *

    Have you tried to combine profiler and perf. monitor to figure out what is going on? (as alredy suggested)
  14. rerichards New Member

    quote:Originally posted by mmarovic

    Have you tried to combine profiler and perf. monitor to figure out what is going on? (as alredy suggested)

    Since taking ownership of this issue, it has only occurred one time (because of our current workaround to optimize and archive). That one time I had Profiler running, which is how I was able to determine that updates were all successful, and inserts mostly failed but not always (as explained above). When it came to running performance monitor, Murphy's law took place. I thought I had them running but apparently I did not. But as I mentioned in the introductory message, outside of the timeout episodes, the counters I mentioned in the introductory message are all acceptable.

    I agree having a counters running with the accompanying Profiler trace, should reveal something more definitive than what I have to go by at present.

    And lastly, just to provide some clarity if needed, the text column never gets updated.
  15. mmarovic Active Member

    quote:That one time I had Profiler running, which is how I was able to determine that updates were all successful, and inserts mostly failed but not always (as explained above).
    quote:And lastly, just to provide some clarity if needed, the text column never gets updated.
    This is additional reason I think the issue can be related to filling-in text column values.
  16. rerichards New Member

    quote:This is additional reason I think the issue can be related to filling-in text column values.

    Do you have any suggestions as far as Profiler or Perfmon that might help to prove this theory the next time these timeouts occur. Currently, I have a duration trace (long running queries) in Profiler running almost 24x7. Any other ideas?
  17. mmarovic Active Member

    Like I said this is really wild guess. If you have plenty of time you could try to simulate it. Run batch of commands inserting texts, run other commands in parallel windows and while they are still running run checkpoint command in separate window. See if any insert waits untill checkpoint completes and how much time it takes. Not that I am optimistic but this is a way to test the theory. Of course everything on a test server.
  18. rerichards New Member

    Okay. The heavens seem to be smiling upon us. We just experienced a new timeout episode and when it came to monitoring, I was loaded for bear.

    It appears the timeout episodes we are experiencing occur when the data file grows. It is set to automatically grow by 10%.

    I believe this might explain why all the updates succeed during the automatic growth (at least I haven't noticed any updates timeout yet), and the inserts are mostly timing out, with an occasional insert succeeding.

    I was monitoring checkpoints and that seemed to be normal.

    The disk% time was very high during this automatic growth, timeout episode.

    No locking, blocking, or deadlocks.
  19. Chappy New Member

    &gt; Since taking ownership of this issue, it has only occurred one time (because of our current <br />&gt; workaround to optimize and archive). That one time I had Profiler running, which is how I was <br />&gt; able to determine that updates were all successful, and inserts mostly failed but not always (as <br />&gt; explained above). When it came to running performance monitor, Murphy's law took place. I thought<br />&gt; I had them running but apparently I did not. But as I mentioned in the introductory message, <br />&gt; outside of the timeout episodes, the counters I mentioned in the introductory message are all <br />&gt; acceptable.<br /><br />Wow Ive no idea what any of that means <img src='/community/emoticons/emotion-1.gif' alt=':)' /><br />Did you run profiler and perfmon or not?
  20. rerichards New Member

    [quoteDid you run profiler and perfmon or not?


    On this episode that occurred earlier today I was running both Profiler and Perfmon.

    Profiler was retrieving Errors,RPC:Completed, SQL:BatchCompleted, with columns of start/end time, duration, text data, event class, reads, writes, cpu.

    Perfmon was monitoring:
    SQLServer: Buffer Manager - Checkpoints pages/sec
    SQLServer: Databases - Data File(s) Size (KB)
    SQLServer:Locks(_Total) - Lock Timeouts/sec
    SQLServer:Locks(_Total) - Lock Waits/sec
    Physical Disk: % Disk Time
    Physical Disk: Avg. Disk Queue Length
    SQLServer: Access Methods - Page Splits/sec
  21. satya Moderator

    Like I refer in your other post:#
    If the timeouts occuring during the datafile expansion then keep the value to 20% rather than standard 10% or instead of %age set size to 500K or so.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.

Share This Page