SQL Server Performance

IO requests taking more than 15 seconds...

Discussion in 'Performance Tuning for Hardware Configurations' started by gsd1900, Jul 2, 2007.

  1. gsd1900 New Member

    HP ML570,Single HP P400 contoller 512MB Array Accelerator set to 100% Write

    System 2 SAS 15K drives Raid 1 All System Files, SQL Server binaries
    Backup 4 SAS 10K drives Raid 5Backups for all databases
    Data 8 SAS 15K drives Raid 10 Data and log files for most databases
    TempDB 2 SAS 15K drives Raid 1 Data and log files for tempdb
    Log 2 SAS 15K drives Raid 1 Single log file for biggest, busiest datbase

    We getting following error during off peak hours (no connected users) (!)
    when data migration process runs (lots of selects * from x with (nolock) with JOINs.. couple of million rows total)

    SQL Server has encountered xxxx occurrence(s) of IO requests taking longer than 15 seconds
    to complete on file [T:TempDB empdb.mdf] in database [tempdb] (2).

    Where xxxx is anywhere from 1 to 8000...

    While this is happening we are also seeing spikes on TempDB drives' perfmon sec/write and sec/read counters up to 60+ seconds.

    If we run the same data migration process during peak hours We don't receive any errors (!) ...
    Also, We don't see any signficant spikes in sec/write sec/read on TempDB drives.

    What could be causing this ?

    I can think of following:
    - Drives are getting "stuck" or going to sleep / HP surface scan being on ... during idle, off peak hours
    - SQL engine executing queries under a different execution plan (i.e. using hash join that uses tempdb) during off peak hours
    - Hardware problem

    HP Diagnostic tools are not reporting any problems with drives/controller.
    Also, No Anti Virus client, Symantec Backup Agent installed but not processing at that point.
    Newest drivers have installed been for controller and all drives.

    Best solution would be to add 2+ drives to TempDB, but there are no more empty drive bays..

    I'm thinking of these possible solutions in no particular order:
    - Add extra controller for Log, TempDB and Backup drives
    - Move TempDB to Data drive (alltogether and possibly assign 2 TempDB drives to Data)
    - Move Tempdb log to Data Drive, leave TempDB data files on TempDB drive
    - Move Tempdb data files to Data Drive, leave TempDB log files on TempDB drive
    - Move Tempdb log to System Drive
    - Consider External Enclosure add more drives to TempDB
    - Consider RamDisk for TempDB (upgrade to OS required)
    - Consider Solid State Disk ( SSD ) for TempDB (Are there any with SAS interface yet ?)
    - Replace TempDB drives (drives are bad?)
    - Replace Array Controller (controller is bad?)
    - Trace Flag -T1118 (There are already multiple data files in tempdb, one for each processor core)

    I have already read Microsoft recommendations (KB articles and more, listed in previous threads ...)

    Anything else I should check for ? Any additional suggestions ?

    Thank you for your help.
  2. techbabu303 New Member

  3. satya Moderator

    This error is reported by sql server to indicate that an IO operation has been issued, and it has taken over 15 seconds for that operation to return through the IO subsystem. The problem is certainly below SQL server in the IO stack, but the location can't be pinpointed without further diagnostics.

    The KBA should help you to assesshttp://support.microsoft.com/kb/897284/en-us &http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx for a good understanding on IO.


    Satya SKJ
    Microsoft SQL Server MVP
    Writer, Contributing Editor & Moderator
    http://www.SQL-Server-Performance.Com
    This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
  4. joechang New Member

    i really think the 1 tempdb per core rule by itself is one of the really dumb rules,
    yes, i know about the temp object creation bottleneck
    but applying this rule where all tempdb reside on 1 pair of drives is really stupid, people write rules really need to be careful

    for now, i would go to 1 tempdb date file
    or possibly 2, 1 in the current location, 1 on the backup drives

Share This Page