SQL Server Performance

Disk Queuing and Raid size calculation

Discussion in 'Performance Tuning for Hardware Configurations' started by Lance, Feb 18, 2004.

  1. Lance New Member

    I am trying to calculate the number of disks we need to support our databases.

    I currently have 2 arrays using RAID 5: 1 with 4 drives (H) and 1 with 3 drives (J).

    H stores tempdb, the data files, and the smaller backups.
    J stores the log files and the larger backups.

    I have been monitoring the system for over a year at 5-minute intervals. On average I noticed the following:

    H: disk queuing avg 16
    J: disk queuing avg 0.3
    H: reads per second avg 36
    H: writes per second avg 8
    J: reads per second avg 22
    J: writes per second avg 7

    Batch requests per second averaging 18
    Processor utilization avg 25%
    Processor queuing avg 0.7
    Memory page reads/sec avg 30

    This box is SQL Server only. It is in a 2-node active/active cluster. According to the formulas I used, I should have plenty of capacity:

    (disk reads/sec + (4 * disk writes/sec)) / number of disks in array
    H: (36 + (4 * 8)) / 4 = 17 operations per second

    This is well below the 100 to 125 IOs per second that these drives (36GB 10K RPM) should handle.

    Why am I seeing this disk queuing?

    We went ahead and added an additional GB of RAM, bringing the total to 2GB on each node. The new counters after 1 week are:

    H: disk queuing avg 5
    J: disk queuing avg 0.2
    H: reads per second avg 16
    H: writes per second avg 7
    J: reads per second avg 19
    J: writes per second avg 6

    Batch requests per second averaging 18
    Processor utilization avg 35%
    Processor queuing avg 1.5
    Memory page reads/sec avg 18

    I also need to calculate the array size and configuration for a new system. It seems to me that this formula tells me to buy a 1-disk system and I would have no problems.

    I am also still trying to figure out what was causing the queuing on the H drive.

    Are there other ways to estimate the number of drives needed in a system?

    Thanks in advance for any help or comments.
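    The RAID 5 arithmetic above can be checked with a small Python sketch. This is just the rule of thumb from the post (each logical write costs roughly 4 back-end IOs on RAID 5), plugged with the perfmon averages reported for the H: array; the function name is illustrative, not from any library:

```python
# Per-disk IOPS estimate for a RAID 5 array, using the common rule of
# thumb that each logical write costs ~4 back-end IOs (read data, read
# parity, write new data, write new parity).
def raid5_per_disk_iops(reads_per_sec, writes_per_sec, disks):
    return (reads_per_sec + 4 * writes_per_sec) / disks

# H: array averages from the post: 36 reads/sec, 8 writes/sec, 4 disks.
h = raid5_per_disk_iops(reads_per_sec=36, writes_per_sec=8, disks=4)
print(h)  # 17.0 - well under the ~100-125 IOs/sec a 10K RPM drive handles
```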
  2. Luis Martin Moderator

    I presume you have SQL 2000 with the latest service pack.
    About queuing on drive H:
    Run Profiler to find which queries take the most time, copy and paste them into Query Analyzer, and check the execution plan to find out if there is any optimization to do.
    You may also run the Index Tuning Wizard against those queries from Query Analyzer.
    If you already did these tasks, I mean all the optimization tasks, I guess the queuing is because of RAID 5.


    Luis Martin
    Moderator
    SQL-Server-Performance.com
  3. Lance New Member

    I am seeing queuing with so few IOs per second.
    I thought these drives could easily handle 40 IOs per second across 4 drives.
    The package is a purchased package, so there is not much tuning I can do.

    Any suggestions on what formulas to use to calculate the drives needed to support my new ERP system?

    Thanks.
  4. Luis Martin Moderator

  5. joechang New Member

    the avg disk queue length on a sql server data disk set should be relatively low most of the time, with occasional spikes from checkpoints.
    it is only necessary to keep the avg disk queue length low between those spikes; i would not worry about how high the disk queue gets during a spike, only that it clears up quickly.
    also, what are avg disk sec/read and sec/write? regardless of the queue length, if those values are low, then you have nothing to worry about.
    during off hours, you should use IOMeter to determine what your disks can actually handle, without relying on the live load, which only tells you what your current load is, not what the disks can handle
  6. Lance New Member

    Thanks for the links.
    I am using the same formulas.
    I am just confused how this low level of IO (17 requests per second for the H: array) is causing such major disk queuing. These drives are supposed to support 100 to 125 operations per second. The formulas indicate this level of IO should be easily handled by 1 drive and should not even be a problem for a 4-drive array.

    I am not sure these formulas will produce accurate results when I try to calculate the drive needs for my new system.

    I hope I am doing something wrong in the calculations; otherwise I have to recommend buying one drive for my new ERP system. :)
    I am trying to justify buying a new enterprise SAN for about 100K vs. upgrading our existing SAN for about 40K. The current numbers I calculate do not support the need for any type of SAN.

    Thanks
  7. Lance New Member

    The avg sec/read is 0.3
    The avg sec/write is 0.5
    These values look high to me.

    The values I used were averages from the data I collected over a year.
    The average value of my disk queue for H: was 16, 4 times the recommended value.

    Thanks
  8. joechang New Member

    oh my!!!
    how does such light disk activity cause such monstrous access times?
    just to confirm, these are the values as read from perfmon (time units in seconds and not milliseconds)?
    under light load, up to queue depth 1, avg sec/read or sec/write for random io across the entire disk should be in the range of 10ms: 3.3ms for rotational latency, 7ms for avg seek time.
    a good quality 10K SCSI disk is a good quality disk; i cannot believe that there is anything wrong with your disks (unless they are IBM Ultrastars from 1999-2001; throw those away, or use them as paperweights)
    something must be wrong with your host adapter or raid controller.
    whatever you buy, i would demand a guarantee on disk access times for queue depth 1 per physical disk: random access across the entire disk of <10ms.
    SAN salespeople seem to like to sell expensive stuff, then recommend the absolute worst possible configuration for database performance
  9. Lance New Member

    We are using Compaq drives in RA4100 cabinets connected via a fibre switch.

    I have 2 RA4100s.
    G: & H: are in 1 cabinet; each array is 4 (36GB) drives in a RAID 5 config.
    I: & J: are in 1 cabinet:
    I: 8 (36GB) drives in RAID 5
    J: 3 (36GB) drives in RAID 5

    H: & J: are used by node 1 of the cluster
    G: & I: are used by node 2 of the cluster

    These values are from perfmon data collected over a year at 5-minute intervals:

        Sec/Read  Sec/Write  Writes/Sec  Reads/Sec
    G:  0.05      0.1        2           7
    H:  0.3       0.5        8           36
    I:  0.02      0.04       3           9
    J:  0.03      0.04       7           22

    Any ideas on what else I should check on the slow cabinet?

    Thanks for all your help

    Thanks for all your help
  10. satya Moderator

    Which version of SQL and service pack is used?
    What options are set for Lightweight Pooling, Max Worker Threads, Memory and any Priority Boost option is used?

    Can take reference from these KBAs
    -http://support.microsoft.com/default.aspx?scid=kb;EN-US;298475
    -http://support.microsoft.com/default.aspx?scid=kb;EN-US;243588
    -http://support.microsoft.com/default.aspx?scid=kb;EN-US;254321

    HTH

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  12. joechang New Member

    when the avg sec/xfer is long enough that most of the time is spent waiting in the queue, the following applies:

    avg sec/xfer x xfers/sec = queue depth

    so if avg sec/xfer is 0.4 and your xfers/sec is 40, then the queue depth should be approximately 16.
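    a quick sketch of this arithmetic in python, applied per IO type and summed, using the H: counters reported earlier in the thread (0.3 sec/read, 0.5 sec/write, 36 reads/sec, 8 writes/sec); the function name is just for illustration:

```python
# Little's Law: average queue depth = arrival rate x average time in
# the system, summed over each (rate, time) pair of IO traffic.
def queue_depth(rate_time_pairs):
    return sum(rate * t for rate, t in rate_time_pairs)

h_queue = queue_depth([(36, 0.3),   # reads/sec x avg sec/read
                       (8, 0.5)])   # writes/sec x avg sec/write
print(round(h_queue, 1))  # 14.8, close to the observed avg disk queue of 16
```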

    as to the cause: where is the raid controller for each cabinet? is it in the cabinet itself? what are the settings for each raid controller? does one have more cache than the other? sometimes an overly large cache can cause this.
    i do not see how a sql server setting can cause one cabinet to be ok and another to be slow
  13. Lance New Member

    We had an issue with error 823s occurring (torn pages).
    The actual issue was a conflict between the RAID controller cache and SQL Server caching.
    We had to turn off the controller caching to get rid of the 823 errors.
    We have had no update from HP/Compaq on the controller, so we are leaving the cache off.

               Avg. Disk sec/Read H:   Avg. Disk sec/Write H:
    Cache On   0.022076                0.020023
    Cache Off  0.29865                 0.490507


    We are running SQL Ent 2000 (SP3) on a 2 processor box with 2GB of Ram
    sp_configure results are

    name minimum maximum config_value run_value
    ----------------------------------- ----------- ----------- ------------ -----------
    affinity mask -2147483648 2147483647 0 0
    allow updates 0 1 0 0
    awe enabled 0 1 0 0
    c2 audit mode 0 1 0 0
    cost threshold for parallelism 0 32767 5 5
    Cross DB Ownership Chaining 0 1 0 0
    cursor threshold -1 2147483647 -1 -1
    default full-text language 0 2147483647 1033 1033
    default language 0 9999 0 0
    fill factor (%) 0 100 0 0
    index create memory (KB) 704 2147483647 0 0
    lightweight pooling 0 1 0 0
    locks 5000 2147483647 0 0
    max degree of parallelism 0 32 0 0
    max server memory (MB) 4 2147483647 2147483647 2147483647
    max text repl size (B) 0 2147483647 65536 65536
    max worker threads 32 32767 255 255
    media retention 0 365 0 0
    min memory per query (KB) 512 2147483647 1024 1024
    min server memory (MB) 0 2147483647 0 0
    nested triggers 0 1 1 1
    network packet size (B) 512 65536 4096 4096
    open objects 0 2147483647 0 0
    priority boost 0 1 0 0
    query governor cost limit 0 2147483647 0 0
    query wait (s) -1 2147483647 -1 -1
    recovery interval (min) 0 32767 0 0
    remote access 0 1 1 1
    remote login timeout (s) 0 2147483647 20 20
    remote proc trans 0 1 0 0
    remote query timeout (s) 0 2147483647 600 600
    scan for startup procs 0 1 1 1
    set working set size 0 1 0 0
    show advanced options 0 1 1 1
    two digit year cutoff 1753 9999 2029 2029
    user connections 0 32767 0 0
    user options 0 32767 0 0

    Thanks
  14. satya Moderator

    Are any other applications sharing resources on this box?
    Have you at any time tried restarting the SQL services or rebooting the server?

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  15. Lance New Member

    This is a SQL Server only box. Multiple databases are on the array in question.
    We rebooted about a week ago to install the additional memory and a security hotfix.
  16. satya Moderator

    You may search this forum for the posts by DBADave on RAID size calculations etc.

    HTH

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  17. joechang New Member

    well, that explains the long sec/xfer and queuing.
    even the 0.020 with cache on is very long.
    there are other raid controllers that do not cause write-cache io errors; i would ditch the current raid controller as soon as possible, as it is crippling performance
  18. fullbrij New Member

    I think the question is: Where did all the IO go?

    I can think of a few things:

    1. Track Alignment
    2. Element Size/alignment
    3. NTFS allocation unit size
    4. NTFS Journal size
    5. Application IO patterns

    For reads, SQL reads pages from the database into buffers in memory. The size of a page is 8K, unless you are reading many sequential rows and prefetch kicks in, in which case the IO request size is 64K. Backups typically read in 64K chunks also. Writes to the logs [all writes are first written to the logs and then later committed to the database] are 8K, as are writes to the database.

    For write performance, the first logical place to look is the log drive. You will never write any faster than the maximum performance of your log drive. For read performance, the limiting factor is the database drive. I guess this is a good place for the formulas:

    Given P is the performance of a single physical spindle [100~120 IO/sec for a 10K drive], and N is the number of physical spindles in a given set then:

    For RAID 1/10

    Write = P*N/2
    Read = P*N

    For RAID 5

    Write = P*N/4
    Read = P*(N-1)

    If you have a mirror dedicated to logs, then the maximum write performance will be 100~120 IO/sec. There are several things that can take away from this and leave you with less than expected performance.
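    The throughput formulas above can be collected into a small Python sketch. The 100 IO/sec figure is just the low end of the 10K-spindle estimate used in this post, and the function names are illustrative:

```python
# Rough throughput ceilings per RAID level, from the formulas above.
# p = IOs/sec of one spindle, n = number of spindles in the set.
def raid10_limits(p, n):
    return {"read": p * n, "write": p * n / 2}

def raid5_limits(p, n):
    return {"read": p * (n - 1), "write": p * n / 4}

# Lance's H: array - 4 x 10K spindles in RAID 5 at ~100 IO/sec each:
print(raid5_limits(100, 4))   # {'read': 300, 'write': 100.0}
# The same 4 spindles configured as RAID 10:
print(raid10_limits(100, 4))  # {'read': 400, 'write': 200.0}
```

    Note how the RAID 5 write ceiling for the whole 4-disk set is only about one drive's worth of IOs/sec, which is why the write penalty dominates the sizing math.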

    1. Track Alignment. Most disk geometries have 64 sectors per track. Windows uses the first 63 sectors for volume information prior to the start of the first partition. This causes the first partition to start on the last sector of the first track, and every 4th IO [with an 8K IO size] to cross track boundaries. Each time you cross a track boundary, it adds at least 1ms to the IO. It adds up.

    2. Element Alignment. For arrays with an element size [RAID 10 and RAID 5], an issue similar to track alignment can cause an IO request to cross element boundaries and create an extra IO against a spindle on the back end. If you have an element size of 128K and an IO size of 8K, then one out of 16 IOs will cross an element boundary and generate extra IO. The solution to both track alignment and element alignment is to use the diskpar utility from the Windows 2000 Resource Kit to align the start of the partition with a track boundary of which the element size is an even multiple. If your element size is 128K and your IO size is 8K, then you could set the alignment offset to 64 sectors, or 32K.
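    The "one out of 16" figure can be verified with a short Python sketch. Sizes are in 512-byte sectors; the 63-sector offset models a partition that starts right after Windows' 63 sectors of volume information, and the function is hypothetical, written just for this check:

```python
# Fraction of io_sectors-sized IOs (issued at io_sectors-aligned
# positions within the partition) that straddle a RAID element boundary,
# for a partition starting at offset_sectors on the disk.
def crossing_fraction(io_sectors, element_sectors, offset_sectors):
    period = element_sectors // io_sectors  # pattern repeats after this many IOs
    crossings = sum(
        1 for i in range(period)
        if (offset_sectors + i * io_sectors) % element_sectors
           > element_sectors - io_sectors
    )
    return crossings / period

# 8K IOs (16 sectors), 128K element (256 sectors), partition at sector 63:
print(crossing_fraction(16, 256, 63))  # 0.0625, i.e. one IO in 16
# The same partition aligned to a 64-sector (32K) offset:
print(crossing_fraction(16, 256, 64))  # 0.0 - no boundary crossings
```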

    3. NTFS allocation unit size. The default allocation unit size when you format an NTFS disk is 4K. SQL has an 8K IO size. If you used SmartStart or Server Assistant to create your servers, the 3rd-party partitioning software [both HP and Dell] converts the OS partition. This results in an allocation unit size of 512 bytes, with all the associated negative performance impact. If you use the default allocation unit size of 4K and the IO size is 8K, Windows will split each IO into 2, which are sent to the back end, effectively halving your IO throughput. You should use a minimum 8K allocation unit size when you format an NTFS disk for SQL logs/data. You may want a larger allocation unit size if you are doing OLAP or if you want to reduce the impact of backups. Because of track crossing, it's typically not a good idea to use an allocation unit size larger than 32K.

    4. NTFS journal size. NTFS is a journaling file system and has transaction logs and a checkpoint that are kept in a special area on the NTFS volume. Under heavy write IO, excessive checkpointing can occur. You can use chkdsk /L: to increase the size of the journal area and reduce the impact. You may also want to disable the last-accessed time. Each time a file on an NTFS partition is accessed [read or written to], the last access time is updated in the MFT entry. You can disable this through a registry setting.

    5. Application IO patterns. Log file access is mostly sequential writes. In OLTP applications, database access is random reads and writes. In OLAP applications, data access is commonly mostly sequential reads. By separating the databases and logs onto separate physical sets of spindles [LUNs created through your storage processor's virtualization layer don't count; they're all hitting the same set of physical spindles], you can tune the cache. Write cache can have a significant impact for logs, as long as the IO remains mostly sequential. A small amount of read and write cache can reduce the RAID 5 penalty from P*N/4 to P*N/3.





    John
    MOSMWNMTK
  19. Lance New Member

    Thanks for all the information. I am afraid it will take me a while to sort it all out.
    Can you direct me to any references for more information about this?

    I have the logs and data files on separate physical RA4100s (Compaq disk systems).

    If I have to change any of these settings, I assume it requires a rebuild of the disk arrays.

    Thanks.
  20. fullbrij New Member

    chkdsk /L is non-destructive

    Disabling last access updates is non-destructive

    Changing the allocation unit size requires reformatting

    Disk alignment requires reformatting

    Changing the RAID type will require a rebuild of the array

    Separating IO is non-destructive



    John
    MOSMWNMTK
  21. Lance New Member

    Thanks

    I believe I have the IO separated: data on the H drive and logs on the J drive. Unfortunately, both arrays are RAID 5.

    It will be a while (if ever) before I can schedule production downtime to rebuild the entire Windows and SQL cluster.

    I will try disabling the last-access updates on my test system to see how that improves performance.
  22. Lance New Member

    I have been monitoring the Split IO/sec counter in perfmon and it is staying at 0 all the time. Is this the split IO you are referring to in point 3 below?

    3. NTFS allocation unit size. The default allocation unit size when you format an NTFS disk is 4K. SQL has an 8K IO size. If you used SmartStart or Server Assistant to create your servers, the 3rd-party partitioning software [both HP and Dell] converts the OS partition. This results in an allocation unit size of 512 bytes, with all the associated negative performance impact. If you use the default allocation unit size of 4K and the IO size is 8K, Windows will split each IO into 2, which are sent to the back end, effectively halving your IO throughput. You should use a minimum 8K allocation unit size when you format an NTFS disk for SQL logs/data. You may want a larger allocation unit size if you are doing OLAP or if you want to reduce the impact of backups. Because of track crossing, it's typically not a good idea to use an allocation unit size larger than 32K.


    Thanks
  23. fullbrij New Member

    also increase the journal size - chkdsk /L:65536



    John
    MOSMWNMTK
  24. mikeharms New Member

    You should really try to monitor what is occurring with SQL Server, or even Windows, at the time you are seeing disk queuing. Averages don't work very well, since averaging the events together smooths over both the zeros and the spikes. So if you get some spikes due to checkpoints occurring and never anything else, your average will be high.

    IOs are occurring all the time, so the average IOs will be low, or will average out the peaks. You'll do better to zero in on what's going on by watching the instantaneous counters rather than averages. Say, monitor LogicalDisk - Current Disk Queue Length and SQLServer:Buffer Manager - Checkpoint pages/sec, and see if the queuing occurs at the same time.

    With your H drive having 4 disks, if you now see a disk queue of no more than 5, then I'd say it's doing pretty well. Allow for 2 per spindle, and your system should provide sufficient performance as long as the number stays under 8.

    As for the number of drives for the new system: it depends.

    What type of system are you supporting: a data warehouse, or an online transactional system?

    The first thing to figure out is the physical space you need. That will drive many of your options. Then determine the RAID level you wish to use. I prefer RAID 0+1 but that isn't always an option.

    We can go over some examples if you want. Budget will play a big role, as will the server and how many drives it can physically handle. Ideally you want more spindles for the data, where the random reads and writes will go. Logs can get by with fewer. And how your application works with tempdb will also affect how its drives should be set up and how many you might have.
