SQL Server Performance

NTFS Cluster Size

Discussion in 'Performance Tuning for Hardware Configurations' started by gryphonsclaw, Nov 1, 2005.

  1. gryphonsclaw New Member

    I was wondering if changing the default cluster size on the disk is a good idea. I was searching online and found a couple snippets about setting the disk to a 64K cluster size (8 x 8KB pages).

    Our app DB is on a RAID 10. I was wondering if anyone has seen this boost disk performance and under what scenarios (tons of users with lots of reads, every scenario, lots of writes, etc...)?

    Thank you for any thoughts in advance.

    Thanx
    Adam
  2. Luis Martin Moderator

Could you post some disk counters, like Avg. Disk Queue Length, etc.?


    Luis Martin
    Moderator
    SQL-Server-Performance.com

    One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important
    Bertrand Russell


    All postings are provided “AS IS” with no warranties for accuracy.



  3. gryphonsclaw New Member

Sure, here are some stats taken over a full day.

Avg. Disk Queue Length: Avg 1.4, Max 48.6
Disk Transfers/sec: Avg 196.1, Max 2457.9
Disk MB/sec: Avg 12.6, Max 131.48

I'm not sure if changing the cluster size depends on load or is just a general rule of thumb.
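As a quick back-of-envelope (just arithmetic on the counters above, nothing more), the average I/O size those numbers imply lands very close to one 64 KB extent:

```python
# Average I/O size implied by the perfmon counters posted above:
# throughput divided by transfer rate.
avg_mb_per_sec = 12.6          # Disk MB/sec, daily average
avg_transfers_per_sec = 196.1  # Disk Transfers/sec, daily average

avg_io_kb = avg_mb_per_sec * 1024 / avg_transfers_per_sec
print(f"average I/O size ~ {avg_io_kb:.1f} KB")  # -> average I/O size ~ 65.8 KB
```

About 66 KB per transfer on average, i.e. roughly one 64 KB extent per I/O, which is the access pattern the 64K cluster recommendation is aimed at.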

    Thanx
    Adam
  4. Luis Martin Moderator

I think your numbers are very good, but wait for other members' opinions.


    Luis Martin
    Moderator
    SQL-Server-Performance.com

    One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important
    Bertrand Russell


    All postings are provided “AS IS” with no warranties for accuracy.



  5. satya Moderator

It is not a given that changing the cluster size will increase performance.
You have to consider factors such as frequent transaction log backups and writes to the data file, and balance the trade-off between fragmentation with a smaller cluster size and wasted space with a larger cluster size.

I recently came across an article discussing the same thing; if I can find it I will post it here.
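The wasted-space side of that trade-off is easy to model (a sketch with made-up file sizes; NTFS rounds each file's tail up to a whole cluster):

```python
# Slack space lost to cluster round-up: each file wastes the gap between
# its size and the next cluster boundary. For a volume holding a few
# large database files, even 64 KB clusters waste almost nothing.
def slack_bytes(file_sizes, cluster):
    return sum((-size) % cluster for size in file_sizes)

# Hypothetical volume: one ~50 GB data file and one ~10 GB log file
db_files = [50 * 2**30 + 12345, 10 * 2**30 + 999]
for cluster in (4 * 2**10, 64 * 2**10):
    print(cluster // 1024, "KB cluster ->", slack_bytes(db_files, cluster), "bytes of slack")
```

At most one cluster per file is lost, so the "wasted space" half of the trade-off is negligible for database volumes; it only matters for volumes full of many small files.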

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  6. gryphonsclaw New Member

    The log is on a separate drive array, RAID 1 (4KB cluster).

We also run O&O Defrag on the drives to combat fragmentation. Hopefully that gives some more insight.

    Thanx
    Adam
  7. FrankKalis Moderator

See if this helps: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlops6.mspx
    In the lower half of that article you can read:

    quote:
    Windows NT File System (NTFS) Allocation Unit

    SCSI Drives: When you format the new drives in Disk Administrator, you should consider an allocation unit, or block size, that will provide optimal performance. Significant performance gains may be obtained by sizing this to a larger value in order to reduce disk I/Os; however, the default value is based on the size of the physical disk. The best practice for SQL Server is to choose 64 KB, because this reduces the likelihood of I/Os that span distinct NTFS allocations, which then might result in split I/Os. Keep in mind that although this information can be useful, the type of storage you are using (and in some cases also your backup software) will drive the format of your disks. If you are changing the block size on an existing system, be sure to run a baseline in your test environment and another after you have tested the changes.

    For more information on planning and configuring hardware, see Inside SQL Server 2000 by Kalen Delaney. For specific information on your hardware, refer to the customer service site of your hardware manufacturer.
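The split-I/O argument in that quote is really just divisibility; a sketch of the counting (SQL Server reads in 64 KB extents of eight 8 KB pages):

```python
# How many NTFS allocation units one 64 KB SQL Server extent touches.
# With 4 KB clusters the extent covers 16 separately-allocated units,
# any of which may live elsewhere on a fragmented volume (a split I/O);
# with 64 KB clusters the extent is a single unit and cannot be split
# by NTFS allocation.
def allocation_units_per_extent(cluster_kb, extent_kb=64):
    return (extent_kb * 1024) // (cluster_kb * 1024)

for ck in (4, 8, 64):
    print(f"{ck:>2} KB clusters -> {allocation_units_per_extent(ck)} unit(s) per 64 KB extent")
```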

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    I support PASS Deutschland e.V. (http://www.sqlpass.de)
  8. satya Moderator

You may also want to check with MS PSS for a better resolution and support in this regard, if you have a support agreement.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  9. sql_machine New Member

I actually ran tests to verify the improved performance with 64K, and found some improvement in writes, but no improvement (and in some cases actual degradation) in reads. I have not done any backup testing though, so I can't speak to that.

So this really depends on your specific environment; I'd recommend testing. As for fragmentation with larger disk blocks, if you're rebuilding indexes regularly this should be alleviated.
  10. bertcord New Member

I also performed tests on cluster size. I tested the following:

    4KB
    8KB
    64KB

    I ran 2 tests
    1. 8KB Random reads
    2. Mixed workload that represents our Application workload
    60% Read IO of 64KB size
    40% Write IO of 16KB size

I saw no difference in performance.

    Bert
  11. jck New Member

I too wondered about the effect that sector size, relative to read/write request size, has on the drive subsystem. As with any database server, SQL Server can issue huge read/write requests, so I had to satisfy that question for myself...

The tools and hardware used are listed here, along with the results:

Dell PE6850 with quad Xeon 1.9GHz CPUs, 10GB of RAM, and two QLA2340 HBAs, connected through a Cisco MDS9509 FC switch to an EMC CX500, with PowerPath 4.4.1 installed. The OS is Windows Server 2003 Enterprise; the boot.ini file has /3GB /PAE set. I used the Microsoft SQL drive performance tool SQLIO.EXE (v1.5.SG) as the test program. The LUN used is a set of 14 146GB 10Krpm Fibre Channel drives in a RAID-10. I aligned the sectors from the OS to a 128-sector boundary to prevent stripe crossing on a single read. I could have aligned the sectors on the CX500 instead, but that can cause issues if MirrorView is used to mirror the SAN to a remote location.

The CX500 is not in production yet, so the only system with access to it is this test server. No other devices are accessing, or causing any load on, the system during these tests.

So the procedure for the test was: format with a given sector size, run the tests, reformat with the next sector size, run the tests again. I used a file size (3GB) large enough to ensure that the cache would get saturated and the tests would have to hit the physical drives.

in the "Param.txt" file is the line "m:\testfile.dat 2 0x0 3000"

and in a test.bat file are the parameters (taken from the SQLIO.exe readme)...
sqlio -kW -s360 -frandom -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b256 -LS -Fparam.txt
timeout /T 60

sqlio -kW -s360 -fsequential -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b256 -LS -Fparam.txt
timeout /T 60

sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b256 -LS -Fparam.txt
timeout /T 60

sqlio -kR -s360 -fsequential -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b256 -LS -Fparam.txt
timeout /T 60
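That batch is just the cross product of {write, read} x {random, sequential} x {8, 64, 128, 256} KB blocks, so it can be generated rather than typed (a sketch; it assumes the same fixed -s360 -o8 options used above):

```python
# Generate the sqlio command matrix from the test.bat above:
# {W,R} x {random,sequential} x {8,64,128,256} KB block sizes,
# with a 60-second pause between runs.
from itertools import product

lines = []
for kind, pattern, block in product("WR", ("random", "sequential"), (8, 64, 128, 256)):
    lines.append(f"sqlio -k{kind} -s360 -f{pattern} -o8 -b{block} -LS -Fparam.txt")
    lines.append("timeout /T 60")
print("\n".join(lines))
```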

    Sample of the output for the SQLIO.exe program:
C:\Program Files\SQLIO>sqlio -kW -s360 -frandom -o8 -b8 -LS -Fparam.txt
sqlio v1.5.SG
using system counter for latency timings, 1890000000 counts per second
parameter file used: param.txt
file m:\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads writing for 360 secs to file m:\testfile.dat
using 8KB random IOs
enabling multiple I/Os per thread with 8 outstanding
size of file m:\testfile.dat needs to be: 3145728000 bytes
current file size: 0 bytes
need to expand by: 3145728000 bytes
expanding m:\testfile.dat ... done.
using specified size: 3000 MB for file: m:\testfile.dat
    initialization done
    CUMULATIVE DATA:
    throughput metrics:
    IOs/sec: 5767.94
    MBs/sec: 45.06
    latency metrics:
    Min_Latency(ms): 0
    Avg_Latency(ms): 2
    Max_Latency(ms): 539
    histogram:
    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
    %: 0 37 32 16 8 4 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
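To turn a stack of those runs into a table without retyping numbers, the CUMULATIVE DATA section is easy to scrape (a sketch matched to the sample output above):

```python
# A tiny parser for the CUMULATIVE DATA section of SQLIO output,
# pulling the throughput and latency metrics into a dict.
import re

def parse_sqlio(text):
    metrics = {}
    for key in ("IOs/sec", "MBs/sec", "Min_Latency(ms)",
                "Avg_Latency(ms)", "Max_Latency(ms)"):
        m = re.search(re.escape(key) + r":\s*([\d.]+)", text)
        if m:
            metrics[key] = float(m.group(1))
    return metrics

sample = """IOs/sec: 5767.94
MBs/sec: 45.06
Min_Latency(ms): 0
Avg_Latency(ms): 2
Max_Latency(ms): 539"""
print(parse_sqlio(sample))
```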

    RESULTS:
Format: 64K sectors

R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   5770.55    45.08           0           2         576
w    r            64   1534.75    95.92           1           9         757
w    r           128    828.01   103.53           1           8         467
w    r           256    465.14   116.28           6          33         357
w    s             8  11022.97    86.11           0           1         145
w    s            64    1287.5    80.46           1          11         248
w    s           128    687.48    85.93           4          22          92
w    s           256    350.16    87.54           3          45         204
r    r             8   2541.76    19.85           0           5         269
r    r            64   1732.21   108.26           0           8         164
r    r           128   1022.18   127.77           0          15         210
r    r           256    580.37   145.09           1          26         307
r    s             8  17444.41   136.28           0           0         141
r    s            64   3185.75    199.1           0           4         143
r    s           128   1593.98   199.24           0           9         146
r    s           256    801.59   200.39           1          19         164

Format: 32K sectors

R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   5740.32    44.84           0           2         516
w    r            64   1416.51    88.53           2          10         743
w    r           128    752.53    94.06           3          20         474
w    r           256    395.32    98.83           6          39         318
w    s             8  10916.38    85.28           0           1         160
w    s            64   1306.23    81.63           1          11         110
w    s           128     691.6    86.45           3          22          92
w    s           256     348.9    87.22           9          45         116
r    r             8   2546.41    19.89           0           5         356
r    r            64   1352.55    84.53           0          11         142
r    r           128    830.97   103.87           0          18         237
r    r           256    421.82   105.45           1          37         494
r    s             8   17083.7   133.46           0           0         144
r    s            64   3079.31   192.45           0           4         149
r    s           128   1583.95   197.99           1           9         151
r    s           256    757.06   189.26           1          20         177

Format: 8K sectors

R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   5767.94    45.06           0           2         539
w    r            64   1433.78    89.61           1          10         578
w    r           128    778.19    97.27           3          20         478
w    r           256    401.98   100.49           7          39         355
w    s             8  10977.13    85.75           0           1         174
w    s            64   1294.68    80.91           1          11         127
w    s           128    691.75    86.46           4          22         122
w    s           256    348.95    87.23           9          45         145
r    r             8   2544.96    19.88           0           5         345
r    r            64   1343.14    83.94           0          11         209
r    r           128     873.6    109.2           0          17         192
r    r           256     389.8    97.45           1          40         428
r    s             8  16939.13   132.33           0           0         110
r    s            64   3105.41   194.08           0           4         150
r    s           128   1588.24   198.53           1           9         158
r    s           256    752.85   188.21           1          20         172


As a comparison (red apples to green apples), to illustrate the wide difference between configurations and drive technology...

This is a set of 8 320GB 7200RPM SATA drives in a RAID-10 on the same CX500 system, formatted to 64K sectors and sector aligned... half the drive count of the FC RAID-10 in the above tests, but you can see more than half the drive performance gone; in fact, closer to a 2-5x difference in performance.

Format: 64K sectors

R/W  Ran/Seq  I/O-KB  I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8  1254.84      9.8           0          12         846
w    r            64   315.01    19.68           2          50         882
w    r           128   307.32    38.41           3          51        1022
w    r           256   137.49    34.37           8         115         964
w    s             8  2686.67    20.98           0           5         432
w    s            64   322.33    20.14           1          49         410
w    s           128   156.52    19.56           8         101         476
w    s           256    79.75    19.93           3         200         679
r    r             8   520.87     4.06           0          30         605
r    r            64   472.07     29.5           0          33         583
r    r           128   310.69    38.83           0          51         591
r    r           256   202.22    50.55           1          78         603
r    s             8  2132.33    16.65           0           6         534
r    s            64   611.88    38.24           0          25         512
r    s           128   761.65     95.2           0          20         500
r    s           256    434.8    108.7           1          36         544

And lastly, as a comparison (apples to oranges), to further illustrate the wide difference between disk subsystems, configurations, and drive technology...

This is a set of 15 400GB 7200RPM SATA drives in a RAID-5 on an INFORTREND array (ESA16F-R1211) with a single storage processor and 1GB cache, formatted to 64K sectors and sector aligned... almost the same drive count as the CX500 FC RAID-10 in the above tests, but you can see HUGE differences in drive performance, in fact a 4-10x difference. (It is a good system for D2D2T or file-share storage; it just did not have the performance needed by our SQL applications, but it did not have a tenth of the price tag either.)

Format: 64K sectors

R/W  Ran/Seq  I/O-KB  I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   397.73      3.1           4          39         419
w    r            64    190.1    11.88          13          82       19984
w    r           128   115.44    14.43          19         138        2772
w    r           256    82.78    20.69          61         192         484
w    s             8   414.26     3.23           8          38          84
w    s            64   291.17    18.19          14          54         182
w    s           128   219.71    27.46          20          72         191
w    s           256   150.32    37.58          30         105         336
r    r             8  1599.93    12.49           0           9         312
r    r            64  1045.72    65.35           0          14         254
r    r           128   636.69    79.58           0          24         301
r    r           256   313.73    78.43           1          50         267
r    s             8  5616.71    43.88           0           2          27
r    s            64   974.95    60.93           1          15          47
r    s           128   491.89    61.48          11          32          71
r    s           256   252.55    63.13          20          62         106

Conclusion: in some cases it made some difference; in others, the difference was 147MB/sec of throughput. And this is sustained throughput, not the "burst" figures we have become complacent about when judging drive performance nowadays.

Depending on the disk subsystem you have and the RAID type used, you could see a HUGE difference in your own testing. It really comes down to what you can afford for a disk subsystem. (My money is on EMC.)

In the end, it is still your choice whether it is worth the effort to baseline your system, make an adjustment, and test to see if it made any difference. Every database is different; there is no single universally correct choice. Note that I did not test against a live SQL database with a captured workload (transaction log replay).

Taken from the MS guide (mentioned earlier in this thread):
    http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlops6.mspx
    "The best practice for SQL Server is to choose 64 KB, because this reduces the likelihood of I/Os that span distinct NTFS allocations, which then might result in split I/Os."


For what it's worth, I hope this answered more questions than it raised.


    JCK
  12. bertcord New Member

I am interested in looking at these numbers, but they are a bit hard to read... do you have the original SQLIO output files that you could provide?

How much cache is in your CX500, 2GB? With 2GB of cache, a 3GB file still leaves a ton of cached data. Is it possible to create a larger file with SQLIO? The last version I used limited the file size to 4GB, I think.

    This is why I used IOMeter with a 100GB test file. I tested 4KB, 8KB and 64KB on both my 3Par S400 and EMC CX700 system and saw very similar performance.

Can you provide more information on your SATA drives? It would be interesting to see whether it's actually the drives that are the issue, or the controller/platform.

    Bert


  13. johnchaas New Member

As soon as you go above a 4K cluster size, you lose access to many disk utilities, which may indirectly worsen performance. One thing you can lose is the ability to defragment your disk.
  14. gryphonsclaw New Member

Thank you for all the input on this. One thing I do have in place is disk defragmentation using O&O Defrag. It works with the 64K cluster size, lets me schedule the defrags, and handles the large files. I also don't have to take the database offline, which is nice. I'm not sure whether Diskeeper or the Windows defragmenter works with the bigger size.

    Thanx
    Adam
  15. avkesq New Member

From a defrag perspective, cluster sizes over 4K were not supported on NT, but I know for sure that it runs fine on XP and 2003.

In general, for the volume in question, it would make sense (in theory) to keep the RAID stripe size and NTFS cluster size the same, and to align them with whatever you will need for SQL (depending on whether you are doing transaction processing or BI, etc.). However, on some storage subsystems it makes absolutely no difference. For example, the default RAID stripe size on the 3PAR system is 256K (versus 64K on the CLARiiON). We changed this to 32K and 64K on the 3PAR for specific volumes and gave it a go. Absolutely no difference. Be aware that some storage systems are optimized to work with certain RAID stripe sizes, so with too much fine tuning you may actually end up degrading overall storage frame performance!

Ultimately, my belief is that spindle count is king: the larger the spindle count, the better the performance. If you want to spend more time managing your database and less time trying to optimize LUNs on your storage frame, I would advise you to seriously consider 3PAR. In terms of both performance (it produces performance-optimized LUNs by default every time, through some very clever virtualization) and bang for buck, no one else (all the three-letter acronyms here) comes even remotely close. Incidentally, according to Gartner's Magic Quadrant, 3PAR is #1 in the visionary quadrant, beating the rest by a noticeable margin. You can run all sorts of mixed loads on the system and not have to worry about performance optimization. The LUNs for everything (Exchange, SQL, file/print, you name it!) are evenly spread across all the spindles in the entire storage frame, and the thing is so fast it smokes! For the hell of it, I artificially (from a 3PAR perspective) created two equal-sized LUNs, each on half the spindle population of the frame (i.e., 32 spindles each, at that time) and tried some sustained file transfer operations (a 20GB file) between the two volumes. The timing was no better than when I created the two volumes on the same set of spindles (64 spindles total). So you could dump your 60GB+ database to a dump drive (if you wanted to) on the same set of spindles as your database and it would still fly... very fast.
  16. Frank Brennan New Member

    I've done some work in this area.

    1. Place your database files on separate LUNs from your log file(s).
    2. Use diskpar (2000) or diskpart (2003) to create a single partition on each LUN.
    3. When creating the partitions, be sure to align on a (128 sectors * 512 bytes) = 64K track boundary. This reduces the number of head movements caused by track spanning. It also reduces cache buffer allocations for trailing data that you may never access once it has been cached.
    4. When you create the volume on each partition, do a full format and specify an AU size of 64K on all volumes that will host your database files, and an AU size of 4K on volumes that will host your log files. Logs are read and written differently than database files: mostly sequential, and not page/extent driven. Track boundary alignment is not as important with an AU of 4K as with an AU of 64K, but it is still very important: misaligned with a 64K AU you will track-span on every I/O, compared to every 16th I/O with an AU of 4K.
    5. Track boundary misalignment causes latency due to head movement, and it also overworks your hard drives, resulting in failure sooner rather than later.
    6. Additionally, if you have the option of striping your firmware RAID at 256K over 64K or 128K, opt for 256K, because it helps increase the performance of SQL Server read-aheads.
    7. You will see your most significant I/O gains during backups and restores.
    8. I should also mention that your spindle count will have a significant impact, whether direct attached, SAN, or NAS. If your storage is shared, you would do well to use a dedicated horizontal allocation with a high spindle count and multiple I/O buses, as long as the spindle rotation and I/O port speed can keep up with the added bandwidth.
    9. HBAs: opt for PCI-Express instead of PCI or PCI-X. The PCI-Express architecture allows faster throughput and reduces bus noise, which you will encounter as you add more PCI and PCI-X controllers.

    There is a lot more that goes into this strategy like partitioning and file placement but this should be a good start for you if you haven't already addressed these issues.
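The alignment arithmetic in points 3 and 4 above can be sanity-checked in a few lines (a sketch; it assumes you have the partition's starting offset expressed in 512-byte sectors, which diskpar can report):

```python
# A partition is track/stripe aligned for this setup when its starting
# byte offset is a multiple of 64 KB (128 sectors * 512 bytes).
SECTOR_BYTES = 512
ALIGN_BYTES = 64 * 1024  # 128 sectors * 512 bytes

def is_aligned(start_sector):
    return (start_sector * SECTOR_BYTES) % ALIGN_BYTES == 0

print(is_aligned(63))   # classic MBR default offset -> False (misaligned)
print(is_aligned(128))  # 128 * 512 bytes = 64 KB    -> True
```

The classic failure mode is the old 63-sector default starting offset, which puts every 64K allocation unit astride a stripe boundary.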

    hope this helps.


    quote:Originally posted by gryphonsclaw

    I was wondering if changing the default cluster size on the disk is a good idea. I was searching online and found a couple snippets about setting the disk to a 64K cluster size (8 x 8KB pages).

    Our app DB is on a RAID 10. I was wondering if anyone has seen this boost disk performance and under what scenarios (tons of users with lots of reads, every scenario, lots of writes, etc...)?

    Thank you for any thoughts in advance.

    Thanx
    Adam
