NTFS Cluster Size | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

NTFS Cluster Size

I was wondering if changing the default cluster size on the disk is a good idea. I was searching online and found a couple of snippets about setting the disk to a 64K cluster size (8 x 8KB pages). Our app DB is on a RAID 10. I was wondering if anyone has seen this boost disk performance, and under what scenarios (tons of users with lots of reads, lots of writes, every scenario, etc.)? Thank you for any thoughts in advance. Thanx
Adam
Can you post some disk counters, like Avg. Disk Queue Length, etc.?
Luis Martin
Moderator
SQL-Server-Performance.com One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important
Bertrand Russell
All postings are provided “AS IS” with no warranties for accuracy.
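For reference, one way to capture such counters over a full day is the built-in typeperf tool (a sketch, assuming the standard PhysicalDisk counter names and the _Total instance; point it at the specific RAID 10 volume's instance instead if the server has several arrays):

rem 15-second samples, 5760 samples = 24 hours, logged to CSV
typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" "\PhysicalDisk(_Total)\Disk Transfers/sec" "\PhysicalDisk(_Total)\Disk Bytes/sec" -si 15 -sc 5760 -f CSV -o diskcounters.csv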
Sure, here are some stats taken over a full day:
Avg. Disk Queue Length: Avg 1.4, Max 48.6
Disk Transfers/sec: Avg 196.1, Max 2457.9
Disk MB/sec: Avg 12.6, Max 131.48
I'm not sure if changing the cluster size depends on load or is just a general rule of thumb. Thanx
Adam
I think your numbers are very good, but wait for other members' opinions.
Luis Martin
Moderator
SQL-Server-Performance.com One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important
Bertrand Russell
All postings are provided “AS IS” with no warranties for accuracy.
It is not a rule of thumb that changing the cluster size will increase performance.
You must consider factors such as frequent transaction log backups and writes to the data file, and balance the trade-off between fragmentation due to a smaller cluster size and wasted space due to a larger one. I recently came across an article discussing this; if I can find it, I will post it here. Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
The log is on a separate drive array, RAID 1 (4KB cluster). We also run O&O Defrag on the drives to combat fragmentation. Hopefully that gives some more insight. Thanx
Adam
See if this helps: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlops6.mspx
In the lower half of that article you can read:
quote:
Windows NT File System (NTFS) Allocation Unit SCSI Drives: When you format the new drives in Disk Administrator, you should consider an allocation unit, or block size, that will provide optimal performance. Significant performance gains may be obtained by sizing this to a larger value in order to reduce disk I/Os; however, the default value is based on the size of the physical disk. The best practice for SQL Server is to choose 64 KB, because this reduces the likelihood of I/Os that span distinct NTFS allocations, which then might result in split I/Os. Keep in mind that although this information can be useful, the type of storage you are using (and in some cases also your backup software) will drive the format of your disks. If you are changing the block size on an existing system, be sure to run a baseline in your test environment and another after you have tested the changes. For more information on planning and configuring hardware, see Inside SQL Server 2000 by Kalen Delaney. For specific information on your hardware, refer to the customer service site of your hardware manufacturer.
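For checking and applying that recommendation, the built-in tools are enough (a sketch; E: is a placeholder drive letter, and /A:64K can only be chosen at format time, so the volume must be empty or its data moved off first):

rem show "Bytes Per Cluster" (the NTFS allocation unit) of an existing volume
fsutil fsinfo ntfsinfo E:

rem format a new, empty data volume with a 64 KB allocation unit
format E: /FS:NTFS /V:SQLDATA /A:64K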


Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
I support PASS Deutschland e.V. (http://www.sqlpass.de)

You may also check with MS PSS for better resolution and support in this regard, if you have access to them. Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
I actually ran tests to verify the improved performance with 64K, and found some improvement in writes, but no improvement (and in some cases actual degradation) in reads. I have not done any backup testing, though, so I can't speak to that. So this really depends on your specific environment; I'd recommend testing. As far as fragmentation with larger disk blocks goes, if you're rebuilding indexes regularly this would be alleviated.
I also performed tests on cluster size. I tested the following:
4KB
8KB
64KB
I ran 2 tests:
1. 8KB random reads
2. A mixed workload that represents our application workload:
60% read I/O of 64KB size
40% write I/O of 16KB size
I saw no difference in performance. Bert
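For anyone wanting to reproduce test 1, the 8KB random-read case maps directly onto the SQLIO switches shown later in this thread (a sketch; the param file, the 360-second duration, and the queue depth here are assumptions, not Bert's actual settings):

sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt

The mixed 60/40 workload of test 2 cannot be expressed as a single SQLIO run, since each SQLIO invocation issues either reads or writes; a tool such as IOMeter (mentioned further down) can model a blended access pattern.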
I too wondered about the effect that sector size has on the read/write requests hitting the drive subsystem; as with any database server, SQL can issue huge read/write requests, so I had to satisfy that question for myself. What I used, the hardware involved, and the results are listed here:

Dell PE6850 with quad Xeon 1.9GHz CPUs, 10GB of RAM, and two QLA2340 HBAs, connected through a Cisco MDS9509 FC switch to an EMC CX500, with PowerPath 4.4.1 installed. The OS is Windows Server 2003 Enterprise; the boot.ini file has /3GB /PAE set. I used the Microsoft SQL drive performance tool SQLIO.EXE (ver. 1.5.SG) as the test program. The LUN used is a set of 14 146GB 10Krpm Fibre Channel drives in a RAID-10. I aligned the sectors from the OS to a 128-sector boundary to prevent stripe crossing on a single read. I could have aligned the sectors on the CX500 itself, but that can cause issues if MirrorView is used to mirror the SAN to a remote location.

The CX500 is not in production yet, so the only system with access to it is this test server; no other devices were accessing it or putting any load on it during these tests. The procedure was: format with a given sector size, run the test, reformat with the next sector size, run the test again. I used a file size (3GB) large enough to ensure that the cache would get saturated and the test would have to hit the physical drives. The Param.txt file contains the line "m:\testfile.dat 2 0x0 3000", and a test.bat file holds the parameters (taken from the SQLIO.exe txt file):
sqlio -kW -s360 -frandom -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b256 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b256 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b256 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b256 -LS -Fparam.txt
timeout /T 60

Sample of the output from the SQLIO.exe program:
C:\Program Files\SQLIO>sqlio -kW -s360 -frandom -o8 -b8 -LS -Fparam.txt
sqlio v1.5.SG
using system counter for latency timings, 1890000000 counts per second
parameter file used: param.txt
file m:\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads writing for 360 secs to file m:\testfile.dat
using 8KB random IOs
enabling multiple I/Os per thread with 8 outstanding
size of file m:\testfile.dat needs to be: 3145728000 bytes
current file size: 0 bytes
need to expand by: 3145728000 bytes
expanding m:\testfile.dat ... done.
using specified size: 3000 MB for file: m:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 5767.94
MBs/sec: 45.06
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 2
Max_Latency(ms): 539
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 0 37 32 16 8 4 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
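(As a quick sanity check on the tables below: MBs/Sec is simply I/O-Sec multiplied by the I/O size, e.g. 5767.94 IOs/sec x 8 KB / 1024 = 45.06 MB/sec, matching the cumulative data above.)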
RESULTS:
Format: 64K sectors (sorry, the forum compacted the space/tab formatting I did)

R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   5770.55    45.08           0           2         576
w    r            64   1534.75    95.92           1           9         757
w    r           128    828.01   103.53           1           8         467
w    r           256    465.14   116.28           6          33         357
w    s             8  11022.97    86.11           0           1         145
w    s            64    1287.5    80.46           1          11         248
w    s           128    687.48    85.93           4          22          92
w    s           256    350.16    87.54           3          45         204
r    r             8   2541.76    19.85           0           5         269
r    r            64   1732.21   108.26           0           8         164
r    r           128   1022.18   127.77           0          15         210
r    r           256    580.37   145.09           1          26         307
r    s             8  17444.41   136.28           0           0         141
r    s            64   3185.75    199.1           0           4         143
r    s           128   1593.98   199.24           0           9         146
r    s           256    801.59   200.39           1          19         164

Format: 32K sectors
R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   5740.32    44.84           0           2         516
w    r            64   1416.51    88.53           2          10         743
w    r           128    752.53    94.06           3          20         474
w    r           256    395.32    98.83           6          39         318
w    s             8  10916.38    85.28           0           1         160
w    s            64   1306.23    81.63           1          11         110
w    s           128     691.6    86.45           3          22          92
w    s           256     348.9    87.22           9          45         116
r    r             8   2546.41    19.89           0           5         356
r    r            64   1352.55    84.53           0          11         142
r    r           128    830.97   103.87           0          18         237
r    r           256    421.82   105.45           1          37         494
r    s             8   17083.7   133.46           0           0         144
r    s            64   3079.31   192.45           0           4         149
r    s           128   1583.95   197.99           1           9         151
r    s           256    757.06   189.26           1          20         177

Format: 8K sectors
R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   5767.94    45.06           0           2         539
w    r            64   1433.78    89.61           1          10         578
w    r           128    778.19    97.27           3          20         478
w    r           256    401.98   100.49           7          39         355
w    s             8  10977.13    85.75           0           1         174
w    s            64   1294.68    80.91           1          11         127
w    s           128    691.75    86.46           4          22         122
w    s           256    348.95    87.23           9          45         145
r    r             8   2544.96    19.88           0           5         345
r    r            64   1343.14    83.94           0          11         209
r    r           128     873.6    109.2           0          17         192
r    r           256     389.8    97.45           1          40         428
r    s             8  16939.13   132.33           0           0         110
r    s            64   3105.41   194.08           0           4         150
r    s           128   1588.24   198.53           1           9         158
r    s           256    752.85   188.21           1          20         172
As a comparison (red apples to green apples), to illustrate the wide difference between configurations and drive technology: this is an 8-drive set of 320GB 7200 RPM SATA drives in a RAID-10 on the same CX500 system, formatted to 64K sectors and sector aligned. That is half the drive count of the FC RAID-10 in the tests above, but the performance drops by far more than half; in fact, closer to a 2-5x difference.

Format: 64K sectors
R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8   1254.84      9.8           0          12         846
w    r            64    315.01    19.68           2          50         882
w    r           128    307.32    38.41           3          51        1022
w    r           256    137.49    34.37           8         115         964
w    s             8   2686.67    20.98           0           5         432
w    s            64    322.33    20.14           1          49         410
w    s           128    156.52    19.56           8         101         476
w    s           256     79.75    19.93           3         200         679
r    r             8    520.87     4.06           0          30         605
r    r            64    472.07     29.5           0          33         583
r    r           128    310.69    38.83           0          51         591
r    r           256    202.22    50.55           1          78         603
r    s             8   2132.33    16.65           0           6         534
r    s            64    611.88    38.24           0          25         512
r    s           128    761.65     95.2           0          20         500
r    s           256     434.8    108.7           1          36         544

And lastly, as a comparison (apples to oranges), to further illustrate the wide difference between disk subsystems, configurations, and drive technology: this is a 15-drive set of 400GB 7200 RPM SATA drives in a RAID-5 on an "INFORTREND" array system (ESA16F-R1211), with a single storage processor and 1GB of cache, formatted to 64K sectors and sector aligned. That is almost the same drive count as the CX500 FC RAID-10 in the tests above, but you can see HUGE differences in drive performance; in fact, a 4-10x difference. (It's a good system for D2D2T or file-share storage, but it just did not have the performance needed by our SQL applications. It did not have a tenth of the price tag, either.)

Format: 64K sectors
R/W  Ran/Seq  I/O-KB   I/O-Sec  MBs/Sec  Min Lat ms  Ave Lat ms  Max Lat ms
w    r             8    397.73      3.1           4          39         419
w    r            64     190.1    11.88          13          82       19984
w    r           128    115.44    14.43          19         138        2772
w    r           256     82.78    20.69          61         192         484
w    s             8    414.26     3.23           8          38          84
w    s            64    291.17    18.19          14          54         182
w    s           128    219.71    27.46          20          72         191
w    s           256    150.32    37.58          30         105         336
r    r             8   1599.93    12.49           0           9         312
r    r            64   1045.72    65.35           0          14         254
r    r           128    636.69    79.58           0          24         301
r    r           256    313.73    78.43           1          50         267
r    s             8   5616.71    43.88           0           2          27
r    s            64    974.95    60.93           1          15          47
r    s           128    491.89    61.48          11          32          71
r    s           256    252.55    63.13          20          62         106

Conclusion: In some cases it made some difference; in others, the difference was 147MB/sec of throughput. That is sustained throughput, not the "burst" numbers we are becoming complacent about when judging drive performance nowadays. Depending on the disk subsystem you have and the RAID type used, there could be a HUGE difference in what you see in your own testing. It really depends on what you can afford for a disk subsystem (my money is on EMC). In the end it is still your choice, and up to you to decide whether it is worth the effort to baseline your system, make an adjustment, and test to see if it made any difference; every database is different, and there is no single, completely correct choice. Note also that I did not replay a captured workload (transaction logs) against a live SQL database.

Taken from the MS guide (mentioned earlier in this thread):
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlops6.mspx
"The best practice for SQL Server is to choose 64 KB, because this reduces the likelihood of I/Os that span distinct NTFS allocations, which then might result in split I/Os."
For what it's worth, I hope this answered more questions than it raised.
JCK

I am interested in looking at these numbers, but they are a bit hard to read... do you have the original SQLIO output files that you can provide? How much cache is in your CX500, 2GB? With 2GB of memory, that still leaves a ton of cached data. Is it possible to create a larger file with SQLIO? In the last version I used, I think the file size was limited to 4GB. This is why I used IOMeter with a 100GB test file. I tested 4KB, 8KB and 64KB on both my 3Par S400 and EMC CX700 systems and saw very similar performance. Can you provide more information on your SATA drives? It would be interesting to see whether it's actually the drives that are the issue, or the controller/platform. Bert

As soon as you go above 4k, you lose access to many disk utilities, which may worsen performance. One thing you will lose is the ability to defragment your disk.
Thank you for all the input on this. One thing I do have in place is disk defragmentation using O&O Defrag. It works with the 64K cluster size, it allows me to schedule the defrags, and it works with large files. I also don't have to take the database offline, which is nice. I'm not sure whether Diskeeper or the Windows defragmenter works with the bigger size. Thanx
Adam
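One quick way to find out whether the built-in defragmenter can handle a given volume is to run it in analyze-only mode (a sketch; E: is a placeholder drive letter, and on Windows XP/2003 the -a switch analyzes without defragmenting):

defrag E: -a

If it reports that it cannot process the volume, that tells you a third-party product such as O&O Defrag or Diskeeper is needed for that cluster size.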
From a defrag perspective, cluster sizes over 4K were not supported on NT, but I know for sure that it runs fine on XP and also 2003.

In general, for the volume in question, it would make sense (in theory) to keep the RAID stripe size and the NTFS cluster size the same, and to align them with whatever you will need for SQL (depending on whether you are doing transaction processing or BI, etc.). However, on some storage subsystems it makes absolutely no difference. For example, the default RAID stripe size on the 3PAR system is 256K (versus 64K on the Clariion). We changed this to 32K and 64K on the 3PAR for specific volumes and gave it a go. Absolutely no difference. Be aware that some storage systems are optimized to work with certain RAID stripe sizes, so with too much fine tuning you may actually end up degrading overall storage frame performance!

Eventually, it is my belief that spindle count is king: the larger the spindle count, the better the performance. If you want to spend more time managing your database and less time trying to optimize LUNs on your storage frame, I would advise you to seriously consider 3PAR. In terms of both performance (it produces performance-optimized LUNs by default every time, through some very clever virtualization) and bang for buck, no one else (all 3-letter acronyms here) comes even remotely close. Incidentally, according to Gartner's Magic Quadrant, 3PAR is #1 in the visionary quadrant, beating all the rest by a noticeable margin. You can run all sorts of mixed loads on the system and not have to worry about performance optimization. The LUNs for everything (Exchange, SQL, file/print, you name it!) are evenly spread across all the spindles in the entire storage frame, and the thing is so fast it smokes!

For the hell of it, I artificially (from a 3PAR perspective) created two equal-sized LUNs, each on half the spindle population of the frame (i.e., 32 spindles each, at that time) and tried some sustained file transfer operations (a 20GB file) between the two volumes. The timing was no better than when I created the two volumes on the same set of spindles (64 spindles total). So you could dump your 60GB+ database onto a dump drive (if you wanted to) on the same set of spindles as your database, and it would still fly... very fast.
I've done some work in this area.
1. Place your database files on separate LUNs from your log file(s).
2. Use diskpar (Windows 2000) or diskpart (Windows 2003) to create a single partition on each LUN.
3. During creation of the partitions, be sure to align on a (128 sectors x 512 bytes/sector) = 64K track boundary; a sketch follows this list. This will reduce the number of head movements caused by track spanning. It also reduces cache buffer allocations for trailing data that you may never access once it has been cached.
4. When you create the volume on each partition, do a full format and specify an AU size of 64K on all volumes that will host your database files, and an AU size of 4K for volumes that will host your log files. Logs are read and written differently than database files: mostly sequential, and not page/extent driven. Track boundary alignment is not as critical with an AU of 4K as with an AU of 64K, but it is still very important: with a misaligned 64K AU you will span a track on every I/O, compared to on every 16th I/O with an AU of 4K.
5. Track boundary misalignment causes latency due to head movement, and it also overworks your hard drives, resulting in failure sooner rather than later.
6. Additionally, if you have the option of striping your firmware RAID at 256K instead of 64K or 128K, opt for 256K, because it helps increase the performance of SQL Server read-aheads.
7. You will see your most significant I/O gains during backups and restores.
8. I should also mention that your spindle count will have a significant impact, whether direct-attached, SAN, or NAS. If your storage is shared, you would do well to use a dedicated horizontal allocation with a high spindle count, and multiple I/O buses, as long as the spindle rotation and I/O port speed can keep up with the added bandwidth.
9. HBAs: opt for PCI Express instead of PCI or PCI-X. The point-to-point lane architecture of PCI Express allows faster throughput and reduces bus noise, which you'll encounter as you add more PCI and PCI-X controllers.
There is a lot more that goes into this strategy, like partitioning and file placement, but this should be a good start if you haven't already addressed these issues. Hope this helps.
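A minimal sketch of steps 2-4 on Windows 2003 (the disk numbers, drive letters, and volume labels below are placeholders, not taken from this thread; confirm your LUNs with "list disk" first, and note that diskpart's align= parameter is only present in versions that support it):

rem --- align.txt: diskpart script, one aligned partition per LUN ---
rem --- run it with: diskpart /s align.txt ---
select disk 4
create partition primary align=64
assign letter=E
select disk 5
create partition primary align=64
assign letter=F

rem --- then, from a command prompt: full format (no /Q), 64K AU for data, 4K AU for logs ---
format E: /FS:NTFS /V:SQLDATA /A:64K
format F: /FS:NTFS /V:SQLLOG /A:4K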
quote:
Originally posted by gryphonsclaw
I was wondering if changing the default cluster size on the disk is a good idea. I was searching online and found a couple snippets about setting the disk to a 64K cluster size (8 x 8KB pages). Our app DB is on a RAID 10. I was wondering if anyone has seen this boost disk performance and under what scenarios (tons of users with lots of reads, every scenario, lots of writes, etc…)? Thank you for any thoughts in advance. Thanx
Adam
