Poor Clustered Server Write Performance

I’m seeing very slow write performance on a new SQL 2000 active/active cluster I have built from new hardware. Perhaps this is more of a tuning post and should go there, but because I am clustering I want to get some feedback from those who have similar setups. Here’s what I have:

(2) DELL PowerEdge 2950 (latest BIOS, firmware, etc.)
RAID-1 SAS array on PERC5 for OS, 16GB RAM, PERC 4e/DC controller for disk array
(1) Powervault 220s with (13) identical 300GB disks set up as 5 RAID-1 arrays for SQLData1
Quorum and one RAID-5 array of the remaining 3 disks for misc storage that I am not using.

I first noticed that SQL backups were taking a very long time compared to my other active/active SQL clusters. Simple file copies from any RAID-1 array to the same RAID-1 array (like SQLData1) were also very slow, so SQL is not the source. Copying and pasting a 1GB file to the same drive takes about 4 minutes (about 4MB/sec). My other Win2003/SQL2000 clusters, built from older DELL servers (2850/2750s with 220s), copy the same file within their RAID-1 arrays in 1.5-2 minutes (roughly 9-11MB/sec).

I’m working with DELL support as well but not really getting anywhere, since the server/array is functioning, just not very well. All the firmware is up to date and Patrol Read (a common source of this) has been disabled on the PERC 4e/DC.

I am wondering if others see this in their clusters and what your write performance is. Obviously, clustering takes some disk performance away because write caching has to be disabled on the controller, but it shouldn’t be this bad. Other than that the cluster functions properly: failovers work fine in either direction. Performance is poor no matter which server is serving up the SQL instance and drives.

The only real differences between this cluster and my other ones are:
– PERC 4e/DC controller (others run PERC 4/DC, but the 4e is "supposed" to be faster)
– This cluster has a maxed-out SCSI bus (15 devices; other clusters have 12)
– This cluster has faster hardware and twice the RAM

Thanks
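
As a rough sanity check on the copy timings quoted above, the throughput figures can be recomputed from file size and elapsed time. This is only a sketch: the function name is made up for illustration, and it assumes 1GB = 1024MB and that the copy times are wall-clock averages.

```python
def copy_throughput_mb_per_sec(size_mb: float, seconds: float) -> float:
    """Average throughput in MB/sec for copying size_mb in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


FILE_MB = 1024  # assumption: 1GB counted as 1024MB

# New cluster: about 4 minutes for the 1GB file.
new_cluster = copy_throughput_mb_per_sec(FILE_MB, 4 * 60)

# Older clusters: 1.5 to 2 minutes for the same file.
old_fast = copy_throughput_mb_per_sec(FILE_MB, 1.5 * 60)
old_slow = copy_throughput_mb_per_sec(FILE_MB, 2 * 60)

print(f"new cluster:    {new_cluster:.1f} MB/sec")
print(f"older clusters: {old_slow:.1f}-{old_fast:.1f} MB/sec")
```

On these assumptions the new cluster comes out a little over 4MB/sec and the older ones at roughly 8.5 to 11.4MB/sec, so the new hardware is running at well under half the speed of the older boxes.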
This one is a tough problem, as you already know. When I see a problem like this (and I have seen similar, though not identical, problems), I first assume that the hardware, hardware configuration, firmware, or drivers are causing it. I think you can rule out SQL Server as a contributor, and for that matter, I think you can rule out clustering as the cause. I would ride Dell support hard to have them identify the problem, as I personally feel it is their problem. Your system should be superfast, and if it isn’t, then Dell is to blame. If Dell is not helpful, and if you have a support agreement with Microsoft, you might seek out their advice, but even then, I think they will point the finger at Dell.

-----------------------------
Brad M. McGehee, SQL Server MVP
A bad SCSI cable appears to be the culprit. The only flaw I can see is (1) pin out of the 68 on the connector that is shorter than the rest. It caused the drives to report a speed of 10 rather than 320.
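
For anyone who wants to re-check write throughput after a fix like this, here is a minimal sketch of a timed sequential write test using only the Python standard library. The function name and parameters are hypothetical, the directory in the usage lines is just the system temp directory (point it at the array under test instead), and this is no substitute for a proper disk benchmark: it only measures one sequential write and says nothing about the controller settings.

```python
import os
import tempfile
import time


def measure_write_mb_per_sec(directory: str, size_mb: int = 64, chunk_mb: int = 1) -> float:
    """Write about size_mb of zeros to a temp file in directory; return MB/sec."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    chunks = max(1, size_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            # Ask the OS to push the data to disk so its cache does not hide slow writes.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the test file
    written_mb = chunks * chunk_mb
    return written_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Assumption: replace the temp directory with a folder on the drive being tested.
    rate = measure_write_mb_per_sec(tempfile.gettempdir(), size_mb=16)
    print(f"sequential write: {rate:.1f} MB/sec")
```

Running the same test on each array, before and after swapping the cable, gives comparable numbers without involving SQL Server at all.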