Disk bottleneck? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Disk bottleneck?

Hi everyone,
my SQL server is a very powerful machine:
Windows 2003 Enterprise
4 Xeon 2500 CPUs, with Hyperthreading enabled
16 GB RAM, /3GB and /PAE enabled
2 SCSI disks for the OS, 15,000 rpm, RAID 1+0
5 SCSI disks for the database files, 15,000 rpm, RAID 5
1 HP SmartArray 6400 controller

I've put all the database files on the second drive (E:). I notice that the %idle time of this unit is always 0%. To improve performance I tried putting the transaction log on the first drive, but I don't really notice a change. Now, when the %idle time of the first array is 0, the second one is 100%, and vice versa. I think the problem could be the controller being too busy. Do you think an additional controller with 2 SCSI units would help? Or could we just add 2 disks to the existing controller and put the transaction log file there?

Here are our counters:

Object: Memory
Page Faults/sec                    108.009
Page Reads/sec                       0.000
Page Writes/sec                      0.000
Transition Faults/sec               48.004
Transition Pages RePurposed/sec      0.000

Object: LogicalDisk                     C:          E:
% Idle Time                         96.660       5.920
Avg. Disk Queue Length               0.033       4.751

Object: PhysicalDisk                  0 C:        1 E:
% Idle Time                         96.670       5.920
Avg. Disk Bytes/Read                 0.000    9294.769
Avg. Disk Bytes/Transfer         54613.333   14301.753
Avg. Disk Bytes/Write            54613.333   47681.641

Object: SQLServer:Access Methods
Index Searches/sec               12556.999
Do you think that this machine is performing poorly? Roughly we have something like 5 million rows per day.
I am adding to the preceding (I'm working with SK on the problem) that we have absolutely no CPU problems; we're running at less than 10%.
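
(A quick way to see which database file is actually generating the I/O on E: is the built-in ::fn_virtualfilestats function. The sketch below assumes SQL Server 2000; MyDB and the file id are placeholders for your own database and the file ids reported by sp_helpfile.)

-- per-file I/O statistics accumulated since the last SQL Server restart
declare @dbid int
select @dbid = db_id('MyDB')                -- replace MyDB with your database name

select FileId, NumberReads, NumberWrites,
       BytesRead, BytesWritten,
       IoStallMS                            -- total ms spent waiting on I/O for this file
from ::fn_virtualfilestats(@dbid, 1)        -- file id 1 is typically the primary data file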
Is there any performance bottleneck on the SQL database itself, other than hardware? Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
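
(For what it is worth, one quick non-hardware check is to look for blocking while the system feels slow. A minimal sketch against the SQL Server 2000 system tables:)

-- list sessions that are currently blocked and what they are waiting on
select spid, blocked, waittype, waittime, lastwaittype, cmd, hostname, program_name
from master..sysprocesses
where blocked <> 0
order by waittime desc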
We aren't developers, but System Administrators. We are building up the architecture for that software, so we can only work on hardware optimization.
I don't think it's the controller. In my experience it's often the disks. The controller usually has no problem pushing the data around. Run the script below and see if you get any high wait stats, for example on WRITELOG. If so, I would look into:
– a controller with a battery-backed cache that could be used for caching writes
– moving away from RAID 5 to RAID 0+1 instead
– or, even better, identifying what in the application is causing the high disk usage. Is there a batch job or query that is writing far more than it needs to and could be tuned instead?

Script from sqldev.net. Edit the @finish value if you want to run it longer. Right now it loops 10 times, waiting 10 seconds between samples, and then reports the totals and percentages:

-- simple waitstats script

if ((object_id('waitstats') is not null) and
    (objectproperty(object_id('waitstats'), 'IsUserTable') = 1))
    drop table waitstats

create table waitstats (
    wait_type        varchar(80),
    requests         numeric(18, 1),
    wait_time        numeric(18, 1),
    signal_wait_time numeric(18, 1),
    time_stamp       datetime default getdate()
)

declare @start  int,
        @finish int
select @start = 1,
       @finish = 10

dbcc sqlperf(waitstats, clear)  -- clear out wait statistics

while (@start < @finish)
begin
    begin transaction
        insert into waitstats (wait_type, requests, wait_time, signal_wait_time)
        exec ('dbcc sqlperf(waitstats)')
    commit

    select @start = @start + 1
    waitfor delay '00:00:10'    -- every 10 seconds
end

declare @begin_time datetime,
        @end_time   datetime,
        @total_wait numeric(18, 1)

select @begin_time = min(time_stamp),
       @end_time   = max(time_stamp)
from waitstats
where wait_type = 'Total'

-- subtract waitfor, sleep, and resource_queue from Total
select @total_wait = sum(wait_time) + 1
from waitstats
where wait_type not in ('WAITFOR', 'SLEEP', 'RESOURCE_QUEUE', 'Total', '***total***')
  and time_stamp = @end_time

-- insert adjusted totals, rank by percentage descending
insert into waitstats values ('***total***', 0, @total_wait, @total_wait, @end_time)

-- reporting
select @begin_time as [begin time],
       @end_time as [end time],
       datediff(mi, @begin_time, @end_time) as [duration in min]

select wait_type,
       wait_time,
       percentage = cast(100 * wait_time / @total_wait as numeric(18, 1))
from waitstats
where wait_type not in ('WAITFOR', 'SLEEP', 'RESOURCE_QUEUE', 'Total')
  and time_stamp = @end_time
order by percentage desc
Example output from one of our systems that has disk performance issues:

wait_type          wait_time    percentage
***total***          23040.0         100.0
WRITELOG             13726.0          59.6
NETWORKIO             9092.0          39.5
PAGELATCH_UP           174.0            .8
PAGELATCH_SH            15.0            .1
PAGEIOLATCH_SH          16.0            .1
LCK_M_S                 16.0            .1
LCK_M_U                   .0            .0
LCK_M_X                   .0            .0
LCK_M_IS                  .0            .0
LCK_M_IU                  .0            .0
LCK_M_IX                  .0            .0
LCK_M_SIU                 .0            .0
..
..

For the E: drive, monitor:
Avg. Disk Queue Length
Avg. Disk sec/Read and Avg. Disk sec/Write
Disk Reads/sec
Disk Writes/sec

quote: Originally posted by Argyle
I don't think it's the controller. In my experience it's often the disks. The controller usually has no problem pushing the data around.

Yeah, right. Actually, that was misleading information on my part. Sure enough, the bottleneck is not the controller (besides the fact that we are still waiting for the battery-backed cache to be installed, and this should help a lot) but the disks.

Last update, we have done the following:
Switched the database recovery model to simple
Moved the transaction log to another disk (yes, both the data file and the transaction log were on the same disk, E:; not our fault, the "supposed" VAR did it...)
Configured SQL Server to work in "fiber" mode (lightweight pooling)

Now the situation seems better: at least we sometimes have a %idle time that is different from 0%, whereas before it was always 0%.

The real question is that, not being "real" DBAs (please don't ask why we are doing this job then; not our fault, we just got involved), we have no idea what "correct" performance we should expect. Now the system seems to perform correctly, but some operations are still awfully slow. I guess that asking for a "reference" value, even supplying some numbers as we did, is not an option, right?
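
(For reference, the database-side changes mentioned above can be checked or applied from Query Analyzer. A minimal sketch, assuming SQL Server 2000 and a hypothetical database named MyDB:)

-- check the current recovery model and where the files physically live
select databasepropertyex('MyDB', 'Recovery') as recovery_model

use MyDB
exec sp_helpfile                    -- lists data and log files with their drive locations

-- switch to simple recovery (point-in-time restore via log backups is no longer possible)
alter database MyDB set recovery simple

-- see how full each transaction log is
dbcc sqlperf(logspace)

(Keep in mind that with simple recovery you can only restore to the last full or differential backup.)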
Can you get any information about SQL Server behaviour from the developers/DBAs at your end?
quote: Originally posted by soft_karma
We aren't developers, but System Administrators. We are building up the architecture for that software, so we can only work on hardware optimization.

Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.