Scenario : Have single server running SQL 2000 Enterprise on a RAID 5 configuration. There is only one application which connects to this instance and there are 7 connections spread across the 5 databases defined. The connections are kept open to the DB. Each database is set to SIMPLE recovery. While using perm monitor. counters, I notice the following. 1) Checkpoint pages/sec goes from 0 to 600/sec for about 6 seconds. 2) Physical page writes/sec follows the same as the the checkpoint. 3) Physical disk writes shows about 3 MB/sec during this time. 4) Percent log used on one of the DB reached 70% then drops to 20%. 5) Load is low in my opinion, averages 10 batch requests / second. 6) This is much more pronouced on the RAID 5 boxes in comparison to the IDE test boxes, it appears this is related to write caching? 7) When this occurrs our time critical app is slowed drastically. This matches the documentation on log being truncated at 70% during a checkpoint on the database right ? So, I thought I could force a checkpoint more often in an attempt to reduce the amount of writes. I did this by scheduling a job to run every 2 minutes( test value) that simply called 'CHECKPOINT'. Notes: This database file is 100 MB and the LogFile is 50 MB. Also, this database only has 'INSERTS' going against the DB 95% of the time. I am looking into the possibility of using a write cache on the controller if even available. I don't need 50 MB for the log file, if I shrink it will that help?
I don't see any issues with SIMPLE RECOVERY MODEL and Log truncation plays slow performance of the system. How about DBCC checks and other maint. tasks on the database? _________ Satya SKJ Moderator SQL-Server-Performance.Com
There are no maintenance and/or jobs scheduled during these times. I dont know about DBCC checks, any examples of what I might need to look for ?
It appears if I call checkpoint on the database in question it does what I need. In my previous tests I was not calling checkpoint on the db in question as I thought it was system wide when called explicitly. Mark.
Any reason behind calling CHECKPOINT in the program? System will take care of checkpoint once it reached on Tlog. _________ Satya SKJ Moderator SQL-Server-Performance.Com
Heavy amount of inserts happening in the system explain Checkpoint consuming so many resources in the system. Do you have a saperate disk / array for log? Having same will definately help in reducing the time for checkpoint to complete. You can configure the System Recovery time in the Database options to change the frequency of chekpoint as well. This way you don't have to call Checkpoint manually in the application. Gaurav Moderator Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
I am not calling CHECKPOINT in the app itself. I just scheduled a job via SQL to force the checkpoint. The data and log file are on the same raid 5 disks. I don't believe changing the recovery option will help since the minimum is 1 minute which I believe the system tries to optimize for anyways. If that is true, there would still be to much information being written out. Plus, I don't want to make changes that would be system wide if possible. It appears my problems are the log and data are on the same disks. My stress tests run better on test machines which are IDE no raid. I realize RAID5 isn't great for the seq writes, but the load is so low this isn't a problem yet. Finally, is there any difference from plain old SQL insert ... VS an insert that is done inside a transaction along with a COMMIT ? The application does no sql inside of transaction just straight insert / updates. Thanks,
The first and foremost recommendation will be to have log and data on saperate disk array. I don't see any issues with running inserts and updates within one transaction as insert and update statements themselves start implicit trnsactions. Gaurav Moderator Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
Moving the log file to a different RAID-5 disk will help if the disks have independent controllers (which should be the case in most installations). But since you don't seem to be bothered about disaster recovery (SIMPLE recovery model and explicit checkpoints!) RAID-0 would have done the trick without the overhead of maintaining parity information on the disks. IMHO, you shouldn't tamper with the default CHECKPOINT behaviour - it normally is the least of any DBA's problems. The culprit could be lurking somewhere else. Nathan H.O. Moderator SQL-Server-Performance.com
I agree I don't want to tamper with default behaviors. Unfortunately, it appears be a solution for this poor disk configuration. All I can tell is this large write to disk temporarily slows down the DB which has fatal consequences for our application. I appreciate everyones comments.
Any information on event viewer log for h/w issues? _________ Satya SKJ Moderator SQL-Server-Performance.Com
There are no errors or abnormalities to be seen any where. The event logs and db logs all look clean.
Then how did you find its a poor disk configuration? _________ Satya SKJ Moderator SQL-Server-Performance.Com
Two points - it has always been since Sybase/Sql server 4.2 that a user specified checkpoint will not truncate the log... If you want the checkpoint behavior to be more aggessive - change the recovery interval. Set it to some value and see if this changes the frequency of the disk usage spikes Joe E O
We have had the same problem with checkpoint at least we think so. Then we tried to change the "max async IO Option" and seems to have work. Has anyone else tried to change this option with succes. I can refer to the BOL and to the Microsoft web pages (search for "Max Async I/O"): http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql7/html/msdn_sql7perftune.asp Btw. we are using SQL Server 7.0 and I am not sure that this applies to SQL Server 2000.