
SQL Server Performance Forum – Threads Archive

Windows 2003 Clustered SQL Server SP3 Failovers

Currently I have a SQL Server cluster running on Windows Server 2003 that fails over every time the maintenance jobs run. The maintenance jobs are a forced reindex, an index rebuild, and an index defrag. The first two index jobs run in the morning around 6 am, and we fail not long after the job starts. The last index job runs in the evening around 6 pm, and we fail about halfway through. The server reports error 2019, which means the system has run out of nonpaged pool memory. Microsoft has given us hotfix after hotfix to try to resolve the issue, but nothing seems to help. The hardware is not reporting any errors. It is an Active/Passive setup with a Fibre Channel connection to the SAN. SQL Server is set up on a RAID 10 configuration. The indexes, tempdb, and backups are all on separate RAID 1 drives. The failures began after we changed to the configuration above; before that, we had everything on the RAID 10 drives. Any ideas?
Can you post the exact error text returned?
Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
I don’t know if this is feasible or not, but if I were faced with your problem, I would rebuild the cluster from scratch, starting with the OS, service packs, drivers, SQL Server, etc. This is a lot of work, but in my experience, once you start having problems with a cluster, they are hard to fix unless you start over.
-----------------------------
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com
As I understand things, a cluster will fail over when one or more resources fail (services, shared disks, network IPs, etc.) or when the server becomes unresponsive.
In the case of your "Error 2019, …system has run out of nonpaged pool memory", does the error cause the SQL Server service to stop, or does the server become unresponsive?
Is there a way to be sure which of your cluster resources is failing? Are you running anti-virus software on your Windows 2003 servers? McAfee, perhaps?
If so, you may want to read this article:
http://support.microsoft.com/default.aspx?scid=kb;en-us;888928
I also found some additional references to that error code (2019) concerning Windows 2000 and earlier Windows NT versions, whereby SQL Servers that are configured solely for TCP/IP instead of Named Pipes may trigger a memory leak in the OS, resulting in a 2019 error. This probably does not apply to Windows 2003, though. My only other guess is that some other software (SAN drivers, perhaps?) has been leaking memory in the OS since the RAID 1 disks were implemented.

Yes, check for anti-virus or any other third-party tools installed on the cluster (such as backup or monitoring software) that could cause issues. See the bottom of this link for examples of such tools: http://support.microsoft.com/?kbid=822219
How much memory do you have on the servers?
Currently we are running Norton Antivirus version 9.0 with SQL Server excluded. When the failover happens, the system goes to 100% CPU usage and is unresponsive for a few seconds. The system then fails over to the other machine and becomes responsive again. During the failover SQL Server has quit; the server tries to bring it back online and usually can't. The first sign of the failover is a cluster shared resource failing a status check. We are running 16 GB of RAM in each server. Microsoft has changed the amount of nonpaged pool memory available to each server: out of the box, Server 2003 has 160 MB available, and ours now have the full 256 MB available. I hope I have answered all the questions. One further update on the failovers: if I reboot the servers right before the SQL maintenance jobs, the servers are able to complete the jobs without failing over.
Thanks,
Jay
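The fact that a reboot right before the jobs avoids the failover is consistent with nonpaged pool usage creeping upward over time, though the thread does not confirm this. A minimal Python sketch of how one might check that hypothesis follows. It assumes you have already logged periodic readings of nonpaged pool usage (for example from the Performance Monitor counter Memory\Pool Nonpaged Bytes) as (seconds, bytes) pairs. The function name, the sample values, and the use of 256 MB as the ceiling are illustrative assumptions, not measurements from this cluster.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024


def nonpaged_pool_trend(
    samples: List[Tuple[float, float]], limit_bytes: float
) -> Tuple[float, Optional[float]]:
    """Fit a least-squares line through (seconds, bytes) samples.

    Returns (growth in bytes per second, estimated seconds until the
    limit is reached after the last sample). The estimate is None when
    usage is flat or shrinking. Hypothetical helper for illustration.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")

    # Ordinary least-squares slope: covariance(t, b) / variance(t).
    slope = sum((t - mean_t) * (b - mean_b) for t, b in samples) / var_t
    if slope <= 0:
        return slope, None

    # Project forward from the most recent reading.
    _, last_bytes = samples[-1]
    remaining = max(limit_bytes - last_bytes, 0.0)
    return slope, remaining / slope


if __name__ == "__main__":
    # Made-up readings taken every 10 minutes; replace with logged data.
    readings = [(0, 100 * MB), (600, 110 * MB), (1200, 120 * MB)]
    rate, eta = nonpaged_pool_trend(readings, limit_bytes=256 * MB)
    print(f"growth: {rate / MB * 3600:.1f} MB/hour")
    if eta is not None:
        print(f"estimated time to limit: {eta / 3600:.2f} hours")
```

A steadily positive slope while the index jobs run, and a flat one otherwise, would point toward something in the I/O path allocating pool memory and not releasing it. A flat slope throughout would suggest looking elsewhere.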