Active/Active Instance Fails to Failover | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Active/Active Instance Fails to Failover

I can't seem to fail over an instance of SQL Server on an active/active SQL 2000 install running Windows 2000 Advanced Server. When I try to fail over, Cluster Administrator seems to lock up for a while, and I don't see the items in the group going offline, online pending, and then online as I normally do. Eventually Cluster Administrator becomes responsive again, but nothing has moved.
The cluster group is already online on the second server. I don't see anything in the SQL error logs. The only thing I see in the server's Event Log is Event ID 1117: Cluster resource SQL Server (dserver1) failed to come offline. I can go into Cluster Administrator and bring all the items in the group offline. When I try to move the group after that, nothing happens either, and no errors are logged. I have quite a few databases (450 or so) on that server, and I wonder whether the failover process is hitting some sort of timeout: if it can't finish before the limit, the failover doesn't happen. I am tempted to just reboot the server, but thought I would first find out if anyone has any thoughts on this.

450 databases sounds like a lot. It could take some time for SQL Server to shut them down and start them up. Try increasing the "Pending timeout" value on the SQL Server resource in the cluster.
Have you ever been able to fail over successfully in your cluster?
-----------------------------
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com

Yes, it has worked fine before. The server has been rock solid, though, so it has been several months since I last failed over and rebooted it. Looking through my logs, I can see that a Move Group took more than one attempt on a previous occasion (it worked the second time). There have been no real changes to the server apart from continually adding more databases, so perhaps the timeout setting will resolve it. I will change the pending timeout value and try failing over again this weekend when usage is lower, then report back. Thanks.
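
For anyone sizing the pending timeout against a database count like the one in this thread, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: the 0.5 seconds per database, the 180-second current setting, and the 1.5x headroom factor are all assumptions, not measurements from this cluster, and the function names are made up for the example. Measure your own shutdown time and check the actual value on the SQL Server resource before relying on any number here.

```python
import math


def fits_in_timeout(db_count, secs_per_db, pending_timeout_secs):
    """True if closing every database is estimated to finish within the timeout."""
    return db_count * secs_per_db <= pending_timeout_secs


def suggested_pending_timeout(db_count, secs_per_db, headroom=1.5):
    """Estimated shutdown time plus headroom, rounded up to whole seconds."""
    return math.ceil(db_count * secs_per_db * headroom)


DB_COUNT = 450          # database count mentioned in the thread
SECS_PER_DB = 0.5       # hypothetical average; measure this on your own server
CURRENT_TIMEOUT = 180   # assumed current setting in seconds; verify on the resource

if fits_in_timeout(DB_COUNT, SECS_PER_DB, CURRENT_TIMEOUT):
    print("Estimated shutdown fits within the current pending timeout.")
else:
    print("Estimated shutdown exceeds the current pending timeout.")
    print("Suggested value (seconds):",
          suggested_pending_timeout(DB_COUNT, SECS_PER_DB))
```

The estimate assumes shutdown time grows linearly with the number of databases, which is a simplification; a few large or busy databases can dominate the total.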