Trouble with New Cluster on Windows 2003 | SQL Server Performance Forums



I have just installed a two-node cluster running Windows 2003 Enterprise and SQL Server 2000 Enterprise.
Overall the installation went pretty smoothly, apart from installing and enabling DTC and adding it as a cluster resource per KB Article 301600 (http://support.microsoft.com/default.aspx?scid=kb;en-us;301600), and setting up the Network Config utility for a named instance as Microsoft recommends in KB Article 815431. It was much smoother than the install I did on Windows 2000 a couple of years ago. Failover of the disk resources was working just fine. BUT after installing SQL Server and SP3 on each machine, failover of SQL Server does not work. Here is what happens when I try a Move Group in Cluster Administrator:

1. All the resources go offline and then to Online Pending on the other server, as normal.
2. The following resources come online on the failover server:
   Disk N:
   Disk V:
   SQL IP Address1
3. But these never get beyond Online Pending:
   SQL Network Name
   SQL Server
   SQL Server Agent
   SQL Server Fulltext

The SQL Cluster Group (where the quorum and other cluster resources are) had already been failed over to the other server with a Move Group. In the event log I see the following:

Event ID 17052: [sqsrvres] RegOpenKeyExW failed [status 2]
Event ID 17052: [sqsrvres] Online Thread: Error 2 bringing resource online

These are repeated several times (once for each of the resources that didn't fail over, I assume).

At first I thought it was a DNS issue, since the SQL Server IP address seems to fail over but the network name does not. But each server appears to resolve the other's SQL Server name just fine. All users involved are members of the Administrators and Domain Admins security groups. If one server is completely shut down, the other shows its disks, the SQL Server IP address and the SQL Server name as failed over and Online, but the other three resources are shown as Failed or Online Pending. Both servers have the same version of the SQL Server executable (2000.80.760.0). Any help is appreciated.
STEVE
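A note on the event log entries above: the `status 2` reported by sqsrvres is a Win32 system error code, and code 2 is `ERROR_FILE_NOT_FOUND`, which `RegOpenKeyExW` returns when the registry key it was asked to open does not exist on that node. The Python sketch below decodes such messages. The helper name `decode_sqsrvres`, the regular expression and the small code table are illustrative assumptions (the table lists only a few codes from winerror.h); none of this is part of SQL Server or any Microsoft tool.

```python
import re

# Hypothetical lookup table: a few well-known Win32 system error codes
# (values as defined in winerror.h). Extend as needed.
WIN32_ERRORS = {
    0: "ERROR_SUCCESS",
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}

# Matches event text such as "[sqsrvres] RegOpenKeyExW failed [status 2]".
STATUS_RE = re.compile(r"\[sqsrvres\]\s+(\w+) failed \[status (\d+)\]")


def decode_sqsrvres(message):
    """Return (api_name, code, symbolic_name), or None if the text does not match."""
    match = STATUS_RE.search(message)
    if match is None:
        return None
    api, code = match.group(1), int(match.group(2))
    return api, code, WIN32_ERRORS.get(code, "UNKNOWN")


if __name__ == "__main__":
    line = "[sqsrvres] RegOpenKeyExW failed [status 2]"
    print(decode_sqsrvres(line))  # ('RegOpenKeyExW', 2, 'ERROR_FILE_NOT_FOUND')
```

This only names the error; it does not tell you which registry key was missing on the failover node.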
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/failclus.mspx
http://www.sql-server-performance.com/real_life_dba_cluster_failover.asp
… for quick reference that may help to resolve the issue.
HTH
Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
When you installed SP3, did you install it on one node, or on both nodes? By this, I mean did you physically have to run SP3 twice, once on each node (while that node was active)?
—————————–
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com
I installed SP3 on both nodes. Each node was active, serving up its own SQL Server instance. But the cluster group with the quorum and other cluster resources was probably running on the same node for both SP3 installs (I don't know if that matters).
STEVE
I have not seen this issue before, so I can't offer any fixes. But if I were facing this issue, I would remove SQL Server from all nodes and start over. Once SQL Server is installed (without any service pack), I would test everything thoroughly to confirm that everything was working. Then I would apply SP3 to both nodes and see what happens. If it still fails, I would call Microsoft Support.
—————————–
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com
OK… I uninstalled both instances and started over. Before I installed SQL Server, I found and used the tips in http://support.microsoft.com/default.aspx?scid=kb;en-us;258750, which gives more detail and instruction on configuring the private heartbeat adapter in a cluster than any other documentation I've seen. The install went perfectly: I installed the first active instance and was able to fail over fine. I then installed the second active instance on the other node, and all seemed fine.

After installing SP3a, I was no longer able to access the cluster with the Cluster Administrator tool. I was concerned that the cluster had somehow been corrupted, because it was basically dead and unreachable. After some digging, I was able to connect to the cluster using the cluster IP address. I could then see that the Cluster Name resource in the Cluster Group was stuck at "Online Pending" while the Cluster IP was "Online". I was able to bring the Cluster Name online, and since then all seems good. Both active instances seem to fail over to each other just fine. Now I will move on to the testing steps in the document on this site.

From my experience with SQL Server clusters, I think the most unreliable part is the actual installation. Beyond that, I've had few or no problems with the software side of things. Thanks, Brad and Satya.
I agree with your statement about installation issues. I have faced the same kinds of issues, but once they are overcome, the cluster is very reliable.
—————————–
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com