SQL Server Performance

Trouble with New Cluster on Windows 2003

Discussion in 'SQL Server Clustering' started by stevem123, Sep 14, 2004.

  1. stevem123 New Member

    I have just installed a 2-node cluster of Windows 2003 Enterprise and SQL 2000 Enterprise.
    Overall the installation went pretty smoothly, other than installing and enabling DTC and adding it as a resource per KB Article 301600 (http://support.microsoft.com/default.aspx?scid=kb;en-us;301600) and setting up the Network Config utility for a named instance as Microsoft recommends in KB Article 815431. Much smoother than the install I had with Windows 2000 a couple of years ago.

    Failover of Disk resources was working just fine.

    BUT after installing SQL and SP3 on each machine, the failover of SQL will not work.

    Here is what is happening when I try to MOVE GROUP in Cluster Administrator...

    1. All the resources go offline and then to Online Pending for the other server as normal.

    2. The following resources go Online on the failover server:

    Disk N:
    Disk V:
    SQL IP Address1

    3. But these do not go beyond Online Pending:

    SQL Network Name
    SQL Server
    SQL Server Agent
    SQL Server Fulltext

    The SQL Cluster Group (where quorum, etc. resources are) was already failed over to the other server with a Move Group.

    In the event log, I see the following.

    Event ID 17052
    [sqsrvres] RegOpenKeyExW failed [status 2]

    Event ID 17052
    [sqsrvres] Online Thread: Error 2 bringing resource online

    This is repeated several times (once for each of the resources that didn't fail over, I assume).
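
A note on the numbers in those event log entries: RegOpenKeyExW returns a Win32 system error code on failure, and code 2 is ERROR_FILE_NOT_FOUND, which would suggest that a registry key the SQL Server resource DLL expects is missing on the target node (the log does not say which key). The sketch below is a minimal, hypothetical helper, not part of any Microsoft tool; it pulls the code out of a message and looks it up in a small hand-written table. It assumes the "Error 2" in the Online Thread message is also a Win32 error code, which the log does not confirm.

```python
import re

# Small hand-picked subset of Win32 system error codes (winerror.h).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}

# Matches "[status N]" or "Error N " as seen in the two sqsrvres messages.
STATUS_RE = re.compile(r"\[status (\d+)\]|Error (\d+) ")


def decode_status(message):
    """Return (code, name) for an event message, or None if no code is found.

    Assumption: the number is a Win32 system error code.
    """
    match = STATUS_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1) or match.group(2))
    return code, WIN32_ERRORS.get(code, "UNKNOWN")


if __name__ == "__main__":
    for line in (
        "[sqsrvres] RegOpenKeyExW failed [status 2]",
        "[sqsrvres] Online Thread: Error 2 bringing resource online",
    ):
        print(line, "->", decode_status(line))
```

Read this way, both messages point to the same cause: something the resource tries to open does not exist on the node it is moving to.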

    At first I thought it was a DNS issue, since the server IP address seems to fail over but not the server/network name. But each server appears to resolve the other's SQL Server name, etc. just fine. All users involved are members of the Admin and Domain Admin security groups.

    If one server is completely shut down, the other shows its disk, the SQL Server IP, and the SQL Server name as failed over and Online, but the other 3 items are shown as failed or Online Pending.

    Both servers have the same version of the SQL Server exe (2000.80.760.0)

    Any help is appreciated.
    STEVE
  2. satya Moderator

  3. bradmcgehee New Member

    When you installed SP3, did you install it on one node, or on both nodes? By this, I mean did you physically have to run SP3 twice, once on each node (while that node was active)?

    -----------------------------
    Brad M. McGehee, MVP
    Webmaster
    SQL-Server-Performance.Com
  4. stevem123 New Member

    I installed SP3 on both nodes. Each node was active, serving up its SQL instance. But the cluster group with the quorum, etc. was probably running on the same node for both SP3 installs (don't know if that matters).

    STEVE
  5. bradmcgehee New Member

    I have not seen this issue before, so I can't offer any fixes.

    But, if I was facing this issue, I would remove SQL Server from all nodes, then start over. Once SQL Server is installed (without any SP), I would test everything very well to see that everything was working. Then I would apply SP 3 to both nodes, and see what happens. If it still fails, then I would call Microsoft Support.


  6. stevem123 New Member

    OK...I uninstalled both instances and started over.

    Before I installed SQL I found and used the tips from

    http://support.microsoft.com/default.aspx?scid=kb;en-us;258750

    which provides more details and instructions about configuring the private heartbeat adapter on a cluster than any docs I've seen about it.

    The install went perfectly and I installed the first active instance and was able to failover fine. Installed the second active instance on the other node and all seemed fine.

    After installing SP3a, I was no longer able to access the cluster via the Cluster Administrator tool. I was concerned that somehow the cluster was corrupted because it was basically dead and unreachable. After some digging I was able to connect to the cluster with the cluster IP. I could now see that the Cluster Name resource in the Cluster group was stuck at "Online Pending" while the Cluster IP was "Online". I was able to bring the Cluster Name online and since then all seems good.

    Both active instances seem to be able to fail over to each other just fine... now I will move on to the testing steps in the doc on this site.

    From my experience with SQL clusters, I think the most unreliable part of SQL Clusters is the actual installation. Beyond that, I've had little or no problems with the software side of things.

    Thanks Brad and Satya.
  7. bradmcgehee New Member

    I agree with your statement about installation issues. I have faced the same issues, but once they are overcome, the cluster is very reliable.

