SQL Server Performance

Node Removed from Active Server Membership

Discussion in 'SQL Server 2005 Clustering' started by Foyura, Feb 11, 2008.

  1. Foyura New Member

    We have an active/passive 2-node cluster running Windows Server 2003 and SQL Server 2005.
    Problem: Node 1 had a hardware failure. It was traced back to faulty network cards.
    The Cluster service tried to fail over to Node 2, and the failover failed with several network-related errors.
    My question is: Why did Node 2 fail and get removed from the Active Server Membership and how can we prevent this from happening in the future?
    Here are error messages:
    1. CheckQueryProcessorAlive: sqlexecdirect failed. specified network name is no longer available
    2. sqlstate 08S01 native error = 0. Communication link failure
    3. ODBCConnectError: TCP Provider: Timeout expired
    4. The MSDTC service is stopping. Cluster service was terminated by request from Node 1.
    5. Cluster Network Name is no longer registered with its hosting system
    6. The TCP/IP interface for Cluster IP address MSDTC IP Address has failed
    7. Cluster Node 2 failed a critical operation. It will be removed from active server cluster membership. Check node is functioning properly and that it can communicate with the other active server cluster nodes
    8. node 1 lost communication with node 2 on heartbeat
    9. node 1 lost communication with node 2 on public network
    10. cluster resource MSDTC Network Name failed
    11. Node 2 removed due to lost communication
    12. KRB_AP_ERR_MODIFIED error from the server host Node1. The target name was CIFS/ Virtual SQL Server Name. Commonly this is due to identically named machine accounts in the target realm and the client realm.
    13. Cluster service was halted to prevent an inconsistency with the server cluster.
    At this point, everything was up but the resources were offline. To recover, we simply brought the resources online and all was fine.
    Your thoughts are appreciated.
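
    With this many messages, it can help to sort them into rough groups before looking for a root cause. The following is only a minimal sketch: the category names and keyword patterns are illustrative assumptions, not an official taxonomy of cluster log events, and the first matching pattern wins.

```python
import re

# Hypothetical keyword buckets, checked in order; the first match wins.
# Both the names and the patterns are assumptions made for illustration.
CATEGORIES = [
    ("kerberos", re.compile(r"KRB_AP_ERR|machine accounts", re.I)),
    ("membership", re.compile(r"membership|removed|halted|terminated", re.I)),
    ("network", re.compile(r"network|tcp|heartbeat|communication|timeout", re.I)),
]


def categorize(message):
    """Return the name of the first category whose pattern matches."""
    for name, pattern in CATEGORIES:
        if pattern.search(message):
            return name
    return "other"


def group_messages(messages):
    """Group messages by category, preserving their original order."""
    groups = {}
    for msg in messages:
        groups.setdefault(categorize(msg), []).append(msg)
    return groups


if __name__ == "__main__":
    sample = [
        "ODBCConnectError: TCP Provider: Timeout expired",
        "node 1 lost communication with node 2 on heartbeat",
        "Node 2 removed due to lost communication",
        "KRB_AP_ERR_MODIFIED error from the server host Node1.",
    ]
    for category, msgs in group_messages(sample).items():
        print(category, len(msgs))
```

    Grouping this way does not explain the failure on its own, but it makes it easier to see which messages are symptoms of lost connectivity and which point somewhere else.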
  2. satya Moderator

    The last error sums it all up.
    This issue can occur if the device drivers from another program are not compatible with the Cluster service. Third-party filter drivers can prevent the Cluster service from accessing cluster resources. Programs and tools such as antivirus packages, quota software, and defragmentation tools are suspect.
    Are you using any third-party tools on SQL Server?

  3. Foyura New Member

    Yes, they have McAfee on the nodes and Volume Shadow Copy is running.
    It was determined that Node 1 had a faulty network card. What is interesting is the message stating that there is a duplicate host name. Could this be a DNS issue as well?
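
    One way to look into the DNS question is to compare forward and reverse lookups for the virtual server name. The sketch below is an assumption-laden example, not a diagnosis: `check_name` is a hypothetical helper, the host name in the usage line is a placeholder, and a clean result does not rule out duplicate machine accounts in the domain. The lookup functions are passed in as parameters so the logic can be exercised without a live DNS server.

```python
import socket


def check_name(hostname, resolve=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Resolve a name, reverse-resolve each address, and list mismatches."""
    result = {"hostname": hostname, "addresses": [], "reverse": {}, "mismatches": []}
    try:
        _, _, addresses = resolve(hostname)
    except OSError as exc:  # covers socket.gaierror and socket.herror
        result["error"] = str(exc)
        return result

    result["addresses"] = list(addresses)
    short = hostname.split(".")[0].lower()

    for addr in addresses:
        try:
            name, aliases, _ = reverse(addr)
        except OSError:
            # No reverse record at all is also worth a look.
            result["reverse"][addr] = None
            result["mismatches"].append(addr)
            continue
        result["reverse"][addr] = name
        # Compare short (unqualified) names, ignoring case.
        names = {n.split(".")[0].lower() for n in [name, *aliases]}
        if short not in names:
            result["mismatches"].append(addr)
    return result


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the virtual SQL Server name.
    print(check_name("localhost"))
```

    An address listed under `mismatches` means the reverse lookup returned a different name (or none), which could indicate a stale or conflicting record worth checking with whoever manages DNS.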
  4. satya Moderator

    Then this is something you need to address on the network side; otherwise the cluster will not be able to fail over seamlessly if the network itself gets in the way.
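
    As a rough starting point for checking the network from each node, a small probe like the one below can confirm basic TCP reachability on both the public and the private (heartbeat) addresses. This is a sketch under assumptions: the labels, addresses, and port in the usage example are placeholders, and a successful TCP connection only shows that the path and the listening service are up. It does not exercise the cluster heartbeat itself.

```python
import socket


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


def probe_all(targets, timeout=2.0):
    """Probe every labelled (host, port) pair and return label -> bool."""
    return {label: probe(host, port, timeout) for label, (host, port) in targets.items()}


if __name__ == "__main__":
    # Placeholder addresses and port; replace with the real node interfaces.
    targets = {
        "node2-public": ("192.0.2.12", 445),
        "node2-private": ("10.10.10.2", 445),
    }
    for label, ok in probe_all(targets).items():
        print(label, "reachable" if ok else "NOT reachable")
```

    Running it from both nodes, against both networks, gives a quick picture of which path is failing before digging into NIC drivers or switch configuration.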
