SQL Server Performance Forum – Threads Archive
How to monitor for failover events

I have a two-node cluster running SQL Server and am trying to set up CA’s Unicenter monitoring product to alert on a failover. I need to know which error, in which log, truly signifies that a node has failed, so that I know to go and check that it failed over. I can scrape any event log for any event ID, but I can’t figure out which event ID really means a node failure occurred. The goal is to get as few false alarms as possible. Any help would be appreciated.
Event ID 1135 (a WARNING) seems to be one of the first to appear in the System Event Log. Beyond that, in a failover situation you will see ERROR events from ClusSvc in the System Event Log, followed by WARNING/INFO events when service is restored. The 1135 message reads:

Event ID: 1135
Cluster node ClusterNode was removed from the active cluster membership. The Clustering Service may have been stopped on the node, the node may have failed, or the node may have lost communication with the other active cluster nodes.

Any WARNING or ERROR from ClusSvc in the System Event Log should probably trigger something in your monitoring. Another, complementary check is for the presence of the actual disks on your servers, for example checking that \\DATASERVER1\S$ (or whichever drives you store data/logs on) exists. If \\DATASERVER1\S$ does not exist but \\DATASERVER2\S$ does, then a failover has probably occurred. If neither exists, the failover itself may have failed, or both servers may be down. And if the data servers have lost power, there are no event logs to check at all.
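The disk-existence check described above boils down to a small decision table. Here is a minimal Python sketch of it, assuming the monitoring host can see the nodes' administrative shares; the share paths, the function names `classify_failover` and `check_shares`, and the returned labels are all hypothetical illustrations, not part of Unicenter or SQL Server.

```python
import os


def classify_failover(primary_up: bool, secondary_up: bool) -> str:
    """Interpret which node currently exposes the shared data drive."""
    if primary_up and secondary_up:
        # A single shared cluster disk should only be visible on one node,
        # so seeing it on both is unexpected and worth a manual look.
        return "ambiguous"
    if primary_up:
        return "no failover"
    if secondary_up:
        return "failover probably occurred"
    # Neither node exposes the drive: the failover failed or both nodes are down.
    return "failover failed or both nodes down"


def check_shares(primary_path: str, secondary_path: str) -> str:
    """Probe both hypothetical UNC paths and classify the result."""
    return classify_failover(os.path.exists(primary_path),
                             os.path.exists(secondary_path))


if __name__ == "__main__":
    # Hypothetical node names and drive letter; substitute your own.
    print(check_shares(r"\\DATASERVER1\S$", r"\\DATASERVER2\S$"))
```

Keeping the decision logic in a pure function separate from the filesystem probe makes it easy to test, and to reuse if the probe is later replaced by something else (for example a query to the cluster itself).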
You can also monitor the event log for ‘failover manager’ events.
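The event-log suggestions in this thread (event ID 1135, any ClusSvc WARNING/ERROR in the System log, and "failover manager" events) can be combined into one filter. The Python sketch below assumes your monitoring agent can hand you each event as a record with log name, source, event ID, level and message text; the `EventRecord` shape, the label strings, and matching "failover manager" as plain text in the source or message are all assumptions, not a documented Unicenter or Windows interface.

```python
from dataclasses import dataclass
from typing import Optional

# Event ID quoted in the thread: "Cluster node ... was removed from the
# active cluster membership."
NODE_REMOVED_ID = 1135


@dataclass
class EventRecord:
    # Hypothetical record shape; adapt to whatever your agent exports.
    log: str        # e.g. "System"
    source: str     # e.g. "ClusSvc"
    event_id: int
    level: str      # "ERROR", "WARNING" or "INFO"
    message: str = ""


def classify_event(ev: EventRecord) -> Optional[str]:
    """Return an alert label for failover-related events, or None to ignore."""
    if ev.log == "System" and ev.source == "ClusSvc":
        if ev.event_id == NODE_REMOVED_ID:
            return "node removed"
        if ev.level in ("WARNING", "ERROR"):
            return "cluster service problem"
    # Assumption: "failover manager" events are recognised by plain text
    # matching on the source or message, in any log.
    if "failover manager" in f"{ev.source} {ev.message}".lower():
        return "failover manager"
    return None


if __name__ == "__main__":
    sample = [
        EventRecord("System", "ClusSvc", NODE_REMOVED_ID, "WARNING"),
        EventRecord("Application", "MSSQLSERVER", 1, "INFO"),
    ]
    for ev in sample:
        label = classify_event(ev)
        if label is not None:
            print(ev.event_id, label)
```

Checking 1135 first gives the clearest "a node left the cluster" signal its own label, while the broader ClusSvc WARNING/ERROR rule catches the follow-on errors. Which labels actually page someone is where you trade coverage against false alarms.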