How a Cluster Failover Works (Putting the Pieces Together)
While a failover can have many different causes, let’s look at the case where the active node of a cluster loses power and the passive node has to take over. This will provide a general overview of how a failover occurs.
Let’s assume that a single SQL Server 2005 instance is running on the active node of a cluster, and that a passive node is ready to take over when needed. At this time, the active node is communicating with both the database and the quorum on the shared array. Because only one node at a time can communicate with the shared array, the passive node is not communicating with the database or the quorum. In addition, the active node is sending out heartbeat signals over the private network, and the passive node is monitoring them to see if they stop. Meanwhile, clients are interacting with the active node via the virtual name and IP address, running production transactions.
Now, for whatever reason, the active node stops working because it is no longer receiving any electricity. The passive node, which is monitoring the heartbeats from the active node, notices that the heartbeat signal has stopped. After a predetermined delay, the passive node assumes that the active node has failed and initiates a failover. As part of the failover process, the passive node (now the active node) takes over control of the shared array and reads the quorum, looking for any unsynchronized configuration changes. It also takes over control of the virtual server name and IP address. In addition, as the node takes over the databases, it has to start SQL Server against those databases, just as if it were starting after a shutdown, and go through database recovery. The time this takes depends on many factors, including the speed of the system and the number of transactions that have to be rolled forward or back during the database recovery process. Once the recovery process is complete, the new active node announces itself on the network with the virtual name and IP address, which allows the clients to reconnect and begin using the SQL Server 2005 instance with minimal interruption.
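To make the sequence above concrete, here is a minimal Python sketch that simulates it. This is only an illustration of the order of events described in this article, not how the Windows Cluster service is actually implemented. The class names, the node names (NodeA, NodeB), the one-heartbeat-per-second rate, and the 3-second delay are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A cluster node and the shared resources it currently owns."""
    name: str
    powered: bool = True
    owns_shared_array: bool = False
    owns_virtual_name: bool = False
    sql_running: bool = False


@dataclass
class PassiveMonitor:
    """Hypothetical model of the passive node watching the heartbeat."""
    active: Node
    passive: Node
    failover_delay: float  # assumed: seconds of silence before failover
    last_heartbeat: float = 0.0
    failed_over_at: Optional[float] = None
    log: List[str] = field(default_factory=list)

    def tick(self, now: float) -> None:
        # A powered active node sends one heartbeat per tick.
        if self.active.powered:
            self.last_heartbeat = now
            return
        # Silence shorter than the predetermined delay is tolerated.
        if now - self.last_heartbeat < self.failover_delay:
            return
        # Fail over only once.
        if self.failed_over_at is None:
            self.fail_over(now)

    def fail_over(self, now: float) -> None:
        # The failed node no longer owns anything.
        self.active.owns_shared_array = False
        self.active.owns_virtual_name = False
        # The steps below follow the order given in the article.
        self.passive.owns_shared_array = True
        self.log.append("took control of shared array")
        self.log.append("read quorum for unsynchronized changes")
        self.passive.owns_virtual_name = True
        self.log.append("took over virtual name and IP address")
        self.passive.sql_running = True
        self.log.append("started SQL Server and ran database recovery")
        self.log.append("announced virtual name; clients can reconnect")
        self.failed_over_at = now


def simulate() -> PassiveMonitor:
    node_a = Node("NodeA", owns_shared_array=True,
                  owns_virtual_name=True, sql_running=True)
    node_b = Node("NodeB")
    monitor = PassiveMonitor(active=node_a, passive=node_b,
                             failover_delay=3.0)
    for second in range(10):
        if second == 4:
            # The active node loses power.
            node_a.powered = False
            node_a.sql_running = False
        monitor.tick(float(second))
    return monitor


if __name__ == "__main__":
    result = simulate()
    print("failover at second", result.failed_over_at)
    for step in result.log:
        print(" -", step)
```

In this run, the last heartbeat arrives at second 3, the power fails at second 4, and the simulated passive node waits out the 3-second delay before failing over at second 6. In a real cluster, the database recovery step is what usually dominates the total failover time.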
That’s the big picture of how SQL Server 2005 clustering works. If you are new to SQL Server clustering, it is important that you understand these basic concepts before you begin to drill down into the details. In later articles, I will discuss, in great detail, how to plan, build, and administer a SQL Server 2005 cluster.