Windows 2000 Cluster Services: Testing and Verifying the Windows 2000 Cluster Service – Page 2

Now if you click on the “Disk Group,” you will notice that your disk resource did not fail over. This is also normal. A failover only forces dependent resources to fail over together as a group, and the “Cluster Group” we failed over earlier is not dependent on the “Disk Group,” so the disk resource stayed where it was. To fail over the disk group, right-click on the disk resource in the right pane of the window, and select “Initiate failure.” You will have to do this a total of four times in order to fail over the disk resource to the other node.
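One way to picture why four attempts are needed is a restart threshold: the cluster restarts a failed resource in place a few times before giving up and moving its group. The sketch below is a toy model, not the Cluster Service's actual logic. It assumes a threshold of three restarts (check the resource's properties on your own cluster, as the value may differ), it ignores the time window over which failures are counted, and the names `Resource` and `initiate_failure` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Toy model of a cluster resource with a restart threshold."""
    owner: str
    restart_threshold: int = 3  # assumption: restarts allowed before the group moves
    failures: int = 0

    def initiate_failure(self, other_node: str) -> str:
        """Record one simulated failure and report what this model does."""
        self.failures += 1
        if self.failures <= self.restart_threshold:
            # Still within the threshold: restart on the current owner.
            return "restarted on " + self.owner
        # Threshold exceeded: move to the other node and reset the counter.
        self.owner = other_node
        self.failures = 0
        return "failed over to " + self.owner


disk = Resource(owner="node1")
results = [disk.initiate_failure("node2") for _ in range(4)]
for attempt, outcome in enumerate(results, start=1):
    print(attempt, outcome)
```

In this model the first three simulated failures only restart the resource on the same node, and the fourth one moves it to the other node.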

Now that you have done this, reverse your steps and fail over the “Cluster Group” and the “Disk Group” back to the original node.

As with the previous test, check the Event Viewer logs to see if any error messages occurred. If everything worked as expected, you are ready for the next test.
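If you save the log to a text file after each test, a short script can pick out the error entries for you. The following is only a sketch: it assumes a comma-separated export with a header row, and the column names (`Type`, `Source`, `Description`) and the sample rows are placeholders invented for illustration, not a documented export format. Adjust them to match whatever your export actually contains.

```python
import csv
import io


def find_errors(csv_text, type_field="Type", error_value="Error"):
    """Return the rows of an exported log whose type column marks an error."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(type_field) == error_value]


# Invented sample rows, for illustration only (not real log output).
sample_export = """Date,Source,Type,Description
day-1,SourceA,Information,Placeholder informational text
day-1,SourceB,Error,Placeholder error text
day-1,SourceA,Warning,Placeholder warning text
"""

errors = find_errors(sample_export)
for row in errors:
    print(row["Source"], row["Description"])
```

Running the same filter after every test gives you a quick, repeatable way to confirm that no new errors appeared.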

Test Number 3: Turn Off Each Node

While the first two tests were performed from the Cluster Administrator, the next three tests are closer to real-world failures. In this test, you will first need to ensure that all of the default groups are located on one of the two nodes. Then you will physically turn off (flip the switch) the active node (first node).

If you watch the cluster groups in the Cluster Administrator on the inactive node (second node) after turning off the first node, you should see the resources fail over automatically to the second node. Check the Event Log for any potential error messages after this occurs.

Once you have checked for any potential problems, turn the first node back on and wait until it fully boots. Note that turning the node back on does not cause the cluster to fail back. The cluster resources will remain on the second node until you force them to return to the first node.

Now turn off the node with the active groups (second node), repeating what you did earlier with the first node. As before, you can use the Cluster Administrator from node 1 to watch the groups fail over to the first node. Check the Event Log for any potential error messages.

Once the groups have failed over to the first node, turn the second node back on, and wait until it boots up fully.

This is a very good test to see if failover will work in the real world. If no problems arose from this test, then you are ready for the next test.
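The power-cycle sequence in this test can be summarized as a small state model. This is a simplified sketch written for illustration, not how the Cluster Service is implemented: the class `TwoNodeCluster` and its methods are hypothetical names, and the model assumes, as described above, that groups move only when their owning node goes down and never fail back on their own.

```python
class TwoNodeCluster:
    """Toy model of group ownership on a two-node cluster without failback."""

    def __init__(self, groups, owner="node1"):
        self.up = {"node1": True, "node2": True}
        self.owner = {group: owner for group in groups}

    def power_off(self, node):
        """Turn a node off; its groups move to the surviving node if it is up."""
        self.up[node] = False
        survivor = "node2" if node == "node1" else "node1"
        if self.up[survivor]:
            for group, current in self.owner.items():
                if current == node:
                    self.owner[group] = survivor

    def power_on(self, node):
        """Turn a node back on; ownership does not change (no failback)."""
        self.up[node] = True


cluster = TwoNodeCluster(["Cluster Group", "Disk Group"])

cluster.power_off("node1")
after_first_off = dict(cluster.owner)   # groups now on node2

cluster.power_on("node1")
after_first_on = dict(cluster.owner)    # still on node2

cluster.power_off("node2")
after_second_off = dict(cluster.owner)  # groups now on node1

cluster.power_on("node2")
print(after_first_off, after_first_on, after_second_off, cluster.owner)
```

The point to notice is the second step: bringing node1 back up leaves both groups on node2, which matches the behavior you should see in the Cluster Administrator.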

Test Number 4: Break Network Connectivity

This test is similar in concept to the previous one. What we want to do is force a failover. But instead of simulating a computer failure, we will be simulating a network-related error.

From the node that has the default resource groups (the first node), remove the network cable from the public network card. This will simulate a failure of the first node, and should initiate a failover to the second node.

If you watch the cluster groups in the Cluster Administrator on the second node, you should see the resources fail over automatically to the second node. Check the Event Log for any potential error messages.

Once you have checked for any potential problems, plug the network cable back into the first node, and then remove the network cable from the public network card on the second node. As before, you can use the Cluster Administrator to watch the groups fail over to the first node. Check the Event Log for any potential error messages. Once you are done, plug the network cable back into the public network card on the second node.

If no problems arose from this test, then you are ready for the next.

Test Number 5: Break Shared Array Connectivity

This test is designed to help uncover potential issues with the shared disk array. I have seen clusters pass all four of the tests above, but fail this one because the shared disk array was not configured 100% correctly. This test simulates what would happen if the controller card or the cable connecting a node to the shared disk array fails.

From the node that has the default resource groups (the first node), remove the cable from the card used to connect to the shared array. This will simulate a failure of the first node, and should initiate a failover to the second node.

Once you have checked for any potential problems, plug the cable back into the first node, and then remove the cable from the card used to connect to the shared array on the second node. As before, you can use the Cluster Administrator to watch the groups fail over to the first node. Check the Event Log for any potential error messages. Once you are done, plug the cable back into the appropriate card.
