Windows 2000 Cluster Services: Testing and Verifying the Windows 2000 Cluster Service

The worst is now over. If you have successfully gotten this far, most likely the Windows 2000 Cluster Service is ready to be used. But since I trust a computer installation only as far as I can throw it (which is not very far), it is a good idea to test the Cluster Service installation to see if it is really working as it should. This section takes a look at several tests you can perform to ensure that Cluster Service is really doing its job.

In most cases, if there are any problems with your cluster, these tests will find them. What those problems are, and how to fix them, is beyond the scope of this article. If your cluster passes all of these tests with flying colors, the odds are very good that your cluster will not have any future problems (although this can never be guaranteed).

Test Number 1: Moving Groups

The first test we will perform is very simple: we will move the current resources (including the Cluster Group and any Disk Groups) that were created when the Cluster Service was installed from the active cluster node to the inactive cluster node.

In your cluster, the two nodes can be divided into an active node (the node in control of the cluster’s resources) and an inactive node (the node not in control of any cluster resources). If you are a clustering expert, you know I have oversimplified this explanation, but it is good enough for our testing purposes.

After the Cluster Service has been installed on both nodes of the cluster, one of the nodes will be in control of all the default cluster groups (the active node) and the other node will not have any cluster groups assigned to it (the inactive node). By default, the active node owns what are called the “Cluster Group” and the “Disk Group.” (There may be one or more Disk Groups, depending on how your shared disk array has been configured. In this example, we will assume there is only one Disk Group.)

The Cluster Group generally contains these cluster resources:

  • Cluster IP Address (the virtual IP address of the cluster).
  • Cluster Name (the virtual name of the cluster, used by clients to access the cluster).
  • Disk Q: (the quorum disk, which may or may not be labeled Q:).

The Disk Group generally contains a single disk resource (a drive letter) that refers to a logical drive on the shared disk array. If more than one logical drive is part of the shared disk array, there will be a separate Disk Group for each logical drive.
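
By the way, the ownership information you will be watching in these tests is also available programmatically through the Win32 Cluster API (clusapi.h, linked against clusapi.lib). Here is a minimal sketch, under the assumption that your cluster uses the default group name “Cluster Group”; it prints the name of the node that currently owns the group, which is the same “Owner” column you will see in Cluster Administrator:

    /* Minimal sketch: ask the Cluster Service which node currently owns
       a group. Assumes the Win32 Cluster API on Windows 2000 and the
       default group name "Cluster Group" described above. */
    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    int wmain(void)
    {
        WCHAR ownerNode[MAX_PATH];
        DWORD cchOwner = MAX_PATH;

        /* Passing NULL opens a handle to the cluster this node belongs to. */
        HCLUSTER hCluster = OpenCluster(NULL);
        if (hCluster == NULL) {
            wprintf(L"OpenCluster failed: %lu\n", GetLastError());
            return 1;
        }

        HGROUP hGroup = OpenClusterGroup(hCluster, L"Cluster Group");
        if (hGroup == NULL) {
            wprintf(L"OpenClusterGroup failed: %lu\n", GetLastError());
            CloseCluster(hCluster);
            return 1;
        }

        /* Fills in the owning node's name -- the same "Owner" column
           shown in Cluster Administrator. */
        CLUSTER_GROUP_STATE state =
            GetClusterGroupState(hGroup, ownerNode, &cchOwner);
        if (state == ClusterGroupStateUnknown)
            wprintf(L"GetClusterGroupState failed: %lu\n", GetLastError());
        else
            wprintf(L"Cluster Group is owned by node %s (state %d).\n",
                    ownerNode, (int)state);

        CloseClusterGroup(hGroup);
        CloseCluster(hCluster);
        return 0;
    }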

Now that we have all of that out of the way, let’s begin our first test to see if the cluster is functioning properly. Our goal in this test is to see if we can manually move both default cluster groups from the active node in the cluster to the inactive node, and then reverse our steps so that the cluster groups return to their original node. Here’s how (a scripted equivalent of these steps appears after the list):

  1. Start Cluster Administrator.
  2. In the Explorer pane at the left side of the Cluster Administrator, open the “Groups” folder. Inside it you should see the “Cluster Group” and the “Disk Group.”
  3. Click on “Cluster Group” to highlight it. In the right pane of the screen, you will see the cluster resources that make up this group. Note the “Owner” of the resources. This is the name of the active node.
  4. Each of the groups must be moved to the other node, one at a time. First, right-click on “Cluster Group,” then select “Move Group.” As soon as you do this, you will see the “State” change from “Online” to “Offline pending” to “Offline” to “Online pending” to “Online.” This will happen very quickly. Also note that the “Owner” changes from the active node to the inactive node.
  5. Now do the same for the “Disk Group.”
  6. Assuming there are no problems, both groups will have moved to the inactive node, which, in effect, has now become the active node. Once both groups have been moved, look in the Event Viewer to see if any error messages were generated. If everything worked correctly, there should be no error messages.
  7. Now, move both groups back to the original node by repeating steps four through six above.
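
For the record, the same move can be scripted against the Cluster API instead of performed through the GUI. The sketch below assumes a two-node cluster with the default group names; passing NULL as the destination node asks the Cluster Service to move the group to the “best possible” node, which on a two-node cluster means the other node:

    /* Sketch of Test 1 driven through the Cluster API rather than the
       Cluster Administrator GUI. Assumes a two-node cluster and the
       default group names discussed above. */
    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    static void MoveGroup(HCLUSTER hCluster, LPCWSTR groupName)
    {
        HGROUP hGroup = OpenClusterGroup(hCluster, groupName);
        if (hGroup == NULL) {
            wprintf(L"OpenClusterGroup(%s) failed: %lu\n",
                    groupName, GetLastError());
            return;
        }

        /* Equivalent of right-clicking the group and selecting "Move
           Group." ERROR_IO_PENDING simply means the move is still in
           progress. */
        DWORD rc = MoveClusterGroup(hGroup, NULL);
        if (rc == ERROR_SUCCESS || rc == ERROR_IO_PENDING)
            wprintf(L"Move of %s initiated (rc = %lu).\n", groupName, rc);
        else
            wprintf(L"MoveClusterGroup(%s) failed: %lu\n", groupName, rc);

        CloseClusterGroup(hGroup);
    }

    int wmain(void)
    {
        HCLUSTER hCluster = OpenCluster(NULL);   /* local cluster */
        if (hCluster == NULL) {
            wprintf(L"OpenCluster failed: %lu\n", GetLastError());
            return 1;
        }

        MoveGroup(hCluster, L"Cluster Group");   /* step 4 */
        MoveGroup(hCluster, L"Disk Group");      /* step 5 */

        CloseCluster(hCluster);
        return 0;
    }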

This is a very basic test, but it helps to determine if the cluster is working as it should. The following tests are slightly more comprehensive, helping you to root out any other potential problems.

Test Number 2: Initiate Failure

This test is very similar to the one above, except this time we will simulate a resource failure in order to force a failover. Here’s how to perform this test:

  1. Start Cluster Administrator.
  2. In the Explorer pane at the left side of the Cluster Administrator, open the “Groups” folder. Inside it you should see the “Cluster Group” and the “Disk Group.”
  3. Click on “Cluster Group” to highlight it. In the right pane of the screen, you will see the cluster resources that make up this group. Note the “Owner” of the resources. This is the name of the active node.
  4. Now, right-click on the “Cluster IP Address” resource in the right pane of the window, then select “Initiate Failure.” This tells the Cluster Service that the virtual IP address has failed.
  5. After you select this option, you will notice some activity under “State,” but fairly quickly the resource returns to an “Online” status and the “Owner” has not changed. It appears as if no failover has occurred, and you are correct: no failover has occurred. Believe it or not, this is normal and to be expected. The Cluster Service will try to restart a failed resource up to three times before it actually fails over (this threshold can be changed). So to trigger an actual failover, you must repeat step four above a total of four times. When the failover occurs, you will also notice that all of the resources in the “Cluster Group” fail over with it. (A scripted version of this test appears below.)
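
As with the first test, this one can also be driven through the Cluster API: FailClusterResource is the programmatic equivalent of “Initiate Failure.” The sketch below assumes the default resource name “Cluster IP Address” and the default restart threshold of three, so the fourth simulated failure should push the whole group over to the other node:

    /* Sketch of Test 2 through the Cluster API: FailClusterResource is
       the programmatic equivalent of "Initiate Failure." Assumes
       clusapi.h / clusapi.lib and the default resource name
       "Cluster IP Address" described above. */
    #include <windows.h>
    #include <clusapi.h>
    #include <stdio.h>

    int wmain(void)
    {
        HCLUSTER hCluster = OpenCluster(NULL);   /* local cluster */
        if (hCluster == NULL) {
            wprintf(L"OpenCluster failed: %lu\n", GetLastError());
            return 1;
        }

        HRESOURCE hRes = OpenClusterResource(hCluster, L"Cluster IP Address");
        if (hRes == NULL) {
            wprintf(L"OpenClusterResource failed: %lu\n", GetLastError());
            CloseCluster(hCluster);
            return 1;
        }

        /* Failures 1-3 should only trigger local restarts; with the
           default restart threshold of three, failure 4 should cause
           the actual failover described in step 5. */
        for (int i = 1; i <= 4; i++) {
            DWORD rc = FailClusterResource(hRes);
            wprintf(L"Initiated failure %d (rc = %lu)\n", i, rc);
            Sleep(5000);   /* crude pause to let each restart settle */
        }

        CloseClusterResource(hRes);
        CloseCluster(hCluster);
        return 0;
    }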
