Testing SQL Server 2000 Clusters

If you have successfully gotten this far, the SQL Server 2000 cluster is most likely ready to be used. But before you trust any production systems to a new cluster, it is a good idea to test it to see whether it is really working as it should. This section looks at several tests you can perform to verify that the cluster is doing its job.

In most cases, if there is a problem with your SQL Server 2000 cluster, these tests will find it. What the problems are, and how to fix them, is beyond the scope of this article. If your SQL Server 2000 cluster passes all of these tests with flying colors, the odds are very strong that it will not have future problems (although this can never be guaranteed).

Note: These tests are virtually identical to the tests I suggest you run on a newly created cluster after installing the Windows 2000 Cluster Service, but before installing SQL Server 2000 clustering. To be sure that installing SQL Server 2000 on the cluster did not introduce any new problems, the tests need to be repeated. By testing both before and after installing SQL Server 2000 clustering, you are in a better position to tell where a problem was introduced.

Test Number 1: Moving Groups

The first test we will perform is very simple. What we will do is to move the current resources (including the Cluster Group and SQL Server resource group) from the primary cluster node to the secondary cluster node, and then back again.

Let’s begin our test to see if SQL Server Clustering is functioning properly. Here’s how:

  1. Start Cluster Administrator.
  2. In the Explorer pane at the left side of the Cluster Administrator, open up the “Groups” folder. Inside it you should see the Cluster Group and the SQL Server resource group. The name of the SQL Server resource group will be whatever name you have assigned it.
  3. Click on “Cluster Group” to highlight it. In the right pane of the screen, you will see the cluster resources that make up this group. Note the “Owner” of the resources. This is the name of the primary node.
  4. Each of the groups must be moved to the other node, one at a time. First, right-click on “Cluster Group,” then select “Move Group.” As soon as you do this, you will see the “State” change from “Online” to “Offline pending” to “Offline” to “Online pending” to “Online.” This will happen very quickly. Also note that the “Owner” changes from the primary node to the secondary node.
  5. Now do the same for the SQL Server resource group.
  6. Assuming there are no problems, both groups will have moved to the secondary node, which, in effect, has now become the primary node. Once both groups have been moved, look in the Event Viewer to see if any error messages were generated. If everything worked correctly, there should be no error messages.
  7. Now, move both groups back to the original node by repeating steps four through six above.
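If you record what you see during a move, the pass criteria from step 4 can be written down as a small check. The following is only a sketch in Python under stated assumptions: the function names and the node names NODE1 and NODE2 are hypothetical, the observations are typed in by hand (nothing here talks to the Cluster Service), and because the transitions happen quickly, the check tolerates intermediate states you did not catch.

```python
# Expected "State" sequence for a group during "Move Group" (from step 4).
EXPECTED_STATES = [
    "Online",
    "Offline pending",
    "Offline",
    "Online pending",
    "Online",
]


def is_ordered_subsequence(observed, expected):
    """Return True if every observed state appears in `expected`, in order."""
    remaining = iter(expected)
    # `state in remaining` advances the iterator until it finds a match.
    return all(state in remaining for state in observed)


def move_looks_healthy(observed_states, owner_before, owner_after):
    """Apply the article's criteria to hand-recorded observations.

    A move passes if the states seen follow the expected order, the group
    ends up "Online", and the "Owner" changed to the other node.
    """
    if not observed_states or observed_states[-1] != "Online":
        return False
    if owner_before == owner_after:
        return False
    return is_ordered_subsequence(observed_states, EXPECTED_STATES)


if __name__ == "__main__":
    # Hypothetical observations: some intermediate states were missed.
    seen = ["Online", "Offline", "Online"]
    print(move_looks_healthy(seen, "NODE1", "NODE2"))
```

A sequence that ends in anything other than “Online,” or one where the “Owner” did not change, is reported as a failed move and is worth investigating in the Event Viewer.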

This is a very basic test, but it helps to determine if the cluster is working as it should. The following tests are slightly more comprehensive, helping you to root out other potential problems.

Test Number 2: Initiate Failure

This test is very similar to the one above, except that instead of moving the groups directly, we manually simulate a resource failure in order to force a failover. Here’s how to perform this test:

  1. Start Cluster Administrator.
  2. In the Explorer pane at the left side of the Cluster Administrator, open up the “Groups” folder. Inside it you should see the Cluster Group and the SQL Server resource group. 
  3. Click on “Cluster Group” to highlight it. In the right pane of the screen, you will see the cluster resources that make up this group. Note the “Owner” of the resources. This is the name of the primary node.
  4. Now, right-click on the “Cluster IP Address” resource in the right pane of the window, then select “Initiate Failure.” This tells the Cluster Service that the virtual IP address has failed.
  5. After you select this option, you will notice some activity under “State,” but the resource fairly quickly returns to an “Online” status and the “Owner” has not changed. It appears as if no failover has occurred, and that is correct: no failover has occurred. This is normal and to be expected, because the Cluster Service will try to restart a failed resource up to three times before it actually fails over (this number can be changed). So to force a failover, you must perform step four a total of four times. When the failover occurs, you will also notice that all of the resources in the “Cluster Group” fail over together.
  6. Now if you click on the SQL Server resource group, you will notice that the SQL Server resources did not fail over. This is also normal. A failover only takes the resources of the affected group with it, and the SQL Server resource group is independent of the “Cluster Group” we failed over earlier. To fail over the SQL Server resource group, right-click on the disk resource that contains the system database files in the right pane of the window, and select “Initiate Failure.” You will have to do this a total of four times in order to fail the disk resource, and its group with it, over to the other node.
  7. Now that you are done, reverse your steps and fail the “Cluster Group” and the SQL Server resource group back over to the primary node.
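The reason “Initiate Failure” has to be selected four times can be illustrated with a toy model. The Python sketch below is not the Cluster Service's actual algorithm; it only mirrors the behavior described in step 5, where a failed resource is restarted up to three times and the next failure causes a failover. The class, the function, and the node names NODE1 and NODE2 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Toy stand-in for a cluster resource (assumption, not a real API)."""
    name: str
    owner: str
    restart_threshold: int = 3  # restarts tried before failing over (configurable)
    restarts_used: int = 0


def initiate_failure(resource, other_node):
    """Simulate selecting "Initiate Failure" once.

    Returns "restarted" while restart attempts remain, otherwise moves the
    resource to `other_node` and returns "failed over".
    """
    if resource.restarts_used < resource.restart_threshold:
        resource.restarts_used += 1
        return "restarted"
    resource.owner = other_node
    resource.restarts_used = 0
    return "failed over"


if __name__ == "__main__":
    ip = Resource(name="Cluster IP Address", owner="NODE1")
    for attempt in range(1, 5):
        outcome = initiate_failure(ip, other_node="NODE2")
        print(attempt, outcome, ip.owner)
```

In this model the first three simulated failures leave the owner unchanged, and only the fourth changes it, which matches what you should see in Cluster Administrator when the restart threshold is left at three.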

Like the previous test, check out the Event Viewer logs to see if any error messages occurred. If everything worked as expected, you are ready for the next test.

Continues…
