How to Cluster Windows Server 2003
In some cases, the Cluster Wizard may put the SQL Server shared disk array in the Cluster Group resource group and not create a Group 0. If this is the case, then you will need to create a new resource group and then move the SQL Server shared disk array from the Cluster Group to the newly created SQL Server resource group.
Here’s how you create a new resource group using Cluster Administrator:
- Start Cluster Administrator.
- From the File menu, select New, then select Group. This starts the New Group Wizard.
- For the Name of the group, enter “SQL Server Group.” Optionally, you can also enter a description of this group. Click Next.
- Next, select the nodes of your cluster that will run SQL Server. Normally this is all of your nodes. The nodes are listed on the left side of the wizard. CTRL-click each node and then click Add to move the selected nodes from the left side of the wizard to the right side. Click Finish.
The new SQL Server Group resource group has now been created.
Now that the group has been created, it must be brought online. Here’s how:
- From Cluster Administrator, right-click on the SQL Server resource group (it will have a red dot next to it) and select Bring Online.
- The red dot next to the resource group name goes away, and the SQL Server Group resource group is now online and ready for use.
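The wizard steps and the Bring Online action above can also be sketched as command lines for cluster.exe, the command-line tool that ships with Windows Server 2003. This is a minimal sketch, not a tested procedure: the code only builds the command lines and does not run them, the node names NODE1 and NODE2 are placeholders, and the /create, /setowners, and /online switches should be confirmed with `cluster group /?` on your own system before use.

```python
def create_group_commands(group, nodes):
    """Build cluster.exe argument lists that create a resource group,
    set its preferred owner nodes, and bring it online.

    Nothing is executed here; the switches are assumptions to verify
    against `cluster group /?` on the target system.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return [
        ["cluster", "group", group, "/create"],
        ["cluster", "group", group, "/setowners:" + ",".join(nodes)],
        ["cluster", "group", group, "/online"],
    ]


if __name__ == "__main__":
    # NODE1 and NODE2 are hypothetical node names used for illustration.
    for command in create_group_commands("SQL Server Group", ["NODE1", "NODE2"]):
        print(command)
```

Keeping each command as an argument list avoids having to hand-quote the spaces in "SQL Server Group" when the list is later passed to a process launcher.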
Now, your next step is to move any disk resources from the Cluster Group (except the Quorum drive) to the SQL Server Group. This is a simple matter of dragging and dropping the disk resources from the Cluster Group to the SQL Server Group. Once you have done this, you are ready for the next step.
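If you prefer to script the move instead of dragging and dropping, the rule above (move every disk except the Quorum drive) could look roughly like the following. It is a sketch under assumptions: the disk resource names "Disk Q:", "Disk R:", and "Disk S:" are invented for illustration, the commands are only built and never run, and the /moveto switch for resources should be checked with `cluster resource /?` first.

```python
def move_disk_commands(disks, quorum, target_group="SQL Server Group"):
    """Build cluster.exe argument lists that move each disk resource,
    except the quorum disk, into the target group.

    The resource names passed in are whatever your cluster calls its
    disks; the /moveto switch is an assumption to verify locally.
    """
    return [
        ["cluster", "resource", disk, "/moveto:" + target_group]
        for disk in disks
        if disk != quorum  # the Quorum drive stays in the Cluster Group
    ]


if __name__ == "__main__":
    # Hypothetical disk resource names; "Disk Q:" plays the quorum role.
    for command in move_disk_commands(["Disk Q:", "Disk R:", "Disk S:"], "Disk Q:"):
        print(command)
```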
Test, Test, and Test Again
Once you have installed Windows 2003 clustering on your nodes, you need to thoroughly test the installation before beginning the SQL Server 2005 cluster install. If you don’t, and problems arise later with Windows 2003 clustering, you may have to remove SQL Server 2005 clustering to fix them, so you might as well identify and resolve any potential problems now.
Below are a series of tests you can perform to verify that your Windows 2003 cluster is working properly. After you perform each test, verify that you get the expected result (a successful failover). Also be sure to check the Windows Event Logs for any possible problems. If you find a problem during one test, resolve it before proceeding to the next test. Once you have performed all of these tests successfully, you are ready to continue with the cluster installation.
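The rule of resolving a problem before moving on can be tracked with a small checklist. The following is only an illustrative sketch: the test names are the section headings used in this article, and the `results` dictionary is a hypothetical record that you fill in by hand after each test.

```python
# Test names taken from the section headings of this article, in order.
TESTS = [
    "Move Groups Between Nodes",
    "Manually Initiate a Failover in Cluster Administrator",
    "Manually Failover Nodes by Turning Them Off",
    "Manually Failover Nodes by Breaking the Public Network Connections",
]


def next_test(results):
    """Return the next test to run, or None when every test has passed.

    `results` maps a test name to True (passed) or False (failed).
    A failed or not-yet-run test blocks everything after it, so a
    problem has to be resolved and the test re-run before proceeding.
    """
    for name in TESTS:
        if not results.get(name, False):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical record: the first test passed, the second found a problem.
    print(next_test({TESTS[0]: True, TESTS[1]: False}))
```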
Preparing for the Tests
Before you begin testing, identify a workstation that has Cluster Administrator on it, and use this copy of Cluster Administrator for interacting with your cluster during testing. You will get a better test using a remote copy of Cluster Administrator than trying to use a copy running on one of the cluster nodes.
Move Groups Between Nodes
The easiest test to perform is to use Cluster Administrator to manually move the Cluster Group and SQL Server Group resource groups from the active node to a passive node, and then back again. To do this, right-click on a group, starting with the Cluster Group, and select Move Group. Repeat for the SQL Server Group.
Once a group has been successfully moved from the active node to a passive node, use the same procedure to move the group back to the original node. The moves should be fairly quick and uneventful. Use Cluster Administrator to watch the failover and failback, and check the Event Logs for possible problems. After moving the groups, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
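The move-and-return procedure, together with the "everything online" check, can be sketched as follows. Treat it as an illustration under assumptions: NODE1 and NODE2 are placeholder node names, the /moveto switch should be confirmed with `cluster group /?`, the commands are built but not executed, and the resource states are a hand-entered dictionary rather than parsed cluster output.

```python
def move_round_trip(group, origin, other):
    """Build cluster.exe argument lists that move a group to another
    node and then back to the node it started on. Nothing is run."""
    return [
        ["cluster", "group", group, "/moveto:" + other],
        ["cluster", "group", group, "/moveto:" + origin],
    ]


def not_online(resource_states):
    """Return the sorted names of resources whose state is not 'Online'.

    `resource_states` maps a resource name to the state you read in
    Cluster Administrator (for example 'Online', 'Offline', 'Failed').
    An empty result means the test passed this check.
    """
    return sorted(
        name
        for name, state in resource_states.items()
        if state.strip().lower() != "online"
    )


if __name__ == "__main__":
    # Hypothetical node names and resource states, for illustration only.
    for group in ("Cluster Group", "SQL Server Group"):
        for command in move_round_trip(group, "NODE1", "NODE2"):
            print(command)
    print(not_online({"Cluster IP Address": "Online", "Disk R:": "Failed"}))
```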
Manually Initiate a Failover in Cluster Administrator
This test is also performed from Cluster Administrator. Select any of the resources in the Cluster Group resource group (not the Cluster Group itself), right-click on it, and select Initiate Failure. Because the cluster service tries to recover from a failure up to three times, if it can, you will have to select this option four times before a test failover is initiated. Watch the failover from Cluster Administrator. After the failover, fail back using the same procedure, again watching the activity from Cluster Administrator. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
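Why four clicks and not one? The counting can be illustrated with a deliberately simplified model. It assumes only what is described above, namely three recovery attempts before a failover; the parameter name `restart_threshold` is chosen for this sketch, and the real cluster service also takes time windows and group-level settings into account, which this model ignores.

```python
def simulate_initiate_failure(clicks, restart_threshold=3):
    """Simplified model of repeated Initiate Failure actions.

    Each of the first `restart_threshold` failures is answered with a
    restart of the resource on the same node; the next failure causes
    a failover of the group. Returns the outcome of each click, and
    stops at the first failover.
    """
    outcomes = []
    for click in range(1, clicks + 1):
        if click <= restart_threshold:
            outcomes.append("restart")
        else:
            outcomes.append("failover")
            break
    return outcomes


def clicks_needed(restart_threshold=3):
    """Number of Initiate Failure actions required to force a failover."""
    return restart_threshold + 1


if __name__ == "__main__":
    print(simulate_initiate_failure(4))
    print(clicks_needed())
```

The same failure can reportedly be triggered from the command line with `cluster resource "resource name" /fail`; confirm the switch with `cluster resource /?` before relying on it.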
Manually Failover Nodes by Turning Them Off
This time, we will only use Cluster Administrator to watch the failover activity, not to initiate it. First, turn off the active node hard (cut the power rather than shutting Windows down). Once this happens, watch the failover in Cluster Administrator. Once the failover occurs, turn the former active node back on and wait until it fully boots. Then turn off the new active node hard and, again, watch the failover in Cluster Administrator. After the failover occurs, turn the powered-off node back on. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
Manually Failover Nodes by Breaking the Public Network Connections
In this test, we will see what happens if network connectivity fails. First, both nodes being tested should be on. Second, unplug the public network connection from the active node. This will cause a failover to a passive node, which you can watch in Cluster Administrator. Third, plug the public network connection back into the server. Fourth, unplug the public network connection from the now active node. This will cause a failover to the current passive node, which you can watch in Cluster Administrator. Once the testing is complete, plug the network connection back into the server. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
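Both the power-off test and the network test follow the same pattern: break the active node, watch the failover, restore the node, then repeat against the node that has just become active. A small sketch of that pattern as a printable checklist follows. The node names NODE1 and NODE2 and the wording of the actions are placeholders, and the code only produces text; it does not touch the cluster.

```python
def round_trip_steps(active, passive, fail_action, restore_action):
    """Return the ordered manual steps for a failover round trip.

    The first pass breaks the active node so the passive node takes
    over; the second pass breaks that node so ownership returns to
    the original node.
    """
    steps = []
    for victim, survivor in ((active, passive), (passive, active)):
        steps.append(f"{fail_action} {victim}")
        steps.append(f"Watch failover to {survivor} in Cluster Administrator")
        steps.append(f"{restore_action} {victim}")
    steps.append("Check the Event Logs and confirm all resources are online")
    return steps


if __name__ == "__main__":
    # Hypothetical node names; the action text is illustrative wording.
    checklist = round_trip_steps(
        "NODE1",
        "NODE2",
        "Unplug the public network cable from",
        "Reconnect the public network cable to",
    )
    for number, step in enumerate(checklist, start=1):
        print(f"{number}. {step}")
```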