Un-Install a SQL Server Cluster: Step-by-Step

Step-9 Verify the Un-Install

1. Open services in the administrative tools and there should not be any services named such as MSSQL or SQLAgent containing the Instance name which installed was failed.

2. Open cluster administrators and there should not be any SQL Server resources registered.

3. Physical Directory structure should not exist.

4. SQL Server Program Folders / Shortcuts should not exist.

Chapter 9 – Tips for SQL Server Clustering

Reboot Sequence of Cluster Nodes

In situations where you would like to reboot an entire cluster, it is always preferred to reboot the cluster nodes one by one. Let’s look at an example of a cluster having two nodes: ABC (Active Node) & XYZ (Passive Node). To reboot entire cluster, first reboot the Active node ABC, this will initiate a failover and XYZ will become the active node. When node ABC comes up completely, then reboot XYZ. This will initiate a failover, and this time, ABC will become the Active node. As soon XYZ is up and running, it will be again in Passive / Stand-by mode.

Preferred Heartbeat Latency

The heartbeat is scaled based on ping latency between two nodes. It is highly preferred that latency period do not exceed 500ms between two nodes.

Manual Fail-Back

It is preferred that the fail-back configuration of a cluster be set to manual. The reason for this, as we all know, is that a failover / fail-back event produces a small outage of SQL Server. So if the fail-back is set to automatic, another failover would occur, further interrupting users. If fail-back is set to manual ,then we can do a fail-back in off hours so that it does not impact users.

Replication Environment

If you are going to implement replication on a clustered server, and the SQL Server cluster is going to participate as a Publisher and a Distributor, then consider using a file share located on the cluster disk resource as the snapshot folder. By doing this, whenever there is a failover event, replication will also failover along with SQL Server.

Test Your Failover and Fail-Backs Periodically

To ensure failover / fail-back will work when needed, it is important to schedule manual failovers / fail-back events periodically in order to test if they are working.

Critical Services to Failover

It is preferable to prevent the mssqlserver and the sqlserveragent services from being failed over because a less critical service has failed. To do this, configure the less critical services not to affect the entire cluster group. For example, if the Full-Text Search service fails, you most likely don’t want your entire cluster to failover because of it. To prevent this, you need to make sure that only critical services initiates the failover of entire group.

Failover Behavior

When any SQL Server administrative tools, such as Enterprise Manager, Query Analyzer, and Profiler, are performing some task, and a SQL Server failover occurs, the current process will stop and have to be restarted once the failover is complete.

Function to retrieve the Nodes of cluster

select * from ::fn_virtualservernodes()

Minimize the Failover Time – Recovery Interval

In case of SQL Server failover event, it has to go through an automatic recovery process when it restarts on the new node. Any transactions that were not written to the database when the failover occurred must be rolled forward or back as appropriate via the transaction log. This activity can take some time if there are a huge amount of transactions in memory that had yet to be written to the database when the failover occurred.

The recovery period can be forced by changing SQL Server’s “recovery interval” setting. By default, it is set to “0”, which means that it is set to automatic configuration. It says that the recovery period should not be longer than a minute. But as per system architecture and transaction volume, DBAs sometimes change the “recovery interval” to be a number, such as eight. Assigning a number to the “recovery interval” setting means forcing SQL Server not to take more than this number of minutes to recover. For example, if five is used as the setting, this tells SQL Server that it can take up to five minutes to recover. Technically this reduces how often checkpoints are performed in SQL Server. If you set the “recovery interval” to some number other than “0”, then it will take longer for a failover to occur, as it takes longer for the automatic recovery process to run. In most cases, you should leave the “recovery interval” set to the default value of “0”. This will make cluster node failover time optimal.

COM Objects in Clustering Environment

If your cluster uses Extended Stored Procedures, the xp’s must be installed on a shared array. If the xp’s use COM objects, these must be installed and registered on all modes of the cluster. If xp’s used in a clustered SQL Server are not located on the shared array, then should failover occur, the xp’s will fail to run. If COM objects used by an xp are not installed on all nodes and properly registered, then should failover occur, they will not function.

Copyright 2004 by the author. Published with the express permission of the author.

]]>

Leave a comment

Your email address will not be published.