SQL Server 2000 & 2005 Clustering
Chapter 1 – Basics of SQL Server Clustering
If your mission-critical SQL Server suffers a memory board failure, how long will the outage last? How much will it cost your business in lost productivity and data availability? Being a SQL Server DBA can be demanding and stressful, as the success of your application often depends on database uptime. As DBAs, we have some control over the uptime of our SQL Servers, but there are many areas we do not fully control. There is not much a DBA can do if the motherboard fails on a server. As you may already be aware, there is one way to help boost your SQL Server’s uptime, and that is by clustering your SQL Servers. This way, should one SQL Server in the cluster fail, another clustered server will automatically take over, keeping downtime to minutes instead of hours or more.
Clustering is best described as a technology that automatically allows one physical server to take over the tasks and responsibilities of another physical server that has failed. Given that all computer hardware and software will eventually fail, the obvious goal is to ensure that users running mission-critical applications experience little or no downtime when such a failure occurs. Downtime can be very expensive, and our goal as DBAs is to reduce it as much as possible.
More specifically, clustering refers to a group of two or more servers, also called nodes, that work together and present themselves to the network as a single virtual server. In other words, when a client connects to clustered SQL Servers, it sees only a single SQL Server. When one of the nodes fails, its responsibilities are taken over by another server in the cluster, and the end user notices little, if any, difference before, during, and after the failover.
One very important aspect of clustering that often gets overlooked is that it is not a complete backup system for your databases. It is only one part of a multi-part strategy required to ensure minimum downtime and 100% recoverability.
The main benefit that clustering provides is the ability to recover from failed server hardware (excluding the shared disk) and from failed software, such as failed services or a server lockup. It is not designed to protect data, to protect against failure of the shared disk array, to prevent hack attacks, to protect against network failure, or to protect SQL Server from other potential disasters, such as power outages.
Clustering is just one part of the overall strategy needed to reduce SQL Server downtime. You will also need a shared disk array that offers redundancy, and you will need to make tape backups. So don’t think that clustering is all you need to create a highly available SQL Server system. It is just one part of it.
Chapter 2 – Types of SQL Server Clustering
Once you decide to cluster SQL Server, you have to choose the cluster layout. This choice is extremely important when architecting the clustering environment, and it should be based on your application and business needs. Let’s look at the configuration types.
Active / Passive
An Active/Passive, or Single Instance, cluster refers to a scenario where only one instance of SQL Server runs on one of the physical nodes in the cluster, and the other physical node does nothing other than wait to take over should the primary node fail or should a manual failover be performed for maintenance. From a performance perspective, this is the better solution. On the other hand, this option makes less productive use of your physical hardware, which makes it more expensive.
If an active node fails and there is a passive node available, applications and services running on the failed node can be transferred to the passive node. Since the passive node has no current workload, the server should be able to assume the workload of the failed server without any problems (assuming the hardware of the nodes is the same).
2-Node Clustering Active / Passive Scenario
In this case, let’s look at a two-node example, Node X and Node Y. Node X is configured as the active node: it is the primary owner of the SQL Server instance, and that instance runs on it. As you can see in the case below, Node Y is in passive, or standby, mode, doing nothing. The active node communicates and works with the shared disks.
2-Node Clustering Active / Passive Failover Scenario
When a failover occurs on Node X, SQL Server Instance A is transferred, with all its responsibilities, to passive Node Y, which now becomes the active node. Client connections that were open at the moment of failure are dropped and must reconnect once the instance is running on Node Y. As you can see, even after the failover, the active node communicates and works with the shared disks as usual; nothing changes there.
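The two-node scenario above can be modelled in a few lines of Python. This is only an illustrative sketch of the ownership swap, not how the cluster service is implemented; the class names (Node, ClusteredInstance), the fail_over method, and the "SharedDisks" label are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ClusteredInstance:
    """A SQL Server instance that runs on exactly one node at a time."""
    name: str
    owner: Node                        # node currently running the instance
    standby: Node                      # passive node waiting to take over
    shared_disk: str = "SharedDisks"   # hypothetical label; storage is unchanged by failover

    def fail_over(self) -> None:
        """Move the instance to the standby node and swap the two roles."""
        if not self.standby.healthy:
            raise RuntimeError("no healthy passive node to fail over to")
        self.owner, self.standby = self.standby, self.owner


node_x = Node("Node X")
node_y = Node("Node Y")
instance_a = ClusteredInstance("Instance A", owner=node_x, standby=node_y)

node_x.healthy = False   # simulate a hardware failure on the active node
instance_a.fail_over()   # Node Y becomes the active node

print(instance_a.owner.name, "now runs", instance_a.name, "on", instance_a.shared_disk)
```

Note that only the owner changes; the shared disk reference is untouched, which mirrors the point that the active node keeps working with the same shared disks after the failover.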
4-Node Clustering Active / Passive Scenario
In this case, let’s look at an example of four nodes: Node X, Node Y, Node XX, and Node YY. Node X is configured as an active node and is the primary owner of SQL Server Instance A; Node XX is also an active node and is the primary owner of SQL Server Instance AA. As you can see in the case below, Node Y and Node YY are in passive, or standby, mode, doing nothing.
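As a rough sketch, the four-node layout can be written down as a mapping from each instance to its current owner, plus a list of idle passive nodes. The fail_over helper below is hypothetical, and picking the first idle passive node as the target is an assumption made for illustration; the text does not say which passive node backs which instance.

```python
# Layout from the text: two active nodes, each owning one instance,
# and two passive nodes doing nothing.
owners = {"Instance A": "Node X", "Instance AA": "Node XX"}
passive_nodes = ["Node Y", "Node YY"]


def fail_over(instance: str, owners: dict, passive_nodes: list) -> str:
    """Move `instance` to an idle passive node and return that node's name.

    Assumption: the first idle passive node is chosen as the target.
    """
    if not passive_nodes:
        raise RuntimeError("no passive node available for " + instance)
    target = passive_nodes.pop(0)   # the target node is no longer idle
    owners[instance] = target
    return target


# Simulate Node X failing: Instance A moves, Instance AA is unaffected.
new_owner = fail_over("Instance A", owners, passive_nodes)
print(owners)
print(passive_nodes)
```

After the simulated failure, one passive node is still idle, so the second instance keeps a standby of its own.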