An Introduction to SQL Server Clustering
What is Clustering?
Clustering can be best described as a technology that automatically allows one physical server to take over the tasks and responsibilities of another physical server that has failed. The obvious goal behind this, given that all computer hardware and software will eventually fail, is to ensure that users running mission-critical applications will have little or no downtime when such a failure occurs. Downtime can be very expensive, and our goal as DBA is to help reduce it as much as possible.
More specifically, clustering refers to a group of two or more servers (generally called nodes) that work together and represent themselves as a single virtual server to a network. In other words, when a client connects to clustered SQL Servers, it thinks there is only a single SQL Server, not more than one. When one of the nodes fails, its responsibilities are taken over by another server in the cluster, and the end-user notices little, if any differences before, during, and after the failover.
Microsoft added clustering features to its operating system when they introduced Windows NT Server 4.0 Enterprise Edition several years ago. The actual clustering feature was called MSCS (Microsoft Clustering Server). While some brave folks actually put the software into production, I personally avoided it as it was not as dependable as Microsoft led you to believe. Also, about this same time, SQL Server 6.5 Enterprise Edition was released, allowing it to be clustered. This was a very crude attempt at clustering SQL Server that was rarely implemented in the real world.
Later, when SQL Server 7.0 became available, there were major improvements in SQL Server clustering, but no way was it still good enough, as Windows NT 4.0 Server Enterprise MSCS was still being used at its foundation, and it just wasn’t good enough for maintaining high-availability servers.
Fortunately, Microsoft’s second attempt at clustering, now called the Microsoft Cluster Service, in Windows 2000 Advanced Server and Windows 2000 Datacenter Server, is much better. While I won’t claim it is perfect, at least I now am willing to use it to cluster mission-critical SQL Servers. Cluster Service not only works well when clustering SQL Server 7.0, it is even better if you want to cluster SQL Server 2000, which has greatly enhanced clustering support.
One very important aspect of clustering that often gets overlooked is that it is not a complete backup system for your applications. It is only one part of a multi-part strategy required to ensure minimum downtime and 100% recoverability.
The main benefits that clustering provides is the ability to recover from failed server hardware (excluding the shared disk) and failed software, such as failed services or a server lockup. It is not designed to protect data, to protect against a shared disk array from failing, to prevent hack attacks, to protect against network failure, or to prevent SQL Server from other potential disasters, such as power outages or acts of God.
Clustering is just one part of an entire strategy needed to help reduce application downtime. You will also need to purchase a shared disk array (more on this later) that offers redundancy, make tape backups, put the server behind a firewall, make sure your network connections have redundancy, use battery backup, and locate the server in a secure facility, among many other steps you can take. So don’t think that clustering is all you need for creating a highly available SQL Server. It is just one part.
What Are the Types of Clustering? When you decide you want to cluster SQL Server, you have a choice of configuring what is called Active/Active or an Active/Passive cluster. Each has its own pros and cons. Let’s look at each, in the context of a two-node SQL Server cluster. An Active/Active SQL Server cluster means that SQL Server is running on both nodes of a two-way cluster. Each copy of SQL Server acts independently, and users see two different SQL Servers. If one of the SQL Servers in the cluster should fail, then the failed instance of SQL Server will failover to the remaining server. This means that then both instances of SQL Server will be running on one physical server, instead of two. As you can imagine, if two instances have to run on one physical server, performance can be affected, especially if the server’s have not been sized appropriately. An Active/Passive SQL Server cluster refers to a SQL Server cluster where only one instance of SQL Server is running on one of the physical servers in the cluster, and the other physical server does nothing, other that waiting to takeover should the primary node should fail. From a performance perspective, this is the better solution. On the other hand, this option makes less productive use of your physical hardware, which means this solution is more expensive. Personally, I prefer an Active/Passive configuration as it is easier to set up and administer, and overall it will provide better performance. Assuming you have the budget, this is what I recommend.
Two- or Four-Node Clustering? SQL Server can be clustered using two nodes (using Windows 2000 Advanced Server), or it can be clustered using more than two nodes (using Windows 2000 Datacenter). Since I don’t personally have any experience is three or four node clustering, I won’t be discussing it here. But for the most part, what I say about two-node clustering also applies to three- or four-node clustering.