An Introduction to SQL Server Clustering
How Does Clustering Work?
Clustering is a very complex technology, so I will focus here on the big picture. In a two-cluster node, one of the SQL Servers is referred to as the primary node, and the second one is referred to as the secondary node. In an Active/Passive cluster design, SQL Server will run on the primary node, and should the primary node fail, then the secondary node will take over.
When you build a two-node cluster using Windows 2000 Advanced Server and Microsoft Clustering Service, each node must be connected to a shared disk array using either SCSI cables or fibre channel.
Typically, this shared disk array is a stand-alone unit that houses a RAID 5 or RAID 10 disk array. All of the shared data in the cluster must be stored on this disk array, otherwise when a failover occurs, the secondary node in the cluster cannot access it. As I have already mentioned earlier, clustering does not help protect data or the shared disk array that it is stored on. Because of this, it is very important that you select a shared disk array that is very reliable and includes fault tolerance.
Besides both servers being connected to a shared disk array, both nodes of the cluster are also connected to each other via a private network. This private network is u
sed for each node to keep track of the status of the other node. For example, if the primary node experiences a hardware failure, the secondary node will detect this and will automatically initiate a failover.
So how do clients who are accessing SQL Server know what to do when a failover occurs in a cluster? This is the cleverest part of Microsoft Cluster Service. Essentially what happens in a SQL Server cluster is that you assign SQL Server its own virtual name and virtual TCP/IP address. This name and address is shared by both of the servers in the cluster.
Typically, a client will connect to the SQL Server cluster using the virtual name used by the cluster. And as far as a client is concerned, there is only one physical SQL Server, not two. Assuming that the primary node of the SQL Server cluster is the node running SQL Server on an Active/Passive cluster design, then the primary node will respond to the client’s requests. But if the primary node fails, and failover to the secondary node occurs, the cluster will still retain the same SQL Server virtual name and TCP/IP address, although now a new physical server will be responding to client’s requests.
During the failover period, which can last several minutes (the exact amount of time depends on the number and sizes of the databases on SQL Server, and how active they are), clients will be unable to access SQL Server, so there is a small amount of downtime when failover occurs.
How the client software reacts to the failover process depends on the software. Some software will just wait the failover out, and when the failover has completed, it will continue just as nothing had happened. Some software will present a message box on the screen, describing a lost connection. Other client software will not know what to do, and users may have to exit, and then reload the client before they can access SQL Server again.
As part of the testing process when implementing a SQL Server cluster, it is important to find out how all of the client software that connects to SQL Server reacts to a failover. This way, you can inform your users of what to expect, so they are better able to deal with it when it does occur.
Once a failover occurs, you will want to find out what caused the failover, and then take the necessary action and correct the problem. Once the problem has been fixed, the next step is to failover SQL Server back to the primary node from the secondary node. You can schedule to do this anytime, preferably when user activity is light on the system.
What are the Pros and Cons of Clustering?
Implementing SQL Server clustering is a big decision, and one fraught with many gochas. Before you undertake such a large and important project, you will want to carefully evaluate the pros and cons of clustering, which include, but are not limited to:
Pros of SQL Server Clustering
- Reduces downtime to a bare minimum.
- Permits an automatic response to a failed server or software. No human intervention is required.
- It allows you to perform upgrades without forcing users off the system for extended periods of time.
- It allows you to reduce downtime due to routine server, network, or database maintenance.
- Clustering doesn’t require any servers to be renamed. So when failover occurs, it is relatively transparent to end-users.
- Failing back is quick, and can be done whenever the primary server if fixed and put back on-line.
- In some cases, clustering can be used to increase the scalability of an application. For example, if a current cluster is getting too busy, another server could be added to the cluster to expand the resources and help boost the performance of the application.
Cons of Clustering
- More expensive than other failover alternatives, such as log shipping or stand-by servers.
- Requires more set up time than other alternatives.
- Requires more on-going maintenance than other alternatives.
- Requires more experienced DBAs and network administrators.
Software Needed for Clustering
The software you need for SQL Server clustering depends on whether you want to cluster two nodes, or more than two nodes.
To cluster two nodes, you will need the following:
- Two Microsoft Windows 2000 Advanced Server Licenses
- One SQL Server 7.0 Enterprise or SQL Server 2000 Enterprise Licenses for Active/Passive, or two licenses for Active/Active
- The latest Windows 2000 and SQL Server Service Packs