An Introduction to SQL Server 2005 Clustering Basics
Clustering is a complex technology with lots of messy details. To make it easier to understand, let’s take a look at the big picture of how clustering works. In this article we take a look at:
- Active vs. Passive Nodes
- Shared Disk Array
- The Quorum
- Public and Private Networks
- The Virtual Server
- How a Failover Works
Active vs. Passive Nodes
Although a SQL Server 2005 cluster can support up to eight nodes, clustering actually only occurs between two nodes at a time. This is because a single SQL Server 2005 instance can only run on a single node at a time, and should a failover occur, the failed instance can only fail over to another individual node. This adds up to two nodes. Clusters of three or more nodes are only used where you need to cluster multiple instances of SQL Server 2005.
In a two-node SQL Server 2005 cluster, one of the physical server nodes is referred to as the active node, and the other one is referred to as the passive node. It doesn’t matter which physical servers in a cluster is designated as the active or the passive, but it is easier, from an administrative point of view, to go ahead and assign one node as the active and the other as the passive. This way, you won’t get confused about which physical server is performing which role at the current time.
When we refer to an active node, we mean that this particular node is currently running an active instance of SQL Server 2005 and that it is accessing the instance’s databases, which are located on a shared data array.
When we refer to a passive node, we mean that this particular node is not currently in production and it is not accessing the instance’s databases. When the passive node is not in production, it is in a state of readiness, so that if the active node fails, and a failover occurs, it can automatically go into production and begin accessing the instance’s databases located on the shared disk array. In this case, the passive mode then becomes the active node, and the formerly active node now becomes the passive node (or failed node should a failure occur that prevents it from operating).
Shared Disk Array
So what is a shared disk array? Unlike non-clustered SQL Server 2005 instances, which usually store their databases on locally attached disk storage, clustered SQL Server 2005 instances store data on a shared disk array. By shared, we mean that both nodes of the cluster are physically connected to the disk array, but that only the active node can access the instance’s databases. There is never a case where both nodes of a cluster are accessing an instance’s databases at the same time. This is to ensure the integrity of the databases.
Generally speaking, a shared disk array is a SCSI- or fiber-connected RAID 5 or RAID 10 disk array housed in a stand-alone unit, or it might be a SAN. This shared array must have at least two logical partitions. One partition is used for storing the clustered instance’s SQL Server databases, and the other is used for the quorum.
When both nodes of a cluster are up and running, participating in their relevant roles (active and passive) they communicate with each other over the network. For example, if you change a configuration setting on the active node, this configuration change is automatically sent to the passive node and the same change made. This generally occurs very quickly, and ensures that both nodes are synchronized.
But, as you might imagine, it is possible that you could make a change on the active node, but before the change is sent over the network and the same change made on the passive node (which will become the active node after the failover), that the active node fails, and the change never gets to the passive node. Depending on the nature of the change, this could cause problems, even causing both nodes of the cluster to fail.
To prevent this from happening, a SQL Server 2005 cluster uses what is called a quorum, which is stored on the quorum drive of the shared array. A quorum is essentially a log file, similar in concept to database logs. Its purpose is to record any change made on the active node, and should any change recorded here not get to the passive node because the active node has failed and cannot send the change to the passive node over the network, then the passive node, when it becomes the active node, can read the quorum file and find out what the change was, and then make the change before it becomes the new active node.
In order for this to work, the quorum file must reside on what is called the quorum drive. A quorum drive is a logical drive on the shared array devoted to the function of storing the quorum.
Public and Private Networks
Each node of a cluster must have at least two network cards. One network card will be connected to the public network, and the other to a private network.
The public network is the network that the SQL Server 2005 clients are attached, and this is how they communicate to a clustered SQL Server 2005 instance.
The private network is used solely for communications between the nodes of the cluster. It is used mainly for what is called the heartbeat signal. In a cluster, the active node puts out a heartbeat signal, which tells the other nodes in the cluster that it is working. Should the heartbeat signal stop then a passive node in the cluster becomes aware that the active node has failed, and that it should at this time initiate a failover so that it can become the active node and take control over the SQL Server 2005 instance.
The Virtual Server
One of the biggest mysteries of clustering is how do clients know when and how to switch communicating from a failed cluster node to the now new active node? And the answer may be a surprise. They don’t. That’s right; SQL Server 2005 clients don’t need to know anything about specific nodes of a cluster (such as the NETBIOS name or IP address of individual cluster nodes). This is because each clustered SQL Server 2005 instance is given a virtual name and IP address, which clients use to connect to the cluster. In other words, clients don’t connect to a node’s specific name or IP address, but instead connect to a virtual name and IP address that stays the same no matter what node in a cluster is active.
When you create a cluster, one of the steps is to create a virtual cluster name and IP address. This name and IP address is used by the active node to communicate with clients. Should a failover occur, then the new active node uses this same virtual name and IP address to communicate with clients. This way, clients only need to know the virtual name or IP address of the clustered instance of SQL Server, and a failover between nodes doesn’t change this. At worst, when a failover occurs, there may be an interruption of service from the client to the clustered SQL Server 2005 instance, but once the failover has occurred, the client can once again reconnect to the instance using the same virtual name or IP address.