How SQL Cluster Resource DLLs work with Cluster Components to Improve the Failover Process

Status Messages and Resource Monitor
The status messages shown above are generated through IsAlive and LooksAlive calls. The messages generated by both the calls are same. In our example, we will discuss using IsAlive queries. When IsAlive interval expires, Resource Monitor executes IsAlive call against the whole cluster group and this, in turn, is handled by SQL Resource DLL to run the query against its resources except SQL Service and SQL Server Text-Search resource. The messages returned by these calls include the following:  

  • Online/Offline    
  • Online/Offline Pending       
  • Failed
  • FALSE

 

 

The above status messages are passed back to Resource Monitor which in turn report to Cluster Service to take any action.

As shown in Figure 1.1, the Resource Monitor sits between the SQL Resource DLLs and Cluster Service. Any call made to SQL resources has to take place in the Resource DLL of the SQL Resource DLLs (either SQSRVRES.DLL or SQAGTRES.DLL). For example, if Cluster Service needs to check the availability of SQL resources, it will make a call to Resource Monitor; the Resource Monitor in turn will poll the SQL Resource DLLs to check the status of the SQL Resources. If Resource Monitor doesn’t receive any response from SQSRVRES.DLL or SQAGTRES.DLL or it can not detect the resources availability, it will pass the status back to Cluster Service. Cluster Service then passes this status message to related Managers as shown in the Figure 1.1. The related Managers could be Failover Manager. Managers take the action as per the status passed by lower layer components. The status message could indicate a failure of SQL resources or could indicate a simple status message. These messages and cluster actions are discussed later in this article with an example.

Moreover, if function executed by Resource Monitor doesn’t exist in the Resource DLL of the SQL Server, the request is simply discarded and no operation is carried out. In this case a event is logged in the event viewer. In fact, this happens only in case of the cluster-unaware applications.

How does SQL Resource DLLs help in failover process?
SQL Server doesn’t really utilize its own mechanism to failover the resources on the surviving node rather SQL Resource DLLs are written to “support” the failover process. The following figure shows a simple failover process:

FIGURE 1.3 – SQSRVRES.DLL and Status Messages in Cluster Failover Process

After IsAlive interval expires, Cluster Service asks the Resource Monitor to report the status of SQL resources, obviously after 60 seconds.

Resource Monitor checks the status of SQL Resources in Cluster configuration database (HKLMCluster). It provides SQSRVRES.DLL with the SQL Resource GUID and its current status.

SQSRVRES.DLL executes its IsAlive after it receives a signal from Resource Monitor to perform a check on the SQL Resources. It checks and reports back the status messages back to Resource Monitor. SQSRVRES.DLL reports the following status messages: Online/Offline, Online/Offline Pending, Failed, FALSE

After Resource Monitor receives the status, it, first, compares the status messages received from SQSRVRES.DLL with the stored one (Cluster configuration database). It takes the action as per the status messages reported by the SQSRVRES.DLL listed below:

a. If comparison is successful, no action is taken. For example, status message received in step 2 is “Online” and SQSRVRES.DLL also reports the same status.
b. If comparison is unsuccessful, the following actions are taken.

For example, if status message received in step 2 is “Online”, and    SQSRVRES.DLL reports the status message: “Offline”. Resource    Monitor executes an “Online” function. SQSRVRES.DLL receives this    message from Resource Monitor and executes Online function to bring    the SQL resource online. If resource is restarted successfully or it    comes online, the end of this process is the status messages will    reflect same in next interval.

Note: Actually speaking, Resource Monitor doesn’t take any action for Online/Offline status messages because an Administrator might have stopped the resource for maintenance purpose but the same should also reflect in Cluster configuration database before IsAlive queries. Resource Monitor takes action only when the comparison is not successful as stated above. Moreover, there shouldn’t be any inconsistencies in the Cluster configuration database. If there is any inconsistencies, that wouldn’t be longer than 60 seconds because IsAlive calls performed by Resource Monitor always updates the status in Cluster configuration database.

5. The mechanism isn’t really straight forward. There could be one more message returned by the SQSRVRES.DLL that is “Failed”. Before Resource DLL reports the “Failed” message back to Resource Monitor, it executes the Online function to bring the SQL resources online. When Online function fails, It retries five more times by performing a SQL Transact query; “SELECT @@SERVERNAME Transact SQL”. If all five retries fail, the resource is considered to be failed. The status is sent back to Resource Monitor.

(Please note SQSRVRES.DLL doesn’t really implement a separate Restart  function rather it always uses its own implemented Online function to bring  the “Failed” resources online). If resource doesn’t come online within the  specified interval or after few attempts, the resource is considered to be  failed as stated above.

6. After a resource has failed, the message is passed back to Resource Monitor. Cluster Service receives this message from Resource Monitor and starts the failover process with the help of Failover Manager.

7. If the resource is started successfully after few attempts, the failover process doesn’t occur. After next IsAlive interval these steps are repeated.

8. One more status message is “FALSE” that is returned by the IsAlive call to Resource Monitor when SQL Server is busy and can’t respond to the IsAlive requests. In this case, Resource DLL doesn’t retry five times and failover process starts.

If there is no Resource DLL for SQL Server, the failover process could take a longer time to move the resources from one node to surviving node. Because SQL Resource DLL is competent enough to handle the cluster functions executed by the Clustering Software, it doesn’t need to wait to decide which action to take. As stated above, the cluster-aware functions are mapped with SQL-specific functions, so it is easier for SQL Resource DLL to execute the functions as soon as they are executed from the Resource Monitor.

Conclusion
To summaries, SQL 6.5 was not a fully cluster-aware application. It made use of a generic DLL of the Windows Clustering. SQL 2000 and later versions are the cluster-aware application through the use of its own Resource DLL (SQSRVRES.DLL).

You also read that the Cluster Service doesn’t talk to SQL Resource DLL directly. In fact, it uses its Resource Monitors to talk to SQSRVRES.DLL. The status messages passed by the SQL Resource DLLs are received by the Resource Monitor and then appropriate actions are taken by the Resource Monitor.

Now we also know that the SQL Resource DLLs play an important role for its resources. The Resource DLLs make SQL Server a fully cluster-aware database application. The functions executed by the Resource Monitor on behalf of cluster service are supported by the SQL Resource DLL. This makes failover process faster.

You also read about the IsAlive and LooksAlive which play an important role in failover process. Without these calls, it’s not possible for SQL Resource DLL to report back the status of its resources.

To summarise: 

  • There are different intervals for LooksAlive and IsAlive calls
  • LooksAlive is executed every 5 seconds
  • LooksAlive only check the SQL Services with the help of SQAGTRESS.DLL. It doesn’t perform any check against the instance of the SQL Virtual Server.
  • IsAlive is executed every 60 seconds.
  • IsAlive is executed against the system database and SQL Virtual Instance by running a SQL Transact query. The query include “SELECT @@SERVERNAME Transact SQL”
  • The failover process starts when IsAlive 5 retires attempts fail.
  • The IsAlive and LooksAlive calls run under the Cluster Service Account
  • Both the SQL Resource DLLs run under the security context of cluster service account.

 

]]>

Leave a comment

Your email address will not be published.