Timeouts after 48 hours on SQL Cluster | SQL Server Performance Forums

Hi there, I am new to clusters. I have an active/active two-node cluster hosting about five mission-critical databases. We went live with the cluster on Monday, and 48 hours later one of the databases started timing out and having serious performance problems, yet the other databases on the same instance were unaffected. I monitor the cluster with MOM 2005, and MOM did not report any I/O or memory problems exceeding thresholds. There were no errors at all in the event or SQL logs; everything looks perfect on the server. The server CPU was idle, and network usage was very low. What do you recommend I use to figure out why, after 48 hours of usage, customers started getting timeouts? Failing the cluster over to node 2 fixed the problem, but I am afraid it might come back. Thanks for any info!

What kind of timeouts:
Application timeouts like ASP script timeout?
ODBC timeouts?
SQL lock timeouts?
DNS lookup timeouts (or other network-related timeouts) like "General network error"?
Please post the exact error message.

Also, are you saying that the database with the current problem never had any performance issues before it was on a cluster, and that the performance problems occurred for the first time after it was moved to the cluster?
Brad M. McGehee, MVP
This has been a hard one to track down. The exact error is: "ERROR: SC Component; SC Get Action; Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding". What happens is that we run builds against the data stores in SQL. We figured out that this is probably not related to the cluster: we were able to reproduce it in the old non-clustered environment, and we have ruled out networking problems as well. If we stagger the builds they run fine; if we run all the builds at once, we get timeouts. Even when we get timeouts, memory on the cluster servers is nowhere near its thresholds and processor usage is very low. So I believe it is either a configuration issue with the build scripts or a problem with the database and how it is coded. I now think this has to do with the fact that we upgraded to a newer version of the database and has nothing to do with the cluster. Thanks for the responses!
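
Since staggering the builds avoids the timeouts, one way to make that deliberate is to cap how many builds run at the same time. The following is only a minimal Python sketch of that idea, not the actual build tooling from this thread: `run_builds`, `make_fake_build`, and the limit of 2 are hypothetical names and values chosen for illustration, and the fake build just sleeps instead of touching SQL Server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_builds(build_names, run_build, max_concurrent=2):
    """Run every build, but never more than max_concurrent at once.

    run_build is whatever callable starts one build (hypothetical here).
    Results come back in the same order as build_names.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_build, build_names))


def make_fake_build(delay=0.01):
    """Return a stand-in build function plus a dict tracking concurrency."""
    lock = threading.Lock()
    stats = {"running": 0, "peak": 0}

    def fake_build(name):
        with lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(delay)  # placeholder for the real build work
        with lock:
            stats["running"] -= 1
        return name

    return fake_build, stats


if __name__ == "__main__":
    fake_build, stats = make_fake_build()
    done = run_builds(["a", "b", "c", "d", "e"], fake_build, max_concurrent=2)
    print("finished:", done)
    print("peak concurrent builds:", stats["peak"])
```

Raising `max_concurrent` one step at a time until the timeouts reappear would give a rough idea of how much concurrent load the database tolerates, which may help narrow down whether the build scripts or the database code is the bottleneck.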

I would agree it is not the cluster. Now you have the hard task of identifying the specific issue.
Brad M. McGehee, MVP
BTW, what are the Connect Timeout and Connection Lifetime (for pooling) values in your connection string(s)? Find out what this SC service is, and check the Event Viewer log for further information.
Satya SKJ
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
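
To check which timeout values a connection string actually carries, a small parser can help. This is a rough Python sketch under stated assumptions: the server and database names in the example are made up, the fallback values (15 seconds for Connect Timeout, 0 for Connection Lifetime) are the ADO.NET SqlClient defaults as I understand them and should be verified for the provider the SC component really uses, and the parser is naive (it does not handle quoted values that contain semicolons).

```python
# Assumed SqlClient defaults; verify against the provider actually in use.
DEFAULTS = {"connect timeout": 15, "connection lifetime": 0}

# Alternative spellings accepted by SqlClient for the same settings.
ALIASES = {
    "connection timeout": "connect timeout",
    "timeout": "connect timeout",
    "load balance timeout": "connection lifetime",
}


def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict with lowercase keys."""
    settings = {}
    for part in conn_str.split(";"):
        if "=" not in part:
            continue  # skip empty or malformed fragments
        key, value = part.split("=", 1)
        key = key.strip().lower()
        settings[ALIASES.get(key, key)] = value.strip()
    return settings


def timeout_settings(conn_str):
    """Return the effective timeout-related values, falling back to defaults."""
    parsed = parse_connection_string(conn_str)
    return {name: int(parsed.get(name, default)) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    # Hypothetical connection string, for illustration only.
    example = "Server=MYCLUSTER;Database=BuildStore;Connection Timeout=30"
    print(timeout_settings(example))
```

Note that these settings only cover opening and pooling connections. A query that runs too long is governed by the command timeout, which is set in application code rather than in the connection string, so it is worth checking that as well.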