Problems on Failover with SQL Services | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive


SQL Server 2000 SP3, Windows 2000 SP4, Compaq 5800.
After a recent bout of firmware and hotfix updates, I was unfortunate enough to lose one of my cluster nodes to the blue screen of death (go figure why only one).
Attempts to recover the node failed, so I resorted to going back to backups and restored the OS partition on node 2. I’m now left with a semi-working SQL cluster. Windows 2000 and the clustering services appear to be working correctly on both nodes. However, when I fail over the SQL services, they will not transfer to the restored node, and they generate a few event log errors. I’m quite sure the issue is related to SQL Server, but I’m not sure how to remove SQL Server from the restored cluster node and reinstall it on that node, as I’ve only ever installed to a clustered instance in the past, and I would really like to avoid that.
1. Can anyone offer advice on the errors below?
2. Can anyone offer advice on reinstalling SQL Server on the restored node without reinstalling the instance on both nodes? I have seen Q290991, so I’m happy about removing it from the restored node; it’s installing it again that I’m concerned about.
Many thanks,
Andy Peck

Application Log:
Event ID 17052, Source: MSSQLSERVER
The description for the event ID 17052 in source MSSQLSERVER cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages for a remote computer. The following information is part of the event: sqsrvres StartResourceService: StartServices MSSQLSERVER failed error 3
This is followed by two other events similar in text, but with "ResUtilStartResourceService failed status 3" and "Error 3 bringing resources online".

System Log:
Event ID 7000, Source: Service Control Manager: The MSSQLSERVER service failed to start due to the following error: The system cannot find the path specified.
Event ID 1069, Source: ClusSvc: Cluster resource "SQL Server" failed.
Event ID 1069 indicates that the service accounts used must exist in the domain with the necessary privileges (refer to the registry for the further permissions this account needs on those directories), and that the account has the correct password as created/defined for the service.
_________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Satya,
Thanks for taking the time to read my post. I can confirm the account is configured correctly and is operational; I have no access-denied entries in the security log or anything else that would point to permissions.
I’ve even logged into the server with the cluster account as a sanity check.
Thanks,
Andy

When the one node failed, and you failed over to the other node, did you remove the node using Cluster Administrator? When you look at Cluster Administrator now, do you see one or two nodes? Have you tried to add a new node to the current cluster?
—————————–
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com
Brad,
When the node failed I did not evict it from the cluster, as I wasn’t sure whether I was going to do a rebuild or a restore. In the end I chose to do a restore and therefore avoided the eviction process. In Cluster Administrator I see both cluster nodes and everything appears to work correctly; the only problem is when failing over the SQL Server resource type. I can fail over the cluster group without problem. I have not tried to add another node to the cluster.
Thanks for your time,
Andy

Confirm that the shared disk is owned by the correct node before you start the SQL services. Permissions problems can also prevent the SQL cluster services from working correctly when performing operations on the node. The account under which setup runs MUST have the appropriate permissions:
Be a local administrator on both nodes.
Have the user right to "log on as a service".
Have the user right to "act as part of the operating system".
_________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Satya,
The default dependencies for the SQL services are the virtual cluster name and the shared disks. During failover you can see these pass successfully to the restored node; they only revert when the SQL service fails to fail over. The user rights issue doesn’t apply, as the other node is operational using the same account, and the problem node was built from a restore taken before the BSOD.
Cheers,
Andy
You may try what Brad suggested and add another node, as setup will confirm whether the user accounts have the necessary privileges to work with the SQL services.
_________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Satya,
I’m unable to add another node because I have no spare fibre cards at this time.
I’ve used the MS troubleshooting guide (http://support.microsoft.com/?kbid=266274) to verify that the cluster service on the restored node is configured correctly.
Also, I’m NOT seeing the 1069 error "The service did not start due to a logon failure". I’ve been through the cluster log and examined the failover process on the restored node. I can see the cluster service handing over resources successfully until it comes to SQL Server: "SQSRVRES StartResourceServices: StartServices MsSqlServer Failed error : 3", at which point it runs the backout process.
Thanks again for your time on this issue,
Andy
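The "error 3" in the cluster log and in event 17052 looks like a standard Win32 system error code, and code 3 is ERROR_PATH_NOT_FOUND, whose text is exactly the "The system cannot find the path specified" message from the Service Control Manager event. Note too that "1069" means two different things in this thread: Cluster service event ID 1069 (a cluster resource failed) versus Win32 error 1069 (ERROR_SERVICE_LOGON_FAILED). A minimal Python sketch, assuming you just want to pull the code out of a log line and look it up in a small hand-written table; the helper names are hypothetical, and on a Windows machine `ctypes.FormatError(code)` returns the system’s own message text instead.

```python
import re

# Hand-written subset of standard Win32 system error codes (winerror.h).
# The codes and messages are standard; the helpers below are hypothetical.
WIN32_ERRORS = {
    2: ("ERROR_FILE_NOT_FOUND", "The system cannot find the file specified."),
    3: ("ERROR_PATH_NOT_FOUND", "The system cannot find the path specified."),
    5: ("ERROR_ACCESS_DENIED", "Access is denied."),
    1069: ("ERROR_SERVICE_LOGON_FAILED",
           "The service did not start due to a logon failure."),
}

# Matches fragments such as "error 3", "error : 3" and "Error 3".
ERROR_CODE = re.compile(r"error\s*:?\s*(\d+)", re.IGNORECASE)


def error_code_from_log(line):
    """Return the first numeric error code found in a log line, or None."""
    match = ERROR_CODE.search(line)
    return int(match.group(1)) if match else None


def describe_win32_error(code):
    """Format a Win32 error code using the small table above."""
    name, message = WIN32_ERRORS.get(code, ("UNKNOWN", "Not in this table."))
    return f"{code} ({name}): {message}"


if __name__ == "__main__":
    line = "SQSRVRES StartResourceServices: StartServices MsSqlServer Failed error : 3"
    print(describe_win32_error(error_code_from_log(line)))
```

Run as a script, it prints "3 (ERROR_PATH_NOT_FOUND): The system cannot find the path specified.", which points at a path problem rather than at the service account.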
I’m still puzzled, as error 1069 points to an issue with the SQL Server service account. Also make sure the password is correct, just in case it was changed.
_________
Satya SKJ
Moderator
SQL-Server-Performance.Com

I’ve checked the SQL service account again, and it’s fine on both servers. I tested it by logging in with it, I also changed another service to use the same credentials and then started that service, and I checked the NTFS permissions on the directory the SQL Server files reside in. The account is OK.
Cheers,
Andy
I did a bit of googling and found this:
http://dbforums.com/arch/30/2003/5/767629
Quote:
"It turns out that SQL Server 2000 registers SOME of its pathnames in the old 8.3 format, including the path to the SQL Server service. As there are usually multiple ‘Microsoft’ subdirectories under ‘Program Files’ and ntbackup does not usually restore program directories in the order installed, registry editing is required to fix the problem."
So check the properties of the MSSQLSERVER service in the Control Panel and verify the path. During the restore process it could have switched from something like this:
C:\PROGRA~1\MICROS~4\MSSQL\Binn\sqlservr.exe
to this:
C:\PROGRA~1\MICROS~3\MSSQL\Binn\sqlservr.exe
Sounds very likely, since the error you get is "The system cannot find the path specified".
/Argyle
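Why would a restore turn `MICROS~4` into `MICROS~3`? NTFS generates the 8.3 alias for a long name when the file or directory is created, numbering names that share the same first six characters as they appear. The sketch below is a deliberately simplified model of that numbering: it keeps only the `~1`, `~2`, … counter and ignores the hash-based aliases Windows switches to after the first few collisions. The function names and the directory list are hypothetical. It shows that recreating the same four "Microsoft …" directories in a different order hands `Microsoft SQL Server` a different alias.

```python
import re
from collections import defaultdict


def short_stem(long_name):
    """Simplified 8.3 stem: drop spaces/punctuation, uppercase, keep 6 chars."""
    return re.sub(r"[^A-Za-z0-9]", "", long_name).upper()[:6]


def assign_short_names(names_in_creation_order):
    """Give each long directory name a STEM~N alias in creation order.

    Simplified model: real NTFS switches to hash-based aliases after the
    first few collisions, which is ignored here.
    """
    counters = defaultdict(int)
    aliases = {}
    for name in names_in_creation_order:
        stem = short_stem(name)
        counters[stem] += 1
        aliases[name] = f"{stem}~{counters[stem]}"
    return aliases


# Hypothetical install order on the original node...
installed = ["Microsoft Office", "Microsoft Visual Studio",
             "Microsoft Analysis Services", "Microsoft SQL Server"]
# ...and a different order in which a restore might recreate them.
restored = ["Microsoft Office", "Microsoft Visual Studio",
            "Microsoft SQL Server", "Microsoft Analysis Services"]

before = assign_short_names(installed)
after = assign_short_names(restored)
print(before["Microsoft SQL Server"], "->", after["Microsoft SQL Server"])
```

This prints `MICROS~4 -> MICROS~3`. In this model, a service path still registered as `C:\PROGRA~1\MICROS~4\MSSQL\Binn\sqlservr.exe` would now resolve to the Analysis Services directory, which has no `MSSQL\Binn`, so the service start fails with a path-not-found error.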
When I run into complicated issues like this on a cluster, I have found that the fastest and easiest way to resolve them is to rebuild the cluster from scratch. This is dramatic, but it has always worked for me. If you can’t do this, try removing SQL Server clustering and then re-adding it; perhaps that will be enough to fix the problem. If not, you will probably need to start from scratch, unless you want to give Microsoft Support a try, as they often have hacks to resolve many cluster issues.
—————————–
Brad M. McGehee, MVP
Webmaster
SQL-Server-Performance.Com
Argyle,
You da man. Problem fixed: it was the truncated path statements. When the full path was entered, the services started as required. You’ve got a most excellent Google technique; I’ve searched for the last 3 days and found nothing but "rebuild, rebuild, rebuild", on this forum as well.
Many thanks,
Andy
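The fix was to replace the 8.3 short path with the full path. To spot the same problem on another node, one option is to look for 8.3-style components (NAME~N) in the service’s registered command line. A minimal sketch under stated assumptions: the regular expression is only a loose approximation of 8.3 aliases, the function names are hypothetical, and the Windows-only reader uses Python’s standard `winreg` module against the usual service key `HKLM\SYSTEM\CurrentControlSet\Services\<name>`, value `ImagePath`.

```python
import re

# Loose approximation of an 8.3 alias component such as PROGRA~1 or MSSQL$~1.
SHORT_NAME = re.compile(r"^[^\s~.\\]{1,6}~\d+(\.[^\s.\\]{1,3})?$")


def short_name_components(image_path):
    """Return the path components that look like 8.3 short names."""
    # Drop surrounding whitespace and quotes, then split on backslashes.
    path = image_path.strip().strip('"')
    return [part for part in path.split("\\") if SHORT_NAME.match(part)]


def read_image_path(service_name):
    """Windows only: read a service's ImagePath value from the registry."""
    import winreg  # standard library, but only available on Windows

    key_path = "SYSTEM\\CurrentControlSet\\Services\\" + service_name
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "ImagePath")
    return value
```

On a node, `short_name_components(read_image_path("MSSQLSERVER"))` returning a non-empty list means the registered path depends on 8.3 aliases that a restore may have renumbered; for the path in this thread it returns `['PROGRA~1', 'MICROS~4']`.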

Argyle,
That’s a brilliant observation. Hats off to you! Someone should report this bug to Microsoft.
Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

You just need Google fingers ;)

/Argyle
Two months later, but I thought I’d reply because this thread put me on the track to another solution to the same problem.
In my situation, the server was failing over and the SQL Server came online successfully, but then would shutdown almost immediately. The message in the log was that the server was shutting down because of a request from a service. I found the same "bringing resources online" error, but not the "system cannot find path…" message in the event log.
To be safe, I checked the registry and service path settings to confirm they were the same. In the process, I was looking in the MSSQL$~1 directory on the secondary node and noticed that some of the files had a datetime stamp and size different from those in the same directory on the primary.
We applied SP3a to the servers back in May, and I had assumed that clustering would automatically replicate the updated/new server files from the primary to the secondary. I guess not, eh? I copied the whole directory from the primary to the secondary and tested failover and failback successfully.
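The mismatch above was found by comparing the two nodes’ program directories by eye. A minimal Python sketch of the same check, assuming both directories are reachable from one machine (for example through administrative shares; the paths are up to you) and that the files are small enough to read whole: it compares size plus a SHA-256 hash of each file rather than timestamps, so it flags real content differences and ignores timestamp-only ones. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprints(root):
    """Map each file's path (relative to root) to (size, SHA-256 digest)."""
    root = Path(root)
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = (path.stat().st_size, digest)
    return result


def diff_dirs(primary, secondary):
    """Return (only_on_primary, only_on_secondary, different) as sorted lists."""
    a, b = fingerprints(primary), fingerprints(secondary)
    only_primary = sorted(a.keys() - b.keys())
    only_secondary = sorted(b.keys() - a.keys())
    different = sorted(name for name in a.keys() & b.keys() if a[name] != b[name])
    return only_primary, only_secondary, different
```

Any file listed as only on the primary or as different is a candidate for the kind of copy described above; a service pack that was not applied on the secondary would show up in those two lists.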
Argyle, you da man. We ran into this situation while moving a key SQL Server to new hardware. Your 8.3 short-path fix above was right on the money. We appreciate your assistance; it really saved some time.
Thanks kindly,

Glad to be able to help. A funny side note: I ran into the same issue a month ago with an application other than SQL Server, had completely forgotten about it, and spent a day troubleshooting before realizing what it was… :p