Failed Clustered File Share Causing SQL Reboot | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive


Hello all,

We’re having a problem with a clustered File Share resource. The clustered File Shares fail once or twice a day, and when they do, the failure causes the entire cluster resource group to fail, which in turn restarts the SQL instance that shares the same resource group. Here is the error message from the System event log:

Date: 4/16/2007 2:43:11 PM
Log: Windows NT (System)
Source: ClusSvc
Category: File Share Resource
Event: 1055
Computer: SQL01
Message: Cluster File Share resource ‘FileShare_Name’ has failed a status check. The error code is 53.

I looked up error code 53 with the CMD command net helpmsg 53: it is "The network path was not found." I searched Google, including this site, but couldn’t find anything. I opened an incident ticket with Microsoft Product Support (Cluster Support) and, at their request, uploaded a couple of PerfMon logs and the MPS Cluster Reports, which include all the application and event logs, cluster logs, and installation logs. Nothing got resolved.

This is a 2-node AMD 64-bit Windows 2003 Enterprise cluster. Each server has 4 physical CPUs and 8 GB of RAM, configured Active/Active. Two SQL Server 2005 64-bit instances and one SQL Server 2000 32-bit instance are installed. If I move all the groups and SQL instances onto one node, the failover happens more frequently, so I have currently put the 2 SQL Server 2005 instances on one node and left the SQL 2000 instance on the other. This has reduced how often the cluster failover error occurs.

This one really has us stumped. I appreciate any help you can offer… Thank you very much.
Eric

Go to the properties of the clustered File Share resource in Cluster Administrator and, on the "Advanced" tab, change the "Restart" option to "Do Not Restart". This will keep a failure of the share from failing over the whole resource group.
Thanks for your help, Haywood. This eliminated the SQL instance restarts, but now I have to bring the failed file share back online manually, and the nightly jobs fail if they need to write to the file share.
quote:Originally posted by Haywood Go to the properties of the Clustered File Share resource in Cluster Administrator and under the "Advanced" tab, uncheck the "Restart" option and set it to "Do Not Restart". This will prevent the share from failing over the cluster when the share fails.

Yes, you still need to find out why the share is failing… but at least your instance isn’t shakin-n-bakin between nodes unexpectedly anymore. ;)
I was checking out the fix list for Windows 2003 SP2 and found this KB article about configurable cluster heartbeats: http://support.microsoft.com/kb/921181/. It makes me wonder whether my clustered File Share failure is related.
quote:Originally posted by Haywood Yes, you still need to find out why the share is failing… but at least your instance isn’t shakin-n-bakin between nodes unexpectedly anymore. ;)
quote:Originally posted by eleung Thanks for your help, Haywood. This eliminated the SQL instance restarts, but now I have to bring the failed file share back online manually, and the nightly jobs fail if they need to write to the file share.

Can’t you use a UNC name in the jobs that write to the share?
MohammedU.
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.

Thanks MohammedU.
UNC is indeed being used: \\sql2kinst1\sharename\filename.txt. Or did you mean not to use the clustered File Share at all?
What is the reason for using the share name? Generally SQL Agent should have access to all drives…
So I think you can use \\VirtualServername\drive$\..\filename.txt instead of \\sql2kinst1\sharename\filename.txt.
MohammedU.

That’s still a UNC path, however… Use the physical drive letter: TO DISK = 'X:\Backup\MyBackup.BAK'
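For illustration, a minimal T-SQL sketch of the drive-letter approach. MyDatabase and the X:\Backup folder are placeholders, not names from this thread. X: is assumed to be a clustered shared disk in the SQL Server instance’s resource group; on SQL Server 2000/2005 failover clusters such a disk generally also has to be a dependency of the SQL Server resource before SQL Server will write to it.

```sql
-- Hypothetical names: MyDatabase and X:\Backup are placeholders.
-- A drive-letter path on the clustered disk goes straight to the disk, so
-- neither the File Share resource nor name resolution is involved.
BACKUP DATABASE MyDatabase
TO DISK = 'X:\Backup\MyBackup.BAK'
WITH INIT;  -- overwrite any existing backup sets in this file

-- For comparison, the UNC form still goes through the network redirector
-- and the clustered file share, even when the share lives on the same node:
-- BACKUP DATABASE MyDatabase
-- TO DISK = '\\sql2kinst1\sharename\MyBackup.BAK';
```

If the backup succeeds by drive letter while the UNC form fails, that points at the share or name resolution rather than the disk.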
Yep, but what I was asking was not to use the share, because it is local to the server :)
MohammedU.
That will still travel a ‘pipe’ and is not direct access to the disk. IMO, removing the network altogether is the more accurate test of the error. I also believe this could be a kernel resource issue – http://support.microsoft.com/kb/304101
http://support.microsoft.com/kb/312362
Thanks again, guys. The reason for the clustered File Share is that the drive is indeed local to the server and not a shared drive. Therefore, if we use the physical drive letter, the existing files won’t be available after a failover to another node. As for those kernel resource errors, I have not seen any in our logs.
If the drive is a clustered shared disk, you should be able to see and connect to it even after a failover.
Is this a local drive (C: or D:) that is not configured as a clustered share? I write SQL Agent job output files to clustered shared drives and have never had any issue…
MohammedU.

Oh no, I don’t have a problem reading from and writing to the clustered file share. As my first post said, the clustered File Share keeps failing a couple of times a day, which causes the resource group to fail and in turn causes the SQL instance to restart. If I set the File Shares not to restart, as Haywood suggested, I have to bring them back online manually, or else my nightly data import and export jobs to and from the File Share will fail too.
quote:Originally posted by eleung Thanks again, guys. The reason for the clustered File Share is that the drive is indeed local to the server and not a shared drive. Therefore, if we use the physical drive letter, the existing files won’t be available after a failover to another node. As for those kernel resource errors, I have not seen any in our logs.

Not sure if I’m reading the above correctly, but the share cannot be on a drive that is local to just one of the physical nodes. I see no point in clustering a share that cannot be failed over. The share has to be on a cluster-aware shared disk, and in that case you can also use the drive letter, X:\myfiles\… In your current situation it looks like a name resolution problem reaching your share, so I would skip the UNC path, put the data on a shared cluster disk instead, and use X:\myfiles\…
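If the nightly imports load files with BULK INSERT (an assumption; the thread does not say how the jobs are written), Argyle’s suggestion amounts to pointing the job at the shared cluster disk’s drive letter instead of the share. dbo.MainframeImport and X:\myfiles\import.txt are hypothetical names used only for this sketch.

```sql
-- Hypothetical table and file. X: is assumed to be a shared cluster disk in
-- the same resource group as the SQL Server instance, so the path resolves on
-- whichever node currently owns the group.
BULK INSERT dbo.MainframeImport
FROM 'X:\myfiles\import.txt'
WITH (
    FIELDTERMINATOR = ',',   -- assumes a comma-delimited extract
    ROWTERMINATOR   = '\n',  -- one row per line
    TABLOCK                  -- table-level lock for a faster bulk load
);
```

In this arrangement the clustered share could still be kept for the mainframe transfers, while SQL Server itself reads the files through the drive letter.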
Thanks, Argyle. The clustered File Share is needed because we have data imports and exports from and to outside sources, i.e. the mainframe.
Yes, but it must be on a shared cluster disk, not a local disk.
Problem resolved. It was due to a memory resource problem. I appreciate all you guys’ help. Thanks to SSC and Joseph Sack’s article, “Troubleshooting SQL Server with the Sysperfinfo Table” (http://www.sqlservercentral.com/col…shootingsqlserverwiththesysperfinfotable.asp), which prompted me to research how much memory SQL Server 2000 32-bit recognizes in a Windows 2003 64-bit environment. Thanks to Geoff N. Hiten on http://groups.google.com/group/micr…7c12facd265/9ab3ad70466660e2#9ab3ad70466660e2 and to Joseph’s query, I was able to confirm that SQL 2000 32-bit on 64-bit Windows could not address more than 2 GB of RAM. After installing the SQL 2000 SP4 hotfix build 2040 (http://support.microsoft.com/kb/899761/), I configured SQL 2000 32-bit to enable AWE with a max server memory of 6 GB. Once that was done, the cluster failovers disappeared. It has been more than 5 days now with no failover. It still puzzles me how increasing SQL Server 2000 memory has anything to do with Cluster File Share failover, since the Cluster File Shares are external to SQL Server, even though they belong to the same cluster resource group.
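A minimal T-SQL sketch of the two steps described above, assuming a SQL Server 2000 instance: check the memory counters in master.dbo.sysperfinfo (the table Joseph Sack’s article covers; his actual query is not reproduced here), then enable AWE with a 6 GB cap. The LIKE filters are used because object_name carries the instance prefix on named instances and the exact counter spellings are not given in the thread. AWE also requires the SQL Server service account to hold the Lock Pages in Memory right, and 'awe enabled' only takes effect after the instance restarts.

```sql
-- 1. Compare memory in use (Total) with memory wanted (Target), both in KB.
--    A Total value topping out near 2 GB on an 8 GB node matches the symptom
--    described in this thread.
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value          AS value_kb
FROM master.dbo.sysperfinfo
WHERE object_name  LIKE '%Memory Manager%'
  AND counter_name LIKE '%Server Memory%';

-- 2. Enable AWE and cap max server memory at 6 GB (6144 MB).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled', 1;            -- takes effect after a restart
EXEC sp_configure 'max server memory', 6144;   -- value is in MB
RECONFIGURE;
```

With AWE enabled, SQL Server 2000 does not release that memory dynamically, which is why the max server memory cap is set alongside it, leaving room for the other instances and the operating system on the node.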
Thanks for your feedback on this case. The file shares are still tied to the operating system’s memory, so a bottleneck like that can become a problem for them too.
Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. The greatest discovery of my generation is that a human being can alter his life by altering his attitudes of mind.