Event ID 1146 and 1069 – strange failover scenario | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Hi all,
I would like to share with you a strange situation that we noticed on a 2
node cluster, Windows Server 2003 – no service pack.
The cluster is configured as follows:
Node A hosts SQL Server 2K sp3a and file system
Node B hosts Oracle 9i and Lotus Domino Server 6.5.3.
Sometimes, not always, when we move one group to the other node we end up in
very strange situations, like Event id 1146 and Event id 1069.
The description of event id 1146 is:
The cluster resource monitor died unexpectedly, an attempt will be made to
restart it.
The description of event id 1069 is:
Cluster resource ‘Network name’ in Resource Group ‘Group name’ failed.
The problem is that when this happens, every group in the cluster is
affected.
I mean, for example, when we move SQL Server group to the other node and
this problem appears, all the other groups, like Oracle group or Lotus group
are affected by Event id 1146 and event id 1069.
I know there is a fix provided by Microsoft here:
http://support.microsoft.com/default.aspx?scid=kb;en-us;886652
The KB article says:
CAUSE
This problem occurs if the following conditions are true:
• You have a network name resource with Kerberos enabled.
• The name of the computer object that corresponds to the Kerberos-enabled
network name resource of the cluster is located in an organizational unit
that has a forward slash character in its name.
So none of these conditions applies to us, but here comes my guess:
When Lotus Domino was first installed on the cluster, one share was also
created.
I don’t know if this share was created following the rules posted in this MS
KB:
http://support.microsoft.com/default.aspx?scid=kb;[LN];224967
I know for sure that when you create a cluster share the following registry
key is populated:
HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\Shares
I also noticed that when you take that share offline, its entry under that
registry key disappears as well.
The problem is that under that key we found the name of one share that was
first created for the Lotus Domino group and then deleted, but again I
don’t know how this was done.
Here comes the question:
Is it possible to think that this dirty registry key is the cause of the
strange failover scenario described before?
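A minimal sketch of how one might look for leftover share entries like the one described above. It lists the value names under HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\Shares and reports any that are not in a list of shares you expect on the node. The helper names and the `expected` list are hypothetical, the registry part runs only on Windows, and a leftover entry found this way is only a hint, not proof of the cause.

```python
import sys

SHARES_KEY = r"SYSTEM\CurrentControlSet\Services\lanmanserver\Shares"


def read_registry_share_names():
    """Return the value names stored under the Shares key (Windows only)."""
    import winreg  # standard library, available only on Windows

    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SHARES_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, mtime)
        for index in range(value_count):
            name, _data, _type = winreg.EnumValue(key, index)
            names.append(name)
    return names


def find_leftover_shares(registry_names, expected_names):
    """Names present in the registry but missing from the expected list.

    The comparison is case-insensitive because share names are.
    """
    expected = {name.lower() for name in expected_names}
    return sorted(n for n in registry_names if n.lower() not in expected)


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical list: every share that should exist on this node right now.
    expected = ["SQLData", "Users"]
    for name in find_leftover_shares(read_registry_share_names(), expected):
        print("Not expected on this node:", name)
```

Running it on each node while the groups are online, and again after taking a share offline, would show whether an entry stays behind.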
Sorry for this long post and for my bad English, but any suggestion is
really appreciated.
Cheers.
Franco
1069 is "Cluster resource ‘Network Name’ in Resource Group ‘GroupName’ failed." Check your DNS settings so that there is no name collision when the failover occurs. A common issue is that when the group goes offline, the name is not correctly unregistered from DNS. Then when it tries to come back online on the other node it can’t, because the name already exists on the network.
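As a rough illustration of the DNS check suggested above, the sketch below resolves a virtual server name and compares the answer with the IP address the cluster resource is supposed to own. The function names and the sample name and address are hypothetical. It only sees what the local resolver returns, so it can show a stale or wrong record but not every kind of name collision.

```python
import socket


def resolve_ipv4(name):
    """Return the IPv4 addresses the local resolver gives for name ([] if none)."""
    try:
        _host, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:  # covers socket.gaierror and socket.herror
        return []
    return addresses


def check_network_name(name, expected_ip, resolver=resolve_ipv4):
    """Classify the DNS answer as 'ok', 'mismatch' or 'unresolved'."""
    addresses = resolver(name)
    if not addresses:
        return "unresolved"
    if addresses == [expected_ip]:
        return "ok"
    return "mismatch"


if __name__ == "__main__":
    # Hypothetical virtual server name and IP address resource value.
    print(check_network_name("SQLVS1", "10.0.0.5"))
```

Running it before and after moving a group would show whether the record still points somewhere unexpected once the name resource has gone offline.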
Thank you for the suggestion.
Cheers. Franco
I don’t have "DNS REGISTRATION MUST SUCCEED" on the network name cluster resource.
Is your suggestion still valid?
Please advise. Franco
Any other error messages besides those above?
No other errors. Franco
Which network name in which group is the 1069 error for? The cluster group, sql group, oracle group or domino group?
Typically event id 1069 affects 2 groups at a time, but sometimes even more.
Yesterday was SQL Server and File system and one week ago was Lotus Domino and Oracle.
Franco
Hard to say what it is. Could be a conflict with some of the resource dlls. I would check %windir%\cluster\cluster.log and if no additional info is found there open a case with Microsoft.
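The cluster log can be large, so a small filter helps when looking for the lines around a failure. This is a generic sketch that assumes nothing about the log's layout: it prints every line containing one of the given keywords, together with a few lines that precede it. The function name and the keywords are hypothetical choices.

```python
import os
from collections import deque


def grep_with_context(lines, keywords, before=2):
    """Yield (line_number, text) for matching lines plus `before` earlier lines.

    Matching is a case-insensitive substring test against each keyword.
    """
    lowered = [keyword.lower() for keyword in keywords]
    recent = deque(maxlen=before)  # holds the last non-matching lines
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(keyword in text.lower() for keyword in lowered):
            yield from recent  # context that led up to the match
            yield number, text
            recent.clear()
        else:
            recent.append((number, text))


if __name__ == "__main__":
    path = os.path.expandvars(r"%windir%\cluster\cluster.log")
    if os.path.exists(path):
        with open(path, errors="replace") as log:
            # Hypothetical keywords; adjust to the resource that failed.
            for number, text in grep_with_context(log, ["network name", "failed"]):
                print(number, text)
```

Comparing the output from both nodes around the time of the move may show which resource failed first.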
I think the same.
Thanks for all your time and support. Franco