SQL Server Performance

CheckQueryProcessorAlive: sqlexecdirect failed

Discussion in 'SQL Server Clustering' started by sriraj, Aug 1, 2007.

  1. sriraj New Member

    Hi all,
    We have a SQL Server failover cluster (Enterprise Edition, SP4, Windows 2000 Advanced Server, AWE enabled). All of a sudden our primary node failed over to the secondary node in the mid-afternoon.
    I am trying to find the root cause that triggered the failover to the secondary node.
    All we could gather from the Event Viewer is:
    Event time: 2:45 PM
    3041: BACKUP failed to complete the command BACKUP LOG ....._tlog_200707301445.TRN' WITH INIT , NOUNLOAD , NOSKIP , STATS = 10, NOFORMAT
    Event time: 2:54 PM
    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
    [sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; message = [Microsoft][ODBC SQL Server Driver]Timeout expired
    [sqsrvres] OnlineThread: QP is not online.
    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][ODBC SQL Server Driver]Communication link failure
    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][ODBC SQL Server Driver]Communication link failure
    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][ODBC SQL Server Driver]Communication link failure
    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
    [sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][ODBC SQL Server Driver]Communication link failure
    [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
    Event time: 2:55 PM
    [sqsrvres] ODBC sqldriverconnect failed
    [sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = b; message = [Microsoft][ODBC SQL Server Driver][DBNETLIB]General network error. Check your network documentation.
    [sqsrvres] ODBC sqldriverconnect failed
    [sqsrvres] checkODBCConnectError: sqlstate = 01000; native error = 274c; message = [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (PreLoginHandshake()).
    [sqsrvres] ODBC sqldriverconnect failed
    [sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = b; message = [Microsoft][ODBC SQL Server Driver][DBNETLIB]General network error. Check your network documentation.
    SQLServerAgent service successfully stopped.
    Event Time: 2:56 PM
    Started Failing over to secondary node.
    ============================================================
    I checked whether a heavy query had exhausted all the worker threads and so kept the primary node from responding to the Is Alive check from the secondary node. This is not the case: the workload on the primary node at the time of the incident looked average, like any other day.
    I asked the network team whether there was any network port issue. This is also ruled out, as they did not have any problems with ports.
    I logged into the primary node after the failover and checked the ODBC SQL Server driver. It does not look like the culprit either, as I am able to test a new ODBC connection to another server.
    =============================================================
    I have no clue what else could be the root cause of the failover. I would appreciate it if any of you could help me troubleshoot this issue. Thx
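    The event log entries above are easier to read as a progression of ODBC SQLSTATE codes than as a wall of text. Below is a minimal sketch, not a diagnostic tool: the function names and the abridged `SAMPLE_LOG` excerpt are illustrative assumptions, and the regular expression only relies on the `sqlstate = ...; native error = ...` wording that appears in the messages posted here.

```python
import re
from collections import Counter

# Matches the "sqlstate = X; native error = Y" fragment that the
# [sqsrvres] messages in this thread contain.
ODBC_ERROR = re.compile(r"sqlstate = (\w+); native error = (\w+)")

# Abridged excerpt of the Application log text posted above.
SAMPLE_LOG = (
    "[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed "
    "[sqsrvres] printODBCError: sqlstate = HYT00; native error = 0; "
    "message = [Microsoft][ODBC SQL Server Driver]Timeout expired "
    "[sqsrvres] OnlineThread: QP is not online. "
    "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; "
    "message = [Microsoft][ODBC SQL Server Driver]Communication link failure "
    "[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; "
    "message = [Microsoft][ODBC SQL Server Driver]Communication link failure "
    "[sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = b; "
    "message = [Microsoft][ODBC SQL Server Driver][DBNETLIB]General network error."
)


def count_states(log_text):
    """Count how often each SQLSTATE code appears in the log text."""
    return Counter(state for state, _native in ODBC_ERROR.findall(log_text))


def state_progression(log_text):
    """Return the distinct SQLSTATE codes in order of first appearance."""
    states = [state for state, _native in ODBC_ERROR.findall(log_text)]
    return list(dict.fromkeys(states))  # dict keeps insertion order


if __name__ == "__main__":
    print(state_progression(SAMPLE_LOG))
    print(count_states(SAMPLE_LOG))
```

    On the excerpt, the order of first appearance is a timeout (HYT00), then communication link failures (08S01), then failed connection attempts (08001). That ordering is at least consistent with the instance first becoming slow to answer and only later refusing connections, though it does not by itself identify the cause.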
  2. satya Moderator

    Check the System and Application logs in Event Viewer for the time of the failover; they may give more information. Also look at the cluster logs.
  3. sriraj New Member

    The error messages I posted earlier are from the Event viewer > application log.
    The SQL error log did not have any messages except for the BACKUP failed message, followed by login failed messages.
    Thx
  4. satya Moderator

    It is hard to tell what triggered this error. It looks like the cluster service was unable to connect to SQL Server to verify that it was running. There are numerous reasons why this could happen. By any chance, is this an x64 environment?
  5. sriraj New Member

    No, this is not an x64 version; it is the 32-bit version.
    Thx
  6. satya Moderator

    Then you have to monitor the events within this environment or open a support case with MS PSS if you have such arrangements.
  7. MohammedU New Member

    Did you check cluster.log? If not, check it out...
  8. qwbillings New Member

    I have an x64 environment that seems to die every night at 7:00 PM MDT, which coincides with the SAN controller becoming very busy due to backups occurring on other servers. We can reproduce the issue at will, but I am now looking for ways to prevent it from occurring.
    Here is my setup:
    HP DL585 Opterons with 64 GB of memory
    QLogic 2340 HBAs
    DS4300 Storage Array
    Server 2003 R2 Enterprise x64 (SP2)
    SQL 2005 64-bit (SP2)
    We have had MSPS onsite, and they can't figure it out. I have loaded the IBM-approved driver and BIOS and configured the server according to their recommendations, but the thing keeps dying.
    If anyone has any ideas on why this is occurring and what I can do to stop it, I would appreciate any and all help, as I am at my wits' end.
    Thanks,
    Wade
  9. satya Moderator

    IBM... HP SAN... check for firmware and driver updates in this case. We recently had a similar issue and traced it to a firmware issue on the disks.
  10. jwadz New Member

    I am currently experiencing the same issue. We are losing all of our Windows 2003 clusters on a specific storage array (one single-node, one two-node, and one four-node cluster) at the same time every Saturday night. Failures coincide with a heavy backup run on one of the three clusters. Our hardware is configured as follows for all servers:
    Dell 6850 (Intel) running Windows 2003 Enterprise R2 SP2 (x64)
    Microsoft SQL Server 2005 Enterprise Edition - Release 9.00.3200.00 (x64) (SP2+)
    QLogic 2460 HBAs connecting to Brocade 4900 switches and a SUN/STK 6540 storage array controller.
    If any more information has been determined, I would be very interested to hear it.
  11. Ovrkill New Member

    Was wondering what backup software you use? We are having the same issue. Every Friday night, anywhere between 9:00 PM and around 5:00 AM Saturday morning, our SQL Server will crash. We are running two Dell 6850s with Win2003 R2 x64, attached to an EMC switch and an IBM DS4300. The cluster is in Active/Passive mode. The cluster does not fail over, and the only clustered resource that goes offline is the SQL Server. It seems to take only a second or two, but that is enough to cause issues with other processes we have running, since the database becomes unavailable and the service stops and restarts.
    We were thinking it was an issue with our SAN, but in this thread there is different host hardware, different SAN hardware, different HBA controllers, and different switches. Maybe we should look at what we have in common: MS SQL 2005, the Clustering Service, Windows 2003 R2 x64, and possibly backup software? That's about the only other software we have on the nodes.
    We also have two other database clusters on the SAN, some of which even boot from the SAN, and they do not crash or have any issues.
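    The "what do we have in common" comparison in the post above can be sketched as a set intersection. This is only an illustration under stated assumptions: the component labels are hand-normalized from the three setups posted so far in this thread (for example, the QLogic 2340 and 2460 are both recorded as "QLogic HBA"), each set holds only what that poster explicitly listed, and the function names are hypothetical.

```python
from collections import Counter

# Components each poster explicitly listed, normalized by hand (assumption).
ENVIRONMENTS = {
    "qwbillings": {
        "HP DL585", "QLogic HBA", "IBM DS4300",
        "Windows 2003 R2 x64", "SQL Server 2005",
    },
    "jwadz": {
        "Dell 6850", "QLogic HBA", "Brocade switch", "SUN/STK 6540",
        "Windows 2003 R2 x64", "SQL Server 2005",
    },
    "Ovrkill": {
        "Dell 6850", "EMC switch", "IBM DS4300",
        "Windows 2003 R2 x64", "SQL Server 2005",
    },
}


def shared_by_all(environments):
    """Components present in every reported environment."""
    return set.intersection(*environments.values())


def shared_by_some(environments):
    """Components present in at least two environments, but not in all."""
    counts = Counter(c for parts in environments.values() for c in parts)
    total = len(environments)
    return {c for c, n in counts.items() if 2 <= n < total}


if __name__ == "__main__":
    print(sorted(shared_by_all(ENVIRONMENTS)))
    print(sorted(shared_by_some(ENVIRONMENTS)))
```

    With these inputs, only the operating system and the SQL Server version are common to all three setups, which matches the observation that the host, SAN, HBA, and switch hardware all differ. A component missing from a set may simply not have been mentioned, so the result narrows the search rather than proving a cause.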
  12. gregolso New Member

    Has anybody found an answer to the issues above? I'm having the same issue on one node in our new 4-node Windows 2003 SP2 R2 / SQL 2005 SP2 Enterprise (all active) cluster. During a nightly maintenance job on one of the instances, I get the same errors and warnings (also including one "information" message about an I/O taking longer than 15 seconds to complete). It appears to me that during this job, for some reason, the cluster is not able to verify the health of the SQL instance, so it restarts it.
    Very frustrating issue.
    Hardware:
    HP DL560 G2, quad 2.9 (quad-core) procs, 36 GB of RAM, connected to an EMC CX3-40 through a QLogic HBA and a Brocade 5100 switch.
  13. satya Moderator
