Trace Logs and finding error source | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Trace Logs and finding error source

Hi, I’d like some help tracing the source of errors found in the trace logs to the query that invoked the error. Our server has been failing several times a week. We don’t have a DB admin, just me, the developer, so please be patient. I can view all the errors in the Profiler, but not sure how to follow error to query source. This seemed to be what I was looking for but I have not been able to impliment or make sense of this:
Displaying Execution Plans by Using SQL Server Profiler Event Classes Can someone please help or let me know what other info I can provide to get help with this? Thanks! Andrew
Do you have any errors in SQL error logs or/and Event Viewer? Luis Martin
Moderator
SQL-Server-Performance.com Although nature commences with reason and ends in experience it is necessary for us to do the opposite, that is to commence with experience and from this to proceed to investigate the reason.
Leonardo Da Vinci Nunca esperes el reconocimiento de tus hijos, eso ocurrirá luego de tu muerte
All postings are provided “AS IS” with no warranties for accuracy.
if you are getting outright OS or SQL Server failures, follow the MS PSS directions on data collection and let them handle it.
or you could look at the data and handle it yourself
if this is just queries that fail, when you run a profiler trace, one of the event class groups is Performance,
several classes within Performance has execution plan data,
but the one with the most information is Show Plan Statistics. i am not a big fan of using profiler to capture execution plans on a live server,
my preference is to find the problem query,
then examine the execution plan from query analyzer

quote:Originally posted by joechang if you are getting outright OS or SQL Server failures, follow the MS PSS directions on data collection and let them handle it.
or you could look at the data and handle it yourself
if this is just queries that fail, when you run a profiler trace, one of the event class groups is Performance,
several classes within Performance has execution plan data,
but the one with the most information is Show Plan Statistics. i am not a big fan of using profiler to capture execution plans on a live server,
my preference is to find the problem query,
then examine the execution plan from query analyzer

OK, so bear with me… this is such an academic question; How do you use the trace logs to find the problem query? I have two significant errors in my logs. One is Hash Warning and the other is Missing Join Operand (both from memory right now). So how do I trace the occurrence of the errors to the source, offending query? Thanks, Andrew
ONce you’ve capture with PROFILER, paste the same query to the query analyzer and check the execution plan, post the relevant errors and also refer to MSDN or books online for relevant information. Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
quote:Originally posted by satya ONce you’ve capture with PROFILER, paste the same query to the query analyzer and check the execution plan, post the relevant errors and also refer to MSDN or books online for relevant information. Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.

Sorry I feel like I’m going around in circles here… How do you "capture with PROFILER"? That is the whole question. I don’t know how to link or otherwise associate a particular instance of an error in the trace logs to the query that caused it to determine the query that is causing the trouble. Andrew

i don’t know why hash warning is even a warning,
if one intended to do a hash join, why warn the missing join operand is something that should be looked into,
did you mean to do a cross join?
if not, fix the query that is missing the join operand
quote:Originally posted by joechang i don’t know why hash warning is even a warning,
if one intended to do a hash join, why warn the missing join operand is something that should be looked into,
did you mean to do a cross join?
if not, fix the query that is missing the join operand

To the best of my knowledge and off the top of my head, the hash warning occurs when the memory limit for the given join is exceeded. Still looking for an answer to my original question, please.
http://support.microsoft.com/?id=224587
http://vyaskn.tripod.com/analyzing_profiler_output.htm
http://www.developer.com/db/article.php/3482216
http://techrepublic.com.com/5100-1035_11-1043670.html
http://techrepublic.com.com/5138-9592-5657128.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;325297
… about PROFILER usage, fyi. Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
quote:Originally posted by satya http://support.microsoft.com/?id=224587
http://vyaskn.tripod.com/analyzing_profiler_output.htm
http://www.developer.com/db/article.php/3482216
http://techrepublic.com.com/5100-1035_11-1043670.html
http://techrepublic.com.com/5138-9592-5657128.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;325297
… about PROFILER usage, fyi. Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.

Hmmm…. I’ve reviewed the links provided and all of them appear to deal with SQL 2K and below. We are using 2005. I tried to follow the instructions but the interface for the profiler is apparently different then it was for 2K, and I’m not experienced enough to wade through it. The traces are already on the server. I’ve never created a new trace. I believe they are generated automatically by the server… Can someone shed some light on how these existing traces are being generated and how I can modify them to include the data I need? When I open an existing trace on the server and view the properties, events tab, there appears to be no way to select the column "TextData" for the two errors we are seeing (listed above). If the TextData column for the errors will allow me to see the offending query, can someone please tell me how to enable that column in my traces that are currently being generated on the server? Thanks, Andrew
I missed these 2 y’day:
http://www.extremeexperts.com/SQL/Yukon/DeadLockDetection.aspx
http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1171978,00.html?bucket=ETA Satya SKJ
Microsoft SQL Server MVP
Contributing Editor & Forums Moderator
http://www.SQL-Server-Performance.Com
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
dnught -<br /><br />I think I see the source of the confusion – in SQL Server, traces are not the same thing as logs. Logs will show basic server activity and diagnostics. They are listed under Management &gt; SQL Server Logs in Management Studio. They will not show individual queries.<br /><br />Traces are another thing entirely, and have to be set up and run to capture data. Traces actually capture the SQL activity, queries being passed in to the server (a little like sniffing traffic to the database). Traces are generally only run on an as-needed basis, and have a performance impact. They are configured using SQL Server Profiler to capture particular queries, meeting criteria you can define.<br /><br />It is possible to manually run a trace, capturing the times that batches are started and completed, then compare those timestamps to errors in the log to try to correlate the two. In my experience, however, that’s not the most productive method of troubleshooting. If the thing is actually crashing, it’s unlikely that a query is really the cause, but rather some issue with the server itself. Queries can be slow, or drain resources, but it’s rare that they actually trip up the server in such a way that an error is logged. It can happen, but it’s not normal.<br /><br />Out of curiosity, are you on an up-to-date SQL server installation? SP1? (just came out, so that’s perhaps not a fair question – but it might fix your problem anyway <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />.
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by merrillaldrich</i><br /><br />dnught -<br /><br />I think I see the source of the confusion – in SQL Server, traces are not the same thing as logs. Logs will show basic server activity and diagnostics. They are listed under Management &gt; SQL Server Logs in Management Studio. They will not show individual queries.<br /><br />SNIP<br /><br />Out of curiosity, are you on an up-to-date SQL server installation? SP1? (just came out, so that’s perhaps not a fair question – but it might fix your problem anyway <img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />.<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br />Thanks for another great reply. <br /><br />We are NOT on SP1 yet. This question has brought another general business related question to the table, and I’d really like some comments or advice from professionals regarding this issue…<br /><br />Here’s my situation in a nutshell… <br /><br />We HAD 4 servers Win 2K Servers, Exchange 2K, SQL 2K, + all owned, but no one to run the servers (legacy company stuff, long story) so I ran them myself (Standalone Developer/IT Manager, what fun) with sporadic and dangerous results since it was never correctly hardened to begin with, ie: worms, system failures, etc. This is an important part of the story.<br />At my suggestion and a bit of no-brainer logic, the CEO finally decided to update the hardware.<br /><br />NOW we have UPGRADED to Win 2003, Exchange 2003, SQL 2005 leased boxes and licenses, AND we#%92ve stepped up our Support package afforded by our Host to cover these types of matters. We paid them a reasonable fee to prepare, configure, harden the new network and port over the SQL data and web files. I did the development transition, ie: connection strings, file permissions, etc. for the web apps to complete the migration. <br /><br />The COST of the upgrade has been substantial for our small company. We paid a hefty price for the server transfer/migration, and we#%92re paying a substantial amount of fees for co-location each month. To be more specific, it is costing us over $2K/month for housing and connection for the three servers, limited support, bandwidth allocation, and backup services.<br /><br />SO, the problem is; a month after the initial setup, the SQL server has begun to fail on a daily basis. It seems to require a reboot. This can happen at any time of the day though I have not charted it yet since the Host, on off-business hours, reboots it for us and does not necessarily notify us. The system/application logs and traces have not revealed anything to me that I can decipher. The Host has suggested that certain resource consuming apps (such as a .net mass mailing app for newsletters) may be causing the failures, though they have offered only anecdotal evidence to suggest that this could be the case – “We#%92re seeing some activity related to the mail server before some of the failures” even though the SQL server has nothing to do with the exchange server whatsoever and the SQL failures don#%92t appear to correspond to timing related to launching the mailing app.<br /><br />So my question is… as a professional DBA, what would you say are the chances that certain queries invoking the errors in the trace logs (ie: Hash Warning and Missing Join Operand) would cause the SQL service to bail out entirely? Shouldn#%92t part of the host#%92s responsibilities, since they set up the server, be to alert our dev team (that#%92s me) to the exact query causing the error? Should I expect them to take responsibility for the stability of the environment since they set it up or should I be responsible since I program against it? I can see them bouncing a hefty invoice back to us for researching the issue and implying it#%92s our fault… my last contact from them was a CC of the tech dept to their sales dept regarding whether we should be charged to update the SQL server to SP1. I#%92m not sure how they will respond but my boss is gonna have a stroke, ESPECIALLY if that does not fix the problem, and for good!<br /><br />This is a bit of a nightmare for me since it was my suggestion that new servers would leave us in a much better place from a reliability standpoint (since we certainly are), and now they are implicating me as the developer for the instability.<br /><br />Can some pros weigh in on my dilemma? I could use some advice re: the politics of these sorts of issues…<br /><br /><br />Thanks,<br /><br />Drew<br /><br /><br />
i am ok with upgrading to reasonably recent software, ie, W2K should definitely be replaced with W2K3,
don’t know about E2K/3,
S2K5 is new, so whether you upgrade right aways or not is your choice,
on hw upgrade, if you were not having high usage, i would question the decision to upgrade all,
i would have suggested buying 1 new server, install in a disciplined manner, hardening etc, move 1 app over,
take the old server, wipe clean, fresh install etc,
ie, just buy 1 new server so you give each system a clean install, on managed colocation, i have serious issues with the quality of managed services provided,
most will definitely charge you for it, but it is unclear how many can actually provide meaningful services.
there are a couple that i have some confidence it, one is DataReturn they have a permanent rep at a MS Technology Center, so even if they cannot diagnose your problem, they have close contact with a MS facility to do so.
another (for this issue) is SqlSentry, because the ceo is also a good DBA, one that i do not have confidence in, which shall not be named here, because there idea of managed services is to install BMC Patrol Agent, which has never provided any meaningful advice on what is wrong, and was observed to cause problems of its own, on your technical problem, as i said earlier, disread the hash warning, because there is nothing wrong with a hash, unless it is because the table & index design, and query was written poorly, causing a table scan when an indes seek should have been better, for which the hash is also used.
ie, it is not the fault of the hash that is the source of the problem missing join is a serious coding problem,
but neither of these should be the true cause of the SQL process crashing,
however, whoever built this app, if it has poor execution plans and missing join operands, then it probably has all sorts of other problems that would cause SQL to crash are you using /3GB? if so, drop it, it is actually less stable in W2K3 than W2K on politics, if there is no problem that has upper management visibility,
never be the sole advocate of change,
discuss it until other people think it is their idea,
ie, just because you don’t have the latest toy, deal with it. anyways, if SQL Server is crashing, open a MS PSS case, don’t waste your time with bozo’s with no real SQL Server skills
quote:Originally posted by joechang i am ok with upgrading to reasonably recent software, ie, W2K should definitely be replaced with W2K3,
don’t know about E2K/3,
S2K5 is new, so whether you upgrade right aways or not is your choice,
on hw upgrade, if you were not having high usage, i would question the decision to upgrade all,
i would have suggested buying 1 new server, install in a disciplined manner, hardening etc, move 1 app over,
take the old server, wipe clean, fresh install etc,
ie, just buy 1 new server so you give each system a clean install,

5 year old servers and software with no admin do not a stable system make. In this case, the (primary) choices were …
1. Hire a sys admin full or part time. Cost ~2-5K/month PLUS the hardware would still need replacing and the systems were never properly configured so the mess would need to be cleaned up. It’s easier to start from scratch!
2. Get new leased machines and software and let the co-host provide administration as needed. We’ve been pulling it off for 2 years with no other sys admin besides myself, so this should not be unrealistic.
quote:Originally posted by joechang
on managed colocation, i have serious issues with the quality of managed services provided,
most will definitely charge you for it, but it is unclear how many can actually provide meaningful services.
there are a couple that i have some confidence it, one is DataReturn they have a permanent rep at a MS Technology Center, so even if they cannot diagnose your problem, they have close contact with a MS facility to do so.
another (for this issue) is SqlSentry, because the ceo is also a good DBA,

In this case the co-host is relatively large company, assuring that they have the proper resources. So the question is not "Can they fix it?", rather it’s "Should we get charged for this particular ‘fix’".
quote:Originally posted by joechang
one that i do not have confidence in, which shall not be named here, because there idea of managed services is to install BMC Patrol Agent, which has never provided any meaningful advice on what is wrong, and was observed to cause problems of its own, on your technical problem, as i said earlier, disread the hash warning, because there is nothing wrong with a hash, unless it is because the table & index design, and query was written poorly, causing a table scan when an indes seek should have been better, for which the hash is also used.
ie, it is not the fault of the hash that is the source of the problem missing join is a serious coding problem,
but neither of these should be the true cause of the SQL process crashing,
however, whoever built this app, if it has poor execution plans and missing join operands, then it probably has all sorts of other problems that would cause SQL to crash

I designed the app, all of them, several sites, major complexity, no small task. What’s an execution plan? (enough said?) There are THOUSANDS and THOUSANDS and THOUSANDS of lines of code and many, many queries. The entire reason this thread has come up is that it would appear to be a rather complex process to trace the given errors to their source query to attempt to find problems and fix them. If the host won’t do that as part of their service, I shall have to do this myself. None of the suggestions made on various forums have made much sense to me in that the error trace should simply indicate clearly what query is causing the problem. Is this so unreasonable a task to ask of an SQL server? It may be easier for someone with experience, but it would seem that now is the time for my initiation…
quote:Originally posted by joechang are you using /3GB? if so, drop it, it is actually less stable in W2K3 than W2K

Since I don’t know what/3GB is, I would not know.
quote:Originally posted by joechang
on politics, if there is no problem that has upper management visibility,
never be the sole advocate of change,
discuss it until other people think it is their idea,
ie, just because you don’t have the latest toy, deal with it.

In this case the goal of updating was not to "get the latest toy", but rather to ensure that thousands of dollars would not have to be expended to restore a failed network since viruses, worms, and trojans had infected it. AND it was just time for a new network on new machines. We are not a large company, so "upper management above me" means the owner, who relies on me for my (limited) expertise. He could get better service (and I’d have less headaches) if he hired a DBA, a Sys Admin, an IT Manager, and keep me as a coder only, but the costs would be prohibitive.
quote:Originally posted by joechang
anyways, if SQL Server is crashing, open a MS PSS case, don’t waste your time with bozo’s with no real SQL Server skills

Hmmm… but part of the point of my post was that we pay them for support. SO, I was trying to determine if they should be relaying the cause of the coding errors to me or if I should be expected to handle that myself. In regard to MS PSS, if the cause is some scripting errors, will that incurr a cost by MS?

just because the co-lo is a large company does not mean they have technical competence,
the only guarantee is they will charge a lot for any work, if they show no skill in determining the true source of the error, i would not continue to engage the co-lo, tracing an error message to the true source is not simple matter, not in SQL Server, any other Database or even any other software,
in all cases, it will require solid understanding of what is happening underneath, and some skill i believe the MS PPS policy is, if its your fault, you pay, if it is their fault, you will not be charged
]]>