SQL Server Performance Forum – Threads Archive

Bad day at the office!

My Data Warehouse load started taking 2+ hours longer than usual over the last couple of days. Normally when this happens I just reboot it and it’s fine for another 4 or 5 weeks or so. It’s a quick, easy solution and saves me spending hours and hours troubleshooting it.
So this morning I rebooted it (after warning users). Ever since then it’s been SICK. We can’t log on to the desktop using domain credentials, and there are weird error messages all over the place when trying to do network-related things (e.g. changing the NetBIOS name). A colleague and I tried removing it from the domain, but that wouldn’t work (another weird error, which I don’t have to hand as I’m home now). You can rename it ON the domain but can’t take it out into a workgroup and then re-add it.
Maybe a WINS database corruption? We applied those new WINS patches to our domain controllers today [still on NT 4.0 DCs :(]. Anyway, I had to come home and leave it; my boss was having a look remotely.
Don’t know if anyone’s got any ideas? I have checked for viruses: clean. I’m going to have to try to forget it over the whole weekend and face it on Monday (not pleasant).
It’s Windows 2000 Standard Edition SP4 + patches and SQL Server 2000 Standard Edition SP3a + hotfix.
Tom Pullen
DBA, Oxfam GB
Certainly sounds like it could be an issue with WINS; consider giving MS a call. In the meantime, check your network card properties to make sure there are no bad/dropped TCP/IP packets. Maybe replace the network card too if you can. Good luck!
Yeah, I was ready to phone MS, but in our sector you have to check and double-check before getting the corporate credit card out, if you get my drift. If it’s still not playing on Monday, I will get straight on the phone. Thanks for the reply.
Tom Pullen
DBA, Oxfam GB
Sounds like a network card issue to me.
Luis Martin
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.
Yes, but the network card is working fine – you can ping, you can net view, you can map drives using domain credentials; it’s just that certain things don’t work. It’s quite perplexing. It actually has 3 NICs: 1 for the public network, 1 for the backup LAN, and 1 for the private switched LAN to the source servers for the load, etc. It seems more software-related to me than anything.
Tom Pullen
DBA, Oxfam GB
How about the latest NIC drivers?
Luis Martin
Moderator
Yeah, they’re up to date, thanks for the suggestion. One idea is to switch the backup LAN card to be the public one and vice versa but, as I’ve said, I think it’s OS/domain related. I think the NICs are fine, as some basic network things work.
Tom Pullen
DBA, Oxfam GB
Hi Tom,
I’d suggest that you check your DCs’ error logs for any messages, and enable additional auditing if you need to see more info. What message do you get when trying to log onto the desktop with a domain account?
Cheers
Twan
Twan, thanks – I am at home so don’t have all the error messages to hand. It’s a series of seemingly spurious errors (e.g. the computer’s domain account is missing, etc.). My manager has been fighting with it from home most of the day and his diagnosis is a W2K re-install, with which I agree – a clean start. We can leave the data partitions intact, so recovery will be relatively painless. At one stage he phoned me and thought it was fixed; while he was on the phone, he tried to restart SQL Server (running under a domain account) and it failed with a login failure. It’s just a series of these kinds of errors…
The DCs look fine and there are no issues with any other servers on the domain, so we kind of conclude it’s a local OS-related issue. Reasonable, no?
Thanks for all of your help and suggestions. I will update you all with news before we go away for 4 days’ rest next weekend (for me, it can’t come too soon…).
Tom Pullen
DBA, Oxfam GB
Personally, I’ve never seen local OS cause this. If they just changed the WINS and you’re having trouble with that, that would be the logical thing to troubleshoot. Have you tried dropping the computer completely from all WINS and DNS servers, then reapplying? That would be the logical path in this instance. Also, why are you still using WINS? It’s never been reliable. I don’t understand why people still use it.
MeanOldDBA
When life gives you a lemon, fire the DBA.
Derrick, good questions, and the short answer is: I don’t know. Our network guys are a law unto themselves…
Tom Pullen
DBA, Oxfam GB
<RANT>
:) That just bugs the fire out of me. The first response of network guys is to always say "it’s not us". You then find out it is, or work around the issue until you have things working. They should never be a "law unto themselves". If they are, someone should fire your IT management.
</RANT>
MeanOldDBA
Derrick, I feel your pain. But I am a minion, not a fromage grande…
Tom Pullen
DBA, Oxfam GB
Hi ya,
I must concur with Derrick that this doesn’t feel like a local OS problem… Have you tried:
– resetting the domain machine account
– removing the SQL Server from the domain, deleting the machine account and then re-adding it
– checking that, when logged into the SQL Server, you can ping the DC and resolve its name to an IP
Have you turned on the success/failure auditing of account logons on the DCs and then checked to see what messages are shown?
Cheers
Twan
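The name-resolution check in the list above can be scripted rather than done by hand. This is a minimal Python sketch, not anything posted in the thread: `check_resolution` is a hypothetical helper, and the resolver is passed in as a parameter so the logic can be exercised without touching the network.

```python
import socket


def check_resolution(names, resolver=socket.gethostbyname):
    """Return {name: ip or None} for each host name.

    `resolver` defaults to the standard library lookup; None marks a
    name that failed to resolve.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            results[name] = None
    return results
```

Calling `check_resolution(["pdc01"])` on the affected server (where `pdc01` is a placeholder for the real DC name) would show whether the OS resolver returns an address. It does not tell you which mechanism (hosts file, DNS, WINS) supplied the answer.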
Twan, no – I haven’t looked at the DC security logs yet. I had to leave on Friday and haven’t been back yet.
– you can’t remove it from the domain, only rename it ON the domain, and that throws errors. We tried manually deleting the computer account from the domain (on a DC), resynched the domain, then re-added it manually (from the DC, not the SQL Server); still the same problems
– my manager tried some hack he found on the internet to reset the SID for the machine account; this didn’t work either
– you CAN ping the PDC by name, so name resolution from the SQL Server doesn’t seem to be an issue
The symptoms don’t seem like a local problem, but if it isn’t, why isn’t any other server having the same problem, and why do the spurious errors only get thrown up locally and not anywhere else? Thanks for the ideas.
Tom Pullen
DBA, Oxfam GB
WINS database is corrupt. I would almost bet money on it.
MeanOldDBA
Tom,
If you get a chance, post the error messages, and I hope this festive week will bring the joy back for your OLAP users. :D
Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
Example error message from Friday (when trying to remove the server from the domain):
Network Identification
The following error occurred attempting to unjoin the domain "xxxxx". The specified service does not exist.
No error number, etc. Still struggling with it!
Tom Pullen
DBA, Oxfam GB
Sounds like a group policy is blocking this. Check your local security policies.
Are you using Active Directory? If so, then check:
Settings,CN=<servername>,CN=Servers,CN=<sitename>,CN=Sites,CN=Configuration,DC=<domain>…
Satya SKJ
Moderator
No, it’s an NT 4.0 domain. It is now fixed. A repair of the OS, booting from the W2K CD, fixed it. All data intact and no need to reinstall SQL Server. Almightily relieved DBA in the house!
Tom Pullen
DBA, Oxfam GB
Hi Tom,
Are you able to send a list of the disabled services on both the SQL box and the DC?
Cheers
Twan
quote:Originally posted by thomas
It is now fixed. A repair of the OS, booting from the W2K CD, fixed it. All data intact and no need to reinstall SQL Server. Almightily relieved DBA in the house!
Tom Pullen
DBA, Oxfam GB

Congrats Tom. Glad to hear that it’s finally solved. Regards.
quote:Originally posted by Twan
Hi Tom, are you able to send a list of the disabled services on both the SQL box and the DC?
Cheers
Twan

Yes Twan, will do when I get a second. I’ve got downtime on the live financial OLTP server from 3pm to 7pm to move a couple of large tables to their own partition, do some memory tuning and try to get the missing SQL Server perfmon counters back, so all in all I’m a bit busy, but damn glad my DW is working again!
Tom Pullen
DBA, Oxfam GB
Hi Tom,
If your server is working now, then don’t worry about sending the list of disabled services… presumably a repair would have reset those back to defaults anyway…
Cheers
Twan
Yes indeed. Had more fun & games with this server today: the network connection (public LAN) kept disconnecting and interrupting user queries. Arrghhhh.. why me? I got it re-patched into a different port on the switch and that seems to have fixed it. Do any of you ever get the feeling that it never rains but it pours?
Tom Pullen
DBA, Oxfam GB
Hm, a bad network card perhaps? Or a bad one that is not actually in use but is still disturbing the system?
I have had those thoughts myself, but the intermittent nature of it makes me suspect it’s not the card. My belief (maybe wrong, but..) is that network cards tend to be like processors, i.e. they either work or they don’t; there’s not much "intermittency" likely to happen. That’s my boss’s opinion too, and he’s pretty hardware-savvy.
Tom Pullen
DBA, Oxfam GB
Speed and duplex settings on the server and switch port can sometimes cause intermittent problems… Check that they’re either both set to auto or both hard-fixed to 100Mb/full or 1000Mb/full.
Cheers
Twan
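The rule in the post above (both ends auto, or both ends hard-fixed to the same speed and duplex) can be written down as a small check. This is a minimal Python sketch under stated assumptions: `LinkSetting` and `settings_consistent` are hypothetical names, and the values have to be read by hand from the NIC driver properties and the switch port configuration; nothing here queries hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSetting:
    """One end of a link: either autonegotiating or hard-fixed."""
    auto: bool
    speed_mbps: int = 0       # ignored when auto is True
    full_duplex: bool = True  # ignored when auto is True


def settings_consistent(server: LinkSetting, switch: LinkSetting) -> bool:
    """True if both ends are auto, or both are fixed to the same speed/duplex."""
    if server.auto or switch.auto:
        # A mix of auto on one end and fixed on the other is the classic mismatch.
        return server.auto and switch.auto
    return (server.speed_mbps, server.full_duplex) == (
        switch.speed_mbps, switch.full_duplex)
```

For example, a server NIC fixed at 100/full against a switch port left on auto would be reported as inconsistent, which is the combination worth ruling out first.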
Yes, the public one is set to 100/full duplex and the private ones are auto-detect, as directed by our network men. They say that these issues (which have also intermittently affected other servers) may indicate a requirement for the edge switches to be rebooted to clear the ARP tables, etc. Sounds like a good idea, although I must admit when it comes to networks most of it sounds like a load of old ‘arp to me anyway. Boom boom!
Tom Pullen
DBA, Oxfam GB
Forgive me, but ARP tables are a layer 3 feature and therefore wouldn’t be on a switch, but would be on routers? Switches have CAM tables to map ports to MAC addresses, but these tend to be very dynamic and tend not to get out of date (in my experience). Doesn’t sound quite right to me…
Cheers
Twan
These things are "passports", which operate as both switches AND routers, allegedly.
Tom Pullen
DBA, Oxfam GB
Thomas,
I’m not an expert in network issues, but if it were me, I would really be pushing to try a different network card. I do know of at least one occasion in our shop when a flaky network card had intermittent problems. It’s certainly not unheard of for a circuit board to have heat-related problems, or a broken solder joint.
Steve
Thanks, Steve. If these issues persist, I shall most def be doing just that.
Tom Pullen
DBA, Oxfam GB