Recurring inconsistencies – SQL2000 on Win2003. | SQL Server Performance Forums


I’m at my wits’ end… please help.

I have posted a similar entry before. A few hardware upgrades have been done since then (I’m not too certain exactly what, since we are not involved in the server administration; something regarding hyperthreading, etc.), but the problem is still occurring: four times in a month, each time on the same instance and the same database. We are running 5 instances of SQL Server 2000 (8.00.760) on a consolidated server (8 physical processors, 8 virtual processors) in a SAN, with MDAC version 2.8, Emulex 9002 cards (I think), and FastT900 disks.

As a precaution, TornPageDetection is switched on for all the databases, and we run integrity checks on them every evening. Last night the check failed with inconsistency errors again. The Integrity Check job wrote the following message to a text file:

[4] Database Genome: Check Data and Index Linkage...
[Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 8928: [Microsoft][ODBC SQL Server Driver][SQL Server]Object ID 309576141, index ID 12: Page (1:89638) could not be processed. See other errors for details.
[Microsoft][ODBC SQL Server Driver][SQL Server]Table error: Object ID 309576141, index ID 12, page (1:89638), row 166. Test (ColumnOffsets <= (nextRec - pRec)) failed. Values are 3748 and 24.
[Microsoft][ODBC SQL Server Driver][SQL Server]CHECKDB found 0 allocation errors and 2 consistency errors in table 'LinkThree' (object ID 309576141).
[Microsoft][ODBC SQL Server Driver][SQL Server]CHECKDB found 0 allocation errors and 2 consistency errors in database 'Genome'.
[Microsoft][ODBC SQL Server Driver][SQL Server]repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Genome ).

The following errors were found:

[Microsoft][ODBC SQL Server Driver][SQL Server]Object ID 309576141, index ID 12: Page (1:89638) could not be processed. See other errors for details.
[Microsoft][ODBC SQL Server Driver][SQL Server]Table error: Object ID 309576141, index ID 12, page (1:89638), row 166. Test (ColumnOffsets <= (nextRec - pRec)) failed. Values are 3748 and 24.
[Microsoft][ODBC SQL Server Driver][SQL Server]CHECKDB found 0 allocation errors and 2 consistency errors in table 'LinkThree' (object ID 309576141).
[Microsoft][ODBC SQL Server Driver][SQL Server]CHECKDB found 0 allocation errors and 2 consistency errors in database 'Genome'.
[Microsoft][ODBC SQL Server Driver][SQL Server]repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Genome ).
** Execution Time: 0 hrs, 1 mins, 26 secs **

Repairing the damage is not my issue; finding what causes it is. Any suggestions, please?

Thanks.
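Since the corruption keeps coming back, it may help to record which object, index and page each nightly check complains about, so that recurrences on the same page or index stand out. The following is only a minimal sketch, assuming the job output is available as text; the function name `damaged_pages` and the regular expression are illustrative, not part of SQL Server or SQL-DMO, and the pattern only matches the "Table error" lines in the format shown above (page 1:89638).

```python
import re
from collections import Counter

# Matches the "Table error" lines as they appear in the job output above.
# The pattern is an assumption based on that one sample, not an official format.
TABLE_ERROR = re.compile(
    r"Table error: Object ID (?P<object_id>\d+), index ID (?P<index_id>\d+), "
    r"page \((?P<file_id>\d+):(?P<page_id>\d+)\)"
)

def damaged_pages(log_text):
    """Count (object_id, index_id, file_id, page_id) tuples found in the log text."""
    hits = Counter()
    for match in TABLE_ERROR.finditer(log_text):
        key = tuple(
            int(match.group(name))
            for name in ("object_id", "index_id", "file_id", "page_id")
        )
        hits[key] += 1
    return hits

# One line copied from the integrity check output in the post.
sample = (
    "[Microsoft][ODBC SQL Server Driver][SQL Server]Table error: "
    "Object ID 309576141, index ID 12, page (1:89638), row 166. "
    "Test (ColumnOffsets <= (nextRec - pRec)) failed. Values are 3748 and 24."
)

for key, count in damaged_pages(sample).items():
    print(key, count)
```

Feeding each night's text file through something like this and keeping the counts would show whether the damage always lands in the same index or moves around, which is useful information to hand to the storage and OS teams.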
My gut feel says you need to start by checking things like your Emulex card drivers, the BIOS levels on the cards, etc. Most of the times I’ve heard about this, the OS believed the data had been written and told SQL Server so, but something then went wrong when the data was actually written out from the cache on the card. Good luck with this … everyone I’ve seen with this problem has had a LOT of fun with it (not!) Panic, Chaos, Disorder … my work here is done –unknown
What type of hardware environment are you currently running?
What service pack level is the SQL Server on? When non-clustered indexes are created with the CREATE INDEX, ALTER TABLE, or DBCC DBREINDEX statements by using a parallel plan, pages may be used in the index that were not allocated. Use of the unallocated pages may lead to one or more error messages being reported by DBCC CHECKDB or CHECKTABLE. Change the configuration value for the "max degree of parallelism" option to 1 to avoid the use of a parallel plan to build an index. Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
We had a problem with this on an EMC SAN with an HP server. We ended up failing back to Windows 2000 and haven’t had a problem since then. We’re waiting until the first service pack. We’ll test in development for a couple months after that.
MeanOldDBA
When life gives you a lemon, fire the DBA.
Wow … failed back to Win2K, no hardware changes at all, and problem solved? That’s kinda scary – wonder if anything was raised to MS? Panic, Chaos, Disorder … my work here is done –unknown
We had Microsoft, EMC, and HP working on it consistently for over a month. We had a SQL Server/Windows engineering team working directly with HP and Microsoft. MeanOldDBA
Thanx Derrick, Was there ever a bug number assigned to this? Would be interesting to track this… CiaO Panic, Chaos, Disorder … my work here is done –unknown
Let me find out our tracking number. It might take me a couple days. It’s been a few months. Have you opened up a Microsoft Support Ticket though? Also, I’m curious. We had an 8 processor HP DL760 hyper-threaded. What kind of server do you have? MeanOldDBA
Derrick, I was actually asking from an interest point of view. I’ve just moved to a new company – I was working at a site (I am 99% certain that SQL_Girl above is working at the same site) that had a problem exactly as described above. When I was last involved, there was no MS Support ticket raised – I think that having your tracking number might help them there, though they have to go through a vendor to log things with MS 🙁 That server, if I am correct, is also an 8-way hyperthreaded box – I *think* it is an IBM X440 – SQL_Girl, can you confirm? At the new site I am at, one of the facilities guys mentioned having a similar experience when they moved to win2K, and it turned out to be the Emulex drivers – SQL_Girl, have they looked at those? Anyway, I’ll be interested in tracking this one – kind of important for any of us who are looking at win2K3 … CiaO Panic, Chaos, Disorder … my work here is done –unknown
Yes SQL_Guess, I am working at your old site. ;-) And, confirmed, it is an IBM x440. We have raised the question about the Emulex drivers with our server admin team. In the meantime they’ve changed the zoning on that box (going through one HBA instead of two now), and with that change they discovered that the HBAs and firmware are not at the newest levels. They are implementing the newest levels on Sunday. We’re also getting together with Microsoft and IBM tomorrow to try to sort something out. Will keep you all posted.

Derrick, no luck on that tracking number yet?

Thanks all.
No, I haven’t had any luck at all tracking it down. I would be interested in seeing what upgrading your cards does though. I’m not sure I would be comfortable with only having one card zoned though on a production box. That’s kind of scary. MeanOldDBA
Sorry for digging up an old thread, but this applies to what we are experiencing. Did the firmware upgrades fix the issue? We are also getting a lot of errors, almost daily now. We are at the highest patch level for SQL Server 2000 as well as for Windows 2003. Apparently, there is a new update for our SAN and BIOS that needs to be installed. We are not experiencing any issues like this on our Windows 2000/SQL Server 2000 machines.
If most of the errors you see are also hardware-related, then upgrading the firmware should go some way towards stopping them. Also check the Event Viewer for more information. Satya SKJ
Moderator
mpex, we’re still on the 2000 Advanced Server rollback. I have no intention of moving us to 2003 until after the first service pack is released now. Are you on an EMC? What kind of server do you have? MeanOldDBA
We are running a Proliant DL740, and it appears that we also have an Emulex LP950 adapter. We plan on installing a firmware update in the near future, but I was wondering whether SQL_Girl has seen any improvement since that time. It does sound like we have similar hardware, Derrick, with the exception of EMC.
Hmmm – I’ve popped a mail off to SQL_girl to let her know about the activity here, and hopefully she will come back with some feedback. Panic, Chaos, Disorder … my work here is done –unknown
Hi there.

After struggling with this issue for months, and upgrading every piece of hardware, hammering the databases, monitoring, etc., we finally logged a call with Microsoft. Within a day they came up with a "solution".

See description below:

"
Checking the boot.ini file I see you are also running the /PAE switch. Have a look at the article below; I’ll send the link to the FTP site in a moment so you can download the fix. We have seen some corruption issues caused by this bug. I would get the Platform guys who deal with the OS to install this patch.

Data is corrupted when PAE is enabled on a Windows Server 2003-based WGID:583
ID: 834628.KB.EN-US CREATED: 2004-01-09 MODIFIED: 2004-05-18

Public | kbAudOEM

The information in this article applies to:

 – Microsoft Windows Server 2003, Web Edition
 – Microsoft Windows Server 2003, Standard Edition
 – Microsoft Windows Server 2003, Enterprise Edition

SYMPTOMS
========

When you run Microsoft SQL Server on a Microsoft Windows Server 2003-based computer, data that is saved to the SQL Server database may be corrupted.

When you view the transaction log file, one or more log entries in the file may be filled with a string of zeros. The string of zeros is exactly one record long and is not cache-aligned.

CAUSE
=====

This problem may occur if you use the Intel Physical Addressing Extension (PAE) specification to support more than 4 gigabytes (GB) of installed memory in your computer. This problem occurs when a Page Table Entry (PTE) is in the process of having its physical address changed, and only the low-order word has been filled in when another processor begins using this page. To prevent a PTE from being used before its complete physical address has been assigned, the hotfix that is described in the "Resolution" section inserts a memory barrier instruction at the end of the PTE address update sequence.

Memory corruption is not specific to SQL Server, and it may occur when you run other memory-intensive programs on a PAE-enabled system that has more than 4 GB of memory installed.

RESOLUTION
==========

Hotfix information
------------------
A supported hotfix is now available from Microsoft, but it is only intended to correct the problem that is described in this article. Only apply it to systems that are experiencing this specific problem. This hotfix may receive additional testing. Therefore, if you are not severely affected by this problem, Microsoft recommends that you wait for the next Windows Server 2003 service pack that contains this hotfix.
"

Well, it seems like this hotfix has solved the problem. Since applying it we have not had any data inconsistencies on this box (the last one occurred on 25 May 2004).

Our reference number is: SRQ040602600856

Hope this helps. Trust me, I know how frustrating this can be!! ;-)
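For anyone wanting to see whether their own Windows Server 2003 boxes are booting with the /PAE switch mentioned in the support reply, a rough check of the boot.ini contents could look like the sketch below. It is a minimal illustration only: the function name `entries_with_pae` and the sample file contents are made up for the example, and finding the switch says nothing about whether the hotfix is installed or whether the machine has more than 4 GB of memory.

```python
def entries_with_pae(boot_ini_text):
    """Return the [operating systems] entries whose boot switches include /PAE."""
    in_os_section = False
    found = []
    for raw in boot_ini_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track whether we are inside the [operating systems] section.
            in_os_section = line.lower() == "[operating systems]"
            continue
        if in_os_section and line:
            # Switches follow the quoted description; ignore anything inside the quotes.
            tail = line.rsplit('"', 1)[-1]
            switches = [tok.lower() for tok in tail.split() if tok.startswith("/")]
            if "/pae" in switches:
                found.append(line)
    return found

# Hypothetical boot.ini contents, for illustration only.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /PAE
"""

for entry in entries_with_pae(SAMPLE_BOOT_INI):
    print("PAE enabled:", entry)
```

On a real server the text would come from reading the boot.ini file on the system drive; the parsing is kept separate from file access so it can be tried against a copy of the file first.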
Well, that’s UGLY. Guess we have another good reason to follow Derrick’s advice, and hang off of Windows 2003 until SP1 !! Glad to hear you are past this pain, SQL_Girl Panic, Chaos, Disorder … my work here is done –unknown
Another reason to NEVER have production running on a system that hasn’t had at least the first service pack released. :) I can’t understand why people think I’m crazy when I say this. Oh well.

MeanOldDBA