How 11GB of SQL Server Data Mysteriously Disappeared, and How I Ended Up Better Off than Before
The previous weekend, I had spent most of it working, upgrading a SQL Server 7.0 cluster to a SQL Server 2000 cluster. Fortunately, the rather complex upgrade went without a hitch. What made the upgrade complex was that the cluster held a 31GB ERP database that was used on a 24/7 basis, and unplanned down-time was not an option. After the conversion was over and fully tested, I breathed a sigh of relief.
During the following week, I closely monitored the cluster, looking for any sort of unexpected issues, but none developed. In fact, before I went home on Friday for a well-deserved break (after having worked the previous weekend), I checked out everything again, and the 31GB database was humming along just fine. The conversion was a total success.
Or was it? After a refreshing weekend, I came back to work on Monday. One of my daily rituals when I first get in is to check out the cluster to see how it is running, and if all maintenance jobs, and so forth, had successfully run over the weekend. The first thing I checked was to see if the nightly full backups had completed successfully. At first glance, they appeared to have run correctly. But I noticed one small difference. The backup file was on 20GB, not the normal 31GB.
I didn’t get too excited at first, I just assumed there was some problem with the backup. After some checking, I couldn’t see any problems. So next, I used Enterprise Manager to check the size of the data in the database. It said 20GB, not 31GB as I expected it to say. At this point, my life became just a little bit more stressful.
It was still early in the day, and nobody had called me to tell me they couldn’t access the database. I thought this was strange, as if the database had become corrupted and lost 11GB of data, then I should be receiving a lot of phone calls. So my next step was to contact the help desk, asking them about user complaints. There weren’t any. And that was mighty strange.
Next, I contacted my boss, letting him know of potential problems with the database, then I began investigating more. Still no calls from users. In fact, there were many users connected to the server, so apparently the missing data wasn’t causing a problem. This was just plain weird.
After more careful investigation, I still couldn’t find anything strange. The logs all looked OK, all the jobs had run, everything seemed normal, but something obviously wasn’t. The only thing I found that I guessed might have something to do with the missing 11GB of data was that for the first time since the upgrade from SQL Server 7.0 to SQL Server 2000, just over a week ago, was that the database reorg job had run. But I couldn’t see how this would cause any problems as we ran the reorg every weekend ever since the database went into production over two years ago.
One of my first thoughts was that SQL Server 2000 might do reorgs differently than under SQL Server 7.0, but I researched this in Books Online and at Microsoft’s website. There was no indication of any change like this, besides, if there was, I couldn’t imagine that 11GB of data would just disappear.
Being at a loss, I decided to call Microsoft’s Technical Support line to see if they could offer any light on the subject. The first support person didn’t have the slightest idea how this would be possible and had to bounce me up to the next tier of support. This person also didn’t have any experience with such an issue, but asked me to run a number of DBCC commands on the current database (and an older copy that I had not done a reorg on) to find out more about the data’s structure before and after the reorg. I did this and sent him the results.
While I was waiting for the results, I performed an experiment. I had another copy of this 31GB database (used for testing) that had also been converted from 7.0 to 2000, but I had yet to run a reorg on it. So I decided to reorg it to see what would happen. Once it completed the next day, 11GB was also missing from it. So at least the problem appeared to be related to the reorg, and I could duplicate the issue.
That same day, I heard back from the Microsoft support representative. While he said he couldn’t provide an exact cause of this issue, he had an educated guess, which seemed reasonable. He said that it appeared that when the database was running under SQL Server 7.0, that for some reason when a data page was emptied (such as when data is deleted), that the data pages were not released as they should. In other words, as data pages became empty, SQL Server didn’t know this and did not reuse them. In fact, SQL Server still considered these empty data pages as part of the database.
If this was true, and I have no reason to doubt it, then when we upgraded to SQL Server 2000 and then ran a reorg, SQL Server 2000 was smart enough to find the empty data pages and get rid of them. In doing so, it freed up 11GB of empty data pages. As you might imagine, by getting rid of 11GB of empty space, disk I/O performance was greatly boosted, and overall performance of the server went way up, which provided a very happy ending to a story that could have been tragic.
If you are running SQL Server 7.0, could this be happening to you? Possibly. Since we were unable to exactly figure out why the data pages under SQL Server 7.0 weren’t being released as they should have, it is hard to know how common this problem is. But in our experimentation, we did find a way for you to figure this out for yourself. Here’s what you have to do. Be sure you perform these steps when the database is not too busy, as the following steps can use up a lot of resources when running, and run for a long time, especially if your have a large database.
If you haven’t done so recently, reorg your SQL Server 7.0 database.
Use Enterprise Manager to find out how much data is in your database. This is the amount of data in the database, not the total size of the database (which includes empty space).
Next, run this command: DBCC UPDATEUSAGE (‘databasename’), where databasename is the name of the database. The official use for this command is to report and correct inaccuracies in the sysindexes table, which may result in incorrect space usage reports. Apparently, it can also fix the problem of unreclaimed data pages.
Reorg the database.
Again, check the size of the data in your database. If the size is the same as before (or very close), then you are not affected by this issue. But if the data in the database is now smaller, then you have been affected by this problem, and it is now corrected.
Also, if your data did shrink, you may want to shrink the database file in order to reclaim some unnecessary free space in the database file.
While this experience was somewhat stressful, in the end, things worked out well, and now my database is faster than ever. If you have a large and slow SQL Server 7.0 database, you may want to consider taking the steps I have suggested above. They can’t hurt anything to run (although they do eat up server resources when running), and you may be able to reclaim wasted space, which can breath new life into its performance.