You’ve spent thousands of dollars on all that cool technology: clustering, redundant controllers, redundant disks, redundant power supplies, redundant NICs, multiple network drops, fancy tape backup devices, and the latest and greatest tape technology. You’re all set. There’s no way you’re going to have downtime.
But one day something does go wrong. Is it the disks? No way, you’ve implemented RAID with hot-swappable hard drives. Is it the server? Can’t be, you’ve got your servers clustered, and any downtime due to failover would be so small that hardly anyone would even notice. Well, if it’s not your stuff, it must be the network. Those guys are always making changes and not letting you know until there’s a problem. No, you checked with them: no changes. What’s left? It must be the application. Once again the application group rolled out a new version and no one informed the DBAs. Foiled again: the application team says no changes went out.
A little investigation on the database shows that the create dates on a few stored procedures have yesterday’s date. They could have been recompiled, but you didn’t do it. You take a look at one of the procedures and, lo and behold, someone was kind enough to actually put in comments. It was Dave, one of the DBAs in your group, and guess what: he’s on vacation this week. It turns out he created a scheduled job to run this past Sunday and forgot to let you know. He changed a bunch of procedures for a load that occurs each month. You don’t know much about the load except that it runs and he’s responsible for it. You have no idea why the change was made, or what effect undoing it might have. To make things worse, you try to find the old code, but you can’t find it anywhere.
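The investigation above amounts to a quick query against the system tables. A minimal sketch in T-SQL, assuming a SQL Server 2000-era instance where sysobjects carries a crdate column (the database name here is hypothetical; on newer versions you would look at sys.procedures and its create_date/modify_date columns instead):

```sql
-- Find stored procedures created (or dropped and recreated, which
-- resets crdate) within the last two days -- the fingerprints a job
-- like Dave's would leave behind.
USE CriticalDB   -- hypothetical database name
GO
SELECT name, crdate
FROM   sysobjects
WHERE  type = 'P'                              -- stored procedures only
  AND  crdate >= DATEADD(day, -2, GETDATE())
ORDER BY crdate DESC
GO
```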
The heat starts to rise as the phones keep ringing with people wondering why the database is not responding. You start to panic, trying to figure out what to do. This is a pretty critical database and a lot of people in your company need it to do their job. The last time something like this happened, you caught hell from your boss because her boss was breathing down her neck. Now you wish you were the one on vacation instead of Dave. You take a deep breath and think things through.
You remember Dave sent you an email about the load around this time last year, when he went on vacation. You quickly do a search and find it. The email gives you steps for undoing the load and explains what consequences, if any, you may face by undoing things. You go to a previous night’s backup, do a database restore, and script out the procedures. You’re taking a gamble that you’ve got the right procedures, but it’s your only course of action. After five or six hours of user complaints and a lot of sweating, you’ve got the database back to normal again, or at least you think so. You say to yourself, “I need a vacation, and Dave’s dead meat when he gets back.”
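The restore-and-script step can be done without touching the live database by restoring the backup side by side under a new name. A sketch, with entirely hypothetical file paths, logical file names, and procedure name; adjust the WITH MOVE clauses to your own layout:

```sql
-- Restore last night's full backup under a different name so the
-- production database stays untouched while you dig out the old code.
RESTORE DATABASE CriticalDB_Restore
FROM DISK = 'D:\Backups\CriticalDB_full.bak'
WITH MOVE 'CriticalDB_Data' TO 'D:\Data\CriticalDB_Restore.mdf',
     MOVE 'CriticalDB_Log'  TO 'D:\Data\CriticalDB_Restore.ldf'
GO

-- Script out the pre-change text of a suspect procedure.
USE CriticalDB_Restore
GO
EXEC sp_helptext 'usp_MonthlyLoad'   -- hypothetical procedure name
GO
```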
Have you ever found yourself in this situation? Something gets changed and you don’t know about it until there’s a problem. Or someone makes a change and says, “Don’t worry, it’s a small change. No one will even notice.” I think we have all found ourselves in these situations. The only way to stop things like this completely is to bolt down your servers and make the users sign their lives away if they want to use your server. Not too likely, but it’ll work if you can get it implemented.
I think we need to look for a solution in the middle of the road: something that works for you as a DBA and something that works for the rest of the world. People just don’t understand how stressful your job really is. You’re at the bottom of the totem pole; well, maybe the NT Engineers are the bottom, but still, you’re pretty close. All kinds of changes occur outside of your control, and the only time you’re aware of them is when something goes wrong.
Well, you might not be able to fix the things outside of your control, but you are the DBA, the master of the databases. Implementing change control company-wide takes a lot of effort, coordination, and buy-in from a lot of people, but that doesn’t mean you can’t start with your own area and put control mechanisms in place for the databases. So where do you begin?
For changes to take effect and have a quick payback, implementing things slowly and steadily is the way to go. Identify a critical database, kind of like the one Dave screwed up, and start to create guidelines around how you would handle changes for that database. If you try to make a master plan that solves every problem, you will never get it implemented. Create a small project plan, or a task list of things you need to accomplish, and take care of one item at a time. If something doesn’t work, adjust the plan accordingly.
Evaluate and Tighten Database Security
There are probably other reasons why you want tight database security, but have you ever thought about it from a change control perspective? The greatest process in the world won’t prevent people from sneaking things into production. You might be able to catch a change and find out who made it, as with DBA Dave above, but at that point it’s too late. Take a look at your security settings and determine the ways people might be able to make changes to your database that they shouldn’t. Start with the server roles and see who has access that really shouldn’t. Then take a look at the database roles. If you are using the standard database roles, see how you can create customized roles with limited access. Usually when things don’t work due to security access, the fix is to grant too much access. Now it’s your turn to reverse the tide.
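A starting point for that audit is to list who holds the powerful built-in roles. A minimal sketch using SQL Server's standard role-membership procedures (the database name is hypothetical):

```sql
-- Who holds fixed server roles such as sysadmin? Anyone listed can
-- change anything on the server, change-control process or not.
EXEC sp_helpsrvrolemember 'sysadmin'
GO

USE CriticalDB   -- hypothetical database name
GO
-- Who sits in the database roles that allow DDL changes?
-- Members of db_owner or db_ddladmin can alter procedures at will.
EXEC sp_helprolemember 'db_owner'
EXEC sp_helprolemember 'db_ddladmin'
GO
```

Anyone who turns up in these lists without a clear need is a candidate for one of the customized, limited-access roles mentioned above.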
Establish a Release Schedule
Instead of making production changes on the fly, make changes on a periodic, controlled basis. If people know that changes are made on a schedule you set, they can adjust their own schedules accordingly. It doesn’t mean you can never put changes into production outside of this schedule; it just means that somebody had better have a really good reason why they need something immediately instead of waiting for the scheduled release. This may sound like it will slow down the development process, and you know your users need those changes right away, but a more controlled approach will actually benefit your end users as well. The users will know what’s coming and when it’s coming, instead of finding out about a change after something that used to work no longer works.