Learn About SQL Server Disaster Recovery from Greg Robidoux of Edgewood Solutions
How do you identify the biggest areas of vulnerability for a SQL Server?
SQL Server used to be for departmental applications where not much support was given to these groups by the IT department, and therefore downtime or re-keying of data was an acceptable process. As SQL Server has moved into the enterprise-wide application space, it is extremely important that companies implement a comprehensive recovery plan for their SQL Servers.
Most companies have spent a lot of money on redundant hardware, power and room conditioning, but that is where things usually stop. I guess the feeling is that if the hardware doesn’t go down there is nothing else to worry about. If companies don’t plan to implement a full disaster recovery process, the one area that I would stress for them to address is their Change Management process. Besides hardware, failures due to application code and human error make up the next biggest segment for downtime. With a sound Change Management process, this area of downtime could just about be eliminated.
How do you suggest DBAs test their disaster recovery plan?
The threat of a total disaster, which would require a hot site to be utilized, is not that common. The real threat people should think about is isolated disasters and recovery from these incidents. These consist of dropped tables, major updates that cannot be reversed, bad code that was moved into production, etc. Since these types of things occur more frequently, DBAs that do not have any type of DR plan in place should begin with things they can fix themselves. Once these items are in place and documented, DBAs can work with other IT and business departments to institute a more comprehensive solution that crosses the enterprise.
Testing should occur as much as possible. The more testing that occurs and the more rehearsals that are performed, the better the plan will be when it needs to be called into action. Scenarios should be identified based on past problems that may have occurred and plans should be written to address these issues. As for equipment, it would be nice to simulate the production environment, but where budgets are an issue, a desktop PC could be put in place as a test machine. It is pretty inexpensive to implement a small test server, and if a company is dependent on the availability of their database servers, then the DBA has to sell management on the importance of test servers.
What tips and suggestions can you offer for DBAs for backing up their data in order to ensure that they can recover their data?
Along with executing backups, DBAs need to perform database restorations to ensure the backups are valid. I recently published an article called “Backup and Restore – Back to Basics with SQL LiteSpeed”, that is available on our web site www.edgewoodsolutions.com, about backup and restore procedures, as well as features that SQL LiteSpeed offers in terms of speed, disk savings, verification and encryption. Here are some general tips from the article:
Backups should occur to disk first for faster backups and restores
Use full recovery mode to eliminate data loss
Use a combination of full, differential and transaction backups
Periodically execute a full system restore to make sure the plan is successful
Do you recommend any specific hardware configuration (cluster, hot server, RAID, etc.) for SQL Servers in order to reduce potential vulnerabilities?
The more redundant the environment, the better the vulnerabilities are addressed. In most hardware configurations, technical staff has spent sufficient resources on redundant hard drives, controllers, power supplies, network cards, etc. In addition, they spend a lot of time configuring replication, clustering or standby servers, but the one aspect most companies fail to implement are the Change Management processes that should correspond to the hardware investment. Unfortunately, the greatest hardware cannot prevent someone with the appropriate privileges from changing a configuration or migrating new code into production. These processes are as important as the hardware environment from an availability perspective.
The ideal hardware configuration really depends on the type of environment and the need for availability. There are several of options that could be put in place for this ideal setup, but cost is usually the prohibiting factor. DBAs need to determine what is acceptable downtime, and the hardware configuration should be adapted to that need. The one thing I can say is ideal, is having a standardized hardware and SQL Server configuration across severs.
If the worse occurs, and the physical facility goes away. What is the best way to prepare for such an event? And how do you recover from such event?
In the scenario you have explained, it is necessary to address these items in the discovery and planning stages of the disaster recovery project. It is always best to plan for these types of failures rather than scrambling once the failure has occurred. Based on the criticality of the data, it may be necessary to have a hot site with employees in this different geographical location. Typically, if a site has been destroyed, then staff in the impacted area may not be able to address the business needs due to transportation issues or additional priorities. As such, having a separate facility with staff may be the best bet.
In a total site loss, all systems cannot be recovered simultaneously; therefore it is very important when planning your disaster recovery to have a priority list of servers and applications, so these can be addressed first. In addition, there are also other servers that may contend for the same resources, so for this type of scenario site wide planning and documentation are very important.