What steps are involved in creating a disaster recovery plan?
Based on my experience, it is best to start with a small plan and build on it, keeping future goals in mind and testing and implementing them on a predefined schedule. For example:
Determine the business needs for availability as well as the corresponding budget for the solution.
Determine disasters to prevent as well as recover from, including mistakes, hackers, hardware failure, and database corruption.
Plan the data collection process for disaster recovery.
Document the environment with a standardized set of documents relating to the hardware, software, application, and personnel.
Develop a key contact list and escalation levels.
Develop a media kit that has all the necessary software versions and service packs.
Standardize server configurations across servers.
Have backups readily available on disk (Look at SQL LiteSpeed for smaller and faster backups).
Have spare hardware or servers available.
Develop a communication plan for the recovery process.
Test the plan to ensure success.
Implement the solution.
Re-test and revise the plan as needs change.
Look at third-party tools that can assist with collecting server information (BindView for SQL Server and NetIQ ConfigurationManager for SQL Server).
Look at third-party tools for restoring lost data (Lumigent Log Explorer).
Lastly, keep documentation up to date.
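As a rough illustration of the "standardize server configurations" step, the sketch below compares each server's documented settings against a baseline and reports any differences. The server names, setting keys, and values are hypothetical placeholders, not output from any of the tools named above.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, str]

# Hypothetical baseline: the settings every server is expected to share.
BASELINE: Config = {
    "service_pack": "SP3",
    "collation": "SQL_Latin1_General_CP1_CI_AS",
    "backup_path": "D:\\Backups",
}

# Hypothetical inventory, e.g. typed in from the environment documentation.
SERVERS: Dict[str, Config] = {
    "SQLPROD01": dict(BASELINE),
    "SQLPROD02": {**BASELINE, "service_pack": "SP2"},
}


def find_drift(
    baseline: Config, servers: Dict[str, Config]
) -> Dict[str, Dict[str, Tuple[str, Optional[str]]]]:
    """Return, per server, each setting that differs as (expected, actual)."""
    drift = {}
    for name, config in servers.items():
        diffs = {
            key: (expected, config.get(key))
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if diffs:  # only report servers that deviate from the baseline
            drift[name] = diffs
    return drift


if __name__ == "__main__":
    for server, diffs in find_drift(BASELINE, SERVERS).items():
        for setting, (expected, actual) in diffs.items():
            print(f"{server}: {setting} is {actual!r}, expected {expected!r}")
```

A report like this is only as good as the documented values behind it, which is one more reason to keep the documentation current.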
How big a part does documentation play in a disaster recovery plan?
Documentation is one of the key components of a successful disaster recovery process. Without documentation it is very difficult to perform a planned recovery. What happens in most instances is that the recovery is handled in fire-fighting mode: several actions are taken to fix the problem at hand, without anyone knowing which one fixed it, and some of them may create subsequent problems. Too often systems documentation has not been a priority and is not in a usable format for most DBAs. Most of the time DBAs rely on personal experience, past emails, and fast Internet searches in order to recover.
Parts of a comprehensive DR document include the following:
Contacts and escalation list
Software versions and service packs applied
Server names and IP addresses
Priority order of servers and applications for recovery
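One way to keep the items listed above in a usable format is to store them as structured data rather than free text. The following is a minimal sketch under that assumption; the class names, field names, sample contacts, and sample servers are all hypothetical, and the IP addresses come from the range reserved for documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    name: str
    phone: str
    escalation_level: int  # 1 = first person to call


@dataclass
class Server:
    name: str
    ip_address: str
    software_version: str
    service_pack: str
    recovery_priority: int  # 1 = restore first


@dataclass
class DRDocument:
    contacts: List[Contact] = field(default_factory=list)
    servers: List[Server] = field(default_factory=list)

    def escalation_list(self) -> List[str]:
        """Contact names in the order they should be called."""
        ordered = sorted(self.contacts, key=lambda c: c.escalation_level)
        return [c.name for c in ordered]

    def recovery_order(self) -> List[str]:
        """Server names in the order they should be recovered."""
        ordered = sorted(self.servers, key=lambda s: s.recovery_priority)
        return [s.name for s in ordered]


# Hypothetical sample data for illustration only.
doc = DRDocument(
    contacts=[
        Contact("IT Manager", "555-0101", escalation_level=2),
        Contact("On-call DBA", "555-0100", escalation_level=1),
    ],
    servers=[
        Server("SQLREPORT01", "192.0.2.20", "example-version", "SP3", 2),
        Server("SQLPROD01", "192.0.2.10", "example-version", "SP3", 1),
    ],
)

if __name__ == "__main__":
    print("Call order:", ", ".join(doc.escalation_list()))
    print("Recovery order:", ", ".join(doc.recovery_order()))
```

Keeping the priority as an explicit field means the recovery order is decided ahead of time instead of during the outage.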