What is a Disaster Recovery Plan

A disaster recovery plan (a term sometimes used interchangeably with Business Continuity Plan) is a document that describes how an organization will survive a disaster. Disasters can be natural events like hurricanes and fires, deliberate acts like terrorist attacks, or any other event that may prevent the business from operating. A good disaster recovery plan covers everything from how to replace lost personnel to how to relocate operations if an entire building is lost. The plan addresses human loss, product loss, customer loss, technology hardware loss, and data loss. This article discusses only hardware loss and data loss, but it is important to think of these technology components in the larger context of the overall plan.

Why is a good disaster recovery plan important

When disaster strikes, time is critical, resources are stretched, and capabilities are limited. Planning ahead minimizes disruption by identifying everything that will be needed beforehand and ensuring that each element has redundancy. For example, a good disaster recovery plan identifies the individual responsible for restoring a database, as well as a backup in case that person is unavailable. The backup can then be properly trained to act when it becomes necessary. Once the crisis starts, it is too late to take these steps.

Elements of a Disaster Recovery Plan

A disaster recovery plan includes many elements that help us prepare for a crisis. The purpose of identifying them up front is to ensure that we have primary and backup staff trained for each task that must be performed in a crisis, and that we have reliable backups of all physical and technical resources (applications, databases, servers, networks, buildings, vehicles, machinery) required to stay in business, or get back in business, after a disaster. Some of the more critical elements of the plan follow. Since this is a database blog, the remainder of this article focuses on applications and databases.

Scenarios

We want to enumerate as many disaster scenarios as possible to ensure a robust plan. As we describe each scenario in detail, we will uncover blind spots and can then address them. Each scenario must describe what may happen, what it will look like, exactly what steps we will need to take to get back in business, and exactly who will take them. Examples of technology-related disasters:

  • Main data center is hit by extended power outage due to flooding damage to regional power grid
  • Infrastructure is hit with ransomware attack
  • Hurricane cuts connectivity to main data center
  • Human error causes loss of a large data table for mission-critical applications
  • Storage system firmware update causes corruption in production database
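To make blind spots visible, it can help to keep each scenario in a structured form and check it for missing parts. The sketch below is a minimal Python example of that idea; the `Scenario` fields and the `find_blind_spots` function are hypothetical names invented for illustration, not part of any tool, and the scenario details are made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    # Hypothetical structure: one record per disaster scenario in the plan.
    name: str
    what_happens: str = ""
    what_it_looks_like: str = ""
    recovery_steps: list[str] = field(default_factory=list)
    responsible: list[str] = field(default_factory=list)


def find_blind_spots(scenarios: list[Scenario]) -> dict[str, list[str]]:
    """Map each incomplete scenario's name to the parts still missing."""
    gaps: dict[str, list[str]] = {}
    for s in scenarios:
        checks = {
            "what happens": s.what_happens,
            "what it looks like": s.what_it_looks_like,
            "recovery steps": s.recovery_steps,
            "responsible people": s.responsible,
        }
        # An empty string or empty list means that part was never written down.
        missing = [part for part, value in checks.items() if not value]
        if missing:
            gaps[s.name] = missing
    return gaps


scenarios = [
    Scenario(
        name="Ransomware attack",
        what_happens="Servers and file shares are encrypted",
        what_it_looks_like="Applications fail and ransom notes appear",
        recovery_steps=["Isolate affected systems", "Restore from clean backups"],
        responsible=["Security lead", "On-call DBA"],
    ),
    Scenario(name="Storage firmware update corrupts production database"),
]

for name, missing in find_blind_spots(scenarios).items():
    print(f"{name}: missing {', '.join(missing)}")
```

Only the firmware scenario is reported, because it has a name but nothing else yet. A spreadsheet with one column per required part serves the same purpose: an empty cell is easy to spot.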

Inventory of applications (including dependencies on databases) 

Include informal or "nameless" applications, such as reporting or analytical tools used against a data mart or data warehouse. Collecting the following information for each application tells us exactly whom to call when disaster strikes, saving valuable time. Ensure that every known database is referenced here.

  • Application Owner
  • Recovery Time Objective
  • Recovery Point Objective
  • Responsible IT persons (primary and backup)
    • Application
    • Network
    • Cloud Infrastructure
    • Storage
    • Server
    • Database
    • Backup Maintenance
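The inventory is easier to keep honest if it can be checked mechanically. The following minimal Python sketch assumes a hypothetical inventory format (the `AppEntry` and `Contact` classes, the `backup_interval_hours` field, and the `audit` function are all illustrative, not from any product). It flags roles without both a primary and a backup person, known databases that no application references, and applications whose backup interval is longer than their RPO, since with periodic backups the worst-case data loss is roughly one backup interval.

```python
from dataclasses import dataclass, field

# The IT roles listed in the inventory above.
ROLES = ["Application", "Network", "Cloud Infrastructure", "Storage",
         "Server", "Database", "Backup Maintenance"]


@dataclass
class Contact:
    primary: str = ""
    backup: str = ""


@dataclass
class AppEntry:
    # Hypothetical field names for one row of the application inventory.
    name: str
    owner: str
    rto_hours: float              # Recovery Time Objective: tolerable downtime
    rpo_hours: float              # Recovery Point Objective: tolerable data loss
    backup_interval_hours: float  # assumption: how often a recoverable copy is taken
    databases: list[str] = field(default_factory=list)
    contacts: dict[str, Contact] = field(default_factory=dict)


def audit(inventory: list[AppEntry], known_databases: set[str]) -> list[str]:
    """Return a list of gaps found in the inventory."""
    problems = []
    referenced = set()
    for app in inventory:
        referenced.update(app.databases)
        # With periodic backups, worst-case data loss is roughly one interval,
        # so an interval longer than the RPO cannot meet it.
        if app.backup_interval_hours > app.rpo_hours:
            problems.append(f"{app.name}: backups every {app.backup_interval_hours}h "
                            f"exceed RPO of {app.rpo_hours}h")
        for role in ROLES:
            contact = app.contacts.get(role, Contact())
            if not contact.primary or not contact.backup:
                problems.append(f"{app.name}: missing primary or backup for {role}")
    # Every known database should be tied to at least one application.
    for db in sorted(known_databases - referenced):
        problems.append(f"Database {db} is not referenced by any application")
    return problems


contacts = {role: Contact("Alice", "Bob") for role in ROLES}
contacts["Database"] = Contact(primary="Carol")  # no backup DBA named yet
orders = AppEntry("Order entry", owner="Sales operations", rto_hours=4,
                  rpo_hours=1, backup_interval_hours=24,
                  databases=["ORDERS"], contacts=contacts)

for problem in audit([orders], known_databases={"ORDERS", "HR_ARCHIVE"}):
    print(problem)
```

For the sample entry this reports three gaps: the 24-hour backup interval cannot meet a 1-hour RPO, no backup DBA is named, and the HR_ARCHIVE database is not tied to any application.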

Test the Elements of a Disaster Recovery Plan

  • Test procedures for each application in the inventory
    • Identify systems to be used for a test restore, if applicable
      • Responsible party to provision these systems
    • Example pre-testing steps
      • Determine which applications/databases are in scope for this test
      • Gather data points to validate. This typically involves finding examples of both recently entered or modified data and old data, so that the full range of timeframes is represented and can be confirmed as still available after the recovery.
    • Example steps for conducting the test (some or all of these may apply)
      • Fail over to the standby database
      • Restore the database from backup
      • Point the applications to the test database
    • Example post-testing steps (some or all of these may apply)
      • Validate the data points
      • Switch back to the primary
      • Repoint the applications to the primary database
  • Testing schedule
    • When will tests be conducted?
      • Frequency: we recommend a minimum of twice per year
      • Point in the quarter, month, or week
      • Time of day
  • Test cases
    • Screens/reports to review
    • Data points to validate
  • Responsible parties
    • Who will be responsible for conducting the test?
    • Who will be responsible for validating the results?
  • Update the disaster recovery plan to reflect any lessons learned, staff changes, and new, changed, or decommissioned databases, applications, or hardware.
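The pre-testing and post-testing data point steps can be sketched in Python. This example assumes two in-memory sqlite3 databases standing in for the primary and the recovered copy, and a hypothetical `orders` table; against a real system the same queries would run through that database's own driver. Data points are picked by key before the test, so rows added to the primary after the capture do not cause false failures.

```python
import sqlite3

# sqlite3 in-memory databases stand in for the primary and the recovered copy;
# the "orders" table and its columns are hypothetical.


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, created TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def pick_data_points(conn):
    """Pre-testing step: record the newest and the oldest order."""
    newest = conn.execute(
        "SELECT id, amount FROM orders ORDER BY created DESC LIMIT 1").fetchone()
    oldest = conn.execute(
        "SELECT id, amount FROM orders ORDER BY created ASC LIMIT 1").fetchone()
    return {"newest order": newest, "oldest order": oldest}


def validate(conn, data_points):
    """Post-testing step: return the data points missing or changed after recovery."""
    failures = []
    for label, (order_id, amount) in data_points.items():
        row = conn.execute("SELECT id, amount FROM orders WHERE id = ?",
                           (order_id,)).fetchone()
        if row != (order_id, amount):
            failures.append(label)
    return failures


rows = [(1, 10.0, "2015-01-02"), (2, 25.5, "2023-06-30"), (3, 7.25, "2024-03-15")]
primary = make_db(rows)
points = pick_data_points(primary)

# Simulate a restore from a backup taken before the newest order was entered.
restored = make_db(rows[:2])
print(validate(restored, points))  # ['newest order']
```

Here the restored copy is missing the newest order, which is exactly the kind of gap that checking both recent and old data is meant to catch.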

Living Document

As with many documents critical to our businesses, this must be a living document. It contains names and contact information for key personnel who must be called in a crisis, so it is critical that it be updated regularly to reflect changes in staff and responsibilities. New applications and databases are added regularly as well, and these must also be kept current. Best practice is to update this document each time the tests are conducted.
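Keeping the document current can be partly automated. The sketch below is a hypothetical Python check, assuming the dates of the last test and the last plan update are recorded somewhere. It treats the recommended minimum of two tests per year as a gap of at most 183 days, an approximation chosen for illustration, and `plan_warnings` is an invented name.

```python
from datetime import date, timedelta

# Assumption: "a minimum of twice per year" expressed as a maximum gap in days.
MAX_DAYS_BETWEEN_TESTS = 183


def plan_warnings(last_test: date, last_plan_update: date, today: date) -> list[str]:
    """Flag a DR plan whose testing or upkeep has fallen behind."""
    warnings = []
    if today - last_test > timedelta(days=MAX_DAYS_BETWEEN_TESTS):
        warnings.append("DR test overdue")
    # Best practice: update the plan each time the tests are conducted.
    if last_plan_update < last_test:
        warnings.append("plan not updated since the last test")
    return warnings


print(plan_warnings(last_test=date(2024, 1, 15),
                    last_plan_update=date(2023, 11, 1),
                    today=date(2024, 9, 1)))
```

In this example both warnings fire: 230 days have passed since the last test, and the plan was last updated before that test.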

Database Disaster recovery tools

One key aspect of the recovery plan from a database perspective is the designation of a tool or tools to create standby databases that can be used if the primary database fails. Most database vendors provide tools for this. We will discuss the tools Oracle provides for this purpose, as well as a third-party tool (Dbvisit). Future articles will describe DR options for SQL Server and other databases.

Oracle Data Guard

Oracle provides a tool called Oracle Data Guard that can be used to configure and manage standby databases. Data Guard can create and manage local or remote standby databases, either physical or logical, and manages the transition from primary to standby and back. Data Guard can be administered with SQL statements at the SQL*Plus prompt or through the Data Guard broker's command-line interface (DGMGRL). Oracle's Enterprise Manager Cloud Control provides a graphical interface on top of Data Guard that simplifies its use.

Oracle Data Guard is included with Oracle Enterprise Edition. A more powerful option, Oracle Active Data Guard, is also available for Enterprise Edition and enables greater use of standby databases, for example by keeping a physical standby open read-only for queries and reports while redo is still being applied. Unlike basic Data Guard, Active Data Guard requires additional license fees.
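One way to connect a Data Guard standby back to the RPO in the application inventory is to monitor transport lag, which indicates how much redo generated on the primary has not yet reached the standby, roughly the data that would be lost if the primary failed at that moment. On a standby database, the V$DATAGUARD_STATS view reports rows named 'transport lag' and 'apply lag'. The Python sketch below assumes the lag VALUE is a string in the form '+DD HH:MI:SS' (verify this against your Oracle version) and leaves out the database connection so it runs on its own; `parse_lag` and `meets_rpo` are hypothetical helper names.

```python
import re
from datetime import timedelta
from typing import Optional

# Assumed VALUE format for the 'transport lag' / 'apply lag' rows of
# V$DATAGUARD_STATS, e.g. "+00 00:05:30" (days, then hours:minutes:seconds).
LAG_PATTERN = re.compile(r"^\+?(\d+) (\d{2}):(\d{2}):(\d{2})$")


def parse_lag(value: Optional[str]) -> Optional[timedelta]:
    """Convert a lag string to a timedelta; None means the lag is unknown."""
    if not value:
        return None
    match = LAG_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def meets_rpo(transport_lag: Optional[str], rpo: timedelta) -> bool:
    """True only if the lag is known and no larger than the RPO."""
    lag = parse_lag(transport_lag)
    return lag is not None and lag <= rpo


# The lag string would normally come from a query run on the standby, such as
#   SELECT value FROM v$dataguard_stats WHERE name = 'transport lag'
print(meets_rpo("+00 00:05:30", rpo=timedelta(hours=1)))  # True
print(meets_rpo("+01 02:00:00", rpo=timedelta(hours=1)))  # False
```

An unknown (empty) lag is treated as a failure rather than a pass, on the assumption that a standby whose lag cannot be computed should not be trusted to meet the RPO.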

Dbvisit

Both of the Oracle Data Guard options mentioned above require Oracle Enterprise Edition; Oracle does not offer Data Guard for Standard Edition. Fortunately, Dbvisit offers a solution. Dbvisit provides the functionality to create and manage standby databases for Oracle Standard Edition, and its graphical user interface makes creating and managing a DR solution for Standard Edition simple. The licensing cost is also much lower than the cost of upgrading to Oracle Enterprise Edition. If the DR capabilities of Oracle Data Guard are the only reason you need Enterprise Edition, Dbvisit is a good option.

These are the Elements Of A Disaster Recovery Plan

In summary, a good DR plan should describe everything an organization must do to recover from an emergency: the who, what, when, where, and how of the entire process, from the moment an emergency occurs until the organization has fully recovered.

If you would like to discuss creating and implementing a Disaster Recovery Plan, especially the database-related components of your plan, give us a call and we can talk about the best approach.

Also, please leave a comment with your thoughts about disaster recovery planning. Let me know if you include things I didn't mention, or share stories about how a plan helped in a disaster, or how the absence of a plan hurt 🙁

And if you like this article, please share it with your colleagues and subscribe to our blog.