Every business needs proper IT disaster recovery planning as part of its overall business continuity plan. You might think you don’t need to plan for disaster recovery because disasters like floods and fires are so uncommon. But the top reason for IT disasters and significant unplanned downtime is human error: clicking a malicious link, accidentally deleting critical data, misconfiguring systems in ways that cause failures, and so on. Other top causes of IT disasters include commonplace occurrences like software bugs, hardware failures, power outages, and cyberattacks (including insider threats). Natural disasters are at the bottom of the list.
If you don’t have an IT disaster recovery plan, your recovery attempts will be chaotic and take much longer than if your team follows a well-considered, up-to-date plan that they are at least somewhat familiar with. Just a few hours of downtime impacts employee productivity and customer experience, leading to lost revenue and reputational damage. Days of downtime can threaten an SMB’s ongoing viability.
What does proper IT disaster recovery planning look like?
An IT disaster recovery plan describes your IT disaster recovery strategy and key goals and lists your IT disaster recovery procedures. Its purpose is to help your company take appropriate action to quickly recover IT operations and protect data. That means bringing critical systems back online in the right order, taking steps to reduce or eliminate data loss, and minimizing downtime for physical server hardware, user workstations, databases, etc.
A typical SMB IT disaster recovery plan has at least these core elements:
- Description of recovery goals, such as the recovery time objective (RTO), which is the maximum acceptable time to restore each critical IT system, and the recovery point objective (RPO), which is the maximum acceptable data loss, measured as a span of time (for example, the last hour of transactions). You also need to define what constitutes a disaster and when to invoke your IT disaster recovery plan.
- Recovery team assignments. Who is responsible for carrying out the IT disaster recovery plan, and who are their backups if they are unavailable? How will they be notified when a disaster occurs? Where will they work, and how will they communicate?
- IT system recovery procedures. These are your emergency response steps, including what systems will be recovered first and in what order.
- Incident response procedures to contain the damage from ransomware and other malware (e.g., isolating infected systems).
- Data backup/recovery procedures. Exactly how and where is each key data resource backed up, and what are the corresponding recovery steps?
- IT asset inventory. This is a detailed list of hardware and software/application assets, including their criticality rating, where they reside (on-premises, cloud-based), and whether they are owned, leased, purchased as-a-service, etc. You can’t recover what you don’t know exists.
- Testing procedures. Regular testing and updating of your IT disaster recovery plan is key to efficient, successful recovery. This includes tabletop exercises and read-throughs of the plan, not just full-on simulations. A best practice is to review your IT disaster recovery plan quarterly or twice annually to keep it current with your ever-changing IT environment, staffing, and business goals.
- Disaster recovery sites. If your IT disaster recovery plan includes failing over to a remote hot site, or recovering from a mobile site, you should include those details in a separate section of your plan. For example: What replicas and backups are you maintaining via the hot site? What software, services, vendors, etc. are involved in maintaining and operating the hot site setup?
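To make the asset inventory and recovery-order elements more concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the asset names, the 1-to-3 criticality scale, the RTO and RPO figures, and the dependency links are hypothetical. The ordering rule (dependencies first, then criticality, then shortest RTO) is one reasonable choice, not a prescribed method.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Asset:
    name: str
    criticality: int                 # hypothetical scale: 1 = most critical
    location: str                    # e.g. "on-premises" or "cloud"
    rto_hours: float                 # recovery time objective
    rpo_hours: float                 # recovery point objective
    depends_on: tuple[str, ...] = ()  # assets that must be online first


def recovery_order(assets):
    """Return asset names in an order that respects dependencies.

    Among assets whose dependencies are already recovered, the most
    critical one (then the one with the shortest RTO) is picked first.
    Every dependency must itself appear in the inventory.
    """
    by_name = {a.name: a for a in assets}
    sorter = TopologicalSorter({a.name: a.depends_on for a in assets})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    def priority(name):
        asset = by_name[name]
        return (asset.criticality, asset.rto_hours, asset.name)

    order, ready = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())  # newly unblocked assets
        ready.sort(key=priority)
        name = ready.pop(0)
        order.append(name)
        sorter.done(name)
    return order


# Hypothetical inventory for a small business.
INVENTORY = [
    Asset("file-share", 3, "on-premises", 24, 24, depends_on=("network",)),
    Asset("website", 2, "cloud", 4, 1, depends_on=("database",)),
    Asset("erp", 1, "cloud", 4, 1, depends_on=("database",)),
    Asset("database", 1, "on-premises", 2, 0.25, depends_on=("network",)),
    Asset("network", 1, "on-premises", 1, 0),
]

print(recovery_order(INVENTORY))
# ['network', 'database', 'erp', 'website', 'file-share']
```

Keeping the dependencies alongside the inventory means the recovery order can be regenerated whenever an asset is added or its criticality changes, rather than being maintained by hand.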
How do you construct an IT disaster recovery plan?
Building an IT disaster recovery plan takes careful research and discussion among all stakeholders. You should holistically understand your business needs, as well as your risk profile.
Therefore, conducting a risk assessment to identify top threats your organization faces, including specific threats to critical assets like your intellectual property, is a key initial step. You must also interview people who work with essential systems to understand what those systems do and what inputs and connections they depend on.
As you develop a holistic, prioritized understanding of IT operations, you can connect with technical and business leaders to discuss the impacts of interruptions to various critical systems. This input serves as the basis for key recovery metrics like your RTO and RPO.
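Once objectives are agreed, it can help to compare them against what your current setup actually delivers. The following is a rough Python sketch under stated assumptions: the system name, objectives, backup interval, and restore time are hypothetical, and worst-case data loss is approximated as the time between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    system: str
    rto: timedelta  # maximum acceptable time to restore the system
    rpo: timedelta  # maximum acceptable data loss, measured in time


def find_gaps(objective, backup_interval, measured_restore_time):
    """Return human-readable gaps between the objectives and reality.

    Assumption: worst-case data loss is roughly the backup interval,
    and measured_restore_time comes from an actual recovery test.
    """
    gaps = []
    if backup_interval > objective.rpo:
        gaps.append(
            f"{objective.system}: backups every {backup_interval} can lose "
            f"more data than the RPO of {objective.rpo} allows"
        )
    if measured_restore_time > objective.rto:
        gaps.append(
            f"{objective.system}: measured restore time "
            f"{measured_restore_time} exceeds the RTO of {objective.rto}"
        )
    return gaps


# Hypothetical example: nightly backups against a one-hour RPO.
erp = RecoveryObjective("erp", rto=timedelta(hours=4), rpo=timedelta(hours=1))
gaps = find_gaps(
    erp,
    backup_interval=timedelta(hours=24),
    measured_restore_time=timedelta(hours=3),
)
for gap in gaps:
    print(gap)
```

In this example the restore time is within the RTO, but nightly backups cannot satisfy a one-hour RPO, which points toward more frequent backups or replication for that system.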
Finally, based on your analysis of assets, risks to those assets, and your recovery objectives, you can plan a disaster recovery configuration. For example, do you need a cloud-based/hosted hot failover site? What backup and replication scenarios are most critical (e.g., the backup plan for your cloud-based ERP solution)? What software, services, vendors, etc. are required to build the overall setup you need?
A best practice is to submit several options, each with a price tag, to management so they can strike the optimal balance between recovery cost and disaster risk. Once management approves a plan, it’s time to communicate the plan around your organization and initiate training as needed.
The Database Component of IT Disaster Recovery Planning
Among the most important assets that your DR plan will protect are your databases, and these require special consideration in your disaster recovery planning and testing. Databases may have their own DR configuration, such as hot or cold standby databases maintained with Oracle Data Guard or SQL Server log shipping, or standby databases managed by Dbvisit on either of those database platforms. In addition to testing switchover/failover to standby databases, proper database backup monitoring, protection, and testing must be part of your DR plan. A disaster recovery plan will fail if the backups it relies on are compromised or incomplete when the time comes to recover from them.
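As one illustration of backup monitoring, the Python sketch below checks that a backup file exists, is not empty, is recent enough, and still matches a checksum recorded when the backup was taken. It is a generic file-level check, not a feature of Oracle, SQL Server, or Dbvisit, and the function names and parameters are hypothetical. It also does not replace a real test restore, which is the only way to confirm a backup is usable.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path, expected_sha256, max_age, now=None):
    """Return a list of problems found with one backup file.

    An empty list means every check passed. expected_sha256 is assumed
    to have been recorded at backup time and stored separately.
    """
    if not path.is_file():
        return [f"{path}: backup file is missing"]

    problems = []
    stat = path.stat()
    if stat.st_size == 0:
        problems.append(f"{path}: backup file is empty")

    now = now or datetime.now(timezone.utc)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    if now - modified > max_age:
        problems.append(f"{path}: backup is older than {max_age}")

    if sha256_of(path) != expected_sha256:
        problems.append(f"{path}: checksum mismatch, file may be corrupt")
    return problems
```

A scheduled job could call `check_backup` for each backup listed in the plan, for example with `max_age=timedelta(hours=24)` for nightly backups, and alert the recovery team whenever the returned list is not empty.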
Next steps for IT Disaster Recovery Planning
Many companies cannot afford significant unplanned downtime and strive for continuous availability of critical systems like their website or ERP environment, including associated databases.
Buda Consulting understands how companies rely on databases and the importance of high availability to support quick database recovery and minimize the impacts of outages, interruptions, cyberattacks, natural disasters, etc.
Our Reliability Review service will help you evaluate your current risk scenario and determine the level of protection your business needs. Contact us to schedule a time to discuss your business needs and goals.