Developing a procedure to recover from a failure or disaster while maintaining business operations during the recovery process is a key part of technology planning for a business. Often this is part of an overarching Disaster Recovery/Business Continuity Plan (DR/BCP) doctrine. This master plan typically relies on key performance aspects of the backup and disaster recovery systems.
Disaster Recovery Plan
The two main metrics used to plan include:
- Recovery Point Objective (RPO) – The point in time to which data and systems can be restored
- Recovery Time Objective (RTO) – The amount of time required to complete the recovery procedure
These two metrics are determined by the capabilities of the backup systems, the criticality of the data or system, and the established retention policies. The different systems supporting business operations may differ in how quickly their data changes, how mission critical they are, how complex they are, or any combination of these factors. For example, a web server with static content that does not change frequently and is not mission critical may be scheduled to be backed up once a day and have a lower priority for restoration. On the other hand, an ERP system that processes continuous transactions considered critical to the business may be backed up every 15 minutes and be prioritized as one of the first systems to be brought back online. Each key component needs to be assessed separately, prioritized, and then added to the overall plan for a given scenario.
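As a rough illustration of assessing and prioritizing each component, the sketch below records a backup interval and a restore priority per system, using the two examples above. The `SystemPlan` class, the system names, and the priority scale (1 means restore first) are hypothetical and not taken from any particular backup product.

```python
from dataclasses import dataclass


@dataclass
class SystemPlan:
    name: str
    backup_interval_min: int  # how often a backup is taken
    restore_priority: int     # hypothetical scale: 1 = restore first

    @property
    def worst_case_data_loss_min(self) -> int:
        # A failure just before the next backup loses up to one full interval.
        return self.backup_interval_min


def restore_order(plans):
    """Return system names in the order they should be brought back online."""
    return [p.name for p in sorted(plans, key=lambda p: p.restore_priority)]


plans = [
    SystemPlan("static-web-server", backup_interval_min=24 * 60, restore_priority=5),
    SystemPlan("erp", backup_interval_min=15, restore_priority=1),
]

for p in plans:
    print(f"{p.name}: up to {p.worst_case_data_loss_min} min of data at risk")
print("Restore order:", restore_order(plans))
```

In this simplified model the backup interval is treated as the worst-case data loss, which is the quantity an RPO target would be compared against.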
These two metrics are key for typical file data restoration operations. They tie directly to the backup system's capabilities: how often a backup can be taken and how long a given backup is retained in storage to allow restoration. Most backup systems today are disk-based, which helps streamline operations and increase data retention capabilities over the old tape-based systems. The way these systems are sized and implemented should align with any established retention policies the business may have, whether they stem from internal policies and procedures or from external compliance requirements.
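To show how backup frequency and retention interact with sizing, here is a back-of-the-envelope sketch. It assumes one full backup plus a fixed amount of changed data per backup, and it ignores compression, deduplication, and periodic full backups, so real products will behave differently. The function names and all figures are hypothetical.

```python
def restore_points(retention_days: int, backups_per_day: int) -> int:
    """Number of restore points kept under a simple retention policy."""
    return retention_days * backups_per_day


def estimated_storage_gb(full_backup_gb, change_per_backup_gb,
                         retention_days, backups_per_day):
    """Rough capacity estimate: one full backup plus one increment per restore point."""
    points = restore_points(retention_days, backups_per_day)
    return full_backup_gb + change_per_backup_gb * points


# Hypothetical example: 500 GB system, 5 GB changed per daily backup, 30-day retention.
print(restore_points(30, 1), "restore points")
print(estimated_storage_gb(500, 5, 30, 1), "GB estimated")
```

An estimate like this is only a starting point for checking whether a proposed retention policy fits the storage capacity of the backup system.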
An important aspect of a backup system's capabilities, and a differentiating factor among the many vendor solutions available, is how it addresses RTO for a down system. Typical business continuity scenarios include the need to temporarily run mission-critical systems in a business continuity mode of operation while key hardware systems are repaired or replaced. In some circumstances these key systems may need to run in the temporary environment for a couple of weeks or longer. The inability to transition these systems to a functional mode and maintain business continuity during an emergency can have profound consequences for the business.
Some key requirements and capabilities that should be considered to align business operational requirements with the technological capabilities of the backup and recovery systems include:
- Corporate Retention Policies
- Backup System Storage Capacity
- Backup System Off-site Storage
- The ability to support virtualization to temporarily host a down system
- The typical amount of time required to complete a restore in file, application, volume, server, or site recovery scenarios
- The various recovery options available for a particular backup system
Once the capabilities of the backup system have been quantified, they set realistic operational expectations and inform the decision-making process for developing an effective DR/BCP. The recovery procedures for each expected scenario, such as file or full server restoration, should be tested to validate the plan and to train the IT staff to handle a disaster when it occurs.
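One simple way to record the outcome of such tests is to compare the restore time measured in each test against the RTO target for that scenario. The sketch below assumes the results are collected by hand; the function name, scenario labels, and timings are hypothetical.

```python
def scenarios_missing_rto(results):
    """Return the scenarios whose measured restore time exceeded the RTO target.

    results: iterable of (scenario, measured_minutes, rto_minutes) tuples.
    """
    return [scenario for scenario, measured, rto in results if measured > rto]


# Hypothetical test results, in minutes.
test_results = [
    ("file restore", 20, 60),
    ("full server restore", 300, 240),
]

print("Scenarios missing their RTO:", scenarios_missing_rto(test_results))
```

Any scenario that appears in the output signals that either the recovery procedure or the RTO target in the plan needs to be revisited.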