Information Technology Disaster Recovery

Within every growing organization, there comes a time when the company moves beyond a single server or a small set of services to a larger scale of operation. As this occurs, it becomes necessary to plan for problems and outages, both within the company and at external vendors. Below is an example of a very generic Information Technology Disaster Recovery Plan.

As CIO/CTO, you are responsible for ensuring the confidentiality, integrity, and availability of the company’s data. That responsibility remains even when external third-party vendors are used. When they are, it is vital to choose ones that provide Software as a Service (SaaS), especially vendors with Service Level Agreements (SLAs) that exceed 99.99% availability. However, even with such agreements in place, it is important to recognize that service disruptions can occur.

The goal of Information Technology Disaster Recovery (ITDR) is to prepare for and recover from a disaster should an incident arise that could negatively impact business continuity. The following guidelines define the steps to take to maintain or restore technological infrastructure and applications in the case of a disaster. They apply to hosted systems that require ITDR procedures and to all members of a business technology team, including employees, employees of temporary employment agencies, and contractor personnel, regardless of geographic location.

Disaster Recovery terms defined

Fault tolerance is the property that enables a system to continue operating properly when one or more of its components fail.
Recovery Point Objective (RPO) is the maximum interval of time that may pass during a disruption before the quantity of data lost during that period exceeds the Business Continuity Plan’s maximum allowable threshold or “tolerance.”
Recovery Time Objective (RTO) is the duration of time, and the service level, within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in continuity. In other words, the RTO is the answer to the question: “How much time can it take to recover after notification of business process disruption?”
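As a rough illustration of how these two objectives are checked after an incident, the sketch below compares a measured data-loss window against an RPO and a measured recovery time against an RTO. The objective values, the timestamps, and the function names are all hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta

def elapsed(start: datetime, end: datetime) -> timedelta:
    """Duration between two points on the incident timeline."""
    return end - start

def meets_objective(actual: timedelta, objective: timedelta) -> bool:
    """True when a measured duration stays within its objective."""
    return actual <= objective

# Hypothetical objectives, for illustration only.
RPO = timedelta(hours=1)
RTO = timedelta(hours=4)

# Hypothetical incident timeline.
last_backup = datetime(2024, 1, 10, 8, 30)   # last good copy of the data
failure = datetime(2024, 1, 10, 9, 15)       # moment the system failed
notified = datetime(2024, 1, 10, 9, 20)      # disruption reported
restored = datetime(2024, 1, 10, 12, 45)     # business process back online

data_loss_window = elapsed(last_backup, failure)  # data written here is lost
recovery_time = elapsed(notified, restored)       # measured from notification

rpo_met = meets_objective(data_loss_window, RPO)
rto_met = meets_objective(recovery_time, RTO)
```

In this made-up timeline the data-loss window is 45 minutes and the recovery time is 3 hours 25 minutes, so both objectives would be met.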

ITDR Procedure

The process should begin with partnering with appropriate vendors. Whenever possible, choose cloud vendors that are able to provide SLAs that exceed 99.99%. The vendors selected must be able to meet high availability requirements, have a proven disaster recovery plan in place, and be able to provide fault tolerance. The vendor risk assessment process identifies the key vendors whose services would need to be restored first in the case of a disaster. This also requires that a proper vendor management plan and a vendor assessment policy be in place and followed rigorously.

In most cases, ITDR depends on the vendor being available, so response processes center on notifying the vendor and the impacted customers.

A vendor risk assessment is carried out for each vendor: the vendor is rated against an evaluation matrix and its SLAs are reviewed. The same assessment and evaluation is completed for potential vendors. This assessment includes the following:

Review of SOC Report
Review of vendor BC/DR Plan
Review of SLA offerings for RTO and RPO options
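An evaluation matrix can be as simple as a weighted score over the review items listed above. The sketch below is one possible shape, not a prescribed method: the weights, the 1 to 5 rating scale, and the names `WEIGHTS` and `vendor_score` are assumptions made for the example.

```python
# Hypothetical weights (percent); the criteria mirror the three review items.
WEIGHTS = {
    "soc_report": 40,    # review of SOC report
    "bcdr_plan": 30,     # review of vendor BC/DR plan
    "sla_rto_rpo": 30,   # review of SLA offerings for RTO and RPO
}

def vendor_score(ratings: dict[str, int]) -> float:
    """Weighted average of 1-5 ratings; raises if a criterion is unrated."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    total = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())

# Hypothetical ratings for one vendor.
ratings = {"soc_report": 5, "bcdr_plan": 3, "sla_rto_rpo": 4}
score = vendor_score(ratings)
```

Refusing to score a vendor with a missing rating is a deliberate choice here: an incomplete assessment should be visible rather than silently averaged away.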

Service Continuity and Disaster Recovery

Certain external hosts guarantee that their volumes will be available 99.99% of the time in a given monthly billing period. Volumes shall be deemed available unless: (i) the Service returns a Server Error Response to a valid request during two or more consecutive ninety (90) second intervals, or (ii) data stored on volumes becomes inaccessible to the applicable cloud server. If this guarantee is not met, an eligible credit can be calculated as a percentage of the service fee for the affected volume. These terms are not universal and vary from vendor to vendor and service to service, so each SLA should be considered during the evaluation process.
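To put a figure like 99.99% in perspective, it helps to convert it into a downtime budget. The sketch below does that arithmetic for a billing period and computes a credit as a percentage of the fee. The 30-day period, the credit rate, and the function names are assumptions for illustration; real credit schedules are defined by each vendor's SLA.

```python
MINUTES_PER_DAY = 24 * 60

def allowed_downtime_minutes(sla_percent: float, days_in_period: int = 30) -> float:
    """Downtime a vendor can accumulate in the period without breaching the SLA."""
    total_minutes = days_in_period * MINUTES_PER_DAY
    return total_minutes * (100 - sla_percent) / 100

def availability_percent(downtime_minutes: float, days_in_period: int = 30) -> float:
    """Availability actually achieved, given the downtime observed in the period."""
    total_minutes = days_in_period * MINUTES_PER_DAY
    return 100 * (total_minutes - downtime_minutes) / total_minutes

def service_credit(fee: float, credit_percent: float) -> float:
    """Credit as a percentage of the fee; the rate itself comes from the SLA."""
    return fee * credit_percent / 100

budget = allowed_downtime_minutes(99.99)   # budget for a 30-day period
achieved = availability_percent(10.0)      # a month with 10 minutes of downtime
breached = achieved < 99.99

# Hypothetical fee and credit rate, for illustration only.
credit = service_credit(200.0, 10.0) if breached else 0.0
```

For a 30-day period, a 99.99% guarantee leaves a budget of about 4.32 minutes of downtime, so a single ten-minute outage would already breach it.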

Some vendors with solid disaster recovery strategies design their services to deliver high availability and tolerate system or hardware failures with minimal impact. To help ensure business continuity, these vendors often maintain regional disaster recovery plans for their services in the U.S. and EU, along with documentation in the form of an annual run-book that outlines all the steps required to complete a datacenter failover.

Internal Information Technology groups should also be part of this plan and follow requirements similar to those placed upon third-party vendors. Geographically redundant centers should be tested and failed over (and back) as appropriate on a quarterly or semi-annual basis.

Failing back from a disaster

Once the primary site is restored to a working state, failing back to the primary is necessary. Depending on the vendor’s ITDR strategy, this typically means reversing the flow of data replication so that any data updates received while the primary site was down can be replicated back, without the loss of data.
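The idea of reversing replication can be sketched with a toy model. Here two in-memory dictionaries stand in for the primary and secondary data stores, and a list of writes stands in for whatever change log the secondary kept during the outage. All of these names are hypothetical, and a real failback relies on the replication tooling of the vendor or database in use.

```python
def fail_back(primary: dict, secondary: dict, outage_log: list[tuple[str, str]]) -> None:
    """Replay writes the secondary accepted during the outage onto the primary."""
    for key, value in outage_log:   # oldest first, so the latest write wins
        primary[key] = value
    # Only switch traffic back once both sites hold the same data.
    if primary != secondary:
        raise RuntimeError("sites differ after replay; do not fail back yet")

# Both sites start in sync, then the primary goes down.
primary = {"order-1": "paid"}
secondary = {"order-1": "paid"}

# Writes received by the secondary while the primary was unavailable.
outage_log = [("order-1", "shipped"), ("order-2", "paid")]
for key, value in outage_log:
    secondary[key] = value

fail_back(primary, secondary, outage_log)
```

The equality check before switching back is the part worth keeping in any real procedure: it is what backs the "without the loss of data" requirement.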

Testing your ITDR Plan

ITDR should be tested during annual Business Continuity Plan testing. Any additional testing of the ITDR plan is discretionary: it depends on the level of risk the vendor poses to your operations and on whether testing is agreed with the vendor in the Statement of Work.

Systems monitoring

Servers are monitored for memory capacity, CPU load, disk space, and logged-in users. This allows you to anticipate load issues that might impact your ability to serve your customers. Additional application-specific and API-specific monitoring should be enabled on an as-needed basis, and monitoring results (such as dashboards) should be widely available.
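A minimal sketch of threshold-based monitoring, using only the Python standard library, might look like the following. It covers disk space and CPU load only; memory capacity and logged-in users are left out because the standard library has no portable call for them. The metric names, the threshold values, and both function names are assumptions for the example, and a production setup would normally use a dedicated monitoring agent.

```python
import os
import shutil

def collect_metrics(path: str = "/") -> dict[str, float]:
    """Gather disk usage and, where the OS supports it, 1-minute CPU load per core."""
    metrics: dict[str, float] = {}
    usage = shutil.disk_usage(path)
    if usage.total:
        metrics["disk_used_percent"] = 100 * usage.used / usage.total
    if hasattr(os, "getloadavg"):          # not available on Windows
        try:
            one_minute_load = os.getloadavg()[0]
            metrics["cpu_load_per_core"] = one_minute_load / (os.cpu_count() or 1)
        except OSError:
            pass                           # load average could not be read
    return metrics

def find_alerts(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Names of the metrics that are present and above their threshold."""
    return [
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]

# Hypothetical thresholds, for illustration only.
THRESHOLDS = {"disk_used_percent": 90.0, "cpu_load_per_core": 1.5}

if __name__ == "__main__":
    for name in find_alerts(collect_metrics(), THRESHOLDS):
        print(f"ALERT: {name} is above its threshold")
```

Keeping collection separate from evaluation means the same `find_alerts` check can be fed from any source, including metrics exported by a vendor.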

Delegation of Authority

This procedure can be amended at any time by the Chief Information Officer (CIO). Any amendment that is purely stylistic, or that constitutes neither a material nor a substantive change to this procedure, may be approved and adopted by the Head of Risk or the Director, System Architecture without submitting the proposed amendment to the CIO.

Responsible department

Information technology will be responsible for the maintenance of this procedure.

Review date

This procedure will be reviewed and updated as necessary whenever there is a material, relevant change to our operations, structure, business or location, but at least annually.

Associated points to cross reference when dealing with ITDR

Business Continuity Plan
Vendor Risk Assessment Procedure
Information Security Plan
Information Security and Privacy Incident Response Policy & Plan
