Business Continuity Tutorial

7.1 Operational Procedures

Hello and welcome to module 7 of the CompTIA Cloud Plus course offered by Simplilearn. In this module, we will discuss the points to consider for business continuity in the cloud. The next slide gives a glimpse of the topics covered in this module.

7.2 Objectives

By the end of this module, you will be able to:
  • Discuss disaster recovery and its objectives
  • Define the business strategy for disaster recovery
  • Explain the disaster recovery terminologies
  • Identify the solutions to meet the availability requirements

Let us begin with the first topic, that is, disaster recovery.

7.3 Disaster Recovery

The Disaster Recovery (DR) process uses proactive and reactive strategies to maintain business continuity. A proactive strategy means creating a plan to deal with an expected difficulty, whereas a reactive strategy means facing an unexpected difficulty and recovering from it. Companies invest a lot of time and money to plan, prepare, rehearse, document, train, and update their procedures and processes to deal with a disaster. This helps them minimize the impact of the disaster. Disaster recovery is a continuous process. Disaster recovery in the cloud is simpler and more cost-effective than with traditional infrastructure, because the business is charged based on service usage. The cloud enhances a company’s ability to scale its infrastructure on an as-needed basis. In a pay-as-you-go model, the cloud gives a business access to infrastructure that is dependable, reasonably priced, highly scalable, and fast. For a disaster recovery solution, the cloud can therefore lead to significant cost savings. Let us discuss the mission-critical components in the next slide.

7.4 Disaster Recovery (contd.)

A mission-critical system is a system whose failure leads to the failure of the entire business operation and affects productivity. If a system’s failure leaves the organization unable to operate and generate income, that system is considered mission critical. It is essential to focus on mission-critical applications and servers when planning for disaster recovery. These systems must be identified and backed by a proper disaster recovery method so that the organization does not lose revenue. We discussed mission-critical components in this slide; in the next slide, we will discuss disaster recovery objectives.

7.5 Disaster Recovery Objectives

Most organizations have two recovery objectives: RPO and RTO. Recovery Point Objective (RPO) is the maximum period, measured in time, over which data can be lost for a service due to a major incident. For example, an RPO of four hours means that the most recent backup or replica must never be more than four hours old. Recovery Time Objective (RTO) is the maximum acceptable amount of time between an outage and the restoration of the service. Consider redundancy to help meet the expected RTO and RPO. You can use a redundant system to provide a backup to the primary system in case of a failure. One of the best-known examples of a redundant system is RAID. In addition, geographical diversity should be considered when planning for a disaster that may affect a particular geographic region. Disasters come in many forms. Therefore, it is necessary to place disaster recovery servers in a location that is less prone to natural disasters. In the next slide, we will discuss typical strategies for disaster recovery.
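To make the two objectives concrete, here is a minimal sketch in Python that checks one incident against an RPO and an RTO. The function name, the timestamps, and the four-hour and two-hour targets are all hypothetical values chosen for illustration; they do not come from the course.

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup, outage_start, service_restored, rpo, rto):
    """Return (rpo_met, rto_met) for a single incident."""
    # Data written after the last backup and before the outage is lost.
    data_loss_window = outage_start - last_backup
    # The service is unavailable from the outage until it is restored.
    downtime = service_restored - outage_start
    return data_loss_window <= rpo, downtime <= rto

# Hypothetical incident: backup at 02:00, outage at 09:30, restored at 11:00.
result = meets_objectives(
    last_backup=datetime(2024, 1, 10, 2, 0),
    outage_start=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 11, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(result)  # (False, True)
```

In this example the RTO is met, because the service was down for 1.5 hours, but the RPO is missed, because 7.5 hours of data were at risk. More frequent backups or replication would be needed to close that gap.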

7.6 Business Strategy for Disaster Recovery

The disaster recovery strategy is flexible, as it depends on the business scenario and its impact on IT. The image on the slide illustrates the phases of disaster recovery, which are iterative in nature because technology and business needs change as business processes evolve. Let us understand each phase in detail. The first phase is risk evaluation. Risk evaluation identifies the potential risks present in the business process. It also supports risk mitigation through proactive strategies. The second phase is business impact analysis. This analysis assesses the impact on the business of all possible disruptions, both quantitatively and qualitatively. It also helps to identify recovery priorities and interdependencies. The third phase is DR plan and strategy. This phase involves designing the process for disaster recovery. It ensures that the DR design adheres to the RPO and RTO targets and the budget criteria, and that it complies with regulatory and auditing criteria. The fourth phase is DR drills. These drills are performed against disaster scenarios and deal with resource planning. They simulate the disaster scenarios and test whether recovery succeeds. They also deal with data management. The fifth phase is DR management, which administers and monitors the DR. In the next slide, we will discuss the terminologies used in disaster recovery.

7.7 Disaster Recovery Terminologies

The terminologies used in disaster recovery are as follows:
  • Redundancy: More than one connection, site, or device is used and maintained so that a single point of failure cannot block the flow of information. Redundancy keeps a backup ready for use when the primary source goes down.
  • High Availability (HA): A system or component that remains continuously operational, even when a physical system fails. It requires technical implementations and a backup server (redundancy). The actual physical resources on which virtual instances reside are examples of resources that are not highly available on their own.
  • Failover: The process of temporarily switching production from the production servers to a backup server facility, which can also be a redundant site.
  • Failback: A failover operation is always followed by a failback operation, which is the process of returning production to its original location.

Let us continue discussing the disaster recovery terminologies in the next slide.
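The relationship between failover and failback can be sketched as a small state change. The Python class below is only an illustration under simple assumptions: there are exactly two sites, named "primary" and "backup" here, and health is reported by some external monitor. The class and method names are hypothetical and do not refer to any real product.

```python
class ServicePair:
    """Tracks which of two sites currently serves production traffic."""

    def __init__(self):
        self.active = "primary"
        self.healthy = {"primary": True, "backup": True}

    def report_health(self, site, is_healthy):
        """Record a health report for a site, then re-evaluate the active site."""
        self.healthy[site] = is_healthy
        self._reconcile()

    def _reconcile(self):
        if (self.active == "primary"
                and not self.healthy["primary"]
                and self.healthy["backup"]):
            # Failover: production moves temporarily to the backup site.
            self.active = "backup"
        elif self.active == "backup" and self.healthy["primary"]:
            # Failback: production returns to its original location.
            self.active = "primary"

pair = ServicePair()
pair.report_health("primary", False)
print(pair.active)  # backup
pair.report_health("primary", True)
print(pair.active)  # primary
```

A real failback is usually a planned step that first copies back the data written at the backup site; the sketch leaves that out to keep the two transitions visible.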

7.8 Disaster Recovery Terminologies (contd.)

  • Hot Sites: A hot site is a duplicate of the active production environment, used for fault tolerance. It synchronizes with the production servers so that every update made in production is also applied at the hot site. Hot sites are used in critical businesses, such as banking and finance, where the data must be live and constantly up to date.
  • Cold Sites: A cold site, on the other hand, is not synchronized with the production servers. In the usual sense of the term, it provides space, power, and connectivity with little or no installed hardware or current data, which makes it the cheapest option and the slowest to bring into service. This recovery option suits companies that do not require live updating. Businesses such as shopping center chains and hotel chains usually prefer this type of site.
  • Warm Sites: A warm site sits between a hot site and a cold site. It has readily available hardware, but at a much smaller scale than a hot site. A warm site holds backups, but they are not fully up to date, so the most recent data must be restored before the site can take over.
  • Mean Time between Failures (MTBF): The average time between two successive failures of the production server machines. It indicates how reliable the solution delivered after the previous failure has been.
  • Mean Time to Recovery (MTTR): A device takes time to recover after a failure. The average time taken to recover is known as Mean Time to Recovery, or MTTR.

So far, we have understood the terminologies in disaster recovery. In the next slide, we will look into replication.
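MTBF and MTTR are often combined into a single availability figure. The sketch below assumes the common convention that MTBF counts only operating time between failures, and that steady-state availability is MTBF / (MTBF + MTTR). The function name and the sample numbers are hypothetical.

```python
def reliability_metrics(total_uptime_hours, total_repair_hours, failure_count):
    """Return (mtbf, mttr, availability) for an observation period."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute the means")
    mtbf = total_uptime_hours / failure_count   # mean operating time between failures
    mttr = total_repair_hours / failure_count   # mean time to recover from a failure
    availability = mtbf / (mtbf + mttr)         # fraction of time the service is up
    return mtbf, mttr, availability

# Hypothetical period: 990 hours up, 10 hours under repair, 5 failures.
mtbf, mttr, availability = reliability_metrics(990.0, 10.0, 5)
print(mtbf, mttr)
print(f"{availability:.2%}")
```

With these inputs the service fails on average every 198 hours, takes 2 hours to recover, and is available 99% of the time. Raising availability means either failing less often (higher MTBF) or recovering faster (lower MTTR).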

7.9 Replication

Replication in disaster recovery helps improve the fault tolerance and reliability of system data. The two types of replication are synchronous and asynchronous. Synchronous replication copies the data over the network to another device, so that multiple up-to-date copies exist. It writes data to both the primary and secondary sites at the same time, so both locations hold the same data. Synchronous replication is more expensive than asynchronous replication and can affect the performance of the application being replicated, because each write must wait for the secondary site. With asynchronous replication, there is a delay before the data is written to the secondary site. New data can be accepted at the primary site without waiting for it to be written to the secondary site. If the primary site fails before the data has been replicated, the data not yet written to the secondary site may be lost. Organizations sometimes also use site mirroring as a replication solution. Site mirroring refers to copying the entire data set to another site, which may reside in the same or a different location. However, the reliability of such a setup depends on the processes implemented in the organization. In the next slide, let us understand “Backup and Recovery”.
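The difference between the two replication types shows up most clearly when the primary site fails. The following Python sketch simulates both modes with in-memory dictionaries standing in for the two sites. It is a toy model under that assumption; the class and method names are hypothetical and no real storage product is implied.

```python
from collections import deque

class ReplicatedStore:
    """Toy model of a primary site replicating to a secondary site."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet copied (asynchronous mode only)

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # The write completes only after both sites hold the data.
            self.secondary[key] = value
        else:
            # The write completes at once; the copy happens later.
            self.pending.append((key, value))

    def replicate_pending(self):
        """Copy queued writes to the secondary site."""
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def fail_primary(self):
        """Simulate losing the primary; return keys whose latest write is lost."""
        lost = [key for key, _ in self.pending]
        self.pending.clear()
        self.primary.clear()
        return lost

sync_store = ReplicatedStore(synchronous=True)
sync_store.write("order-1", "paid")
print(sync_store.fail_primary())   # []

async_store = ReplicatedStore(synchronous=False)
async_store.write("order-1", "paid")
async_store.replicate_pending()
async_store.write("order-2", "paid")
print(async_store.fail_primary())  # ['order-2']
```

The asynchronous store loses "order-2" because the primary failed before that write was copied, which is the data-loss window described above. The synchronous store loses nothing, at the cost of waiting for the secondary on every write.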

7.10 Backup and Recovery

Backup is the process of copying and archiving data so that it can be restored later. The data is restored to either the original location or an alternate location, should the original data be lost, modified, or corrupted. Backups are also created to enable the recovery of data from an earlier point in time. There are three styles of backup, namely, full backup, differential backup, and incremental backup. A full system backup backs up the entire system, including the hard drive. It makes a copy of all the data and files on the drive in a single process. The benefit of a full backup is that an organization can take the backup from any day on which it was executed and restore the data from a single backup set. A differential backup backs up only the changes made since the last full backup was executed. An incremental backup backs up only those files that have changed since the last backup of any kind, which can be either a full backup or an incremental backup. This makes incremental backups faster and requires less space, although a restore then needs the last full backup plus every incremental backup taken after it. Archiving refers to systematically storing data so that it can be retrieved quickly. This is done to ensure the availability of the data. Offline storage, meaning storage that is isolated from the network, is also used as a backup solution in case of network failure. In the next slide, we will explore typical solutions to meet the availability requirements.
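The three backup styles differ only in which files they select. The Python sketch below shows that selection rule, assuming each file carries a last-modified time and that the times of the last full backup and the last backup of any kind are known. The function name, the file names, and the use of simple day numbers as timestamps are illustrative assumptions.

```python
def files_to_back_up(files, kind, last_full, last_backup):
    """Select file names for a backup run.

    files maps a file name to its last-modified time; last_full is the time
    of the last full backup and last_backup the time of the last backup of
    any kind. All times must be mutually comparable.
    """
    if kind == "full":
        return sorted(files)                      # everything, every time
    if kind == "differential":
        cutoff = last_full                        # changes since the last full backup
    elif kind == "incremental":
        cutoff = last_backup                      # changes since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, modified in files.items() if modified > cutoff)

# Hypothetical timeline in day numbers: full backup on day 2,
# incremental backup on day 4, and the next run on day 6.
files = {"a.txt": 1, "b.txt": 3, "c.txt": 5}
print(files_to_back_up(files, "full", last_full=2, last_backup=4))          # ['a.txt', 'b.txt', 'c.txt']
print(files_to_back_up(files, "differential", last_full=2, last_backup=4))  # ['b.txt', 'c.txt']
print(files_to_back_up(files, "incremental", last_full=2, last_backup=4))   # ['c.txt']
```

The incremental run copies the least data, while the differential run copies everything changed since day 2, which is why a differential restore needs only the full backup and the latest differential.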

7.11 Solutions to Meet the Availability Requirements

Following are the solutions to meet the availability requirements: Fault Tolerance: With fault tolerance, the production service stays live even if a physical server goes down. A problem in a production server does not affect the availability of the business service. An example is the VMware vSphere virtualization package. In case of a server failure, VMware provides continuous availability of servers and applications by creating a live shadow instance of the virtual machine, which is kept synchronized with the primary virtual machine. Fault tolerance is highly preferred in the banking and finance sectors, as they have to keep their production servers live under any circumstances. Clustering: Clustering connects multiple computers to provide redundancy and parallel processing. Clustering is of two types, namely, local clustering and geo clustering. In local clustering, the computers are connected over a fast local area network, and each node runs its own operating system while synchronizing with the others. In geo clustering, multiple computers are logically connected but physically located in different geographical locations. The typical connection medium for such a cluster is the Internet. Let us continue discussing the solutions to meet the availability requirements in the next slide.

7.12 Solutions to Meet the Availability Requirements (contd.)

Load Balancing: This is another way to address the problem of availability. Load balancers are programmed to detect whether servers are live, about to fail, or completely dead. If an event occurs that may interrupt the service, the load balancer takes responsibility for re-routing requests, either to the hot site or to the cold site, subject to the availability of the services installed there. Multipathing: This is another method for assuring the availability of services. It provides more than one path between a server and its network or storage, so that traffic continues over a surviving path if one path fails, and load can be spread across the available paths. An example is IP network multipathing on Solaris servers. This technique adds cost, as it requires redundant paths and components. Let us move on to the quiz questions to check your understanding of the topics covered in this module.
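The health-checking behaviour of a load balancer can be sketched in a few lines of Python. This is a simplified round-robin model, not how any particular product works: the server names are made up, and the health check is a caller-supplied function standing in for a real probe such as an HTTP or TCP check.

```python
class LoadBalancer:
    """Round-robin load balancer that skips servers failing a health check."""

    def __init__(self, servers, health_check):
        self.servers = list(servers)
        self.health_check = health_check  # callable: server name -> True if live
        self._next = 0

    def route(self):
        """Return the next live server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.health_check(server):
                return server
        raise RuntimeError("no live servers available")

# Hypothetical pool in which app2 is down.
status = {"app1": True, "app2": False, "app3": True}
balancer = LoadBalancer(status, status.get)
print([balancer.route() for _ in range(4)])  # ['app1', 'app3', 'app1', 'app3']
```

Requests keep flowing to the live servers while app2 is skipped. If app2 recovers and its status flips back to True, it rejoins the rotation on the next pass without any change to the balancer.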

7.14 Summary

Here is a quick recap of what was covered in the module:
  • Disaster recovery involves proactive and reactive strategies to maintain business continuity.
  • The phases of disaster recovery are risk evaluation, business impact analysis, DR plan and strategy, DR drills, and DR management.
  • The solutions to maintain the availability of services are load balancing, multipathing, clustering, and fault tolerance.

7.15 Thank You

We have completed the CompTIA Cloud Plus course. Thank you and happy learning.
