Chapter 9 Business Continuity Planning and Disaster Recovery. BCP and DR (770).
An organization is dependent on resources, personnel and tasks performed on a daily basis to be healthy and profitable. Loss or disruption of these resources can be detrimental, causing great damage or even complete destruction of the business.
Businesses MUST have a plan to deal with unforeseen events.
Business Continuity Planning is a broad approach to ensure that a business can function in the event of disruption of normal data processing operations.
Disaster Recovery Planning is a subset of BCP. The goal of a DRP is to minimize the effects of a disaster and take necessary steps to ensure that the resources, personnel and business processes are able to resume operation in a timely manner.
The objectives of BCP are the following
The goal of a BCP is ultimately to help a company resume operation of business functions as soon as possible after a damaging event. If you think about it, a BCP is really part of the larger “security” program. As such, a BCP should be part of the security policy*
(ISC)² states 5 Phases in BCP. We will outline them now, and detail them later.
5. Develop the contingency plan – document the results of the BIA findings and recovery strategies in a written plan
Project Management and Initialization:
In this step
Phase 2 of the BCP steps is to conduct a Business Impact Analysis. In short, this step is to outline what procedures and resources the company depends on, how important each process is and how long the business can do without each resource. The formalized steps are covered next.
5. Calculate how long these functions can survive without these resources
6. Identify vulnerabilities and threats to these processes
7. Calculate the risk for each business process
8. Document findings and report them to management
In this step the BCP committee needs to identify the types of people that will be part of the BIA gathering sessions.
These people should represent the different departments that make up the business.
After determining the general roles, we need to find the actual employees that fill these roles, so we can interview them.
In this phase the BCP team must create data gathering techniques to use when interviewing and gathering other information to support the BCP objectives. (surveys, questionnaires etc)
Based on the information gathered by the interviews and the data gathering techniques, we need to now identify which business processes and functions are critical for the successful operation of the business.
Once we know what the important processes are, we need to determine the resources* that these processes depend upon. These resources can be all kinds of things such as servers, data, people, buildings etc! (not just IT related things)
Now we need to prioritize and calculate the maximum time we can survive without the business processes identified in Step 3. This maximum time is called the “Maximum Tolerable Downtime (MTD)*”. Here are some common MTD classifications.
Keep in mind when prioritizing things, we have to use quantitative and qualitative analysis to determine just what is critical. For example loss of some process might not cause immediate financial loss, but could damage reputation or competitive advantage, and that damage could be devastating.
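The prioritization described above can be sketched in a few lines. This is only a minimal illustration, assuming each process has already been assigned an MTD in hours during the BIA. The class name, process names and hour values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime in hours (hypothetical values)


def prioritize(processes):
    """Return processes ordered from least to most tolerant of downtime."""
    return sorted(processes, key=lambda p: p.mtd_hours)


# Hypothetical BIA results; real values come from interviews and analysis.
processes = [
    BusinessProcess("payroll", 72),
    BusinessProcess("order processing", 4),
    BusinessProcess("internal wiki", 720),
]

for p in prioritize(processes):
    print(f"{p.name}: recover within {p.mtd_hours} hours")
```

Sorting by MTD alone covers only the quantitative side. A process whose loss would hurt reputation may deserve a shorter MTD than its direct financial impact suggests.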
Here are some common MTD classifications that you should memorize*
Now we need to identify vulnerabilities and threats to these processes and the resources that are required for them. (Remember Risk Management/Risk Analysis!)
On the next slide we will examine some example threats.
Some examples are:
Determine the probability/risk for each business function.
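One common quantitative way to express this risk, borrowed from risk analysis, is the annualized loss expectancy (ALE): the single loss expectancy (SLE) multiplied by the annualized rate of occurrence (ARO). The sketch below assumes that approach. The threats and figures are invented for illustration.

```python
def annualized_loss_expectancy(sle, aro):
    """ALE = SLE (cost of one occurrence) * ARO (expected occurrences per year)."""
    return sle * aro


# Hypothetical threats to one business process: (SLE in dollars, ARO per year).
threats = {
    "flood": (500_000, 0.02),
    "power outage": (20_000, 2),
    "disk failure": (5_000, 4),
}

ale_by_threat = {
    name: annualized_loss_expectancy(sle, aro) for name, (sle, aro) in threats.items()
}

# List threats from highest to lowest expected yearly loss.
for name, ale in sorted(ale_by_threat.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ${ale:,.0f} per year")
```

A rare but expensive event (the flood) can rank below a cheap but frequent one, which is why both the cost and the frequency have to be estimated.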
Once we have done this research, we must document and provide our findings to management. Note at this point we really have not started creating a Business Continuity Plan yet. We’ve just done the research. Once management reviews the findings and gives the OK to proceed, we will actually develop the plan*
Pretty straightforward, though a lot of work. Now that we know what we need to protect and the threats involved, we look at ways to PREVENT these problems from occurring, so we never have to worry about dealing with them. This is really just doing a Risk Analysis and determining Cost Effective Countermeasures.
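Whether a countermeasure is cost effective can be estimated by comparing the expected yearly loss before and after it is in place against what it costs to run. The sketch below assumes expected yearly loss figures are already available from the risk analysis. The function name and all numbers are hypothetical.

```python
def countermeasure_value(loss_before, loss_after, annual_cost):
    """Yearly value of a safeguard: loss avoided minus what the safeguard costs."""
    return (loss_before - loss_after) - annual_cost


# Hypothetical example: a backup generator that reduces expected outage losses.
value = countermeasure_value(loss_before=40_000, loss_after=5_000, annual_cost=12_000)

if value > 0:
    print(f"Worth it: saves about ${value:,} per year")
else:
    print(f"Not cost effective: loses about ${-value:,} per year")
```

A negative value means the safeguard costs more than the loss it prevents, so it may be better to accept or transfer that risk.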
Ok now we are at the stage where we actually are developing a PLAN for business continuity. Before was just initial research and getting management to give us the “OK” to develop a plan.
A more “technical” and “tangible” stage. The idea is to figure out what the company ACTUALLY needs to do to be able to recover the necessary business processes in the event of a catastrophe.
We will go into more detail on each of these categories coming up.
A Business Process is a set of interrelated steps linked through specific activities to accomplish a specific task. For these processes the team must know the components of the process including
Facility Recovery is concerned with the ability to move processing operations to an alternate facility in case of the failure of the main facility. We can have multiple methods to deal with this including
Let's look into each of these more
A subscription service is a contract with a 3rd party to provide access to a facility. There is generally a monthly fee to retain the right to use the facility along with a large “Activation” fee and hourly fee when actually using the facility. This is obviously a short term only solution. There are 3 types of subscription services, which we will talk about more in the next slides
Hot Site – a facility that is fully configured and ready to operate in a few hours. The only resources missing from a hot site are the actual data and the actual employees.
+ can allow for annual testing
+ ready within hours
A facility that is usually “partially” configured with some computing equipment, but not the actual hard core hardware. I.e. a “hot” site without the expensive stuff.
Supplies basic environment, (AC, electrical, plumbing etc), but NO actual computing equipment. Can take a while to activate.
- May take weeks to get activated and ready
A Reciprocal Agreement (RA), also called “Mutual Aid”, is when two companies agree to help each other out in the case of an emergency. Ultimately this is not really practical for most businesses.
Can you guys tell me what the Pros and Cons of this are? Can you tell me why this is not really practical?
Pretty much these are HOT sites, that are OWNED by a company (rather than a service bureau). This also may have live or slightly delayed data backups and some staff.
- VERY EXPENSIVE (duplicate costs except for personnel)
+ best solution if turn around time and ability to recover all processing aspects are required
Another approach is, rather than having only one center that handles a certain business function, to split the work among multiple active centers such that there is no single point of failure.
Ok so we have plans to recover our facilities and our main processing requirements. But what about the “lower level” of things?
These need to be taken into consideration too. We will briefly talk about them in the next few slides
Ok so we have a space to process, but unless we have a hot site or redundant site, and our building is destroyed… where do we get the servers from? What about the desktops that our staff need? Do we have vendors to provide these, and how long will it take to get new equipment from them? What happens if we have “legacy” equipment… what do we do?
We need to take all of these questions into consideration when planning.
Like the hardware backups, but specifically about software. How do we get copies of the software? How do we roll out installs? What about licensing?
What about custom software that we had created that we cannot just go out and buy at the store?
Software escrow – what is this? Anyone?
OK so we have the equipment and software… how do we get it all rolled out and configured such that it is the same as it was at the company?
Incorrect configurations COULD cause compromises in integrity or confidentiality! (how?)
Do we even know how our old network was configured? Can we reproduce it?
An important concept for BCP that should be in company policy is that ‘All documentation should be kept up to date and properly protected’
What happens if our backup facility is 250 miles away? How do we get people there?
What happens if the disaster was a natural catastrophe and some important employees are injured or worse… what do we do now?
Executive Succession Planning – what is this?
How do we notify the users about a disaster and the change of operating procedure?
Once there we need to have some type of people on the ground directing issues pertaining to employees. These people should be easily identified.
We also need to be concerned with how to manage other tasks that we might not have the resources to do in the traditional manner (for example automated data processing, or normal communication methods). How do we handle that? The BCP team needs to consider these types of issues.
How do we ensure we have data to load back into our new offsite systems? Data changes constantly. We need a solution that makes sense and is cost effective (this will vary business to business).
We will talk about traditional backup types as well as “electronic vaulting” on the next few slides.
Traditional backups have some method of backing up files to a removable medium. The first thing to understand about backups is the “archive” bit. Every time a file is altered the “archive” bit is set to notify the system that a file may need to be backed up. Now let's talk about the 3 backup types
This must be done with some degree of regularity, depending on the business needs.
+ everything gets backed up
+ if you do a full backup every day, you can restore with only 1 restore operation
- Takes a long time, can be expensive to complete in a timely manner
Back up any file that has changed since the last full backup. Steps are
This allows you to quickly restore data in the event of a disaster in 2 operations. Simply
The idea is to back up any file that has changed since the last full backup OR the last incremental backup. Steps are
It depends on your needs.
Personally I believe in the following strategy
REMEMBER, for all these to work you still need a full backup periodically.*
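The difference between the three backup types comes down to what each one does with the archive bit. The sketch below is a toy model, not a real backup tool. It assumes the usual convention that full and incremental backups clear the archive bit while a differential backup leaves it set. The file names and function names are hypothetical.

```python
def full_backup(archive_bits):
    """Back up every file, then clear every archive bit."""
    backed_up = sorted(archive_bits)
    for name in archive_bits:
        archive_bits[name] = False
    return backed_up


def differential_backup(archive_bits):
    """Back up files whose archive bit is set; leave the bits set."""
    return sorted(name for name, bit in archive_bits.items() if bit)


def incremental_backup(archive_bits):
    """Back up files whose archive bit is set, then clear those bits."""
    backed_up = sorted(name for name, bit in archive_bits.items() if bit)
    for name in backed_up:
        archive_bits[name] = False
    return backed_up


def modify(archive_bits, name):
    """Altering a file sets its archive bit."""
    archive_bits[name] = True


# Using differentials: each one grows until the next full backup.
files = {"a.doc": True, "b.xls": True, "c.db": True}
full_backup(files)
modify(files, "a.doc")
monday_diff = differential_backup(files)   # ['a.doc']
modify(files, "b.xls")
tuesday_diff = differential_backup(files)  # ['a.doc', 'b.xls']

# Using incrementals: each one holds only the changes since the previous backup.
files = {"a.doc": True, "b.xls": True, "c.db": True}
full_backup(files)
modify(files, "a.doc")
monday_inc = incremental_backup(files)     # ['a.doc']
modify(files, "b.xls")
tuesday_inc = incremental_backup(files)    # ['b.xls']

print("Differential restore needs: full + Tuesday differential")
print("Incremental restore needs: full + Monday + Tuesday incrementals")
```

In this model the differential always contains everything changed since the full backup, so a restore takes two operations. The incrementals are smaller, but every one of them since the full backup is needed to restore.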
Can you mix differential and incremental backups? (Why or Why not?)
All backups should be stored both onsite and offsite (why)
When storing offsite, would the next building over be appropriate?
There should be a clear written process on how to restore files (why)
Someone should periodically test the backups by performing restores to a “test” system (why)
In what situations would a full backup be appropriate?
In what situations would a differential backup be appropriate?
In what situations would an incremental backup be appropriate?
When choosing an offsite storage facility think of the following
Disk mirroring / shadowing – copying data to one or more hard drives such that a system has multiple copies of data in case of a drive failure
Disk duplexing- same as shadowing, but using multiple disk controllers.. (why?)
Electronic Vaulting* is the idea of sending all changes to a file to a remote site (using non-backup methods). This usually is not done real-time but in batches.
(example bank transactions might be copied daily to another office)
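The batch behaviour can be pictured with a small model. This is a minimal sketch, assuming changes are queued locally and shipped to the remote site on a schedule. The class and method names are hypothetical and no real network transfer takes place.

```python
class ElectronicVault:
    """Collects file changes locally and ships them offsite in batches."""

    def __init__(self):
        self.pending = []      # changes recorded since the last transfer
        self.remote_site = []  # stand-in for the offsite copy

    def record_change(self, change):
        self.pending.append(change)

    def transfer_batch(self):
        """Send everything pending to the remote site; return how many were sent."""
        sent = len(self.pending)
        self.remote_site.extend(self.pending)
        self.pending.clear()
        return sent


vault = ElectronicVault()
vault.record_change("deposit #1001")
vault.record_change("withdrawal #1002")

# Until the batch runs (for example, nightly), the remote site is behind.
print(len(vault.remote_site))   # 0
print(vault.transfer_batch())   # 2
print(len(vault.remote_site))   # 2
```

Anything still in the pending list when a disaster strikes is lost, so the batch interval sets an upper bound on how much data can go missing.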
Remote Journaling (RJ) is another method of transmitting data to an offsite facility. However, it is different from EV (electronic vaulting).
A type of backup, however rather than backing up to a local device you “back up” to a remote device.
Now that we covered recovery strategies we need to look at a couple of recovery concepts that we will need to understand in the planning stage.
When planning we must also recognize that there are 3 different teams in DR.
Let's look at these in the next slides
Damage Assessment –
Restoration Team – should be responsible for getting the alternate site into a working and functioning environment
Salvage Team – responsible for starting the recovery of the original site.
Now we need to actually come up with goals and a plan for attaining these goals. These goals must contain certain key information.
OK so we have this great plan that we’ve spent millions of hours and dollars creating... But does it work, or will it sink and completely fail? Well, we should try testing it.
So what are some testing methods?... Next slide
BCP is distributed to departments and functional areas for review. The Managers read over and indicate if anything is missing or should be modified. (Manager “checks” off that the plan is OK for their department)
Representatives from each department come together AS A GROUP, they walk through the plan and different scenarios from beginning to end to make sure nothing is left out.
A specific scenario is proposed, and all required employees come together, simulate that the event has happened and start taking action to recover. The idea is to see if any problems come up or if any concerns were left out.
Some systems are moved to the alternate site and processing takes place. The results are compared to the real processing to see if anything needs to change.
Most intrusive test. The original site is actually shut down and processing is moved to the alternate site (really needs to be a hot site). The recovery team fulfills its obligation in preparing the systems and environment for the alternate site.
Now that we have the plan we need to maintain it! Systems and processes become out of date and need constant “refresh” why?
We can help keep the plan updated by taking the following actions