Chao-Hsien Chu, Ph.D. College of Information Sciences and Technology


  1. Disaster Recovery Chao-Hsien Chu, Ph.D. College of Information Sciences and Technology The Pennsylvania State University University Park, PA 16802 chu@ist.psu.edu Theory  Practice Learning by Doing IST 515

  2. Objectives • Describe the basic differences between BCP and DRP. • Describe the steps involved in creating disaster recovery plan tests. • Identify and describe the various types of recovery strategies. • Describe how to formulate a recovery strategy. • Compare and contrast strategies for backup. • Identify the advantages and disadvantages of mutual aid agreements. • Compare and contrast the advantages and disadvantages of hot sites and cold sites. • Compare and contrast the advantages and disadvantages of using service bureaus.

  3. Readings • Hansche, S., Berti, J. and Hare, C., Official (ISC)2 Guide to the CISSP Exam, Auerbach, 2004. Chapter 9 (Required). • Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J., and Thomas, R., Contingency Planning Guide for Information Technology Systems, NIST Special Publication 800-34, June 2002. • Wikipedia, Disaster recovery. http://en.wikipedia.org/wiki/Disaster_recovery

  4. Disasters

  5. BCP Cycle

  6. Areas Covered in BCP • Contact points. Who to contact during office hours, outside office hours, and in an emergency; • Roles and responsibilities. A well-defined organizational structure for the business continuity and recovery teams; • Risk levels. A categorization of business risks and the level of risk the organization deems acceptable; • Continuity and recovery service levels. How much time is acceptable for responding to threats, implementing continuity plans, and recovering from failure scenarios;

  7. Areas Covered in BCP • Business continuity reviews. How and when the organization reviews business continuity plans; • Business continuity processes. Processes and procedures that inform staff how to react to and handle particular failure scenarios; • Incident reporting and documentation. Methods of recording and documenting incidents and responses to them; • Testing. Acceptance criteria and testing requirements for the business continuity plan; and • Training. Training requirements for staff involved in business continuity and disaster recovery processes.

  8. Step 1: Initiate the BCP Project • Obtain and confirm support from senior management. • Identify key business and technical stakeholders. • Form a business continuity working group. • Define objectives and constraints. • Establish strategic milestones and draw up a road map. • Begin a draft version of business continuity policy.

  9. Step 2: Identify Business Threats • Technology threats include natural disaster (such as flooding), fire, power failure, systems and network failure, systems and network flooding (when attackers try to overwhelm a network with traffic), virus attack, denial-of-service attack, theft, vandalism, and sabotage. • Information threats come from hacking, theft, fraud, fabrication, alteration, misuse, natural disaster, fire, and the degradation of the ink on paper records. • People threats include illness, recruitment shortfalls, resignation, compassionate leave, pregnancy, weather, and unavailability of transportation or office access.

  10. Step 2: Identify Business Threats • Identify the community of business and technical stakeholders. • Conduct threat identification workshops. • Delineate and document business threats.

  11. Step 3: Conduct a Risk Analysis • Conduct risk analysis workshops. • Assess the likelihood and impact of threat occurrence. • Categorize and prioritize threats according to risk level. • Review outputs of risk analysis with management. • Ascertain level of risk acceptable to the organization. • Document outputs in business continuity policy.
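
The likelihood-and-impact assessment above can be sketched as a simple scoring exercise. This is a minimal illustration, not the course's method: the 1-5 scales, the category thresholds, and the example ratings are all assumptions chosen for the example, and only the threat names come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def risk(self) -> int:
        # Common convention: risk score = likelihood x impact.
        return self.likelihood * self.impact


def categorize(score: int) -> str:
    # Hypothetical thresholds; an organization would set its own.
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def prioritize(threats, acceptable="low"):
    """Return threats above the acceptable risk level, highest risk first."""
    order = {"low": 0, "medium": 1, "high": 2}
    ranked = sorted(threats, key=lambda t: t.risk, reverse=True)
    return [
        (t.name, t.risk, categorize(t.risk))
        for t in ranked
        if order[categorize(t.risk)] > order[acceptable]
    ]


# Illustrative ratings only.
threats = [
    Threat("power failure", likelihood=4, impact=3),
    Threat("flooding", likelihood=1, impact=5),
    Threat("virus attack", likelihood=4, impact=4),
]

for name, score, level in prioritize(threats):
    print(f"{name}: score {score} ({level})")
```

Threats at or below the acceptable level drop out of the list, which mirrors the step of ascertaining the level of risk the organization is willing to accept.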

  12. Step 4: Establish the Business Continuity Team • Identify key business, technical, and customer services stakeholders. • Form and empower the business continuity team. • Clarify and agree on team objectives and working mode. • Define roles and responsibilities; produce a work plan. • Identify incident engagement and response processes. • Update business continuity policy.

  13. Roles of BC Team • A business continuity manager is the first point of contact, manages the incident, initiates the business continuity plan, mobilizes the business continuity team, and presents key decisions to business owners when appropriate. • The business owner makes key decisions about how the business handles incidents. • The technical services manager manages disruptions to technical services, such as IT infrastructure and applications; initiates continuity arrangements; and interacts with third-party business continuity service providers. • An estate manager manages disruptions relating to buildings, offices, and the surrounding environment; initiates continuity arrangements; and interacts with third-party business continuity service providers.

  14. Roles of BC Team • The business operations and customer services manager manages disruptions to business operations and customer services; keeps customers informed if there is a noticeable impact on customer service levels; initiates continuity arrangements; and interacts with third-party business continuity service providers. • Business continuity (or resumption) teams are technical, estate, or customer services teams that execute the business continuity plans. • A recovery manager guides the business's recovery to normal operations.

  15. Step 5: Design the Business Continuity Plan • Identify critical and noncritical business services. • Establish preferred business continuity service levels and profiles for continuity and recovery. • Reaffirm key constraints (such as time and cost). • For each threat, identify possible continuity strategies and evaluate them in terms of time, cost, and benefits. • Identify and engage potential business continuity partners. • Draft a set of continuity plans and work toward an agreed set of plans with senior management. • Produce and execute an implementation plan.

  16. Common Strategies • Technology: Redundancy (of hardware and network, for example), maintenance and support agreements, and backup and restore capabilities are common defensive strategies. • Information: Recover information by using data mirroring, backup and restore, auditing, and off-site or secondary data storage. • People: To temporarily shore up people-related resources, use contract staff, rotas (workloads that a company can change in response to business demand or personnel shortfalls), call-out arrangements (having certain staff in standby mode to be called to work as necessary), rental offices and sites, manual procedures, and service-forwarding agreements (such as with specialist call centers).

  17. Evaluating Criteria • Costs for acquisition, deployment, testing, training, and associated management overhead; • Level of protection; • Business resumption response time; and • Time to implement, including time for acquiring, deploying, and testing the business continuity strategy and for conducting relevant and necessary training.
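
One way to apply the criteria above is a weighted decision matrix. The sketch below is only illustrative: the weights, the 1-5 ratings, and the two candidate strategies are hypothetical values, and every rating is assumed to be oriented so that 5 is the favorable end (for example, lowest cost or fastest resumption).

```python
from typing import Mapping

# Criteria taken from the slide; the weights are hypothetical and sum to 1.
WEIGHTS = {
    "cost": 0.3,
    "protection": 0.3,
    "resumption_time": 0.2,
    "time_to_implement": 0.2,
}


def score(ratings: Mapping[str, int]) -> float:
    """Weighted sum of 1-5 ratings, where 5 is always the favorable end."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)


# Illustrative ratings only.
candidates = {
    "redundant hardware": {
        "cost": 2, "protection": 5, "resumption_time": 5, "time_to_implement": 3,
    },
    "backup and restore": {
        "cost": 4, "protection": 3, "resumption_time": 2, "time_to_implement": 4,
    },
}

# Highest weighted score first.
ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {score(candidates[name]):.2f}")
```

A matrix like this does not make the decision; it makes the trade-offs between cost, protection, and timing explicit so that they can be reviewed with management.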

  18. Step 6: Define Your Business Continuity Processes • Identify, define, and document business continuity processes. • Review and verify business continuity processes with relevant stakeholders. • Identify training requirements. • Develop training exercises, role-playing scripts, and simulation case studies. • Initiate training and awareness programs.

  19. Business Continuity Processes • Handling specific failure events, such as fire and network failures; • Backup and restoration of systems and business data; • Virus management; • Incident reporting; • Problem escalation hierarchies; • Customer and staff communication; and • Contact procedures for third-party support providers.

  20. Step 7: Test Your Business Continuity Plan • Define business continuity acceptance criteria. • Formulate the business continuity test plan. • Identify major testing milestones. • Devise the testing schedule. • Execute tests via simulation and rehearsal; document test results. • Assess overall effectiveness of business continuity plan; pinpoint areas of weakness and improvement. • Iterate tests until the plan meets acceptance criteria. • Check, complete, and distribute business continuity policy.

  21. Reasons for Testing BCP • Validate the plan’s effectiveness in meeting your stated business continuity service levels; • Identify, at an early stage, any shortcomings in the plan; • Assess whether your business continuity service levels are realistic and achievable given your budgetary and time constraints; and • Give senior management and other parties (such as regulatory bodies) confidence in the plan.

  22. Step 8: Review Your Business Continuity Plan • Develop a review schedule for different types of review. • Arrange a business continuity review meeting or workshop. • Update the business continuity document. • Kick off another BCP cycle if necessary.

  23. When to Review BCP • Significant changes to the business—for example, the launch of new e-business operations; • Changes in business priorities; • Shifts in the legal or regulatory landscape; • Significant world events (wars or terrorist attacks); • Changes to the IT budget; • Physical relocation of IT systems and operations; • Outsourcing of IT systems and operations; • Developments in IT infrastructure; and • Significant changes in the labor market.

  24. Common Pitfalls In BCP

  25. Disaster Recovery • Disaster recovery refers to the immediate and temporary restoration of critical computing and network operations after a natural or man-made disaster within defined timeframes. • An organization should document how it will respond to a disaster and resume the critical business functions within a predetermined period of time; minimize the amount of loss; and repair (or replace) the primary facility to resume data processing support.

  26. Disaster Recovery Planning • A comprehensive statement of consistent actions to be taken before, during, and after a disruptive event that causes a significant loss of information systems resources. • The procedures for responding to an emergency, providing extended backup operations during the interruption, and managing recovery and salvage processes afterwards, should an organization experience a substantial loss of processing capability.

  27. Disaster Recovery Planning • To provide the capability to implement critical processes at an alternative site and return to the primary site and normal processing within a time frame that minimizes the loss to the organization, by executing rapid recovery procedures.

  28. Goals and Objectives of DRP • Protecting an organization from major computer services failure. • Minimizing the risk to the organization from delays in providing services. • Guaranteeing the reliability of standby systems through testing and simulation. • Minimizing the decision-making required by personnel during a disaster.

  29. Disaster Recovery Procedures • The recovery team. • The salvage team. • Normal operations resume. • Other recovery issues: • Interfacing with external groups • Employee relations • Fraud and crime (vandalism and looting) • Financial disbursement. • Media relations.

  30. Recovery Strategies

  31. Recovery Strategies Recovery strategies consist of a set of predefined, management-approved actions implemented in response to an unacceptable business interruption. The focus is on recovery methods that meet the predetermined recovery timeframes established for the operation and functioning of the critical business functions. Developing the recovery strategies includes compiling the resource requirements and identifying the alternatives available during recovery.

  32. Sample of Business Unit Priorities

  33. Steps in Developing Recovery Strategies • Document all costs with each alternative. • Obtain costs for any outside services. • Develop written agreements. • Evaluate risk reduction and resumption strategies based on a full loss of the facility. • Identify risk reduction measures and revise resumption priorities and timeframes. • Document recovery strategies and present them to management for comments and approval.

  34. Recovery Strategies Strategies should address recovery of: • Business operations • Facilities and supplies • Users (workers and end-users) • Technical (network, telecommunication, data center) • Data (off-site backups of data and applications)

  35. Business Recovery Strategies • Business recovery strategies focus on critical resources and the maximum tolerable downtime (MTD) for each business function. • The business unit priorities are taken directly from the business impact analysis (BIA). The length of the recovery window for each business unit dictates the priority for recovery. The strategies involve identifying the following: • Critical business units and their associated business functions. • Critical IT system requirements for each business function. • Procedures for connectivity to IT infrastructures (e.g., mainframe, mini, LAN, WAN).

  36. Business Recovery Strategies The strategies involve identifying the following: • Critical equipment and supply requirements for each business function. • Essential office space requirements of each business unit. • Key personnel for each business unit. • Redirection of postal service mail, voice telecommunications, and data networks to the recovery site. • Business unit interdependencies with other units. • Off-site storage (procedures, media, documents). • Vendor services.
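
Because the length of each business unit's recovery window dictates its recovery priority, the ordering can be derived mechanically once the BIA has produced the windows. The sketch below assumes the window is expressed as a maximum tolerable downtime per unit; the unit names and durations are hypothetical, not taken from the presentation.

```python
from datetime import timedelta

# Hypothetical BIA output: business unit -> maximum tolerable downtime (MTD).
bia_mtd = {
    "customer service": timedelta(hours=4),
    "payroll": timedelta(days=3),
    "order processing": timedelta(hours=24),
}


def recovery_order(mtd_by_unit):
    """Units with the shortest recovery window are recovered first."""
    return sorted(mtd_by_unit, key=lambda unit: mtd_by_unit[unit])


for rank, unit in enumerate(recovery_order(bia_mtd), start=1):
    print(rank, unit, bia_mtd[unit])
```

In practice the ordering would also have to respect the interdependencies between units listed above, which a plain sort by recovery window does not capture.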

  37. Facility and Supply Recovery Strategies • Facility recovery involves identifying recovery procedures for the alternate facility, including space, security, fire protection, infrastructure, utility, supply, and environmental requirements. • Determine minimum space for recovery of critical business units. • Determine space needs for less critical resources. • Determine security needs at recovery sites. • Determine fire protection needs. • Determine critical furnishings and office equipment. • Determine infrastructure requirements. • Determine utility and environmental needs. • Determine what office/business supplies are needed.

  38. User Recovery Strategies • The strategies involved with personnel requirements focus on manual procedures, vital records, and restoration procedures. A critical component is establishing methods to implement the process and maintain the records so that information can be easily and accurately updated to the electronic format when service is restored. The plan should specify the following: • Manual procedures. • Vital record storage (e.g., medical, personnel). • Employee notification procedures. • Employee transportation arrangements. • Employee accommodations.

  39. Technical Recovery Strategies Technical recovery strategies define alternate recovery strategies for the data center and network infrastructure components. Methods: • Subscription services. • Mutual aid agreements. • Redundant data centers. • Service bureaus.

  40. Subscription Services • Subscription services provide an alternate facility or “site” for recovery. They are characterized as hot, warm, cold, mirror, and mobile sites. • Hot Site. A fully configured site with complete customer-required hardware and software provided by the service. • Warm Site. Similar to a hot site, but the expensive equipment (e.g., a mainframe) is not available on-site. The site is ready within hours after the needed equipment arrives.

  41. Subscription Services • Cold Site. Does not include any technical equipment or resources, except environmental support such as air conditioning, power, telecommunication links, raised floors, etc. • Mirror Site. Also referred to as full redundancy, this is a computer service facility equipped with utilities, communication lines, and appropriate hardware that is fully operational and processes each transaction along with the primary site. • Mobile Site. A trailer that can be set up and linked by a trailer sleeve to create a space to suit the subscriber’s recovery needs.

  42. Reciprocal or Mutual Aid Agreements • This strategy establishes reciprocal or mutual aid agreements with other companies, under which each party provides facilities to the other in the event of a disaster. • Reciprocal agreements require the companies to have similar hardware and software computing environments. • Typically, reciprocal agreements are dismissed in practice because few information system facilities have the extra capacity needed to run both their own and another organization’s needs for any extended period of time.

  43. Technical Recovery Strategies Redundant Processing Centers: • Expensive. • May not have enough spare capacity for critical operations. Service Bureaus: • Many clients share facilities. • Almost as expensive as a hot site. • Must negotiate agreements with other clients.

  44. Data Recovery Strategies The objectives are to back up critical software and data, store the backups at an off-site location, and retrieve the backups quickly during a recovery operation. • Backups of data and applications • Off-site vs. on-site storage of media • How fast can data be recovered? • How much data can you lose? • Security of off-site backup media • Types of backups (full, incremental, differential, etc.)
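
The difference between the backup types listed above comes down to which files each one copies. The sketch below is an illustration under simplifying assumptions, not a backup tool: it decides by last-modified time only, and `FileInfo`, `select_files`, and the timestamps are hypothetical names and values introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    mtime: float  # last-modified time, in seconds since the epoch


def select_files(files, kind, last_full=None, last_backup=None):
    """Return the paths a backup of the given kind would copy.

    full:         every file
    differential: files changed since the last full backup
    incremental:  files changed since the most recent backup of any kind
    """
    if kind == "full":
        return [f.path for f in files]
    if kind == "differential":
        cutoff = last_full
    elif kind == "incremental":
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    if cutoff is None:
        raise ValueError(f"{kind} backup needs the time of an earlier backup")
    return [f.path for f in files if f.mtime > cutoff]


# Illustrative timeline: full backup at t=200, incremental backup at t=300.
files = [FileInfo("a.db", 100), FileInfo("b.doc", 250), FileInfo("c.xls", 350)]

print(select_files(files, "full"))                           # ['a.db', 'b.doc', 'c.xls']
print(select_files(files, "differential", last_full=200))    # ['b.doc', 'c.xls']
print(select_files(files, "incremental", last_backup=300))   # ['c.xls']
```

The trade-off shows up at restore time: a differential scheme needs the last full backup plus the latest differential, while an incremental scheme needs the last full backup plus every incremental taken since, which affects how fast data can be recovered.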

  45. Recovery Management • This is sometimes referred to as Crisis Management. Essentially, it is the overall coordination of the organization’s response to a crisis. • The goal is to deal with the issues in an effective and timely manner and avoid or minimize damage to the organization’s profitability, reputation, and ability to operate. • The flow of accurate information is a key ingredient to effective crisis management. The effective management of information can serve as the first line of defense against a crisis and can also be the most effective mechanism in the process of restoring both the business functions and public confidence.

  46. Testing the Disaster Recovery Plan • To verify the accuracy of the recovery procedures and identify deficiencies. • To prepare and train personnel to execute their emergency duties. • To verify the processing capability of the alternative backup site.

  47. Testing DRP Creating the Test Document: • Testing schedule and timing • The duration of the test • The specific test steps • Who will be the participants in the test • The task assignments of the test personnel • The resources and services required (supplies, hardware, software, documentation, and so forth)
