
Setting the Standard for DR

Setting the Standard for DR. John Pollard – 23 March 2006. PAS 77 – Guide to IT Service Continuity Management. PAS 56 Guide to Business Continuity Management. Business Continuity Management. RISK MANAGEMENT. IT DISASTER RECOVERY. FACILITIES MANAGEMENT. SUPPLY CHAIN MANAGEMENT.


Presentation Transcript


  1. Setting the Standard for DR • John Pollard – 23 March 2006 • PAS 77 – Guide to IT Service Continuity Management

  2. PAS 56 Guide to Business Continuity Management • Business Continuity Management: Risk Management • IT Disaster Recovery • Facilities Management • Supply Chain Management • Quality Management • Health & Safety • Knowledge Management • Emergency Management • Security • Crisis Communications & PR * Source: PAS 56:2003 Guide to Business Continuity Management

  3. IT Service Continuity Management … managing an organisation’s ability to continue to provide a pre-determined and agreed level of IT Services to support the minimum business requirements … * Source: ITIL: Best Practice for Service Delivery

  4. Threats • Loss, damage or denial of access to key infrastructure services • Failure or non-performance of third parties • Loss or corruption of key information • Sabotage, extortion or industrial espionage • Infiltration or attack on critical information systems

  5. Scope • Generic framework and guidelines for a continuity programme, including: • Management structure & responsibilities • How to conduct business criticality & risk assessments • How to define and create an IT Service Continuity plan • How to rehearse an IT Service Continuity plan • Solution architectures and design considerations

  6. What is a PAS? * Source: BSI

  7. Status [timeline figure, Q4 2004 to Q4 2006] • Milestones: Group formed, First draft, External review, Expected release • Activities: Contracts / Structure / Content, Edit, Revise

  8. Contributors

  9. ITSC Strategy • Define direction and high-level methods to meet IT service level objectives • Agreed at Board level • Needs to consider 4 stages of a major incident: initial response; service recovery; service delivery (following incident); normal service resumption • Enable rehearsal of major incident

  10. ITSC Strategy & Plan • Business Strategy • Threat Analysis • Business Criticality • IT Service Continuity Strategy • IT Architecture • IT Service Continuity Plan • Rehearsals • Costs • Processes

  11. Maintaining an ITSC Strategy Monitor

  12. Management Structure • Crisis Management Team (CMT) • Business Continuity Management Team (BCMT) • Incident Management Team (IMT)

  13. Business Criticality & Risk Assessments • Identify business units & processes • Categorise criticality of processes • Identify IT services supporting the business processes • Categorise criticality of IT services • Review • By location • By business unit

  14. Business Criticality Categories • Critical: vital to day-to-day operation • Mandatory: vital to meet statutory requirements • Strategic: important for implementation of long-term strategy • Tactical: important for short/medium term objectives
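
The categorisation steps on slides 13 and 14 can be sketched as a small lookup. This is a minimal sketch and not something PAS 77 prescribes: the numeric ranking of the four categories (taken from the order in which slide 14 lists them), the rule that an IT service inherits the highest category of any business process it supports, and all process and service names are assumptions made for illustration.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Slide 14 categories; the numeric ranking is an assumption."""
    TACTICAL = 1   # important for short/medium term objectives
    STRATEGIC = 2  # important for implementation of long-term strategy
    MANDATORY = 3  # vital to meet statutory requirements
    CRITICAL = 4   # vital to day-to-day operation


def service_criticality(process_criticality, service_to_processes):
    """Give each IT service the highest category of any process it supports."""
    result = {}
    for service, supported in service_to_processes.items():
        result[service] = max(process_criticality[p] for p in supported)
    return result


# Hypothetical business processes and the IT services that support them.
processes = {
    "order entry": Criticality.CRITICAL,
    "regulatory reporting": Criticality.MANDATORY,
    "market research": Criticality.STRATEGIC,
}
services = {
    "ERP database": ["order entry", "regulatory reporting"],
    "data warehouse": ["regulatory reporting", "market research"],
}

ratings = service_criticality(processes, services)
for name, level in ratings.items():
    print(f"{name}: {level.name}")
```

Repeating the same lookup per location or per business unit would correspond to the review step on slide 13.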

  15. Risk Assessment Process Learn Lessons

  16. ITSC Plan • Part of wider BCM Plan • Model plan should include: • Initial response • Incident assessment • Roles & responsibilities • Procedures • Rehearsing the plan • Maintaining the plan

  17. Recovery Objectives • Recovery Point Objective (RPO): the point in time to which work is restored, e.g. start of day • Recovery Time Objective (RTO): the time required to recover service
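
The two objectives can be checked against the timeline of an incident or rehearsal. The sketch below is illustrative only: it assumes data loss is measured from the recovery point to the moment of the incident, that downtime is measured from the incident to service restoration, and that the objectives are expressed as durations. The function name and all dates and thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_report(incident, recovery_point, service_restored, rpo, rto):
    """Compare an incident timeline with the agreed RPO and RTO."""
    data_loss = incident - recovery_point   # work done since the restored point
    downtime = service_restored - incident  # time taken to recover service
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: failure at 14:30, data restored to start of day,
# service back at 18:00, against a 24 hour RPO and a 4 hour RTO.
report = recovery_report(
    incident=datetime(2006, 1, 10, 14, 30),
    recovery_point=datetime(2006, 1, 10, 8, 0),
    service_restored=datetime(2006, 1, 10, 18, 0),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=4),
)
print(report)
```

In this example the data loss is 6 hours 30 minutes and the downtime is 3 hours 30 minutes, so both objectives are met.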

  18. Balancing Cost & Recovery Objectives

  19. IT Architecture – Resilience Considerations • Location & distance between sites • Number of sites • Staff access & proximity • Remote access • Dark site vs. manned site • Staff skill levels • Telecoms connectivity and redundant routing • Automation required • Telephony and email • 3rd party / external links

  20. High Level Process Flow

  21. Task Summary Sheet

  22. Rehearsal • A body to control & coordinate • Objectives & success criteria • Rehearsal plan & scripts • Staff briefing • Logs and critique forms • Observers • Post-rehearsal review

  23. Areas to Rehearse • Callout • Walk through reviews • Walk through exercises • Component rehearsals • Integration rehearsals • Relocation rehearsals • Failover rehearsals • Major incident simulations

  24. Architectures

  25. Site Models • Active / Contingency: cold site • Active / Active: service runs from both sites • Active / Alternate: service can run from either site • Active / Backup: warm standby site • Multi-site and other hybrids

  26. Data Resilience [diagram labels] • Tape/backup • Database • Application • Host • Storage Array • SAN

  27. Replication Modes • Synchronous: increased write latency; typically OK for OLTP; may impact batch processing; requires greater inter-site bandwidth than other options • Snapshot: point in time copy; only valid on completion of transfer; minimal/no performance impact • Near real-time: frequent snapshots; minimal performance impact
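
One way to relate these modes to the Recovery Point Objective is to estimate the worst-case age of the newest usable copy at the second site. The sketch below is an assumption-laden illustration, not a vendor formula: it treats synchronous replication as losing no acknowledged writes, and, because a snapshot is only valid once its transfer completes, treats the worst case for snapshot-based modes as one snapshot interval plus one transfer time. The function name and the example intervals are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(mode, snapshot_interval=timedelta(0),
                         transfer_time=timedelta(0)):
    """Estimate the worst-case age of the newest valid remote copy."""
    if mode == "synchronous":
        # Assumption: a write is acknowledged only once both sites hold it.
        return timedelta(0)
    if mode in ("snapshot", "near real-time"):
        # Just before the next snapshot finishes transferring, the newest
        # valid copy is one interval plus one transfer time old.
        return snapshot_interval + transfer_time
    raise ValueError(f"unknown replication mode: {mode}")


# Hypothetical schedules for comparison.
nightly = worst_case_data_loss("snapshot", timedelta(hours=24), timedelta(hours=2))
frequent = worst_case_data_loss("near real-time", timedelta(minutes=5), timedelta(minutes=1))
print(nightly, frequent, worst_case_data_loss("synchronous"))
```

With these example figures the nightly snapshot exposes up to 26 hours of work and the frequent snapshots up to 6 minutes, which is the trade-off against write latency and bandwidth that the slide describes.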

  28. A Holistic Approach • Service Continuity: Technology, People, Processes • Service Continuity is much more than technology

  29. john.pollard@unisys.com

  30. Defining the Standard for DR • Part II - Workshop • John Pollard – Unisys • PAS 77 – Guide to IT Service Continuity Management

  31. Typical Challenges • Tape recovery slow • Manual build is complex • Complex inter-operation between systems • Difficult to define critical and non-critical • Management of failover site • Keeping sites in step • Windows Servers

  32. Synchronous Write Latency [diagram: Server, Storage Array, communication link, Storage Array] • Write 1 ≈ 0.5 ms • Transfer time • Write 2 ≈ 0.5 ms • Latency = 2 * Write Time + Transfer Time • For 200 kilometres using Fibre Channel: Latency = 2 * 0.5 + 4.0 = 5.0 ms
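
The latency formula on slide 32 is easy to turn into a small calculator. This is a sketch under stated assumptions: the formula and the 0.5 ms and 4.0 ms figures come from the slide, while the linear scaling of transfer time with distance (calibrated from the slide's single 200 km data point) and the function names are assumptions for illustration. Real links add switch, protocol, and routing delays that this ignores.

```python
def sync_write_latency_ms(write_time_ms, transfer_time_ms):
    """Slide 32 formula: Latency = 2 * Write Time + Transfer Time."""
    return 2 * write_time_ms + transfer_time_ms


# Assumption: transfer time grows linearly with distance, calibrated
# from the slide's single data point (4.0 ms for 200 km).
MS_PER_KM = 4.0 / 200


def transfer_time_ms(distance_km, ms_per_km=MS_PER_KM):
    """Estimate inter-site transfer time for a given link length."""
    return distance_km * ms_per_km


for km in (50, 100, 200):
    latency = sync_write_latency_ms(0.5, transfer_time_ms(km))
    print(f"{km} km: {latency:.1f} ms")
```

At 200 km this reproduces the slide's 5.0 ms. Shorter distances show how much of the synchronous write penalty comes from the link rather than from the arrays themselves.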

  33. Site Synchronisation • Major challenge • Cultural change is needed • Critical to successful operation • DR systems • Build at recovery time • Slow / complex recovery • Maintain ready to use • How to validate changes • Live run • System dependent

  34. Windows Servers • Build DR servers at recovery time: lengthy recovery process; prone to errors; complex, requires higher skill level • Maintain DR servers ready to use: HW does not have to be identical; complex SW change and configuration management; how to validate releases • Boot servers from storage array: requires matching HW; SW only installed once; simplifies SW change and configuration management; simplifies failover process / improves recovery

  35. Windows Boot from SAN Production Site DR Site Test Server Live Server DR Server Live Data Test Data Live OS Test OS Data OS Storage Array Storage Array

  36. Virtualisation • Reduced investment: fewer servers dedicated for resilience; expand/replace if long term outage • Flexibility: allocate/use servers as required • Potentially reduced capacity: depending on system and scale of incident • Configuration may not have been proved

  37. Service Management Identify Affected Areas • Service Desk • Incident Management • Problem Management • Configuration Management • Change Management • Release Management • Testing

  38. Operational Assessment • Understand people and process • Gap analysis

  39. Delivery Approach Discover Model Design Implement Manage • Business Objectives • Current Issues or Problems • Existing/Target Infrastructure • Success Criteria • Vision • Existing Systems, Applications & Services • Physical ‘As-Is’ Model • Logical ‘As-Is’ Model • Data profiling • Security assessment • ‘To-Be’ Logical Model • ‘To-Be’ Physical Model • Project plan • Resource schedule • Develop business case • Implement target environment • Migrate and consolidate applications • Application and middleware integration • Define and implement test strategy • Operational assessment & gap analysis • Implement operational & management processes

  40. Workshop • Determine high-level requirements • Determine Business Drivers • Determine Success Criteria • Overview systems and applications • Identify team members, sponsors, etc. • Agree timelines

  41. Discovery • Servers, storage, networking • Audit and map: hardware, software, services

  42. Data Applications Services Group Systems Analysis

  43. Design • Systems architecture • Operational assessment • Test environment • Project plan and resource schedule • Training requirements

  44. Transition to Future State Operational Management Optimised Architecture Service Continuity Application Selection and Development Standards Data Centre Transformation Network Design Storage Design Training Requirements Systems Design Systems Management Migration Plan Test Environment and Strategy

  45. Implementation • Methodology • Call on best practice • Operational management • Cultural change • Keep people informed

  46. john.pollard@unisys.com
