
Catastrophic Failure and Business Recovery WTC Lessons Learned

Kelly Polanski, VP, Availability Solutions. Sept 11, 2002.



Presentation Transcript


  1. Catastrophic Failure and Business Recovery: WTC Lessons Learned • Kelly Polanski, VP, Availability Solutions • Sept 11, 2002

  2. Legato At A Glance
      Our Company
      • Headquarters in Mountain View, CA
      • Pro-forma 2001 Revenues: $308M
      • Distribution in 50+ Countries via:
        • Worldwide direct sales force
        • 25 OEMs and Strategic Partners
        • 300+ Channel Partners
      • Revenue Mix: Direct – 25%, Channel – 75%
      Our Offerings
      • Scalable, Industrial-strength Products
      • Information Management Solutions
        • Content and Messaging
        • Automated Availability
        • Information Protection
      • Business-critical Support, Services, and Education
      Our Customers
      • Global Fortune 1000 Focus
      • Departmental to Enterprise
      • 29,000 Customers
      • Over 100,000 NetWorker Licenses
      • Major Verticals: Financial, Telco, Government, Healthcare, Education, Utilities

  3. Catastrophic Failure and Business Recovery: The New Ways in Which Businesses Are Vulnerable to Attack • How have changes in business exposed new vulnerabilities to disaster? • How have changes in IT infrastructure affected business vulnerability?

  4. Challenges of Global Operations • Chronic IT Understaffing • 7x24x365 Operations • No Data Backup Window • Increased Complexity • No Applications and Systems Maintenance Window • Increased Exposure to Disaster and Failure • [Chart labels: Requirement for Data and Application Availability; Capability to Deliver Availability]

  5. Network Recovery Proved Critical to Business Operations • “Bank of New York processes at least 50% of all government securities in the US. The destruction of the network communications infrastructure would not allow [the bank] to clear government securities, and also left many brokers unable to access the bank’s systems. This affected the government’s ability to infuse liquidity into the financial system, to add stability and relief. … has global ramifications, affecting many international institutions and businesses.” • IDC Flash, October 2001

  6. Network Recovery Proved Critical to Business Operations • “Given the geographical distribution of our electronic network NASDAQ's facilities remained intact and open throughout the ordeal. We did lose connectivity with firms who provide 40% of our daily order flow, however, thus our "recovery" effort was entirely focused on bringing them back up - particularly those forced to use disaster recovery sites.” • Hardwick Simmons, Chief Executive Officer, The NASDAQ Stock Market

  7. Catastrophic Failure and Business Recovery: Considering “Comprehensive” Disaster Protection • What must be protected to ensure your business survives? • Lessons learned in the wake of the WTC disaster.

  8. Defining Disaster Recovery versus Business Continuance • Disaster Recovery = Recovering the Failed Site • [Diagram labels: Rebuild Infrastructure; Restart Applications; Restore Data]

  9. Defining Disaster Recovery versus Business Continuance • Business Continuance = Hosting the Business through Recovery • [Diagram labels: Recovery; Transfer Operations (shown twice); Hot- or Warm-Site]

  10. Step 1: Getting Recovery Started “We’ve been on site 24 by seven, working with a couple of teams of people shifting through the days. We’ve got about 18 customers that were directly affected, twelve of which have required some real, in-depth technical support.” Legato quoted by KCBS, Sept 2001 • Legato CRITSIT Team Activated • DR Command Center established in NJ and staffed with 18 technical personnel by end of Day Two

  11. One-Dimensional Protection is Not Sufficient
      • What Worked:
        • Data Restore from Tape
      • What Was a Challenge:
        • Replacing and Reconfiguring Data Center Systems and Storage Hardware
        • Locating the Tapes
        • Replacing Department and Desktop Systems
        • Running the Business Until Recovery & After

  12. Data is Not Your Only Business-Critical Resource • “It’s not just that you simply install the backup software and then restore your data. There’s a lot more to it than that,” [Legato CRITSIT team member] said. “We have some vendors that we work with that extract data off of damaged disk drives, but in this case, there are no disk drives to go get. This stuff is gone. It’s unbelievable. There’s no recovery of equipment whatsoever. We are focusing on restoring data that was stored off site on tapes.” KCBS, Sept 2001

  13. Rebuilding the Servers • Most customers had not created printouts and other documentation recording system configurations & recovery procedures • Those that had, had not archived them off-site • No bare-metal recovery protection in place • Customers found that they needed to recover departmental servers and even desktop systems • Caused improvisation and trial-&-error recovery, which added confusion, stress, and time
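
The documentation gap described on this slide can be narrowed with even a very small script. The following is a minimal sketch, not anything Legato shipped: it assumes a host with Python available and records only a few basic facts through the standard library. The function names `snapshot_system_config` and `render_snapshot` are hypothetical, and a real configuration record would also need disk layouts, network settings, and installed software.

```python
import json
import platform
import socket
from datetime import datetime, timezone


def snapshot_system_config() -> dict:
    """Collect a few basic facts about this host (illustrative subset only)."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        "python_version": platform.python_version(),
    }


def render_snapshot(snapshot: dict) -> str:
    """Serialize the snapshot as JSON text that can be printed or stored."""
    return json.dumps(snapshot, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Print the record; in practice a copy would be archived off-site
    # alongside the backup tapes.
    print(render_snapshot(snapshot_system_config()))
```

The point of the sketch is the slide's second bullet: the record is only useful if a copy is kept away from the site it describes.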

  14. Rebuilding Storage • Legato is coordinating with several other companies to make the retrieval project a success. “We’re all working together to create new data centers for these guys.” • KCBS, Sept 2001 • Partners EMC, Compaq, NetApp, IBM, and Sun worked with Legato

  15. Locating the Tapes • Time was lost finding the exact location of tens of thousands of tapes • Time was lost sorting and reading tapes to discover which were the most recent • In some cases, older data than desired was recovered just to get the application servers restarted • Among the 18 customers with whom Legato worked, the CRITSIT team estimated that about 30% of data was lost because it had not been backed up or because it had not been rotated off-site quickly enough
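
The tape-location problem above is essentially a catalog lookup. The sketch below is illustrative only and does not represent NetWorker or any Legato product: `TapeRecord`, `latest_offsite_tape`, `days_of_data_at_risk`, and the sample entries are all hypothetical. It shows how a simple media catalog could answer two questions from this slide: which off-site tape is the most recent for a given server, and how many days of data sit only on tapes that have not yet left the site.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class TapeRecord:
    barcode: str        # label on the cartridge
    server: str         # system whose backup is on the tape
    backup_date: date   # date the backup was written
    location: str       # "onsite" or the name of an off-site vault


def latest_offsite_tape(catalog: Iterable[TapeRecord], server: str) -> Optional[TapeRecord]:
    """Return the most recent tape for `server` stored away from the primary site."""
    candidates = [t for t in catalog if t.server == server and t.location != "onsite"]
    return max(candidates, key=lambda t: t.backup_date, default=None)


def days_of_data_at_risk(catalog: Iterable[TapeRecord], server: str, today: date) -> Optional[int]:
    """Days between `today` and the newest off-site backup; None if no tape is off-site."""
    tape = latest_offsite_tape(catalog, server)
    return None if tape is None else (today - tape.backup_date).days


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    TapeRecord("T0001", "mail01", date(2024, 3, 3), "vault-a"),
    TapeRecord("T0002", "mail01", date(2024, 3, 7), "vault-a"),
    TapeRecord("T0003", "mail01", date(2024, 3, 10), "onsite"),
    TapeRecord("T0004", "db01", date(2024, 3, 10), "onsite"),
]

if __name__ == "__main__":
    for name in ("mail01", "db01"):
        print(name, latest_offsite_tape(CATALOG, name),
              days_of_data_at_risk(CATALOG, name, date(2024, 3, 11)))
```

In this made-up catalog, `db01` has no off-site tape at all, which is the situation the slide's last bullet describes: data that was backed up but not rotated off-site in time.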

  16. Hard Questions About DR Planning • “In the past, customers have contracted for traditional disaster recovery services mostly as an insurance policy against a natural disaster destroying their physical facility… The heightened awareness regarding possible forms of terrorism… is leading organizations to discuss what constitutes “protection”… and what they currently own and operate that absolutely must be salvaged for the business to survive.” • IDC Flash, October 2001

  17. Potential Effects Require More Comprehensive Safeguards
      • Complete Loss of Infrastructure
        • Hot-site or warm-site recovery
        • Distributed operations
      • Vacated Data Centers
        • Remote operations
      • Contaminated Data Centers
        • Chemical and biological disaster planning
      • Telecommunication Outages
        • Verizon estimated that between 9,000 and 14,000 businesses were left without service (IDC Flash)
        • Internet fared better

  18. Chilling Reality • FBI Seals American Media Building. Boca Raton, FL, October 2001

  19. Key Lessons Learned • Avoid systems with single points of failure • Favor distributed networks • Consider the risks associated with big city concentrations where damage to a single industry can be significant • Fully and regularly test all back-up sites - take nothing for granted • Document system configurations • Document disaster recovery procedures • Safeguard documentation just as you safeguard data • Document and track tape media • Consider fully which systems are business-critical, and protect them all • Consider hot- and warm-site hosting services • Consider working with professional DR consultants

  20. Key Mistakes to Avoid
      • Planning for worst-case scenarios only (1)
      • Planning for the most likely scenarios only (2)
      • Insufficient or no testing
      • Little or no management buy-in
      • Lack of ownership of the issue
      • Focusing planning on contingencies after they happen, rather than how to avoid them
      • Selecting too narrow a range of tools to execute the plan
      [Chart: Impact (None to Highest) versus Probability (0% to 100%), with regions 1 and 2 marked]

  21. Disaster Recovery: Key Questions
      • Infrastructure Recovery
        • What are your key systems, networks, and storage?
        • How long can you afford to have them down?
        • Can a NEW system administrator rebuild them?
      • Data Recovery
        • Is all key data protected to tape regularly?
        • Are tapes rotated frequently by policy and is that policy enforceable?
        • Can a NEW system administrator quickly find the tapes required to restore key operations?
      • Application Recovery
        • What is the process of restarting and operating your application environment?
        • Would you benefit from automation tools?

  22. Business Continuance: Key Questions
      • Data Protection
        • Is all key data protected site-to-site?
        • Is one secondary copy sufficient, or do you need to protect against original-site data corruption?
        • Is one secondary site sufficient, or do you need:
          • Site within 60 km to ensure synchronized data
          • Site at a greater distance with asynchronous data
      • Operational Recovery
        • What is the process of restarting and operating your application environment?
        • Would you benefit from automation tools?
      • Outsourcing
        • Do you have sufficient secondary infrastructure?
        • Do you have, or want to have, expertise?
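
One reason distance matters for synchronized data is signal propagation delay: a synchronous write is not acknowledged until the remote copy confirms it, so every write pays at least one round trip. The sketch below is a back-of-the-envelope estimate under a stated assumption, namely that light travels through optical fiber at roughly 200,000 km/s. It ignores switching, protocol, and storage overhead, and it does not derive the 60 km figure on the slide, which is a limit quoted in the presentation. The names `round_trip_ms` and `max_sync_distance_km` are hypothetical.

```python
# Assumption: approximate speed of light in optical fiber, in km per second.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum propagation delay, in milliseconds, for one round trip over fiber."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


def max_sync_distance_km(latency_budget_ms: float) -> float:
    """Longest one-way distance whose round trip fits within the latency budget."""
    return latency_budget_ms * FIBER_SPEED_KM_PER_S / (2 * 1000)


if __name__ == "__main__":
    # Compare a nearby synchronous site with more distant asynchronous candidates.
    for km in (10, 60, 500, 4000):
        print(f"{km:>5} km: at least {round_trip_ms(km):.2f} ms added per synchronous write")
```

Under this assumption a site 60 km away adds a little over half a millisecond to each synchronous write before any equipment overhead, while a site thousands of kilometers away adds tens of milliseconds, which is why the more distant site on the slide is paired with asynchronous data.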

  23. Disaster Recovery Software Options • Disaster Recovery = Recovering the Failed Site • [Diagram stages: Rebuild Infrastructure; Restart Applications; Restore Data] • [Diagram software labels: Bare Metal Recovery Automation; Tape & Media Tracking Automation; Backup/Recovery Software; Availability Management Software]

  24. Defining Disaster Recovery versus Business Continuance • Business Continuance = Hosting the Business through Recovery • [Diagram labels: Recovery; Hot- or Warm-Site; Data Replication; Availability Management Software]

  25. Disaster Recovery and Business Continuance: Key Requirements
      • Does it fit your environment?
        • Servers, Applications, Data, Storage, Networks
      • Does it offer remote management?
      • Is it easily automated?
      • Does it all work together with other solutions that you need?
      • Does it integrate with solutions from other vendors that you have selected?
      • Is it well supported by the vendor?
        • Can you get on-site support from the vendor in the event of a catastrophic disaster?
        • Can you get informed consulting to help you plan disaster protection using the tools that you’ve selected? To deploy them?
