
Disaster Recovery

Disaster Recovery. Broad Team – UCSD, UCOP, and others! (special credit to Kris Hafner & Elazar Harel) Presenter - Paul Weiss – Executive Director UCOP/IR&C Paul.weiss@ucop.edu. March 9-11, 2009 • Long Beach, CA • cenic09.cenic.org. Agenda. Business view and background as to how and why


Presentation Transcript


  1. Disaster Recovery
     Broad Team – UCSD, UCOP, and others! (special credit to Kris Hafner & Elazar Harel)
     Presenter - Paul Weiss – Executive Director, UCOP/IR&C, Paul.weiss@ucop.edu
     March 9-11, 2009 • Long Beach, CA • cenic09.cenic.org

  2. Agenda
     • Business view and background as to how and why
     • The services portfolio
     • Technical details
     • Network implications
     • Lessons learned, going forward
     RIDING THE WAVES OF INNOVATION • cenic09.cenic.org

  3. Situation as of 2Q2006
     • UCSD had almost no DR plan in place
     • UCOP used an IBM contract in Colorado
     • Cost $200k / yr + $600k/month if ever used
     • Had insufficient gear and network reserved; a cautious estimate is > 50% more cost if updated appropriately
     • Limit of 40 hrs of testing / year, difficult to schedule
     • RPO (Recovery Point Objective) <= 7 days
     • RTO (Recovery Time Objective) <= 3 days
     • Required UCOP personnel to activate and operate
     • Past testing indicated a decent mainframe recovery plan in place, but limited distributed system capability
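The cost bullets above can be turned into simple arithmetic. The sketch below is illustrative only: the dollar figures come from the slide, while the function name, the number of months activated, and the choice to apply the "> 50% more" uplift to the whole bill are assumptions made for the example.

```python
# Figures quoted on the slide.
ANNUAL_RETAINER_USD = 200_000      # "$200k / yr"
MONTHLY_ACTIVATION_USD = 600_000   # "$600k/month if ever used"
# Lower bound of the "> 50% more cost" estimate. Applying it to the whole
# bill is an assumption; the slide does not say which part it covers.
UPDATE_UPLIFT = 0.50


def vendor_dr_cost(months_activated: int, updated: bool = False) -> float:
    """Estimate one year's cost of the vendor contract (hypothetical helper)."""
    if months_activated < 0:
        raise ValueError("months_activated must be non-negative")
    base = ANNUAL_RETAINER_USD + MONTHLY_ACTIVATION_USD * months_activated
    return base * (1 + UPDATE_UPLIFT) if updated else float(base)


if __name__ == "__main__":
    for months in (0, 1, 3):
        print(months, vendor_dr_cost(months), vendor_dr_cost(months, updated=True))
```

Even a short activation dominates the retainer, which is one way to read the motivation for the later slides.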

  4. DR Concept
     • UCOP required shorter RPO & RTO
     • Found trusted partner (UCSD)
     • Willingness to be “married”
     • Technical choices
     • Change management – ongoing
     • One “team”
     • Common principles
     • Use the WAN, “stupid”

  5. Keys to Approach
     • Buy enough storage, synchronize data in real or near real time, avoid loading data during an actual DR event
     • Mainframe – CBU option and buy memory
     • Other servers – buy sufficient gear to have capacity available to run at either location without having to repurpose servers during an event
     • Must be able to test and retest – DR is not STATIC!
     The decision to do it!

  6. Advantages of this Approach
     • Costs for UCOP are comparable to old DR plan
     • Costs for UCSD are <50% of a vendor solution
     • Capability is dramatically improved
     • RTO and RPO < 1 day (and will be far less)
     • Can test as often as needed (we need it!)
     • Equipment is there and operational
     • More services can be “easily” added (and have been!) after the initial investment, and can be optimized over time
     • UC personnel “on other side” will assist in case of disaster; the long-term goal is to recover without any personnel from the down location immediately available
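To make the RPO and RTO figures concrete, here is a minimal sketch of how a DR test result might be checked against them. The objectives are the ones quoted on the slides (7 days / 3 days before, under 1 day after); the function names and all timestamps are hypothetical, and the check uses a strict "less than" for simplicity.

```python
from datetime import datetime, timedelta

# Objectives quoted on the slides; timestamps used with them are hypothetical.
OLD_RPO, OLD_RTO = timedelta(days=7), timedelta(days=3)
NEW_RPO, NEW_RTO = timedelta(days=1), timedelta(days=1)


def achieved_rpo(last_sync: datetime, outage_start: datetime) -> timedelta:
    """Data-loss window: changes made after the last good copy are lost."""
    return outage_start - last_sync


def achieved_rto(outage_start: datetime, restored: datetime) -> timedelta:
    """Downtime: how long the service was unavailable."""
    return restored - outage_start


def meets_objectives(last_sync: datetime, outage_start: datetime,
                     restored: datetime, rpo_limit: timedelta,
                     rto_limit: timedelta) -> bool:
    """True when both measured values are strictly inside their limits."""
    return (achieved_rpo(last_sync, outage_start) < rpo_limit
            and achieved_rto(outage_start, restored) < rto_limit)


if __name__ == "__main__":
    outage = datetime(2009, 3, 9, 12, 0)
    last_sync = outage - timedelta(days=2)     # e.g. a two-day-old tape copy
    restored = outage + timedelta(hours=12)
    print(meets_objectives(last_sync, outage, restored, OLD_RPO, OLD_RTO))
    print(meets_objectives(last_sync, outage, restored, NEW_RPO, NEW_RTO))
```

A two-day-old copy would satisfy the old objective but not the new one, which is why the approach relies on near-real-time synchronization rather than periodic loads.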

  7. Initial Critical Success Factors
     • UCOP assigned .5 FTE staff dedicated to drive effort
     • One Team – UCOP and UCSD
     • Agree to basic principles, including $$$
     • Fight scope creep
     • Engage procurement personnel
     • Communicate, communicate, communicate
     • Test, Test, Test
     • The WAN!

  8. Current UCOP to UCSD DR Portfolio
     • All Mainframe services (including 9 (and soon to be 10) PPS instances & UCRS)
     • AYSO and all Benefits services
     • Endowment and Investment Accounting System
     • Active Directory
     • VPN
     • Email & File sharing
     • Web Servers
     • Banking/Treasury Systems
     • Loan Programs
     • Risk Services

  9. The Picture - Part I
     [Diagram; sites labeled: UCOP, UCSD]

  10. Current UCSD to UCOP DR Portfolio
     • All Mainframe services (including HR, financial and student transactional backend systems)
     • All Web-based systems for HR/PPS, Financial, Student, Telecommunications billing, etc.
     • Google search appliances
     • Multi-terabyte data warehouse
     • Multi-terabyte production data for all mainframe and open systems
     • Dev and QA testing data and LPARs for mainframe applications
     • Stand-alone systems for Intl. Student tracking, Audit, Coeus, and DARS

  11. Future UCSD to UCOP DR Portfolio
     • Portal/CMS backup for campus, business and student portals
     • Single Sign-on, roles, affiliates authentication/authorization failover
     • VPN
     • Active Directory Domain controllers
     • Core MTA (Ironport for now)
     • Blackberry
     • Mailing lists
     • Mailbox machines

  12. The Picture - Part II
     [Diagram; sites labeled: UCOP, UCSD]

  13. Then it got interesting
     As positive word got out, more locations and functional areas realized that DR was achievable.
     So…

  14. Other DR services in place or committed to
     • UC Effort Reporting System (3Q2009)
     • UCOP Office of Technology Transfer Informix DB
     • UCOP IDP Shibboleth Server
     • UC Replacement Security Number (RSN)
     • UCOP TSM Server
     • UC Pathways (3Q2009)
     • UCSD Med Mainframe, PPRC
     • UCSB Distributed DNS Server
     • UCLA Continuing Education of the Bar
     • UCSD External Relations
     • UCDC File Server
     • Irvine Secondary DNS and Web Server
     • SD Coastal Data Information Program

  15. And a Special Case!
     UCSB mainframe load. Four steps, in order:
     • DR from UCSB to UCOP utilizing PPRC
     • Do failover test to UCOP; if fully successful, keep production at UCOP
     • DR from UCOP to UCSD - trivial
     • Turn off UCSB mainframe

  16. The Picture - Part III
     [Diagram; sites labeled: UCI, San Diego Coastal, UCOP, UCSD, UCSD External Relations, UCSDMC, UCDC, UCSB, UCLA CEB]

  17. Services being Considered
     • UCOP California Institute for Energy and Environment
     • UCLA Med PPRC
     And what’s next? Broader discussions are now occurring, not just with UCOP but between more and more UC players – a nice “halo” effect, with many leveraging the WAN!

  18. Technical Details
     • SD & OP (and SB & SDMC) purchased comparable HW
     • IBM SAN & Cisco SAN switches, which support global mirroring (PPRC – Peer-to-Peer Remote Copy)
     • Mainframe – memory upgrade and CBU option – must have sufficient capacity on both sides to support total load
     • Worked through CENIC and local network teams to set up appropriate links for PPRC to ensure throughput
     • Wrote (and are writing) special monitoring tools
     • Set up remote tape capabilities so we don’t have to use an outside vendor for offsite storage of tape copies
     • Remember that this hardware needs to be in a normal refresh cycle, just like hardware on your primary floor
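The requirement that each side have "sufficient capacity ... to support total load" can be sketched as a simple check. Everything below is illustrative: the `Site` type, the `shortfall` helper, and the capacity and load numbers (in arbitrary units) are assumptions, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical model of one data center (units are arbitrary)."""
    name: str
    capacity: float  # usable capacity, including any CBU headroom
    load: float      # that site's own production load


def shortfall(survivor: Site, failed: Site) -> float:
    """Capacity the surviving site lacks to carry both loads (0.0 if none)."""
    return max(0.0, survivor.load + failed.load - survivor.capacity)


if __name__ == "__main__":
    ucop = Site("UCOP", capacity=1000.0, load=400.0)   # made-up numbers
    ucsd = Site("UCSD", capacity=900.0, load=600.0)    # made-up numbers
    for survivor, failed in ((ucop, ucsd), (ucsd, ucop)):
        gap = shortfall(survivor, failed)
        status = "OK" if gap == 0 else f"short by {gap}"
        print(f"{failed.name} fails over to {survivor.name}: {status}")
```

The check has to pass in both directions, since either location may end up running the combined load.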

  19. Network concerns
     • Frame size
        • For low traffic, default end to end of 1500 bytes – works fine
        • OP/SD (more traffic) had to move to “jumbo frames” – 2300 bytes seems to work
     • On HPR today, need to move to DC
     • @ OP – likely upgrade to 10 Gb, at 1 Gb now
     • Must refine SLAs & due diligence
     • Acceptable catch up (RPO issue)
     • Better understanding of traffic
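The "acceptable catch up" bullet is essentially a throughput question: after a link interruption, how long does mirroring take to drain the backlog while new changes keep arriving? The sketch below is a rough estimate under stated assumptions. The 1 Gb and 10 Gb link speeds come from the slide; the backlog size, change rate, and efficiency factor are hypothetical inputs, and the function name is made up for the example.

```python
def catch_up_hours(backlog_gb: float, link_gbps: float,
                   change_rate_gbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to drain a replication backlog (rough estimate).

    backlog_gb        unreplicated data, in gigabytes (hypothetical input)
    link_gbps         nominal link speed, in gigabits per second
    change_rate_gbps  rate at which new changes arrive, in gigabits per second
    efficiency        assumed usable fraction of the nominal link speed
    """
    usable_gbps = link_gbps * efficiency
    spare_gbps = usable_gbps - change_rate_gbps
    if spare_gbps <= 0:
        return float("inf")  # the backlog never shrinks
    backlog_gbit = backlog_gb * 8  # bytes to bits
    return backlog_gbit / spare_gbps / 3600


if __name__ == "__main__":
    for link in (1.0, 10.0):  # "at 1 Gb now", "likely upgrade to 10 Gb"
        hours = catch_up_hours(backlog_gb=450, link_gbps=link,
                               change_rate_gbps=0.2)
        print(f"{link:g} Gb link: about {hours:.2f} h to catch up")
```

If the change rate approaches the usable link speed, the catch-up time grows without bound, which ties the link upgrade directly to the RPO.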

  20. Network Layout

  21. Implications due to “Success”
     • OP WAN connection capacity upgrade
     • Change management is a lot more complicated
     • Some technical “lock in”
     • Insufficient documentation and test plans – even now
     • Better monitoring tools required
     • Org processes can be stressed

  22. Lessons Learned
     • WAN is an underutilized/unrecognized asset
     • Geography is less of an inhibitor than many believe
     • This project will never be completed
     • Can/should continuously optimize this over time (examples – virtualization, better sharing)
     • Adding DR capability is easier after initial heavy lifting - e.g. Mainframe
