Cloudpocalypse We put “fail” in failover

Cloudpocalypse We put “fail” in failover. Vlad Mazek, MCSE, CEO, Own Web Now Corp. vlad@ownwebnow.com, facebook.com/vladmmd, @vladmazek. Cell: (407) 536-VLAD. Agenda: Summary of events; What to tell your clients about the outage; Our current network design; What failed?

Presentation Transcript


  1. Cloudpocalypse We put “fail” in failover Vlad Mazek, MCSE, CEO, Own Web Now Corp vlad@ownwebnow.com facebook.com/vladmmd @vladmazek Cell: (407) 536-VLAD

  2. Agenda • Summary of events • What to tell your clients about the outage • Our current network design • What failed? • What we are doing to address it

  3. Power Infrastructure

  4. So what failed? ATS (Automatic Transfer Switch): an electrical switch that transfers the load from its primary power source to a standby source.

  5. Summary of Events • 12:04 Power failure • 1:34 ATS replacement advised by DC • 2:00 Partial power restored • 4:10 First ETA issued, 6:30 PM • 4:30 Emergency systems start coming online • 4:46 DC offers additional details on the problem • 5:10 Restored Exchange 2010 clusters • 7:10 DC restores power

  6. How this really felt

  7. How this really felt

  8. How this really felt

  9. How this really felt

  10. How this really felt

  11. Impact • This is the first major issue with the Dallas DC in over a decade • We moved our critical systems to Dallas from California and Florida due to the weather and power issues • This has adjusted our roadmap for service delivery

  12. Agenda • Extend LiveArchive to a second DC • Extend Exchange 2010 hosting to additional data centers • Improve our communications across partner networks • Facebook: ExchangeDefender • Twitter: @xdnoc @ExchangDefender

  13. What can I tell my clients? • Power issues happen. • There will be a partial refund. • There is no additional support cost. • The company is going to improve the solution. • The uptime record thus far has been impressive. • Complex systems lead to complex problems and aren’t you glad you don’t have to worry about it?

  14. What next? • Look for an email from me in the morning. • Advise customers about LiveArchive. • Stay tuned for network enhancements. • Keep the issue in perspective: This isn’t Microsoft’s fault or general negligence/incompetence; it’s a massive failure.

  15. Something funny… You know why I don’t trust the cloud? It’s still powered by guys whose butt cracks show when they squat to fix an electrical issue.
