
Presentation Transcript


  1. Dependable Cloud Architecture @mikewo Mike Wood http://mvwood.com Image: xkcd.com

  2. Thanks @mikewo Mike Wood http://mvwood.com Questions

  3. “Failure is always an option.” Image: Discovery Channel, Fair Use

  4. What are we looking for? Protection From: • Loss of Facilities • Network Failure • Hardware Failure • Data Corruption Check out: http://bit.ly/wazbizcont Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)

  5. Human Error Image: FOX, Fair Use

  6. What we’re trying to achieve: • Monitoring • Resilient Solutions Image: Cohdra

  7. Cost vs Risk $1, … ,000.00 99.999% To get more 9’s here add more 0’s here. Image: Office ClipArt

  8. Monitoring Image: NASA

  9. Functional Transparency Logging Messages Hardware Health Dependent Services Health Image: Office ClipArt

  10. Telemetry

  11. Analyze your Data Image: NASA

  12. Resilience Image: Office ClipArt

  13. Remember: Failure is always an option. Common Points of Failure • Machine/application crashes • Throttling (exceeding capacity) • Connectivity/network • External service dependencies Focus less on the uptime of the hardware and more on how the solution handles it WHEN something fails!

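  14. Try/catch != Resilient
      private void createFile()
      {
          string fileName = @"c:\workingDirectory\someFileName.txt";
          try
          {
              // Dispose the returned stream so the file handle is not left open.
              File.Create(fileName).Dispose();
          }
          catch (DirectoryNotFoundException ex)
          {
              // Logging and rethrowing records the failure but does not recover from it.
              Trace.WriteLine(String.Format("Unable to create {0}. {1}", fileName, ex));
              throw;
          }
      }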
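
For contrast with slide 14, here is a minimal sketch of a version that recovers from the failure rather than only logging it. `CreateFileResilient` is a hypothetical name, and the recovery step (create the missing directory, then try once more) is an assumption about what "resilient" could mean for this case, not something the deck prescribes.

```csharp
using System;
using System.IO;

// Hypothetical counterpart to createFile(): handle the known failure mode
// instead of logging it and passing it up the stack.
static string CreateFileResilient(string fileName)
{
    try
    {
        // Dispose the stream right away so the file handle is not left open.
        using (File.Create(fileName)) { }
    }
    catch (DirectoryNotFoundException)
    {
        // Assumed recovery step: create the missing directory, then try once more.
        string directory = Path.GetDirectoryName(fileName) ?? ".";
        Directory.CreateDirectory(directory);
        using (File.Create(fileName)) { }
    }
    return fileName;
}

// Usage: the parent directory does not exist yet, so the first attempt fails.
string target = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"), "someFileName.txt");
Console.WriteLine(File.Exists(CreateFileResilient(target)));
```

A second failure still propagates to the caller, so this sketch only covers the one failure mode the slide's code catches.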

  15. Image: Michael Wood Decompose your system…

  16. Capacity Buffering • Content Delivery Networks (CDNs) • Distributed Application Cache • Local Content Cache Enables recovery during outages or spikes in load Image: jepler
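
As an illustration of how a local content cache can carry a solution through an outage, the sketch below serves the last good copy when the origin fails. The names `lastGood`, `GetContent` and `fetchFromOrigin` are hypothetical, and a real cache would also need expiry and size limits, which are left out here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local content cache: remembers the last good value per key.
var lastGood = new Dictionary<string, string>();

// Returns fresh content when the origin answers; otherwise falls back to
// the cached copy. If nothing is cached, the original exception propagates.
string GetContent(string key, Func<string, string> fetchFromOrigin)
{
    try
    {
        string fresh = fetchFromOrigin(key);
        lastGood[key] = fresh;
        return fresh;
    }
    catch (Exception) when (lastGood.ContainsKey(key))
    {
        // Origin is down or throttling: serve stale content rather than fail.
        return lastGood[key];
    }
}

// Usage: simulate an origin that goes offline after the first request.
bool originUp = true;
Func<string, string> origin = k =>
    originUp ? "page for " + k : throw new InvalidOperationException("origin unavailable");

Console.WriteLine(GetContent("home", origin)); // fetched from the origin
originUp = false;
Console.WriteLine(GetContent("home", origin)); // served from the local cache
```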

  17. Always carry a spare 0% Capacity, redirect all load 75% Capacity, half of our load 100% of load, 150% Capacity 75% Capacity, half of our load SYSTEM FAILURE!!! • 50% more capacity than needed • Can absorb temporary spikes • Time to react if you need to add capacity • Over allocated, but still functioning • Degrade, but don’t fail Image: Kevin Rosseel
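
The arithmetic behind slide 17 can be sketched as follows. This assumes the diagram shows two nodes, each sized for 75% of the full load (150% capacity in total); that reading of the labels, and the `Utilization` helper itself, are assumptions rather than part of the deck.

```csharp
using System;

// Load divided by surviving capacity; values above 1.0 mean over-allocated.
static double Utilization(int nodes, double capacityPerNode, double totalLoad, int failedNodes)
{
    double remaining = (nodes - failedNodes) * capacityPerNode;
    return remaining <= 0 ? double.PositiveInfinity : totalLoad / remaining;
}

// Figures from the slide: two nodes, each with capacity for 75% of the load.
for (int failed = 0; failed <= 2; failed++)
{
    double u = Utilization(2, 0.75, 1.0, failed);
    string state = u <= 1.0 ? "healthy"
        : double.IsInfinity(u) ? "system failure"
        : "over-allocated, degrade";
    Console.WriteLine($"{failed} node(s) down: utilization {u:P0} -> {state}");
}
```

With one node down the survivor is asked for more than it can deliver, which is the point at which the slide suggests degrading instead of failing.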

  18. Request Buffering • Queues • Retry Policies • Async Workloads Image: Joe Shlabotnik
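
A retry policy such as the one named on slide 18 might look like the sketch below. `Retry` is a hypothetical helper, and exponential backoff is one common choice rather than the deck's stated approach. For brevity it retries on any exception; a production policy would normally retry only failures known to be transient.

```csharp
using System;
using System.Threading;

// Hypothetical retry policy: re-runs a failing operation, doubling the wait
// between attempts, and lets the last failure propagate.
static T Retry<T>(Func<T> operation, int maxAttempts, TimeSpan baseDelay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Wait baseDelay, then 2x, 4x, ... before the next attempt.
            double factor = Math.Pow(2, attempt - 1);
            Thread.Sleep(TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * factor));
        }
    }
}

// Usage: an operation that fails twice with a simulated transient error.
int calls = 0;
int result = Retry(() =>
{
    calls++;
    if (calls < 3) throw new TimeoutException("simulated transient failure");
    return 42;
}, maxAttempts: 5, baseDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"{result} after {calls} calls");
```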

  19. Dept. of Redundancy Dept. • Have a backup, somewhere else • More than one? Cost to benefit Ratio? • Ready State • Hot = full capacity • Warm = scaled down, but ready to grow • Cold = mothballed, starts from zero Image: Mr. White

  20. Redundancy - It’s about probability (each box: 95% uptime, failures assumed independent) • 1 box: 5% downtime, or 438 hrs per year (that’s about 18 ¼ days!) • 2 boxes: 5/100 * 5/100 = 25/10,000 = 0.25% downtime, or about 22 hrs per year • 4 boxes: 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime, or 3.285 MINUTES per year
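
The figures on slide 20 can be reproduced with a few lines of code. The sketch assumes the boxes fail independently, so the chance that all are down at once is the per-box downtime raised to the number of boxes; `CombinedDowntime` is a hypothetical helper name.

```csharp
using System;

// Fraction of time all boxes are down at once, assuming independent failures.
static double CombinedDowntime(double uptimePerBox, int boxes) =>
    Math.Pow(1.0 - uptimePerBox, boxes);

const double HoursPerYear = 365 * 24; // 8,760 hours

foreach (int boxes in new[] { 1, 2, 4 })
{
    double downtime = CombinedDowntime(0.95, boxes);
    double hours = downtime * HoursPerYear;
    Console.WriteLine(
        $"{boxes} box(es): {hours:F3} hours ({hours * 60:F3} minutes) of downtime per year");
}
```

Correlated failures, such as a shared power feed or a bad deployment pushed to every box, break the independence assumption and make the real figure worse than this estimate.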

  21. Total Outage duration = • Time to Detect • + Time to Diagnose • + Time to Decide • + Time to Act Image: Office ClipArt

  22. Dynamic Addressing & Configuration

  23. What about your data? Image: barrymieny

  24. Image: Michael Wood Availability via Degradation

  25. Virtualization and Automation Images: Gizmodo

  26. Images: Orion Pictures owns Terminator Franchise

  27. The “HI” Point Check out: http://bit.ly/wazinternals Images: Office Clip Art

  28. Image: NASA

  29. “Don't be too proud of this technological terror you've constructed…” • DO: • Root cause analysis • Read others’ root cause analyses • Plan for failure • ADMIT: • Your Solution WILL fail at some point • You can learn from others just as well as from yourself • DON’T: • Get cocky • Stick your head in the sand Images: LucasFilm, Fair Use

  30. Thanks Questions @mikewo Mike Wood http://mvwood.com http://bit.ly/CloudFailSafe
