1 / 12

A First Look at Problems in the Cloud

A First Look at Problems in the Cloud. Theophilus Benson Sambit Sahu Aditya Akella Anees Shaikh. Cloud Computing. self-service. New delivery and consumption model for IT several service models emerging: infrastructure, platform, SaaS Infrastructure-as-a-Service

sumana
Download Presentation

A First Look at Problems in the Cloud

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A First Look at Problems in the Cloud Theophilus Benson Sambit Sahu Aditya Akella Anees Shaikh

  2. Cloud Computing self-service • New delivery and consumption model for IT • several service models emerging: infrastructure, platform, SaaS • Infrastructure-as-a-Service • easily acquire and release virtual servers, storage, bandwidth • no set up cost for users and on-demand usage model • Provider manages the virtual infrastructure resources • New model = new challenges for problem diagnosis • virtualized and abstracted resources • limited visibility into the infrastructure layer locationindependence usage-basedpricing on-demandconsumption rapid elasticity What is required to support problem determination in the cloud environment

  3. Problems in the Cloud • Our objective: develop an understanding of the nature of problems experienced by customers of an IaaS cloud • Understand the types of problems arising in deployment and operation • how do these change over time (longitudinal study) • How users go about solving their problems today • which problems require more help from cloud providers • which problems are persistently difficult to solve • Develop a preliminary characterization of the forum-based cloud support model • see paper for details identify useful mechanisms and best practices forefficient problem resolution in the cloud

  4. Data-driven approach • Study actual user problems and experiences • based on open support forum of a large IaaS Cloud provider • 3 years of message threads • August 2006 through December 2009 • 9,575 reported problems (threads) • Too many for manual classification • Each thread consists of a number of related forum postings • Free text problem symptoms and suggested solutions • Need to impose structure for automated grouping • User id and timestamps • Text analytics to classify reported problems • IR techniques to group problems into clusters • 91% of problems mapped to 20 clusters Problem Description Timestamp User Id/Name Debugging Suggestion 1 Message Body …. Debugging Suggestion N Problem Root cause

  5. Observed problem taxonomy ImageManagement Connectivity Performance VirtualInfrastructure Application Imagebundling Generalconnectivity Instance notresponding Attach/detachvirtual storage Email serversetup Email serversetup Storage and imagemigration Firewall Instance stuckin terminating Virtual loadbalancer Windowslicensing Update/installkernel in img Connecting toapplications Storageperformance DNS andvirtual IP LAMP setup API errors(javaexceptions) Miscconnectivity Connectionperformance Linux SSH key-pair Miscbundling

  6. Size of the Different Classes • Roughly even split across 1st 4 classes • Fewer application problems fewer (typically out of scope for IaaS)

  7. Evolution of Problem Classes • Virtualization grows: release of new features • E.g 3st Quarter 2008: virtual storage • Users unfamiliar with new feature • Maintenance shrinks: release of new tools • E.g 1st Quarter 2007: image manipulation feature • Improves on existing techniques • Connectivity remains stable

  8. Problem Difficulty • Users consult operators for difficult problems • Operators involvement needed for 20-65% of the threads • Performance & virtualization requires the most intervention • Performance: virtualization abstracts away details needed to debug • Virtualization: user can’t change state of provider’s infrastructure

  9. Evolution of Problem Difficulty • Over time less help is needed from the operators • Forum builds up a DB of solutions • In a few cases users still need help from the operators • Resource abstraction limits user control/visibility • The virtualization class is the exception • New features are added: users need help understanding them

  10. Observations Summary • Large fraction of problems solved through self-diagnosis by users • but for some problem classes, significant number require operator involvement • Limited user visibility causes more guesswork and operator involvement • difficult to determine root cause of instance unreachable, unresponsive, etc. • Additional user control for some services could streamline resolution • e.g., inspect or change storage volume state • New features often introduce new problems • proactive release of companion debugging tools

  11. Some Research Issues • How much information to expose to users? • Protect provider infrastructure details • Expose only information relevant to user • Reduce collection/storage overhead • Securing additional user controls • Multitenancy complications • Preventing misuse at the infrastructure level • Making the controls scalable

  12. Conclusion • Empirical study of problems in large provider • Discovered 5 class of problems • Most persistent problems: virtualization & performance • Problems become less difficult as the forum builds up a database of solutions • Suggestion for more effective resolution • Expose more information • Develop specific debug tools for new release

More Related