
Lilliput meets Brobdingnagian: Data Center Systems Management through Mobile Devices

3rd International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology (DCDV). Held in conjunction with Dependable Systems and Networks (DSN), Budapest, Hungary, June 18, 2013.


Presentation Transcript


  1. 3rd International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology (DCDV). Held in conjunction with Dependable Systems and Networks (DSN), Budapest, Hungary, June 18, 2013

    Lilliput meets Brobdingnagian: Data Center Systems Management through Mobile Devices

    Jan Rellermeyer, Thomas Osiecki, Michael Kistler, Ahmed Gheith, Saurabh Bagchi, Fahad Arshad
  2. System Management Workflow Patch Something is wrong!
  3. Systems Management: A Changed View Patch Filtering
  4. So What Exactly Are the Changes? Platform being used for doing the systems management. Mobile devices: small screen; resource constrained; outside the organization’s security perimeter; lower dependability. Server: large screen; resource rich; within the organization’s security perimeter; high dependability.
  5. So What Exactly Are the Changes? Layered systems management to flat hierarchy. Filtering
  6. Case Study: IBM Research’s IBM Remote Project. Target: IBM Blade Centers. User interface: visualization of complex data; relevance first; drill-down UI. Communication: direct connection to the managed machines; refresh rate vs. power consumption.
  7. Case Study: IBM Remote Project
  8. Research Challenges Due To The Changes Platform being used for doing the systems management: Server to Mobile Devices How do we optimize the scarce resources of the systems management platforms? Primarily, battery and communication bandwidth. How do we handle the fact that the platforms will be insecure and fault-intolerant for parts of their operation? How do we visualize the (hopefully) rare failure event in a deluge of systems monitoring data?
  9. Research Challenges Due To The Changes Layered systems management to flat hierarchy Can we avoid chaos due to the looser coordination? Can we leverage overlap between interests to cut down on traffic to individual mobile devices?
  10. Solution Directions for Question 1. Platform being used for doing the systems management: Server to Mobile Devices. How do we optimize the scarce resources of the systems management platforms? Primarily, battery and communication bandwidth. Minimize the number of messages, while still receiving enough to reliably detect failures. Use publish-subscribe or another push mechanism in preference to a pull mechanism. BUT: most hardware management modules do not support push. Use an intermediate server for aggregation and filtering. Apply principles of rare event detection: non-events occur with much higher frequency than events of interest. BUT: this requires a model of events (time distribution, correlation, etc.).
  11. Solution Directions for Question 1. Platform being used for doing the systems management: Server to Mobile Devices. How do we handle the mismatch in dependability characteristics (between target platform and management platform)? A mobile device can be physically compromised and OS-level protection can be bypassed. Mobile devices are often employee owned. Application security and server-side security need to be built in. Periodic authentications, not one-time authentications. Biometric-based authentication.
  12. Solution Directions for Question 1. Platform being used for doing the systems management: Server to Mobile Devices. How do we visualize the needle in the haystack? Needle: outages, failures, or behavior that is indicative of an imminent failure. Haystack: deluge of monitored data about target platforms. Screen real estate is limited. First, deliver only a small superset of the relevant messages. Push notifications, for example through Google Cloud Messaging (GCM). Drill-down views, starting with a summary alert view for all machines in the data center, followed up with root cause analysis techniques that run on servers.
  13. Solution Directions for Question 2. Layered systems management to flat hierarchy, OR crowdsourcing systems management. Tight vertical integration of different software layers implies that different domain experts will be concurrently involved in problem troubleshooting. Relevant features of social media will be used. Example: at IBM, you can “friend” specific Blade Centers and have “circles” of administrators. Role-based Access Control (RBAC) can be used for security control of different software layers. Fine-grained roles can be assigned. RBAC solutions exist for sophisticated management of these roles, such as hierarchies, overlaps, and transience.
  14. Solution Directions for Question 2. Layered systems management to flat hierarchy, OR crowdsourcing systems management. Overlap between interests of multiple mobile devices and their geographical proximity. Commonalities of interest can be used to cut down on cellular bandwidth usage. Commonalities can exist due to proximal geographic location or overlap among system administration responsibilities. Distribute information to a subset of mobile devices and then use local communication (Bluetooth, Wi-Fi) to disseminate information among proximal devices.
  15. Case Study: IBM Remote. Health view (left) is broken into critical, non-critical, and system-level health messages. Event log view (right) is filtered to show only warnings and errors.
  16. Related Work. Much work exists on managing mobile devices, which is the opposite direction from what we are discussing in this paper. Some work on mobile agents for managing servers [18 – NOMS02, 19 – Software07]: the sophistication lies in designing a dynamic set of agents whose monitoring policies can be changed on the fly. Some commercial prototypes for monitoring and control of target end points from mobile devices: UCSand for Android devices [21] for Cisco Unified Systems monitoring and control; PCMonitor [22] from MMSoft Design Ltd.; VMware vCenter Mobile Access [23], a virtual appliance on the server side for managing a datacenter from mobile devices; a recent offering from HP [18].
  17. Take-away Lessons. A changed vision of systems management is emerging: mobile clients are being used to manage large masses of physical and virtual servers. This opens up some technical challenges. Management is to be done through resource-constrained mobile devices, which have lower dependability than the target devices. Crowdsourcing of systems management, rather than a linear flow of control through hierarchies of sysadmins. These challenges are being addressed in multiple projects at commercial organizations, including the IBM Remote project at IBM Research.
  18. Presentation available at: Dependable Computing Systems Lab (DCSL) web site, engineering.purdue.edu/dcsl
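
Slide 10 suggests an intermediate server that aggregates and filters what it polls from the hardware management modules, so that mobile clients receive few messages. The following is only a minimal sketch of that idea, not the IBM Remote implementation: the severity scale, the `AggregatingFilter` class, and the `(machine, sensor, severity)` reading format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale; the slides do not define one.
SEVERITY = {"info": 0, "warning": 1, "error": 2, "critical": 3}


@dataclass
class AggregatingFilter:
    """Sits between polled management modules and mobile clients (assumed design)."""

    min_severity: str = "warning"
    # Last severity forwarded per (machine, sensor); used to suppress repeats.
    _last_sent: dict = field(default_factory=dict)

    def process(self, readings):
        """Take one polling round and return only the messages worth pushing."""
        threshold = SEVERITY[self.min_severity]
        outgoing = []
        for machine, sensor, severity in readings:
            key = (machine, sensor)
            previous = self._last_sent.get(key, "info")
            relevant = SEVERITY[severity] >= threshold
            # A drop back below the threshold is reported once as a recovery.
            recovered = SEVERITY[previous] >= threshold and not relevant
            if severity != previous and (relevant or recovered):
                outgoing.append((machine, sensor, severity))
                self._last_sent[key] = severity
        return outgoing


if __name__ == "__main__":
    f = AggregatingFilter()
    print(f.process([("blade1", "temp", "info"), ("blade2", "fan", "warning")]))
    print(f.process([("blade1", "temp", "info"), ("blade2", "fan", "warning")]))
```

The server still pulls from the modules at whatever rate it likes; only state changes at or above the threshold (and the matching recovery) would be pushed to the device, which is one way to trade refresh rate against battery and bandwidth.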
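
Slide 11 calls for periodic rather than one-time authentication on the mobile client. A minimal sketch of the bookkeeping this implies is shown below; the `Session` class and the 300-second default are hypothetical, and the actual credential or biometric check is deliberately left out.

```python
import time
from dataclasses import dataclass


@dataclass
class Session:
    """Tracks when a user last proved their identity (illustrative only)."""

    user: str
    authenticated_at: float
    max_age_seconds: float = 300.0  # assumed re-authentication interval

    def needs_reauth(self, now=None):
        """Return True once the last authentication is older than the interval."""
        now = time.time() if now is None else now
        return now - self.authenticated_at >= self.max_age_seconds

    def reauthenticate(self, now=None):
        """Record a fresh authentication; the real check would happen before this call."""
        self.authenticated_at = time.time() if now is None else now


if __name__ == "__main__":
    session = Session("alice", authenticated_at=time.time())
    print(session.needs_reauth())
```

A server-side component could call `needs_reauth` before every management command, so that a stolen, unlocked device loses its privileges after a bounded time.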
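
Slide 13 mentions RBAC with role hierarchies, overlaps, and transience. The sketch below illustrates those three notions in a few lines; the role names, permissions, and expiry format are invented for the example and are not taken from any IBM product.

```python
# Hypothetical roles: each role's own permissions, plus the roles it inherits from.
ROLE_PERMISSIONS = {
    "viewer": {"read_health"},
    "operator": {"restart_blade"},
    "admin": {"apply_patch"},
}
ROLE_PARENTS = {"operator": ["viewer"], "admin": ["operator"]}


def effective_permissions(role, seen=None):
    """Collect a role's permissions together with everything it inherits."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against cycles in the hierarchy
        return set()
    seen.add(role)
    perms = set(ROLE_PERMISSIONS.get(role, ()))
    for parent in ROLE_PARENTS.get(role, ()):
        perms |= effective_permissions(parent, seen)
    return perms


def active_roles(assignments, now):
    """Keep roles whose expiry is None (permanent) or still in the future (transience)."""
    return [role for role, expires_at in assignments
            if expires_at is None or expires_at > now]


def is_allowed(user_roles, permission):
    """A user may hold several overlapping roles; any one granting the permission suffices."""
    return any(permission in effective_permissions(r) for r in user_roles)


if __name__ == "__main__":
    roles = active_roles([("admin", 100), ("viewer", None)], now=150)
    print(roles, is_allowed(roles, "apply_patch"))
```

Temporary role grants of this kind would let a domain expert be pulled into a troubleshooting "circle" for a limited time without a permanent change to their privileges.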
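
Slide 14 proposes sending information over the cellular link to only a subset of devices and letting them relay it locally (Bluetooth, Wi-Fi) to nearby devices with the same interests. One possible planning step is sketched below under strong simplifying assumptions: proximity is reduced to an equal `location` label, and the relay is picked by smallest device id as a placeholder for a real policy such as battery level.

```python
from collections import defaultdict


def plan_dissemination(devices, topic):
    """Map each chosen relay device to the nearby devices it should forward to.

    devices: list of dicts with hypothetical keys 'id', 'location', 'interests'.
    Only devices interested in `topic` take part.
    """
    groups = defaultdict(list)
    for device in devices:
        if topic in device["interests"]:
            groups[device["location"]].append(device["id"])

    plan = {}
    for ids in groups.values():
        relay = min(ids)  # placeholder choice; a real system might prefer battery level
        plan[relay] = sorted(i for i in ids if i != relay)
    return plan


if __name__ == "__main__":
    devices = [
        {"id": "a", "location": "lab1", "interests": {"bc1"}},
        {"id": "b", "location": "lab1", "interests": {"bc1", "bc2"}},
        {"id": "c", "location": "home", "interests": {"bc1"}},
        {"id": "d", "location": "lab1", "interests": {"bc2"}},
    ]
    print(plan_dissemination(devices, "bc1"))
```

In this example only devices `a` and `c` would receive the message over the cellular network, and `a` would pass it on to `b` locally.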
