MyOps

An Operational Framework for PlanetLab Deployments

Outline
  • Objective of MyOps
  • Current status
  • Future ideas
  • Questions at any time
Objective : Close Operational Cycle
  • System - Provides service (slice)
  • Monitoring - Feedback from running system
  • Operator - Interpret feedback into tasks
  • Management - Control running system
Challenges: Break-down
  • System may not deliver service
  • Monitoring may not observe useful metrics
  • Operator may not know
    • how to interpret observations
    • how to control the system
    • what the service goals are
  • Management may not control system
Requirements for Operational Systems
  • Satisfy Minimal Conditions
    • Physical Integrity
    • Interconnectivity
    • Controllable
    • Provide a Service
  • Two requirements
    • Reliably reach the final condition
    • When failures occur, repair or report automatically
  • Two approaches in MyOps
    • Precise bootstrap stages (not discussed)
    • Operational monitoring & management in platform
Monitoring Types

Open-loop monitoring

  • Identify the unknown
  • More information, fine-grained

Operational monitoring (closed-loop)

  • Correctness
  • Less information, coarse-grained
  • Actionable
Management Types

Open-loop management

  • Bootstrap/Deploy from the ground up
  • Inefficient, coarse-grained
  • No feedback

Operational management (closed-loop)

  • Tweak the system to correct behavior
  • More efficient, fine-grained
Example
  • Observe: Node is Off-Line
  • Control: Attempt to Power-On
  • Observe: Node is On-line but Failed to boot
  • Observe: Failed to boot Error
  • Control: Create ticket & Send email to local contact
  • Time passes
  • Control: Disable slice creation
  • Observe: Local contact responds
  • Observe: Node is Powered-on and Running
  • Control: Re-enable slice creation
  • Control: Close ticket
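
The observe/control sequence above can be sketched as a small state machine. This is only an illustration, not the MyOps implementation: the observation strings, the action names, and the `NodeState` fields are all hypothetical, and "time passes" is modeled as a repeated "boot_failed" observation.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Bookkeeping the operator loop keeps per node (hypothetical fields)."""
    ticket_open: bool = False
    slices_enabled: bool = True


def control(observation: str, state: NodeState) -> list[str]:
    """Map one observation to the control actions it triggers.

    Observation and action names are illustrative assumptions.
    """
    actions: list[str] = []
    if observation == "offline":
        actions.append("power_on")
    elif observation == "boot_failed":
        if not state.ticket_open:
            # First failure report: open a ticket and notify the local contact.
            state.ticket_open = True
            actions += ["create_ticket", "email_local_contact"]
        elif state.slices_enabled:
            # Still failing after the notice: apply the incentive.
            state.slices_enabled = False
            actions.append("disable_slice_creation")
    elif observation == "running":
        if not state.slices_enabled:
            state.slices_enabled = True
            actions.append("enable_slice_creation")
        if state.ticket_open:
            state.ticket_open = False
            actions.append("close_ticket")
    return actions
```

Feeding the observations "offline", "boot_failed", "boot_failed", "running" through `control` with one shared `NodeState` reproduces the order of control steps listed on the slide.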
History of PlanetLab Operations

Open-loop Monitoring with Open-loop Management

  • Collect fine-grained statistics using CoMon
  • Act with coarse-grained operations (e.g. Reinstall)
  • Manual bridge between the two

Moving towards Closed-loop Operations

  • Collect targeted metrics
  • Take directed, problem-specific actions
  • Automate actions based on policy
PlanetLab Operations
  • Close the monitor/management cycle
  • Direct automation of common operations
  • Indirect through remote contacts and incentives
MyOps Architecture
  • Collection from Node
  • Translated by policy to Automated action
MyOps Architecture
  • Collection from Node
  • Send notice to Local contact to take action
MyOps Architecture
  • When there is no response
  • Indirect influence with incentives
Collection
  • Operational monitoring of specific targets, such as:
    • Boot status, Filesystem status
    • DNS - internal and external
    • RPMs
    • System services, etc
  • Periodic collection
    • Coarse-grained collection at a human-timescale
    • Time-series of events and status
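
A time-series of coarse-grained status samples could be stored roughly as follows. This is a minimal sketch under assumptions: the `StatusHistory` class, its method names, and the status strings are hypothetical and not taken from MyOps.

```python
import time
from collections import defaultdict, deque


class StatusHistory:
    """Per-node time-series of (timestamp, status) samples (hypothetical)."""

    def __init__(self, max_samples=100):
        # Bounded history per node; old samples fall off the left end.
        self._series = defaultdict(lambda: deque(maxlen=max_samples))

    def record(self, node, status, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self._series[node].append((ts, status))

    def latest(self, node):
        """Most recent status for the node, or None if never observed."""
        series = self._series.get(node)
        return series[-1][1] if series else None

    def time_in_current_status(self, node, now):
        """Seconds since the node entered its current status (0.0 if unknown)."""
        series = self._series.get(node)
        if not series:
            return 0.0
        current = series[-1][1]
        since = series[-1][0]
        # Walk backwards over the trailing run of identical statuses.
        for ts, status in reversed(series):
            if status != current:
                break
            since = ts
        return now - since
```

Because collection happens at a human timescale, a short bounded history per node is enough to answer the question a policy needs: how long has this node been in its current state?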
Policy
  • Constraints over a time-series of events
  • To satisfy a constraint
    • Automated action
    • Send notice
    • Apply incentive
  • Policy defines
    • Preferred status of system
    • Frequency of actions
    • Magnitude of incentives
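
One way to read "constraints over a time-series of events" is as a function from a node's status history to the next action. The sketch below is illustrative only: the `Policy` fields, the thresholds (immediately, after one day, after seven days), and the action names are assumptions chosen for the example, not values from MyOps.

```python
from dataclasses import dataclass

DAY = 24 * 3600


@dataclass(frozen=True)
class Policy:
    preferred_status: str = "running"   # preferred status of the system
    action_interval: float = DAY        # minimum time between actions
    # (seconds in violation, action); thresholds are assumed values.
    steps: tuple = (
        (0, "automated_action"),
        (1 * DAY, "send_notice"),
        (7 * DAY, "apply_incentive"),
    )


def evaluate(policy, events, now, last_action_at=None):
    """Return the action to take, or None.

    `events` is a list of (timestamp, status) pairs sorted by timestamp.
    """
    if not events or events[-1][1] == policy.preferred_status:
        return None  # constraint satisfied, nothing to do
    if last_action_at is not None and now - last_action_at < policy.action_interval:
        return None  # acted too recently
    # Find when the current run of non-preferred statuses began.
    violated_since = events[-1][0]
    for ts, status in reversed(events):
        if status == policy.preferred_status:
            break
        violated_since = ts
    elapsed = now - violated_since
    chosen = None
    for threshold, action in policy.steps:
        if elapsed >= threshold:
            chosen = action  # keep the strongest step whose threshold has passed
    return chosen
```

Keeping the thresholds and the interval in the policy object, rather than in the evaluation code, mirrors the slide's point that the policy defines the frequency of actions and the magnitude of incentives.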
Automation
  • Automatic correction of common bootstrap problems
    • Communication errors with MyPLC
    • Corrupt filesystem repair
    • Retry when state is unknown
    • PCU Reboot
    • Reinstall
  • Automation Notices
    • Bad disk
    • Minimal hardware
    • Bad DNS
    • Bad node configuration
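
The split between automatic corrections and notices can be pictured as trying repairs in turn and reporting only when none of them works. This is a sketch under assumptions: `attempt_repairs` and the repair names are hypothetical, and ordering the repairs from least to most invasive is a choice made here, not something the slides state.

```python
from typing import Callable, Iterable, Optional, Tuple


def attempt_repairs(
    repairs: Iterable[Tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Run repairs in order; return the name of the first that succeeds.

    Each repair is a (name, callable) pair whose callable returns True on
    success. Returns None when every repair fails, which is the point at
    which a notice would be sent instead.
    """
    for name, repair in repairs:
        try:
            if repair():
                return name
        except Exception:
            # A repair that crashes counts as a failed attempt.
            continue
    return None
```

A caller might pass pairs such as ("retry", ...), ("pcu_reboot", ...), ("reinstall", ...) and send a notice to the local contact when the result is None.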
Notices & Incentives
  • Notices are indirect paths to node management
    • Node down / online / specific problem (e.g. DNS, disk)
    • Site down / online
    • Privilege reduced / restored
    • PCU errors
  • The incentives on MyPLC
    • Sites 10 slices
    • Disable slice creation
    • Disable running slices
Validation of Notices & Incentives

[Figure: regions labeled A, B, C, D, E, with annotations Kernel Bug, Fix, Fix2, Notice Bug, Fix]

Future Ideas
  • Generalize Configuration
    • Collect from multiple sources
    • Expose policy
    • Act on multiple targets
  • Self-monitoring
  • Positive Incentives
    • Special access to services
    • Additional resources (Slices, Bandwidth, CPU, etc)