
Process Management & Monitoring WG



Presentation Transcript


  1. Process Management & Monitoring WG: Quarterly Report, June 13, 2002

  2. Components • Process Management • Process Manager • Checkpoint Manager • Monitoring • Job Monitor • System/Node Monitors • Meta Monitoring • Data Migration PMWG Quarterly Report

  3. “Next Steps” From February 2002 • Continue to work with the RMWG • Continue the interface work for: • Process Manager • Checkpoint Manager • Begin the interface work for: • Job Manager • Monitors • Prototyping and refinement

  4. Group Progress • Prototyping before refining interfaces • Job Manager fell into RMWG scope • Today’s demo set as milestone • Node Monitor provides RM components with data needed for scheduling • Process Manager executes jobs as requested by RM components • Conference calls on alternate weeks

  5. Component Progress • Checkpoint Manager (LBNL) • Process Manager (ANL) • Monitoring (NCSA)

  6. Checkpoint Manager • Defined as a separate component • Process Manager could register as CM • Requirements document published • Current status summary • Early prototype checkpoint capability • Design still evolving • Working with the LAM/MPI team

  7. Checkpoint Manager • Serial (intranode) checkpoints • Checkpoint job(s) on a single node • Parallel (internode) checkpoints • Checkpoint a multi-node job • Scalable Systems Checkpoint Manager • XML interfaces

  8. Checkpoint Manager • Serial (intranode) checkpoints • System-level for best coverage • Handles serial or parallel jobs • Provides hooks for runtime libraries • Based on vmadump • Full requirements in a technical report • Early prototype exists

  9. Checkpoint Manager • Parallel (internode) checkpoints • Works with job control system • Cooperates with the runtime libraries • Working with LAM/MPI team to prototype • Aiming for SC02 demo of prototype • NPBs as optimistic goal • Runtime interfaces due May ’03

  10. Checkpoint Manager • Scalable Systems Checkpoint Manager • Will provide Scalable Systems interface to the parallel checkpoint capability • Interface only roughly defined • Interface refinement to follow • XML interfaces due May ’03

  11. Process Manager Work at ANL • Narayan Desai…

  12. Monitoring Work at NCSA • Mike Showerman…

  13. Data Migration • Still no work done here

  14. Next Steps • Prototyping will continue • Interfaces will stabilize • Checkpoint Manager • Process Manager • Monitors • Monitoring data…
