
Abdul Aziz Habib Ammari Pearl Thomas Vamsi Krishna

The Prospero Resource Manager: A Scalable Framework for Processor Allocation in Distributed Systems. Abdul Aziz Habib Ammari Pearl Thomas Vamsi Krishna. Introduction. Poor performance of conventional techniques (parallel vs. distributed). Prospero Resource Manager (PRM).





Presentation Transcript


  1. The Prospero Resource Manager: A Scalable Framework for Processor Allocation in Distributed Systems Abdul Aziz Habib Ammari Pearl Thomas Vamsi Krishna

  2. Introduction • Poor performance of conventional techniques (parallel vs. distributed) • Prospero Resource Manager (PRM) • Resource management techniques should scale: • numerically • geographically • administratively

  3. Introduction- cont’d • Prospero Perspective: Multiple Resource Managers • System Manager • Job Manager and • Node Manager

  4. Contemporary Approaches • Phases of execution: (1) Configuration of Environment, (2) Processor Selection/Allocation, (3) Task to Processor Mapping, (4) Program Loading, Common Libraries, (5) Program Execution • Distributed Environment: List of available nodes • Locus, NEST, Sprite, and V support processor allocation and remote program loading

  5. Contemporary Approaches - cont’d • Locus: environment of initiating process • NEST: advertise availability • Sprite: shared file as a centralized database • V: server selects least loaded node • UCLA Benevolent Bandit Laboratory (BBL) • DQS and Lsbatch • Parallel Virtual Machine (PVM) and Net-Express

  6. Scalable Resource Management • Virtual System Model: new model for organizing large distributed systems • Access to a subset of resources • Hiding the mapping of resources to physical locations • Partition of the resource management functions • System manager • Job manager • Node manager

  7. Scalable Resource Management (con’t) • System managers • Managing subsets of resources (processors) • Hierarchical concept (layers of system managers) • Maintaining all information about resources • Reacting to status updates (from node managers) and resource requests (from job managers) • Assigning suitable resources upon request and notifying the job manager and the node managers responsible for each resource (possibly only a subset of the requested resources can be assigned)

  8. Scalable Resource Management (con’t) • Job manager • Agent for tasks in a job • One job manager per job • Part of a job and aware of the requirements and communication patterns of the managed tasks • Supports fault-tolerant and real-time applications • Supports debugging and performance tuning

  9. Scalable Resource Management (con’t) • Identification of job’s resource requirements (job initiated) • Locating system managers and sending allocation requests • Monitoring the execution of the program

  10. Scalable Resource Management (con’t) • Node manager • Receiving messages from the system manager (identifying job managers to load, execute programs) • Notifying the job manager about events (termination and failure of tasks) • Informing the system manager about availability of the node for assignment • Caching information needed to direct messages for other tasks to the node on which the task runs
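
The division of labor described on slides 6 to 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PRM's actual interface: the class names, method names (`node_available`, `request`, `acquire`), and node identifiers are all hypothetical. It shows node availability updates reaching a system manager, and a job manager collecting nodes from several system managers when one of them can grant only a subset of the request.

```python
from dataclasses import dataclass, field


@dataclass
class SystemManager:
    """Hypothetical system manager: tracks free nodes and assigns them."""
    free_nodes: set = field(default_factory=set)
    assigned: dict = field(default_factory=dict)  # node -> job id

    def node_available(self, node):
        # Status update of the kind a node manager would send.
        self.assigned.pop(node, None)
        self.free_nodes.add(node)

    def request(self, job_id, count):
        # Resource request from a job manager. The grant may contain
        # fewer nodes than requested (only a subset can be assigned).
        granted = sorted(self.free_nodes)[:count]
        for node in granted:
            self.free_nodes.remove(node)
            self.assigned[node] = job_id
        return granted


@dataclass
class JobManager:
    """Hypothetical job manager: one per job, knows how many nodes it needs."""
    job_id: str
    needed: int
    nodes: list = field(default_factory=list)

    def acquire(self, system_managers):
        # Ask each system manager in turn until the requirement is met.
        for manager in system_managers:
            missing = self.needed - len(self.nodes)
            if missing <= 0:
                break
            self.nodes.extend(manager.request(self.job_id, missing))
        return len(self.nodes) == self.needed


sm_a, sm_b = SystemManager(), SystemManager()
for name in ("a1", "a2"):
    sm_a.node_available(name)
for name in ("b1", "b2", "b3"):
    sm_b.node_available(name)

job = JobManager("job-1", needed=4)
ok = job.acquire([sm_a, sm_b])
print(ok, job.nodes)
```

Here the first system manager can supply only two of the four nodes, so the job manager turns to the second one for the rest. A real system would also have to notify the node managers of each assignment, which this sketch leaves out.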

  11. Implementation: Introduction • Prospero Resource Manager (PRM) Implementation • - Runs on a collection of workstations (Sun-3, HP 9000/700, etc.) • - Workstations connected by LAN/WAN • - Supports heterogeneous execution environment • - The system manager can manage nodes of more than one processor type • - Enables the user to place constraints (type, location, etc.) through job configuration options • - Also supports parallel and remote sequential applications

  12. Program Loading and I/O • PRM supports explicit loading of files when the nodes assigned to a job do not share a common file system • - Performed by transferring the executables to the node’s local file system • - File I/O task handles access to files on the user’s local system • - A task has exclusive read/write access to a shared file • Terminal I/O task supports interactive execution • - Users can customize the task for job initialization functions such as reading interactive inputs and assigning inputs to the appropriate task

  13. Communication Libraries • Communication Library Functions • - Provides routines for sending, receiving and broadcasting tagged messages • - Commonly used routines made available through a set of macros and functions • - Provides routines for message passing, buffer manipulation, process control, and data packing and unpacking • Approach • - ARDP protocol is used to transmit and receive sequenced packets
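
As a rough illustration of what "sending, receiving and broadcasting tagged messages" with "data packing and unpacking" means, the following Python sketch models it in memory. It is not the PRM or PVM library and does not implement ARDP; `Mailbox`, `Network`, `pack_ints`, and `unpack_ints` are hypothetical names chosen for this example, and only the standard `struct` module is used for packing.

```python
import struct


class Mailbox:
    """Hypothetical per-task mailbox holding (source, tag, payload) tuples."""

    def __init__(self):
        self.messages = []

    def deliver(self, source, tag, payload):
        self.messages.append((source, tag, payload))

    def recv(self, tag):
        # Return the oldest message carrying the requested tag, or None.
        for index, (source, msg_tag, payload) in enumerate(self.messages):
            if msg_tag == tag:
                del self.messages[index]
                return source, payload
        return None


class Network:
    """Hypothetical in-memory stand-in for the transport layer."""

    def __init__(self, task_ids):
        self.boxes = {tid: Mailbox() for tid in task_ids}

    def send(self, source, dest, tag, payload):
        self.boxes[dest].deliver(source, tag, payload)

    def broadcast(self, source, tag, payload):
        # Deliver to every task except the sender.
        for tid, box in self.boxes.items():
            if tid != source:
                box.deliver(source, tag, payload)


def pack_ints(values):
    # Pack integers as 32-bit values in network byte order.
    return struct.pack(f"!{len(values)}i", *values)


def unpack_ints(buffer):
    return list(struct.unpack(f"!{len(buffer) // 4}i", buffer))


net = Network([0, 1, 2])
net.send(0, 1, tag=7, payload=pack_ints([1, 2, 3]))
net.send(0, 1, tag=9, payload=pack_ints([42]))
net.broadcast(2, tag=7, payload=pack_ints([5]))

source, data = net.boxes[1].recv(tag=9)
print(source, unpack_ints(data))
```

The point of the tag is visible in the last two lines: task 1 picks out the tag-9 message even though a tag-7 message arrived first, which lets a receiver separate logically different message streams.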

  14. Job Manager Supporting Program Development • Supports debugging of parallel applications • - Checkpoint and replay approaches used • - Programs can be restored to their past states • - Tasks maintain a log of communication activities • - A task monitor exists for each task • - An individual task can be replayed in isolation
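
The checkpoint-and-replay idea on this slide can be sketched as follows. This is a minimal Python illustration, assuming a task whose state depends only on the messages it receives; `Task`, `TaskMonitor`, and their methods are hypothetical names, not PRM's actual debugging interface.

```python
import copy


class Task:
    """Hypothetical deterministic task: state depends only on its inputs."""

    def __init__(self):
        self.state = {"total": 0, "seen": 0}

    def handle(self, message):
        self.state["total"] += message
        self.state["seen"] += 1


class TaskMonitor:
    """Hypothetical per-task monitor: logs messages and saves checkpoints."""

    def __init__(self, task):
        self.task = task
        self.log = []          # every message delivered to the task, in order
        self.checkpoints = {}  # log position -> saved copy of the task state

    def deliver(self, message):
        self.log.append(message)
        self.task.handle(message)

    def checkpoint(self):
        self.checkpoints[len(self.log)] = copy.deepcopy(self.task.state)

    def replay(self, upto):
        # Rebuild the state after the first `upto` messages, in isolation:
        # start from the latest checkpoint at or before that point, then
        # re-feed the logged messages to a fresh task instance.
        start = max((p for p in self.checkpoints if p <= upto), default=0)
        replica = Task()
        if start:
            replica.state = copy.deepcopy(self.checkpoints[start])
        for message in self.log[start:upto]:
            replica.handle(message)
        return replica.state


monitor = TaskMonitor(Task())
monitor.deliver(3)
monitor.deliver(4)
monitor.checkpoint()
monitor.deliver(5)
monitor.deliver(6)

print(monitor.replay(3))
print(monitor.replay(4) == monitor.task.state)
```

Because the replica is driven only by the log, the other tasks of the job do not have to run again, which is what makes replaying a single task in isolation possible. The approach relies on the task being deterministic with respect to its logged inputs.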

  15. Performance • Communication Latencies • PVM library over ARDP vs. PVM ver 3.2.6 • Resource Allocation performance of PRM • Test Bed • SPARC-10s connected to Ethernet • Exclusive machines • SunOS 4.1.3 with improved time facility • pvm_send() & pvm_recv()

  16. Wide Area Network Simulation • Latency of 0 msec, 10 msec, and 100 msec • USC, USC-ISI, ISI-MIT • Table 1: Average time (in msecs) to execute a pvm_send() – pvm_recv() pair • Table 2: Average time (in msecs) to execute a pvm_mcast() and matching pvm_recv() pair

  17. Resource Allocation Results • Table 3: Allocation time as a function of the number of nodes allocated • Table 4: Allocation time as a function of the number of system managers from which resources are requested. A total of 8 nodes were allocated in each case.

  18. Future Directions • Alternative job managers • fault-tolerant and real time applications • Node manager • part of kernel • compiler generated resource list • preemptive scheduling of tasks • Integrated set of tools for developing and executing parallel and distributed applications • Security

  19. Conclusion • Prospero : A different approach to resource management. • Scalable • provides framework for development and execution of parallel and distributed applications

  20. Questions ? & Comments!
