LIBI: The Lightweight Infrastructure-Bootstrapping Infrastructure

Presentation Transcript


  1. LIBI: The Lightweight Infrastructure-Bootstrapping Infrastructure • Joshua Goehner and Dorian Arnold, University of New Mexico (in collaboration with LLNL and U. of Wisconsin)

  2. A Little Preaching to the Choir • Key statistics from Top500 (Nov. 2011) • 149/500 (30%) systems with more than 8,192 cores • In 2006, this number was 12/500 (2.4%) • 7 systems: more than 64K cores • 6 systems: between 64K and 128K cores • 3 systems: more than 128K cores • Later this year: LLNL’s Sequoia w/ 1.6M cores • 2018?: Exascale systems w/ 10^7 or 10^8 cores • Software systems must scale as well!

  3. Software Infrastructure-bootstrapping • Before bootstrapping: • Program image on storage device • Set of (allocated) computer nodes • After bootstrapping: • Application processes started on computer nodes • Application’s configuration complete • Ready for primary operation • Given a node allocation, start the infrastructure's composite processes and propagate necessary startup information.

  4. “I gotta get my application up and running!”

  5. (1) “First, I start all the processes on the appropriate nodes”

  6. (2) “Next, I must disseminate some initialization information”

  7. Bootstrapping is complete when the infrastructure is ready for steady-state usage.

  8. Basic Bootstrapping • Sequential instantiation (e.g., rsh- or ssh-based) • Point-to-point propagation • 70 seconds to start 2K processes!

  9. More Scalable Approaches • Infrastructure-specific, scalable mechanisms • Still limited by sequential operations • Not portable to other infrastructures • Using high-performance resource managers • Myriad interfaces • No communication facilities • Generic bootstrapping infrastructures • LaunchMON: targets tools with wrapper for existing RMs • ScELA: sequentially launch agents that create local processes

  10. MRNet Sequential-ish Bootstrapping • Parent creates children • Local: fork()/exec() • Remote: rsh-based mechanism • Integrated instantiation and information propagation • MRNet’s “standard”

  11. MRNet XT (hybrid) process launch • Bulk-launch 1 process per node • Process launches collocated processes • Information propagated in similar fashion

  12. LIBI Approach • LIBI: Lightweight infrastructure-bootstrapping infrastructure • Name is no longer a work in progress • Generic service for scalable distributed software infrastructure bootstrapping • Process launch • Scalable, low-level collectives [Diagram labels: Large Scale Distributed Software (Debuggers, System Monitors, Applications, Performance Analyzers, Overlay Networks); LIBI; LaunchMON; Job Launchers (SLURM, OpenRTE, rsh/ssh, ALPS); Communication Services (COBO, MPI)]

  13. LIBI API • session: set of processes (to be) deployed • session master manages other members • session front-end interacts with session master • LIBI currently supports only master/member communication • host-distribution: where to create processes • <hostname, num-processes> • process-distribution: how/where to create processes • <session-id, executable, arguments, host-distribution, environment>

  14. LIBI API (cont’d) • launch(process-distribution-array): instantiate processes according to input distributions • [send|recv]UsrData(session-id, msg): communicate between front-end and session master • broadcast(), scatter(), gather(), barrier(): communicate between session master and members

  15. Example LIBI Front-end
      front_end() {
          LIBI_fe_init();
          LIBI_fe_createSession(sess);

          // describe what to launch and where
          proc_dist_req_t pd;
          pd.sessionHandle = sess;
          pd.proc_path = get_ExePath();
          pd.proc_argv = get_ProgArgs();
          pd.hd = get_HostDistribution();
          LIBI_fe_launch(pd);

          // test broadcast and barrier
          LIBI_fe_sendUsrData(sess, msg, len);
          LIBI_fe_recvUsrData(sess, msg, len);

          // test scatter and gather
          LIBI_fe_sendUsrData(sess, msg, len);
          LIBI_fe_recvUsrData(sess, msg, len);
          return 0;
      }

  16. Example LIBI-launched Application
      session_member() {
          LIBI_init();

          // test broadcast and barrier
          LIBI_recvUsrData(msg, msg_length);
          LIBI_broadcast(msg, msg_length);
          LIBI_barrier();
          LIBI_sendUsrData(msg, msg_length);

          // test scatter and gather
          LIBI_recvUsrData(msg, msg_length);
          LIBI_scatter(msg, sizeof(rcvmsg), rcvmsg);
          LIBI_gather(sndmsg, sizeof(sndmsg), msg);
          LIBI_sendUsrData(msg, msg_length);

          LIBI_finalize();
      }

  17. LIBI Implementation Status • LaunchMON-based runtime • Tested SLURM or rsh launching • COBO PMGR service [Diagram labels: Large Scale Distributed Software (Debuggers, System Monitors, Applications, Performance Analyzers, Overlay Networks); LIBI; LaunchMON; Job Launchers (SLURM, OpenRTE, rsh/ssh, ALPS); Communication Services (COBO, MPI)]

  18. LIBI Early Evaluation • Goal: demonstrate ease of integration and performance/scalability enhancements • Microbenchmark • Time to instantiate a set of processes • Time to broadcast 128 bytes followed by barrier • Time to scatter and gather 128 bytes

  19. LIBI Microbenchmark Results

  20. LIBI Microbenchmark Results

  21. MRNet/LIBI Integration • MRNet uses LIBI to launch all MRNet processes • Parse topology file, then set up and call LIBI_launch() • Session front-end gathers/scatters startup information • Parent listening socket (IP/port)

  22. Some Future Work • STAT over MRNet over LIBI performance results • Basic, scalable rsh-based mechanism • Hybrid (XT-like) launch approach • Mechanisms to alleviate filesystem contention • Like our scalable binary relocation service • More flexible process and host distributions • Instantiating different images in same session • Integrating allocating and launching • Integrated scalable communication infrastructure
