
Virtual mpirun

Virtual mpirun. Jason Hale, Engineering 692 Project Presentation, Fall 2007. Rationale: compute cycles = money; Mimosa (250 nodes) costs $.06 per CPU hour; wasted CPU cycles -> wasted money; wasted user time -> less research; not all parallel computations run efficiently.


Presentation Transcript


  1. Virtual mpirun. Jason Hale, Engineering 692 Project Presentation, Fall 2007

  2. Rationale
     • Compute cycles = money
     • Mimosa (250 nodes): $.06 per CPU hour
     • Wasted CPU cycles -> wasted money
     • Wasted user time -> less research
     • Not all parallel computations run efficiently
     • Goal of a supercomputing center: have users run on the maximum number of CPUs/nodes they can utilize efficiently

  3. MCSR Initiatives to Improve Utilization
     • g03sub
       • Enhanced (virtualized?) wrapper for users submitting Gaussian calculations
     • Back-end processes to poll the PBS batch scheduler to compute utilization of parallel jobs; post to DB & Web; e-mail inefficient users
     • Amber Alert System

  4. These Systems Don't Work for the Mimosa Cluster
     • PBSPro can't accumulate CPU usage times from parallel processes distributed across compute nodes
     • Idea: create a monitor process that will follow parallel processes to their nodes, monitor their CPU performance, and report back
     • Virtualization: users will not know about the monitor process. They will launch a virtual mpirun (or g03sub), not realizing that it is not the "real" one, and it will launch the real one along with the monitor

  5. Running an MPI Program on a Cluster
     [Diagram: a head node and a set of compute nodes.]
     • Head node: the user compiles myprogram.c and submits the batch script
         cc myprogram.c -o myprogram.exe
         qsub myscript.pbs
     • myscript.pbs:
         #PBS -l nodes=4
         mpirun -np 4 myprogram.exe
     • Virtual mpirun, run in place of the real one:
         mpirun -np 4 monitor.exe &
         mpirun -np 4 myprogram.exe
     • Compute nodes: each node runs monitor.exe alongside myprogram.exe
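The two commands the virtual mpirun issues on slide 5 can be sketched as a small C++ helper that builds them from the user's arguments. This is only an illustration under stated assumptions: `LaunchPlan`, `planLaunch`, the path to the real mpirun, and the fallback of one process when `-np` is absent are all hypothetical and do not come from the slides, and shell quoting of arguments is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The two command lines a wrapper would run (hypothetical type).
struct LaunchPlan {
    std::string monitorCommand;  // monitor, started in the background
    std::string programCommand;  // the user's job, passed to the real mpirun
};

// Build both command lines from the arguments the user gave to "mpirun".
// realMpirun is wherever the site keeps the original binary (an assumption).
LaunchPlan planLaunch(const std::string& realMpirun,
                      const std::vector<std::string>& userArgs,
                      const std::string& monitorExe) {
    assert(!userArgs.empty());
    std::string np = "1";  // assumed fallback when the user gives no -np
    std::string joined;
    for (std::size_t i = 0; i < userArgs.size(); ++i) {
        if (userArgs[i] == "-np" && i + 1 < userArgs.size()) {
            np = userArgs[i + 1];  // reuse the user's process count for the monitor
        }
        joined += " " + userArgs[i];
    }
    LaunchPlan plan;
    plan.monitorCommand = realMpirun + " -np " + np + " " + monitorExe + " &";
    plan.programCommand = realMpirun + joined;
    return plan;
}
```

Passing the user's arguments through unchanged keeps the wrapper invisible to the user, which is the point of the virtualization idea on slide 4.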

  6. Design Goals
     • Collect CPU utilization stats on cluster calculations
     • No changes to user end processes
     • No significant performance degradation
     • No side effects (Leave No Trash Behind)
     • Monitor even non-MPI parallel codes (Gaussian 03)
     • Generality and robustness for reuse potential

  7. Components
     • monitor (new C++ MPI program)
     • mpirun
       • New wrapper around existing mpirun
       • Calls existing monitor and "real" mpirun
     • g03sub
       • Existing batch script to launch Gaussian jobs on cluster
       • MCSR's version previously "virtualized"
       • Modify to now call monitor program also

  8. [Diagram: four nodes, each running monitor.exe alongside myprogram.exe. One monitor.exe is the Manager Process; the others are Worker Processes.]

  9. [Diagram: same layout as slide 8.]

  10. Worker Process Logic
      [Diagram: one Manager Process and several Worker Processes, each a monitor.exe running alongside myprogram.exe.]
          While (NoTerminationMessageFromMaster)
              Sleep
              Wakeup
              Create Process Times File
              Read Process Times File
              If (ActiveProcesses)
                  Update Process Times Data Structure
                  SendCPUTimeMessageToMaster
              Else
                  SendIdleMessageToMaster
              End If/Else
          End While
          Terminate

  11. Worker Process Logic
      [Diagram: each worker's monitor.exe now has a /tmp/ps_file on its node (the Create Process Times File step).]

  12. Worker Process Logic
      The loop gains one step, "Delete Process Times File", immediately after "Read Process Times File".

  13. Worker Process Logic
      Example process times read on one worker:
          pid   cputime
          123   06s
          124   12s
          130   29s
                = 47s total

  14. Worker Process Logic
      [Diagram: the /tmp/ps_file entries are gone, and the values 9 and 47 appear beside two workers, apparently the CPU-time totals they send to the manager (SendCPUTimeMessageToMaster).]

  15. Worker Process Logic
      [Diagram: one worker is marked Idle (the SendIdleMessageToMaster case).]
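One wake-up of the worker loop on slides 10 to 15 might look like the following C++ sketch, with the MPI send left out so it stays self-contained. `MessageKind`, `WorkerMessage`, and `workerStep` are hypothetical names, and treating "ActiveProcesses" as "the latest sample is non-empty" is an assumption, not something the slides state.

```cpp
#include <cassert>
#include <map>

enum class MessageKind { CpuTime, Idle };

// What a worker would report to the manager after one wake-up.
struct WorkerMessage {
    MessageKind kind;
    long cpuSeconds;  // total CPU seconds over tracked processes; 0 when idle
};

// Fold the latest (pid -> CPU seconds) sample into the running table and
// build the message for the manager. An empty sample is treated as "idle".
WorkerMessage workerStep(const std::map<int, long>& latestSample,
                         std::map<int, long>& processTimes) {
    if (latestSample.empty()) {
        return {MessageKind::Idle, 0};
    }
    for (const auto& entry : latestSample) {
        assert(entry.second >= 0);
        processTimes[entry.first] = entry.second;  // cputime is cumulative: overwrite
    }
    long total = 0;
    for (const auto& entry : processTimes) {
        total += entry.second;
    }
    return {MessageKind::CpuTime, total};
}
```

With the sample from slide 13 (pids 123, 124, 130 at 6, 12, and 29 seconds) the reported total is the 47 seconds shown there. In the real monitor the returned message would be sent to the manager over MPI.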

  16. Manager Process Logic
          While (Active Processes)
              MONITOR_LOCAL_PROCESSES
              If (LocalActiveProcesses)
                  UpdateGlobalCPUTimeStructure
                  UpdateActiveProcessesStructure
              Else
                  UpdateActiveProcessStructure
              End If
              For Each Slave
                  WaitForMessage
                  If (CPUMessage)
                      UpdateGlobalCPUTimeStructure
                      UpdateActiveProcessStructure
                  Else If (IdleMessage)
                      UpdateActiveProcessStructure
                  End If
              End For
          End While

      Example global CPU time structure:
          WKR   cputime
          0     25s
          1     35s
          2     09s
          3     47s
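The manager's bookkeeping on slide 16 could be sketched in C++ as below, again without the MPI receive. All names (`GlobalState`, `makeState`, `recordCpuMessage`, `recordIdleMessage`, `anyActive`, `totalCpuSeconds`, `utilization`) are hypothetical, and the utilization formula (total CPU seconds divided by workers times elapsed wall-clock seconds) is an assumption about how the collected numbers might be used; the slides do not define it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global view kept by the manager: one slot per worker.
struct GlobalState {
    std::vector<long> cpuSeconds;  // latest CPU time reported by each worker
    std::vector<bool> active;      // false once a worker has reported idle
};

GlobalState makeState(int workers) {
    assert(workers > 0);
    GlobalState s;
    s.cpuSeconds.assign(static_cast<std::size_t>(workers), 0);
    s.active.assign(static_cast<std::size_t>(workers), true);
    return s;
}

// CPU message: update both the CPU time and the activity structure.
void recordCpuMessage(GlobalState& s, int worker, long cpuSeconds) {
    assert(worker >= 0 && static_cast<std::size_t>(worker) < s.cpuSeconds.size());
    s.cpuSeconds[worker] = cpuSeconds;
    s.active[worker] = true;
}

// Idle message: only the activity structure changes.
void recordIdleMessage(GlobalState& s, int worker) {
    assert(worker >= 0 && static_cast<std::size_t>(worker) < s.active.size());
    s.active[worker] = false;
}

// Loop condition for the manager: keep going while any worker is active.
bool anyActive(const GlobalState& s) {
    for (bool a : s.active) {
        if (a) return true;
    }
    return false;
}

long totalCpuSeconds(const GlobalState& s) {
    long total = 0;
    for (long t : s.cpuSeconds) total += t;
    return total;
}

// Assumed metric: fraction of the allocated CPU time that was actually used.
double utilization(const GlobalState& s, long elapsedSeconds) {
    assert(elapsedSeconds > 0);
    return static_cast<double>(totalCpuSeconds(s)) /
           (static_cast<double>(s.cpuSeconds.size()) * elapsedSeconds);
}
```

Feeding in the four worker totals from the slide's table (25, 35, 9, and 47 seconds) gives 116 CPU seconds overall; the elapsed time needed for a utilization figure would come from the job itself.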

  17. Test MPI Script
      • Parallel Ultimate Virtual Collapse Program
        • Reads a list of integers from a file
        • Distributes the integers to all available worker nodes
        • Each worker computes the ultimate collapse of its numbers
      • Control the length of processing time by:
        • Number of numbers in the list (1,000,000)
        • The size of the numbers in the list (1 to 7 digits)
      • Control the parallel efficiency by:
        • The order of the numbers in the list
        • Larger numbers grouped together: fewer nodes do most of the work
        • Large numbers evenly distributed: nodes do about the same work

  18. Project Status
      • Test program is written (Ultimate Collapse)
      • Monitor program: partially complete; some work remains
          Sleep/Wakeup
          Create Process Times File
          Read Process Times File
          Delete Process Times File
          If (ActiveProcesses)
              Update Process Times Data Structure
              SendCPUTimeMessageToMaster
          Else
              SendIdleMessageToMaster
          End If/Else
          Terminate

  19. ps syntax from monitor.cpp
          // Append one line per matching process owned by the user to the process times file.
          string psCommand(" ps -u " + username +
              " --no-headers -o pid,cputime,etime,comm,user,c,pcpu"
              " | grep -v ps | grep -v sh | grep mpirun | grep -v mon.exe | grep -v grep >> " +
              myFileName);
          system(psCommand.c_str());

  20. Example /tmp/ps_file from node
          pid    cputime   etime  comm   user    c   pcpu
          32765  00:00:00  02:32  a.out  jghale  0   0.0
          32764  00:00:00  02:32  a.out  jghale  0   0.0
          305    00:00:00  02:32  a.out  jghale  0   0.0
          300    00:02:31  02:32  a.out  jghale  99  99.8
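Reading a file like the one on slide 20 comes down to splitting each line on whitespace and converting the cputime column to seconds. The C++ sketch below assumes the column order shown on the slide and the `[dd-]hh:mm:ss` form that ps uses for cputime. `ProcessSample`, `cputimeToSeconds`, `parseLine`, and `totalCpuSeconds` are hypothetical names, not taken from the monitor source.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// The fields of one row that the monitor would need (hypothetical type).
struct ProcessSample {
    int pid;
    long cpuSeconds;  // cputime converted to seconds
    std::string comm;
};

// Convert a ps cputime field, "[dd-]hh:mm:ss", to seconds.
long cputimeToSeconds(const std::string& field) {
    long days = 0;
    std::string rest = field;
    const std::string::size_type dash = field.find('-');
    if (dash != std::string::npos) {
        days = std::stol(field.substr(0, dash));
        rest = field.substr(dash + 1);
    }
    long h = 0, m = 0, s = 0;
    char c1 = 0, c2 = 0;
    std::istringstream in(rest);
    in >> h >> c1 >> m >> c2 >> s;
    assert(in && c1 == ':' && c2 == ':');  // the sketch does not handle other formats
    return ((days * 24 + h) * 60 + m) * 60 + s;
}

// Parse one line: pid cputime etime comm user c pcpu.
// Returns false for blank or truncated lines.
bool parseLine(const std::string& line, ProcessSample& out) {
    std::istringstream in(line);
    std::string cputime, etime;
    if (!(in >> out.pid >> cputime >> etime >> out.comm)) {
        return false;
    }
    out.cpuSeconds = cputimeToSeconds(cputime);
    return true;
}

// Sum CPU seconds over every parsable line of the file.
long totalCpuSeconds(const std::vector<std::string>& lines) {
    long total = 0;
    for (const std::string& line : lines) {
        ProcessSample sample;
        if (parseLine(line, sample)) {
            total += sample.cpuSeconds;
        }
    }
    return total;
}
```

For the four rows on the slide, only pid 300 has accumulated CPU time (00:02:31, or 151 seconds), which matches the one process showing 99.8 in the pcpu column.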
