
PGENESIS Tutorial – GUM’02

PGENESIS Tutorial – GUM’02. Greg Hood, Pittsburgh Supercomputing Center. What is PGENESIS? A library extension to GENESIS that supports communication among multiple processes, so nearly everything available in GENESIS is available in PGENESIS.


Presentation Transcript


  1. PGENESIS Tutorial – GUM’02 Greg Hood Pittsburgh Supercomputing Center

  2. What is PGENESIS? • Library extension to GENESIS that supports communication among multiple processes, so nearly everything available in GENESIS is available in PGENESIS • Allows multiple processes to perform multiple simulations in parallel • Allows multiple processes to work together cooperatively on a single simulation • Runs on workstations or supercomputers

  3. History • PGENESIS developed by Goddard and Hood at PSC (1993-1998) • Current contact: pgenesis@psc.edu

  4. Tutorial Outline • Installation • What PGENESIS provides • Using PGENESIS for parallel parameter searching • Using PGENESIS for simulating large networks more quickly • Scaling up for large runs • A comparison of PGENESIS with alternatives

  5. PGENESIS Installation

  6. Installation: Requirements • At least 1 Unix-like computer on which GENESIS will run. • Same account name on all computers. • If multiple machines are to be used together, then it is best if they are all on the same network segment (e.g. same 100Mbit/s Ethernet switch).

  7. Installation: GENESIS 1. Install regular (serial) GENESIS: a. Make sure you have configured serial GENESIS to include all libraries that you will ever want to use with PGENESIS. b. make all; make install c. make nxall; make nxinstall (if you want an Xodus-less version of PGENESIS)

  8. Installation: ssh 2. Configure ssh to allow process startup across machines without password entry: • You probably already have ssh/sshd. If not, download from http://www.openssh.org and install according to instructions. • Run ssh-keygen -t rsa on each machine from which you will launch PGENESIS to generate private/public keys. • Append all of the public keys (stored in ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys on all hosts on which you want to run PGENESIS processes. • Test: ssh remote_host_name remote_command should not ask you for a password.

  9. Installation: PVM 3. Install the PVM message-passing library: • Download from http://www.csm.ornl.gov/pvm • Modify .bashrc to set PVM_ROOT to where PVM was installed: export PVM_ROOT=/usr/share/pvm3 • Modify .bashrc to set PVM_RSH to the ssh executable: export PVM_RSH=/usr/bin/ssh • Build PVM (“cd $PVM_ROOT; make”) • Test PVM:
       % pvm
       pvm> add otherhost
       pvm> halt

  10. Installation: PGENESIS 4. Install the PGENESIS package: • Download from http://www.genesis-sim.org • cp Makefile.dist Makefile • Edit Makefile • make install • make nxinstall for the Xodus-less version

  11. Installation: Simple • Cluster of similar machines • Shared filesystem • Home directory is located on shared filesystem

  12. Installation: Complex • Heterogeneous cluster • Novel processor/OS • No shared filesystems • Custom libraries linked into GENESIS Recommended approach: • Install on each machine independently and make sure PGENESIS works locally before trying to use all machines together

  13. The "pgenesis" Startup Script (1) Purpose: checks that the proper PVM files are in place, starts the PVM daemon, then starts the appropriate PGENESIS executable. Basic syntax: pgenesis scriptname.g

  14. The "pgenesis" Startup Script (2) Options:
       -config <filename>   <filename> contains a list of hosts to use
       -debug <mode>        <mode> is one of: tty, dbx, gdb
       -nox                 do not use Xodus
       -v                   verbose mode
       -help                list the valid pgenesis script flags

  15. PGENESIS Functionality

  16. How PGENESIS Runs in Parallel • Workstation: typically one process starts and then spawns n-1 other processes • mapping of processes to processors is often 1 to 1, but may be many to 1 during debugging

  17. How PGENESIS Runs in Parallel • Massively parallel machines: • all n processes are started simultaneously by the operating system • mapping of processes to processors is nearly always 1 to 1 • On both: • every process runs same script • this is not a real limitation

  18. Nodes and Zones • Each process is referred to as a "node". • Nodes may be organized into "zones". • A node is fully specified by a numeric string of the form “<node>.<zone>”. • Simulations within a zone are kept synchronized in simulation time. • Each node joins the parallel platform using the paron command. • Each node should gracefully terminate by calling paroff

  19. Every node in its own zone • Simulations on each node are not coupled temporally. • Useful for parameter searching. • We refer to nodes as “0.0”, “0.1”, “0.2”, …

  20. All nodes in one zone • Simulations on each node are coupled temporally. • Useful for large network models • Zone numbers can be omitted since we are dealing with only one zone; we can thus refer to nodes as “0”, “1”, “2”, …

  21. Hybrid schemes Parameter searching on large network models Example: The network is partitioned over 8 nodes; we run 16 simulations in parallel to do parameter searching on this model, thus using a total of 128 nodes.
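The arithmetic behind the hybrid example (8 nodes per zone × 16 zones = 128 processes) can be sketched in Python. This is only an illustration of the “<node>.<zone>” addressing scheme, not PGENESIS code; the helper name `hybrid_layout` and the enumeration order are assumptions made for the example.

```python
def hybrid_layout(nodes_per_zone, n_zones):
    """Enumerate every process as (node, zone, "<node>.<zone>").

    Hypothetical helper: the order in which PGENESIS actually assigns
    processes to zones is not stated in the slides.
    """
    layout = []
    for zone in range(n_zones):            # one zone per independent simulation
        for node in range(nodes_per_zone): # nodes that share one partitioned network
            layout.append((node, zone, f"{node}.{zone}"))
    return layout


# The example from the slide: a network partitioned over 8 nodes,
# with 16 simulations running in parallel.
layout = hybrid_layout(nodes_per_zone=8, n_zones=16)
total_processes = len(layout)
first_address = layout[0][2]
last_address = layout[-1][2]
print(total_processes, first_address, last_address)
```

All numbering starts at 0, so the last address in this layout is node 7 of zone 15.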

  22. Nodes have distinct namespaces • /elem1 on node 0 refers to an element on node 0. • /elem1 on node 1 refers to an element on node 1. • To avoid confusion we recommend that you use distinct names for elements on different nodes within a zone.

  23. GENESIS Terminology
       GENESIS      Computer Science
       Object   =   Class
       Element  =   Object
       Message  =   Connection
       Value    =   Message

  24. Who am I? PGENESIS provides several functions that allow a script to determine its place in the overall parallel configuration (all numbering starts at 0):
       mytotalnode   # of this node in platform
       mynode        # of this node in this zone
       myzone        # of this zone
       ntotalnodes   # of nodes in platform
       nnodes        # of nodes in this zone
       nzones        # of zones
       npvmcpu       # of processors in configuration
       mypvmid       PVM task identifier for this node

  25. Styles of Parallel Scripts • Symmetric – Each node executes the same script commands. • Master/Worker – One node (usually node 0) coordinates processing and issues commands to the other nodes.

  26. Explicit Synchronization
       barrier: causes the thread to block until all nodes within the zone have reached the corresponding barrier
         barrier                 wait at default barrier
         barrier 7               wait at named barrier
         barrier 7 100000        timeout is 100000 seconds
       barrierall: causes the thread to block until all nodes in all zones have reached the corresponding barrier
         barrierall              wait at default barrier
         barrierall 7            wait at named barrier
         barrierall 7 100000     timeout is 100000 seconds

  27. Implicit Synchronization Two commands implicitly execute a zone-wide barrier: step - implicitly causes the thread to block until all nodes within the zone are ready to step (this behavior can be disabled with “setfield /post sync_before_step 0”) reset - implicitly causes the thread to block until all nodes have reset These commands require that all nodes in the zone participate, thus the barrier.

  28. Remote Function Calls (1) An "issuing" node directs a procedure to run on an "executing" node. Examples:
       some_function@2 params...
       some_function@all params...
       some_function@others params...
       some_function@0.4 params...
       some_function@1,3,5 params...

  29. Remote Function Calls (2) • Each remote function call causes the creation of a new thread on the executing node. • All parameters are evaluated on the issuing node. Example: if called from node 1, some_function@2 {mynode} will execute some_function 1 on node 2

  30. Remote Function Calls (3) When does the executing node actually perform the remote function call, since we don't use hardware interrupts? • While waiting at barrier or barrierall. • While waiting for its own remote operations to complete, e.g. func@node, raddmsg • When the simulator is sitting at the prompt waiting for user input. • When the executing script calls clearthread or clearthreads.

  31. Threads A thread is a single flow of control within a PGENESIS script being executed. • When a node starts, there is exactly one thread on it – the thread for the script. • There may potentially be many threads per node. These are stacked up, with only the topmost actually executing at any moment. clearthread – yield to one thread awaiting execution (if one exists) clearthreads – yield to all threads awaiting execution

  32. Asynchronous Calls (1) The async command allows a script to dispatch an operation on a remote node without waiting for its completion. Example: async some_function@2 params...

  33. Asynchronous Calls (2) One may wait for an async call to complete, either individually:
       future = {async some_function@2 ...}
       ...                      // do some work locally
       waiton {future}
     or for an entire set:
       async some_function@2 ...
       async some_function@5 ...
       ...
       waiton all

  34. Asynchronous Calls (3) Asynchronous calls may return a value. Example:
       int future = {async myfunc@1}      // start thread on node 1
       …                                  // do some work locally
       int result = {waiton {future}}     // wait for thread's result
     Thus the term "future": it is a promise of a value some time in the future, and waiton calls in that promise.
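For readers more familiar with general-purpose languages, the async/waiton pattern resembles the futures found in Python's standard library. The sketch below is an analogy only, not PGENESIS code: a thread pool stands in for the remote node, and `myfunc` with its argument is a hypothetical stand-in for the remote function.

```python
from concurrent.futures import ThreadPoolExecutor


def myfunc(x):
    """Hypothetical stand-in for a function executed on another node."""
    return x * x


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(myfunc, 7)  # like async: dispatch and return immediately
    local = sum(range(10))           # do some work locally in the meantime
    result = future.result()         # like waiton: block until the value is ready

print(local, result)
```

As with waiton, the value held in `future` is only useful as a handle for collecting the result later.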

  35. Asynchronous Calls (4) • async returns a value which is only to be used as the parameter of a waiton call, and waiton must only be called with such a value. • Remote function calls from a particular issuing node to a particular executing node are guaranteed to be performed in the sequence they were sent. • There is no guaranteed order among calls involving multiple issuing or executing nodes.

  36. Advice about Barriers (1) • It is very easy to reach deadlock if barriers are not handled correctly. PGENESIS tries to warn you by printing a message that it is waiting at a barrier. • Examples of incorrect barrier usage:
       Each node executes:      barrier {mynode}
       Each node executes:      barrier@all
       A single node executes:  barrier@others; barrier
     However, this will work:
       async barrier@others; barrier

  37. Advice about Barriers (2) • Guideline: if your script is operating in the symmetric style (all nodes execute all statements), never use barrier@. • If your script is operating in the master/worker style, the master must ensure that, before it enters the barrier itself, it has called a function on each worker that executes a barrier. • barrier; async barrier@others will not work.
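The core rule behind this advice is that a barrier releases only when every participant reaches it. A rough Python analogy using `threading.Barrier` is sketched below; it does not model PGENESIS threads or remote calls, and the function name `run_nodes` and the short timeout are assumptions chosen so the example finishes quickly instead of hanging.

```python
import threading


def run_nodes(n_nodes, arriving_nodes, timeout=0.5):
    """Let the listed nodes wait at a barrier sized for n_nodes participants.

    Returns a dict mapping each arriving node to "passed" or "timed out".
    """
    barrier = threading.Barrier(n_nodes)
    outcome = {}

    def node(rank):
        try:
            barrier.wait(timeout=timeout)  # block until all n_nodes have arrived
            outcome[rank] = "passed"
        except threading.BrokenBarrierError:
            outcome[rank] = "timed out"    # someone never reached the barrier

    threads = [threading.Thread(target=node, args=(r,)) for r in arriving_nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


all_arrive = run_nodes(3, [0, 1, 2])  # every node reaches the barrier
one_missing = run_nodes(3, [0, 1])    # node 2 never calls the barrier
print(all_arrive, one_missing)
```

Without a timeout, the second call would block forever, which is the deadlock the slides warn about.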

  38. Commands for Network Creation Several new commands permit the creation of "remote" (internode) messages:
       raddmsg /local_element /remote_element@2 \
           SPIKE
       rvolumeconnect /local_elements \
           /remote_elements@2 \
           -sourcemask ... -destmask ... \
           -probability 0.5
       rvolumedelay /local_elements -radial 10.0
       rvolumeweight /local_elements -fixed 0.2
       rshowmsg /local_elements

  39. Parallel I/O: Display How can one display from more than one node? • Use an xview object. • Add an index field to the displayed elements. • Use the ICOORDS and IVAL1 ... IVAL5 messages instead of the COORDS and VAL1 ... VAL5 messages:
       raddmsg /src_elems /xview_elem@0 \
           ICOORDS io_index_field x y z
       raddmsg /src_elems /xview_elem@0 \
           IVAL1 io_index_field Vm

  40. Interaction with Xodus • Xodus introduces another degree of parallelism via the X11 event processing mechanism. PGENESIS periodically instructs the X Server to process any X events. Some of those events may result in some script code being run. • Race condition: processing order is unpredictable. • Safe 1: ensure all affected nodes are at a barrier (or equivalent) • Safe 2: ensure mouse/keyboard events do not cause remote operations that require the participation of another node.

  41. Parallel I/O: Writing a File How can one write a file from more than one node? • Use a par_asc_file or par_disk_out object. • Add an index field to the source elements. • Send a SAVE message to the output element:
       raddmsg /src_elems \
           /par_asc_file_elem@0 \
           SAVE io_index_field Vm

  42. Tips for Avoiding Deadlocks • Use lots of echo statements. • Use barrier IDs. • Do not execute barriers remotely (e.g., barrier@all). • Remember that step usually does an implicit barrier. • Have each node do its own step command, or have one controlling node do a step@all. (similarly for reset) • Do not use the stop command. • Keep things simple.

  43. Motivation • Parallel control of setup can be hard. • Parallel control of simulation can be hard. • Debugging parallel scripts is hard.

  44. How PGENESIS Fits into Schedule • Schedule controls the order in which GENESIS objects get updated. • At beginning of step, all internode data is transferred. • There will be equivalence to serial GENESIS only if remote messages do not pass from earlier to later elements in the schedule.
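The equivalence condition can be illustrated with a toy model in Python. This is a simplified sketch, not the PGENESIS scheduler: one element counts steps, another copies the counter's value, and a "remote" message is modelled as a snapshot taken at the beginning of each step. The function `simulate` and the element names are hypothetical.

```python
def simulate(schedule, steps, remote):
    """Run a two-element model and return the final value seen by the sink.

    schedule: update order, a list containing "src" and "dst".
    remote:   if True, dst reads a snapshot of src taken at the start of the
              step (internode transfer); if False, dst reads src directly.
    """
    value = {"src": 0, "dst": 0}
    for _ in range(steps):
        snapshot = value["src"]      # internode data moves at the beginning of the step
        for name in schedule:
            if name == "src":
                value["src"] += 1    # the source simply counts steps
            else:
                value["dst"] = snapshot if remote else value["src"]
    return value["dst"]


# Message passes from an earlier to a later element: results differ.
forward_serial = simulate(["src", "dst"], steps=3, remote=False)
forward_remote = simulate(["src", "dst"], steps=3, remote=True)

# Message passes from a later to an earlier element: results agree.
backward_serial = simulate(["dst", "src"], steps=3, remote=False)
backward_remote = simulate(["dst", "src"], steps=3, remote=True)

print(forward_serial, forward_remote, backward_serial, backward_remote)
```

In the forward case the remote sink lags the serial one by a step, because the serial sink sees a value updated earlier in the same step while the remote sink sees only the start-of-step snapshot.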

  45. How PGENESIS Fits into Schedule
       addtask Simulate /##[CLASS=postmaster] -action PROCESS
       addtask Simulate /##[CLASS=buffer] -action PROCESS
       addtask Simulate /##[CLASS=projection] -action PROCESS
       addtask Simulate /##[CLASS=spiking] -action PROCESS
       addtask Simulate /##[CLASS=gate] -action PROCESS
       addtask Simulate /##[CLASS=segment][CLASS!=membrane]\
           [CLASS!=gate][CLASS!=concentration] -action PROCESS
       addtask Simulate /##[CLASS=membrane] -action PROCESS
       addtask Simulate /##[CLASS=hsolver] -action PROCESS
       addtask Simulate /##[CLASS=concentration] \
           -action PROCESS
       addtask Simulate /##[CLASS=device] -action PROCESS
       addtask Simulate /##[CLASS=output] -action PROCESS

  46. Adding Custom "C" Code Uses: • data analysis • interfacing • custom objects. PGENESIS allows a user's custom libraries to be linked in, similarly to GENESIS. We recommend that you first incorporate your custom library into serial GENESIS before trying to use it with PGENESIS.

  47. Modifiable Parameters
       /post/sync_before_step   boolean (default 1)
       /post/remote_info        boolean (default 1); enables rshowmsg
       /post/perfmon            boolean (default 0); enables performance monitoring
       /post/msg_hang_time      float (default 120.0); seconds before giving up on remote operation
       /post/pvm_hang_time      float (default 3.0); seconds between printing dots while waiting for a message
       /post/xupdate_period     float (default 0.01); seconds between checking for X events when at barrier

  48. Limitations of PGENESIS • No rplanarweight, rplanardelay – use corresponding 3-D routines rvolumeweight, rvolumedelay • Cannot delete remote messages • getsyncount, getsynindex, getsyndest no longer return the correct values.

  49. Parameter Searching with PGENESIS

  50. Model Characteristics The following are prerequisites for using PGENESIS for optimization on a particular parameter searching problem: • Model must be expressed in GENESIS. • Decide on the parameter set. • Have a way to evaluate the parameter set. • Have some range for each of the parameter values. • The evaluations over the parameter space should be reasonably well-behaved. • Have a stopping criterion.
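These prerequisites map onto the ingredients of any parallel search. The Python sketch below shows one possible shape, under loud assumptions: `evaluate` is a made-up smooth function standing in for a GENESIS model run, the parameter names `gmax` and `tau` are hypothetical, and a thread pool stands in for PGENESIS nodes that each sit in their own zone.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """Hypothetical fitness function; lower is better.

    Stands in for running the model with this parameter set.
    """
    gmax, tau = params
    return (gmax - 2.0) ** 2 + (tau - 5.0) ** 2


def grid_search(ranges, n_workers=4, tolerance=1e-9):
    """Evaluate every combination in parallel and return the best one.

    ranges:    one list of candidate values per parameter.
    tolerance: stopping criterion; the search counts as converged when the
               best score is at or below this value.
    """
    candidates = list(product(*ranges))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(evaluate, candidates))  # independent evaluations
    best_score, best_params = min(zip(scores, candidates))
    return best_params, best_score, best_score <= tolerance


best_params, best_score, converged = grid_search(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
)
print(best_params, best_score, converged)
```

Because each evaluation is independent, no synchronization between workers is needed, which is why the every-node-in-its-own-zone arrangement suits this kind of problem.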
