
Parallel Simulation etc. 601.46






Presentation Transcript


  1. Parallel Simulation etc. 601.46 Roger Curry Presentation on Load Balancing

  2. Load Balancing • Goal is to ensure that simulation time advances at approximately the same rate across all LPs • An LP lagging behind slows down the simulation • An LP too far in the future cannot do any useful work • Partitioning • Static schemes • Dynamic schemes

  3. Partitioning • A one-to-one mapping of simulation objects to LPs could result in unnecessary overhead for parallel simulation. • Solution: partition simulation objects into groups and assign each group to an LP. • Partitioning usually attempts to • Minimize load imbalances • Minimize inter-processor communication • Maximize lookahead

  4. Static partitioning techniques • For SMPs, inter-processor communication is less of an issue. • Finding the optimal partition is NP-hard, so most techniques are heuristics that attempt to find good partitions. • Simulated Annealing (SA) • Difficult to devise appropriate cost functions • Can take a long time to find a good solution • Graph algorithms (max-flow, min-cut)
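As a concrete illustration of the SA heuristic, here is a minimal Python sketch of simulated-annealing partitioning. The `anneal_partition` name, the cost-function interface, and the geometric cooling schedule are all illustrative assumptions, not the scheme used by any particular package; the cost function would typically combine load imbalance, inter-LP communication, and lookahead, which is exactly where devising it gets difficult.

```python
import math
import random

def anneal_partition(objects, n_lps, cost, steps=10000, t0=1.0, alpha=0.999):
    """Simulated-annealing search for a low-cost assignment of
    simulation objects to LPs.  `cost` maps an assignment
    (object -> LP index) to a number; lower is better."""
    assign = {o: random.randrange(n_lps) for o in objects}
    best, best_cost = dict(assign), cost(assign)
    cur_cost, t = best_cost, t0
    for _ in range(steps):
        o = random.choice(objects)          # perturb: move one object
        old = assign[o]
        assign[o] = random.randrange(n_lps)
        d = cost(assign) - cur_cost
        # accept improvements always, uphill moves with prob. exp(-d/t)
        if d <= 0 or random.random() < math.exp(-d / t):
            cur_cost += d
            if cur_cost < best_cost:
                best, best_cost = dict(assign), cur_cost
        else:
            assign[o] = old                 # reject the move
        t *= alpha                          # cool down
    return best, best_cost
```

With a pure load-imbalance cost this converges quickly; realistic cost functions with communication and lookahead terms are what make SA slow to find good solutions, as the slide notes.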

  5. Static partitioning packages • METIS and SCOTCH are two available graph-partitioning packages. • These packages attempt to minimize inter-processor communication (edge weights) and evenly distribute the workload (node weights). • Use the inverse of lookahead for edge weights.
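The weighting the packages consume can be sketched as follows; `graph_weights` and its input shapes are hypothetical, not the METIS/SCOTCH input format itself. The idea follows the slide: node weight = estimated work per LP, edge weight = 1/lookahead, so that a partitioner minimizing total cut-edge weight is discouraged from placing a low-lookahead channel across a processor boundary.

```python
def graph_weights(loads, links):
    """Build node and edge weights for a graph partitioner.

    loads: dict node -> estimated work on that node (node weight).
    links: dict (u, v) -> lookahead of the channel between u and v.

    Partitioners such as METIS or SCOTCH minimize the total weight of
    cut edges, so weighting an edge by 1 / lookahead makes cutting a
    low-lookahead channel expensive, keeping it inside one partition.
    """
    node_w = {n: int(round(w)) for n, w in loads.items()}  # integer weights
    edge_w = {e: 1.0 / la for e, la in links.items()}
    return node_w, edge_w
```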

  6. Static partitioning (cont.) • Workload requirements are usually unknown prior to simulation. Solutions: • Pre-simulation • Load estimation • IP-TN traffic load estimation:
  for each traffic T, T = 1, …, n (n = number of traffics):
      let R be the route used by traffic T
      for every hop hi in R:
          hi.wg = hi.wg + n * (rate(T) / total_rate)
  where hi.wg is the weight of host i, rate(T) is the rate of traffic T, and total_rate is the sum of all traffic rates.
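The load-estimation pseudocode above translates directly into Python; the `estimate_hop_weights` name and the `(route, rate)` input format are assumptions for illustration.

```python
def estimate_hop_weights(traffics):
    """IP-TN style traffic load estimation.

    traffics: list of (route, rate) pairs, where a route is the
    sequence of hop (host) ids the traffic traverses.  Each hop's
    weight grows by n * (rate(T) / total_rate) for every traffic T
    passing through it, as in the slide's pseudocode.
    """
    n = len(traffics)
    total_rate = sum(rate for _, rate in traffics)
    weights = {}
    for route, rate in traffics:
        for hop in route:
            weights[hop] = weights.get(hop, 0.0) + n * (rate / total_rate)
    return weights
```

The resulting per-host weights would then serve as the node weights handed to the static partitioner.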

  7. Static partitioning (cont.) • SCOTCH and METIS do a pretty good job (Figure 5); unfortunately, they don't eliminate low-lookahead cycles! • Most conservative synchronization protocols perform poorly if there are low-lookahead cycles and few events (this limits parallelism). • Solution: a merging algorithm to eliminate these cycles (before or after partitioning) (Figure 3).

  8. Dynamic partitioning (structures) • CMB-SMP, and others • CCTKit, Taskit

  9. Dynamic schemes • Centralized queue (CMB-SMP, Taskit) • LPs (or Tasks) can migrate between processors via the central scheduling queue. • Requires that the number of LPs (or Tasks) be (significantly) greater than the number of processors. • Trap: if an LP has few events it can execute, the cost of accessing a (lockable) global queue makes this strategy quite expensive.
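A minimal sketch of such a centralized, lockable scheduling queue; the `CentralQueue` name and the heap ordering by next-event time are illustrative assumptions, not the CMB-SMP or Taskit implementation.

```python
import heapq
import threading

class CentralQueue:
    """A single lockable scheduling queue shared by all workers.

    Each worker pops the LP with the lowest next-event time, executes
    it, and pushes it back.  The lock taken on every pop/push is the
    per-scheduling overhead the slide warns about: if an LP executes
    only a few events per visit, this cost dominates.
    """
    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def push(self, next_event_time, lp):
        with self._lock:
            # id(lp) breaks ties so LPs themselves are never compared
            heapq.heappush(self._heap, (next_event_time, id(lp), lp))

    def pop(self):
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]
```

Because any processor can pop any LP, migration between processors falls out for free, at the price of contention on the single lock.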

  10. Dynamic schemes • Distributed queues (CCTKit) • Most distributed-queue implementations do not allow LPs to migrate between processors; LPs are simply assigned during the static partitioning phase. • There is definitely a possibility of moving LPs between different scheduling queues (Rob, has this been done yet?). • The advantage of distributed queues (in terms of parallel simulation) is that they do not require a lock, since they are only accessed by a single processor.
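By contrast, a per-processor queue needs no lock at all, because only its owner touches it. This `ProcessorQueue` round-robin sketch is an illustrative assumption, not CCTKit's actual scheduler; moving LPs between such queues would reintroduce synchronization on the queues involved, which is the open question raised above.

```python
from collections import deque

class ProcessorQueue:
    """Per-processor scheduling queue (distributed-queue style).

    LPs are bound to a queue by the static partitioning phase, and
    since only the owning processor ever accesses the queue, no lock
    is required.
    """
    def __init__(self, lps):
        self._ready = deque(lps)  # LPs assigned to this processor

    def schedule_next(self):
        """Round-robin over this processor's LPs; None when empty."""
        if not self._ready:
            return None
        lp = self._ready.popleft()
        self._ready.append(lp)    # cycle the LP back for its next turn
        return lp
```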

  11. Dynamic schemes • The synchronization protocol (or at least the scheduling algorithm) is tightly coupled with dynamic load balancing in general. • In CCTkit, scheduling only LPs (Tasks) that have work to do ensures that when an LP executes it will at least be able to advance its simulation time (not necessarily by a lot). • Naive synchronization protocols can end up scheduling LPs with nothing to do!

  12. Future research directions? • Task migration between Processors. • LP migration between Tasks. • Hierarchical scheduling / load balancing • Apply different synchronization techniques to different parts of the model. • Try to extract relevant structure from graph to determine a good partitioning.

  13. Conclusions • A load balancing strategy needs to take into account both static and dynamic solutions. • Determining an optimal number of Tasks or LPs may not be that important if we can obtain consistently good performance with a varying number of LPs.
