
Parallel Simulation etc. 601.46






Presentation Transcript


  1. Parallel Simulation etc. 601.46 Roger Curry Presentation on Load Balancing

  2. Load Balancing • Goal is to ensure that simulation time advances at approximately the same rate across all LPs • An LP lagging behind slows down the simulation • An LP too far in the future cannot do any useful work • Partitioning • Static schemes • Dynamic schemes

  3. Partitioning • A one-to-one mapping of simulation objects to LPs could result in unnecessary overhead for parallel simulation. • Solution: partition simulation objects into groups and assign each group to an LP. • Partitioning usually attempts to • Minimize load imbalances • Minimize inter-processor communication • Maximize lookahead

  4. Static partitioning techniques • For SMPs, inter-processor communication is less of an issue. • Finding the optimal partition is NP-hard, so most techniques are heuristics that attempt to find good partitions. • Simulated Annealing (SA) • Difficult to devise appropriate cost functions • Can take a long time to find a good solution • Graph algorithms (max-flow, min-cut)
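As a concrete illustration of the SA heuristic, here is a minimal Python sketch of simulated-annealing partitioning. The `anneal_partition` name, the cost-function interface, and the geometric cooling schedule are all illustrative assumptions, not the scheme used by any particular package; the cost function would typically combine load imbalance, inter-LP communication, and lookahead, which is exactly where devising it gets difficult.

```python
import math
import random

def anneal_partition(objects, n_lps, cost, steps=10000, t0=1.0, alpha=0.999):
    """Simulated-annealing search for a low-cost assignment of
    simulation objects to LPs.  `cost` maps an assignment
    (object -> LP index) to a number; lower is better."""
    assign = {o: random.randrange(n_lps) for o in objects}
    best, best_cost = dict(assign), cost(assign)
    cur_cost, t = best_cost, t0
    for _ in range(steps):
        o = random.choice(objects)          # perturb: move one object
        old = assign[o]
        assign[o] = random.randrange(n_lps)
        d = cost(assign) - cur_cost
        # accept improvements always, uphill moves with prob. exp(-d/t)
        if d <= 0 or random.random() < math.exp(-d / t):
            cur_cost += d
            if cur_cost < best_cost:
                best, best_cost = dict(assign), cur_cost
        else:
            assign[o] = old                 # reject the move
        t *= alpha                          # cool down
    return best, best_cost
```

With a pure load-imbalance cost this converges quickly; realistic cost functions with communication and lookahead terms are what make SA slow to find good solutions, as the slide notes.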

  5. Static partitioning packages • METIS and SCOTCH are two available graph-partitioning packages. • These packages attempt to minimize inter-processor communication (edge weights) and evenly distribute the workload (node weights). • Use the inverse of lookahead for edge weights.
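The weighting the packages consume can be sketched as follows; `graph_weights` and its input shapes are hypothetical, not the METIS/SCOTCH input format itself. The idea follows the slide: node weight = estimated work per LP, edge weight = 1/lookahead, so that a partitioner minimizing total cut-edge weight is discouraged from placing a low-lookahead channel across a processor boundary.

```python
def graph_weights(loads, links):
    """Build node and edge weights for a graph partitioner.

    loads: dict node -> estimated work on that node (node weight).
    links: dict (u, v) -> lookahead of the channel between u and v.

    Partitioners such as METIS or SCOTCH minimize the total weight of
    cut edges, so weighting an edge by 1 / lookahead makes cutting a
    low-lookahead channel expensive, keeping it inside one partition.
    """
    node_w = {n: int(round(w)) for n, w in loads.items()}  # integer weights
    edge_w = {e: 1.0 / la for e, la in links.items()}
    return node_w, edge_w
```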

  6. Static partitioning (cont.) • Workload requirements are usually unknown prior to simulation. Solutions: • Pre-simulation • Load estimation • IP-TN traffic load estimation:
  for each traffic T, T = 1, …, n (n = number of traffics):
      let R be the route used by traffic T
      for every hop hi in R:
          hi.wg = hi.wg + n * (rate(T) / total_rate)
  where hi.wg is the weight of host i, rate(T) is the rate of traffic T, and total_rate is the sum of all traffic rates.
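The load-estimation pseudocode above translates directly into Python; the `estimate_hop_weights` name and the `(route, rate)` input format are assumptions for illustration.

```python
def estimate_hop_weights(traffics):
    """IP-TN style traffic load estimation.

    traffics: list of (route, rate) pairs, where a route is the
    sequence of hop (host) ids the traffic traverses.  Each hop's
    weight grows by n * (rate(T) / total_rate) for every traffic T
    passing through it, as in the slide's pseudocode.
    """
    n = len(traffics)
    total_rate = sum(rate for _, rate in traffics)
    weights = {}
    for route, rate in traffics:
        for hop in route:
            weights[hop] = weights.get(hop, 0.0) + n * (rate / total_rate)
    return weights
```

The resulting per-host weights would then serve as the node weights handed to the static partitioner.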

  7. Static partitioning (cont.) • SCOTCH and METIS do a pretty good job (Figure 5); unfortunately, they don't eliminate low-lookahead cycles! • Most conservative synchronization protocols perform poorly if there are low-lookahead cycles and few events (this limits parallelism). • Solution: a merging algorithm to eliminate these cycles (before or after partitioning) (Figure 3).

  8. Dynamic partitioning (structures) • CMB-SMP, and others • CCTKit, Taskit

  9. Dynamic schemes • Centralized queue (CMB-SMP, Taskit) • LPs (or Tasks) can migrate between processors via the central scheduling queue. • Requires that the number of LPs (or Tasks) be (significantly) greater than the number of processors. • Trap: if an LP has few events it can execute, the cost of accessing a (lockable) global queue makes this strategy quite expensive.
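A minimal sketch of such a centralized, lockable scheduling queue; the `CentralQueue` name and the heap ordering by next-event time are illustrative assumptions, not the CMB-SMP or Taskit implementation.

```python
import heapq
import threading

class CentralQueue:
    """A single lockable scheduling queue shared by all workers.

    Each worker pops the LP with the lowest next-event time, executes
    it, and pushes it back.  The lock taken on every pop/push is the
    per-scheduling overhead the slide warns about: if an LP executes
    only a few events per visit, this cost dominates.
    """
    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def push(self, next_event_time, lp):
        with self._lock:
            # id(lp) breaks ties so LPs themselves are never compared
            heapq.heappush(self._heap, (next_event_time, id(lp), lp))

    def pop(self):
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]
```

Because any processor can pop any LP, migration between processors falls out for free, at the price of contention on the single lock.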

  10. Dynamic schemes • Distributed queues (CCTKit) • Most distributed-queue implementations do not allow LPs to migrate between processors; LPs are simply assigned during the static partitioning phase. • There is definitely a possibility of moving LPs between different scheduling queues (Rob, has this been done yet?). • The advantage of distributed queues (in terms of parallel simulation) is that they do not require a lock, since they are only accessed by a single processor.
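By contrast, a per-processor queue needs no lock at all, because only its owner touches it. This `ProcessorQueue` round-robin sketch is an illustrative assumption, not CCTKit's actual scheduler; moving LPs between such queues would reintroduce synchronization on the queues involved, which is the open question raised above.

```python
from collections import deque

class ProcessorQueue:
    """Per-processor scheduling queue (distributed-queue style).

    LPs are bound to a queue by the static partitioning phase, and
    since only the owning processor ever accesses the queue, no lock
    is required.
    """
    def __init__(self, lps):
        self._ready = deque(lps)  # LPs assigned to this processor

    def schedule_next(self):
        """Round-robin over this processor's LPs; None when empty."""
        if not self._ready:
            return None
        lp = self._ready.popleft()
        self._ready.append(lp)    # cycle the LP back for its next turn
        return lp
```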

  11. Dynamic schemes • The synchronization protocol (or at least the scheduling algorithm) is tightly coupled with dynamic load balancing in general. • In CCTkit, scheduling only LPs (Tasks) that have work to do ensures that when an LP executes it will at least be able to advance its simulation time (not necessarily by a lot). • Naive synchronization protocols can end up scheduling LPs with nothing to do!

  12. Future research directions? • Task migration between Processors. • LP migration between Tasks. • Hierarchical scheduling / load balancing • Apply different synchronization techniques to different parts of the model. • Try to extract relevant structure from graph to determine a good partitioning.

  13. Conclusions • A load balancing strategy needs to take into account both static and dynamic solutions. • Determining an optimal number of Tasks or LPs may not be that important if we can obtain consistently good performance with a varying number of LPs.
