What Mum Never Told Me about Parallel Simulation


##### Presentation Transcript

1. What Mum Never Told Me about Parallel Simulation Karim Djemame Informatics Research Lab. & School of Computing University of Leeds

2. Plan of the Lecture Goals • Learn about issues in the design and execution of Parallel Discrete Event Simulation (PDES) Overview • Discrete Event Simulation – a Review • Parallel Simulation – a Definition • Applications • Synchronisation Algorithms • Conservative • Optimistic • Synchronous • Parallel Simulation Languages • Performance Issues • Conclusion

3. Why Simulation? • Mathematical models too abstract for complex systems • Building real systems with multiple configurations too expensive • Simulation is a good compromise!

4. Discrete Event Simulation (DES) • A DES system can be viewed as a collection of simulated objects and a sequence of event computations • Changes in the state of the model occur at discrete points in time • The passage of time is modelled using a simulation clock • Event scheduling is the most widely used approach • provides locality in time: each event describes related actions that may all occur in a single instant • The model maintains a list of events (Event List) that • have been scheduled • have not occurred yet

5. Processing the Event List on a Uni-processor Computer • An event contains two fields of information - the event it represents (e.g. arrival in a queue) - time of occurrence: the time when the event should happen, also called the timestamp • The event list (EVL) - contains the events - is always ordered by increasing time of occurrence • The events are processed sequentially by a single processor [Figure: EVL holding events e1, e2, ..., en with times 7, 9, ..., 20]

6. Event-Driven Simulation Engine (1) Remove the 1st event (lowest time of occurrence) from the EVL (2) Execute the corresponding event routine; modify the state (S) accordingly (3) Based on the new S, schedule new future events [Figure: the EVL (e1, 7), (e2, 9), ..., (en, 20); e1 is removed and a new event (e3, 14) is inserted, giving (e2, 9), (e3, 14), ..., (en, 20)]
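
The three steps of the engine can be sketched as a short loop. This is only a minimal illustration, assuming a binary heap as the event list; the names `run_sequential`, `schedule`, `on_e1` and `no_op` are hypothetical and do not come from the slides.

```python
import heapq
import itertools

def run_sequential(initial_events, handlers, end_time):
    """Process events in timestamp order until the EVL is empty or end_time is passed."""
    counter = itertools.count()   # tie-breaker: equal timestamps keep insertion order
    evl = []                      # the event list, kept as a heap
    state = {"clock": 0, "log": []}

    def schedule(time, name):
        heapq.heappush(evl, (time, next(counter), name))

    for time, name in initial_events:
        schedule(time, name)

    while evl:
        time, _, name = heapq.heappop(evl)   # (1) remove the event with the lowest time
        if time > end_time:
            break
        state["clock"] = time
        state["log"].append((time, name))
        handlers[name](state, schedule)      # (2) run the routine, (3) it may schedule new events
    return state

def on_e1(state, schedule):
    # Illustrative routine: e1 at time 7 schedules e3 at time 14, as in the figure.
    schedule(state["clock"] + 7, "e3")

def no_op(state, schedule):
    pass

handlers = {"e1": on_e1, "e2": no_op, "e3": no_op, "en": no_op}
result = run_sequential([(7, "e1"), (9, "e2"), (20, "en")], handlers, end_time=100)
print(result["log"])
```

With the slide's numbers, e3 is inserted between e2 and en, so the events run in the order e1, e2, e3, en.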

7. Why change? It's so simple! • Models become larger and larger • The simulation time is overwhelming or the simulation is simply intractable • Examples: • parallel programs with millions of lines of code, • mobile networks with millions of mobile hosts, • networks with hundreds of complex switches and routers, • multicast models with thousands of sources, • the ever-growing Internet, • and much more...

8. Some Figures to Convince... • ATM network models • Simulation at the cell level, • 200 switches, • 1000 traffic sources, 50 Mbit/s, • 155 Mbit/s links, • 1 simulation event per cell arrival. More than 26 billion events to simulate 1 second! 30 hours if 1 event is processed in 1 µs • simulation time increases as link speed increases, • usually more than 1 event per cell arrival, • how scalable is traditional simulation?
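
As a quick sanity check of the orders of magnitude, the sketch below takes the slide's two numbers at face value (26 billion events, 1 µs per event). Under those assumptions alone the run takes a little over 7 hours, so the quoted 30 hours presumably reflects a higher per-event cost (around 4 µs) or more than one event per cell. The function name is hypothetical.

```python
EVENTS_PER_SIMULATED_SECOND = 26e9   # "more than 26 billion events", from the slide
SECONDS_PER_EVENT = 1e-6             # 1 microsecond per event, from the slide

def wall_clock_hours(events, seconds_per_event):
    """Wall-clock time in hours needed to process the given number of events."""
    return events * seconds_per_event / 3600

hours_at_1us = wall_clock_hours(EVENTS_PER_SIMULATED_SECOND, SECONDS_PER_EVENT)

# Per-event cost (in seconds) that would make the same event count take 30 hours.
implied_cost_for_30h = 30 * 3600 / EVENTS_PER_SIMULATED_SECOND

print(f"{hours_at_1us:.1f} h at 1 us per event")
print(f"{implied_cost_for_30h * 1e6:.1f} us per event would give 30 h")
```

Either way, the conclusion of the slide stands: one simulated second costs hours of sequential computation.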

9. Motivation for Parallel Simulation • Sequential simulation is very slow • Sequential simulation does not exploit the parallelism inherent in models So why not use multiple processors? • Variety of parallel simulation protocols • Availability of parallel simulation tools to achieve a certain speedup over the sequential simulator

10. Processing the Event List on a Multi-Processor Computer • The events are processed in parallel by many processors. Example: • Processor 1 generates event 3 at 9 to be processed by processor 2 • Processor 2 has already processed event 2 at 14 • Problem: - the future can affect the past! - this is the causality problem [Figure: time line for processors p1 and p2 with event 1 at time 7, event 3 at time 9 and event 2 at time 14]

11. Causal Dependencies • Scheduled events in timestamp order: (e1, 7), (e2, 9), (e3, 14), (e4, 20), (e5, 27), (e6, 40) • Sequence ordered by causal dependencies: (e1, 7), (e2, 9), (e4, 20), (e6, 40) and (e3, 14), (e5, 27) • Causal dependencies mean restrictions • The sequence of events (e1, e2, e4, e6) can be executed in parallel with (e3, e5) • If any event were simulated with e1: violation of causal dependencies

12. Parallel Simulation - Principles • Execution of a discrete event simulation on a parallel or distributed system with several physical processors • The simulation model is decomposed into several sub-models (Logical Processes, LPs) that can be executed in parallel • spatial partitioning • LPs communicate by sending timestamped messages • Fundamental concepts • each LP can be at a different simulation time • local causality constraint: events in each LP must be executed in timestamp order
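
The local causality constraint can be stated as a one-line check on each LP's local clock. The sketch below is illustrative only: `LogicalProcess` and `CausalityViolation` are hypothetical names, and the numbers replay the two-processor example from the earlier slide (an event at time 9 arriving after the event at time 14 has been executed).

```python
class CausalityViolation(Exception):
    """Raised when an LP is asked to execute an event in its own past."""

class LogicalProcess:
    def __init__(self, name):
        self.name = name
        self.clock = 0   # local simulation time; each LP may be at a different time

    def execute(self, timestamp):
        # Local causality constraint: events must be executed in timestamp order.
        if timestamp < self.clock:
            raise CausalityViolation(
                f"{self.name}: event at {timestamp} arrived after clock reached {self.clock}"
            )
        self.clock = timestamp

p2 = LogicalProcess("p2")
p2.execute(14)           # processor 2 has already processed event 2 at time 14
try:
    p2.execute(9)        # event 3 at time 9 arrives from processor 1
    violated = False
except CausalityViolation:
    violated = True
print(violated)
```

The synchronisation mechanisms differ in what they do about this check: conservative protocols never let it fail, optimistic ones let it fail and repair the damage.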

13. Parallel Simulation – example 1 [Figure labels: logical process (LP), packet, event, parallel]

14. Parallel Simulation – example 2 • Logical processes (LPs) modelling airports, air traffic sectors, aircraft, etc. • LPs interact by exchanging messages (events modelling aircraft departures, landings, etc.) [Figure: five LPs]

15. Synchronisation Mechanisms • Synchronisation Algorithms • Conservative: avoids local causality violations by waiting until it's safe to process a message or event • Optimistic: allows local causality violations, but provisions are made to recover from them at runtime • Synchronous: all LPs process messages/events with the same timestamp in parallel

16. PDES Applications • VLSI circuit simulation • Parallel computing • Communication networks • Combat scenarios • Health care systems • Road traffic • Simulation of models • Queueing networks • Petri nets • Finite state machines

17. Conservative Protocols Architecture of a conservative LP The Chandy-Misra-Bryant protocol The lookahead ability

18. Architecture of a Conservative LP • LPs communicate by sending messages with non-decreasing timestamps • each LP keeps a static FIFO channel for each LP from which it receives messages • each FIFO channel (input channel, IC) has a clock ci that takes the timestamp of the topmost message if there is one, and otherwise keeps the timestamp of the last message [Figure: LPA, LPB, LPC and LPD; input channels with clocks c1 = tB1 (messages tB1, tB2), c2 = tC3 (messages tC3, tC4, tC5) and c3 = tD3 (message tD4)]

19. A Simple Conservative Algorithm • each LP has to process events in timestamp order to avoid local causality violations The Chandy-Misra-Bryant algorithm: while (simulation is not over) { determine the ICi with the smallest ci; if (ICi is empty) wait for a message; else { remove the topmost event from ICi; process the event } }
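
The loop above can be made runnable as follows. This is a sketch, not the original protocol code: instead of blocking, `step()` returns `None` when the channel with the smallest clock is empty, so the behaviour can be observed in a single thread. The class name, method names and the channel contents are illustrative assumptions.

```python
from collections import deque

class ConservativeLP:
    def __init__(self, channel_names):
        self.queues = {name: deque() for name in channel_names}   # one FIFO per input channel
        self.clocks = {name: 0 for name in channel_names}         # channel clocks ci
        self.processed = []

    def receive(self, channel, timestamp, payload=None):
        queue = self.queues[channel]
        if timestamp < self.clocks[channel] or (queue and timestamp < queue[-1][0]):
            raise ValueError("timestamps on a channel must be non-decreasing")
        if not queue:
            self.clocks[channel] = timestamp   # new topmost message sets the clock
        queue.append((timestamp, payload))

    def step(self):
        """Process one event, or return None if the LP would have to block."""
        channel = min(self.clocks, key=self.clocks.get)   # ICi with the smallest ci
        queue = self.queues[channel]
        if not queue:
            return None                                   # must wait for a message on ICi
        event = queue.popleft()
        self.processed.append(event)
        if queue:
            self.clocks[channel] = queue[0][0]
        # otherwise the clock keeps the timestamp of the last message
        return event

lp = ConservativeLP(["IC1", "IC2", "IC3"])
for channel, timestamps in {"IC1": [3, 6, 10], "IC2": [1, 4, 7], "IC3": [5]}.items():
    for ts in timestamps:
        lp.receive(channel, ts)

trace = []
while (event := lp.step()) is not None:
    trace.append(event[0])
print(trace)          # events processed before the LP blocks on IC3

lp.receive("IC3", 9)  # a new message on IC3 lets the LP continue
resumed = []
while (event := lp.step()) is not None:
    resumed.append(event[0])
print(resumed)
```

The LP stops at time 5 even though events at 6 and 7 are waiting on other channels: it cannot rule out a message with timestamp 5 or later, but below 6, on the empty channel.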

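20. Safe but Has to Block [Figure: an LP with input channels IC1, IC2 and IC3 fed by the other LPs (LPA, LPB, LPC and LPD are shown); a trace table lists each processed event against the input channel with the minimum clock, including a step where the LP must BLOCK]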

21. Blocks and Even Deadlocks! • S sends all messages to B [Figure: LPs S, A, B and M exchanging timestamped messages; M is a merge point and is BLOCKED]

22. How to Solve Deadlock: Null-Messages • Use of null-messages for artificial propagation of simulation time • What frequency? [Figure: LPs S, A, B and M exchanging timestamped messages; M is now UNBLOCKED]

23. How to Solve Deadlock: Null-Messages • a null-message indicates a Lower Bound Time Stamp • minimum delay between links is 4 • LP C initially at simulation time 0 • LP C sends a null-message with timestamp 4 • LP A sends a null-message with timestamp 8 • LP B sends a null-message with timestamp 12 • LP C can process the event with timestamp 7 [Figure: LPs A, B and C with pending messages at timestamps 7, 9, 10 and 11 and null-messages at 4, 8 and 12]
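
The null-message timestamps in this example follow from adding the minimum link delay at every hop. The sketch below reproduces the arithmetic under one assumption that the slide implies but does not spell out: the null-messages travel C to A, A to B, and B back to C. The names `LOOKAHEAD`, `null_timestamp` and `ring` are hypothetical.

```python
LOOKAHEAD = 4   # minimum delay on every link, from the slide

def null_timestamp(lower_bound, lookahead=LOOKAHEAD):
    """Timestamp of the null-message an LP can send once it knows `lower_bound`."""
    return lower_bound + lookahead

ring = ["C", "A", "B"]   # assumed order in which the null-messages are sent
bound = 0                # LP C starts at simulation time 0
sent = []
for lp in ring:
    bound = null_timestamp(bound)
    sent.append((lp, bound))
print(sent)

# After the null-message from B arrives, C knows that no message earlier than
# `bound` can still reach it, so its pending event at time 7 is safe.
pending_event_at_c = 7
c_can_proceed = pending_event_at_c <= bound
print(c_can_proceed)
```

With a smaller link delay, more rounds of null-messages would be needed before the bound passes 7, which is the cost hinted at by the question about frequency on the previous slide.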

24. The Lookahead Ability • Null-messages are sent by an LP to indicate a lower bound on the timestamps of the messages it will send in the future • null-messages rely on the "lookahead" ability • communication link delays • server processing time (FIFO) • lookahead is very dependent on the application model and needs to be explicitly identified
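
For a FIFO server followed by a link, the two lookahead sources listed above can be combined into a lower bound on the next outgoing timestamp. The function below is a sketch under simple assumptions (a single server, a known minimum service time, a fixed link delay); its name and parameters are hypothetical.

```python
def departure_lower_bound(clock, busy_until, min_service_time, link_delay):
    """Lower bound on the timestamp of the next message this LP can send.

    clock            -- current local simulation time of the LP
    busy_until       -- completion time of the job in service (<= clock if idle)
    min_service_time -- smallest possible service time of the FIFO server
    link_delay       -- delay of the outgoing communication link
    """
    if busy_until > clock:
        earliest_departure = busy_until                # the job in service leaves first
    else:
        earliest_departure = clock + min_service_time  # a new job needs at least this long
    return earliest_departure + link_delay

idle_bound = departure_lower_bound(clock=10, busy_until=0, min_service_time=3, link_delay=4)
busy_bound = departure_lower_bound(clock=10, busy_until=12, min_service_time=3, link_delay=4)
print(idle_bound, busy_bound)
```

The returned value is what the LP could put on a null-message. Adding preemption to the server would invalidate the idle case, which is one reason lookahead has to be identified per model.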

25. Conservative: Pros & Cons • Pros • simple, easy to implement • good performance when lookahead is large (communication networks, FIFO queue) • Cons • pessimistic in many cases • large lookahead is essential for performance • no transparent exploitation of parallelism • performance may drop even with small changes in the model (adding preemption, adding one small lookahead link…)

26. Optimistic Protocols Architecture of an optimistic LP Time Warp

27. Architecture of an Optimistic LP • LPs send timestamped messages, not necessarily in non-decreasing timestamp order • no static communication channels between LPs; dynamic creation of LPs is easy • each LP processes events as they are received, with no need to wait for safe events • local causality violations are detected and corrected at runtime • The best-known optimistic mechanism: Time Warp [Figure: LPA, LPB, LPC and LPD exchanging messages tB1, tB2, tC3, tC4, tC5 and tD4]

28. Processing Events as They Arrive • what to do with late messages? [Figure: LPA receives messages from LPB, LPC and LPD with timestamps 11, 13, 18, 22, 25, 28, 32 and 36, some already marked as processed]

29. Time Warp: Do, Undo, Redo

30. Time Warp Rollback - How? • Late messages (stragglers) are handled with a rollback mechanism • undo false/incorrect local computations, • state saving: save the state variables of an LP • reverse computation • undo false/incorrect remote computations, • anti-messages: anti-messages and (real) messages annihilate each other • process late messages • re-process previous messages: processed events are NOT discarded!
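
A toy version of the rollback steps is sketched below. It is not the Time Warp kernel itself: the LP state is a single running sum, a copy of the state is saved before every event, and each event sends exactly one output message logged by its send time. `TimeWarpLP` and all its members are hypothetical names.

```python
import heapq

class TimeWarpLP:
    def __init__(self):
        self.clock = 0
        self.state = 0            # toy state: running sum of event values
        self.pending = []         # heap of unprocessed (timestamp, value) events
        self.processed = []       # (timestamp, value, state_before), in processing order
        self.sent = []            # output log: (send_time, message)
        self.anti_messages = []   # cancellations issued during rollbacks

    def receive(self, timestamp, value):
        if timestamp < self.clock:       # straggler: it lies in the LP's past
            self.rollback(timestamp)
        heapq.heappush(self.pending, (timestamp, value))

    def rollback(self, to_time):
        # Undo local computations: restore saved state, keep the events for re-processing.
        while self.processed and self.processed[-1][0] > to_time:
            timestamp, value, saved_state = self.processed.pop()
            self.state = saved_state
            heapq.heappush(self.pending, (timestamp, value))
        # Undo remote computations: one anti-message per message sent after to_time.
        while self.sent and self.sent[-1][0] > to_time:
            self.anti_messages.append(self.sent.pop())
        self.clock = self.processed[-1][0] if self.processed else 0

    def process_next(self):
        if not self.pending:
            return None
        timestamp, value = heapq.heappop(self.pending)
        self.processed.append((timestamp, value, self.state))   # state saving
        self.state += value
        self.clock = timestamp
        self.sent.append((timestamp, self.state))
        return timestamp

    def run(self):
        while self.process_next() is not None:
            pass

lp = TimeWarpLP()
for ts, value in [(11, 1), (18, 2), (25, 3)]:
    lp.receive(ts, value)
lp.run()                 # optimistic execution up to time 25
lp.receive(13, 10)       # straggler: events at 18 and 25 are rolled back
lp.run()                 # do, undo, redo
print(lp.state, [entry[0] for entry in lp.processed], lp.anti_messages)
```

Saving the state before every event is the simplest policy and also the most expensive one; periodic state saving or reverse computation, both named on the slide, trade memory for extra work at rollback time.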

31. Need for a Global Virtual Time • Motivations • an indicator that the simulation time advances • reclaim memory (fossil collection) • Basically, GVT is the minimum of • all LPs' logical simulation times • timestamps of messages in transit • GVT guarantees that • events below GVT are definitive events • no rollback can occur before the GVT • state points before GVT can be reclaimed • anti-messages before GVT can be reclaimed
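
The definition of GVT and its use for fossil collection fit in a few lines. The sketch assumes a global snapshot is available (all LP clocks and all in-transit timestamps known at once), which is exactly the hard part in a distributed setting; function names and sample values are illustrative.

```python
def global_virtual_time(lp_clocks, in_transit_timestamps):
    """GVT = minimum over all LP clocks and all timestamps of messages in transit."""
    return min(list(lp_clocks.values()) + list(in_transit_timestamps))

def fossil_collect(saved_states, gvt):
    """Drop saved states strictly older than GVT, as no rollback can reach them.

    Real implementations commonly keep one checkpoint at or before GVT as well;
    this sketch follows the simpler rule stated on the slide.
    """
    return [(timestamp, state) for timestamp, state in saved_states if timestamp >= gvt]

lp_clocks = {"A": 25, "B": 18, "C": 32}   # illustrative local clocks
in_transit = [13, 40]                      # a message with timestamp 13 is still in flight
gvt = global_virtual_time(lp_clocks, in_transit)

saved = [(11, "s11"), (13, "s13"), (18, "s18")]
kept = fossil_collect(saved, gvt)
print(gvt, kept)
```

Note that the in-transit message, not the slowest LP, determines GVT here: ignoring messages in flight would let an LP reclaim state it may still need.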

32. Time Warp - Overheads • Periodic state savings • states may be large, very large! • copies are very costly • Periodic GVT computations • costly in a distributed architecture, • may block computations, • Rollback thrashing • cascaded rollback, no advancement! • Memory! • memory is THE limitation

33. Optimistic Mechanisms: Pros & Cons • Pros • exploits all the parallelism in the model, lookahead is less important • transparent to the end-user • can be general-purpose • Cons • very complex, needs lots of memory • large overheads (state saving, GVT, rollbacks…)

34. Mixed/Adaptive Approaches • General framework that (automatically) switches to conservative or optimistic • Adaptive approaches may determine at runtime the amount of conservatism or optimism [Figure: performance of conservative, mixed and optimistic approaches; axis labels: performance, messages]

35. Synchronous Protocols • Architecture of a synchronous LP

36. Synchronous Protocols "All for one and one for all!" (The Three Musketeers, Alexandre Dumas, 1802 – 1870)

37. A Simple Synchronous Algorithm • avoids local causality violations • LP: same data structures as a single sequential simulator • Global clock shared among all LPs, with the same value everywhere • Some data structures are private [Figure: four LPs report minimum timestamps of 8, 5, 10 and 12; global clock = 5]

38. A Simple Synchronous Algorithm clock = 0; while (simulation is not over) { t = minimum_timestamp(); clock = global_minimum(t); simulate_events(clock); synchronise(); } Basic operations 1. Computation of minimum timestamp – reduction operation 2. Event consumption 3. Message distribution 4. Message reception – barrier operation
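
The lock-step rounds can be emulated in a single process to show how the global clock advances. This is a simplified sketch: the reduction and the barrier are ordinary Python statements rather than parallel operations, events do not schedule new events, and the LP names and timestamps are illustrative (they extend the minimum timestamps 8, 5, 10 and 12 from the previous slide).

```python
def run_synchronous(event_lists):
    """Return one (clock, {lp: events executed}) entry per synchronous round."""
    pending = {lp: sorted(timestamps) for lp, timestamps in event_lists.items()}
    rounds = []
    while any(pending.values()):
        # t = minimum_timestamp(): each LP reports its local minimum
        local_minima = {lp: queue[0] for lp, queue in pending.items() if queue}
        # clock = global_minimum(t): reduction over all LPs
        clock = min(local_minima.values())
        # simulate_events(clock): every LP consumes its events stamped `clock`
        executed = {}
        for lp, queue in pending.items():
            now = [t for t in queue if t == clock]
            if now:
                executed[lp] = now
            pending[lp] = [t for t in queue if t != clock]
        # synchronise(): barrier before the next round
        rounds.append((clock, executed))
    return rounds

rounds = run_synchronous({"A": [8, 10], "B": [5], "C": [10], "D": [12]})
for clock, executed in rounds:
    print(clock, executed)
```

Only the round at time 10 keeps two LPs busy; every other round runs a single LP, which is the "behaves like the sequential one" worst case paid for with a reduction and a barrier per round.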

39. Synchronous Mechanisms: Pros & Cons • Pros • simple, easy to implement • good performance if parallelism is exploited with a moderate synchronisation cost • Cons • pessimistic in many cases • Worst case: simulator behaves like the sequential one • performance may drop if the cost of LP synchronisation (reduction, barrier) is high

40. PDES Languages • PDES Simulation Languages • a number of PDES languages have been developed in recent years • PARSEC • Compose • ModSim • etc • Most of these languages are general purpose languages • PARSEC • Developed at UCLA Parallel Computing Lab. • Availability - http://pcl.cs.ucla.edu/projects/parsec/ • Simplicity • Efficient event scheduling mechanism.

41. Georgia Tech Time Warp (GTW) • Optimistic discrete event simulator developed by the PADS group of the Georgia Institute of Technology • http://www.cc.gatech.edu/computing/pads/tech-parallel-gtw.html • Supports small-granularity simulation • GTW runs on shared-memory multiprocessor machines • Sun Enterprise, SGI Origin • TeD: Telecommunications Description Language • language developed mainly for modelling telecommunication network elements and protocols • Jane: simulator-independent client/server-based graphical interface and scripting tool for interactive parallel simulations • TeD/GTW simulations can be executed using the Jane system

42. BYOwS! • BYOwS: Build Your Own Simulator • Choose a programming language • C, C++, Java • Learn basic MPI • MPI: Message Passing Interface • Point-to-Point Communication • Available on the school Linux machines • Implement a simple PDES protocol • Case study: a simple queueing network

43. Parallel Simulation Today • Lots of algorithms have been proposed • variations on conservative and optimistic • adaptive approaches • Few end-users • Must compete with sequential simulators in terms of user interface, generality, ease of use, etc. • Research mainly focuses on • applications, ultra-large-scale simulations • tools and execution environments (clusters) • Federated simulations • different simulators interoperate with each other in executing a single simulation • battlefield simulation, distributed multi-user games

44. Parallel Simulation - Conclusion • Pros • reduction of the simulation time • increase of the model size • Cons • causality constraints are difficult to maintain • need for special mechanisms to synchronise the different processors • increased complexity of both the model and the simulation kernel • Challenges • ease of use, transparency.

45. References • Parallel simulation • R. Fujimoto, Parallel and Distributed Simulation Systems, John Wiley & Sons, 2000 • R. Fujimoto, Parallel Discrete Event Simulation, Communications of the ACM, Vol. 33(10), Oct. 90, pp31-53 • Parallel Simulation – Links http://www.cs.utsa.edu/research/ParSim/