
Parallel Discrete Event Simulation (PDES) at ORNL




Presentation Transcript


1. Parallel Discrete Event Simulation (PDES) at ORNL
Kalyan S. Perumalla, Ph.D.
Modeling & Simulation Group, Computational Sciences & Engineering

2. PDES: Selected application areas
• Network simulation: Internet protocols, security, P2P designs, …
• Traffic simulation: emergency planning/response, environmental policy analysis, urban planning, …
• Social dynamics simulation: operations planning, foreign policy, marketing, …
• Sensor simulations: wide-area monitoring, situational awareness, border surveillance, …
• Organization simulations: command and control, business processes, …
[Figure: the application areas span global and local events and emergencies, protection and awareness systems, and current and future defense systems]

3. High-performance PDES kernel requirements
• Global time synchronization
  • Total time-stamped ordering of events; paramount for accuracy
• Fast synchronization
  • Scalable, application-independent time-advance mechanisms
  • Critical for real-time and as-fast-as-possible execution
• Support for fine-grained events
  • Minimal overhead relative to event processing times
  • Application computation is typically only 5 µs to 50 µs per event
• Conservative, optimistic, and mixed modes
  • Need support for the principal synchronization approaches (a conservative event-loop sketch follows this list)
  • Useful to choose the mode on a per-entity basis at initialization; desirable to vary the mode dynamically during simulation
• General-purpose API
  • Reusable across multiple applications
  • Accommodates multiple techniques: lookahead, state saving, reverse computation, multicast, etc.
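To make the ordering requirement concrete, here is a minimal sketch of a conservative event loop (an illustration only, not µsik's API; the `Event` and `ConservativeKernel` names and the `lbts` parameter are ours): events are processed in strict time-stamp order, and the kernel advances only past events whose time stamps fall below the current lower bound on incoming time stamps (LBTS).

```python
import heapq

class Event:
    """A time-stamped event destined for a logical process (LP)."""
    def __init__(self, timestamp, lp_id, action):
        self.timestamp, self.lp_id, self.action = timestamp, lp_id, action

    def __lt__(self, other):
        return self.timestamp < other.timestamp  # total time-stamp order

class ConservativeKernel:
    """Processes events in strict time-stamp order, never advancing past
    LBTS (the lower bound on time stamps of events that may still arrive
    from other processes), so no rollback is ever needed."""
    def __init__(self, lookahead):
        self.fel = []            # future event list, a min-heap on time stamp
        self.lvt = 0.0           # local virtual time
        self.lookahead = lookahead

    def schedule(self, ev):
        # Events sent to remote LPs carry time stamps >= lvt + lookahead.
        heapq.heappush(self.fel, ev)

    def advance(self, lbts):
        # Only events with time stamps strictly below LBTS are safe.
        while self.fel and self.fel[0].timestamp < lbts:
            ev = heapq.heappop(self.fel)
            self.lvt = ev.timestamp
            ev.action(self, ev)  # the handler may schedule() new events
```

An optimistic kernel would instead process events speculatively and roll back on a time-stamp violation; a mixed-mode kernel lets the two coexist.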

4. µsik: a unique PDES "micro-kernel"
• Unique mixed-mode kernel
  • The only scalable mixed-mode kernel in the world
  • Supports conservative, optimistic, and mixed modes in a single kernel
• Used in a variety of applications
  • DES-based vehicular traffic models
  • DES-based plasma physics models
  • DES-based neurological models
  • Largest Internet simulations
• Some recent results of the fine-grained PDES benchmark PHOLD (a sketch of the benchmark follows below)
  • Among the largest/fastest scalability results in parallel discrete event simulation
[Charts: aggregate events/s (millions) and µs per event vs. number of processors (0–1,024), for LP = 1 thousand / MSG = 1 million, LP = 1 million / MSG = 100 million, and LP = 1 million / MSG = 1 billion]
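For reference, the PHOLD benchmark named above is easy to sketch sequentially (this is our simplified rendition, not the parallel code behind these results): a fixed population of events, or "pucks", circulates among the LPs, and processing an event forwards a puck to a random LP at a strictly later time stamp.

```python
import heapq, random

def phold(num_lps=1000, num_msgs=10000, end_time=1000.0, lookahead=1.0):
    """Sequential PHOLD sketch: num_msgs pucks circulate among num_lps
    LPs; each processed event re-schedules its puck at a random LP, at
    least lookahead into the future."""
    fel = [(random.uniform(0.0, lookahead), random.randrange(num_lps))
           for _ in range(num_msgs)]
    heapq.heapify(fel)              # future event list: (time stamp, dest LP)
    processed = 0
    while fel and fel[0][0] < end_time:
        ts, _lp = heapq.heappop(fel)
        processed += 1
        new_ts = ts + lookahead + random.expovariate(1.0)
        heapq.heappush(fel, (new_ts, random.randrange(num_lps)))
    return processed
```

The per-event work is tiny, which is what makes PHOLD a stress test of kernel overhead rather than of application computation.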

5. µsik scaled to more than 10⁴ processors
• Some recent results of the fine-grained PDES benchmark
• On Blue Gene Watson (BGW) at the IBM TJ Watson Research Center
• Well-known PHOLD benchmark, with 1 million logical processes and 10 million pucks
• The largest and fastest scalability results in PDES recorded to date
[Charts: speedup (up to ~17,000) and µs per event (up to ~50) vs. number of processors (in thousands, up to 18); parallel efficiency (up to ~1.4) at 2,048–16,384 processors; and events/s (millions, up to ~600) vs. number of processors (thousands), each comparing conservative, optimistic, and mixed modes]

6. µsik micro-kernel internals
• Each user LP keeps a processed event list (PEL), a future event list (FEL), and its local virtual time (LVT); LPs are grouped under kernel processes (KPs)
• The kernel keeps three priority queues, keyed by the earliest committable (ECTS), earliest processable (EPTS), and earliest emittable (EETS) time stamps (see the data-structure sketch below)
• When are the kernel queues updated?
  • A new LP is added or deleted
  • An LP executes an event
  • An LP receives an event
• Legend: LP = logical process; KP = kernel process; ECTS = earliest committable time stamp; EPTS = earliest processable time stamp; EETS = earliest emittable time stamp; PEL = processed event list; FEL = future event list; LVT = local virtual time
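A toy rendition of the three kernel queues, under the assumption (ours) that each LP exposes its ECTS/EPTS/EETS time stamps through accessor methods; µsik's real data structures are in C++ and differ in detail:

```python
import heapq

class StubLP:
    """Stand-in LP exposing its three time stamps (hypothetical API)."""
    def __init__(self, ects, epts, eets):
        self._ects, self._epts, self._eets = ects, epts, eets
    def ects(self): return self._ects
    def epts(self): return self._epts
    def eets(self): return self._eets

class KernelProcess:
    """A KP keeps its LPs in three priority queues so the earliest
    committable, processable, and emittable work is always at the front."""
    def __init__(self):
        self.committable = []  # keyed by ECTS
        self.processable = []  # keyed by EPTS
        self.emittable = []    # keyed by EETS

    def on_lp_update(self, lp):
        # Triggered by the three cases on the slide: an LP is added or
        # deleted, executes an event, or receives an event. Here we push
        # fresh entries; stale ones would be skipped lazily on pop.
        heapq.heappush(self.committable, (lp.ects(), id(lp), lp))
        heapq.heappush(self.processable, (lp.epts(), id(lp), lp))
        heapq.heappush(self.emittable, (lp.eets(), id(lp), lp))
```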

7. libSynk: µsik's synchronization core
• Layering: µsik processes run on the µsik kernel; µsik uses libSynk, and libSynk uses the OS/hardware and the network (in the architecture diagram, an arrow from X to Y implies X uses Y)
• libSynk modules: TM (TM Red, TM Null), RM (RM Bar), and FM (FM Myr, FM TCP, FM ShM, FM MPI); a reduction-style time-advance sketch follows below
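One way to picture reduction-based time management of the kind the TM Red module name suggests (our stand-in, not libSynk's actual protocol; mpi4py substitutes for the messaging layer, and a real protocol must also account for in-transit messages):

```python
# Requires mpi4py; run under mpiexec. Illustrative only, not libSynk.
from mpi4py import MPI

def compute_lbts(local_min_ts, lookahead):
    """Each rank contributes the minimum time stamp among its pending
    events; a global MIN-reduction plus lookahead yields LBTS, the bound
    below which every rank may safely process events."""
    global_min = MPI.COMM_WORLD.allreduce(local_min_ts, op=MPI.MIN)
    return global_min + lookahead
```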

8. µsik micro-kernel capabilities
• µsik is currently able to support the following:
  • Lookahead-based conservative and/or optimistic execution
  • Reverse computation-based optimistic execution (see the sketch after this list)
  • Checkpointing-based optimistic execution
  • Resilient optimistic execution (zero rollbacks)
  • Constrained, out-of-order execution
  • Preemptive event processing
  • Any combination of the above
  • Automated, network-throttled flow control
  • User-level event retraction
  • Process-specific limits to optimism
  • Dynamic process addition/deletion
  • Shared and/or distributed memory execution
  • Process-oriented views
• It accommodates addition of the following:
  • Synchronized multicast
  • Optimistic dynamic memory allocation
  • Automated load balancing
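Reverse computation, listed above, can be sketched as a pair of handlers per event type (our illustrative example; µsik's actual interface is not shown): the forward handler makes an invertible state change, and the reverse handler undoes it exactly during rollback, avoiding the cost of state saving.

```python
from dataclasses import dataclass

@dataclass
class Increment:
    delta: int  # amount added to the LP's counter

class CounterLP:
    """An LP whose single event type is reversible by construction."""
    def __init__(self):
        self.count = 0

    def forward(self, ev: Increment):
        # Constructive state change: fully determined by the event,
        # so no state needs to be checkpointed for rollback.
        self.count += ev.delta

    def reverse(self, ev: Increment):
        # Exact inverse of forward(); an optimistic kernel calls this
        # in reverse time-stamp order to roll back.
        self.count -= ev.delta
```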

9. SensorNet: Parallel simulation/immersive test-bed
• Seamless integrated testbed to incorporate a variety of important simulations, stimulations, and live devices
• Achieves unified capabilities and significant fidelity for test and evaluation of CB sensor device-based designs, concepts, and operations
[Diagram: CB sensor network testbed framework, current and envisioned, coupling a SimDriver/SimComm layer, TinyViz, a script interpreter (Tython/Python) with a reflected simulation script, an external event buffer (sketched below), SAF, a weather model and weather simulator, TOSSIM, communication models, plume models (SCIPUFF), a sensor-net simulator, an operations simulator, live devices, and an environmental-model proxy]
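The external event buffer in the diagram suggests a simple coupling pattern; below is our hypothetical sketch of such a buffer (the SensorNet framework's real interfaces are not shown): external sources inject time-stamped events, and the simulator drains them into its future event list between events.

```python
import heapq, queue

class ExternalEventBuffer:
    """Thread-safe hand-off point between live devices (or external
    simulators) and the discrete event simulator."""
    def __init__(self):
        self._incoming = queue.Queue()

    def inject(self, timestamp, payload):
        # Called from a device-driver or external-simulator thread.
        self._incoming.put((timestamp, payload))

    def drain_into(self, fel):
        # Called by the simulator between events: move injected events
        # into the time-ordered future event list (a min-heap).
        while not self._incoming.empty():
            heapq.heappush(fel, self._incoming.get())
```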

10. SensorNet: Simulation-based analysis for plume tracking
• The environmental phenomenon exhibits high variability.
• The phenomenon drives the sensor network's computation and communication.
• The trace of the sensed phenomenon gathered at the base station reflects that high variability.
• Communication effects induce unpredictable gaps in the series.
• Accurate, integrated simulation of the phenomenon and the communication captures their complex interdependencies.
[Figures: sensor network layout with nodes 0–25 in a grid, a base station, and a source release; mean concentration (10⁻⁶ kg/m³) vs. time (300–600 s) under no-wind, windy, and turbulent conditions; surface dosage maps (CHEM at T = 17.0 h, kg/cm³ on a log scale from 1.0E-02 to 1.0E-08) over latitude 40.6–50.5 and longitude -96.0 to -86.4 for no, light, and turbulent winds]

11. SCATTER: Ultra-scale PDES-based mobility simulations
• Our approach: SCATTER
  • DES models (vs. time-stepped); see the sketch below
  • Parallel execution (vs. sequential)
  • Scalability to high-performance computing (10²–10³ CPUs)
  • Important behaviors: kinetic + non-kinetic
• Scalable tool for transportation and energy/event/emergency research
  • Regional scale (multiple states): 10⁶–10⁷ intersections
  • Current tool capabilities: at most 10⁴ intersections
  • Faster-than-real-time execution is very useful
[Chart: speed vs. fidelity; network-flow methods and OREMS are faster but less accurate (good for rough estimates of evacuation delay, …), while CORSIM, MITSIM, and TRANSIMS are slower but more accurate (good for emissions estimates, traffic analysis, …); SCATTER targets the desirable fast, high-fidelity corner]
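To contrast DES with time-stepped simulation, here is our illustrative sketch (not SCATTER's actual model): vehicles hop between intersections as discrete events, so work is done only when something happens, rather than at every fixed time step for every intersection.

```python
import heapq, random

def vehicle_hops(num_intersections=9, num_vehicles=1000, end_time=3600.0):
    """Event-driven traffic sketch: computation occurs only when a
    vehicle reaches an intersection."""
    fel = [(random.uniform(0.0, 60.0), v, random.randrange(num_intersections))
           for v in range(num_vehicles)]
    heapq.heapify(fel)              # (arrival time, vehicle, intersection)
    hops = 0
    while fel and fel[0][0] < end_time:
        t, v, node = heapq.heappop(fel)
        hops += 1
        # Signal delay plus travel time to an adjacent intersection.
        dwell = random.expovariate(1.0 / 30.0)
        heapq.heappush(fel, (t + dwell, v, (node + 1) % num_intersections))
    return hops
```

A time-stepped simulator would instead update all intersections at every tick, which is what limits its scale relative to a DES formulation.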

12. SCATTER: Benchmark performance
• Significantly faster than real time with 1 million vehicles!
• Significant speedup with parallel execution!
• Very low event-processing overhead (µs per event)!
• (The two headline metrics are defined below.)
[Charts: speedup over real time (real time/simulated time, 1–10,000) vs. number of vehicles (10,000–1,000,000) for 9-node and 1,089-node road networks; parallel speedup over a sequential run (up to ~3.5×) vs. number of processors (1–4); event-processing speed (µs/event) vs. number of vehicles]
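The two metrics, stated precisely (definitions as we read the chart axes; the function names are ours):

```python
def speedup_over_real_time(simulated_duration_s, wallclock_s):
    """Ratio of simulated (virtual) time advanced to wall-clock time
    consumed; values above 1 mean the model runs faster than real time."""
    return simulated_duration_s / wallclock_s

def parallel_speedup(sequential_runtime_s, parallel_runtime_s):
    """Speedup of a parallel run over the sequential baseline."""
    return sequential_runtime_s / parallel_runtime_s
```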

13. Contact
Kalyan S. Perumalla
Modeling & Simulation Group
Computational Sciences & Engineering
(865) 241-1315
perumallaks@ornl.gov
Perumalla_PDES_SC07
