Meta-Simulation Design and Analysis for Large Scale Networks

Meta-Simulation Design and Analysis for Large Scale Networks. David W Bauer Jr Department of Computer Science Rensselaer Polytechnic Institute. OUTLINE. Motivation Contributions Meta-simulation ROSS.Net BGP4-OSPFv2 Investigation Simulation Kernel Processes Seven O’clock Algorithm


Presentation Transcript


  1. Meta-Simulation Design and Analysis for Large Scale Networks David W Bauer Jr Department of Computer Science Rensselaer Polytechnic Institute

  2. OUTLINE • Motivation • Contributions • Meta-simulation • ROSS.Net • BGP4-OSPFv2 Investigation • Simulation • Kernel Processes • Seven O’clock Algorithm • Conclusion

  3. High-Level Motivation: to gain varying degrees of qualitative and quantitative understanding of the behavior of the system-under-test • Feature Interactions • Protocol Stability and Dynamics • Parameter Sensitivity “…objective as a quest for general invariant relationships between network parameters and protocol dynamics…”

  4. Meta-Simulation: capabilities to extract and interpret meaningful performance data from the results of multiple simulations • Individual experiment cost is high • Developing useful interpretations • Protocol performance modeling Experiment Design Goal: identify the minimum-cardinality set of meta-metrics that maximally models the system

  5. OUTLINE • Motivation • Contributions • Meta-simulation • ROSS.Net • BGP4-OSPFv2 Investigation • Simulation • Kernel Processes • Seven O’clock Algorithm • Conclusion

  6. Contributions: Meta-Simulation: OSPF Problem: which meta-metrics are most important in determining OSPF convergence? • Full-Factorial ED (FFED): 16384 experiments; searches the complete model space • Optimization-based ED: 750 experiments; negligible metrics identified and isolated • Our approach is within 7% of Full Factorial using more than an order of magnitude fewer experiments (750 vs. 16384) [Diagram labels: Step 1, Re-parameterize, Step 2, Step 3, Re-scale]
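
The experiment counts on this slide can be checked with a short sketch. It assumes 14 two-level factors, which is one way to arrive at 16384 runs; the slide gives only the total, so the factor count and the level values are assumptions made here for illustration.

```python
from itertools import product

# Assumption: 14 factors with 2 levels each (2**14 = 16384).
# The slide states only the total number of full-factorial experiments.
NUM_FACTORS = 14
LEVELS = (0, 1)

# A full-factorial design enumerates every combination of factor levels.
full_factorial = list(product(LEVELS, repeat=NUM_FACTORS))
ffed_runs = len(full_factorial)

# Number of experiments used by the optimization-based design (from the slide).
optimized_runs = 750

# How many times fewer experiments the optimization-based design needs.
reduction = ffed_runs / optimized_runs
print(ffed_runs, round(reduction, 1))
```

The ratio between the two counts is what the comparison with Full Factorial rests on; adding a single extra two-level factor would double the full-factorial cost.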

  7. Contributions: Meta-Simulation: OSPF/BGP Ability: model BGP and OSPF control plane Problem: which meta-metrics are most important in minimizing control plane dynamics (i.e., updates)? All updates belong to one of four categories: • OO: OSPF-caused OSPF update • OB: OSPF-caused BGP update • BO: BGP-caused OSPF update • BB: BGP-caused BGP update Meta-Simulation Perspective: complete view of all domains • Optimized with respect to various metrics -- each equivalent to a particular management approach. • Importance of parameters differs for each metric. • For minimal total updates: local perspectives are 20-25% worse than the global perspective. • For minimal total interactions (BO+OB): optimizing for other metrics can be 15-25% worse. • OB updates are more important than BO updates (i.e. ~50% vs. ~0.1% of total updates)

  8. Contributions: Simulation: Kernel Process Parallel Discrete Event Simulation • Conservative Simulation: wait until it is safe to process the next event, so that events are processed in time-stamp order • Optimistic Simulation: allow violations of time-stamp order to occur, but detect them and recover • Benefits of Optimistic Simulation: • Not dependent on the network topology simulated • As-fast-as-possible forward execution of events

  9. Contributions: Simulation: Kernel Process Problem: parallelizing simulation requires 1.5 to 2 times more memory than sequential, and additional memory requirement affects performance and scalability • Decreased scalability as model size increases: • due to increased memory required to support model 4 Processors Used • Solution: Kernel Processes (KPs) • new data structure supports parallelism, increases scalability Model Size Increasing

  10. Contributions: Simulation: Seven O’clock Problem: distributing simulation requires efficient global synchronization Inefficient solution: barrier synchronization between all nodes while performing computation Efficient solution: pass messages between nodes, and synchronize in the background of the main simulation Seven O’clock Algorithm: eliminate message passing → reduce cost from O(n) or O(log n) to O(1)

  11. OUTLINE • Motivation • Contributions • Meta-simulation • ROSS.Net • BGP4-OSPFv2 Investigation • Simulation • Kernel Processes • Seven O’clock Algorithm • Conclusion

  12. ROSS.Net: Big Picture Goal: an integrated simulation and experiment design environment • Protocol Design: protocol parameters, protocol metrics • ROSS.Net (simulation & meta-simulation) • Modeling: protocol models: OSPFv2, BGP4, TCP Reno, IPv4, etc. • Measurement: measured topology data, traffic and router stats, etc.; data-sets (Rocketfuel)

  13. ROSS.Net: Big Picture Meta-Simulation • Experiment design • Statistical analysis • Optimization heuristic search • Recursive Random Search • Sparse empirical modeling ROSS.Net Design of Experiments Tool (DOT) Input Parameters Output Metric(s) Parallel Discrete Event Network Simulation • Optimistic parallel simulation • ROSS • Memory efficient network protocol models Simulation

  14. ROSS.Net: Meta-Simulation Components • Traditional Experiment Design (Full/Fractional Factorial): Design of Experiments Tool (DOT), Parameter Vector, Metric(s), Statistical or Regression Analysis (R, STRESS), Empirical model • Small-scale systems • Linear parameter interactions • Small # of params • Optimization Search: Design of Experiments Tool (DOT), Parameter Vector, Metric(s), Statistical or Regression Analysis (R, STRESS), Sparse empirical model • Large-scale systems • Non-linear parameter interactions • Large # of params – curse of dimensionality

  15. Meta-Simulation: OSPF/BGP Interactions • Router topology from Rocketfuel trace data • took each ISP map as a single OSPF area • created BGP domain between ISP maps • hierarchical mapping of routers AT&T’s US Router Network Topology • 8 levels of routers: • Levels 0 and 1, 155Mb/s, 4ms delay • Levels 2 and 3, 45Mb/s, 4ms delay • Levels 4 and 5, 1.5Mb/s, 10ms delay • Levels 6 and 7, 0.5Mb/s, 10ms delay

  16. Meta-Simulation: OSPF/BGP Interactions • OSPF: intra-domain, link-state routing; path costs matter • Border Gateway Protocol (BGP): inter-domain, path-vector, policy routing; reachability matters • BGP decision-making steps: (1) Highest LOCAL PREF (2) Lowest AS Path Length (3) Lowest origin type (0 – IGP, 1 – EGP, 2 – Incomplete) (4) Lowest MED (5) Lowest IGP cost (6) Lowest router ID [Figure: OSPF domain, eBGP connectivity, iBGP connectivity]
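
The decision steps listed on this slide amount to a lexicographic comparison, which can be sketched as a sort key. This is a minimal illustration that follows only the six steps shown here; the `Route` class and its field names are hypothetical, and real BGP implementations apply additional rules (for example, MED is normally compared only between routes from the same neighboring AS, and eBGP-learned routes are preferred over iBGP-learned ones).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical container for the attributes named on the slide."""
    local_pref: int   # higher is better
    as_path_len: int  # lower is better
    origin: int       # origin type code 0, 1 or 2; lower is better
    med: int          # lower is better
    igp_cost: int     # lower is better
    router_id: int    # lower is better (final tie-breaker)


def preference_key(route):
    # Tuples compare element by element, so each later field only
    # matters when all earlier fields tie. LOCAL PREF is negated
    # because it is the one attribute where a higher value wins.
    return (-route.local_pref, route.as_path_len, route.origin,
            route.med, route.igp_cost, route.router_id)


def best_route(routes):
    """Return the most preferred route under the slide's six steps."""
    return min(routes, key=preference_key)


a = Route(local_pref=100, as_path_len=3, origin=0, med=10, igp_cost=5, router_id=1)
b = Route(local_pref=200, as_path_len=5, origin=0, med=10, igp_cost=5, router_id=2)
c = Route(local_pref=200, as_path_len=4, origin=0, med=10, igp_cost=5, router_id=3)
chosen = best_route([a, b, c])
```

Here `b` beats `a` on LOCAL PREF despite its longer AS path, and `c` beats `b` on AS path length once LOCAL PREF ties. The IGP cost step is where OSPF path costs feed into BGP decisions.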

  17. Meta-Simulation: OSPF/BGP Interactions Intra-domain routing decisions can affect inter-domain behavior, and vice versa. All updates belong to one of four categories: • OSPF-caused OSPF (OO) update • OSPF-caused BGP (OB) update – interaction • BGP-caused OSPF (BO) update – interaction • BGP-caused BGP (BB) update [Figure: OB update; labels: Destination, 10, 8; trigger: link failure or cost increase (e.g. maintenance)]

  18. Meta-Simulation: OSPF/BGP Interactions Intra-domain routing decisions can affect inter-domain behavior, and vice versa. Identified four categories of updates: • OO: OSPF-caused OSPF update • BB: BGP-caused BGP update • OB: OSPF-caused BGP update – interaction • BO: BGP-caused OSPF update – interaction [Figure: BO update; label: Destination; trigger: eBGP connectivity becomes available] These interactions cause route changes to thousands of IP prefixes, i.e. huge traffic shifts!
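
The four update categories can be expressed as a small classification step over (cause, effect) pairs. The sketch below is illustrative only: the function names and the sample log are hypothetical, not taken from ROSS.Net.

```python
from collections import Counter

# Map each protocol to the letter used in the category names.
_CODES = {"OSPF": "O", "BGP": "B"}


def classify_update(cause, effect):
    """Return OO, OB, BO or BB for an update of protocol `effect`
    that was triggered by protocol `cause`."""
    return _CODES[cause] + _CODES[effect]


def is_interaction(category):
    """Cross-protocol categories are the ones counted as interactions."""
    return category in ("OB", "BO")


# Hypothetical log of (cause, effect) pairs, made up for illustration.
log = [("OSPF", "OSPF"), ("OSPF", "BGP"), ("BGP", "BGP"),
       ("OSPF", "BGP"), ("BGP", "OSPF")]
counts = Counter(classify_update(cause, effect) for cause, effect in log)
interactions = sum(n for cat, n in counts.items() if is_interaction(cat))
```

Summing the OB and BO counts gives the interaction total used as a response surface (OB+BO) later in the deck.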

  19. Meta-Simulation: OSPF/BGP Interactions • Three classes of protocol parameters: OSPF timers, BGP timers, BGP decision • Maximum search space size: 14,348,907 • Recursive Random Search (RRS) was allowed 200 trials to optimize (minimize) each response surface: OO, OB, BO, BB, OB+BO, ALL updates • Applied multiple linear regression analysis to the results
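
To give a feel for the scale involved, the sketch below compares a 200-trial budget against the stated search space. It assumes 15 parameters with 3 levels each, which reproduces 14,348,907 but is not stated on the slide. The search shown is plain uniform random sampling over a made-up response function, not the Recursive Random Search used in the study; `toy_response` is a hypothetical stand-in for one simulation run.

```python
import random

# Assumption: 15 parameters at 3 levels each (3**15 = 14,348,907).
NUM_PARAMS = 15
LEVELS = 3
space_size = LEVELS ** NUM_PARAMS


def toy_response(params):
    # Hypothetical stand-in for a simulation run that returns a
    # metric to minimize (e.g. a number of updates).
    return sum((p - 1) ** 2 for p in params)


def random_search(trials, seed=0):
    """Sample `trials` random parameter vectors and keep the best one."""
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    history = []
    for _ in range(trials):
        params = tuple(rng.randrange(LEVELS) for _ in range(NUM_PARAMS))
        value = toy_response(params)
        history.append(value)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value, history


best_params, best_value, history = random_search(200)
coverage = 200 / space_size  # fraction of the space actually evaluated
```

With 200 trials, only a tiny fraction of the space is evaluated, which is why a search heuristic followed by regression analysis is used instead of exhaustive enumeration.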

  20. Meta-Simulation: OSPF/BGP Interactions Optimized with respect to the OB+BO response surface. • BGP timers play the major role: ~15% improvement in the optimal response when BGP timers are included in the search space. • The BGP KeepAlive timer seems to be the dominant parameter, in contrast to the expectation of MRAI! • OSPF timers have little effect, i.e. at most 5%. • Low time-scale OSPF updates do not affect BGP.

  21. Meta-Simulation: OSPF/BGP Interactions Varied response surfaces -- each equivalent to a particular management approach. • Importance of parameters differs for each metric. • For minimal total updates: local perspectives are 20-25% worse than the global perspective. • For minimal total interactions (BO+OB): optimizing for other metrics can be 15-25% worse. • OB updates are more important than BO updates (i.e. ~50% vs. ~0.1% of total updates). [Figure annotation: Important to optimize OSPF]

  22. Meta-Simulation Conclusions: • The number of experiments was reduced by an order of magnitude in comparison to Full Factorial. • Experiment design and statistical analysis enabled rapid elimination of insignificant parameters. • Several qualitative statements and system characterizations could be obtained with few experiments.

  23. OUTLINE • Problem Statement • Contributions • Meta-simulation • ROSS.Net • BGP4-OSPFv2 Investigation • Simulation • Kernel Processes • Seven O’clock Algorithm • Conclusion

  24. Simulation: Overview Parallel Discrete Event Simulation • Logical Processes (LPs): one for each relatively parallelizable simulation model, e.g. a router, a TCP host • Local Causality Constraint (LCC): events within each LP must be processed in time-stamp order • Observation: adherence to the LCC is sufficient to ensure that a parallel simulation will produce the same result as a sequential simulation • Conservative Simulation • Avoid violating the local causality constraint (wait until it’s safe) • Null Message (deadlock avoidance) (Chandy/Misra/Bryant) • Time-stamp of next event • Optimistic Simulation • Allow violations of local causality to occur, but detect them and recover using a rollback mechanism • Time Warp Protocol (Jefferson, 1985) • Limiting amount of optimistic execution
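
The optimistic approach described above can be sketched with a single logical process that saves state before each event and rolls back when a straggler (an event with a time stamp earlier than its local virtual time) arrives. This is a simplified illustration, not the Time Warp protocol itself: it omits anti-messages and message sending entirely, and the class and method names are hypothetical.

```python
class LogicalProcess:
    """Toy LP whose state update is order-sensitive, so processing
    events out of time-stamp order would give a wrong result."""

    def __init__(self):
        self.lvt = 0.0        # local virtual time
        self.state = 0
        self.processed = []   # (timestamp, value, state saved before event)
        self.rollbacks = 0

    def _execute(self, ts, value):
        self.processed.append((ts, value, self.state))  # state saving
        self.state = self.state * 2 + value
        self.lvt = ts

    def receive(self, ts, value):
        redo = []
        if ts < self.lvt:
            # Straggler: undo every processed event later than ts.
            self.rollbacks += 1
            while self.processed and self.processed[-1][0] > ts:
                old_ts, old_value, saved = self.processed.pop()
                self.state = saved
                redo.append((old_ts, old_value))
            self.lvt = self.processed[-1][0] if self.processed else 0.0
        # Execute the new event plus any undone events in time-stamp order.
        for ev_ts, ev_value in sorted(redo + [(ts, value)]):
            self._execute(ev_ts, ev_value)


def run_sequential(events):
    """Reference result: process all events in time-stamp order."""
    state = 0
    for _, value in sorted(events):
        state = state * 2 + value
    return state


events = [(5.0, 1), (15.0, 3), (10.0, 2)]  # arrival order, not time order
lp = LogicalProcess()
for ts, value in events:
    lp.receive(ts, value)
```

After the rollback, the LP's final state matches `run_sequential(events)`, which is the property the local causality constraint is meant to guarantee.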

  25. ROSS: Rensselaer’s Optimistic Simulation System Example accesses: • GTW, top-down hierarchy: lp_ptr = GState[LP[i].Map].lplist[LPNum[i]] • ROSS, bottom-up hierarchy: lp_ptr = event->src_lp; or pe_ptr = event->src_lp->pe; Key advantages of bottom-up approach: • reduces access overheads • improves locality and processor cache performance Memory usage only 1% more than sequential and independent of LP count. [Diagram: data-structure layouts of GTW (GState[NPE], PEState, LPState, Event) and ROSS (tw_event, tw_lp, tw_pe)]

  26. “On the Fly” Fossil Collection • OTFFC works by only allocating events from the free list that are less than GVT. • As events are processed they are immediately placed at the end of the free list. • Key Observation: rollbacks cause the free list to become UNSORTED in virtual time. • Result: event buffers that could be allocated are not; the user must over-allocate the free list. [Figure: snapshots of PE 0’s internal state (LPs A, B, C; FreeList[0] and FreeList[1] holding events at 5.0, 10.0, 15.0) at time 15.0, and after rollback and re-execution of LP A]
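
The effect of an unsorted free list can be illustrated with a toy model. It assumes that allocation inspects only the head of the free list, which is one reading of the key observation on this slide rather than a confirmed detail of ROSS; the `FreeList` class is hypothetical.

```python
from collections import deque


class FreeList:
    """Toy free list: each entry is the time stamp of a processed event
    whose buffer may be reused once that time stamp is below GVT."""

    def __init__(self, timestamps=()):
        self.buffers = deque(timestamps)

    def release(self, ts):
        # Processed events are placed at the end of the free list.
        self.buffers.append(ts)

    def allocate(self, gvt):
        # Assumption: only the head is checked, so a head entry at or
        # above GVT blocks reusable buffers that sit behind it.
        if self.buffers and self.buffers[0] < gvt:
            return self.buffers.popleft()
        return None

    def reclaimable(self, gvt):
        return sum(1 for ts in self.buffers if ts < gvt)


gvt = 12.0

# Sorted in virtual time: both buffers below GVT can be allocated.
sorted_list = FreeList([5.0, 10.0, 15.0])
first = sorted_list.allocate(gvt)
second = sorted_list.allocate(gvt)
third = sorted_list.allocate(gvt)

# After a rollback and re-execution the list can end up unsorted.
unsorted_list = FreeList([15.0, 5.0, 10.0])
blocked = unsorted_list.allocate(gvt)
still_reclaimable = unsorted_list.reclaimable(gvt)
```

In the unsorted case two buffers are below GVT but none can be allocated, which is why the free list has to be over-allocated.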

  27. Contributions: Simulation: Kernel Process • PE: Processing Element (one per CPU utilized) • KP: Kernel Processes • LP: Logical Processes • Fossil Collection / Rollback [Diagram: a PE containing KPs, each grouping several LPs, with events numbered 1 to 9]

  28. ROSS: Kernel Processes • Advantages: • significantly lowers fossil collection overheads • lowers memory usage by aggregation of LP statistics into KP statistics • retains ability to process events on an LP by LP basis in the forward computation. • Disadvantages: • potential for “false rollbacks” • care must be taken when deciding on how to map LPs to KPs
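
The "false rollback" trade-off can be made concrete with a small sketch. It assumes that a rollback applies to every LP mapped to the same KP, as the grouping of LPs under KPs suggests; the function, the mappings and the event list are hypothetical and made up for illustration.

```python
def rolled_back_events(processed, lp_to_kp, straggler_lp, straggler_ts):
    """Return (undone, false_rollbacks) for a straggler event.

    processed: list of (timestamp, lp) already processed events.
    lp_to_kp:  dict mapping each LP to its kernel process.
    Every processed event in the straggler's KP with a later time
    stamp is undone; those belonging to other LPs are false rollbacks.
    """
    kp = lp_to_kp[straggler_lp]
    undone = [(ts, lp) for ts, lp in processed
              if lp_to_kp[lp] == kp and ts > straggler_ts]
    false_rollbacks = [event for event in undone if event[1] != straggler_lp]
    return undone, false_rollbacks


processed = [(5, "A"), (6, "B"), (10, "A"), (12, "C"), (15, "B")]

# All three LPs share one KP: cheaper bookkeeping, more false rollbacks.
one_kp = {"A": 0, "B": 0, "C": 0}
undone_shared, false_shared = rolled_back_events(processed, one_kp, "A", 8)

# One KP per LP: only the straggler's own LP is rolled back.
per_lp = {"A": 0, "B": 1, "C": 2}
undone_split, false_split = rolled_back_events(processed, per_lp, "A", 8)
```

With a shared KP, the events of LPs C and B are undone even though only LP A received the straggler. This is why the LP-to-KP mapping needs care.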

  29. ROSS: KP Efficiency Small trade-off: longer rollbacks vs faster FC Not enough work in system…

  30. ROSS: KP Performance Impact # KPs does not negatively impact performance

  31. ROSS: Performance vs GTW ROSS outperforms GTW 2:1 at best parallel ROSS outperforms GTW 2:1 in sequential

  32. Simulation: Seven O’clock GVT Optimistic approach • Relies on a global virtual time (GVT) algorithm to perform fossil collection at regular intervals • Events with timestamp less than GVT: • Will not be rolled back • Can be freed GVT calculation • Synchronous algorithms: LPs stop event processing during GVT calculation • Cost of synchronization may be higher than the positive work done per interval • Processes waste time waiting • Asynchronous algorithms: LPs continue processing events while GVT calculation continues in the background Goal: create a consistent cut among LPs that divides the events into past and future (figure axis: wall-clock time) Two problems: (i) Transient Message Problem, (ii) Simultaneous Reporting Problem
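
A minimal sketch of what a GVT value is and why transient messages matter: GVT is a lower bound on the time stamp of anything that could still be processed, so it has to account for messages that have been sent but not yet received. The function names and the numbers are hypothetical, and the sketch assumes the set of in-transit messages is known, which is exactly the part a real GVT algorithm has to work to establish.

```python
def compute_gvt(local_virtual_times, in_transit_timestamps):
    """Minimum over all LVTs and all messages still in flight."""
    return min(list(local_virtual_times) + list(in_transit_timestamps))


def fossil_collect(processed_timestamps, gvt):
    """Free processed events older than GVT; they cannot be rolled back."""
    kept = [ts for ts in processed_timestamps if ts >= gvt]
    freed = len(processed_timestamps) - len(kept)
    return kept, freed


lvts = [7.0, 9.0, 10.0]
in_transit = [5.0]  # a message sent but not yet received

gvt = compute_gvt(lvts, in_transit)
naive_gvt = compute_gvt(lvts, [])  # ignores the transient message

kept, freed = fossil_collect([2.0, 4.0, 6.0, 8.0], gvt)
```

Ignoring the in-flight message would give an estimate of 7.0, above the true bound of 5.0, and fossil collection could then free an event (6.0) that the message may still roll back.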

  33. Simulation: Mattern’s GVT Construct cut via message-passing Cost: O(log n) if tree, O(n) if ring • With a large number of processors, the free pool is exhausted while waiting for GVT to complete

  34. Construct cut using shared memory flag Simulation: Fujimoto’s GVT Cost: O(1) Sequentially consistent memory model ensures proper causal order • Limited to shared memory architecture

  35. Sequentially consistent does not mean instantaneous Memory events are only guaranteed to be causally ordered Simulation: Memory Model Is there a method to achieve sequentially consistent shared memory in a loosely coordinated, distributed environment?

  36. Simulation: Seven O’clock GVT Key observations: • An operation can occur atomically within a network of processors if all processors observe that the event occurred at the same time. • CPU clock time scale (ns) is significantly smaller than network time-scale (ms). Network Atomic Operations (NAOs): • an agreed-upon frequency in wall-clock time at which some event is logically observed to have happened across a distributed system. • a subset of the possible operations provided by a complete sequentially consistent memory model. [Figure: timeline in wall-clock time with repeating Update Tables and Compute GVT steps]
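
The idea of an agreed-upon wall-clock frequency can be sketched as follows: if every processor knows a shared starting time and interval, each one can work out the next NAO instant from its own clock without exchanging messages. This is an illustration under strong assumptions (well-synchronized clocks, times in integer milliseconds) and is not the Seven O’clock algorithm itself; the function names, the epoch and the interval are all hypothetical.

```python
def nao_index(wall_clock_ms, epoch_ms, interval_ms):
    """Index of the NAO interval that contains the given wall-clock time."""
    return (wall_clock_ms - epoch_ms) // interval_ms


def next_nao_time(wall_clock_ms, epoch_ms, interval_ms):
    """Wall-clock time of the next agreed-upon NAO instant."""
    return epoch_ms + (nao_index(wall_clock_ms, epoch_ms, interval_ms) + 1) * interval_ms


EPOCH_MS = 0         # assumed shared starting time
INTERVAL_MS = 1000   # assumed agreed-upon NAO period

# Two processors read slightly different wall-clock times...
clock_a = 10_200
clock_b = 10_700

# ...yet both derive the same next NAO instant locally.
nao_a = next_nao_time(clock_a, EPOCH_MS, INTERVAL_MS)
nao_b = next_nao_time(clock_b, EPOCH_MS, INTERVAL_MS)
```

Because the instant is derived from the clock rather than negotiated, no per-round messages are needed to agree on it. The interval has to be large relative to clock skew and network delay, which is where the ns versus ms observation comes in.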

  37. [Figure: example GVT computation across processors A to E; labels include LVT: 7, LVT: min(5,9), LVT: 5 and GVT: min(5,7)]

  38. Simulation: Seven O’clock GVT • Itanium-2 Cluster • r-PHOLD • 1,000,000 LPs • 10% remote events • 16 start events • 4 machines • 1-4 CPUs • 1.3 GHz • Round-robin LP to PE mapping Linear Performance

  39. Simulation: Seven O’clock GVT • Netfinity Cluster • r-PHOLD • 1,000,000 LPs • 10, 25% remote events • 16 start events • 4 machines • 2 CPUs, 36 nodes • 800 MHz

  40. Simulation: Seven O’clock GVT: TCP • Itanium-2 Cluster • 1,000,000 LPs • each modeling a TCP host (i.e. one end of a TCP connection). • 2 or 4 machines • 1-4 CPUs on each • 1.3 GHz • Poorly mapped LP/KP/PE Linear Performance

  41. Simulation: Seven O’clock GVT: TCP • Netfinity Cluster • 1,000,000 LPs • each modeling a TCP host (i.e. one end of a TCP connection). • 4-36 machines • 1-2 CPUs on each • Pentium III • 800MHz

  42. Simulation: Seven O’clock GVT: TCP • Sith Itanium-2 cluster • 1,000,000 LPs • each modeling a TCP host (i.e. one end of a TCP connection). • 4-36 machines • 1-2 CPUs on each • 900MHz

  43. Simulation: Seven O’clock GVT Summary • Seven O’Clock Algorithm • Clock-based algorithm for distributed processors • creates a sequentially consistent view of distributed memory • Zero-Cost Consistent Cut • Highly scalable and independent of event memory limits

  44. Summary: Contributions • Meta-simulation • ROSS.Net: platform for large-scale network simulation, experiment design and analysis • OSPFv2 protocol performance analysis • BGP4/OSPFv2 protocol interactions • Simulation • kernel processes • memory efficient, large-scale simulation • Seven O’clock GVT Algorithm • zero-cost consistent cut • high performance distributed execution

  45. Summary: Future Work • Meta-simulation • ROSS.Net: platform for large-scale network • incorporate more realistic measurement data, protocol models • CAIDA, Multi-cast, UDP, other TCP variants • more complex experiment designs → better qualitative analysis • Simulation • Seven O’clock GVT Algorithm • compute FFT and analyze “power” of different models • attempt to eliminate GVT algorithm by determining max rollback length
