This presentation covers optimizations for a simulator construction system that supports reusable components, from the Liberty Architecture Research Group. Reusable simulator components tend to incur a "reuse penalty": more component instances, more signals, and more general code make the simulator slower. The Liberty Simulation Environment (LSE) mitigates the penalty with two-tiered specifications, leaf module templates, a three-signal standard communications contract, and scheduling optimizations under a Heterogeneous Synchronous Reactive (HSR) model of computation. In the reported experiments, a custom LSE model outperforms a custom SystemC model, and the reusable LSE model runs 6% faster than custom SystemC.
Optimizations for a Simulator Construction System Supporting Reusable Components. David A. Penry and David I. August, The Liberty Architecture Research Group, Princeton University
Architectural Exploration • Architectural options are studied using simulators • More iterations = better decisions • Need fast path to simulator • Need fast simulator [Figure: Architecture Options, Architectural Simulator]
Simulator Construction Systems • Reuse simulator infrastructure • But still must be able to reuse descriptions • Structural composition • Medium-grained components • Standard communication contracts • High parameterizability • Separation of concerns [Figure: Architecture Description, Simulator Builder, Architectural Simulator Instance]
The Reuse Penalty • Reusability leads to a speed penalty: • more component instances • more signals • more general code • Therefore: reusable systems are often slower • How can we mitigate the reuse penalty?
Liberty Simulation Environment • Simulator construction system for high reuse • Two-tiered specifications • Leaf module templates in C • Netlisting language for instantiation and customization • Three-signal standard communications contract with overrides (control functions) • Code is generated [Figure: signals Data, Enable, Ack]
Contrast: SystemC • Simulator construction libraries (C++) • Partially supports reuse: + Structural composition + Module granularity varies ? Communications contracts by convention - Low parameterizability - Separation of concerns • Description is a C++ program
Models of Computation • SystemC uses Discrete Event (DE) • LSE uses Heterogeneous Synchronous Reactive (HSR), Edwards (1997) • Unparsed code blocks (black boxes) • Values begin unresolved and resolve monotonically • Chaotic scheduling [Figure: blocks A, B, C, D]
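To make the HSR bullets concrete, here is a minimal Python sketch (not LSE code) of chaotic scheduling: every signal starts unresolved, blocks may be invoked in any order, and iteration stops once a full pass resolves nothing new. The block functions, signal names, and `run_timestep` are hypothetical and exist only for illustration.

```python
UNRESOLVED = None  # bottom value: the signal is not yet known in this timestep


def run_timestep(blocks, signals):
    """Invoke blocks in the given order until a full pass resolves nothing new.

    Each block maps the current signal values to a dict of outputs. Blocks are
    assumed monotonic: they may resolve a signal but never change one that is
    already resolved.
    """
    values = {name: UNRESOLVED for name in signals}
    invocations = 0
    changed = True
    while changed:
        changed = False
        for block in blocks:
            invocations += 1
            for name, value in block(values).items():
                if values[name] is UNRESOLVED and value is not UNRESOLVED:
                    values[name] = value
                    changed = True
    return values, invocations


# Three hypothetical black-box blocks forming a chain a -> b -> c.
def block_a(v):
    return {"a": 1}


def block_b(v):
    return {"b": v["a"] + 1} if v["a"] is not UNRESOLVED else {}


def block_c(v):
    return {"c": v["b"] * 2} if v["b"] is not UNRESOLVED else {}


names = ["a", "b", "c"]
good_values, good_calls = run_timestep([block_a, block_b, block_c], names)
bad_values, bad_calls = run_timestep([block_c, block_b, block_a], names)
print(good_values, good_calls)
print(bad_values, bad_calls)
```

Both orders reach the same signal values; the unfavorable order only needs more block invocations, which is the kind of overhead a static schedule aims to remove.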
Potential HSR Benefits vs. DE • Static schedules possible • Lower per-signal overhead • Use of unresolved value to avoid redundant computation [Figure: blocks A, B, C, D]
Experimental methodology • Three models of a 4-way out-of-order microprocessor • SystemC using custom speed-optimized components • LSE model using custom speed-optimized components • LSE model using standard reusable components • 9 benchmarks (CPU 2000/MediaBench) • See paper for compiler, etc.
Non-edge signals, Model, Signals, Instances
Custom SystemC 4 71 32
Custom LSE 3 138 48
Reusable LSE 11 489 423
Custom LSE vs. SystemC • Custom LSE outperforms custom SystemC • Reduction in overhead • Use of unresolved signal value • Static instantiation and code specialization • Dynamic schedule for both
Reuse Penalty • Reusable model suffers large reuse penalty (0.26) • Many more signals • Many more non-edge signals • More components • All dynamic schedules
Creating Static Schedules • Edwards' algorithm (1997) • Construct a signal dependency graph • Break into strongly-connected components (SCCs). Schedule in topological order • Partition each SCC into a head and tail • Schedule tail recursively, then repeat head (any order) and tail's schedule • Coalesce [Figure: blocks A, B, C, D]
Creating Static Schedules (continued) • SCCs a, b, c scheduled in topological order • Schedule: a b c [Figure: blocks A, B, C, D; signals 1, 2, 3, 4; SCCs a, b, c]
Creating Static Schedules (continued) • SCC b partitioned into a head (H) and a tail (T) • Schedule: 1 b 4
Creating Static Schedules (continued) • SCC b expanded as tail, head, tail • Schedule: 1 2 3 2 4
Creating Static Schedules (continued) • Coalesce • Schedule: 1 2 3 2 4, coalesced as A B C B (D) • Choosing an optimal partition is exponential
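The scheduling steps listed on these slides can be sketched in Python as follows. This is an illustrative reading of the slides, not the LSE implementation: the dependency edges are guessed from the example figure, `pick_head` is a placeholder heuristic (the slides note that choosing an optimal partition is exponential), repeating the head-plus-tail pair once per head node is an assumption about what "repeat" means, and the signal-to-block map used for coalescing is hypothetical.

```python
from itertools import groupby


def sccs_in_topological_order(nodes, edges):
    """Tarjan's algorithm on the subgraph induced by `nodes`.

    An edge (u, v) means signal v depends on signal u. Producers are returned
    before their consumers.
    """
    succ = {n: [d for (s, d) in edges if s == n and d in nodes] for n in nodes}
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(n):
        index[n] = low[n] = len(index)
        stack.append(n)
        on_stack.add(n)
        for m in succ[n]:
            if m not in index:
                visit(m)
                low[n] = min(low[n], low[m])
            elif m in on_stack:
                low[n] = min(low[n], index[m])
        if low[n] == index[n]:
            comp = set()
            while True:
                m = stack.pop()
                on_stack.discard(m)
                comp.add(m)
                if m == n:
                    break
            out.append(comp)

    for n in sorted(nodes):
        if n not in index:
            visit(n)
    return out[::-1]  # Tarjan emits sink components first


def static_schedule(nodes, edges, pick_head=max):
    """Schedule SCCs in topological order; expand each SCC as tail, head, tail."""
    nodes = set(nodes)
    schedule = []
    for comp in sccs_in_topological_order(nodes, edges):
        if len(comp) == 1:
            schedule.extend(comp)
            continue
        head = {pick_head(comp)}      # placeholder heuristic, not optimal
        tail = comp - head
        tail_schedule = static_schedule(tail, edges, pick_head)
        schedule.extend(tail_schedule)
        for _ in head:                # assumed: one repetition per head node
            schedule.extend(sorted(head))
            schedule.extend(tail_schedule)
    return schedule


# Guessed dependencies for the slide example: 1 -> 2, 2 <-> 3, 3 -> 4.
edges = [(1, 2), (2, 3), (3, 2), (3, 4)]
signal_schedule = static_schedule({1, 2, 3, 4}, edges)

# Naive coalescing: map each signal to a hypothetical producing block and
# merge adjacent invocations of the same block.
producer = {1: "A", 2: "B", 3: "C", 4: "D"}
block_schedule = [b for b, _ in groupby(producer[s] for s in signal_schedule)]
print(signal_schedule, block_schedule)
```

With these assumed edges the sketch reproduces the signal schedule shown on the slides; any other head choice inside the SCC would give a different but equally valid expansion.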
Dynamic sub-schedule embedding • SCCs arise due to incomplete information • “Optimal” schedules are optimal w.r.t. information • “Optimal” schedule may be worse than dynamic • When an SCC is “too big”, just schedule that section dynamically [Figure: blocks A, B, C]
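One way to picture dynamic sub-schedule embedding is a plan that mixes statically ordered signals with regions that are iterated to a fixed point at run time. The sketch below is a hedged illustration only: `build_plan`, `run_plan`, the size threshold, and the example dependencies are all hypothetical, and the SCC list is assumed to have been computed already in topological order.

```python
def build_plan(sccs, max_static_size, static_for):
    """Emit static entries for small SCCs and one dynamic region per big SCC.

    `static_for` is a caller-supplied function that returns a static signal
    order for a small SCC (hypothetical hook for a static scheduler).
    """
    plan = []
    for comp in sccs:
        if len(comp) <= max_static_size:
            plan.extend(("static", s) for s in static_for(comp))
        else:
            plan.append(("dynamic", frozenset(comp)))
    return plan


def run_plan(plan, evaluate):
    """Run the plan; `evaluate(signal)` returns True if it resolved the signal now."""
    calls = 0
    for kind, item in plan:
        if kind == "static":
            evaluate(item)
            calls += 1
        else:
            progress = True
            while progress:           # iterate the region until nothing resolves
                progress = False
                for signal in sorted(item):
                    calls += 1
                    if evaluate(signal):
                        progress = True
    return calls


# The scheduler believes {2, 3, 4, 5} form a cycle (incomplete information),
# but the actual dependencies are a simple chain 1 -> 5 -> 4 -> 3 -> 2 -> 6.
actual_deps = {1: [], 2: [3], 3: [4], 4: [5], 5: [1], 6: [2]}
resolved = set()


def evaluate(signal):
    if signal in resolved:
        return False
    if not all(dep in resolved for dep in actual_deps[signal]):
        return False
    resolved.add(signal)
    return True


apparent_sccs = [{1}, {2, 3, 4, 5}, {6}]
plan = build_plan(apparent_sccs, max_static_size=2, static_for=sorted)
calls = run_plan(plan, evaluate)
print(plan, calls, sorted(resolved))
```

The threshold is the tuning knob here: small SCCs keep a fixed order, while the oversized one falls back to run-time iteration inside an otherwise static schedule.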
Dependency information enhancement • In practice, we see big SCCs • Peek in the black box • Simple parsing of communication overrides (control functions) • Can ask user to tell about internal dependencies • Not too painful because it is reused [Figure: blocks A, B, C]
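The effect of better dependency information on SCC size can be shown with a small Python sketch. It models only the outcome, not how LSE parses control functions: when nothing is known about a block, every output is assumed to depend on every input; a per-output declaration removes the false edges. The block names, port names, and helper functions are hypothetical.

```python
def dependency_edges(blocks, declared=None):
    """Build signal dependency edges (input, output) for each block.

    blocks:   block name -> (input signals, output signals)
    declared: block name -> {output: inputs it really uses}; outputs without
              a declaration conservatively depend on all inputs.
    """
    declared = declared or {}
    edges = set()
    for name, (inputs, outputs) in blocks.items():
        for out in outputs:
            used = declared.get(name, {}).get(out, inputs)
            edges.update((inp, out) for inp in used)
    return edges


def reachable(start, succ):
    """Signals reachable from `start` by following one or more edges."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def largest_scc(edges):
    """Largest strongly-connected component, via mutual reachability."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    nodes = {n for edge in edges for n in edge}
    reach = {n: reachable(n, succ) for n in nodes}
    components = ({n} | {m for m in reach[n] if n in reach[m]} for n in nodes)
    return max(components, key=len)


# Block A reads x and b and drives o1 and o2; block B reads o1 and drives b.
blocks = {"A": (["x", "b"], ["o1", "o2"]), "B": (["o1"], ["b"])}

conservative = dependency_edges(blocks)
enhanced = dependency_edges(blocks, {"A": {"o1": ["x"], "o2": ["b"]}})

print(sorted(largest_scc(conservative)))
print(len(largest_scc(enhanced)))
```

In this toy graph the black-box view creates a false cycle between `o1` and `b`; declaring that `o1` depends only on `x` removes it, so every signal can be scheduled exactly once.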
Evaluation of Information Enhancement • Control function parsing more useful alone • Not principally through scheduling • It is important to have both kinds of enhancement
Reuse Penalty Revisited • Reuse penalty mitigated in part • Reusable LSE model 6% faster than custom SystemC
Conclusions • A tradeoff exists between speed and reuse • The simulator construction system can help • Higher base speed makes reuse penalty less painful • Optimizations are possible with HSR model • Ability of the scheduler to adapt to the information available is powerful • This adaptation is not possible with DE • You can have high reuse at reasonable speeds
Future Work • Release of LSE • Fall 2003 • http://liberty.princeton.edu • Hybrid model of computation • Embed HSR in DE, DE in HSR • Automatic extraction of HSR portions from DE
Other optimizations • Improved block coalescing • See paper • Code specialization • Implementation of APIs depends upon environment