
Presenter: Ching-Hua Huang

National Sun Yat-sen University Embedded System Laboratory. Temporal Parallel Simulation: A Fast Gate-level HDL Simulation Using Higher Level Models. Cited count: 3. Dusung Kim; Ciesielski, M.; Dept. of Electr. & Comput. Eng., Univ. of Massachusetts, Amherst, MA, USA

Presentation Transcript


  1. National Sun Yat-sen University Embedded System Laboratory • Temporal Parallel Simulation: A Fast Gate-level HDL Simulation Using Higher Level Models • Cited count: 3 • Dusung Kim; Ciesielski, M.; Dept. of Electr. & Comput. Eng., Univ. of Massachusetts, Amherst, MA, USA • Kyuho Shim; Seiyang Yang; Dept. of Comput. Eng., Pusan National Univ., Busan, Korea • Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011 • Presenter: Ching-Hua Huang

  2. Abstract • The simulation speedup offered by distributed parallel event-driven simulation is known to be seriously limited by synchronization and communication overhead. These limiting factors are particularly severe in gate-level timing simulation. This paper describes a radically different approach to gate-level simulation based on the concept of temporal, rather than conventional spatial, parallelism. The proposed method partitions the entire simulation run into slices in the temporal domain and simulates each slice separately. Because the slices are independent of one another, an almost linear speedup is achievable with a large number of simulation nodes.

  3. Abstract (Cont.) • This concept naturally enables a “correct by simulation” methodology that explicitly maintains consistency between the reference and the target specifications. Experimental results clearly show a significant simulation speedup.

  4. What’s the problem • The performance of hardware simulation becomes prohibitively low for complex designs. • The speedup of distributed parallel simulation is limited by synchronization and communication overhead. • Proposed method to solve the above problem • A radically different approach to gate-level simulation based on the concept of temporal parallelism.

  5. Related Work • [7] SimCluster partitions the design into separate modules and performs concurrent simulation. • [6] Parallel Discrete Event Simulation (PDES) • [9] Principles of conservative parallel simulation • [13] Speedup • [12] Performance improvement • Other labels from the related-work diagram: a large gate-level decoder design improvement; developed the first Verilog distributed simulator; rollback-based synchronization; lock-step-based synchronization • [This paper] Temporal Parallel Simulation: A Fast Gate-level HDL Simulation Using Higher Level Models • [2] TPSim: GL timing simulation. The basic idea of this approach and preliminary results for special cases were introduced.

  6. Proposed method – TPSim • Figure labels: (1) state checking, (2) state matching • TPSim (Temporal Parallel Simulation) • (1) Partitions the entire simulation run into slices in the temporal domain. • (2) Simulates each slice separately. • It consists of two major steps: • Fast reference simulation • Performed on a high-level abstraction of the design. • Stores the essential state information. • Detailed, fine-grained target simulation • Performed on a lower-level (gate-level) model. • Applied in parallel to each simulation slice.
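The two-step flow above can be sketched in Python. This is a toy illustration under stated assumptions, not the authors' tool: a hypothetical 8-bit counter stands in for the design, `rtl_step` plays the fast high-level reference model, and `gate_step` is a hypothetical bit-level ripple-carry model standing in for the gate-level target. The reference run stores the state at each slice boundary; each slice is then re-simulated on the target model from its restored state, and the end state of every slice is checked against the next reference checkpoint. State matching is trivial here because both models use the same state encoding.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

WIDTH = 8
MASK = (1 << WIDTH) - 1


def rtl_step(state: int, enable: int) -> int:
    """Hypothetical high-level model: word-level counter update."""
    return (state + enable) & MASK


def gate_step(state: int, enable: int) -> int:
    """Hypothetical low-level model: the same counter as a ripple chain of half adders."""
    carry, result = enable, 0
    for i in range(WIDTH):
        bit = (state >> i) & 1
        result |= (bit ^ carry) << i  # half-adder sum
        carry = bit & carry           # half-adder carry
    return result


def reference_simulation(stimulus: List[int], slice_len: int) -> List[int]:
    """Fast reference run: store the state at the start of every slice, plus the final state."""
    state, checkpoints = 0, []
    for cycle, enable in enumerate(stimulus):
        if cycle % slice_len == 0:
            checkpoints.append(state)
        state = rtl_step(state, enable)
    checkpoints.append(state)
    return checkpoints


def target_slice(job: Tuple[int, List[int]]) -> int:
    """Detailed run of one slice, started from its restored checkpoint state."""
    state, slice_stimulus = job
    for enable in slice_stimulus:
        state = gate_step(state, enable)
    return state


def tpsim(stimulus: List[int], slice_len: int, nodes: int = 4) -> bool:
    checkpoints = reference_simulation(stimulus, slice_len)
    slices = [stimulus[i:i + slice_len] for i in range(0, len(stimulus), slice_len)]
    # Threads only illustrate the structure; real nodes would be separate simulator processes.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        end_states = list(pool.map(target_slice, zip(checkpoints, slices)))
    # State checking: each slice must end in the state the reference stored for the next one.
    return all(end == ref for end, ref in zip(end_states, checkpoints[1:]))


stim = [1 if i % 3 else 0 for i in range(1000)]
print(tpsim(stim, slice_len=100))  # True
```

Because every slice starts from a stored checkpoint rather than from the end of the previous slice, no slice waits on another, which is the source of the near-linear speedup claimed in the abstract.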

  7. Difficulties in Generalization of Temporal Parallelism (1) • Figure signals: DataA[N-1:0], ReqB, ClkB • Multiple asynchronous clocks • A gate-level simulation of a multiple-clock design may not be 100% cycle-by-cycle consistent with the RTL simulation. • Proposed solution: abstract delay annotation method • Adjacent slices are allowed to overlap by an amount equal to the longest delay in the design.
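A minimal sketch of how slice boundaries might be widened so that adjacent slices overlap by the longest delay, as the last bullet describes. The function name, the nanosecond units, and the choice to extend each slice backwards (start it earlier) are assumptions for illustration, not details from the paper.

```python
from typing import List, Tuple


def slice_windows(sim_end: float, num_slices: int, max_delay: float) -> List[Tuple[float, float]]:
    """Hypothetical helper: split [0, sim_end) into equal slices and start each slice
    (except the first) max_delay earlier, so it overlaps its predecessor by max_delay."""
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    length = sim_end / num_slices
    windows = []
    for k in range(num_slices):
        start, end = k * length, (k + 1) * length
        windows.append((max(0.0, start - max_delay), end))
    return windows


# 1000 ns run, 4 slices, 2.5 ns longest delay (hypothetical numbers)
print(slice_windows(1000.0, 4, 2.5))
# [(0.0, 250.0), (247.5, 500.0), (497.5, 750.0), (747.5, 1000.0)]
```

The intent, under this assumption, is that transitions still propagating across a clock-domain boundary at a slice's nominal start time are re-simulated inside the overlap instead of being lost.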

  8. Difficulties in Generalization of Temporal Parallelism (2) • State checkpointing in event-driven simulation • Finding the correct placement for checkpoints is more difficult because of the arbitrary delays between event edges. • Proposed solution: checkpoint window • The checkpoint window is equivalent to one clock cycle. • The correct value of Q can be reliably obtained at the end of each window. • The overlap period must be increased accordingly so that it contains the entire target checkpoint window.
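The checkpoint-window idea can be illustrated on a recorded waveform. In this sketch, which is an assumption and not the paper's implementation, the flip-flop output Q is a sorted list of `(time, value)` events. Sampling exactly at a clock edge still returns the old value because the clock-to-Q delay has not elapsed, while sampling at the end of a one-clock-cycle window returns the settled value. The `required_overlap` rule and all numbers are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Event = Tuple[float, int]  # (time in ns, new value of Q)


def value_at(events: List[Event], t: float, initial: int = 0) -> int:
    """Value of a signal at time t, given its sorted change events."""
    times = [time for time, _ in events]
    idx = bisect_right(times, t)
    return events[idx - 1][1] if idx else initial


def checkpoint(events: List[Event], edge: float, period: float) -> int:
    """Sample Q at the end of a one-clock-cycle window that starts at the clock edge."""
    return value_at(events, edge + period)


def required_overlap(max_delay: float, period: float) -> float:
    """Hypothetical rule: the overlap must cover the longest delay plus the whole window."""
    return max_delay + period


# Hypothetical waveform: 10 ns clock, 0.3 ns clock-to-Q delay, Q updated after edges 0, 20, 30.
q_events = [(0.3, 1), (20.3, 0), (30.3, 1)]
print(value_at(q_events, 20.0))          # 1: old value, Q has not switched yet
print(checkpoint(q_events, 20.0, 10.0))  # 0: settled value produced by the edge at 20 ns
print(required_overlap(2.5, 10.0))       # 12.5
```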

  9. Difficulties in Generalization of Temporal Parallelism (3) • State matching • The restored target state must maintain functional correctness. • During synthesis, the design undergoes a number of logic transformations • Combinational and sequential logic optimization, retiming, and algebraic transformations • Proposed solution: promising preliminary work on state matching has recently been published in [17]. • Handling the testbench • The testbench is a sequential process. • It has no hardware “states”, so it cannot be restarted at an arbitrary point in time. • Proposed solution: testbench forwarding • Saved continuously during the reference simulation.
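Testbench forwarding can be sketched as record and replay. Here a hypothetical Python generator stands in for a sequential HDL testbench: its internal variables are not hardware state, so it can only run from the beginning. The reference run therefore records every stimulus with its cycle number, and each slice later replays only the recorded stimuli inside its own window. The names, the stimulus format, and the pseudo-random sequence are assumptions for illustration.

```python
from typing import Dict, Iterator, List, Tuple

Stimulus = Tuple[int, Dict[str, int]]  # (cycle, {input name: value})


def testbench(n_cycles: int) -> Iterator[Stimulus]:
    """Hypothetical sequential testbench: `seed` is software state, so it must start at cycle 0."""
    seed = 1
    for cycle in range(n_cycles):
        seed = (seed * 75 + 74) % 65537  # simple linear congruential sequence
        yield cycle, {"valid": seed & 1, "data": seed & 0xFF}


def record(n_cycles: int) -> List[Stimulus]:
    """During the reference simulation, save every stimulus as it is applied."""
    return list(testbench(n_cycles))


def forward(recording: List[Stimulus], start: int, end: int) -> List[Stimulus]:
    """Replay only the stimuli of one slice [start, end) without re-running the testbench."""
    return [(c, v) for c, v in recording if start <= c < end]


rec = record(1000)
slice3 = forward(rec, 300, 400)
print(len(slice3), slice3[0][0])  # 100 300
```

Replaying from the recording gives each slice exactly the stimuli it would have seen had the testbench run from cycle 0, which is what makes independent restarts possible.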

  10. Before the experiment… • How much can TPSim improve performance? • Slices • Multiple-clock issue? • Tool selection • Synthesis: Design Compiler • Cell library: 65nm technology library • Simulator: NC-Sim 8.2
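To reason about how much TPSim could gain and how the number of slices matters, a rough cost model helps. The model below is an assumption for illustration and is not taken from the paper: `t_ref` is the serial reference simulation time, `t_gl` the serial gate-level time, `n` the number of slices (one per node), and `overlap` the extra time each slice re-simulates, as a fraction of the slice length.

```python
def tpsim_speedup(t_gl: float, t_ref: float, n: int, overlap: float = 0.0) -> float:
    """Hypothetical cost model for temporal parallel simulation.

    Wall time = serial reference run + the slowest slice (its share of the
    gate-level work plus its overlap). Speedup is relative to one serial GL run.
    """
    slice_time = (t_gl / n) * (1.0 + overlap)
    return t_gl / (t_ref + slice_time)


print(round(tpsim_speedup(1000, 0, 10), 2))         # 10.0: ideal linear case
print(round(tpsim_speedup(1000, 10, 100), 2))       # 50.0: reference run now dominates
print(round(tpsim_speedup(1000, 10, 10, 0.1), 2))   # 8.33: overlap costs a little
```

Under this model the speedup stays close to linear only while the reference run and the overlaps are small compared with each slice, and it can never exceed `t_gl / t_ref`.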

  11. Experiment 1 – JPEG Encoder • The design is from OpenCores. • The total gate count of the GL design is 0.9M.

  12. Experiment 2 – AES (Advanced Encryption Standard) • The design is from OpenCores. • The total gate count of the GL design is 25K.

  13. Conclusions and My comments • Conclusions • The speedup is accomplished by performing temporal partitioning of the simulation period. • This paper provides not only a significant performance improvement but also a smarter method for simulation-based verification. • My comments • I have run into problems with the performance gap between RTL and GL timing simulation. • This paper gives me another reference in this area.
