Performance Driven Crosstalk Elimination at Compiler Level

TingTing Hwang, Department of Computer Science, Tsing Hua University, Taiwan.


Presentation Transcript


  1. Performance Driven Crosstalk Elimination at Compiler Level • TingTing Hwang, Department of Computer Science, Tsing Hua University, Taiwan

  2. Definition of 4C Crosstalk • A 3C or 4C crosstalk data transmission sequence on a bus • [Figure: a victim wire between two aggressor wires]
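
The slide shows the definition only as a figure. As a minimal sketch, assuming the commonly used coupling model (not spelled out in this transcript) in which the class seen by a switching victim wire is the sum of |victim transition - neighbour transition| over its two neighbours, a bus transition can be classified as follows. The function names are illustrative, not taken from the presentation.

```python
def transition(prev_bits, next_bits, k):
    """+1 for a rising wire, -1 for a falling wire, 0 for a static wire."""
    return int(next_bits[k]) - int(prev_bits[k])


def crosstalk_class(prev_bits, next_bits, i):
    """Coupling class (0..4) seen by wire i when the bus goes prev -> next."""
    d = transition(prev_bits, next_bits, i)
    if d == 0:
        return 0  # a static victim has no transition delay to classify
    total = 0
    for j in (i - 1, i + 1):          # left and right neighbours (aggressors)
        if 0 <= j < len(prev_bits):   # edge wires have only one neighbour
            total += abs(d - transition(prev_bits, next_bits, j))
    return total


def worst_class(prev_bits, next_bits):
    """Worst coupling class over all wires of the bus."""
    return max(crosstalk_class(prev_bits, next_bits, i)
               for i in range(len(prev_bits)))


print(worst_class("010", "101"))  # 4: victim switches against both aggressors
print(worst_class("010", "100"))  # 3: one aggressor opposes, the other is static
```

Under this model, 4C occurs exactly when a wire and both of its neighbours switch in opposite directions, which matches the aggressor/victim/aggressor picture on the slide.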

  3. Worst-Case Delay Comparison (ps) • Summarized from DATE 2004, “Exploiting Crosstalk to Speed up On-chip Buses,” Chunjie Duan

  4. Previous Work • Bus encoding (expand Boolean space) • Hardware overhead: encoders, decoders, and additional wires • [Figure: Sender, Encoder, channel, Decoder, Receiver, with width labels b, b, n; copied from ICCAD 2001, “Bus Encoding to Prevent Crosstalk Delay,” Bret Victor]

  5. Motivation • Previous work uses codec design • Works at the logic level, with no information about the data • Large area overhead (e.g., 128 bus width: 128 + 85) • Data sequences on an instruction bus • Are known at compile time • To eliminate crosstalk data sequences: • Instruction re-scheduling • Register renaming

  6. Problem Definition and Target Architecture • Given a program, generate a 4C (or 3C-and-4C) crosstalk-free program on an instruction bus • Performed during compiler optimization

  7. Crosstalk Elimination in Compiler Optimization • Input: binary executable program • Step 1: Decompose the input program into basic blocks • Step 2: Instruction rescheduling (e.g., interchange I4 and I5) • Step 3: Register renaming (e.g., rename R2 to R3) • Step 4: NOP insertion • Output: crosstalk-free binary executable program

  8. Step 2: Instruction Re-scheduling • Instructions are reordered subject to data-dependency constraints • Construct a weighted Instruction Adjacency Graph (IAG)

  9. Instruction Adjacency Graph • Node: instruction • Edge: execution sequence • Weight: the number of crosstalk patterns • If the crosstalk sequence comes from unchangeable bits (opcode, function code, constants), the weight is set larger • [Figure: example graph on nodes A to E with weighted edges]

  10. Instruction Re-scheduling • A weighted Instruction Adjacency Graph (IAG) • Model instruction re-scheduling as a Traveling Salesman Problem (TSP) on the IAG • Find a minimum-weight path that contains each node once and only once
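
The path formulation above can be sketched by brute force for a small basic block. This is only an illustration under stated assumptions: the instruction names, edge weights, and the single dependency below are hypothetical (they are not the values from the slides), weights are taken as symmetric, and exhaustive enumeration stands in for whatever TSP solver the author actually used.

```python
from itertools import permutations


def path_weight(order, weight):
    """Sum of IAG edge weights between consecutive instructions."""
    return sum(weight[frozenset(pair)] for pair in zip(order, order[1:]))


def respects(order, deps):
    """True if every (before, after) data dependency is kept by the order."""
    pos = {ins: k for k, ins in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in deps)


def reschedule(instrs, weight, deps):
    """Minimum-weight ordering that visits each instruction exactly once."""
    legal = (p for p in permutations(instrs) if respects(p, deps))
    return min(legal, key=lambda p: path_weight(p, weight))


# Hypothetical 4-instruction block; weights are made up for illustration.
weight = {
    frozenset("AB"): 5, frozenset("AC"): 1, frozenset("AD"): 2,
    frozenset("BC"): 0, frozenset("BD"): 4, frozenset("CD"): 3,
}
deps = [("A", "C")]  # assumed: C reads a value produced by A

original = ("A", "B", "C", "D")
best = reschedule(original, weight, deps)
print(path_weight(original, weight))      # 8
print(best, path_weight(best, weight))    # ('D', 'A', 'C', 'B') 3
```

Enumeration grows factorially with the block size, so it is only practical for short basic blocks; a heuristic or exact TSP solver would be needed beyond that.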

  11. Results of TSP • Original sequence: weight 18 • Minimum-weight sequence: weight 8 • [Figure: both sequences drawn on the example graph on nodes A to E]

  12. Step 3: Register Renaming • Registers can be renamed as long as live-in, live-out, and system-preserved registers are not renamed • Weighted Register Adjacency Graph (RAG) • Node: register • Edge between nodes RA and RB: registers RA and RB are adjacent to each other • Weight: frequency of that adjacency

  13. Register Adjacency Graph • A: ADD R2, R1, R0 (101, 010, 001, 000) • B: MUL R1, R2, R0 (010, 001, 010, 000) • C: XOR R4, R0, R2 (000, 100, 000, 010) • D: BIS R3, R1, 4 (011, 011, 001, 100) • E: BIS R5, R3, R4 (011, 101, 011, 100) • [Figure: RAG on nodes R0 to R5 with weighted edges]

  14. 4C Crosstalk-free Cliques • To rename all registers at once, a database containing all kinds of 4C crosstalk-free cliques of 5-bit codes is pre-constructed.
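
A sketch of how such cliques might be found, under two assumptions that are not in the slides: the 5-bit register field is checked in isolation (bits outside the field are ignored), and the coupling class of a wire is the sum of |own transition - neighbour transition| over its neighbours. A set of codes is a crosstalk-free clique if every pairwise transition stays at or below the allowed class. The greedy growth shown here finds one clique per seed; it is not a full enumeration and not necessarily how the author's database was built.

```python
from itertools import combinations

WIDTH = 5  # register field width from the slide


def worst_class(a, b, width=WIDTH):
    """Worst coupling class when the field switches from code a to code b."""
    d = [((b >> k) & 1) - ((a >> k) & 1) for k in range(width)]
    worst = 0
    for i in range(width):
        if d[i] == 0:
            continue  # static wire: nothing to classify
        coupling = sum(abs(d[i] - d[j]) for j in (i - 1, i + 1)
                       if 0 <= j < width)
        worst = max(worst, coupling)
    return worst


def compatible(a, b, max_allowed=3):
    """max_allowed=3 forbids 4C only; max_allowed=2 also forbids 3C."""
    return worst_class(a, b) <= max_allowed


def is_free_clique(codes, max_allowed=3):
    return all(compatible(a, b, max_allowed)
               for a, b in combinations(codes, 2))


def grow_clique(seed, max_allowed=3):
    """Greedily extend {seed} with every code compatible with all members."""
    clique = [seed]
    for code in range(2 ** WIDTH):
        if code not in clique and all(compatible(code, c, max_allowed)
                                      for c in clique):
            clique.append(code)
    return clique


print(compatible(0b01010, 0b10101))      # False: this pair is a 4C transition
print(is_free_clique(grow_clique(0)))    # True by construction
```

The class is unchanged when a transition is reversed, so compatibility is symmetric and the undirected clique view is consistent with the model.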

  15. Register Renaming Algorithm REGISTER-RENAMING ( ) • Construct RAG • Do clique partitioning on RAG • while ( RAG is not NULL) { • Select a clique with maximum weight • Reassign all registers in the clique • Remove the clique from RAG • }
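
The loop above can be sketched as follows. The transcript does not say how the RAG is partitioned or how codes are chosen for a clique, so both are left to the caller here: `cliques` is a given partition and `choose_codes` is a hypothetical callback that returns new names for the renameable registers of one clique. Only the "heaviest clique first" control flow is taken from the slide.

```python
def clique_weight(clique, rag):
    """Total weight of the RAG edges that lie inside the clique."""
    members = set(clique)
    return sum(w for edge, w in rag.items() if edge <= members)


def register_renaming(rag, cliques, pinned, choose_codes):
    """rag: {frozenset({ra, rb}): weight}; pinned: registers that keep their names."""
    remaining = [tuple(c) for c in cliques]
    renamed = {r: r for r in pinned}       # live-in/live-out registers stay put
    while remaining:                        # "while RAG is not NULL"
        best = max(remaining, key=lambda c: clique_weight(c, rag))
        movable = [r for r in best if r not in pinned]
        renamed.update(choose_codes(movable, renamed))
        remaining.remove(best)              # "remove the clique from RAG"
    return renamed


# Hypothetical data, only to show the processing order.
rag = {frozenset({"R1", "R2"}): 4, frozenset({"R3", "R4"}): 1}
cliques = [("R3", "R4"), ("R1", "R2")]
pinned = {"R1"}
order = []


def choose_codes(movable, renamed):
    # Placeholder policy: mark each register as renamed; a real version
    # would pick codes from a crosstalk-free clique database.
    order.append(tuple(movable))
    return {r: r + "'" for r in movable}


result = register_renaming(rag, cliques, pinned, choose_codes)
print(order)   # [('R2',), ('R3', 'R4')]
print(result["R2"], result["R1"])  # R2' R1
```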

  16. Example of Register Renaming • Assumption: R0 and R1 are live-in registers; R5 is a live-out register • [Figure: RAG before and after renaming, with nodes A, B, C becoming A’, B’, C’]

  17. Step 4: NOP Insertion • A NOP • Is inserted between two instructions that induce 4C crosstalk • Is crosstalk-free with all other instructions • Does not change program functionality • Takes one clock period to execute and one memory location to store, which is the overhead
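
A minimal sketch of this step, assuming instruction words are given as bit strings, the same neighbour-coupling model as in the 4C definition, and a hypothetical all-zero NOP encoding. In this model an all-zero word can never take part in a 4C transition, because all wires that switch do so in the same direction; a real ISA's NOP encoding may differ.

```python
def has_4c(prev_word, next_word):
    """True if any wire sees 4C coupling on the transition prev -> next."""
    d = [int(b) - int(a) for a, b in zip(prev_word, next_word)]
    for i, di in enumerate(d):
        if di == 0:
            continue  # static wire
        coupling = sum(abs(di - d[j]) for j in (i - 1, i + 1)
                       if 0 <= j < len(d))
        if coupling >= 4:
            return True
    return False


def insert_nops(words, nop):
    """Insert `nop` between consecutive words whose transition is 4C."""
    if not words:
        return []
    out = [words[0]]
    for word in words[1:]:
        if has_4c(out[-1], word):
            out.append(nop)   # costs one cycle and one memory location
        out.append(word)
    return out


program = ["0101", "1010", "1111"]   # hypothetical 4-bit instruction words
fixed = insert_nops(program, nop="0000")
print(fixed)                          # ['0101', '0000', '1010', '1111']
print(len(fixed) - len(program))      # 1 NOP of static overhead
```

Running rescheduling and renaming first keeps the number of remaining 4C pairs, and therefore the number of inserted NOPs, as small as possible.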

  18. Benchmarking Results

  19. Static Instruction Count Overhead: SPEC2000 (CINT) • 4C crosstalk-free

  20. Dynamic Instruction Count Overhead: SPEC2000 (CINT)

  21. Computation of Improved Performance Ratio • 0.10 µm, bus length: 10 mm • Cycle length • With 4C: 1 • Without 4C: 0.8
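
The transcript gives the two cycle lengths but not the formula behind the ratio. One plausible reading, stated here as an assumption, is that execution time scales as dynamic instruction count times cycle length, so the shorter cycle has to pay for the extra NOPs. The 5% overhead in the example is a made-up input, not a result from the slides.

```python
def relative_time(dynamic_overhead, cycle_with_4c=1.0, cycle_without_4c=0.8):
    """Execution time of the crosstalk-free program relative to the original.

    dynamic_overhead: fractional increase in dynamic instruction count
    (e.g. 0.05 for 5% more executed instructions due to NOPs).
    """
    return (1.0 + dynamic_overhead) * cycle_without_4c / cycle_with_4c


def break_even_overhead(cycle_with_4c=1.0, cycle_without_4c=0.8):
    """Dynamic overhead at which the shorter cycle no longer pays off."""
    return cycle_with_4c / cycle_without_4c - 1.0


print(round(relative_time(0.05), 4))     # 0.84
print(round(break_even_overhead(), 4))   # 0.25
```

With the slide's cycle lengths, the transformation is a net win under this reading whenever the dynamic instruction count grows by less than 25%.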

  22. Improved Total Performance Ratio: SPEC2000 (CINT)

  23. Thank you

  24. Static Instruction Count Overhead: DSPstone

  25. Dynamic Instruction Count Overhead: DSPstone

  26. Improved Total Performance Ratio: DSPstone
