
Exploiting Crosstalk to Speed up On-chip Buses

Chunjie Duan (Ericsson Wireless, Boulder) and Sunil P. Khatri (University of Colorado, Boulder).


Presentation Transcript


  1. Exploiting Crosstalk to Speed up On-chip Buses Chunjie Duan Ericsson Wireless, Boulder Sunil P Khatri University of Colorado, Boulder

  2. Outline • Introduction • Classification of Cross-talk types • The Story so far.. • Eliminating 3C and 4C sequences • Eliminating 4C sequences • Eliminating 2C sequences • Eliminating 1C sequences • Experimental Results • Conclusions

  3. Introduction • Deep sub-micron process • Verified cross-talk trends • Accurate 3-D capacitance extraction • Delay variation 2.47:1 (200 μm wires, 10X drivers, 0.1 μm technology) • [Figure: wires labeled v and a with capacitances CI and CL, and dimensions w, s, t]

  4. Cross-talk vs Bus Data Pattern • When λ ~ 0.1 μm, r = CI/CL ~ 10 (metal 4) • Effective total capacitance depends on the bus data sequence: • Best case: 0 × CI • Worst case: 4 × CI • [Figure: two transition examples, one with neighbor coupling 0·CI + 0·CI (Ctotal = 0·CI) and one with 2·CI + 2·CI (Ctotal = 4·CI)]

  5. Classification of Cross-talk • 4·C sequence: • 3·C sequence: • 2·C sequence: • 1·C sequence: • 0·C sequence: • Forbidden patterns (“010” and “101”) • Maximum bus data rate depends on total capacitance seen by any bit
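
The classification can be sketched in a few lines of Python. This is only an illustration under an assumed coupling model, not the authors' code: a switching wire is charged 0·CI for a neighbor switching in the same direction, 1·CI for a static neighbor and 2·CI for a neighbor switching the opposite way, which reproduces the 0·CI best case and 4·CI worst case quoted on the previous slide. The names coupling_factors and crosstalk_class are hypothetical.

```python
from typing import List


def coupling_factors(before: str, after: str) -> List[int]:
    """Per-wire multiplier of CI for one bus transition (assumed model)."""
    if len(before) != len(after):
        raise ValueError("bus words must have the same width")
    # Signed transition of each wire: +1 rising, -1 falling, 0 static.
    delta = [int(a) - int(b) for b, a in zip(before, after)]
    factors = []
    for i, d in enumerate(delta):
        if d == 0:
            factors.append(0)  # a static wire has no switching delay
            continue
        total = 0
        for j in (i - 1, i + 1):
            if 0 <= j < len(delta):
                # 0 = same direction, 1 = static neighbor, 2 = opposite
                total += abs(d - delta[j])
        factors.append(total)
    return factors


def crosstalk_class(before: str, after: str) -> int:
    """Return x for an 'x·C' transition: the worst factor on any wire."""
    return max(coupling_factors(before, after))


if __name__ == "__main__":
    for pair in [("000", "111"), ("000", "001"), ("000", "010"),
                 ("010", "100"), ("010", "101")]:
        print(pair, "->", f"{crosstalk_class(*pair)}C")
```

Under this model the transition between the two forbidden patterns, 010 to 101, is the 4·C case: the middle wire switches against both of its neighbors.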

  6. Previous work – Eliminating 3C & 4C Sequences • Simple approach: shielding • No 3C/4C sequences; bus width is doubled • Theorem: If no forbidden patterns are allowed on the bus, no 3C or 4C sequences can occur • Proof: see “Analysis and Avoidance of Cross-talk in Buses” – Duan, Tirumala, Khatri (Hot Interconnects August 2001). • So we simply encode the data on the bus to get rid of the forbidden patterns • Recurrence equation for asymptotic bus overhead • CODEC implementation to demonstrate practicality

  7. Eliminating 3C & 4C sequences • 44% asymptotic overhead • Look-up table: straightforward and can achieve the minimum overhead (44%), but not practical • Our implementation • 62.5% overhead (higher than minimum) • Modular and straightforward • Break bus into 4-bit groups • Encode each group independently (4-bit -> 5-bit) • Additional logic to handle across-group forbidden patterns
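
The overhead figures above can be checked with a small brute-force count. The sketch below is an illustration, not the authors' derivation: it enumerates all k-bit words, keeps those containing neither "010" nor "101", and estimates the asymptotic overhead from the growth rate of the counts. The helper names is_fpf and count_fpf are hypothetical.

```python
import math
from itertools import product

FORBIDDEN = ("010", "101")


def is_fpf(word: str) -> bool:
    """True if the word contains no forbidden pattern."""
    return not any(pattern in word for pattern in FORBIDDEN)


def count_fpf(width: int) -> int:
    """Number of forbidden-pattern-free words on a bus of the given width."""
    return sum(is_fpf("".join(bits)) for bits in product("01", repeat=width))


counts = [count_fpf(k) for k in range(1, 13)]

# Growth rate of the code book; each extra wire multiplies it by about this.
rate = counts[-1] / counts[-2]
asymptotic_overhead = 1 / math.log2(rate) - 1

if __name__ == "__main__":
    for k, c in enumerate(counts, start=1):
        data_bits = math.floor(math.log2(c))  # payload a k-wire bus can carry
        print(f"wires={k:2d} codewords={c:3d} data bits={data_bits}")
    print(f"estimated asymptotic overhead: {asymptotic_overhead:.1%}")
```

The count for five wires is 16, which is exactly enough for the 4-bit to 5-bit group encoding, and each count is the sum of the previous two, so the estimate settles near the 44% quoted on the slide.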

  8. Previous Work - Eliminating 4C sequences • Less aggressive: eliminating 4C sequences only • Less overhead (33%) • Simpler algorithm: • Divide the bus into 3-bit groups • When a 4C sequence occurs, complement the group data • Insert a group complement indicator • Special handling for across-group 4C sequences (see paper for details) • 101 001 -> 010 010 • 1010 0010 -> 1011 0100

  9. CODEC Results • Compare waveform with and without coding • Random input sequence • Encoder/decoder delay ~250ps (memoryless) • Max data rate more than 2X compared to scheme with no encoding • Speedup is data pattern independent • [Figure: two test setups with blocks labeled random sequence, encoder, driver, receiver, decoder, recovered sequence]

  10. CODEC Results … 2 • Bus length 5mm, 10mm or 20mm • Driver strength 30X, 60X and 120X of minimum

  11. Further Speedup Possible? • Can we exploit crosstalk to further speed up the bus? • Eliminate 2C sequences • Eliminate 1C sequences • Simulation shows that eliminating 2C sequences results in a speedup of 2X – 4X over eliminating 3C/4C sequences • Note that we seek memory-less CODEC based techniques • Let’s look at eliminating 2C and 1C sequences next…

  12. Eliminating 2C sequences • How to guarantee a 2C free sequence? • Find a vector clique such that any pair of elements in this clique exhibits at most a 1C transition between them • For an n-bit bus, we need a k-bit encoded bus (k > n) such that the new bus has a 2C free clique of cardinality greater than or equal to 2^n • Solution is memoryless (no need to “remember” the last transmitted word) • Fast and simple CODEC implementation • We have an inductive method to construct 2C free cliques
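
The clique condition can be tested by brute force on very small buses. The sketch below is illustrative only and assumes the usual coupling model (0 for a neighbor switching the same way, 1 for a static neighbor, 2 for an opposite one); max_coupling, is_2c_free_clique and largest_clique are hypothetical names, and the exhaustive search is only practical for a handful of wires.

```python
from itertools import combinations, product
from typing import Iterable, List


def max_coupling(u: str, v: str) -> int:
    """Worst per-wire CI multiplier for the transition u -> v (assumed model)."""
    delta = [int(b) - int(a) for a, b in zip(u, v)]
    worst = 0
    for i, d in enumerate(delta):
        if d == 0:
            continue  # static wires do not limit the delay
        neighbors = [delta[j] for j in (i - 1, i + 1) if 0 <= j < len(delta)]
        worst = max(worst, sum(abs(d - nb) for nb in neighbors))
    return worst


def is_2c_free_clique(words: Iterable[str]) -> bool:
    """True if every pair of words is at most a 1C transition apart."""
    return all(max_coupling(u, v) <= 1 for u, v in combinations(list(words), 2))


def largest_clique(width: int) -> List[str]:
    """Exhaustive search for a maximum 2C free clique (tiny widths only)."""
    words = ["".join(bits) for bits in product("01", repeat=width)]
    for size in range(len(words), 0, -1):
        for subset in combinations(words, size):
            if is_2c_free_clique(subset):
                return list(subset)
    return []


if __name__ == "__main__":
    for width in (2, 3):
        clique = largest_clique(width)
        print(width, "wires:", len(clique), "codewords", clique)
```

Under this model three wires admit a clique of four codewords, so a 2-bit payload already needs a 3-wire encoded bus.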

  13. Constructing 2C free Cliques • Inductive method, extends a known clique Cn = {v} • Let v’ = v . vn • First set Cn+1 = {}, and Cn+1 ← Cn+1 ∪ {v’} • Definition: the 0-extended subset of Cn+1 is: • Definition: the 1-extended subset of Cn+1 is: • Constructing • Create a new vector and • Add the vector unless there exists a vector in S1 such that: and • Constructing : similar to • Finally where • Theorem: Both sets of the previous step are 2C free cliques. Proof - see paper

  14. Constructing 2C free Cliques … 2 • Some observations about the construction • (a) Vectors ending with ’01’ and ’10’ cannot co-exist in Cn • The first n bits of any vector of Cn+1 are the same as some vector of Cn, and the last two bits are “00” or “11”. • In other words, Cn+1 is at least as large as Cn • Because of (a), we know there will be no “011” or “100” in the same clique Cn+1 • So we can construct vectors of Cn+1 ending in “001” or “110” by appending ‘1’ to vectors ending with “00” or appending ‘0’ to vectors ending with “11”. • However, we cannot have both in the same clique

  15. Constructing 2C free Cliques … 3 • Consider the construction of C4 from C3: • The method described above requires a quadratic number of tests. We can do better…

  16. Clique Extension Algorithm • Constructing Cn+1 from Cn using the 0-extended subset (a similar algorithm applies when we use the 1-extended subset) • Append ‘0’ to n-bit vectors ending with ‘0’ • Append ‘1’ to n-bit vectors ending with ‘1’ • Since we use the 0-extended subset of Cn+1: • If there is no n-bit vector ending with ’01’, append ‘1’ to vectors ending with ’00’ • If there is no n-bit vector ending with ’11’, append ‘1’ to vectors ending with ‘10’ • The new clique has no vectors ending with ’10’

  17. Clique Extension Algorithm … 2 • Simply perform both versions of the clique extension algorithm • Select the result according to the rule: • where • Some values of clique sizes:
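
The extension rules of the two slides above can be sketched as follows. Several points are assumptions rather than the authors' method: the second variant is written as the bitwise mirror of the rules on slide 16, the selection rule did not survive in this transcript so the sketch simply keeps the larger result, the starting clique {00, 01, 11} is chosen by hand, and the checker uses the usual 0/1/2 coupling model. All function names are hypothetical.

```python
from itertools import combinations
from typing import Set


def max_coupling(u: str, v: str) -> int:
    """Worst per-wire CI multiplier for the transition u -> v (assumed model)."""
    delta = [int(b) - int(a) for a, b in zip(u, v)]
    worst = 0
    for i, d in enumerate(delta):
        if d != 0:
            nbs = [delta[j] for j in (i - 1, i + 1) if 0 <= j < len(delta)]
            worst = max(worst, sum(abs(d - nb) for nb in nbs))
    return worst


def is_2c_free_clique(words: Set[str]) -> bool:
    return all(max_coupling(u, v) <= 1 for u, v in combinations(sorted(words), 2))


def extend_appending_one(clique: Set[str]) -> Set[str]:
    """Rules of slide 16: the extra vectors all end in '01'."""
    new = {v + v[-1] for v in clique}  # repeat the last bit of every vector
    if not any(v.endswith("01") for v in clique):
        new |= {v + "1" for v in clique if v.endswith("00")}
    if not any(v.endswith("11") for v in clique):
        new |= {v + "1" for v in clique if v.endswith("10")}
    return new


def extend_appending_zero(clique: Set[str]) -> Set[str]:
    """Mirror-image variant (assumed): the extra vectors all end in '10'."""
    new = {v + v[-1] for v in clique}
    if not any(v.endswith("10") for v in clique):
        new |= {v + "0" for v in clique if v.endswith("11")}
    if not any(v.endswith("00") for v in clique):
        new |= {v + "0" for v in clique if v.endswith("01")}
    return new


def extend(clique: Set[str]) -> Set[str]:
    """Run both variants and keep the larger result (assumed selection rule)."""
    a, b = extend_appending_one(clique), extend_appending_zero(clique)
    return a if len(a) >= len(b) else b


def grow(base: Set[str], width: int) -> Set[str]:
    clique = set(base)
    while len(next(iter(clique))) < width:
        clique = extend(clique)
    return clique


if __name__ == "__main__":
    for width in range(2, 8):
        clique = grow({"00", "01", "11"}, width)
        print(width, len(clique), is_2c_free_clique(clique))
```

Each extension step only inspects vector endings, so it is linear in the clique size rather than quadratic in the number of pairwise tests.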

  18. Area Overhead Trends • Asymptotic overhead is 146% • Lower for smaller bus sizes. • Suggests partitioning of bus into smaller sections

  19. 1C free Configurations • 1C free sequences have the least delay (typically 50% of 2C free sequences) • Just send each data bit multiple times (3/5…) • No encoder/decoder needed (no extra CODEC delay) • Simulation shows this is the fastest of the techniques with similar area overhead: • 3x (or 5x) separation between wires • Widening the trace (3x): smaller R, bigger C • [Figure: three arrangements labeled A, B and C]
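
The effect of sending each bit several times can be illustrated with the same assumed coupling model (0 for a neighbor switching the same way, 1 for a static neighbor, 2 for an opposite one). The sketch assumes that the receiver samples the centre wire of each group and that the outer copies act only as shields; replicate and centre_factors are hypothetical names.

```python
from typing import List


def coupling_factors(before: str, after: str) -> List[int]:
    """Per-wire multiplier of CI for one bus transition (assumed model)."""
    delta = [int(a) - int(b) for b, a in zip(before, after)]
    factors = []
    for i, d in enumerate(delta):
        if d == 0:
            factors.append(0)  # static wire
            continue
        nbs = [delta[j] for j in (i - 1, i + 1) if 0 <= j < len(delta)]
        factors.append(sum(abs(d - nb) for nb in nbs))
    return factors


def replicate(word: str, copies: int = 3) -> str:
    """Send every data bit on `copies` adjacent wires."""
    return "".join(bit * copies for bit in word)


def centre_factors(before: str, after: str, copies: int = 3) -> List[int]:
    """Coupling seen by the centre wire of each group of copies."""
    factors = coupling_factors(replicate(before, copies),
                               replicate(after, copies))
    return factors[copies // 2::copies]


if __name__ == "__main__":
    print(replicate("01"), "->", replicate("10"))
    print("all wires:   ", coupling_factors(replicate("01"), replicate("10")))
    print("centre wires:", centre_factors("01", "10"))
```

In this model the centre wire of every group always switches together with its two neighbors, so it sees no coupling capacitance, while the outer copies absorb the coupling between adjacent groups.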

  20. Bus configurations for 1C delay • We simulated the delay of several different bus configurations • Different configurations yield different delay and area trade-offs • A: 3-wire group, fixed spacing within group, variable spacing between groups • B: similar to A but with a ground shield between groups • C: no shielding wires, varying wire sizes and spacing • D: 5-wire group, fixed spacing within group, variable spacing between groups (largest overhead) • [Figure: wire layouts for configurations A to D, with wire widths w and variable spacings marked]

  21. 1C free Configurations • Circuit parameters are extracted using SPACE3D • Bus simulations • CODEC was not modeled • Spice3f5, 0.1μm BPTM model • Transmission line with inter-wire coupling • Quantify actual delay of 1C free bus vector sequences for the 4 configurations described • 20mm wire, 30X driver (IDEAL 1C free delay 153ps, 3C free delay 793ps)

  22. Delays for 1C free Configurations • Configuration C has a significantly larger delay than the others (3X), since it is essentially a 3C free configuration (it has no shielding) • All other configurations show up to 2.5X speedup over the 3C free bus • For all configurations, the actual delays are larger than the IDEAL 0C delay • This is caused by skew on the outer shielding wires • Transitions of the dynamic shields of any wire are slightly misaligned • Verified by intentionally skewing the delay on signals

  23. Conclusions • Inter-wire capacitance is increasingly significant for DSM VLSI bus delays • We have developed an array of CODECs to trade off bus area overhead against delay • 4C free = 33% • 3C free = 62.5% • 2C free = 146% (asymptotic), up to 4X to 6X faster • Inductive algorithm for 2C free clique construction • Simulated several 1C free configurations for area overhead and delays (no CODECs) • 1C free techniques not as fast as expected

  24. Thank You!
