Prediction of High-Performance On-Chip Global Interconnection

Yulei Zhang1, Xiang Hu1, Alina Deutsch2, A. Ege Engin3

James F. Buckwalter1, and Chung-Kuan Cheng1

1Dept. of ECE, UC San Diego, La Jolla, CA

2IBM T. J. Watson Research Center, Yorktown Heights, NY

3Dept. of ECE, San Diego State Univ., San Diego, CA


  • Introduction

    • Technology trend

    • Current approaches

  • On-Chip Global Interconnection

    • Overview: structures, tradeoffs

    • Interconnect schemes

    • Global wire modeling

    • Performance analysis

  • Design Methodologies for T-line schemes

  • Prediction of Performance Metrics

    • Experimental settings

    • Performance metrics comparison and scaling trend

      • Latency

      • Energy per bit

      • Throughput

  • Signal Integrity

  • Conclusion

Introduction – Performance Impact

  • Interconnect delay determines the system performance [ITRS08]

    • 542ps for 1mm minimum pitch Cu global wire w/o repeater @ 45nm

    • ~150ps for 10 level FO4 delay @ 45nm

[Ho2001] “Future of Wire”
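As a quick check of the two figures quoted on this slide, the sketch below computes how much larger the unrepeated 1mm wire delay is than the 10-level FO4 gate delay at 45nm. Only the two delay values come from the slide; the function name is illustrative.

```python
# Figures quoted on the slide for the 45nm node.
WIRE_DELAY_PS = 542.0   # 1mm minimum-pitch Cu global wire, no repeaters
GATE_DELAY_PS = 150.0   # roughly 10 levels of FO4 delay


def delay_ratio(wire_ps: float, gate_ps: float) -> float:
    """Return how many times larger the wire delay is than the gate delay."""
    return wire_ps / gate_ps


ratio = delay_ratio(WIRE_DELAY_PS, GATE_DELAY_PS)
print(f"wire delay / gate delay = {ratio:.2f}")
```

Under these numbers a single millimetre of unrepeated global wire costs more than three times the delay of a 10-stage logic path.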

Introduction – Power Dissipation

  • Interconnects consume a significant portion of power

    • 1-2 orders of magnitude larger than that of gates

      • Half of the dynamic power dissipated on repeaters to minimize latency [Zhang07]

    • Wires consume 50% of total dynamic power for a 0.13um microprocessor [Magen04]

      • About 1/3 burned on the global wires.

Introduction – Different Approaches and Our Contributions

  • Different Approaches

    • Repeater Insertion Approach

      • Pros: High throughput density.

      • Cons: Overhead in terms of power consumption and wiring complexity.

    • T-line Approach [Zhang09]

      • Pros: Low latency.

      • Cons: Low throughput density due to low bandwidth and large wire dimensions.

    • Equalized T-line Approach [Zhang08]

      • Pros: Low power, Low noise, Higher throughput than single-ended.

      • Cons: The area overhead brought by passive components.

  • We explore different global interconnection structures and compare their performance metrics across multiple technology nodes.

  • Contributions:

    • A simple linear model

    • A general design framework

    • A complete prediction and comparison

Multi-Dimensional Design Consideration

  • Preliminary analysis results assuming 65nm CMOS process.

  • Application-oriented choice

    • Low Latency

      T-TL or UT-TL -> Single-Ended T-lines

    • High Throughput

    • Low Power

      PE-TL or UE-TL -> Differential T-lines

    • Low Noise

      PE-TL or UE-TL -> Differential T-lines

    • Low Area/Cost

For each architecture, the more area the pentagon covers, the better overall performance is achieved.

On-Chip Global Interconnect Schemes (1)

  • R-RC structure

    • Repeater size/Length of segments

    • Adopt previous design methodology [Zhang07]

  • UT-TL structure

    • Full swing at wire-end

    • Tapered inverter chain as TX

  • T-TL structure

    • Optimize eye-height at wire-end

    • Non-Tapered inverter chain as TX

Repeated RC wires (R-RC)

Un-Terminated and Terminated T-Line (UT-TL and T-TL)

On-Chip Global Interconnect Schemes (2)

Un-Equalized and Passive-Equalized T-Line (UE-TL and PE-TL)


  • Driver side: Tapered differential driver

  • Receiver side: Termination resistance, Sense-Amplifier (SA) + inverter chain

  • Passive equalizer: parallel RC network

  • Design Constraint: enough eye-opening (50mV) needed at the wire-end

Global Wire Modeling – Single-Ended & Differential On-Chip T-lines

  • Orthogonal layers replaced by ground planes -> 2D cap extraction, accurate when loading density is high.

  • Top-layer thick wires used -> wire dimensions stay fixed as technology scales.

  • LC-mode behavior dominant

Determine the bit rate

  • Smallest wire dimensions that satisfy eye constraint

  • Notice PE-TL needs narrower wire -> Equalization helps to increase density.

Global Wire Modeling – RC wires and T-lines

  • Distributed Π model composed of wire resistance and capacitance

  • Closed-form equations [Sim03] to calculate 2D wire capacitance

  • RC wire modeling

  • T-line 2D-R(f)L(f)C parameter extraction

  • T-line Modeling

    • R(f)L(f)C Tabular model -> Transient simulation to estimate eye-height.

    • Synthesized compact circuit model [Kopcsay02] -> Study signal integrity issue.

2D-C Extraction Template

2D-R(f)L(f) Extraction Template
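To illustrate the distributed Π model for RC wires, here is a minimal sketch that builds an N-section Π ladder and evaluates its Elmore delay at the far end. The Elmore metric is a swapped-in first-order estimate, not the SPICE-based evaluation used in the slides, and the wire values (500 ohm, 1 pF total) are hypothetical.

```python
def pi_ladder_elmore_delay(r_total: float, c_total: float, n_segments: int) -> float:
    """Elmore delay (seconds) at the far end of a wire split into n Pi sections.

    Each section has series resistance R/n and shunt capacitance C/(2n) on
    both ends, so internal nodes carry C/n and the far-end node carries C/(2n).
    The near-end capacitor sits directly on the ideal source and adds no delay.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    delay = 0.0
    upstream_r = 0.0
    for k in range(1, n_segments + 1):
        upstream_r += r_seg                      # resistance from source to node k
        node_c = c_seg if k < n_segments else c_seg / 2.0
        delay += upstream_r * node_c
    return delay


# Hypothetical 5mm wire: 500 ohm and 1 pF in total (illustrative values only).
R_TOTAL = 500.0
C_TOTAL = 1e-12
for n in (1, 4, 32):
    d = pi_ladder_elmore_delay(R_TOTAL, C_TOTAL, n)
    print(f"n={n:2d}: Elmore delay = {d * 1e12:.1f} ps")
```

For this topology the Elmore delay works out to 0.5·R·C regardless of the number of sections, which is one reason the Π form is a convenient lumped approximation of a distributed RC line.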

Performance Analysis – Definitions

  • Normalized delay (unit: ps/mm)

    • Propagation delay includes wire delay and gate delay.

  • Normalized energy per bit (unit: pJ/m)

    • Bit rate is assumed to be the inverse of propagation delay for RC wires

  • Normalized throughput (unit: Gbps/um)
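The defining formulas did not survive in this transcript, so the following is a sketch of one plausible reading of the three metrics, inferred from their units: delay divided by wire length, energy per bit divided by wire length, and bit rate divided by wire pitch. The function names and the sample inputs are assumptions for illustration.

```python
def normalized_delay_ps_per_mm(prop_delay_ps: float, length_mm: float) -> float:
    """Propagation delay (wire + gate) per unit length."""
    return prop_delay_ps / length_mm


def rc_bit_rate_gbps(prop_delay_ps: float) -> float:
    """Bit rate taken as the inverse of propagation delay (RC wires only)."""
    return 1e3 / prop_delay_ps  # 1/ps = 1000 Gbps


def normalized_energy_pj_per_m(power_mw: float, bit_rate_gbps: float,
                               length_mm: float) -> float:
    """Energy per bit per unit length; mW / Gbps gives pJ per bit."""
    energy_per_bit_pj = power_mw / bit_rate_gbps
    return energy_per_bit_pj / (length_mm * 1e-3)


def normalized_throughput_gbps_per_um(bit_rate_gbps: float, pitch_um: float) -> float:
    """Bit rate per unit of routing width (wire pitch)."""
    return bit_rate_gbps / pitch_um


# Hypothetical 5mm RC wire with 250 ps delay, 2 mW power and 0.5 um pitch.
rate = rc_bit_rate_gbps(250.0)
print(normalized_delay_ps_per_mm(250.0, 5.0), "ps/mm")
print(normalized_energy_pj_per_m(2.0, rate, 5.0), "pJ/m")
print(normalized_throughput_gbps_per_um(rate, 0.5), "Gbps/um")
```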

Performance Analysis – Latency

  • Variables: technology-defined parameters

    • Supply voltage: Vdd (unit: V)

    • Dielectric constant:

    • Min-sized inverter FO4 delay: (unit: ps)

  • R-RC structure (min-d)

    • is roughly constant

    • FO4 delay scales w/ scaling factor S

  • T-line structures

    • Sum of wire delay and TX delay

    • Wire delay

    • TX delay improved w/ FO4 delay

Decreasing w/ technology scaling!

Increasing w/ technology scaling!

Performance Analysis – Energy per Bit

  • Same variables defined before

Constant !

  • R-RC structure (min-d)

    • Vdd reduces as technology scales

    • reduces as technology scales

  • T-line structures

    • Sum of power consumed on wire and TX.

    • Power of T-line

    • Power of TX circuit

    • FO4 delay reduces exponentially

Energy decreases w/ technology scaling!

Energy decreases w/ larger slope!!

Performance Analysis – Throughput

  • Same variables defined before

  • R-RC structure (min-d)

    • Assuming wire pitch

    • FO4 delay reduces exponentially

  • T-line structures

    • TX bandwidth

    • Neglect the minor change of wire pitch

    • K1 = 0, for UT-TL

    • FO4 delay reduces exponentially

Throughput increases by 20% per generation!

Throughput increases by 43% per generation!!
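The two per-generation figures (20% and 43%) can be reproduced with a short calculation. This is a sketch under assumptions that are not stated in the transcript: FO4 delay shrinks by a factor of 0.7 per generation, R-RC throughput varies as 1/sqrt(FO4), and T-line throughput varies as 1/FO4.

```python
S = 0.7  # assumed FO4 delay scaling factor per technology generation


def gain_per_generation(exponent: float, s: float = S) -> float:
    """Fractional throughput gain if throughput ~ FO4**(-exponent)."""
    return s ** (-exponent) - 1.0


rc_gain = gain_per_generation(0.5)   # assumed R-RC dependence: 1/sqrt(FO4)
tl_gain = gain_per_generation(1.0)   # assumed T-line dependence: 1/FO4
print(f"R-RC:   {rc_gain * 100:.0f}% per generation")
print(f"T-line: {tl_gain * 100:.0f}% per generation")
```

With these assumed exponents the gains come out near 20% and 43%, which matches the figures on the slide and the observation that T-line throughput grows faster than R-RC throughput.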

Design Framework for On-Chip T-line Schemes

  • The proposed framework can be applied to design UT-TL/T-TL/UE-TL/PE-TL by changing the wire configuration and circuit structure.

  • Different optimization routines (LP/ILP/SQP, etc) can be adopted according to the problem formulation.

Experimental Settings

  • Design objective: min-d

  • Technology nodes: 90nm-22nm

  • Five different global interconnection structures

  • Wire length: 5mm

  • Parameter extraction

    • 2D field solver CZ2D from EIP tool suite of IBM

    • Tabular model or synthesized model

  • Transistor models

    • Predictive transistor model from [Uemura06]

    • Synopsys level 3 MOSFET model tuned according to ITRS roadmap

  • Simulation

    • HSPICE 2005

  • Modeling and Optimization

    • Linear or non-linear regression/SQP routine

    • MATLAB 2007

Performance Metric: Normalized Delay – Results and Comparison

  • Technology trends

    • R-RC ↑

    • T-line schemes ↓

  • T-line structures

    • Outperform R-RC beyond 90nm

    • Single-ended: lowest delay

  • At 22nm node

    • R-RC: 55ps/mm

    • T-lines: 8ps/mm (85% reduction)

    • Speed of light: 5ps/mm

  • Linear model

    • < 6% average percent error
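The 22nm delay figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the quoted reduction, scales the normalized delays to the 5mm wire length used in the experiments, and, assuming TEM propagation in a uniform dielectric, backs out the relative permittivity implied by the 5ps/mm speed-of-light limit. That last step is an inference, not a value given in the slides.

```python
C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per ps

R_RC_PS_PER_MM = 55.0      # from the slide, 22nm node
TLINE_PS_PER_MM = 8.0      # from the slide, 22nm node
LIMIT_PS_PER_MM = 5.0      # "speed of light" limit quoted on the slide
LENGTH_MM = 5.0            # wire length from the experimental settings


def reduction(baseline: float, value: float) -> float:
    """Fractional reduction of value relative to baseline."""
    return (baseline - value) / baseline


def implied_relative_permittivity(delay_ps_per_mm: float) -> float:
    """eps_r such that sqrt(eps_r)/c equals the given delay per mm."""
    return (delay_ps_per_mm * C_MM_PER_PS) ** 2


delay_cut = reduction(R_RC_PS_PER_MM, TLINE_PS_PER_MM)
eps_r = implied_relative_permittivity(LIMIT_PS_PER_MM)
print(f"T-line delay reduction: {delay_cut * 100:.1f}%")
print(f"5mm latency: R-RC {R_RC_PS_PER_MM * LENGTH_MM:.0f} ps, "
      f"T-line {TLINE_PS_PER_MM * LENGTH_MM:.0f} ps, "
      f"limit {LIMIT_PS_PER_MM * LENGTH_MM:.0f} ps")
print(f"implied relative permittivity: {eps_r:.2f}")
```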

Performance Metric: Normalized Energy per Bit – Results and Comparison

  • Technology trends

    • R-RC and T-lines ↓

    • T-lines reduce more quickly

  • T-line structures

    • Outperform R-RC beyond 45nm

    • Differential: lowest energy.

    • Single-ended similar to R-RC.

      • T-TL > UT-TL

  • At 22nm node

    • R-RC: 100pJ/m

    • Single-ended: 60% reduction

    • Differential: 96% reduction

  • Linear model

    • < 12% average percent error

    • Error for T-TL and PE-TL

      • RL and passive equalizers.
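To put the normalized energy numbers in absolute terms, the sketch below converts the 22nm figures into energy per bit over the 5mm wire used in the experiments. The baseline and the percentage reductions are from the slide; the helper names are illustrative.

```python
R_RC_PJ_PER_M = 100.0   # from the slide, 22nm node
LENGTH_MM = 5.0         # wire length from the experimental settings


def after_reduction(baseline: float, reduction_pct: float) -> float:
    """Value left after cutting baseline by reduction_pct percent."""
    return baseline * (1.0 - reduction_pct / 100.0)


def energy_per_bit_pj(norm_pj_per_m: float, length_mm: float) -> float:
    """Energy per bit for a wire of the given length."""
    return norm_pj_per_m * length_mm * 1e-3


single_ended = after_reduction(R_RC_PJ_PER_M, 60.0)   # 60% reduction
differential = after_reduction(R_RC_PJ_PER_M, 96.0)   # 96% reduction
for name, norm in (("R-RC", R_RC_PJ_PER_M),
                   ("single-ended T-line", single_ended),
                   ("differential T-line", differential)):
    print(f"{name}: {norm:.0f} pJ/m, "
          f"{energy_per_bit_pj(norm, LENGTH_MM):.2f} pJ per bit over 5mm")
```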

Performance Metric: Normalized Throughput – Results and Comparison

  • Technology trends

    • R-RC and T-lines ↑

    • T-lines increase more quickly

  • T-line structures

    • Outperform R-RC beyond 32nm

    • Differential better than single-ended

  • At 22nm node

    • R-RC: 12Gbps/um

    • T-TL: 30% improvement

    • UE-TL: 75% improvement

    • PE-TL: ~ 2X of R-RC

  • Linear model

    • < 7% average percent error
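Throughput density translates directly into routing width. As a sketch, the code below estimates the width each scheme would need to carry a hypothetical 1 Tbps aggregate link at 22nm. The target bandwidth is an assumption, and "~2X" for PE-TL is taken as exactly 2.0.

```python
R_RC_GBPS_PER_UM = 12.0   # from the slide, 22nm node
TARGET_GBPS = 1000.0      # hypothetical aggregate bandwidth target

# Normalized throughput per scheme, derived from the quoted improvements.
schemes = {
    "R-RC": R_RC_GBPS_PER_UM,
    "T-TL": R_RC_GBPS_PER_UM * 1.30,    # 30% improvement
    "UE-TL": R_RC_GBPS_PER_UM * 1.75,   # 75% improvement
    "PE-TL": R_RC_GBPS_PER_UM * 2.0,    # "~2X", treated as exactly 2
}


def routing_width_um(target_gbps: float, gbps_per_um: float) -> float:
    """Routing width needed to reach target_gbps at a given density."""
    return target_gbps / gbps_per_um


widths = {name: routing_width_um(TARGET_GBPS, tp) for name, tp in schemes.items()}
for name, width in widths.items():
    print(f"{name}: {schemes[name]:.1f} Gbps/um -> {width:.1f} um for 1 Tbps")
```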

Signal Integrity – single-ended T-lines

Worst-case switching pattern for peak noise simulation

Using w.c. pattern

Using single or multiple PRBS patterns

  • UT-TL structure

    • 380mV peak noise at 1V supply voltage w/ 7ps rise time

    • SI could be a big issue as supply voltage drops

  • T-TL less sensitive to noise

    • At the same rise time, ~ 50% reduction of peak noise

    • Peak noise ↓ as technology scales

Signal Integrity – differential T-lines

Worst-case switching pattern for peak noise simulation

  • More reliable

    • Termination resistance

    • Common-mode noise reduction

  • Peak noise

    • Within ~10mV range

  • Eye-Heights

    • UE-TL

      • Eye reduces as bit rate ↑

      • Harder to meet constraint.

    • PE-TL

      • > 70mV eye even at 22nm node

      • Equalization does help!


Conclusion

  • Compare five different global interconnections in terms of latency, energy per bit, throughput, and signal integrity from 90nm to 22nm.

  • A simple linear model provided to link

    • Architecture-level performance metrics

    • Technology-defined parameters

  • Some observations from experimental results

    • T-line structures have potential to replace R-RC at future node

    • Differential T-lines are better than single-ended

      • Low-power/High-throughput/Low-noise

    • Equalization could be utilized for on-chip global interconnection

      • Higher throughput density, improve signal integrity

      • Even w/ lower energy dissipation (passive equalizations)

Introduction – Technology Trend

Scaling trend of PUL wire resistance and capacitance

Copper resistivity versus wire width

  • On-Chip Interconnect Scaling

    • Dimension shrinks

      • Wire resistance increases -> RC delay

      • Increasing capacitive coupling -> delay, power, noise, etc.

    • Performance of global wires decreases w/ technology scaling.

Design methodology: single-ended T-lines

2D frequency-dependent tabular model

Inverter size, number of stages, Rload (if any)

Inverter chains

SPICE simulation to evaluate.

Optimization Routine:

1. Optimal cycle time

2. Sweep for optimal inverter chain

SPICE simulation to check in-plane crosstalk, etc

Design methodology: differential T-lines

2D frequency-dependent tabular model

Wire width; driver impedance; RC equalizer (if any); termination resistance.

Differential lines; SA-based TX

Closed-form equation-based model

Evaluation based on models.

Optimization Routine:

1. Binary search for wire width

2. SQP for other var. optimization

SPICE simulation to check in-plane crosstalk, etc
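The first step of the optimization routine, a binary search for the wire width, can be sketched as follows. It looks for the smallest width whose eye height meets the constraint (50mV in the design constraint stated earlier), assuming eye height grows monotonically with width. The `toy_eye_height_mv` function is a hypothetical stand-in for the closed-form model and is not taken from the slides.

```python
from typing import Callable, Optional


def min_width_for_eye(eye_height_mv: Callable[[float], float],
                      target_mv: float,
                      lo_um: float,
                      hi_um: float,
                      tol_um: float = 0.01) -> Optional[float]:
    """Smallest width in [lo_um, hi_um] meeting the eye constraint.

    Assumes eye_height_mv is non-decreasing in width. Returns None when even
    the widest wire fails the constraint.
    """
    if eye_height_mv(hi_um) < target_mv:
        return None
    if eye_height_mv(lo_um) >= target_mv:
        return lo_um
    while hi_um - lo_um > tol_um:
        mid = 0.5 * (lo_um + hi_um)
        if eye_height_mv(mid) >= target_mv:
            hi_um = mid   # constraint met: try narrower wires
        else:
            lo_um = mid   # constraint violated: need wider wires
    return hi_um


def toy_eye_height_mv(width_um: float) -> float:
    """Hypothetical monotone eye-height model, for illustration only."""
    return 120.0 * width_um / (width_um + 1.0)


width = min_width_for_eye(toy_eye_height_mv, target_mv=50.0, lo_um=0.1, hi_um=5.0)
print(f"smallest width meeting 50mV eye: {width:.2f} um")
```

In the flow described on this slide, the remaining variables (driver impedance, equalizer values, termination resistance) would then be tuned by SQP at the chosen width.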

Effects of driver impedance and termination resistance

  • Lowering driver impedance improves eye

  • Eye reduces as frequency goes up

  • Optimal termination resistance.

Effects of driver impedance and termination resistance on step response

Optimal Rload

  • Larger driver impedance leads to slower rise edge and lower saturation voltage

  • Larger termination resistance causes sharper rise edge but with larger reflection

Crosstalk Effects

  • Three different PRBS input patterns, min-ddp solutions

  • T-line Scheme A: Delay increased by 9.6%, Power increased by 37%

  • T-line Scheme B: Delay increased by 2%, Power increased by 25.7%

Transceiver Design

  • Sense amplifier (SA)

    • Double-tail latch-type [Schinkel 07]

    • Optimize sizing to minimize SA delay

  • Inverter chain

    • Number of stages

      • Fixed to 6

    • Sizing of each inverter

      • RS: output resistance of inverter chain

      • Sweep the 1st inverter size to minimize the total transceiver delay for given [Veye, RS]

Double-tail latch-type voltage sense amp.

@45nm tech node:

M1/M3: 45nm/45nm

M2/M4: 250nm/45nm

M5/M6: 180nm/45nm

M7/M8: 280nm/45nm

M9: 495nm/45nm

M10/M11: 200nm/45nm

M12: 1.58um/45nm

Transceiver Modeling

  • Driver side

    • Voltage source Vs with output resistance Rs

    • Vs: full-swing pulse signal with rise time Tr=0.1Tc

    • Rs: output resistance of the last inverter in the chain.

  • Receiver side

    • Extract look-up table for TX delay and power

    • Fit the table using non-linear closed form formula

    • The relative error is within 2% for fitting models

Histogram of fitting errors at 45nm node

Transceiver delay map at 45nm node

Transceiver power map at 45nm node
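The driver-side source model can be made concrete with a short sketch. It assumes Tc is the bit period (the inverse of the bit rate) and models a single 0-to-1 transition of Vs as a linear ramp of duration Tr = 0.1·Tc. The function names and the 1V swing are illustrative.

```python
def cycle_time_ps(bit_rate_gbps: float) -> float:
    """Bit period Tc in ps, assuming Tc = 1 / bit rate."""
    return 1e3 / bit_rate_gbps


def rise_time_ps(bit_rate_gbps: float) -> float:
    """Rise time Tr = 0.1 * Tc, as stated for the source model."""
    return 0.1 * cycle_time_ps(bit_rate_gbps)


def source_voltage(t_ps: float, vdd: float, bit_rate_gbps: float) -> float:
    """Vs(t) for one 0->1 transition starting at t = 0 (linear ramp, then flat)."""
    tr = rise_time_ps(bit_rate_gbps)
    if t_ps <= 0.0:
        return 0.0
    if t_ps >= tr:
        return vdd
    return vdd * t_ps / tr


# Example at 50 Gbps with a hypothetical 1V full swing.
print(f"Tc = {cycle_time_ps(50.0):.1f} ps, Tr = {rise_time_ps(50.0):.1f} ps")
print(f"Vs at 1 ps = {source_voltage(1.0, 1.0, 50.0):.2f} V")
```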

Bit-rate: 50Gbps

Rs=11.06ohm, Rd=350ohm, Cd=0.38pF,
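If Rd and Cd are read as the resistor and capacitor of the parallel RC passive equalizer (an assumption; the slide does not label them), the equalizer impedance and its corner frequency follow from standard circuit formulas. The sketch below evaluates both for the listed values.

```python
import cmath
import math

RD_OHM = 350.0      # from the slide
CD_F = 0.38e-12     # from the slide (0.38 pF)


def parallel_rc_impedance(r_ohm: float, c_f: float, freq_hz: float) -> complex:
    """Impedance of a resistor in parallel with a capacitor at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    return r_ohm / (1.0 + 1j * omega * r_ohm * c_f)


def corner_frequency_hz(r_ohm: float, c_f: float) -> float:
    """Frequency at which |Z| falls to R / sqrt(2)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_f)


f_c = corner_frequency_hz(RD_OHM, CD_F)
z_c = parallel_rc_impedance(RD_OHM, CD_F, f_c)
print(f"corner frequency: {f_c / 1e9:.2f} GHz")
print(f"|Z| at corner: {abs(z_c):.1f} ohm, phase {math.degrees(cmath.phase(z_c)):.0f} deg")
```

Below the corner frequency the network looks like Rd in series with the line and attenuates the signal; above it the capacitor bypasses Rd, so high-frequency content passes with less loss, which is the basic mechanism of passive equalization.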


Conclusion (cont’)

[Tables: score/value of each structure at each tech node for four applications: Low-Latency (ps/mm), Low-Energy (pJ/m), High-Throughput (Gbps/um), and Low-Noise.]

Item in the table: score/value. Score: the higher, the better in terms of the given metric; max. score is 5. The best structure in each column is marked in red.

Future Work

  • Explore novel global signaling schemes for high throughput and low energy dissipation.

    • Design, optimize > 50Gbps on-chip interconnection schemes

    • Architecture-level study to identify trade-offs

      • Wire configuration

        • Dimension optimization, ground plane, etc.

      • Un-interrupted architectures

        • Equalization implementation, TX/RX choice

      • Distributed architectures

        • Active or Passive compensation (RC equalizers, other networks, etc)

    • Novel high-speed transceiver circuitry design

    • Develop analysis and optimization capability to aid co-design and co-optimization of wire and transceiver circuit

    • Fabrication to verify analysis and demonstrate feasibility

Related Publications

[Repeated RC Wire]

  • L. Zhang, H. Chen, B. Yao, K. Hamilton, and C. K. Cheng, “Repeated on-chip interconnect analysis and evaluation of delay, power and bandwidth metrics under different design goals,” IEEE International Symposium on Quality Electronic Design, 2007, pp. 251-256.

[Un-Terminated/Terminated T-Line]

  • Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, “Design Methodology of High Performance On-Chip Global Interconnect Using Terminated Transmission-Line,” IEEE International Symposium on Quality Electronic Design, 2009, pp. 451-458.

[Passive-Equalized T-Line]

  • Y. Zhang, L. Zhang, A. Tsuchiya, M. Hashimoto, and C. K. Cheng, “On-chip high performance signaling using passive compensation,” IEEE International Conference on Computer Design, 2008, pp. 182-187.

  • Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, “On-chip bus signaling using passive compensation,” IEEE Electrical Performance of Electronic Packaging, 2008, pp. 33-36.

  • L. Zhang, Y. Zhang, A. Tsuchiya, M. Hashimoto, E. Kuh, and C. K. Cheng, “High performance on-chip differential signaling using passive compensation for global communication,” Asia and South Pacific Design Automation Conference, 2009, pp. 385-390.

[Overview and Comparison]

  • Y. Zhang, X. Hu, A. Deutsch, A. E. Engin, J. F. Buckwalter, and C. K. Cheng, “Prediction of High-Performance On-Chip Global Interconnection,” ACM Workshop on System Level Interconnection Prediction, 2009.