Prediction of High-Performance On-Chip Global Interconnection

Prediction of High-Performance On-Chip Global Interconnection. Yulei Zhang1, Xiang Hu1, Alina Deutsch2, A. Ege Engin3, James F. Buckwalter1, and Chung-Kuan Cheng1 (1Dept. of ECE, UC San Diego, La Jolla, CA; 2IBM T. J. Watson Research Center, Yorktown Heights, NY)

Yulei Zhang1, Xiang Hu1, Alina Deutsch2, A. Ege Engin3

James F. Buckwalter1, and Chung-Kuan Cheng1

1Dept. of ECE, UC San Diego, La Jolla, CA

2IBM T. J. Watson Research Center, Yorktown Heights, NY

3Dept. of ECE, San Diego State Univ., San Diego, CA

  • Introduction
    • Technology trend
    • Current approaches
  • On-Chip Global Interconnection
    • Overview: structures, tradeoffs
    • Interconnect schemes
    • Global wire modeling
    • Performance analysis
  • Design Methodologies for T-line schemes
  • Prediction of Performance Metrics
    • Experimental settings
    • Performance metrics comparison and scaling trend
      • Latency
      • Energy per bit
      • Throughput
  • Signal Integrity
  • Conclusion
Introduction – Performance Impact
  • Interconnect delay determines system performance [ITRS08]
    • 542ps for a 1mm minimum-pitch Cu global wire w/o repeaters @ 45nm
    • ~150ps for 10 levels of FO4 delay @ 45nm

[Ho2001] “Future of Wire”
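The quadratic growth behind the 542ps figure can be illustrated with the textbook 50% delay of an unrepeated distributed RC line, t ≈ 0.38·r·c·L². This is a minimal sketch, not the ITRS calculation: driver resistance and load are ignored, and the r·c product is back-solved from the 542ps/1mm number purely as a hypothetical calibration.

```python
# Sketch: 50% delay of an unrepeated distributed RC wire, t = 0.38 * r * c * L^2.
# Assumption: driver resistance and load capacitance are ignored.

DIST_RC_FACTOR = 0.38  # 50% step-response delay factor of a distributed RC line


def rc_product_from_delay(delay_ps: float, length_mm: float) -> float:
    """Back-solve the per-unit-length r*c product (ps/mm^2) from one delay point."""
    return delay_ps / (DIST_RC_FACTOR * length_mm ** 2)


def unrepeated_delay_ps(rc_ps_per_mm2: float, length_mm: float) -> float:
    """50% delay (ps) of an unrepeated wire of the given length."""
    return DIST_RC_FACTOR * rc_ps_per_mm2 * length_mm ** 2


# Hypothetical calibration to the quoted 542 ps for a 1 mm global wire at 45nm.
rc = rc_product_from_delay(542.0, 1.0)
for length in (1.0, 2.0, 5.0):
    print(f"{length:.0f} mm: {unrepeated_delay_ps(rc, length):.0f} ps")
```

Doubling the length quadruples the delay, which is why a long unrepeated global wire quickly exceeds the ~150ps of 10 FO4 levels.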

Introduction – Power Dissipation
  • Interconnects consume a significant portion of power
    • 1-2 orders of magnitude larger than gates
      • Half of the dynamic power is dissipated in repeaters when minimizing latency [Zhang07]
    • Wires consume 50% of the total dynamic power in a 0.13um microprocessor [Magen04]
      • About 1/3 of it is burned on global wires
Introduction – Different Approaches and Our Contributions
  • Different Approaches
    • Repeater Insertion Approach
      • Pros: High throughput density.
      • Cons: Overhead in terms of power consumption and wiring complexity.
    • T-line Approach [Zhang09]
      • Pros: Low latency.
      • Cons: Low throughput density due to low bandwidth and large wire dimensions
    • Equalized T-line Approach [Zhang08]
      • Pros: Low power, low noise, higher throughput than single-ended T-lines.
      • Cons: Area overhead of the passive components.
  • We explore different global interconnection structures and compare their performance metrics across multiple technology nodes.
  • Contributions:
    • A simple linear model linking performance metrics to technology-defined parameters
    • A general design framework for on-chip T-line schemes
    • A complete prediction and comparison of five structures from 90nm to 22nm
Multi-Dimensional Design Considerations
  • Preliminary analysis results, assuming a 65nm CMOS process.
  • Application-oriented choice
    • Low latency -> T-TL or UT-TL (single-ended T-lines)
    • High throughput, low power, low noise -> differential T-lines
    • Low area/cost

For each architecture, the more area the pentagon covers, the better overall performance is achieved.
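One way to turn the pentagon comparison into a single number is to compute the area of each radar-chart polygon. The sketch below assumes five equally spaced axes scored on a 0-5 scale; the scores are hypothetical placeholders, not the values behind the slide's chart.

```python
import math
from typing import Sequence


def radar_area(scores: Sequence[float]) -> float:
    """Area enclosed by a radar (spider) chart with equally spaced axes.

    Adjacent axes are 2*pi/n apart, so the triangle between axis i and i+1
    has area 0.5 * s_i * s_{i+1} * sin(2*pi/n).
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three axes")
    wedge = math.sin(2 * math.pi / n)
    return 0.5 * wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))


# Hypothetical scores for (latency, throughput, power, noise, area/cost).
candidates = {
    "structure A": [5, 2, 3, 2, 3],
    "structure B": [3, 4, 5, 5, 2],
}
for name, s in candidates.items():
    print(f"{name}: area = {radar_area(s):.2f}")
```

The area depends on the order of the axes, since each term multiplies neighbouring scores, so the comparison is only fair if every architecture uses the same axis order.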

On-Chip Global Interconnect Schemes (1)
  • R-RC structure
    • Optimize repeater size and segment length
    • Adopt the previous design methodology [Zhang07]
  • UT-TL structure
    • Full swing at the wire end
    • Tapered inverter chain as TX
  • T-TL structure
    • Optimize eye height at the wire end
    • Non-tapered inverter chain as TX

Repeated RC wires (R-RC)

Un-Terminated and Terminated T-Lines (UT-TL and T-TL)
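The slides do not state how the UT-TL's tapered chain is sized. A common rule of thumb, assumed here, fixes the per-stage fanout near 4 (the FO4 condition), so the stage count is about ln(F)/ln(4) for a total fanout F = C_load/C_in. The capacitances below are hypothetical.

```python
import math


def tapered_chain(c_in_ff: float, c_load_ff: float, target_fanout: float = 4.0):
    """Size a tapered inverter chain using a fixed target fanout per stage.

    Returns the stage count, the actual per-stage taper factor, and the
    input capacitance (fF) of each inverter in the chain.
    """
    total_fanout = c_load_ff / c_in_ff
    n = max(1, round(math.log(total_fanout) / math.log(target_fanout)))
    f = total_fanout ** (1.0 / n)  # taper that exactly spans c_in -> c_load
    sizes = [c_in_ff * f ** i for i in range(n)]
    return n, f, sizes


# Hypothetical: 1 fF minimum inverter driving a 256 fF line input.
n, f, sizes = tapered_chain(1.0, 256.0)
print(n, round(f, 2), [round(s, 1) for s in sizes])
```

The T-TL uses a non-tapered chain instead, so this rule only illustrates the UT-TL transmitter.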

On-Chip Global Interconnect Schemes (2)

Un-Equalized and Passive-Equalized T-Lines (UE-TL and PE-TL)


  • Driver side: Tapered differential driver
  • Receiver side: Termination resistance, Sense-Amplifier (SA) + inverter chain
  • Passive equalizer: parallel RC network
  • Design constraint: sufficient eye opening (50mV) at the wire end
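A lumped approximation shows why a parallel RC network can act as an equalizer. Assuming it sits in series ahead of a termination resistance Rt, it attenuates low frequencies by Rt/(Rt+Rd) while Cd passes high frequencies, adding a zero at 1/(2π·Rd·Cd). This minimal sketch ignores the transmission line and driver resistance; Rd and Cd are the values quoted on the transceiver-modeling slide, while Rt = 100 Ω is a hypothetical choice.

```python
import math


def equalizer_gain(f_hz: float, rd: float, cd: float, rt: float) -> float:
    """|H(j*2*pi*f)| of a series (Rd || Cd) network driving a termination Rt.

    Lumped sketch only: the transmission line and driver resistance are ignored.
    """
    s = 2j * math.pi * f_hz
    z_eq = rd / (1 + s * rd * cd)  # impedance of the parallel RC network
    return abs(rt / (rt + z_eq))


RD, CD = 350.0, 0.38e-12  # values quoted on the transceiver-modeling slide
RT = 100.0                # hypothetical termination resistance

f_zero = 1 / (2 * math.pi * RD * CD)
f_pole = (RD + RT) / (2 * math.pi * RD * RT * CD)
print(f"zero ~ {f_zero / 1e9:.2f} GHz, pole ~ {f_pole / 1e9:.2f} GHz")
for f in (0.0, 1e9, 10e9, 100e9):
    print(f"{f / 1e9:6.1f} GHz: |H| = {equalizer_gain(f, RD, CD, RT):.3f}")
```

The ratio of high-frequency to DC gain is (Rt+Rd)/Rt, about 4.5x (13dB) for these values, which offsets part of the line's low-pass loss.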
Global Wire Modeling – Single-Ended & Differential On-Chip T-lines
  • Orthogonal layers replaced by ground planes -> 2D capacitance extraction, accurate when the loading density is high.
  • Thick top-layer wires used -> dimensions stay fixed as technology scales.
  • LC-mode behavior dominates

Determine the bit rate

  • Choose the smallest wire dimensions that satisfy the eye constraint
  • PE-TL needs narrower wires -> equalization helps increase wiring density.
Global Wire Modeling – RC wires and T-lines
  • RC wire modeling
    • Distributed Π model composed of wire resistance and capacitance
    • Closed-form equations [Sim03] to calculate 2D wire capacitance
  • T-line modeling
    • 2D R(f)L(f)C parameter extraction
    • R(f)L(f)C tabular model -> transient simulation to estimate eye height
    • Synthesized compact circuit model [Kopcsay02] -> study signal integrity issues

2D-C Extraction Template

2D-R(f)L(f) Extraction Template
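The distributed Π model can be evaluated with the Elmore delay, where each segment resistance multiplies all capacitance downstream of it. This is an illustration only: the per-mm R and C values are hypothetical and do not come from the [Sim03] closed-form equations. With no driver or load, the result equals R·C/2 for any number of segments, the Elmore value of a distributed line (the actual 50% delay is closer to 0.38·R·C).

```python
def pi_ladder_elmore(r_total: float, c_total: float, n_seg: int,
                     r_drv: float = 0.0, c_load: float = 0.0) -> float:
    """Elmore delay (s) at the far end of an n_seg-section distributed Pi model.

    Each section has series resistance R/n with C/(2n) at each of its ends, so
    interior nodes carry C/n. r_drv and c_load are an optional driver
    resistance and far-end load capacitance.
    """
    if n_seg < 1:
        raise ValueError("need at least one section")
    r = r_total / n_seg
    c = c_total / n_seg
    node_caps = [c / 2] + [c] * (n_seg - 1) + [c / 2 + c_load]
    delay = r_drv * sum(node_caps)  # the driver charges every node
    for k in range(1, n_seg + 1):   # resistor k sits between nodes k-1 and k
        delay += r * sum(node_caps[k:])
    return delay


# Hypothetical 5 mm wire with 100 ohm/mm and 0.2 pF/mm, no driver or load.
R, C = 100.0 * 5, 0.2e-12 * 5
print(f"{pi_ladder_elmore(R, C, 20) * 1e12:.1f} ps")  # 250.0 ps, i.e. R*C/2
```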

Performance Analysis – Definitions
  • Normalized delay (unit: ps/mm)
    • Propagation delay includes wire delay and gate delay.
  • Normalized energy per bit (unit: pJ/m)
    • Bit rate is assumed to be the inverse of propagation delay for RC wires
  • Normalized throughput (unit: Gbps/um)
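The slide's defining equations did not survive extraction. The sketch below uses plain unit conversions consistent with the stated units: delay divided by length, energy per bit (power over bit rate) divided by length, and bit rate divided by the pitch a link occupies. Treating throughput as bit rate per wire pitch is an assumption, and `LinkResult` and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    """Raw results for one link (hypothetical container)."""
    length_mm: float      # wire length
    delay_ps: float       # propagation delay: wire delay + gate/TX delay
    power_mw: float       # average power at the operating bit rate
    bit_rate_gbps: float  # for RC wires, taken as 1 / delay
    pitch_um: float       # routing pitch occupied by the link (assumed definition)


def normalized_metrics(r: LinkResult) -> dict:
    energy_pj_per_bit = r.power_mw / r.bit_rate_gbps  # mW / Gbps = pJ/bit
    return {
        "delay_ps_per_mm": r.delay_ps / r.length_mm,
        "energy_pj_per_m": energy_pj_per_bit / (r.length_mm * 1e-3),
        "throughput_gbps_per_um": r.bit_rate_gbps / r.pitch_um,
    }


# Hypothetical RC wire: bit rate = 1 / delay = 1000 / delay_ps Gbps.
rc_wire = LinkResult(length_mm=5.0, delay_ps=275.0, power_mw=1.0,
                     bit_rate_gbps=1000.0 / 275.0, pitch_um=0.4)
print(normalized_metrics(rc_wire))
```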
Performance Analysis – Latency
  • Variables: technology-defined parameters
    • Supply voltage: Vdd (unit: V)
    • Dielectric constant:
    • Min-sized inverter FO4 delay: (unit: ps)
  • R-RC structure (min-d)
    • is roughly constant
    • FO4 delay scales w/ scaling factor S
  • T-line structures
    • Sum of wire delay and TX delay
    • Wire delay
    • TX delay improves w/ FO4 delay

Decreasing w/ technology scaling!

Increasing w/ technology scaling!
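The wire-delay term of a T-line is bounded below by time of flight, sqrt(εr)/c0 per unit length. The sketch assumes εr = 2.25, chosen only because it reproduces the ~5ps/mm speed-of-light figure quoted on the delay-results slide; the slide's own dielectric-constant symbol was lost. The TX delay is a hypothetical input.

```python
import math

C0_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm/ps


def time_of_flight_ps_per_mm(eps_r: float) -> float:
    """LC-mode (speed-of-light) delay per mm in a dielectric with permittivity eps_r."""
    return math.sqrt(eps_r) / C0_MM_PER_PS


def tline_delay_ps(length_mm: float, eps_r: float, tx_delay_ps: float) -> float:
    """T-line latency sketch: wire (time-of-flight) delay plus transceiver delay."""
    return length_mm * time_of_flight_ps_per_mm(eps_r) + tx_delay_ps


print(f"{time_of_flight_ps_per_mm(2.25):.2f} ps/mm")
# Hypothetical 5 mm line with 20 ps of TX delay.
print(f"{tline_delay_ps(5.0, 2.25, 20.0) / 5.0:.1f} ps/mm")
```

Because the time-of-flight term does not shrink with scaling while the TX term tracks FO4, T-line latency per mm falls toward the time-of-flight floor as technology scales.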

Performance Analysis – Energy per Bit
  • Same variables as defined before

Constant!

  • R-RC structure (min-d)
    • Vdd reduces as technology scales
    • reduces as technology scales
  • T-line structures
    • Sum of power consumed on wire and TX.
    • Power of T-line
    • Power of TX circuit
    • FO4 delay reduces exponentially

Energy decreases w/ technology scaling!

Energy decreases w/ larger slope!!
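For the R-RC case, the Vdd dependence follows from CV² switching energy. A minimal sketch, assuming random data (probability 0.25 of a 0->1 transition per bit, each drawing C·Vdd² from the supply) and ignoring leakage and short-circuit current; the capacitance per mm (wire plus repeaters) is hypothetical. Conveniently, 1 fJ/mm equals 1 pJ/m.

```python
def switching_energy_pj_per_m(c_ff_per_mm: float, vdd: float,
                              activity: float = 0.25) -> float:
    """Dynamic energy per bit per unit length of a full-swing RC wire.

    activity: probability of a 0->1 transition per bit (0.25 for random data);
    each such transition draws C * Vdd^2 from the supply. fJ/mm == pJ/m.
    """
    return activity * c_ff_per_mm * vdd ** 2


# Hypothetical 300 fF/mm of wire plus repeater capacitance at two supplies.
for vdd in (1.0, 0.8):
    print(f"Vdd = {vdd} V: {switching_energy_pj_per_m(300.0, vdd):.0f} pJ/m")
```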

Performance Analysis – Throughput
  • Same variables as defined before
  • R-RC structure (min-d)
    • Assuming wire pitch
    • FO4 delay reduces exponentially
  • T-line structures
    • TX bandwidth
    • Neglect the minor change of wire pitch
    • K1 = 0, for UT-TL
    • FO4 delay reduces exponentially

Throughput increases by 20% per generation!

Throughput increases by 43% per generation!!
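Compounding the quoted per-generation gains over the four steps from 90nm to 22nm shows how far the two trends diverge. The last lines assume FO4 delay shrinks by about 0.7x per generation; under that assumption, 1/FO4 grows by about 43% and 1/sqrt(FO4) by about 20% per generation, which is numerically consistent with the two callouts, although the slide's actual expressions were lost.

```python
# Per-generation throughput growth quoted on the slide, compounded over the
# four generation steps 90 -> 65 -> 45 -> 32 -> 22 nm.
GROWTH = {"R-RC": 1.20, "T-line": 1.43}
STEPS = 4

for name, g in GROWTH.items():
    print(f"{name}: x{g ** STEPS:.2f} from 90nm to 22nm")

# Assumption: FO4 delay shrinks by ~0.7x per generation.
FO4_SCALE = 0.7
print(f"1/FO4: x{1 / FO4_SCALE:.2f}, 1/sqrt(FO4): x{FO4_SCALE ** -0.5:.2f} per generation")
```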

Design Framework for On-Chip T-line Schemes
  • The proposed framework can be applied to design UT-TL/T-TL/UE-TL/PE-TL by changing the wire configuration and circuit structure.
  • Different optimization routines (LP/ILP/SQP, etc.) can be adopted, depending on the problem formulation.
Experimental Settings
  • Design objective: min-d
  • Technology nodes: 90nm-22nm
  • Five different global interconnection structures (R-RC, UT-TL, T-TL, UE-TL, PE-TL)
  • Wire length: 5mm
  • Parameter extraction
    • 2D field solver CZ2D from IBM's EIP tool suite
    • Tabular model or synthesized model
  • Transistor models
    • Predictive transistor model from [Uemura06]
    • Synopsys level 3 MOSFET model tuned according to the ITRS roadmap
  • Simulation
    • HSPICE 2005
  • Modeling and Optimization
    • Linear or non-linear regression/SQP routine
    • MATLAB 2007
Performance Metric: Normalized Delay – Results and Comparison
  • Technology trends
    • R-RC ↑
    • T-line schemes ↓
  • T-line structures
    • Outperform R-RC beyond 90nm
    • Single-ended: lowest delay
  • At 22nm node
    • R-RC: 55ps/mm
    • T-lines: 8ps/mm (85% reduction)
    • Speed of light: 5ps/mm
  • Linear model
    • < 6% average percent error
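A quick arithmetic check of the 22nm delay figures above, using the 5mm wire length from the experimental settings:

```python
RRC_DELAY, TLINE_DELAY, SPEED_OF_LIGHT = 55.0, 8.0, 5.0  # ps/mm at 22nm (from the slide)
LENGTH_MM = 5.0  # wire length in the experimental settings

reduction = 1 - TLINE_DELAY / RRC_DELAY
print(f"T-line vs R-RC: {reduction:.1%} lower delay")
print(f"T-line is {TLINE_DELAY / SPEED_OF_LIGHT:.1f}x the speed-of-light delay")
print(f"5 mm link: R-RC {RRC_DELAY * LENGTH_MM:.0f} ps, T-line {TLINE_DELAY * LENGTH_MM:.0f} ps")
```

The exact ratio, about 85.5%, rounds to the 85% reduction quoted on the slide.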
Performance Metric: Normalized Energy per Bit – Results and Comparison
  • Technology trends
    • R-RC and T-lines ↓
    • T-lines reduce more quickly
  • T-line structures
    • Outperform R-RC beyond 45nm
    • Differential: lowest energy.
    • Single-ended similar to R-RC.
      • T-TL > UT-TL
  • At 22nm node
    • R-RC: 100pJ/m
    • Single-ended: 60% reduction
    • Differential: 96% reduction
  • Linear model
    • < 12% average percent error
    • Error sources for T-TL and PE-TL
      • RL and passive equalizers
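The percentage reductions translate into absolute values as follows (22nm figures from the slide, 5mm wire length from the experimental settings):

```python
RRC_ENERGY = 100.0  # pJ/m at 22nm (from the slide)
REDUCTION = {"single-ended": 0.60, "differential": 0.96}
LENGTH_M = 5e-3     # 5 mm link, as in the experimental settings

print(f"R-RC: {RRC_ENERGY:.0f} pJ/m, {RRC_ENERGY * LENGTH_M:.3f} pJ/bit over 5 mm")
for name, red in REDUCTION.items():
    e_per_m = RRC_ENERGY * (1 - red)
    print(f"{name}: {e_per_m:.0f} pJ/m, {e_per_m * LENGTH_M:.3f} pJ/bit over 5 mm")
```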
Performance Metric: Normalized Throughput – Results and Comparison
  • Technology trends
    • R-RC and T-lines ↑
    • T-lines increase more quickly
  • T-line structures
    • Outperform R-RC beyond 32nm
    • Differential better than single-ended
  • At 22nm node
    • R-RC: 12Gbps/um
    • T-TL: 30% improvement
    • UE-TL: 75% improvement
    • PE-TL: ~2X that of R-RC
  • Linear model
    • < 7% average percent error
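Applied to the 12Gbps/um R-RC baseline, the quoted improvements correspond roughly to the following absolute throughput densities at 22nm (PE-TL is taken as exactly 2X, although the slide says "~"):

```python
RRC_THROUGHPUT = 12.0  # Gbps/um at 22nm (from the slide)
GAIN = {"T-TL": 1.30, "UE-TL": 1.75, "PE-TL": 2.0}  # PE-TL is quoted as ~2X

for name, g in GAIN.items():
    print(f"{name}: ~{RRC_THROUGHPUT * g:.1f} Gbps/um")
```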
Signal Integrity – single-ended T-lines

Worst-case switching pattern for peak noise simulation

Using worst-case pattern

Using single or multiple PRBS patterns

  • UT-TL structure
    • 380mV peak noise at 1V supply voltage w/ 7ps rise time
    • SI could become a major issue as supply voltage drops
  • T-TL less sensitive to noise
    • At the same rise time, ~ 50% reduction of peak noise
    • Peak noise ↓ as technology scales
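The slides do not state which PRBS order was used. As an illustration only, a PRBS7 stream (polynomial x^7 + x^6 + 1, period 127) can be generated with a 7-bit linear-feedback shift register; `prbs7` is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterator


def prbs7(seed: int = 0x7F) -> Iterator[int]:
    """PRBS7 bit stream from a 7-bit Fibonacci LFSR (x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


# First 16 bits of a stimulus pattern for one aggressor line.
pattern = list(islice(prbs7(), 16))
print(pattern)
```

Longer sequences such as PRBS15 follow the same structure with different taps; a dedicated worst-case pattern is still needed to bound peak noise, as the slides do.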
Signal Integrity – differential T-lines

Worst-case switching pattern for peak noise simulation

  • More reliable
    • Termination resistance
    • Common-mode noise reduction
  • Peak noise
    • Within ~10mV range
  • Eye-Heights
    • UE-TL
      • Eye reduces as bit rate ↑
      • Harder to meet constraint.
    • PE-TL
      • > 70mV eye even at 22nm node
      • Equalization does help!
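The common-mode noise reduction can be seen by splitting the two line voltages into differential and common-mode components. In this minimal sketch, crosstalk is assumed to couple equally onto both lines, so it shifts only the common mode; in practice coupling is not perfectly symmetric, so rejection is partial. All voltages are hypothetical.

```python
from typing import Tuple


def diff_cm(vp: float, vn: float) -> Tuple[float, float]:
    """Split a pair of line voltages into differential and common-mode parts."""
    return vp - vn, (vp + vn) / 2


# Hypothetical: 50 mV differential signal around 0.5 V, plus 30 mV of
# crosstalk coupled equally onto both lines of the pair.
signal_p, signal_n = 0.525, 0.475
noise = 0.030
vd, vcm = diff_cm(signal_p + noise, signal_n + noise)
print(f"differential {vd * 1e3:.1f} mV, common mode {vcm:.3f} V")
```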
Conclusion
  • Compare five different global interconnections in terms of latency, energy per bit, throughput and signal integrity from 90nm to 22nm.
  • A simple linear model is provided to link
    • Architecture-level performance metrics
    • Technology-defined parameters
  • Some observations from experimental results
    • T-line structures have the potential to replace R-RC at future nodes
    • Differential T-lines are better than single-ended
      • Low-power/High-throughput/Low-noise
    • Equalization could be utilized for on-chip global interconnection
      • Higher throughput density, improved signal integrity
      • Even w/ lower energy dissipation (passive equalization)
Introduction – Technology Trend

Scaling trend of PUL wire resistance and capacitance

Copper resistivity versus wire width

  • On-Chip Interconnect Scaling
    • Dimension shrinks
      • Wire resistance increases -> RC delay
      • Increasing capacitive coupling -> delay, power, noise, etc.
    • Performance of global wires decreases w/ technology scaling.
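The first-order reason resistance rises as dimensions shrink is r = ρ/(w·t). The sketch uses a hypothetical effective resistivity ρ = 2.2 µΩ·cm and hypothetical cross-sections, and ignores the width-dependent resistivity increase shown in the copper-resistivity figure, which makes the trend worse.

```python
def wire_r_ohm_per_mm(rho_uohm_cm: float, width_um: float, thickness_um: float) -> float:
    """Per-unit-length wire resistance r = rho / (w * t), returned in ohm/mm."""
    rho_ohm_um = rho_uohm_cm * 1e-2          # 1 uOhm*cm = 1e-2 Ohm*um
    r_ohm_per_um = rho_ohm_um / (width_um * thickness_um)
    return r_ohm_per_um * 1e3                # ohm/um -> ohm/mm


RHO = 2.2  # hypothetical effective Cu resistivity, uOhm*cm
r_old = wire_r_ohm_per_mm(RHO, 0.10, 0.20)  # hypothetical cross-section
r_new = wire_r_ohm_per_mm(RHO, 0.07, 0.14)  # both dimensions scaled by 0.7
print(f"{r_old:.0f} -> {r_new:.0f} ohm/mm ({r_new / r_old:.2f}x)")
```

Scaling both width and thickness by 0.7 roughly doubles r per unit length, before any size-effect increase in ρ.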
Design methodology: single-ended T-lines

  • 2D frequency-dependent tabular model
  • Inverter size, number of stages, Rload (if any)
  • Inverter chains
  • SPICE simulation to evaluate
  • Optimization routine:
    1. Optimal cycle time
    2. Sweep for optimal inverter chain

SPICE simulation to check in-plane crosstalk, etc

Design methodology: differential T-lines

  • 2D frequency-dependent tabular model
  • Wire width; driver impedance; RC equalizer (if any); termination resistance
  • Differential lines; SA-based TX
  • Closed-form equation-based model
  • Evaluation based on models
  • Optimization routine:
    1. Binary search for wire width
    2. SQP for optimization of the other variables

SPICE simulation to check in-plane crosstalk, etc
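Step 1 of the differential routine, binary search for the wire width, can be sketched as follows. It assumes eye height grows monotonically with width; `toy_eye_mv` is a hypothetical stand-in for the paper's closed-form models, and the 50mV target matches the eye-opening design constraint.

```python
from typing import Callable


def min_width_for_eye(eye_mv: Callable[[float], float], w_lo: float, w_hi: float,
                      target_mv: float = 50.0, tol: float = 0.01) -> float:
    """Smallest wire width (um) whose eye height meets target_mv, within tol.

    Assumes eye_mv(width) is non-decreasing in width, w_lo is infeasible,
    and w_hi satisfies the constraint.
    """
    if eye_mv(w_hi) < target_mv:
        raise ValueError("constraint not met even at w_hi")
    while w_hi - w_lo > tol:
        mid = 0.5 * (w_lo + w_hi)
        if eye_mv(mid) >= target_mv:
            w_hi = mid  # feasible: try narrower
        else:
            w_lo = mid  # infeasible: need wider
    return w_hi


def toy_eye_mv(width_um: float) -> float:
    """Hypothetical eye model: wider wires have lower loss and a larger eye."""
    return 400.0 * (1 - 2.0 ** (-width_um / 2.0))


w = min_width_for_eye(toy_eye_mv, w_lo=0.1, w_hi=10.0)
print(f"minimum width ~ {w:.3f} um, eye = {toy_eye_mv(w):.1f} mV")
```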

Effects of driver impedance and termination resistance
  • Lowering the driver impedance improves the eye
  • The eye shrinks as frequency goes up
  • There is an optimal termination resistance
Effects of driver impedance and termination resistance on step response

Optimal Rload

  • Larger driver impedance leads to a slower rising edge and a lower saturation voltage
  • Larger termination resistance gives a sharper rising edge, but a larger reflection
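A lossless-line bounce diagram reproduces both trends qualitatively. In this sketch, Z0 = 50 Ω and all resistances are hypothetical, and real on-chip lines are lossy RLC lines. The Rs/Z0 divider sets the first launched step, so a larger Rs lowers it and the final level Vs·Rt/(Rs+Rt); a termination above Z0 raises the first arrival at the load but reflects part of the wave, producing ringing.

```python
def far_end_steps(vs: float, rs: float, z0: float, rt: float, n_bounces: int = 6):
    """Far-end voltage of a lossless line after each wave arrival at the load.

    Entry k is the load voltage after the (k+1)-th arrival, i.e. at times
    td, 3*td, 5*td, ... for a one-way delay td.
    """
    gamma_l = (rt - z0) / (rt + z0)  # load reflection coefficient
    gamma_s = (rs - z0) / (rs + z0)  # source reflection coefficient
    v_launch = vs * z0 / (rs + z0)   # initial wave set by the Rs/Z0 divider
    levels, v, wave = [], 0.0, v_launch
    for _ in range(n_bounces):
        v += wave * (1 + gamma_l)    # incident plus reflected wave at the load
        levels.append(v)
        wave *= gamma_l * gamma_s    # wave returning after one round trip
    return levels


for rt in (50.0, 200.0):
    print(rt, [round(v, 3) for v in far_end_steps(1.0, 25.0, 50.0, rt)])
```

With Rt = Z0 the load settles in one step; with Rt = 4·Z0 the first arrival overshoots and then rings toward Rt/(Rs+Rt).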
Crosstalk effects
  • Three different PRBS input patterns, min-ddp solutions
  • T-line Scheme A: Delay increased by 9.6%, Power increased by 37%
  • T-line Scheme B: Delay increased by 2%, Power increased by 25.7%
Transceiver Design
  • Sense amplifier (SA)
    • Double-tail latch-type [Schinkel 07]
    • Optimize sizing to minimize SA delay
  • Inverter chain
    • Number of stages
      • Fixed to 6
    • Sizing of each inverter
      • RS: output resistance of inverter chain
      • Sweep the 1st inverter size to minimize the total transceiver delay for given [Veye, RS]

Double-tail latch-type voltage sense amp. (@45nm tech node, W/L):
  • M1/M3: 45nm/45nm
  • M2/M4: 250nm/45nm
  • M5/M6: 180nm/45nm
  • M7/M8: 280nm/45nm
  • M9: 495nm/45nm
  • M10/M11: 200nm/45nm
  • M12: 1.58um/45nm
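The sweep over the first inverter size can be illustrated with a deliberately simple delay model: a larger first inverter loads the SA (raising its delay) but reduces the taper the remaining stages need to reach the last stage, which is sized for the target output resistance RS. All constants, and the 1/Veye dependence of the SA delay, are hypothetical placeholders rather than 45nm data; RS = 11.06 Ω is the value quoted on the transceiver-modeling slide.

```python
from typing import Optional, Sequence

N_STAGES = 6  # fixed number of inverter stages, as on the slide

# Hypothetical model constants (placeholders, not 45nm data).
TAU_PS = 5.0           # delay per unit of stage effort
P_INV = 1.0            # inverter parasitic delay, in units of TAU_PS
SA_T0_PS = 20.0        # intrinsic SA delay
SA_VEYE_PS_MV = 500.0  # SA delay term that grows as the input eye shrinks
SA_LOAD_PS = 1.5       # extra SA delay per unit width of the first inverter
R_UNIT_OHM = 2000.0    # output resistance of a unit-width inverter


def transceiver_delay_ps(w1: float, veye_mv: float, rs_ohm: float) -> float:
    """Toy SA + inverter-chain delay for a first-inverter width w1."""
    w_last = R_UNIT_OHM / rs_ohm                     # last stage sized to give RS
    taper = (w_last / w1) ** (1.0 / (N_STAGES - 1))  # per-stage size ratio
    sa = SA_T0_PS + SA_VEYE_PS_MV / veye_mv + SA_LOAD_PS * w1
    chain = (N_STAGES - 1) * TAU_PS * (taper + P_INV)  # line-driving stage ignored
    return sa + chain


def best_first_size(veye_mv: float, rs_ohm: float,
                    grid: Optional[Sequence[float]] = None) -> float:
    """Sweep the first inverter width and keep the one with minimum delay."""
    if grid is None:
        grid = [0.5 * i for i in range(1, 41)]  # widths 0.5 ... 20 (unit sizes)
    return min(grid, key=lambda w: transceiver_delay_ps(w, veye_mv, rs_ohm))


w1 = best_first_size(veye_mv=50.0, rs_ohm=11.06)
print(f"best w1 = {w1}, delay = {transceiver_delay_ps(w1, 50.0, 11.06):.1f} ps")
```

In this additive model Veye shifts the delay but not the optimal size; a more realistic SA model would couple the two.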

Transceiver Modeling
  • Driver side
    • Voltage source Vs with output resistance Rs
    • Vs: full-swing pulse signal with rise time Tr=0.1Tc
    • Rs: output resistance of the last inverter in the chain.
  • Receiver side
    • Extract look-up table for TX delay and power
    • Fit the table using a non-linear closed-form formula
    • The relative error of the fitting models is within 2%

Histogram of fitting errors at 45nm node

Transceiver delay map at 45nm node

Transceiver power map at 45nm node


Bit-rate: 50Gbps

Rs=11.06ohm, Rd=350ohm, Cd=0.38pF,
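The table-fitting step can be sketched with a model that is linear in its coefficients, so ordinary least squares suffices. The form delay = a + b/Veye and the table values are hypothetical; the paper's actual closed-form formula, which also depends on RS, is not reproduced here.

```python
from typing import List, Tuple


def fit_inverse_model(veye_mv: List[float], delay_ps: List[float]) -> Tuple[float, float]:
    """Least-squares fit of delay = a + b / Veye (linear in x = 1/Veye)."""
    xs = [1.0 / v for v in veye_mv]
    n = len(xs)
    mx, my = sum(xs) / n, sum(delay_ps) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, delay_ps))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def max_relative_error(veye_mv: List[float], delay_ps: List[float],
                       a: float, b: float) -> float:
    return max(abs((a + b / v) - d) / d for v, d in zip(veye_mv, delay_ps))


# Hypothetical look-up table (not the paper's data): eye height (mV) vs. TX delay (ps).
table_veye = [50.0, 75.0, 100.0, 150.0, 200.0, 300.0]
table_delay = [61.0, 47.5, 40.8, 34.0, 30.6, 27.4]
a, b = fit_inverse_model(table_veye, table_delay)
err = max_relative_error(table_veye, table_delay, a, b)
print(f"a = {a:.2f} ps, b = {b:.1f} ps*mV, max rel. error = {err:.1%}")
```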


Conclusion (cont’)

Tables (score/value vs. tech node): Low-Latency Application (ps/mm); Low-Energy Application (pJ/m); High-Throughput Application (Gbps/um); Low-Noise Application

Each table entry is score/value. Scores run up to a maximum of 5, and a higher score is better for the given metric. The best structure in each column is marked in red.

Future Works
  • Explore novel global signaling schemes for high throughput and low energy dissipation.
    • Design and optimize >50Gbps on-chip interconnection schemes
    • Architecture-level study to identify trade-offs
      • Wire configuration
        • Dimension optimization, ground plane, etc.
      • Un-interrupted architectures
        • Equalization implementation, TX/RX choice
      • Distributed architectures
        • Active or Passive compensation (RC equalizers, other networks, etc)
    • Novel high-speed transceiver circuitry design
    • Develop analysis and optimization capability to aid co-design and co-optimization of wire and transceiver circuit
    • Fabrication to verify analysis and demonstrate feasibility
Related Publications

[Repeated RC Wire]

  • L. Zhang, H. Chen, B. Yao, K. Hamilton, and C. K. Cheng, “Repeated on-chip interconnect analysis and evaluation of delay, power and bandwidth metrics under different design goals,” IEEE International Symposium on Quality Electronic Design, 2007, pp. 251-256.

[Un-Terminated/Terminated T-Line]

  • Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, “Design Methodology of High Performance On-Chip Global Interconnect Using Terminated Transmission-Line,” IEEE International Symposium on Quality Electronic Design, 2009, pp. 451-458.

[Passive-Equalized T-Line]

  • Y. Zhang, L. Zhang, A. Tsuchiya, M. Hashimoto, and C. K. Cheng, “On-chip high performance signaling using passive compensation,” IEEE International Conference on Computer Design, 2008, pp. 182-187.
  • Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, “On-chip bus signaling using passive compensation,” IEEE Electrical Performance of Electronic Packaging, 2008, pp. 33-36.
  • L. Zhang, Y. Zhang, A. Tsuchiya, M. Hashimoto, E. Kuh, and C. K. Cheng, “High performance on-chip differential signaling using passive compensation for global communication,” Asia and South Pacific Design Automation Conference, 2009, pp. 385-390.

[Overview and Comparison]

  • Y. Zhang, X. Hu, A. Deutsch, A. E. Engin, J. F. Buckwalter, and C. K. Cheng, “Prediction of High-Performance On-Chip Global Interconnection,” ACM Workshop on System Level Interconnection Prediction, 2009.