ORION2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration

Andrew B. Kahng¶

Bin Li‡

Li-Shiuan Peh‡

Kambiz Samadi¶

¶ University of California, San Diego

‡ Princeton University

April 21, 2009

Outline
  • Motivation
  • ORION2.0 Framework
  • Dynamic Power Modeling
  • Leakage Power Modeling
  • Area Modeling
  • Validation and Significance Assessment
  • Conclusions

Motivation
  • Many-core chips → NoCs are needed to interconnect the cores → power efficiency of NoCs is important
  • Performance used to be the primary concern
  • Now power efficiency is critical
    • 28% of total power in the Intel 80-core Teraflops chip is due to the interconnection network (routers + links)
    • → Need rapid power estimation to trade off alternative architectures
  • Rapid power-area tradeoffs at the architectural level

Our Goal: Develop accurate models that are easily usable by system-level designers early in the design cycle

Related Work
  • Real-chip power measurements (Isci et al. 03)
  • RTL-level NoC power estimations (A. Banerjee et al. 07, and N. Banerjee et al. 04)
    • Simulation is slow
    • Requires detailed RTL modeling → not suitable for early-stage NoC design space exploration
  • Architectural-level power estimation
    • Interconnection network (Patel et al. 97): the model is not instantiated with architectural parameters → not suitable for exploring tradeoffs in router microarchitecture
    • Uniprocessor power modeling (Wattch: Brooks et al. 00 and SimplePower: Ye et al. 00)
    • NoC power modeling (ORION 1.0: Wang et al. 02)
  • ORION 1.0
    • Has been widely used
    • Supports early-stage design space exploration for NoC power-performance tradeoff analysis

ORION 1.0 Modeling Methodology

Power models are derived for the major building blocks (FIFO, crossbar, and arbiter)

For each component, a canonical structure is described in terms of architectural and technological parameters

Detailed analysis is performed to determine parameterized capacitance equations

Capacitance equations and switching activity estimates are combined to determine power consumption

Power models are based on detailed estimates of gate and wire capacitance and switching activity

Limitations of ORION 1.0

[Figure annotations: "Up to 8.1X diff." and "10.3X diff."]


ORION 2.0: Accurate NoC Router Models

ORION 2.0
  • Built on top of ORION 1.0
  • Uses our automatic/semi-automatic flows to obtain technology inputs
  • Provides significant accuracy improvement compared with ORION 1.0

Model inputs
  • Technology parameters
    • interconnect parameters
    • device parameters
    • scaling factors for future technologies
  • Circuit implementation & buffering scheme
    • SRAM and register FIFO
    • MUX-tree and matrix crossbar
    • different arbitration schemes
    • hybrid buffering scheme
  • Architectural parameters
    • # of ports; # of buffers
    • # of xbar ports; # of VCs
    • voltage, frequency

[Figure: router block diagram. Input links feed buffers Buf I, Buf E, Buf W, Buf N, and Buf S; the arbiter exchanges request and grant signals (reqI/grantI, reqE/grantE, reqW/grantW, reqN/grantN, reqS/grantS); the crossbar connects inputs inI, inE, inW, inN, inS to outputs outI, outE, outW, outN, outS, which drive the output links.]

ORION 2.0 Improvements

ORION 1.0
  • Buffer (SRAM-based)
  • Arbiter (dynamic power)
  • Crossbar
  • Links (dynamic power)
  • Area (router)

ORION 2.0
  • Power Subcomponents
    • Buffer
      • SRAM-based
      • Flip-flop-based
    • Crossbar
    • Arbiter
      • VC allocator model
      • Leakage power
    • Links
      • Hybrid buffering
      • Leakage power
    • Clock
  • Area
    • More accurate router area model
    • Link area model
  • Model Infrastructure
    • Application-specific technology-level adjustment
    • Updated capacitance and transistor sizes

Model Technology Inputs
  • Inputs for power calculation
    • Leakage current values (obtained from Liberty (.lib) / SPICE)
    • Input capacitance for different repeater sizes (Liberty, Predictive Technology Models (PTM))
  • Inputs for area calculation
    • Wire dimensions (Interconnect Technology Format (ITF) / LEF / ITRS)
    • Cell area is available from Liberty; for future technologies, ITRS A-factors or proposed area models can be used
  • We also provide data for (1) high-performance (HP), and (2) low-power (LOP) device types for 90nm and 65nm
  • Scaling factors for 45nm and 32nm technologies were obtained from ITRS 2007 / MASTAR5.0

Dynamic Power Modeling
  • Dynamic power: due to switching capacitance
  • Clock power:
    • $P_{clk} = \alpha \times C_{clk} \times V_{dd}^2 \times f$ ($\alpha$: activity factor)
    • $C_{clk} = C_{\text{sram-fifo}} + C_{\text{pipeline-registers}} + C_{\text{register-fifo}} + C_{\text{clock-wiring}}$
  • Physical links: power is due to charging and discharging of the capacitive load
    • $P_d = \alpha \times C_{load} \times V_{dd}^2 \times f$; $C_{load} = C_{ground} + C_{coupling} + C_{input}$
  • Register-based FIFO: implemented as shift registers
  • Virtual channel allocator: added two models
  • Other components: we use ORION 1.0 models with updated transistor and technology parameters
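
The link power expression above can be sketched in a few lines of Python. This is only an illustration: the function names and all numeric values are hypothetical and are not taken from ORION 2.0 or from any technology library.

```python
def link_load_capacitance(c_ground, c_coupling, c_input):
    """C_load = C_ground + C_coupling + C_input (farads)."""
    return c_ground + c_coupling + c_input


def dynamic_power(alpha, c_load, vdd, freq):
    """P = alpha * C * Vdd^2 * f (watts when inputs are in SI units)."""
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical per-wire values, chosen only to exercise the formula.
c_load = link_load_capacitance(c_ground=100e-15, c_coupling=150e-15, c_input=10e-15)
p_link = dynamic_power(alpha=0.5, c_load=c_load, vdd=1.0, freq=1e9)
print(f"dynamic power per wire: {p_link * 1e6:.1f} uW")
```

The same `dynamic_power` helper applies to the clock network once the clock capacitance is known.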

Clock Power (1)

Clock power depends heavily on the distribution topology → we assume an H-tree topology

$C_{clk} = C_{\text{sram-fifo}} + C_{\text{pipeline-registers}} + C_{\text{register-fifo}} + C_{\text{clock-wiring}}$

Memory structures: the precharge circuitry is a capacitive load on the clock network, due to the precharge transistor $T_c$

$C_{chg} = C_g(T_c) + C_d(T_c)$

$C_{\text{sram-fifo}} = (P_r + P_w) \times F \times B \times C_{chg}$

where $P_r$, $P_w$, $F$, and $B$ are the number of read ports, the number of write ports, the flit width, and the number of buffers, respectively

Pipeline registers: due to the different stages in a router

assume a D-flip-flop (DFF) as the building block for pipeline registers

$C_{\text{pipeline-registers}} = N_{pipeline} \times F \times C_{ff}$, where $C_{ff}$ is the DFF capacitance

Register-based FIFO: due to the DFF capacitance used in registers

$C_{\text{register-fifo}} = F \times B \times C_{ff}$
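
As a rough sketch, the three clock-load terms above translate directly into code. The function and parameter names are illustrative, the capacitances are in arbitrary units (for example fF), and the clock-wiring term is passed in as a given number because its equations are covered on the next slide.

```python
def c_sram_fifo(read_ports, write_ports, flit_width, num_buffers, c_chg):
    """Clock load of the SRAM FIFO precharge circuitry."""
    return (read_ports + write_ports) * flit_width * num_buffers * c_chg


def c_pipeline_registers(n_pipeline, flit_width, c_ff):
    """Clock load of the DFF-based pipeline registers."""
    return n_pipeline * flit_width * c_ff


def c_register_fifo(flit_width, num_buffers, c_ff):
    """Clock load of a register-based (DFF) FIFO."""
    return flit_width * num_buffers * c_ff


def c_clk(c_sram, c_pipe, c_reg, c_clock_wiring):
    """Total clock capacitance as the sum of the four contributions."""
    return c_sram + c_pipe + c_reg + c_clock_wiring


# Hypothetical router configuration and capacitances.
sram = c_sram_fifo(read_ports=1, write_ports=1, flit_width=32, num_buffers=4, c_chg=1.5)
pipe = c_pipeline_registers(n_pipeline=3, flit_width=32, c_ff=2.0)
reg = c_register_fifo(flit_width=32, num_buffers=4, c_ff=2.0)
total = c_clk(sram, pipe, reg, c_clock_wiring=500.0)
print(sram, pipe, reg, total)
```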

Clock Power (2)

Wiring load: due to (1) wiring and (2) clock tree buffers

Example: 5-level H-tree clock distribution:

[equation not preserved in the transcript]

where $D$ and $C_w$ are the chip dimension and the per-unit-length wire capacitance, respectively

The capacitive contribution of the clock buffers requires an estimate of the number of buffer stages, $k$:

[equation not preserved in the transcript]

where $R_{int}$, $C_{int}$, $R_d$, and $C_{gate}$ are the clock tree wire resistance, wire capacitance, drive resistance, and input gate capacitance of a minimum-size inverter, respectively

[equation not preserved in the transcript]

where $\rho$, $C_{area}$, and $C_{fringe}$ are the resistivity, unit area capacitance, and unit fringe capacitance, respectively

$C_{\text{clock-wiring}} = k \times C_{gate} + C_{wire}$

Clock leakage power is due to the clock buffers

Repeater and Wire Power Models
  • Repeaters (buffers) are used in links and in the clock tree network
  • Leakage power has two main components: (1) sub-threshold leakage, and (2) gate-tunneling current
    • Depending on design conditions, we compute the leakage power at different temperatures: (1) 25 °C, (2) 80 °C, and (3) 110 °C
    • Both components depend linearly on device size

$p_s = (p_{sn} + p_{sp}) / 2$

$p_{sn} = k_{0n} + k_{1n} \times w_n$

$p_{sp} = k_{0p} + k_{1p} \times w_p$

  • Dynamic power can be calculated as:

$p_d = a \times c_l \times v_{dd}^2 \times f$

$c_l = c_i + c_g + c_c$

    • $p_d$, $a$, $c_l$, $v_{dd}$, and $f$ are dynamic power, activity factor, load capacitance, supply voltage, and frequency, respectively
    • The load capacitance is composed of the input capacitance of the next repeater ($c_i$) and the ground ($c_g$) and coupling ($c_c$) capacitances of the driven wire
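
A minimal sketch of the linear repeater leakage model follows. It assumes that the n and p terms are the NMOS and PMOS contributions and that the k coefficients come from a regression against library or SPICE data; the coefficient values below are made up for illustration.

```python
def repeater_leakage_power(wn, wp, k0n, k1n, k0p, k1p):
    """Average of the two linear-in-width leakage terms: ps = (psn + psp) / 2."""
    psn = k0n + k1n * wn  # term that depends on the NMOS width wn
    psp = k0p + k1p * wp  # term that depends on the PMOS width wp
    return (psn + psp) / 2


# Hypothetical regression coefficients and device widths (arbitrary units).
ps = repeater_leakage_power(wn=3.0, wp=6.0, k0n=1.0, k1n=2.0, k0p=2.0, k1p=1.0)
print(ps)
```

In practice one set of coefficients would be needed per temperature corner, since the slide computes leakage at 25 °C, 80 °C, and 110 °C.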

Interconnect Optimization: Buffering

Conventional delay-optimal buffering → unrealistic buffer sizes → high dynamic / leakage power → suboptimal

Our approach: iterative optimization of a hybrid objective (power + delay)

Search for the optimal number and size of repeaters

Can be extended to other interconnect optimizations (e.g., wire sizing and driver sizing)

[Figure: Pareto-optimal frontier of the power-delay tradeoff of a 5mm interconnect in 90nm / 65nm]
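
The idea of searching over repeater count and size with a combined power and delay objective can be illustrated as follows. This is not the authors' optimizer: the Elmore-style delay expression, the exhaustive grid search, the weighted-sum objective, and every technology constant are assumptions made for this sketch.

```python
import itertools

# Hypothetical technology values (not taken from the presentation).
R_DRV = 10e3      # output resistance of a unit-size repeater, ohm
C_IN = 1e-15      # input capacitance of a unit-size repeater, F
R_WIRE = 200e3    # wire resistance per meter, ohm/m
C_WIRE = 200e-12  # wire capacitance per meter, F/m


def delay(n, size, length):
    """Elmore-style delay of a wire split into n repeated segments."""
    seg = length / n
    r_d = R_DRV / size
    c_g = C_IN * size
    stage = (0.69 * r_d * (C_WIRE * seg + c_g)
             + R_WIRE * seg * (0.38 * C_WIRE * seg + 0.69 * c_g))
    return n * stage


def switched_cap(n, size, length):
    """Wire plus repeater input capacitance, used here as a proxy for power."""
    return C_WIRE * length + n * C_IN * size


def best_buffering(length, weight, counts=range(1, 21), sizes=range(1, 201)):
    """Grid search minimizing normalized delay + weight * normalized power."""
    grid = list(itertools.product(counts, sizes))
    d_ref = min(delay(n, s, length) for n, s in grid)
    p_ref = min(switched_cap(n, s, length) for n, s in grid)

    def cost(point):
        n, s = point
        return delay(n, s, length) / d_ref + weight * switched_cap(n, s, length) / p_ref

    return min(grid, key=cost)


LENGTH = 5e-3  # 5 mm wire
delay_optimal = best_buffering(LENGTH, weight=0.0)
hybrid = best_buffering(LENGTH, weight=1.0)
print("delay-optimal (count, size):", delay_optimal)
print("hybrid        (count, size):", hybrid)
```

Sweeping `weight` from zero upward traces out points on a power-delay tradeoff curve: each step gives up some delay in exchange for less switched capacitance.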

Virtual Channel Allocator Model
  • Provides three virtual channel (VC) allocation models
    • Traditional two-stage VC allocator model
      • Most widely used
      • Power consumption increases rapidly as the number of VCs increases
    • Added: one-stage VC allocator model
      • Lower power consumption
      • Lower matching probability
    • Added: VC selection model
      • Proposed by Kumar et al., "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator in 65nm CMOS", ICCD07
      • Low power and high performance

Leakage Power Modeling
  • Leakage power: subthreshold and gate leakage
    • From 65nm onward, gate leakage becomes significant
    • $I'_{sub}(i,s)$ and $I'_{gate}(i,s)$ are the subthreshold and gate leakage currents per unit transistor width for a specific technology
    • $W_{sub}(i,s)$ and $W_{gate}(i,s)$ are the effective widths of component $i$ at input state $s$ for subthreshold and gate leakage, respectively
    • Key circuit components: INVx1, NAND2x1, NOR2x1, and DFF
    • Leakage currents are computed at different transistor junction temperatures: (1) 110 °C, (2) 80 °C, and (3) 25 °C
    • Same methodology as in ORION 1.0
    • Leakage current values are all obtained through SPICE simulation using foundry SPICE models
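
The equation that ties these quantities together is not in the transcript. One natural reading of the definitions is that the leakage current of component i in state s is the sum of each effective width times its per-unit-width current. The sketch below assumes that form and also assumes equal probability for every input state; both are assumptions, and all numbers are hypothetical.

```python
def leakage_current(w_sub, w_gate, i_sub_unit, i_gate_unit):
    """Assumed form: I_leak = W_sub * I'_sub + W_gate * I'_gate for one input state."""
    return w_sub * i_sub_unit + w_gate * i_gate_unit


def average_leakage_power(states, vdd):
    """Average Vdd * I_leak over input states, assuming equal state probability.

    states: list of (w_sub, w_gate, i_sub_unit, i_gate_unit) tuples, one per state.
    """
    currents = [leakage_current(*state) for state in states]
    return vdd * sum(currents) / len(currents)


# Hypothetical two-state component (arbitrary units).
example_states = [(2.0, 1.0, 3.0, 0.5), (1.0, 1.0, 1.0, 1.0)]
p_leak = average_leakage_power(example_states, vdd=2.0)
print(p_leak)
```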

Arbiter Leakage Power Model
  • Three arbitration schemes: (1) matrix, (2) round-robin (RR), and (3) queuing
  • Example: matrix arbiter
    • with R requesters → one R×R matrix is needed to keep the priorities
    • the grant logic can be implemented as a tree of NOR and INV gates, and the R×R matrix can be constructed using DFFs
    • NOR2, INV, and DFF represent 2-input NOR gate, inverter gate, and DFF, respectively
    • Further details on modeling methodology in Chen et al. 2003

Router Area Model

  • As the number of cores increases, the area occupied by communication components becomes significant (19% of total tile area in the Intel 80-core Teraflops chip)
  • Gate area model by Yoshida et al. (DAC’04)
  • Link area model by Carloni et al. (ASPDAC’08)

Matrix arbiter:

$Area_{arbiter} = Area_{NOR2x1} \times 2(R-1)R + Area_{DFF} \times R(R-1)/2 + Area_{INVx1} \times R$
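
The matrix arbiter area expression above amounts to a gate count multiplied by per-cell areas. A small sketch, with hypothetical cell areas (real values would come from a Liberty library or an area model):

```python
def matrix_arbiter_area(requesters, area_nor2, area_dff, area_inv):
    """Area of a matrix arbiter with R requesters, from per-cell areas."""
    r = requesters
    n_nor2 = 2 * (r - 1) * r    # NOR2x1 gates in the grant logic
    n_dff = r * (r - 1) // 2    # DFFs holding the priority matrix
    n_inv = r                   # INVx1 gates
    return area_nor2 * n_nor2 + area_dff * n_dff + area_inv * n_inv


# Hypothetical cell areas in um^2 for a 5-requester arbiter.
area = matrix_arbiter_area(requesters=5, area_nor2=2.0, area_dff=10.0, area_inv=1.0)
print(area)
```

The NOR2 count grows quadratically with the number of requesters, so the arbiter area does as well.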

Repeater and Wire Area Models
  • For existing technologies, the area of a repeater can be calculated as:

$a_r = \tau_0 + \tau_1 \times (w_n + w_p)$

    • $a_r$ denotes the repeater area; $\tau_0$ and $\tau_1$ are coefficients obtained by linear regression; $w_n$ and $w_p$ are the widths of the NMOS and PMOS devices, respectively
  • For future technologies, feature size ($F$), contacted pitch ($CP$), row height ($RH$), and cell width ($CW$) can be used to estimate the area:

$N_F = (w_p + w_n + 2 \times F) / RH$

$CW = N_F \times (F + CP) + CP$

$a_r = RH \times CW$

  • Wiring area can be calculated as:

$a_w = (n \times (w_w + s_w) + s_w) \times L$

    • $a_w$ denotes the wire area, $n$ is the bit width of the bus, and $w_w$, $s_w$, and $L$ are the wire width, spacing, and length
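
The three area expressions above can be coded directly. The function names are illustrative, the inputs are hypothetical, and the formulas are transcribed as given on the slide (for instance, the N_F term is not rounded to an integer).

```python
def repeater_area_existing(tau0, tau1, wn, wp):
    """Regression-based repeater area for an existing technology."""
    return tau0 + tau1 * (wn + wp)


def repeater_area_future(wn, wp, feature, contacted_pitch, row_height):
    """Repeater area estimated from layout parameters for a future technology."""
    n_f = (wp + wn + 2 * feature) / row_height
    cell_width = n_f * (feature + contacted_pitch) + contacted_pitch
    return row_height * cell_width


def wire_area(bits, wire_width, spacing, length):
    """Area of an n-bit bus: n wire pitches plus one extra spacing, times length."""
    return (bits * (wire_width + spacing) + spacing) * length


# Hypothetical values in consistent length units.
a_existing = repeater_area_existing(tau0=1.0, tau1=2.0, wn=1.0, wp=2.0)
a_future = repeater_area_future(wn=1.0, wp=2.0, feature=0.5, contacted_pitch=1.5, row_height=2.0)
a_wire = wire_area(bits=4, wire_width=1.0, spacing=1.0, length=10.0)
print(a_existing, a_future, a_wire)
```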

ORION2.0: Validations and Results
  • Validation: Two Intel NoC Chips
    • (1) Intel 80-core Teraflops: high-performance many-core design
    • (2) Intel SCC: ultra low-power communication core
    • ORION2.0 offers significant accuracy improvement

[Figure: comparison of ORION 2.0, the Intel 80-core chip, and ORION 1.0]

Impact on System-Level Design

Testcases

VPROC: video processor with 42 cores and 128-bit datawidth

dVOPD: dual video object plane decoder with 26 cores and 128-bit datawidth

[Figure: diagrams with elements labeled R1 and R2]

  • System-level impact: communication-driven synthesis in COSI-OCC
    • Accurate ORION 2.0 models lead to a better-performing NoC
    • The relative power due to an additional port is not as high in ORION 2.0 as in ORION 1.0

Conclusions
  • Accurate models can drive effective NoC design space exploration
  • ORION 1.0 is inaccurate for current and future technology nodes
  • Proposed accurate power and area models for network routers (ORION 2.0)
  • Presented a reproducible methodology for extracting inputs to our models
  • Maintained the ORION 1.0 interface while significantly improving the accuracy of the models → switching to ORION 2.0 is easy!

ORION 2.0 Release
  • ORION 2.0 Website: http://www.princeton.edu/~peh/orion.html

System-Level NoC Power Modeling Example

Polaris Toolchain

V. Soteriou, N. Eisley, H. Wang, B. Li, L.S. Peh, TVLSI’07
