Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment. ISPD 2005, San Francisco, CA, May 5th, 2005. Mario R. Casu (Politecnico di Torino) and Luca Macchiarulo (University of Hawaii at Manoa).

Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment

ISPD 2005 San Francisco, CA

May 5th, 2005

Mario R. Casu - Politecnico di Torino

and

Luca Macchiarulo - University of Hawaii at Manoa


Outline

  • Communication concerns at the physical layer

  • Great Expectations of “Wire Pipelining”

    • No block Delay

    • Block delay limitation

  • Computation locality

  • Adaptive Communications

  • Floorplanning strategy for adaptive systems

  • Experimental results


Wire pipelining - concept

  • Wire delay: substantial share of overall delay

  • Global wires are difficult to deal with

  • Global wire scaling does not follow that of

    • Transistors

    • Local wiring

[Figure: wire labelled with delay Del]


Wire pipelining - concept

  • Introducing a latch/FF relaxes the timing constraints

  • Similar to classical pipelining

[Figure: wire segments labelled with delays Del’ and Del’’]


Critical Length

  • Maximal length for which the wire can be driven at a given frequency

    • Optimum number of buffers

    • Optimum buffer dimensions

    • Optimum wire sizing

[Figure: wire with delay Del=1/f]


Wire Pipelining

  • Above the critical length, clocked elements (pipeline stages) are needed

[Figure: wire with delay Del>1/f]
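As a rough illustration of the rule above, the number of clocked elements on a wire can be estimated from its length and the critical length. The sketch below reuses the floor(length/Clength) term that appears later in the deck's cost heuristic; the function name and the sample lengths are hypothetical, and treating a wire of exactly one critical length as needing one stage is an assumption inherited from that formula.

```python
import math


def pipeline_stages(wire_length: float, critical_length: float) -> int:
    """Estimate the clocked elements needed on a wire.

    Mirrors the floor(length / Clength) term of the deck's heuristic.
    Both lengths must use the same unit.
    """
    if critical_length <= 0:
        raise ValueError("critical_length must be positive")
    return math.floor(wire_length / critical_length)


# Hypothetical wire lengths, expressed in units of the critical length.
for length in (0.5, 1.5, 3.2):
    print(f"length={length}: {pipeline_stages(length, 1.0)} stage(s)")
```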


“Wire Pipelining” techniques

  • Problem: maintaining functionality with a minimum loss in performance.

  • Solutions:

    • Globally Asynchronous Locally Synchronous – GALS

    • Retiming

    • Regular Distributed Register (J. Cong)

    • c-slowing (S. Sapatnekar)

    • Latency Insensitive Protocols (L. Carloni)


LIPs: Concept

[Figure labels: Pearl, Shell, Relay Station]


Shell – Relay Station Interaction

[Figure labels: valid, stop]


Feedback Topology

[Figure: six animation frames showing numbered data (0, 1, 2) and void data (τ) circulating in a feedback topology. The sequence shown grows frame by frame: 0 τ 1, then 0 τ 1 τ, then 0 τ 1 τ τ, then 0 τ 1 τ τ 2]

Feedback Topology: Performance

  • Void data circulate in the loops: initially as many as relay stations (r)

  • “Period” of void-stop equal to the number of shells (s) plus relay stations (r) in the loop

  • The worst loop fixes the throughput

  • T=s/(s+r)

  • Ta=2/4, Tb=2/5, so T=2/5

[Figure: feedback topology with two loops, a and b, and the sequence 0 τ 1 τ τ 2]
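The throughput rule on this slide can be checked with a few lines of code. This is a minimal sketch: each loop is described only by its number of shells s and relay stations r, and the (s, r) pairs for loops a and b are inferred from Ta=2/4 and Tb=2/5. Function names are illustrative.

```python
from fractions import Fraction


def loop_throughput(shells: int, relay_stations: int) -> Fraction:
    """T = s / (s + r) for a single loop."""
    return Fraction(shells, shells + relay_stations)


def system_throughput(loops) -> Fraction:
    """The worst (lowest-throughput) loop fixes the system throughput."""
    return min(loop_throughput(s, r) for s, r in loops)


# (shells, relay stations) for loops a and b, inferred from Ta and Tb.
loops = {"a": (2, 2), "b": (2, 3)}
for name, (s, r) in loops.items():
    print(name, loop_throughput(s, r))
print("T =", system_throughput(loops.values()))
```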


Classical Floorplanning

  • Problem: find a placement of (soft or hard) blocks that optimally fits a floorplan

  • Optimality is measured by whitespace, overall wirelength, critical path, or a combination


Floorplanning for Throughput [ISPD2004]

  • The optimal floorplan in our case is that which guarantees the maximum throughput compatible with given blocks’ dimensions

  • Maximum throughput is equivalent to the worst cost-to-time ratio loop


New Heuristic Throughput Computation

  • Heuristic:

    • Statically compute the shortest loop l(e) in which every edge appears

    • For every optimization iteration:

      • Cost(e)=1/l(e)*floor(length/Clength)

      • TotCost=ΣCost(e)
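The heuristic can be sketched as follows. This is only an illustration under stated assumptions: the netlist is a directed graph with one edge per channel, l(e) is counted in edges and found by breadth-first search, edges that lie on no loop contribute no cost, and the block names and wire lengths are hypothetical.

```python
import math
from collections import deque
from fractions import Fraction


def shortest_loop_lengths(edges):
    """Static step: for each edge (u, v), the edge count of the shortest loop through it."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    loops = {}
    for u, v in edges:
        dist = {v: 0}
        queue = deque([v])
        while queue:  # breadth-first search from v back to u
            node = queue.popleft()
            if node == u:
                loops[(u, v)] = dist[node] + 1  # path v..u plus the edge itself
                break
            for nxt in successors.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return loops


def total_cost(wire_lengths, loops, critical_length):
    """Per-iteration step: sum of 1/l(e) * floor(length / Clength) over edges on a loop."""
    total = Fraction(0)
    for edge, length in wire_lengths.items():
        if edge in loops:
            total += Fraction(math.floor(length / critical_length), loops[edge])
    return total


# Hypothetical netlist: wire lengths in units of the critical length.
wire_lengths = {("A", "B"): 2.5, ("B", "A"): 1.0, ("B", "C"): 3.0, ("C", "A"): 0.4}
loops = shortest_loop_lengths(wire_lengths)
print(loops)
print(total_cost(wire_lengths, loops, 1.0))
```

Only `total_cost` would need to be re-evaluated when the floorplanner changes wire lengths; the loop lengths depend on connectivity alone.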


Throughput-frequency trade-off

f=1/L

T=1

DR0=1·(1/L)=1/L


Throughput-frequency trade-off

f=2/L

T=2/(2+2)=1/2

DR=(1/2)·(2/L)=1/L

No advantage!


Throughput-frequency trade-off

[Figure: wires of length L/2, L and L]

f=1/L

T=1

DR0=(1/L)·1=1/L


Throughput-frequency trade-off

[Figure: five wire segments, each of length L/2]

f=2/L

T=3/(3+2)

DR=(2/L)·(3/5)=6/(5L)


Data Rate as the basic performance metric – Speed-up

  • Wire pipelining allows increased frequency

  • But it decreases the throughput according to the previous considerations

  • Real performance is given by DATA RATE=Thr*f

  • Advantage w.r.t. non-pipelined systems to be assessed through DR measures

  • Speed-Up SU=DR/DR0

  • L/(lm+lmax)<SU<L/lm

  • Floorplanning can be extremely beneficial if it can reduce the average branch length lm
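A small worked example may help connect these definitions to the trade-off slides. The sketch below recomputes the data rate and the speed-up for the case f=2/L, T=3/5 against the non-pipelined reference f=1/L, T=1. Normalising L to 1 is an assumption made purely for illustration; the ratio SU does not depend on it.

```python
from fractions import Fraction


def data_rate(throughput: Fraction, frequency: Fraction) -> Fraction:
    """DATA RATE = Thr * f."""
    return throughput * frequency


L = Fraction(1)  # normalised reference wire length (assumption)

dr0 = data_rate(Fraction(1), 1 / L)     # non-pipelined: T=1, f=1/L
dr = data_rate(Fraction(3, 5), 2 / L)   # pipelined: T=3/(3+2), f=2/L
speed_up = dr / dr0

print("DR0 =", dr0, " DR =", dr, " SU =", speed_up)
```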


Block delay effect

  • Blocks put a cap on the max frequency

    • fmax<1/max_i(di)

  • We can measure delay in “length”, by using a proportionality factor

  • Block delay can enter the picture if signals are latched at the input or output side only

[Figure labels: L, ld]


Block delay models

  • We used two different models

    • Delay proportional to block edge

      • Rationale: complexity of logic is related to block size

      • Minimum constant of proportionality=1: delay is the same needed for the fastest signal to traverse the entire block

      • Optimistic assumption

    • Delay constant, related to technology and equal to 13FO4

      • Derived from assumptions in the roadmap

      • More realistic for high performance design

      • More pessimistic (see below)

  • Reality probably lies somewhere between the two cases


Speed-up with block delay

  • Taking the block delay into account modifies the previous considerations

  • max(Li+di)/(lm+dm+dmax)<SU<max(Li+di)/(lm+dm)

  • In general, much worse than in the previous case
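The interval above can be evaluated mechanically once the quantities are known. The sketch below only transcribes the inequality; the reading of the symbols (Li and di as the wire length and block delay of connection i, lm and dm as averages, dmax as the largest block delay, all in the same length units) is an assumption, and the numbers are made up.

```python
from fractions import Fraction


def speed_up_bounds(paths, l_m, d_m, d_max):
    """Return (lower, upper) for max(Li+di)/(lm+dm+dmax) < SU < max(Li+di)/(lm+dm).

    paths: list of (L_i, d_i) pairs, in length units (assumed meaning).
    """
    worst = max(length + delay for length, delay in paths)
    lower = Fraction(worst, l_m + d_m + d_max)
    upper = Fraction(worst, l_m + d_m)
    return lower, upper


# Made-up integer values in arbitrary length units.
lower, upper = speed_up_bounds(paths=[(10, 2), (8, 5)], l_m=4, d_m=2, d_max=5)
print(f"{lower} < SU < {upper}")
```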


Throughput driven floorplan experiments

  • We used the floorplanner described in ISPD’04 to evaluate the optimal frequency (maximum DR)

  • On GSRC and MCNC benchmarks with input-output information

  • No block delay:

    • SU varies between 0.8 and 36%

    • Better on benchmarks with greater complexity

  • Block delay

    • Proportional to blocks’ edges: -7% to 44%

    • Equal to 13FO4: -11% to 12%

    • MCNC suite shows the worst behavior

  • High-speed systems with highly optimized blocks lead to negligible or irrelevant SU, in exchange for a large increase in clock frequency.


Space for better performance?

  • Not all point-to-point connections are actually used at every clock cycle.

  • Ex. CPU to Cache communication.

[Figure: channels Addr, Data-out and Data-in, shown first for a read cycle and then for a write cycle]


Space for better performance?

  • Unused communication channels effectively break throughput-limiting loops

  • Pipelining without limitation can become possible

[Figure: stream write cycle, three animation frames. Frame 1: Addr 1, τ, Data-out 1. Frame 2: Addr 2, Addr 1, Data-out 2, Data-out 1. Frame 3: Addr 3, Addr 2, Data-out 3, Data-out 2]


Adaptive Latency Insensitive Protocol

  • Need a mechanism to allow discarding useless “packets” by blocks: Adaptive communication

  • Details are out of the scope of the paper, but

    • It is possible through a simple modification of the original protocol

    • Requires the introduction of “oracles” predicting unused inputs for each block

    • We designed a functional implementation in synthesizable VHDL

    • We proved the correctness of the implementation (absence of deadlocks and correct signal sequencing)


ALIP performance evaluation

  • The adaptiveness of the approach prevents a static prediction of performance

  • However, a few conclusions can be reached:

    • The performance is bounded above by static LIP

    • Performance in long sequences of input independence is equivalent to that of the simplified network with the channel removed

  • If the system experiences infrequent “context switching” on its channels, such that at any given time the performance is the static Thi, the average performance can be approximated as:

    • Th=Σ ai·Thi

    • ai: fraction of time with performance Thi


ALIP performance evaluation - Example

[Figure: animation frames of the example. Each line below gives the clock cycle, the count of valid data, the cycle type and the channel labels shown in that frame]

Ck=1, Valid Data=1, Stream Write cycle: Addr 1, τ, Data-out 1

Ck=2, Valid Data=2, Stream Write cycle: Addr 2, Addr 1, Data-out 2, Data-out 1

Ck=3, Valid Data=3, Stream Write cycle: Addr 3, Addr 2, Data-out 3, Data-out 2

Ck=4, Valid Data=4, Read cycle: Addr 4, Addr 3, Data-out 3

Ck=5, Valid Data=5, Read cycle: -----, Addr 4, τ, τ

Ck=6, Valid Data=5, Read cycle: τ, -----, τ, Data-in 4

Ck=7, Valid Data=5, Read cycle: τ, τ, -----, Data-in 4

Ck=8, Valid Data=6, Read cycle: τ, Addr 5, τ, -----

Throughput=3/4 (Th1=1, Th2=1/2, a1=1/2, a2=1/2)
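The figures of this example can be cross-checked against the weighted-average approximation Th=Σ ai·Thi given earlier. The sketch below is a minimal check, assuming that the two phases are the stream write phase (Th1=1) and the read phase (Th2=1/2), each taking half of the time. The function name is illustrative.

```python
from fractions import Fraction


def average_throughput(phases) -> Fraction:
    """Weighted average of static throughputs: sum of a_i * Th_i.

    phases: list of (a_i, Th_i) pairs; the a_i must add up to 1.
    """
    if sum(a for a, _ in phases) != 1:
        raise ValueError("time fractions must add up to 1")
    return sum(a * th for a, th in phases)


# a1=1/2 with Th1=1, a2=1/2 with Th2=1/2, as in the example.
estimate = average_throughput([(Fraction(1, 2), Fraction(1)),
                               (Fraction(1, 2), Fraction(1, 2))])
measured = Fraction(6, 8)  # 6 valid data in 8 clock cycles

print("estimate =", estimate, " measured =", measured)
```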


Adaptive communication performance evaluation - assumptions

  • Assumption 1: No time lost in “context switching”

    • Unrealistic, but acceptable for burst communication, and consistent with experiments

  • Assumption 2: Channels behave in a statistically independent fashion

    • Only single clock cycle independence is important for our purposes

  • Under 1 and 2, we can compute channel activities and use them to weight the connections


Floorplanning for Throughput – adaptive case

  • The optimal floorplan in our case is that which guarantees the maximum throughput compatible with given blocks’ dimensions

  • Maximum throughput is equivalent to the worst cost-to-time ratio loop, weighted by the loop activation ratio

  • It can be approximated by taking into account the channel activation ratio


New Heuristic Throughput Computation

  • Heuristic:

    • Statically compute the shortest loop l(e) in which every edge appears

    • For every optimization iteration:

      • Cost(e)=1/l(e)*floor(length/Clength)*a(e)

      • TotCost=ΣCost(e)

  • The only change consists in the inclusion of the term a(e)
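The effect of the activity term can be seen in a short sketch. Assumptions: l(e) has already been computed for every edge, a(e) is a channel activity between 0 and 1, and the three edges and their values are hypothetical.

```python
import math
from fractions import Fraction


def adaptive_total_cost(edges, critical_length):
    """Sum of 1/l(e) * floor(length / Clength) * a(e) over all edges.

    edges: iterable of (shortest_loop_length, wire_length, activity).
    """
    total = Fraction(0)
    for loop_length, wire_length, activity in edges:
        stages = math.floor(wire_length / critical_length)
        total += Fraction(stages, loop_length) * activity
    return total


# Hypothetical edges: (l(e), wire length in critical lengths, a(e)).
edges = [
    (2, 2.5, Fraction(1)),
    (2, 1.0, Fraction(1, 2)),
    (3, 3.0, Fraction(1, 4)),
]
always_active = [(l, w, Fraction(1)) for l, w, _ in edges]

print("adaptive cost:", adaptive_total_cost(edges, 1.0))
print("static cost:  ", adaptive_total_cost(always_active, 1.0))
```

With every a(e) set to 1 the expression falls back to the non-adaptive cost, so rarely used channels are the only ones whose long wires are penalised less.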


Experiments

  • GSRC/MCNC benchmarks

    • Burst mode

    • Uniformly distributed phases and activation times

    • Comparison between non-pipelined solution and adaptively pipelined (13FO4 case)

    • After optimization, a VHDL netlist is automatically generated and simulated to measure the real performance of the system (as opposed to the approximation from the floorplanner)

  • Results:

    • SU between 16 and 44%

    • Monotonic behavior in the legal interval

    • Limitations due mainly to FO4 delays


Experiments

  • MPEG decoder

    • Strict data dependency

    • Optimization as in other cases

    • Simulation as before and with real channel utilization profiles

  • Results:

    • SU of 42% with block delay, 76% without

    • Real SU of 31% (effect of non-random correlation)


Conclusions and future work

  • Pure “blind” pipelining fails to achieve available optimization, due to neglect of common information

  • Adaptive protocols can take advantage of the information available to the blocks

  • We will concentrate on

    • Automated extraction of information from the blocks

    • Power optimization (power/timing trade-offs)

    • Routing constraints effects


Thank you


Shell – Relay Station Interaction

[Figure: four animation frames of the valid and stop signals between shell and relay station, with data items a, b, c and d]


Feedforward equalization

  • Maximum performance can be recovered by equalizing various paths

  • Longest path computation to obtain the appropriate number of added relay stations
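One way to read the longest-path step is sketched below. It assumes a feedforward (acyclic) netlist where each channel carries a known number of relay stations, counts latency in relay stations only, and places all padding on the last channel of each shorter path. These choices, and the block names, are assumptions for illustration, not the authors' procedure.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def equalize(relay_stations):
    """Return the extra relay stations per channel that equalize reconvergent paths.

    relay_stations: dict mapping (source, sink) to relay stations already present.
    """
    predecessors = {}
    for source, sink in relay_stations:
        predecessors.setdefault(sink, set()).add(source)
        predecessors.setdefault(source, set())
    arrival = {}  # longest-path latency from the inputs to each block
    for node in TopologicalSorter(predecessors).static_order():
        arrival[node] = max(
            (arrival[p] + relay_stations[(p, node)] for p in predecessors[node]),
            default=0,
        )
    # The slack of a channel is the padding needed to match the longest path.
    return {
        (u, v): arrival[v] - arrival[u] - count
        for (u, v), count in relay_stations.items()
    }


# Hypothetical netlist: A reaches D through B (3 relay stations) and through C (none).
channels = {("A", "B"): 2, ("B", "D"): 1, ("A", "C"): 0, ("C", "D"): 0}
print(equalize(channels))
```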



General Performance Evaluation

  • Generic netlists of blocks are feedforward connections of loops

  • If feedforward connections are equalized, “worst” loop dominates throughput

  • Problem formulation: max cost-to-time ratio (polynomial time).
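For small netlists the worst loop can be found by brute force, which makes the formulation concrete. The sketch below enumerates simple loops and evaluates s/(s+r) for each; it is exponential in the worst case and is not the polynomial-time cost-to-time ratio algorithm the slide refers to. Each node is taken to be one shell, and the topology is hypothetical, chosen so that its two loops have throughputs 2/4 and 2/5.

```python
from fractions import Fraction


def simple_cycles(successors):
    """Yield every simple directed cycle once, as a list of nodes."""
    for start in sorted(successors):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    # Only visit nodes larger than start, so each cycle is
                    # reported from its smallest node and never twice.
                    stack.append((nxt, path + [nxt]))


def worst_loop_throughput(relay_stations):
    """Minimum of s/(s+r) over all loops; 1 if the netlist has no loop."""
    successors = {}
    for u, v in relay_stations:
        successors.setdefault(u, []).append(v)
    worst = Fraction(1)
    for cycle in simple_cycles(successors):
        s = len(cycle)
        r = sum(relay_stations[(cycle[i], cycle[(i + 1) % s])] for i in range(s))
        worst = min(worst, Fraction(s, s + r))
    return worst


# Hypothetical netlist: relay stations per channel.
netlist = {("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 2, ("C", "B"): 1}
print(worst_loop_throughput(netlist))
```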

