
Soha Hassoun Tufts University Medford, MA Thanks to: Carl Ebeling University of Washington

Fine Grain Incremental Rescheduling Via Architectural Retiming. Soha Hassoun Tufts University Medford, MA Thanks to: Carl Ebeling University of Washington Seattle, WA. Problem -- Clock period is too large. Example. Write Address. RAM. Read Address. Offset. Pipelining.





Presentation Transcript


  1. Fine Grain Incremental Rescheduling Via Architectural Retiming. Soha Hassoun, Tufts University, Medford, MA. Thanks to: Carl Ebeling, University of Washington, Seattle, WA

  2. Problem: Clock period is too large. Example (figure labels: Write Address, RAM, Read Address, Offset)

  3. Pipelining: Problems with consecutive dependent operations (figure labels: Write Address, RAM, Read Address, Offset)

  4. Performance Bottleneck • Latency-constrained paths (figure label: Latency = n)

  5. Performance Bottleneck • Latency-constrained paths • Approach: apply architectural retiming at the RT level (figure label: Latency = n)

  6. Architectural Retiming. Problem: too much work, too little time (figure label: y_k)

  7. Architectural Retiming. Problem: too much work, too little time (figure labels: y_k, D = pipeline register)

  8. Architectural Retiming. Problem: too much work, too little time (figure labels: y_k, C, D = pipeline register, N = negative register)

  9. Architectural Retiming. Problem: too much work, too little time (figure labels: y_k, C, D = pipeline register, N = negative register; negative register implemented by precomputation or prediction)

  10. Outline • Precomputation • incremental rescheduling without resource constraints • Prediction • incremental rescheduling with resource constraints • Results

  11. Precomputation Function • $D^{t} = C^{t+1}$ (figure labels: y_k, x_i, C, D, g, h, f, N)

  12. Precomputation Function • $D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$ (figure labels: y_k, x_i, C, D, g, h, f, N)

  13. Precomputation Function • $D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$ • $x_i^{t+1} = {x'_i}^{t} = g(\ldots, y_k^{t}, \ldots)$ (figure labels: y_k, x_i, C, D, g, h, f, N)

  14. Precomputation Function • $D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$ • $x_i^{t+1} = {x'_i}^{t} = g(\ldots, y_k^{t}, \ldots)$ • $D^{t} = f(\ldots, g(\ldots, y_k^{t}, \ldots), \ldots) = f'(\ldots, y_k^{t}, \ldots)$ (figure labels: y_k, x_i, C, D, g, h, f, f', N)
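
The substitution on slide 14 can be checked numerically. The following is a minimal sketch, not the authors' tool: the functions `g` and `f`, the 4-bit arithmetic, and the input sequence are all hypothetical stand-ins chosen only to illustrate that the precomputed value $D^{t} = f'(y_k^{t})$ equals the value $C^{t+1}$ that the original circuit produces one cycle later.

```python
# Hypothetical combinational blocks; the slides do not specify g or f.
def g(y):
    return (3 * y + 1) % 16

def f(x):
    return (x * x) % 16

def f_prime(y):
    # Precomputation function: f composed with g, so D^t depends only on y^t.
    return f(g(y))

def simulate_original(ys, x0=0):
    """Original circuit: x^{t+1} = g(y^t) is registered, C^t = f(x^t)."""
    x = x0
    cs = []
    for y in ys:
        cs.append(f(x))  # C^t computed from the registered x^t
        x = g(y)         # register update: x^{t+1} = g(y^t)
    return cs

ys = [2, 7, 5, 11, 0]                # assumed input sequence on y_k
cs = simulate_original(ys)           # C^0 .. C^4
ds = [f_prime(y) for y in ys]        # D^0 .. D^4, each one cycle early
early_by_one_cycle = ds[:-1] == cs[1:]
```

Here `ds[t]` is available in cycle `t`, while the original circuit only produces the same value as `cs[t + 1]`, which is the one-cycle advance that the negative register N represents.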

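  15. Incremental Rescheduling • Original schedule: Time n: g; Time n+1: f, h (figure labels: y_k, g, h, f, N)

  16. Incremental Rescheduling • Original schedule: Time n: g; Time n+1: f, h • New schedule: Time n: f'; Time n+1: h (figure labels: y_k, g, h, f, f', N)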

  17. Precomputing With Register Arrays (figure labels: Write Data, Write Address, Read Address, Read Data)

  18. Precomputing With Register Arrays (figure labels: Write Data, Write Address, Read Address, Out, Read Data, N, F)

  19. Precomputing With Register Arrays • $F^{t} = Out^{t+1}$ (figure labels: Write Data, Write Address, Read Address, Out, Read Data, N, F)

  20. Precomputing With Register Arrays • $F^{t} = Out^{t+1} = Array^{t+1}[\,ReadAddress^{t+1}\,]$ (figure labels: Write Data, Write Address, Read Address, Out, Read Data, N, F)

  21. Synthesizing Bypass Paths (figure: original array with Write Data, Write Address, Read Address, Read Data; transformed array with Write Data, Write Address, Precomputed Read Address, an "=?" address comparator, and Read Data)
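
A bypass path of the kind slide 21 depicts can be sketched as follows. This is an illustrative model under stated assumptions, not the synthesized hardware: it assumes a write issued in cycle t becomes visible in the array in cycle t+1, and that the read address for cycle t+1 is already known in cycle t (the "Precomputed Read Address"). The function names are hypothetical.

```python
def reference_outputs(array, writes, read_addrs):
    """Out^t = Array^t[ReadAddress^t]; the write (WA^t, WD^t) lands at t+1."""
    arr = list(array)
    outs = []
    for (wa, wd), ra in zip(writes, read_addrs):
        outs.append(arr[ra])
        arr[wa] = wd
    return outs

def precomputed_outputs(array, writes, read_addrs):
    """F^t = Out^{t+1}, computed in cycle t from Array^t plus a bypass."""
    arr = list(array)
    fs = []
    for t in range(len(writes) - 1):
        wa, wd = writes[t]
        next_ra = read_addrs[t + 1]  # assumed available one cycle early
        # Comparator "=?": if this cycle's write hits next cycle's read
        # address, forward the write data instead of the stale array entry.
        fs.append(wd if wa == next_ra else arr[next_ra])
        arr[wa] = wd
    return fs

array = [10, 20, 30, 40]
writes = [(1, 99), (2, 77), (0, 55), (3, 11)]  # (Write Address, Write Data)
read_addrs = [0, 1, 3, 0]

outs = reference_outputs(array, writes, read_addrs)
fs = precomputed_outputs(array, writes, read_addrs)
```

In this trace the first and third precomputed reads take the bypass (the write address matches the next read address), and the second reads the array directly; in all cases `fs[t]` matches `outs[t + 1]`.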

  22. Precomputing RAM Output (figure labels: RAM, N)

  23. Prediction • What if: • can’t precompute, • too many additional resources, or • performance is unsatisfactory? (figure labels: Z, C, D, g_i, f, N)

  24. Prediction • What if: • can’t precompute, • too many additional resources, or • performance is unsatisfactory? • Predict C one cycle before its arrival (figure labels: Z, C, D, g_i, f, N)

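  25. Schedule with Mispredictions (figure: schedule over cycles t-1, t, t+1; labels R1, R2, C, H, c1, c2, h1, h2)

  26. Schedule with Mispredictions (figure: schedule over cycles t-1, t, t+1; labels R1, R2, C, H, c1, c2, h1, h2, Negative Register, Verify)

  27. Schedule with Mispredictions (figure: schedule over cycles t-1, t, t+1; labels R1, R2, C, H, c1, Negative Register, Verify)

  28. Schedule with Mispredictions (figure: schedule over cycles t-1, t, t+1; labels R1, R2, C, H, c1, c2, h1, h2, Negative Register, Verify; predicted values c1*, c2* are checked as c1* =? c1 and c2* =? c2)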

  29. Synthesis Issues in Prediction • Negative register as predicting FSM • use signal transition probabilities • incorporate don’t care conditions • Nullifying mispredictions • Two correction strategies • As-Soon-As-Possible restoration • As-Late-As-Possible correction • Add handshaking signals to coordinate with interface
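
The predict, verify, and nullify cycle described on the prediction slides can be modeled at a very coarse level. The sketch below is an assumption-laden illustration, not the authors' predicting FSM: it uses a hypothetical last-value predictor in place of a predictor built from signal transition probabilities, and it assumes each misprediction costs exactly one extra cycle to nullify and redo.

```python
def last_value_predictor(history):
    # Hypothetical predictor: guess that C repeats its previous value.
    return history[-1] if history else 0

def run_with_prediction(actual_values, predict):
    """Return (committed values, cycle count, misprediction count)."""
    history = []
    committed = []
    cycles = 0
    mispredictions = 0
    for c in actual_values:
        guess = predict(history)   # negative register: value one cycle early
        cycles += 1                # speculative cycle using the guess
        if guess != c:             # verify: c* =? c
            mispredictions += 1
            cycles += 1            # assumed penalty: nullify and redo with c
        committed.append(c)        # only verified values are committed
        history.append(c)
    return committed, cycles, mispredictions

actual = [1, 1, 1, 0, 0, 1]
committed, cycles, mispredictions = run_with_prediction(
    actual, last_value_predictor
)
```

Functionality is preserved because only verified values are committed; the prediction accuracy affects only the cycle count.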

  30. Related Work • Precomputation • Bypass Synthesis • lookahead [Kogge ‘81, …..] • Prediction / Speculative Execution • Most likely path, arbitrarily deep [Holtmann & Ernst ‘93,’95] • Pre-execution [Radivojevic & Brewer ‘94] • Possible multiple paths & arbitrarily deep [Lakshminarayana et al. ‘98] • Percolation scheduling [Potasman et al. ‘90]

  31. Results

  32. Architectural Retiming • Improves throughput while preserving functionality and sometimes latency • Bridge gap between HLS and logic optimizations • Unifies several sequential optimizations • bypass synthesis • lookahead transformation • branch prediction • fine-grain cross register optimizations

  33. Ph.D. Forum at DAC ‘99 • Goal • increase interaction between academia and industry • Format • students present work at poster session at DAC • researchers give feedback • Who’s eligible? • Students within 1 or 2 years of finishing Ph.D. thesis www.cs.washington.edu/homes/soha/forum

  34. The End

  35. Precomputing in Single-Register Cycles A B Original Circuit

  36. Precomputing in Single-Register Cycles (figure: original circuit with blocks A and B, plus N)

  37. Precomputing in Single-Register Cycles • Lookahead: A(n) is a function of B(n-2) [Kogge, ‘81], [Parhi & Messerschmitt, ‘89] (figure labels: A, B, A', B', N)
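
The lookahead idea cited on slide 37 is commonly illustrated with a first-order linear recurrence. The example below is a textbook-style sketch and not the circuit from the slides: the recurrence `x(n) = a*x(n-1) + u(n)` and all names are assumptions chosen to show how a value can be rewritten to depend on the state two steps back.

```python
def direct(a, us, x_init):
    """x(n) = a * x(n-1) + u(n), with x(-1) = x_init."""
    xs = []
    x = x_init
    for u in us:
        x = a * x + u
        xs.append(x)
    return xs

def lookahead(a, us, x_init):
    """Same sequence via x(n) = a^2 * x(n-2) + a * u(n-1) + u(n)."""
    xs = []
    for n, u in enumerate(us):
        if n == 0:
            xs.append(a * x_init + u)  # no u(-1) exists; use the direct form
        else:
            x_two_back = x_init if n == 1 else xs[n - 2]
            xs.append(a * a * x_two_back + a * us[n - 1] + u)
    return xs

a, us, x_init = 3, [1, 2, 3, 4], 5
direct_xs = direct(a, us, x_init)
lookahead_xs = lookahead(a, us, x_init)
```

Because each `x(n)` now depends on `x(n-2)` rather than `x(n-1)`, the feedback loop spans two register delays, which is what leaves room for an extra pipeline register in the cycle.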

  38. Precomputing RAM Output (figure labels: RAM, RAM)

  39. Precomputing RAM Output (figure labels: RAM, RAM)

  40. Speculative Execution Scope and Depth (figure labels: c1, c2, c3, c4, c5, c6)

  41. Speculative Execution Scope and Depth
