Physical Design for Reconfigurable Computing Systems using Firm Templates

Department of Electrical & Computer Engineering, Northwestern University. K. Bazargan, R. Kastner, M. Sarrafzadeh.



  1. Physical Design for Reconfigurable Computing Systems using Firm Templates • Department of Electrical & Computer Engineering, Northwestern University • K. Bazargan, R. Kastner, M. Sarrafzadeh

  2. Outline • FPGA: What and why? • What is a Reconfigurable Computing System (RCS)? • Application example • RCS: System components • Online placement: problem definition and our approach • Offline placement and scheduling • Flexible modules and firm templates • Conclusion and future work

  4. The Architecture of a Reconfigurable System [Figure: block diagram connecting the CPU, the RFU, Data Memory, and Instruction Memory (Program); the connections are labeled CPU instructions, RFUOPs, Control, and Data]

  5. Execution of a Sample Program • Code (mapped to a DFG): … x = 3*a - b; (on CPU) c = RFUOP1(x,5); (on RFU) y = 4*x - c; for (i=0;i<3;i++){ x += RFUOP2(y); ++y; } z = RFUOP1(x,3); a = z - y; b = RFUOP3(a,b); c = a - b; … • No room on RFU to run all in parallel ==> run in sequence [Figure: DFG of the code and the RFU floorplan with axes x, y, t; some operations are marked "(in parallel)"]

  7. Application Example: Image Restoration • The value of the center pixel in the next iteration: $x_{k+1} = \beta y + x_k - \beta\,(d ** x_k)$ • $y$: the pixel value from the original degraded image • $x_k$: the pixel value from the previous iteration • $d ** x_k$ denotes the weighted sum $r_1 \cdot \sum(\text{eight neighbor pixels}) + r_0 \cdot \text{center pixel}$ • Weights of the 3 x 3 neighborhood: [r1 r1 r1; r1 r0 r1; r1 r1 r1]
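
The update rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficient in front of y and of the weighted sum is called `beta` here (an assumed name), images are plain lists of lists, and border pixels are copied unchanged because the slide does not say how they are handled.

```python
def restore_step(x, y, beta, r0, r1):
    """One iteration of x_{k+1} = beta*y + x_k - beta*(d ** x_k) on interior pixels.

    x: current estimate, y: degraded image (lists of lists of floats).
    beta, r0, r1: scalar parameters; the name `beta` is an assumption.
    """
    height, width = len(x), len(x[0])
    nxt = [row[:] for row in x]  # assumption: border pixels stay unchanged
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            # Sum of the eight neighbours of pixel (i, j).
            neighbours = sum(
                x[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            weighted = r1 * neighbours + r0 * x[i][j]  # d ** x_k at (i, j)
            nxt[i][j] = beta * y[i][j] + x[i][j] - beta * weighted
    return nxt
```

Each output pixel depends only on a 3 x 3 window of the previous iterate, which is what makes the computation easy to replicate across hardware units.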

  8. Image Restoration (cont.) • Incentive: • Processing of large images using FPGAs with limited resources • Strategy: • Segmentation of the image into smaller images suitable for the FPGA • Segments of size m x n are surrounded by an overlap of o. [Figure: segment with dimensions m, n and overlap o]

  9. Image Restoration: Data Flow Strategy • Pixels of individual segments are restored in parallel by hardware. • Restored segments are written back after the overlap is discarded. [Figure: segments with dimensions m, n and overlap o moving between MEMORY and the RFU]
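
A minimal sketch of the segmentation step, assuming m counts rows and n counts columns (the slides do not say which is which) and that the overlap is clipped at the image border. The function name `segments` and the half-open box layout are illustrative choices, not taken from the presentation.

```python
def segments(height, width, m, n, o):
    """Yield (core, padded) boxes as (row0, row1, col0, col1), half-open.

    core   : the m x n segment whose restored pixels are written back
    padded : the core grown by the overlap o, clipped to the image
    """
    for r in range(0, height, m):
        for c in range(0, width, n):
            core = (r, min(r + m, height), c, min(c + n, width))
            padded = (
                max(r - o, 0), min(r + m + o, height),
                max(c - o, 0), min(c + n + o, width),
            )
            yield core, padded
```

The padded box is what would be sent to the RFU; after restoration only the core region is written back, so the overlap is discarded.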

  10. Image Restoration Example [Figure: degraded image and restored image]

  12. System Components [Figure: block diagram. Block labels: CPU, Data Memory, Instruction Mem. (Prog.), Program Manager, Configuration Memory, Cache Manager, Prefetch/Branch Prediction Unit, Placement Engine, RFU Manager, RFU. Connection labels: CPU instructions, RFUOPs, Config. Bits, Control, Data]

  14. Online Placement: Problem Definition • Input: • RFU dimensions (W, H) • List of RFUOP events: (w, h, arrival, departure) • Output: • For each module, either • Rejected (not able to place) [penalty?] • Accepted: (x,y) [Figure labels: arrival, departure, accepted, rejected]

  15. Online Placement • [Figure: current placement + new module to be inserted = ?] • When a new RFUOP arrives, • Is there enough room? • If yes, which location is best? • Previous work • Bin-packing heuristics (1-D) - $O(n^2)$ • First Fit, Best Fit, Shelf, Look ahead, … • [Chazelle’83] The Bottom-Left heuristic. $O(n^2)$ • [Healy-Creavin’97] $O(n^2 \lg n)$

  16. Our Online Placement • Our approach: • Divide the empty space into explicit “empty rectangles” • When a new RFUOP arrives • Is there enough room? (any ER large enough?) • If yes, which location is best? (which ER is best?)  • Packing rule • Best Fit, Bottom Left, First Fit 

  17. Heuristics for Choosing an Empty Rectangle [Figure: current placement with empty rectangles A and B and points P1 and P2, plus the new module to be inserted] • FF (First Fit): Either A or B could be chosen for placing the new module. • BL (Bottom Left): Chooses the empty rectangle that lies closest to the bottom left. y(P2) < y(P1) => choose B • BF (Best Fit): Places the new module in the empty rectangle that wastes the least space. Area(A) < Area(B) => choose A
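
The three packing rules can be sketched as a single selection function over a list of empty rectangles. The `Rect` type, the function names, and the tie-break on x for Bottom Left are assumptions made for this illustration; the slides only state that BL compares y coordinates and that BF minimizes wasted space.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    x: int  # lower-left corner
    y: int
    w: int
    h: int


def choose(empty_rects: Sequence[Rect], w: int, h: int, rule: str) -> Optional[Rect]:
    """Pick an empty rectangle for a w x h module, or None if it must be rejected."""
    candidates = [er for er in empty_rects if er.w >= w and er.h >= h]
    if not candidates:
        return None  # no empty rectangle is large enough
    if rule == "FF":  # First Fit: first rectangle that is large enough
        return candidates[0]
    if rule == "BL":  # Bottom Left: lowest y, then lowest x (tie-break assumed)
        return min(candidates, key=lambda er: (er.y, er.x))
    if rule == "BF":  # Best Fit: least wasted area
        return min(candidates, key=lambda er: er.w * er.h - w * h)
    raise ValueError(f"unknown rule: {rule}")
```

A linear scan is used here for brevity; the presentation instead stores the empty rectangles in a range tree to speed up the "is any rectangle large enough?" query.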

  18. Our Online Placement • Our approach: • Divide the empty space into explicit “empty rectangles” • When a new RFUOP arrives • Is there enough room? (any ER large enough?) • If yes, which location is best? (which ER is best?) • Managing the empty space • Keep empty rectangles explicitly, use “range tree” to store/access empty rects. • Efficient use of RFU real estate • KAMER: Keep all $O(n^2)$ maximal empty rectangles

  19. Keeping All Empty Rectangles

  20. Our Online Placement • Our approach: • Divide the empty space into explicit “empty rectangles” • When a new RFUOP arrives • Is there enough room? (any ER large enough?) • If yes, which location is best? (which ER is best?) • Managing the empty space • Keep empty rectangles explicitly, use “range tree” to store/access empty rects. • Efficient use of RFU real estate • KAMER: Keep all $O(n^2)$ maximal empty rectangles • Fast but sub-optimal • Keep only $O(n)$ empty rectangles • Shorter Seg. (SSEG), Square Empty Rects. (SQR), ...

  21. Keeping O(n) Empty Rectangles - SSEG

  22. Heuristics for Choosing a Segment [Figure: the two candidate segments S1 and S2 and the empty rectangles A, B, C, D they create] • SSEG (Shorter Seg): Chooses the shorter of the two segments. S1 < S2 • BER (Balanced Empty Rects): Chooses the segment which creates less area difference. Area(B) - Area(A) > Area(D) - Area(C) • LSQR (Larger Rect Square): Chooses the segment which creates the larger rectangle closer to square. AspectRatio(B) > AspectRatio(D) • SQR (Square Rects): Chooses the segment which creates empty rectangles closer to squares. Max{AR(A),AR(B)} < Max{AR(C),AR(D)}, AR = AspectRatio • LSEG (Longer Seg): Chooses the longer of the two segments. S1 < S2 • LER (Large Empty Rects): Chooses the segment which creates the larger empty rectangle. Area(B) > Area(D)
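
A rough sketch of the segment choice for SSEG and LSEG only. The geometry is an assumption, since the figure is not recoverable from the transcript: the module is placed in the bottom-left corner of an empty rectangle, and the remaining L-shaped space is cut either by a vertical segment above the module's right edge or by a horizontal segment to the right of the module's top edge. The function name and tuple layout are illustrative.

```python
def split_leftover(er, w, h, rule="SSEG"):
    """Split the space left in empty rectangle `er` after placing a w x h module.

    er is (x, y, W, H) with (x, y) its lower-left corner. Returns the list of
    non-degenerate empty rectangles, each as (x, y, width, height).
    """
    x, y, W, H = er
    vertical_len = H - h    # segment running up from the module's top-right corner
    horizontal_len = W - w  # segment running right from the module's top-right corner
    vertical_cut = [(x + w, y, W - w, H), (x, y + h, w, H - h)]
    horizontal_cut = [(x, y + h, W, H - h), (x + w, y, W - w, h)]
    if rule == "SSEG":    # keep the shorter segment
        use_vertical = vertical_len <= horizontal_len
    elif rule == "LSEG":  # keep the longer segment
        use_vertical = vertical_len >= horizontal_len
    else:
        raise ValueError(f"unknown rule: {rule}")
    rects = vertical_cut if use_vertical else horizontal_cut
    return [r for r in rects if r[2] > 0 and r[3] > 0]
```

Each placement replaces one empty rectangle with at most two new ones, which is why this family of rules keeps only a linear number of empty rectangles.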

  23. How Good is a Placement? • Acceptance rate • percentage of modules accepted (placed) • Volume penalty • Area: related to complexity • Time-span in the system: related to loop iterations • Penalty of rejecting a module: penalty = volume = area * time • Input data • Randomly generated dimensions • Randomly generated enter/leave time
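
The two quality measures can be computed as follows. This is a small sketch that assumes each module is an event tuple (w, h, arrival, departure) as in the problem definition and that the placer's decisions are available as a parallel list of booleans; the function name `evaluate` is hypothetical.

```python
def evaluate(modules, accepted):
    """Return (acceptance rate in percent, total volume penalty).

    modules  : list of (w, h, arrival, departure)
    accepted : list of bools, True if the module was placed on the RFU
    """
    def volume(module):
        w, h, arrival, departure = module
        return w * h * (departure - arrival)  # area * time

    rate = 100.0 * sum(accepted) / len(modules)
    penalty = sum(volume(m) for m, ok in zip(modules, accepted) if not ok)
    return rate, penalty
```

For example, rejecting a single 4 x 4 module that stays in the system for 2 time units contributes a penalty of 32.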

  24. Program snapshot

  25. Online Placement Results Percentage of accepted modules using different bin-packing and empty space partitioning rules

  26. Online Placement Results (cont.)

  27. Online Placement Results (cont.)

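  29. 3-D Floorplanning [Figure: DFG, schedule on CPU and RFU, and the 3-D floorplan of the RFU with axes x, y (RFU area) and t (time)]

  30. 3-D Floorplanning • By deleting this RFUOP (CPU performs the operation)... [Figure: DFG, schedule on CPU and RFU, and the 3-D floorplan with axes x, y, t]

  31. 3-D Floorplanning • This RFUOP can be moved on the RFU [Figure: DFG, schedule on CPU and RFU, and the 3-D floorplan with axes x, y, t]

  32. 3-D Floorplanning • These RFUOPs can be performed earlier... [Figure: DFG, schedule on CPU and RFU, and the 3-D floorplan with axes x, y, t]

  33. 3-D Floorplanning [Figure: DFG, schedule on CPU and RFU, and the 3-D floorplan with axes x, y, t]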

  34. Our Current 3-D Floorplanners • No change in the schedule • Fixed insertion and deletions of RFUOPs • Annealing based. • Move set • Move operation from CPU set to RFU set • Move operation from RFU set to CPU set • Displace an already placed RFUOP on the RFU • Cost function • Penalty in rejecting modules (sum of volumes of the RFUOPs in the CPU set) • No overlap allowed during annealing • Greedy • Sort the modules on decreasing vol., apply KAMER

  35. Our Current 3-D Floorplanners (cont.) • KAMER-BF-Decreasing • Sort the modules on their volumes • Use KAMER to find a fast placement of the modules • Low-temp. annealing (LTSA) • Similar to KAMER-BFD, but use KAMER to place only the X% largest modules • Use low-temp annealing to place the rest • Zero-temp. annealing (ZTSA) -- Greedy • Use KAMER to place as many modules as you can • Use only displace and move from CPU to RFU annealing moves.

  36. Our Current 3-D Floorplanners (cont.) • BFOP - Best Fit Online Placement • Sort the RFUOPs on volume (decreasing) • For each RFUOP, find candidate “corners” • Choose the corner which results in min wasted area (similar to the well-studied 2-D Bin Packing problem) [Figure: 3-D floorplan with axes x, y, t, showing the floor corresponding to time t1, a module A, and the candidate corners]

  37. Annealing-Based Offline vs. Online Percentage of accepted modules and penalties using two offline parameters. The higher the RFU acceptance rate and lower the penalty, the better the algorithm.

  38. Offline Placement Results - All

  40. Flexible Modules • Library of soft templates • Flexible shapes • Constant area, different width, height • Problem? Hard to build (PD should be done for each shape) • Median • Use the same area, but square shape • Rotation • Placement method • Use best shape (min wasted area)
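
One way to picture the "use best shape" rule is to enumerate a few candidate shapes and keep the one that wastes the least area. This sketch assumes integer dimensions, a library limited to the original shape, its rotation, and the median square, and a square side rounded up so the module still fits (so the area is only approximately preserved). All names are illustrative.

```python
import math


def candidate_shapes(w, h):
    """Original shape, its rotation, and the median (square) shape."""
    side = math.isqrt(w * h)
    if side * side < w * h:
        side += 1  # round up so the square can hold the module's area
    shapes = [(w, h), (h, w), (side, side)]
    return list(dict.fromkeys(shapes))  # drop duplicates, keep order


def best_shape(w, h, empty_rects):
    """Return (waste, shape, empty_rect) with minimum wasted area, or None.

    empty_rects is a list of (width, height) pairs.
    """
    best = None
    for sw, sh in candidate_shapes(w, h):
        for ew, eh in empty_rects:
            if sw <= ew and sh <= eh:
                waste = ew * eh - sw * sh
                if best is None or waste < best[0]:
                    best = (waste, (sw, sh), (ew, eh))
    return best
```

A real soft-template library would supply its own list of pre-built shapes in place of `candidate_shapes`.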

  41. Using Flexible Modules in BFOP Median uses a square module with the same area

  42. Flexible Modules (cont.) • “Firm” templates • Slice the module into x horizontal or vertical strips • If the module cannot be placed, try the 2-split, 3-split, … until it fits. • Problem? • Routing! • Only limited module types can be split (like carry chains, etc. with min communication between stages) [Figure: vertical 3-split]
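
The splitting loop can be sketched as follows, assuming integer module dimensions and strips of near-equal size. The `can_place` callback is hypothetical and stands in for whatever placement test is used; the default `max_k=6` is likewise an assumption, chosen only because the results discussed in the talk go up to a 6-split.

```python
def split(w, h, k, vertical=True):
    """Slice a w x h module into k strips whose sizes differ by at most one unit."""
    total = w if vertical else h
    base, extra = divmod(total, k)
    sizes = [base + (1 if i < extra else 0) for i in range(k)]
    # Vertical strips divide the width; horizontal strips divide the height.
    return [(s, h) if vertical else (w, s) for s in sizes if s > 0]


def first_fitting_split(w, h, can_place, max_k=6):
    """Try the unsplit module, then the 2-split, 3-split, ... until one fits.

    can_place(strips) -> bool is supplied by the caller (hypothetical hook).
    Returns (k, strips) for the first success, or None if nothing fits.
    """
    for k in range(1, max_k + 1):
        for vertical in (True, False):
            strips = split(w, h, k, vertical)
            if can_place(strips):
                return k, strips
    return None
```

The sketch ignores the routing cost between strips, which the slide names as the main drawback of splitting.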

  43. Quality Improvements Using Firm Templates

  45. Conclusion • Which online algorithm? • If speed is an issue, SSEG; otherwise KAMER • Online or offline? • If you have the schedule => offline • Which offline algorithm? • BFOP is the best (faster + better quality) • Median? Flexibility? Firm templates? • Surprisingly, median gives little improvement • If flexible shapes are available, they are better than splitting (no additional routing problem) • How many splits? • no-split → 2-split: 23% improvement • 5-split → 6-split: 3% improvement
