
CS137: Electronic Design Automation

CS137: Electronic Design Automation. Day 13: May 20, 2002 Page Generation (Area and IO Constraints). [working problem with Eylon Caspi]. Today. Cover/clustering Minimize Weight W/ area and IO constraints Motivation: SCORE Page generation Also energy minimization Techniques


Presentation Transcript


  1. CS137: Electronic Design Automation • Day 13: May 20, 2002 • Page Generation (Area and IO Constraints) • [working problem with Eylon Caspi]

  2. Today • Cover/clustering • Minimize Weight • W/ area and IO constraints • Motivation: SCORE Page generation • Also energy minimization • Techniques • Current Results • FPGA/hardware implementation?

  3. Abstract Problem • Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges. • Cluster nodes into subsets Vi, such that • Σ Cost(Vi) minimized • IO(Vi) < IO limit • A(Vi) < Area limit • Cost(Vi) = Σ (cost(e) | e ∈ E s.t. e1 ∈ Vi and e2 ∉ Vi)
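
A minimal Python sketch of how a candidate clustering could be scored against this formulation. It reads Cost(Vi) as the total cost of edges that start in Vi and end outside it, which is an assumption about the slide's formula. The `(u, v, io, cost)` edge layout and the function names are hypothetical, and charging a cut edge's IO to both clusters it touches is also an assumption.

```python
from collections import defaultdict

def cluster_metrics(area, edges, assignment):
    """Return {cluster: (area, io, cost)} for a node -> cluster assignment.

    area: list of node areas. edges: (u, v, io, cost) tuples (hypothetical layout).
    A cut edge adds its IO to both clusters it touches (assumption) and its
    cost to the cluster holding its source node u.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])
    for node, a in enumerate(area):
        totals[assignment[node]][0] += a
    for u, v, io, cost in edges:
        cu, cv = assignment[u], assignment[v]
        if cu != cv:                      # edge is cut by the clustering
            totals[cu][1] += io
            totals[cv][1] += io
            totals[cu][2] += cost
    return {c: tuple(t) for c, t in totals.items()}

def is_feasible(metrics, area_limit, io_limit):
    """Check the strict limits IO(Vi) < IO limit and A(Vi) < Area limit."""
    return all(a < area_limit and io < io_limit for a, io, _ in metrics.values())

def total_cost(metrics):
    """Objective: sum of Cost(Vi) over all clusters."""
    return sum(cost for _, _, cost in metrics.values())

# Four states in a ring, split into two clusters of two.
area = [2, 3, 1, 2]
edges = [(0, 1, 1, 0.9), (1, 2, 1, 0.1), (2, 3, 1, 0.8), (3, 0, 1, 0.2)]
metrics = cluster_metrics(area, edges, [0, 0, 1, 1])
```

With these toy numbers the two high-cost edges stay inside a cluster and only the 0.1 and 0.2 edges are cut.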

  4. SCORE Compilation • Programming Model: graph of TDF FSMD operators • unlimited size, # IOs • no timing constraints • Execution Model: graph of page configs • fixed size, # IOs • timed, single-cycle firing • [Figure: TDF operators, streams, and memory segments compiled to compute pages, streams, and memory segments]

  5. How Big is an Operator? • [Chart labels: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR, Wavelet Decode, Wavelet Encode, JPEG Encode, MPEG Encode]

  6. Clustering is Critical • Inter-page comm. latency may be long • Inter-page feedback loops are slow • Cluster to: • Fit feedback loops within page • Fit feedback loops on device

  7. Pipeline Extraction • Hoist uncontrolled FF data-flow out of FSMD • Benefits: • Shrink FSM cyclic core • Extracted pipeline has more freedom for scheduling and partitioning • Example, before extraction: state foo(i): acc=acc+2*i • After extraction: a pipeline computes two_i = i*2 and feeds state foo(two_i): acc=acc+two_i • [Figure labels: DF, CF, pipeline, state]

  8. Pipeline Extraction – Extractable Area • [Chart labels: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

  9. Page Generation • Pipeline extraction • removes dataflow that can be freely extracted from FSMD control • Still have to partition potentially large FSMs • approach: turn into a clustering problem

  10. State Clustering • Start: consider each state to be a unit • Cluster states into page-size sub-FSMDs • Inter-page transitions become streams • Possible clustering goals: • Minimize delay (inter-page latency) • Minimize IO (inter-page BW) • Minimize area (fragmentation) • [Figure labels: IA, IB, OA, OB]

  11. State Clustering to Minimize Inter-Page State Transfer • Inter-page state transfer is slow • Cluster to: • Contain feedback loops • Minimize frequency of inter-page state transfer • Previously used in: • VLIW trace scheduling [Fisher ’81] • FSM decomposition for low power [Benini/DeMicheli ISCAS ’98] • VM/cache code placement • GarpCC code selection [Callahan ’00]

  12. Clustering Problem • SCORE Page • Fixed area (# of LUTs) • Fixed IO • Cost on each edge is the probability of taking that state transition • Clustering goal is to minimize page-to-page transitions • Maximize expected transitions within same page • Find page-count/page-transition tradeoff curve

  13. Abstract Problem (clusters = pages; cost = inter-page communication frequency) • Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges. • Cluster nodes into subsets Vi, such that • Σ Cost(Vi) minimized • IO(Vi) < IO limit • A(Vi) < Area limit • Cost(Vi) = Σ (cost(e) | e ∈ E s.t. e1 ∈ Vi and e2 ∉ Vi)

  14. DSM • Possibly relevant for minimizing delay in DSM • Previously discussed: • Larger area → longer wires, slower • Want to cluster logic locally • Maybe: • Cluster common computations together • Make distant computation transfer uncommon

  15. Island Packing for Energy • Note: Modern FPGAs pack cluster of LUTs into an endpoint • e.g. Altera LAB

  16. Island Packing for Energy • Modern FPGAs pack cluster of LUTs into an endpoint • e.g. Altera LAB • Local wiring less energy cost than long wiring • Covering for energy: • minimize exposed activity factor • same covering problem

  17. Abstract Problem (clusters = islands; cost = switching activity) • Given: Graph (V,E) with a single weight (area) on each node and two weights (IO, cost) on the edges. • Cluster nodes into subsets Vi, such that • Σ Cost(Vi) minimized • IO(Vi) < IO limit • A(Vi) < Area limit • Cost(Vi) = Σ (cost(e) | e ∈ E s.t. e1 ∈ Vi and e2 ∉ Vi)

  18. First Try • Use FBB (flow cut) [Wong/cs137a:day7] • Pick seed element • Compute mincut • On mix of IO, cost edge weights? • If too small, • Cluster in node and repeat • Else • Cluster out node and repeat

  19. Mincut lessons • Couldn’t consistently control IO • Non-monotonic results adjusting weight • Not clear what to cluster in

  20. Idea #2 • If we had an ordering of nodes • (wishful thinking) • Then easy to know how to include more • Just pick the next node • Order: 1D list of nodes • Cluster: a contiguous sequence of nodes in list • Specify start, finish

  21. From Sequence to Clusters • Easy to check whether a contiguous subsequence • Meets area constraints • Meets IO constraints • Cover • Set of (non-overlapping) subsequences • Include all nodes
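
A hedged Python sketch of the feasibility check for contiguous subsequences of a given order. The `(u, v, io, cost)` edge layout, the function name, and the rule that an edge counts toward IO when exactly one endpoint lies inside the slice are assumptions for illustration. Areas are assumed non-negative, so the inner loop can stop as soon as the area limit is hit, which keeps the start/end table sparse.

```python
def subsequence_table(order, area, edges, area_limit, io_limit):
    """Map (start, end) -> (area, io, cost) for each feasible slice order[start:end]."""
    pos = {node: i for i, node in enumerate(order)}
    n = len(order)
    table = {}
    for start in range(n):
        total_area = 0
        for end in range(start + 1, n + 1):
            total_area += area[order[end - 1]]
            if total_area >= area_limit:
                break                      # area only grows as the slice extends
            io = 0
            cost = 0.0
            for u, v, e_io, e_cost in edges:
                u_in = start <= pos[u] < end
                v_in = start <= pos[v] < end
                if u_in != v_in:           # edge crosses the slice boundary
                    io += e_io
                    if u_in:               # charge cost to the source side
                        cost += e_cost
            if io < io_limit:              # IO is not monotone, so keep scanning
                table[(start, end)] = (total_area, io, cost)
    return table

# Toy ring of four states, already in order 0, 1, 2, 3.
area = [2, 3, 1, 2]
edges = [(0, 1, 1, 0.9), (1, 2, 1, 0.1), (2, 3, 1, 0.8), (3, 0, 1, 0.2)]
table = subsequence_table([0, 1, 2, 3], area, edges, area_limit=6, io_limit=3)
```

Each key of `table` is one feasible cluster; a cover is any set of keys that tiles positions 0..n-1 without overlap.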

  22. Feasible Clusters (mult16a)

  23. Covering • Not clear when to put more or less stuff in a cluster versus leaving it for the next cluster • Can’t build clusters greedily • Like the associative/parenthesization problem we saw earlier [day 5]

  24. Day 5 Parenthesis Matching • Similar • But compute from all breaks across a diagonal • Not just nearest neighbor • Hence extra O(N)

  25. Dynamic Programming • For each subsequence start,end • Either the area and IO constraints are met (single cluster) • OR want to find a breakpoint between cluster sets • Cluster sets start→midpoint, midpoint→end may each either be single or multiple clusters • Different splits may • Minimize number of clusters • Minimize cost • Keep dominator set [day11]
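
One way the dynamic program could look in Python, as a sketch under stated assumptions rather than the course's implementation. The input `single` is a hypothetical dictionary from (start, end) to the cost of a feasible one-cluster slice, and the dominator set is taken to mean the non-dominated (cluster count, total cost) pairs. The recursion tries every breakpoint of every subsequence, mirroring the parenthesization form.

```python
from functools import lru_cache

def pareto(points):
    """Keep only non-dominated (clusters, cost) pairs, sorted by cluster count."""
    front = []
    for k, c in sorted(set(points)):
        if not front or c < front[-1][1]:   # more clusters must buy lower cost
            front.append((k, c))
    return front

def cover(n, single):
    """Return the Pareto set of (cluster count, total cost) covering positions 0..n-1.

    single: {(start, end): cost} for slices that fit in one cluster (hypothetical input).
    An empty result means no cover exists.
    """
    @lru_cache(maxsize=None)
    def solve(start, end):
        options = []
        if (start, end) in single:          # whole subsequence fits in one cluster
            options.append((1, single[(start, end)]))
        for mid in range(start + 1, end):   # otherwise combine two covered halves
            for k1, c1 in solve(start, mid):
                for k2, c2 in solve(mid, end):
                    options.append((k1 + k2, c1 + c2))
        return tuple(pareto(options))
    return list(solve(0, n))

# Feasible slices and cut costs for a toy four-node order.
single = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.1, (1, 3): 0.8,
          (2, 3): 0.8, (2, 4): 0.2, (3, 4): 0.2}
front = cover(4, single)
```

For this toy input the two-cluster cover (0,2)+(2,4) dominates every three- and four-cluster cover, so the front holds a single point.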

  26. Algorithm • Compute Linear Order • Compute IO, Area on each subsequence • Think NxN table (but sparse) • Use Dynamic Programming to cover

  27. Compute Order? • Could experiment with various techniques • Considering: Spectral Ordering • [Hall/cs137a:day7] • How to weight edges? • IO, cost, mix? • Try linear mix…vary mix weighting
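
A small Python sketch of a spectral ordering: sort nodes by their entry in the eigenvector for the second-smallest eigenvalue of the graph Laplacian. To stay dependency-free it uses power iteration on (c*I - L) with the constant vector projected out, which is a stand-in chosen for this sketch; a real flow would more likely call a sparse eigensolver. The function name and `(u, v, w)` edge layout are hypothetical, and the graph is assumed connected with positive weights and at least two nodes.

```python
import math

def spectral_order(n, weighted_edges, iters=500):
    """Order nodes 0..n-1 by an approximate second Laplacian eigenvector.

    weighted_edges: (u, v, w) tuples, treated as undirected with w > 0.
    """
    degree = [0.0] * n
    for u, v, w in weighted_edges:
        degree[u] += w
        degree[v] += w
    c = 2.0 * max(degree)                   # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]        # deterministic, non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]         # project out the constant eigenvector
        y = [(c - degree[i]) * x[i] for i in range(n)]   # (c*I - D) x
        for u, v, w in weighted_edges:      # plus A x, so y = (c*I - L) x
            y[u] += w * x[v]
            y[v] += w * x[u]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return sorted(range(n), key=lambda i: x[i])

# A path 2 - 0 - 3 - 1 with scrambled labels; the order should follow the path.
order = spectral_order(4, [(2, 0, 1.0), (0, 3, 1.0), (3, 1, 1.0)])
```

The sign of the eigenvector is arbitrary, so the order may come out reversed; for clustering into contiguous subsequences that makes no difference.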

  28. Weight Mix • Why unclear? • IO weight → good to cluster connectivity • If IOs limited, allows us to use fewer clusters • Pack more stuff into page → fewer cases need to transition • Cost weight → what we’re minimizing • Cluster high cost edges together • Hide in page • But, cost ordering may get less stuff in page if poorly IO clustered…
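
A short Python sketch of a linear weight mix that could feed the ordering step. The parameter `w`, the convention that `w` scales the IO weight and `1 - w` scales the cost weight, and the `(u, v, io, cost)` edge layout are all assumptions; the slides do not state which convention was used.

```python
def mix_edges(edges, w):
    """Collapse (u, v, io, cost) edges to (u, v, weight), weight = w*io + (1-w)*cost."""
    return [(u, v, w * io + (1.0 - w) * cost) for u, v, io, cost in edges]

def sweep_weights(step):
    """Yield mix values 0, step, 2*step, ... up to 1 (integer stepping avoids drift)."""
    count = round(1.0 / step)
    for i in range(count + 1):
        yield i / count

edges = [(0, 1, 2, 0.5), (1, 2, 1, 0.25)]
cost_only = mix_edges(edges, 0.0)   # order driven purely by transition cost
io_only = mix_edges(edges, 1.0)     # order driven purely by IO width
mixes = list(sweep_weights(0.25))
```

Each value from `sweep_weights` would give one ordering and one cover, so the sweep traces how the result moves between IO-driven and cost-driven clustering.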

  29. spp results • [see HTML]

  30. Versus Weighting (w by 0.01)

  31. Discussion • Promising Results • New capability: not clear what to compare to • Maybe LUT clustering to validate algorithm • Absolutes look promising • Weighting • Not clear how to search for best • Maybe should try other ways of weighting? • [Michael suggests trying log(trans)]

  32. Spatial/Hdw Implementation? • Compute Linear Order • Use 1D FDSA? • Compute IO, Area on each subsequence • Parallel prefix sum scan • One for each start point? • Use Dynamic Programming to cover • Like parenthesis • Maybe 1D and combine with area/io scan?
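
A Python sketch of the prefix-sum idea for the area computation, simulating a log-step data-parallel scan in software. The Hillis-Steele form is chosen here for illustration; the slides do not name a particular scan. With one inclusive scan of the ordered node areas, the area of any subsequence is a difference of two prefix values. IO does not decompose this simply, since an edge stops counting once both endpoints are inside, which may be why the slide asks about one scan per start point.

```python
def parallel_prefix_sum(values):
    """Inclusive scan in about log2(n) data-parallel steps (Hillis-Steele form)."""
    x = list(values)
    shift = 1
    while shift < len(x):
        # In hardware every position updates at once; building a new list models that.
        x = [x[i] + x[i - shift] if i >= shift else x[i] for i in range(len(x))]
        shift *= 2
    return x

def slice_area(prefix, start, end):
    """Area of positions start..end-1, given an inclusive prefix array."""
    return prefix[end - 1] - (prefix[start - 1] if start > 0 else 0)

prefix = parallel_prefix_sum([2, 3, 1, 2])
```

Here `slice_area(prefix, 1, 3)` gives the area of the middle two nodes without rescanning them.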

  33. Promising Ideas • Compute good ordering • Easy to vary inclusion when we know what’s next to include/exclude • Mix weights • Cluster to minimize exposed (cut) costs
