
High Level Synthesis



  1. High Level Synthesis CSE 237D: Spring 2008 Topic #6 Professor Ryan Kastner

  2. Ant System Optimization: Overview • Ants work cooperatively on the graph • Each creates a feasible solution • Ants leave pheromones on their trails • Ants make decisions partially based on the amount of pheromone • Global optimizations • Evaporation: pheromones dissipate over time • Reinforcement: update pheromones from good solutions • Quickly converges to good solutions

  3. Solving Design Problems using AS • Problem model • Define the solution space: create decision variables • Pheromone model • Global heuristic: Provides history of search space traversal • Ant search strategy • Local heuristic: Deterministic strategy for individual ant decision making • Solution construction • Probabilistically derive solution from local and global heuristics • Feedback • Evaluate solution quality, Reinforce good solutions (pheromones), Slightly evaporate all decisions (weakens poor solutions)
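
The five steps above can be sketched as a minimal Ant System loop on a toy problem (picking the cheapest of several options). All names (`ant_system`, `rho`, `tau`) and parameter values are illustrative, not from the slides:

```python
import random

def ant_system(costs, num_ants=5, iters=50, rho=0.1, seed=1):
    rng = random.Random(seed)
    tau = [1.0] * len(costs)                  # pheromone model: one trail per option
    best_i, best_cost = 0, float("inf")
    for _ in range(iters):
        for _ in range(num_ants):
            # solution construction: probabilistic, guided by the pheromones
            i = rng.choices(range(len(costs)), weights=tau)[0]
            if costs[i] < best_cost:          # feedback: evaluate solution quality
                best_i, best_cost = i, costs[i]
        tau = [(1.0 - rho) * t for t in tau]  # evaporation: weakens all decisions
        tau[best_i] += 1.0 / best_cost        # reinforcement: reward the best decision
    return best_i
```

With enough ants and iterations the pheromone on the cheapest option accumulates and the search converges toward it.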

  4. Autocatalytic Effect

  5. Max-Min Ant System (MMAS) Scheduling • Problem: Some pheromones can overpower others, leading to local minima (premature convergence) • Solution: Bound the strength of the pheromones within [τmin, τmax] • If τmin > 0, there is always a chance to make any decision • If τmin = τmax, the decision is based solely on local heuristics, i.e. no past information is taken into account
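
A minimal sketch of the bounding idea, assuming the bounds are applied by simply clipping each trail after every update (`clip_pheromones` and its parameter names are hypothetical):

```python
def clip_pheromones(tau, tau_min, tau_max):
    # tau_min > 0 keeps every decision reachable; tau_min == tau_max makes the
    # pheromones uniform, so only the local heuristic influences decisions.
    return [min(max(t, tau_min), tau_max) for t in tau]

clip_pheromones([0.01, 5.0, 0.8], tau_min=0.1, tau_max=2.0)  # -> [0.1, 2.0, 0.8]
```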

  6. MMAS RCS Formulation • Idea: Combine ACO and List Scheduling • Ants determine the priority list • List scheduling framework evaluates the “goodness” of the list • Global heuristic → permutation index • Local heuristic – can use different properties: • Instruction mobility (IM) • Instruction depth (ID) • Latency weighted instruction depth (LWID) • Successor number (SN)

  7. RCS: List Scheduling • A simple scheduling algorithm based on a greedy strategy • List scheduling algorithm: • Construct a priority list based on some metric (operation mobility, number of successors, etc.) • While not all operations are scheduled: • For each available resource, select an operation from the ready list in descending priority • Assign these operations to the current clock cycle • Update the ready list • Clock cycle ++ • Solution quality depends on the benchmark and the particular metric
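
The algorithm above can be sketched as follows, assuming unit-latency operations and a single homogeneous resource type (`list_schedule` and its argument names are illustrative):

```python
def list_schedule(preds, priority, num_resources):
    """Unit-latency RCS list scheduling.

    preds: {op: set of predecessor ops}; priority: ops, highest priority first.
    Returns {op: start cycle}; the schedule latency is max(start values) + 1.
    """
    start = {}
    cycle = 0
    while len(start) < len(preds):
        # ready list: unscheduled ops whose predecessors all finished earlier
        ready = [op for op in priority if op not in start
                 and all(p in start and start[p] < cycle for p in preds[op])]
        for op in ready[:num_resources]:   # greedily fill the available resources
            start[op] = cycle
        cycle += 1
    return start
```

For example, with ops `a`, `b` independent and `c` depending on both, two resources schedule `a` and `b` in cycle 0 and `c` in cycle 1.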

  8. MMAS RCS: Global and Local Heuristics • Global heuristic: pheromone τij – the favorableness of selecting operation i for position j, stored in a global pheromone matrix • Local heuristic: local metrics such as instruction mobility, number of successors, etc. • Local decision making: a probabilistic decision • Evaporate pheromones and reinforce good solutions

  9. Pheromone Model for Instruction Scheduling • Each instruction opi ∈ I is associated with n pheromone trails τij, where j = 1, …, n; each indicates the favorableness of assigning instruction i to position j • Each instruction also has a dynamic local heuristic • (Figure: instructions op1–op6 mapped to priority-list positions 1–6)
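
A sketch of this pheromone model: one trail τij per (instruction, position) pair, initialised uniformly (the function name and initial value are assumptions):

```python
def init_pheromones(n, tau0=1.0):
    # tau[i][j]: favorableness of assigning instruction i to list position j
    return [[tau0] * n for _ in range(n)]
```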

  10. Ant Search Strategy • Each run has multiple iterations • In each iteration, multiple ants independently create their own priority lists • Fill one instruction at a time • (Figure: ants filling priority-list positions 1–6 with instructions op1–op6)

  11. Ant Search Strategy • Each ant has memory of the instructions already selected • At step j, the ant has already selected j−1 instructions • The jth instruction is selected probabilistically • (Figure: a partially filled priority list)

  12. Ant Search Strategy • τij(k): global heuristic (pheromone) for selecting instruction i at position j • ηj(k): local heuristic – can use different properties: • Instruction mobility (IM) • Instruction depth (ID) • Latency weighted instruction depth (LWID) • Successor number (SN) • α, β control the influence of the global and local heuristics
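
The resulting selection rule can be sketched as a weighted random choice with weight τij^α · η^β per candidate, assuming `eta[i]` holds the local-heuristic score of candidate instruction i (all names are illustrative):

```python
import random

def select_instruction(candidates, tau, eta, j, alpha, beta, rng):
    # weight of candidate i for list position j: tau[i][j]**alpha * eta[i]**beta
    weights = [tau[i][j] ** alpha * eta[i] ** beta for i in candidates]
    return rng.choices(candidates, weights=weights)[0]
```

Larger α favors the accumulated search history; larger β favors the deterministic local metric.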

  13. Pheromone Update • The lists constructed are evaluated with list scheduling • Latency Lh for the result from ant h • Evaporation – prevents stagnation and punishes “useless” trails • Reinforcement – rewards trails that yield better quality
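
A sketch of this update, assuming the reward is proportional to 1/Lh and is applied to the (instruction, position) trails used by the evaluated list (`rho` and `Q` are assumed constants, not from the slides):

```python
def update_pheromones(tau, best_list, latency, rho=0.1, Q=1.0):
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)          # evaporation on every trail
    for j, op in enumerate(best_list):        # reinforce the trails the list used
        tau[op][j] += Q / latency             # better (smaller) latency -> bigger reward
    return tau
```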

  14. Pheromone Update • Evaporation happens on all trails to avoid stagnation • Reward the used trails based on the solution’s quality • (Figure: priority list with the used trails rewarded)

  15. Max-Min Ant System (MMAS) • Risks of Ant System optimization • Positive feedback • The dynamic range of pheromone trails can increase rapidly • Unused trails can be repeatedly punished, which reduces their likelihood even more • Premature convergence • MMAS is designed to address this problem • Built upon the original AS • Idea: limit the pheromone trails within an evolving bound so that broader exploration is possible • Better balances exploration and exploitation • Prevents premature convergence

  16. Max-Min Ant System (MMAS) • Limit τ(t) within τmin(t) and τmax(t) • Sgb is the best global solution found up to iteration t−1 • f(·) is the quality evaluation function, i.e. latency in our case • avg is the average size of the decision choices • Pbest ∈ (0, 1] is the controlling parameter: the conditional probability of Sgb being selected when all trails in Sgb have τmax and all others have τmin • Smaller Pbest → tighter range for τ → more emphasis on exploration • When Pbest → 0, we set τmin → τmax
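
Assuming the standard MMAS bound formulas of Stützle and Hoos (the slide's own equations are not reproduced in this transcript), the evolving bounds can be computed as:

```python
def mmas_bounds(f_gb, rho, n, avg, p_best):
    """f_gb: quality f(Sgb) of the best-so-far solution (latency here);
    rho: evaporation rate; n: solution length; avg: average number of choices;
    p_best: controlling parameter in (0, 1]."""
    tau_max = 1.0 / (rho * f_gb)              # driven by the best solution so far
    p = p_best ** (1.0 / n)
    tau_min = tau_max * (1.0 - p) / ((avg - 1.0) * p)
    return tau_min, tau_max
```

As p_best shrinks, tau_min rises toward tau_max, flattening the trail range and pushing the search toward exploration, as the slide notes.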

  17. Other Algorithmic Refinements • Dynamically evolving local heuristics • Example: dynamically adjust instruction mobility • Benefit: reduces the search space progressively • Take advantage of the topological sorting of the DFG when constructing the priority list • At each step, ants select from the ready instructions instead of from all unscheduled instructions • Benefit: greatly reduces the search space

  18. MMAS RCS Algorithm

  19. RCS Results: Pheromones (ARF)

  20. Benchmarks: ExpressDFG • A comprehensive benchmark for TCS/RCS • Classic samples and more modern cases • Comprehensive coverage • Problem sizes • Complexities • Applications • Downloadable from http://express.ece.ucsb.edu/benchmark/

  21. Auto Regressive Filter

  22. Cosine Transform

  23. Matrix Inversion

  24. RCS Experimental Results • Heterogeneous RCS – multiple types of resources (e.g. fast and normal multipliers) • ILP (optimal) using CPLEX • List scheduling • Instruction mobility (IM), instruction depth (ID), latency weighted instruction depth (LWID), successor number (SN) • Ant scheduling results using different local heuristics (averaged over 5 runs, each run 100 iterations with 5 ants)

  25. RCS Experimental Results • Homogenous RCS – all resources have unit delay • New benchmarks (compared to last slide) too large for ILP

  26. MMAS RCS: Results • Consistently generates better results over all testing cases • Up to 23.8% better than list scheduler • Average 6.4%, and up to 15% better than force-directed scheduling • Quantitatively closer to known optimal solutions

  27. MMAS TCS Formulation • Idea: Combine ACO and Force Directed Scheduling • Quick FDS review: • Uniformly distribute the operations onto the available resources • Operation probability • Distribution graph • Self force: the change in the DG from scheduling an operation • Predecessor/successor force: implicit effects on the DG • Schedule an operation to the step with the minimum force
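
The operation probabilities and distribution graph mentioned above can be sketched as follows, assuming unit-latency operations, each uniformly likely over its [ASAP, ALAP] time frame (names are illustrative):

```python
def distribution_graph(frames, num_steps):
    """frames: list of (asap, alap) time frames, one per operation.
    Returns DG(j): expected number of operations in each control step j."""
    dg = [0.0] * num_steps
    for asap, alap in frames:
        p = 1.0 / (alap - asap + 1)           # uniform operation probability
        for step in range(asap, alap + 1):
            dg[step] += p
    return dg
```

FDS then computes, for each candidate assignment, the force (change in this graph) and picks the minimum-force step.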

  28. ACO Formulation for TCS • Initialize the pheromone model • While (termination not satisfied): • Create ants • Each ant finds a solution • Evaluate the solutions and update the pheromones • Report the best result found • Trails τij indicate the favorableness of assigning instruction i to position j • (Figure: example DFG with source/sink nodes S and E, operations v1–v11, and time steps 1–4)

  29. ACO Formulation for TCS • Initialize the pheromone model • While (termination not satisfied): • Create ants • Each ant finds a solution • Evaluate the solutions and update the pheromones • Report the best result found • Select operation oph probabilistically • Select its time step as follows: • Global heuristic: tied to the search experience • Local heuristic: use the inverse of the distribution graph, 1/qk(j) • Here α and β are constants

  30. ACO Formulation for TCS • Initialize the pheromone model • While (termination not satisfied): • Create ants • Each ant finds a solution • Evaluate the solutions and update the pheromones • Report the best result found • Reward good partial solutions based on solution quality • Pheromone evaporation

  31. Final Version of MMAS-TCS

  32. Effectiveness of MMAS-TCS

  33. MMAS TCS: Results • MMAS TCS is more stable than FDS, especially when the solution is highly unconstrained • 258 out of 263 test cases are equal to or better than the FDS results • 16.4% fewer resources

  34. Design Space Exploration • DSE challenges for the designer: • Ever increasing design options • Closely related to NP-hard problems • Resource allocation • Scheduling • Conflicting objectives (speed, cost, power, …) • Increasing time-to-market pressure

  35. Our Focus: Timing/Cost • Timing/Cost Tradeoffs • Known application • Known resource types • Known operation/resource mapping • Question: find the optimal timing/cost tradeoffs • Most commonly faced problem • Fundamental to other design considerations

  36. Common Strategies • Usually done in an ad-hoc way • Experience dependent • Or by scanning the design space with Resource Constrained (RCS) or Time Constrained (TCS) scheduling • What’s the problem? • RCS and TCS are dual problems • Can we effectively use information from one to guide the other?

  37. Design Space Model

  38. Key Observations • A feasible configuration C covers a beam starting from (tmin, C) • tmin is the RCS result for C

  39. Design Space Model

  40. Key Observations • A feasible configuration C covers a beam starting from (tmin, C) • Optimal tradeoff curve L is monotonically non-increasing as deadline increases

  41. Design Space Model

  42. Theorem • If C is the optimal TCS result at time t1, then the RCS result t2 of C satisfies t2 ≤ t1 • More importantly, no configuration C′ with a smaller cost can produce an execution time within [t2, t1]

  43. Theorem (continued)

  44. What does it give us? • It implies that we can construct L: • Start from the rightmost t • Find the TCS solution C • Push leftwards using the RCS solution of C • Iterate (alternate between TCS and RCS)
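
The iterative construction above can be sketched assuming two black-box schedulers, `tcs(deadline)` returning the cheapest configuration meeting the deadline and `rcs(config)` returning that configuration's minimum latency (both hypothetical interfaces, not from the slides):

```python
def explore(tcs, rcs, t_max, t_min):
    curve = {}                    # deadline -> cheapest known configuration (the curve L)
    t = t_max
    while t >= t_min:
        config = tcs(t)           # optimal TCS result at deadline t
        t2 = rcs(config)          # its RCS latency; the theorem gives t2 <= t
        for d in range(t2, t + 1):
            curve[d] = config     # config covers every deadline in [t2, t]
        t = t2 - 1                # jump left past the covered beam
    return curve
```

Because each beam [t2, t] is covered in one TCS call, far fewer scheduling runs are needed than scanning every deadline.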

  45. DSE Using Time/Resource Duality

  46. Experiments • Three DSE approaches: • FDS: exhaustively scanning* for TCS • MMAS-TCS: exhaustively scanning for TCS • MMAS-D: proposed method leveraging duality • *Scanning means that we perform TCS at each deadline of interest

  47. DSE: MMAS-D vs. FDS

  48. Experimental Results

  49. Algorithm Runtime

  50. Real Design Complications • Heterogeneous mapping • One operation has many implementations • Different bit-widths, e.g. a 32-bit multiplier works for both mul(24) and mul(32) • Different area and delay • Real technology libraries are extremely sophisticated • Hard to estimate final timing and total area • Sharing depends on the cost of multiplexers • Downstream tools may not generate what we expect • Resource sharing, register sharing • Downstream tools break component boundaries • Logic synthesis, placement and routing
