Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)

Dagstuhl Seminar – Power-aware Computing Systems. Gaurav Singh, Sandeep K. Shukla, FERMAT Lab, Virginia Tech. Talk outline: CAOS; formalization; schedule; cost of a schedule (in terms of its power).

Presentation Transcript


  1. Dagstuhl Seminar – Power-aware Computing Systems. Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS) Gaurav Singh, Sandeep K. Shukla, FERMAT Lab, Virginia Tech.

  2. Talk Outline
     • CAOS.
     • Formalization.
     • Schedule.
     • Cost of a schedule (in terms of its power).
     • Power problems (peak and dynamic).
     • Low-power strategies.
     FERMAT / Virginia Tech

  3. Bank Account Example
     • Process 0 increments register x.
     • Process 1 transfers a unit from register x to register y.
     • Process 2 decrements register y.
     • This is an abstraction of some real applications:
       • Bank account: 0 = deposit to checking, 1 = transfer from checking to savings, 2 = withdraw from savings.
       • Packet processor: 0 = packet arrives, 1 = packet is processed, 2 = packet departs.
       • …
     [Figure: processes 0, 1 and 2 acting on registers x and y, with edge labels +1 and -1.]

  4. Concurrency in the example
     • Process j (j = 0, 1, 2) only updates under condition condj.
     • Only one process at a time can update a register.
     Note:
     • Processes 0 and 2 can run concurrently if process 1 is not running.
     • Both of process 1's updates must happen "indivisibly" (otherwise the state becomes inconsistent).
     • Suppose we want to prioritize process 2 over process 1 over process 0.
     Process priority: 2 > 1 > 0
     [Figure: processes 0, 1 and 2 acting on registers x and y, guarded by cond0, cond1 and cond2.]

  5. Find the error
     Process priority: 2 > 1 > 0
     First candidate:
       always @(posedge CLK) begin
         if (!cond2 || cond1) x <= x - 1;
         else if (cond0) x <= x + 1;
         if (cond2) y <= y - 1;
         else if (cond1) y <= y + 1;
       end
     Second candidate:
       always @(posedge CLK) begin
         if (!cond2 && cond1) x <= x - 1;
         else if (cond0) x <= x + 1;
         if (cond2) y <= y - 1;
         else if (cond1) y <= y + 1;
       end
     Which of these solutions are correct, if any?
     [Figure: processes 0, 1 and 2 acting on registers x and y, guarded by cond0, cond1 and cond2.]

  6. CAOS design
     Process priority: 2 > 1 > 0
       (* descending_urgency = "proc2, proc1, proc0" *)
       action proc0 (cond0);
         x <= x + 1;
       end
       action proc1 (cond1);
         y <= y + 1;
         x <= x - 1;
       end
       action proc2 (cond2);
         y <= y - 1;
       end
     [Figure: processes 0, 1 and 2 acting on registers x and y, guarded by cond0, cond1 and cond2.]
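
The priority rule can be checked exhaustively against the two Verilog candidates from slide 5. The following is a minimal Python sketch, not the authors' tooling: `caos_step` is a hypothetical reference model that assumes an action fires when its guard holds and no more urgent action writing the same register fires in that cycle.

```python
from itertools import product

def caos_step(x, y, c0, c1, c2):
    """Assumed reference model: urgency proc2 > proc1 > proc0."""
    fire2 = c2                   # most urgent, fires whenever cond2 holds
    fire1 = c1 and not fire2     # proc1 and proc2 both write y
    fire0 = c0 and not fire1     # proc0 and proc1 both write x
    nx = x + int(fire0) - int(fire1)
    ny = y + int(fire1) - int(fire2)
    return nx, ny

def verilog_first(x, y, c0, c1, c2):
    """First candidate from slide 5 (guard uses ||)."""
    nx, ny = x, y
    if (not c2) or c1:
        nx = x - 1
    elif c0:
        nx = x + 1
    if c2:
        ny = y - 1
    elif c1:
        ny = y + 1
    return nx, ny

def verilog_second(x, y, c0, c1, c2):
    """Second candidate from slide 5 (guard uses &&)."""
    nx, ny = x, y
    if (not c2) and c1:
        nx = x - 1
    elif c0:
        nx = x + 1
    if c2:
        ny = y - 1
    elif c1:
        ny = y + 1
    return nx, ny

def mismatches(candidate, x=5, y=5):
    """Guard combinations where the candidate disagrees with the model."""
    return [conds for conds in product([False, True], repeat=3)
            if candidate(x, y, *conds) != caos_step(x, y, *conds)]
```

Under this assumed model, `mismatches(verilog_second)` is empty, while `mismatches(verilog_first)` lists the guard combinations where the first candidate decrements x although proc1 does not fire.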

  7. Concurrent Action Oriented Specifications – CAOS
     • Hardware design is described in terms of atomic actions.
     • Each action consists of a guard and a body. Example:
         Action proc1 ( x>1 ) { y = y + 1; x = x - 1; }
     • Bluespec Compiler – high-level synthesis tool.

  8. Possible Schedule
     Process priority: 2 > 1 > 0. CAOS description (proc0, proc1, proc2 with descending_urgency) as on slide 6.
     Possible schedule: one action in each clock cycle.
     [Figure: timeline over clock cycles showing the enabled guards (cond0, cond1, cond2) and the actions fired, in order proc0, proc1, proc2, proc0.]

  9. CAOS Schedule
     Process priority: 2 > 1 > 0. CAOS description (proc0, proc1, proc2 with descending_urgency) as on slide 6.
     proc0 and proc2 do not conflict.
     [Figure: two timelines over clock cycles against the enabled guards (cond0, cond1, cond2): the possible schedule (proc0, proc1, proc2, proc0) and the CAOS schedule (labels proc2, proc0, proc1, proc0).]

  10. CAOS Semantics
      • Multiple non-conflicting actions can execute concurrently as long as the concurrent behavior corresponds to at least one sequential ordering (this may lead to high dynamic power and peak power consumption).
      • Example with anti-dependency:
          Action a1 (true) : x = y + 1;
          Action a2 (true) : y = y + 1;
      • Concurrent execution corresponds to a1 followed by a2 (since a2 updates y and a1 uses y: anti-dependency).
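
The anti-dependency example can be replayed in a few lines. This is an illustrative Python sketch; the helper names `run_concurrent` and `run_sequential` and the starting values are assumptions, and concurrent execution is modeled as every action reading the same old state before any update is applied.

```python
def a1(state):
    """Action a1 (true): x = y + 1."""
    return {"x": state["y"] + 1}

def a2(state):
    """Action a2 (true): y = y + 1."""
    return {"y": state["y"] + 1}

def run_concurrent(state, actions):
    """All actions read the old state; updates land together."""
    updates = {}
    for act in actions:
        updates.update(act(state))
    return {**state, **updates}

def run_sequential(state, actions):
    """Each action sees the updates of the actions before it."""
    for act in actions:
        state = {**state, **act(state)}
    return state

start = {"x": 0, "y": 10}
concurrent = run_concurrent(start, [a1, a2])
a1_then_a2 = run_sequential(start, [a1, a2])
a2_then_a1 = run_sequential(start, [a2, a1])
```

Here `concurrent` equals `a1_then_a2` but differs from `a2_then_a1`, which is the sense in which the concurrent step corresponds to the ordering a1 followed by a2.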

  11. Sequential Ordering
      Sequential schedule: a3 – a2 – a1 – a5 – a4
      [Figure: original schedule of actions a1 to a5 over clock cycles.]

  12. Synthesis from CAOS.

  13. Formalization
      Consider a design:
      • ŝ = {s1, s2, …, sk}: set of k state elements.
      • σ(ŝ): state of the design at some point.
      • A = {a1, a2, …, an}: set of n actions of the design.
      • wi: weight of an action ai ∈ A.
      • Dependency (di,j): an action ai is dependent on an action aj if any state accessed by ai is updated by aj.

  14. Feasible Schedule
      • Consider the original schedule β = {A1, A2, …, An, …}, where the actions in Ai ⊆ A execute in clock cycle i.
      • If Ai = {ai1, ai2, …, aim}, then:
        • for all aij, aik ∈ Ai, aij and aik do not conflict;
        • if the concurrent execution of the actions in Ai transforms the design from a state σ(ŝ) to σ'(ŝ), then there exists a corresponding permutation (sequential ordering) that also transforms the design from σ(ŝ) to σ'(ŝ).

  15. Cost of Schedule
      Costs for a schedule β = {A1, A2, …, An, …}:
      • Peak power: Ppeak(β) [equation not preserved in the transcript].
      • Dynamic power: Pswitch(β) [equation not preserved in the transcript], where Pi,i+1 is the switching power expended in moving from Ai to Ai+1.
      • Low-power goal: given β, create a new schedule α such that Ppeak(α) < Ppeak(β) and/or Pswitch(α) < Pswitch(β).
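
The two costs can be illustrated with a small Python sketch. The formulas on the slide did not survive, so the definitions here are assumptions: peak power is taken as the largest per-cycle sum of action weights, and dynamic power as the sum of the transition costs Pi,i+1. The weights and the toy `toggle_cost` function are made up for illustration.

```python
def peak_power(schedule, weight):
    """Assumed Ppeak: largest total weight executed in any one cycle."""
    return max((sum(weight[a] for a in cycle) for cycle in schedule),
               default=0)

def switching_power(schedule, transition_cost):
    """Assumed Pswitch: sum of the cost of moving from A_i to A_{i+1}."""
    return sum(transition_cost(schedule[i], schedule[i + 1])
               for i in range(len(schedule) - 1))

def toggle_cost(prev_cycle, next_cycle):
    """Toy stand-in for P_{i,i+1}: number of actions that start or stop."""
    return len(set(prev_cycle) ^ set(next_cycle))

# Hypothetical weights (power consumed per action).
weight = {"a1": 3, "a2": 2, "a3": 4, "a4": 1, "a5": 2}

beta = [{"a1", "a2", "a3"}, {"a4", "a5"}]      # original schedule
alpha = [{"a3", "a2"}, {"a1", "a4"}, {"a5"}]   # re-scheduled variant

beta_costs = (peak_power(beta, weight), switching_power(beta, toggle_cost))
alpha_costs = (peak_power(alpha, weight), switching_power(alpha, toggle_cost))
```

With these numbers the re-scheduled variant lowers the peak but raises the toy switching cost, which is why the goal on the slide is stated as "and/or".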

  16. Peak Power Problem
      • G: maximal set of actions enabled in a clock cycle c.
      • Ppeak: maximum allowable peak power.
      • fi = 1 if the action ai is executed in clock cycle c, otherwise fi = 0.
      • Peak power minimization problem: [objective not preserved in the transcript] for each clock cycle, under the following constraint: [constraint not preserved in the transcript].

  17. Low Power Strategies
      • Re-scheduling: targets power minimization in a design by re-scheduling the execution of various actions.
        • Uses sequential ordering for re-scheduling.
      • Factorizing and re-scheduling: targets power minimization in a design by factorizing one or more actions of the design into lower-granularity parts and re-scheduling these parts for power savings.

  18. Peak Power Reduction - 1
      • Use re-scheduling: actions can be re-scheduled based on the ordering to meet the peak power goal.
      [Figure: original schedule and low-power schedule of actions a1 to a5 over clock cycles.]

  19. Functional Equivalence
      Sequential schedule: a3 – a2 – a1 – a4 – a5
      [Figure: original schedule and low-power schedule of actions a1 to a5 over clock cycles.]
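
Re-scheduling along a sequential ordering can be sketched as a greedy packing. This Python example is an illustration only: `reschedule` is a hypothetical helper, the weights are made up, and it assumes that any contiguous slice of the sequential ordering may execute together in one clock cycle.

```python
def reschedule(order, weight, p_peak):
    """Walk the sequential ordering and start a new clock cycle whenever
    adding the next action would exceed the peak power budget."""
    cycles, current, load = [], [], 0
    for action in order:
        if weight[action] > p_peak:
            raise ValueError(f"{action} alone exceeds the budget")
        if current and load + weight[action] > p_peak:
            cycles.append(current)
            current, load = [], 0
        current.append(action)
        load += weight[action]
    if current:
        cycles.append(current)
    return cycles

# Hypothetical weights; the ordering is the one shown on slide 19.
weight = {"a1": 3, "a2": 2, "a3": 4, "a4": 1, "a5": 2}
order = ["a3", "a2", "a1", "a4", "a5"]
low_power = reschedule(order, weight, p_peak=6)
```

Because the cycles are cut from the ordering without reordering it, concatenating them gives back the same sequential schedule, which is the functional-equivalence argument of the slide.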

  20. Peak Power Problem – Versions
      • Version 1: actions have to be chosen based on sequential ordering.
      • Version 2: any action can be chosen, with each action having the same profit.
        • Order actions based on their weights (power consumed).
      • Version 3: any action can be chosen, with each action having a different profit.
        • Corresponds to the 0/1 knapsack problem (NP-complete).
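
Versions 2 and 3 can be contrasted with a short Python sketch. It assumes the per-cycle problem is to pick actions of maximum total profit whose weights sum to at most Ppeak; the function names, weights and profits are hypothetical, and the knapsack routine is the textbook dynamic program for integer weights rather than anything taken from the talk.

```python
def pick_equal_profit(weight, p_peak):
    """Version 2: equal profits, so take the lightest actions first."""
    chosen, load = [], 0
    for action in sorted(weight, key=weight.get):
        if load + weight[action] > p_peak:
            break
        chosen.append(action)
        load += weight[action]
    return chosen

def pick_knapsack(weight, profit, p_peak):
    """Version 3: 0/1 knapsack by dynamic programming over the budget.
    best[c] holds (profit, actions) for a power budget of c."""
    best = [(0, [])] * (p_peak + 1)
    for action, w in weight.items():
        # Iterate budgets downwards so each action is used at most once.
        for c in range(p_peak, w - 1, -1):
            candidate = best[c - w][0] + profit[action]
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - w][1] + [action])
    return best[p_peak]

weight = {"a1": 3, "a2": 2, "a3": 4, "a4": 1, "a5": 2}
profit = {"a1": 10, "a2": 1, "a3": 9, "a4": 1, "a5": 2}

same_profit_choice = pick_equal_profit(weight, p_peak=6)
best_profit, knapsack_choice = pick_knapsack(weight, profit, p_peak=6)
```

The greedy choice maximizes the number of actions that fit, while the knapsack choice may prefer fewer, heavier actions when their profit is higher. The dynamic program runs in time proportional to the number of actions times Ppeak, which is pseudo-polynomial and does not contradict the NP-completeness noted on the slide.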

  21. Factorizing an Action
      • Factorization: a larger action a can be factorized into parts a1 and a2, which execute in consecutive clock cycles to meet the peak power constraint.
      • Constraints:
        • Atomicity should be maintained: if a1 accesses state updated by a2, then a1 should execute before a2.
        • Dependencies with other actions should be maintained.

  22. Peak Power Reduction - 2
      • Use factorization: factorized parts can be re-scheduled in consecutive clock cycles based on the dependency constraints.
      [Figure: original schedule (a1 to a5) and low-power schedule in which a1 appears as parts a1-1 and a1-2.]

  23. Low Power Synthesis from CAOS. Main issue: how to efficiently re-schedule actions in real hardware?

  24. Dynamic Power Problem
      • Dynamic power minimization problem:
        • Select the most power-efficient ordering of execution of actions.
      • NP-complete (Travelling Salesman Problem): given a weighted directed graph G = (V, E), find a path with the least weight that includes every vertex of V exactly once.
        • This is a sub-problem of the dynamic power problem.
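
For a handful of actions the ordering problem can be solved by brute force, which also makes the exponential growth visible. The Python sketch below is illustrative: `cheapest_order` and the directed switching costs are made up, and dependency constraints between actions are ignored.

```python
from itertools import permutations

def cheapest_order(actions, switch_cost):
    """Try every ordering and keep the one with the least total switching
    cost; switch_cost[(u, v)] is the cost of executing v right after u."""
    def total(order):
        return sum(switch_cost[(order[i], order[i + 1])]
                   for i in range(len(order) - 1))
    return min(permutations(actions), key=total)

# Hypothetical, asymmetric switching costs between three actions.
switch_cost = {
    ("a", "b"): 1, ("b", "a"): 4,
    ("a", "c"): 5, ("c", "a"): 2,
    ("b", "c"): 3, ("c", "b"): 1,
}
best_order = cheapest_order(["a", "b", "c"], switch_cost)
```

With n actions this examines n! orderings, so it is only usable as a reference for checking heuristics on small designs.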

  25. Low Dynamic Power – Re-scheduling
      • Re-scheduling of actions:
        • Actions are re-scheduled such that switching at the inputs of the functional units is minimized.
        • Resource sharing: conflicts are created such that the same functional units can be re-used to avoid switching.

  26. Low Dynamic Power – Operand Isolating a single action
        action foo (… cond … (x < y) …);
          x <= x + z
          …
        endrule
      Computations stay quiescent except when the action executes, i.e. when its guard is true.
      [Figure: action foo with current state (x, y, z), body logic, cond logic, enable signals (EN), Φ2 and D/Q elements, and next-state values (x', y', z').]

  27. Operand Isolating multiple actions
      Isolating multiple actions of a design.
      [Figure: block diagram with labels Cond1 … CondN, Scheduler, Rule1 … RuleN, Action1 … ActionN, Φ2 … ΦN, Rule Control, DataSelect, and a State register (D, Q, Enable).]

  28. Low Dynamic Power – Register-level clock gating
      • Register-level clock gating:
        • Registers having a common ENABLE signal can be provided the same gated clock.
        • Prevents unnecessary switching in the registers.
      • CAOS: registers updated in the body of an action are gated using the guard of the action.
      • The algorithms implemented in the Bluespec Compiler saved:
        • Operand isolation: up to 20% dynamic power.
        • Clock gating: up to 26% dynamic power.

  29. Thank You !! ?

  30. Low Dynamic Power – Operand Isolation
      • Operand isolation:
        • To save power, the computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle.
      • Involves:
        • Insertion of gates at the appropriate points without affecting guards.
        • Selection of the activation signal.
        • Guards of actions are used as gating signals.
      • The algorithm implemented in the Bluespec Compiler saved up to 20% dynamic power [3].

  31. Implementation (Ongoing)
      • Control circuitry is needed to decide which actions execute in each clock cycle; it will consume extra power if implemented in hardware.
      • How can this extra power consumption be avoided?
        • Create extra conflicts among the actions.
        • Analysis is required to decide which conflicts to add.

  32. Side Effects - Latency
      • Re-scheduling for power minimization may degrade latency.
      • µ: maximum latency degradation factor for re-scheduling.
      • The corresponding average peak power constraint can be estimated as: [equation not preserved in the transcript].

  33. Operand Isolation – 1: Using register/latch for frequently enabled actions
      Maximum quiescence: good if actions alternate on-off, e.g. arbiter.
      Phase 2 edge-triggered register OR phase 2 transmitting latch.
      [Figure: action foo with current state, body logic, cond logic, enable signals (EN), a Φ2-clocked D element, and next-state values.]

  34. Operand Isolation – 2: Using AND gate for infrequently enabled actions
      Optimal area: great if actions stay unenabled for multiple cycles, e.g. FSM rules OR opcodes in a controller/processor.
      [Figure: action foo with current state, an AND gate, body logic, cond logic, enable signals (EN), a D element, and next-state values.]

  35. Automatic Clock-gating of Registers
      Registers having common ENABLE signals (updated by the same set of actions) can be supplied the same gated clock.
      [Figure: register with ports DIN, EN, CLK and QOUT.]

  36. Automatic Clock-gating of Registers
      In CAOS, guards of the actions provide the control for gating the clocks of the registers.
      [Figure: register with ports DIN, EN and QOUT, clocked by GATED_CLK, which is derived from EN and CLK.]
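
Finding registers that can share a gated clock amounts to grouping them by the set of actions that update them. The Python sketch below is an assumption-laden illustration, not the Bluespec Compiler's algorithm: `gating_groups`, the action names and the register names are hypothetical.

```python
from collections import defaultdict

def gating_groups(writes):
    """writes maps an action name to the registers its body updates.
    Registers updated by the same set of actions share one enable,
    so they can be driven by the same gated clock."""
    writers = defaultdict(set)
    for action, registers in writes.items():
        for reg in registers:
            writers[reg].add(action)
    groups = defaultdict(list)
    for reg in sorted(writers):
        groups[frozenset(writers[reg])].append(reg)
    return dict(groups)

# Hypothetical design: a1 and a3 both update r1 and r2; a2 updates r3.
writes = {"a1": {"r1", "r2"}, "a2": {"r3"}, "a3": {"r1", "r2"}}
groups = gating_groups(writes)
```

In this example r1 and r2 form one group, whose enable would be active whenever a1 or a3 executes, and r3 forms its own group enabled by a2.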

  37. Publications
      • G. Singh and S. K. Shukla, "Algorithms for Low Power Hardware Synthesis from CAOS - Concurrent Action Oriented Specifications", Special Issue of International Journal of Embedded Systems (IJES'06).
      • G. Singh and S. K. Shukla, "Low-Power Hardware Synthesis from TRS-based Specifications", MEMOCODE'06.
      • G. Singh, J. Schwartz, S. Ahuja and S. K. Shukla, "Techniques for Power-aware Synthesis from Concurrent Action Oriented Specifications", submitted to DAC'07.
