Automatic Insertion of Low Power Annotations in RTL for Pipelined Microprocessors

Vinod Viswanath

The University of Texas at Austin


Outline

  • Power Dissipation in Hardware Circuits

  • Instruction-driven Slicing to attain lower power dissipation

    • Automatically annotates the microprocessor description

    • At the register-transfer level (RTL) and the architectural level

  • Applying Instruction-driven Slicing to pipelined architectures

  • Applying Instruction-driven Slicing to out-of-order superscalar architectures


Power Dissipation

  • Switching activity power dissipation

    • Due to charging and discharging of circuit nodes

  • Short-circuit power dissipation

    • Significant only for output drivers and clock buffers

  • Static power dissipation

    • Due to leakage current
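The three components above are often summarized with the first-order CMOS model sketched below. This is a textbook relation, not a formula from these slides: switching power is P = α·C·Vdd²·f, where α is the probability of a 0→1 transition per clock cycle on the switched capacitance C, and static power is the leakage current times Vdd. The capacitance, voltage, and frequency values in the usage example are hypothetical.

```python
"""First-order CMOS power model (textbook sketch, not taken from the slides)."""


def switching_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Switching power in watts: alpha * C * Vdd^2 * f.

    alpha is the probability of a 0->1 transition per clock cycle on the
    switched capacitance C (each such transition draws C * Vdd^2 from the supply).
    """
    return alpha * cap_farads * vdd ** 2 * freq_hz


def static_power(leakage_amps: float, vdd: float) -> float:
    """Static power in watts due to leakage current."""
    return leakage_amps * vdd


def total_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float,
                short_circuit_w: float, leakage_amps: float) -> float:
    """Switching + short-circuit + static power, in watts."""
    return (switching_power(alpha, cap_farads, vdd, freq_hz)
            + short_circuit_w
            + static_power(leakage_amps, vdd))


if __name__ == "__main__":
    # Hypothetical block: 1 nF switched capacitance, 1.2 V supply, 200 MHz clock.
    ungated = switching_power(alpha=0.2, cap_farads=1e-9, vdd=1.2, freq_hz=200e6)
    gated = switching_power(alpha=0.1, cap_farads=1e-9, vdd=1.2, freq_hz=200e6)
    print(f"switching power: ungated {ungated:.4f} W, gated {gated:.4f} W")
```

Halving α, for example by keeping unneeded flops from toggling, halves the switching term while leaving the static term unchanged; the switching term is the one the methodology below estimates.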


Switching Activity Power Dissipation

  • Transistor-level

    • Reordering, sizing

  • Gate-level

    • Don’t-care optimizations (combinational)

    • Encoding (sequential)

    • Pre-computation based optimization (sequential)

    • Guarded evaluation (sequential)

  • RT-level

    • Use program structure and dataflow information available at that level of abstraction


Instruction-driven Slice

  • An instruction-driven slice of a microprocessor design is

    • all the circuitry of the design required to completely execute a specific instruction

    • spread across parts of the decode, execute, writeback, and other blocks

  • It is the cone of influence of the instruction's semantics
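As a minimal sketch of the cone-of-influence idea, the code below walks fan-in edges backwards from an instruction's result and follows a multiplexer input only when that instruction selects it. The netlist, the signal names, and the per-edge `active_for` encoding are hypothetical assumptions for illustration; they are not the OR1200 design or the tool's internal ASPG representation.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# (source signal, instructions for which this edge matters); None = always.
# The toy netlist below is hypothetical, not taken from OR1200.
Edge = Tuple[str, Optional[FrozenSet[str]]]

ADD, LW = "l.add", "l.lw"

FANIN: Dict[str, List[Edge]] = {
    # Writeback mux: the ALU path is selected by l.add, the load path by l.lw.
    "rf_wdata": [("alu_result", frozenset({ADD})), ("lsu_data", frozenset({LW}))],
    "alu_result": [("op_a", None), ("op_b", None)],
    "lsu_data": [("dmem_rdata", None)],
    "dmem_rdata": [("lsu_addr", None)],
    "lsu_addr": [("op_a", None), ("imm", None)],
    "fpu_result": [("op_a", None), ("op_b", None)],  # never feeds rf_wdata here
}


def instruction_slice(fanin: Dict[str, List[Edge]], targets: Set[str], instr: str) -> Set[str]:
    """Backward cone of influence of `targets`, restricted to edges relevant to `instr`."""
    slice_ = set(targets)
    queue = deque(targets)
    while queue:
        sig = queue.popleft()
        for src, active_for in fanin.get(sig, []):
            if active_for is not None and instr not in active_for:
                continue  # mux input not selected when `instr` executes
            if src not in slice_:
                slice_.add(src)
                queue.append(src)
    return slice_


if __name__ == "__main__":
    all_signals = set(FANIN) | {src for edges in FANIN.values() for src, _ in edges}
    for instr in (ADD, LW):
        needed = instruction_slice(FANIN, {"rf_wdata"}, instr)
        print(instr, "can gate:", sorted(all_signals - needed))
```

For l.add the load path and the FPU fall outside the slice; for l.lw the ALU result path and the FPU do. A real slice would also have to cover the instruction's other effects (flags, PC, memory), which this sketch omits.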


Instruction-driven Slicing

  • Given a microprocessor design and an instruction

    • Identify the instruction-driven slice

    • Shut off the rest of the circuitry

  • This may include

    • Gating out parts of different blocks

    • Gating out floating-point units during integer ALU execution

    • Turning off certain FSMs in control blocks, since instruction-driven slicing provides exact constraints on their inputs
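The effect of shutting off circuitry outside the slice can be approximated by counting how many flops receive a clock edge. The sketch below assumes a single-issue pipeline in which each stage is gated according to the instruction it currently holds, while bubbles and instructions that were not sliced on keep every flop clocked. The stage layout, flop names, and slices are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical 4-stage pipeline; flop names are illustrative only.
STAGE_FLOPS: List[Set[str]] = [
    {"if_pc", "if_insn"},                          # fetch
    {"id_op_a", "id_op_b", "id_imm", "id_fpu"},    # decode
    {"ex_alu", "ex_lsu_addr", "ex_fpu"},           # execute
    {"wb_data"},                                   # writeback
]

SLICES: Dict[str, Set[str]] = {
    "l.add": {"if_pc", "if_insn", "id_op_a", "id_op_b", "ex_alu", "wb_data"},
    "l.lw": {"if_pc", "if_insn", "id_op_a", "id_imm", "ex_lsu_addr", "wb_data"},
}


def clocked_flop_events(trace: List[str], stage_flops: List[Set[str]],
                        slices: Dict[str, Set[str]], gated: bool) -> int:
    """Count flop clock events while `trace` flows through a single-issue pipeline.

    Stage s holds trace[cycle - s]. With gating, a stage clocks only the flops
    in the slice of the instruction it holds; bubbles and instructions that were
    not sliced on conservatively clock every flop in the stage.
    """
    n_stages = len(stage_flops)
    events = 0
    for cycle in range(len(trace) + n_stages - 1):
        for s, flops in enumerate(stage_flops):
            idx = cycle - s
            instr: Optional[str] = trace[idx] if 0 <= idx < len(trace) else None
            if gated and instr in slices:
                events += len(flops & slices[instr])
            else:
                events += len(flops)
    return events


if __name__ == "__main__":
    trace = ["l.add", "l.lw", "l.add", "l.add"]
    base = clocked_flop_events(trace, STAGE_FLOPS, SLICES, gated=False)
    sliced = clocked_flop_events(trace, STAGE_FLOPS, SLICES, gated=True)
    print(f"{base} -> {sliced} flop clock events")
```

Clocked flops are only a proxy for switching power: the count ignores the combinational logic each flop drives and the overhead of the gating logic itself.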


Algorithm (High Level)

  • Algorithm instruction-driven-slicing

    Begin

    • Inputs: vRTL (Verilog RTL), insts (instructions to slice on)

    • Output: aRTL (annotated RTL)

    • Parse vRTL to obtain the Abstract Syntax Program Graph (ASPG)

    • For each instruction I in insts, repeat

      • Slice the ASPG for instruction I

      • Traverse the ASPG to find blocks outside the slice

      • Add annotation variables if such a block is found

      • If a particular flop is already gated, add the current annotation in an optimal fashion

    • Return the annotated ASPG

    • Generate Verilog code (aRTL) for the annotated ASPG

    End.
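A minimal sketch of the annotation step follows, assuming each slice is already available as the set of flops it needs and that the decoder provides one signal per instruction, here named `is_<mnemonic>` (a hypothetical naming scheme, as is the `<flop>_en` output name). The sketch is organized per flop rather than per instruction, which yields the same annotations. The slide does not say how an annotation is merged with an existing gate "in an optimal fashion"; this sketch simply ANDs the two conditions, which is an assumption. In a pipeline, the decode signal must refer to the instruction occupying the flop's stage.

```python
from typing import Dict, Optional, Set


def decode_signal(instr: str) -> str:
    """Hypothetical decoder output that is high while `instr` is executing."""
    return "is_" + instr.replace(".", "_")


def annotate(flops: Dict[str, Optional[str]], slices: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each flop to its clock-enable expression after slicing on `slices`.

    `flops` maps a flop to its existing enable expression (None if ungated).
    A flop is gated off while any sliced instruction that does not need it executes.
    """
    annotated: Dict[str, str] = {}
    for flop, existing in flops.items():
        unneeded_by = sorted(i for i, needed in slices.items() if flop not in needed)
        if not unneeded_by:
            annotated[flop] = existing if existing else "1'b1"  # no annotation needed
            continue
        guard = "~(" + " | ".join(decode_signal(i) for i in unneeded_by) + ")"
        # Assumption: merge with an existing gate by AND-ing the two conditions.
        annotated[flop] = f"({existing}) & {guard}" if existing else guard
    return annotated


def emit_verilog(annotated: Dict[str, str]) -> str:
    """Emit one continuous assignment per enable (hypothetical `<flop>_en` names)."""
    return "\n".join(f"assign {flop}_en = {expr};" for flop, expr in sorted(annotated.items()))


if __name__ == "__main__":
    flops = {"ex_alu": None, "ex_lsu_addr": None, "ex_fpu": "fpu_start", "wb_data": None}
    slices = {"l.add": {"ex_alu", "wb_data"}, "l.lw": {"ex_lsu_addr", "wb_data"}}
    print(emit_verilog(annotate(flops, slices)))
```

Here the FPU flop, already gated by the hypothetical `fpu_start`, ends up enabled only when `fpu_start` holds and neither sliced instruction is executing, while `wb_data` is needed by both and keeps a constant enable.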


or1200_ctrl.lsu_op


Methodology

  • To demonstrate our technique

    • We have incorporated instruction-driven slicing into the traditional design flow

    • The vRTL model is annotated to obtain the aRTL model

    • The Synopsys design environment has been modified to accept the aRTL, the SPEC2000 benchmarks, and power process parameters, and to estimate the power dissipation due to switching activity

    • The annotated architectural model is fed to the SimpleScalar simulator with the Wattch power estimator to estimate its power dissipation
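The switching activity that such power estimators work from can be illustrated on a value trace: count bit flips between consecutive cycles and normalize the 0→1 transitions into an activity factor α. This is a minimal sketch, not the Synopsys or Wattch implementation; the traces in the usage example are hypothetical.

```python
from typing import List


def toggle_count(trace: List[int], width: int) -> int:
    """Total bit flips (0->1 and 1->0) between consecutive samples of a `width`-bit signal."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(trace, trace[1:]))


def activity_factor(trace: List[int], width: int) -> float:
    """Average number of 0->1 transitions per bit per cycle (alpha)."""
    mask = (1 << width) - 1
    rises = sum(bin(~a & b & mask).count("1") for a, b in zip(trace, trace[1:]))
    transitions = len(trace) - 1
    return rises / (width * transitions) if transitions > 0 else 0.0


if __name__ == "__main__":
    # A 4-bit register that latches an alternating bus every cycle ...
    free_running = [0x0, 0xF, 0x0, 0xF, 0x0]
    # ... versus the same register held once the instructions no longer need it.
    held = [0x0, 0xF, 0xF, 0xF, 0xF]
    for name, trace in (("free-running", free_running), ("held", held)):
        print(name, toggle_count(trace, 4), activity_factor(trace, 4))
```

Holding the register halves α in this toy trace, which is the kind of reduction gating out-of-slice flops aims for.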


Methodology


Experiment: OR1200

  • We have used our tool-chain to test our methodology on OR1200

    • OR1200 is a pipelined microprocessor implementing the OpenRISC ISA

    • 4-stage integer pipeline with single instruction issue per cycle

    • We have annotated both the RTL and the architectural models of OR1200


Experiment: OR1200


OR1200-RTL Results

  • Results are shown after annotation insertion

    • Sliced on 1, 4, 10 instructions

    • For SPECINT2000 benchmarks

  • Power dissipation decreases consistently


OR1200-Arch Results

  • Results are shown after annotation insertion

    • Sliced on 1, 4, 10 instructions

    • For SPECINT2000 benchmarks

  • Power dissipation decreases consistently


OR1200 Results (contd.)

  • Power gains are consistently good

  • Power gains far outweigh area losses


OR1200 Results (contd.)

  • Flop distribution shown before slicing (Fig. a), after slicing on add, l.add (Fig. b), and after slicing on load, l.lw (Fig. c)

[Figures: Fig. a, Fig. b, Fig. c]


Experiment: PUMA

  • We have used our tool-chain to test our methodology on PUMA

    • PUMA is a dual-issue, out-of-order superscalar, fixed-point PowerPC core

    • We have annotated both the RTL and the architectural models of PUMA


PUMA Results (contd.)

  • Power gains are good when slicing on a few instructions (about 7); beyond that, delay losses start to dominate (Fig. 1)

  • Power gains far outweigh area losses (Fig. 2)

  • Flop distribution shown before slicing (Fig. 3a), after slicing on add (Fig. 3b), and after slicing on load (Fig. 3c)

[Figures: Fig. 1, Fig. 2, Fig. 3a, Fig. 3b, Fig. 3c]


Conclusions

  • Proposed Instruction-driven Slicing as a new technique to automatically reduce power dissipation

  • Implemented the methodology of incorporating instruction-driven slicing into the design flow tool-chain

  • Inserting these annotations preserves the functionality of the circuit


Conclusions (continued)

  • This technique seems most applicable to single-issue, multi-stage pipelined machines.

  • When multiple instructions are in flight in the same pipeline stage, the gains of the single-instruction abstraction are lost.

  • Graphics processors and various embedded applications are often better suited to this technique than general-purpose out-of-order superscalars.
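The loss of gains with several instructions in the same stage can be made concrete: a flop in that stage can be gated only if none of the instructions occupying the stage needs it, so the flops that must stay active are the union of their slices. The sketch below uses a hypothetical execute stage and hypothetical slices.

```python
from typing import Dict, Iterable, Set

EX_STAGE: Set[str] = {"ex_alu", "ex_lsu_addr", "ex_fpu"}  # hypothetical flops
SLICES: Dict[str, Set[str]] = {"l.add": {"ex_alu"}, "l.lw": {"ex_lsu_addr"}}


def active_flops(instrs_in_stage: Iterable[str], slices: Dict[str, Set[str]],
                 stage_flops: Set[str]) -> Set[str]:
    """Flops that must keep clocking while the given instructions share a stage.

    The result is the union of the instructions' slices; an instruction that was
    not sliced on needs every flop in the stage, so nothing can be gated.
    """
    active: Set[str] = set()
    for instr in instrs_in_stage:
        if instr not in slices:
            return set(stage_flops)
        active |= slices[instr] & stage_flops
    return active


if __name__ == "__main__":
    for group in (["l.add"], ["l.add", "l.lw"], ["l.add", "l.mul"]):
        gateable = EX_STAGE - active_flops(group, SLICES, EX_STAGE)
        print(group, "gateable:", sorted(gateable))
```

With l.add alone, two of the three flops can be gated; with l.add and l.lw together, only the FPU flop can; pairing l.add with an instruction that was not sliced on, such as l.mul, disables gating for the whole stage.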


Spare slides


PUMA: A Fixed-Point PowerPC Core


PUMA Power Gain Results

  • Results are shown after annotating the

    • RTL (left) and architectural (right) models

    • For unsliced models and models sliced on 1, 4, and 10 instructions

    • For SPECINT2000 benchmarks

  • Power dissipation decreases consistently


Comparing OR1200 and PUMA


Correct Annotations

  • Notion of correctness

    • The original RTL and the annotated RTL should be functionally equivalent under all conditions

  • Correctness theorem

    (defthm or1200_slicing_correct
      (equal (or1200_cpu n)
             (or1200_cpu_sliced n)))
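The theorem states equivalence for every n, which is what ACL2 establishes. As a much weaker, testing-based analogue, the sketch below simulates a hypothetical two-stage toy machine with and without gating on its pipeline registers and compares the architecturally visible accumulator over random programs. Agreement on bounded random runs is evidence, not a proof; the machine, its instructions, and the function names are illustrative assumptions, not the OR1200 model.

```python
import random
from typing import List, Optional, Tuple

MEM = [(7 * i + 3) % 256 for i in range(16)]  # hypothetical data memory
State = Tuple[int, str, int, int]             # (acc, op_reg, imm_reg, addr_reg)
Instr = Tuple[str, int]                       # (opcode, operand)


def step(state: State, instr: Instr, sliced: bool) -> State:
    """One clock cycle of a toy 2-stage machine."""
    acc, op_reg, imm_reg, addr_reg = state
    # Stage 2: execute the instruction latched on the previous cycle.
    if op_reg == "add":
        acc = (acc + imm_reg) % 256
    elif op_reg == "lw":
        acc = MEM[addr_reg % len(MEM)]
    # Stage 1: latch the incoming instruction's operand.
    op, operand = instr
    if not sliced or op == "add":  # imm_reg is only in the slice of "add"
        imm_reg = operand
    if not sliced or op == "lw":   # addr_reg is only in the slice of "lw"
        addr_reg = operand
    return (acc, op, imm_reg, addr_reg)


def run(program: List[Instr], sliced: bool) -> List[int]:
    """Return the accumulator value after every cycle."""
    state: State = (0, "nop", 0, 0)
    outputs = []
    for instr in program:
        state = step(state, instr, sliced)
        outputs.append(state[0])
    return outputs


def bounded_check(trials: int = 1000, length: int = 20, seed: int = 0) -> Optional[List[Instr]]:
    """Compare both models on random programs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        program = [(rng.choice(["add", "lw", "nop"]), rng.randrange(256)) for _ in range(length)]
        if run(program, sliced=False) != run(program, sliced=True):
            return program
    return None


if __name__ == "__main__":
    print("counterexample:", bounded_check())
```

The two models disagree on the internal registers `imm_reg` and `addr_reg`, yet their visible outputs match, because the execute stage reads each register only after an instruction whose slice contains it has latched it. That is the kind of argument the ACL2 proof makes for all n.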


ACL2 Theorem Prover

  • General-purpose theorem prover for first-order logic

  • Breaks the theorem down into sub-goals

  • Many engines work on the sub-goals, either proving them or breaking them down further and adding the new goals to a central pool of goals to be proved

  • Success story in hardware

    • Verified FDIV in AMD processors


Proof Methodology


Proof Methodology

  • The RTL is shallowly embedded in ACL2

  • Convert Verilog RTL into ACL2RTL

  • We have created a large RTL library to recognize and analyze ACL2RTL

  • Slicing is done on the Verilog code

  • Both the original and the annotated Verilog are converted into ACL2, and the functional equivalence proof is constructed in ACL2


Verilog to ACL2


Proof Structure

  • Create a library of functions to interpret the ACL2 model of the RTL

  • The functional equivalence theorem is built up block by block

