
ECE 551 Digital System Design & Synthesis






Presentation Transcript


  1. ECE 551 Digital System Design & Synthesis • Lecture 10: Synthesis Techniques

  2. Lecture 10 Topics • Synthesis Process Revisited • Optimization Stages in Synthesis • Advanced Synthesis Strategies

  3. Synthesis • Verilog files aren’t hardware yet! • Need to “synthesize” them • Tool reads hardware descriptions • Figures out what hardware to make • Done automatically • Faster! • Easier! • Designers still have to understand hardware! • Avoid pre- vs. post-synthesis discrepancies • Describe EFFICIENT hardware

  4. Useful Documentation • Fairly complete documentation is available for the Synopsys tools using: /afs/engr.wisc.edu/apps/eda/synopsys/syn_Y-2006.06-SP1/sold • See especially (through Design Compiler link) • Design Vision User Guide • Design Compiler User Guide • Design Compiler Reference Manuals • HDL Compiler (Presto Verilog) Reference Manual • HDL Compiler for Verilog Reference Manual • Use as references

  5. HDL Compiler for Verilog Reference Manual, pg. 1-5. • HDL Compiler is called by Design Compiler and Design Vision • Why do we need to compare synthesized code to initial code?

  6. Design Compiler User Guide, pg. 2-17 • Design Vision is the GUI for Design Compiler: use design_vision • Can also run Design Compiler directly using dc_shell • To compile using a synthesis script, use dc_shell -tcl_mode -f file_name

  7. Synthesis Script Example [1]

         # To run, place in the directory with all the Verilog files
         # and type: dc_shell -tcl_mode -f script.tcl

         # Analyze input files.
         analyze -library WORK -format verilog {./prob5.v ./prob1.v ./prob2.v}

         # Elaborate the design.
         elaborate GF_multiplier_mword -architecture verilog -library WORK

         # Sets clock constraint of 2ns with 50% duty cycle on signal "clock".
         create_clock -name "clk" -period 2 -waveform {0 1} {clock}
         set_dont_touch_network [ find clock clk ]

         # Sets the area constraint for the design
         set_max_area 50000

  8. Synthesis Script Example [2]

         # Check and compile the design
         check_design > check_design.txt
         uniquify
         compile -map_effort medium

         # Export netlist for post-synthesis simulation into synth_netlist.v
         change_names -rule verilog -hierarchy
         write -format verilog -hierarchy -output synth_netlist.v

         # Generate reports
         report_resources > resource_report.txt
         report_area > area_report.txt
         report_timing > timing_report.txt
         report_constraint -all_violators > violator_report.txt
         report_register -level_sensitive > latch_report.txt
         exit

  9. Internal Synthesizer Flow (Synopsys) • [Flow diagram; box order inferred from the slides that follow] HDL Description -> Syntax Checking -> Synthesizer Policy Checking -> Elaboration & Translation -> Structural Representation -> Architectural Optimization -> Multi-Level Logic Optimization -> Technology Mapping (uses the Technology Library) -> Technology-Based Implementation

  10. Initial Steps • Parsing for Syntax and Semantics Checking • Gives error messages and warnings to user • User may modify the HDL description in response • Synthesizer Policy Checking (“Check Design”) • Check for adherence to allowable language constructs • Are you using unsupported operators or constructs? Combinational feedback? Multiple drivers to non-tristate? • This is where you find out you can’t use certain Verilog constructs • This is synthesizer-dependent • Example: Advanced DesignWare library allows modulo with any value; most other tools only allow modulo with powers of 2. • Certain things common to MOST synthesizers • See HDL Compiler for Verilog Reference Manual for constructs
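
A likely reason power-of-two modulo is so widely supported is that, for unsigned values, it needs no divider at all: it just keeps the low-order bits. The Python sketch below only illustrates that arithmetic identity; `mod_pow2` is a hypothetical helper name, not a tool feature.

```python
def mod_pow2(x, k):
    """Return x mod 2**k for a non-negative integer x by masking the low k bits."""
    return x & ((1 << k) - 1)


if __name__ == "__main__":
    # 45 = 0b101101, so the low three bits are 0b101.
    print(mod_pow2(45, 3))
```

A modulo by an arbitrary constant has no such shortcut and needs real divider hardware, which is why support for it depends on the synthesizer and its libraries.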

  11. Elaboration & Translation • Unrolls loops, substitutes macros & parameters, computes constant functions, evaluates generate conditionals • Builds a structural representation of the design • Like a netlist, but includes larger components • Not just gate-level, may include adders, etc. • Gives additional errors or warnings to the user • Issues in initial transformation to hardware. • For example, port sizes do not match • Affects quality achieved by optimization steps • Structural representation depends on HDL quality • Poor HDL can prevent optimization

  12. Importance of Translation • It is important for the tool to recognize the sort of logic structures you are trying to describe. • If it sees a 32-bit full adder, the tool has built-in solutions for optimizing adders • Ripple-carry, carry-save, carry look-ahead, etc. • If it just sees a Boolean function with 65 inputs, it has to work a lot harder to achieve the same results • Do you think it can invent a CLA on the fly?

  13. Implications of Translation • Writing clear, easy to understand code not only benefits other engineers, but may give you better synthesis results. • Another reason for standard coding guidelines • Brush up on the list in “Verilog Styles That Kill” • If you have a decent synthesis tool, it’s usually better to use Verilog’s built-in arithmetic operators rather than trying to build them from gates or Boolean equations

  14. Optimization in Synthesis • None of these are guaranteed! • Most synthesizers will make at least some attempt • Detect and eliminate redundant logic • Detect combinational feedback loops • Exploit don't-care conditions • Try to detect unused states • Detect and collapse equivalent states • Make state assignments if not made already • Synthesize multi-level logic equations subject to: • constraints on area and/or speed • available technology (library)

  15. Optimization Process • Optimization modifies the generic netlist resulting from elaboration and translation. • Uses cells from the technology library (mapping) • Attempts to meet all specified constraints • The process is divided into major phases • All or some selection of the major phases may be performed during optimization • Phase selection can be controlled by the user • Some optimizations can be disabled (ex: set_structure) or forced (ex: set_flatten)

  16. Optimization Phases • Major Optimization Stages • Architectural • Logic-Level • Gate-Level • Architectural optimization • High-level optimizations that occur before the design is mapped to the logic-level • Based on constraints and high-level coding style • After optimization circuit function is represented by a generic, technology-independent netlist (GTECH)

  17. Architectural Optimization • In Synopsys, optimizations include: • Sharing common mathematical subexpressions • Sharing resources • Selecting DesignWare* implementations • Replacing the generic representation from Translation with pre-built, optimized circuits • Reordering operators • Identifying arithmetic expressions for datapath synthesis *DesignWare is Synopsys’s library of pre-designed circuit implementations

  18. Architectural Optimization • Examples: • Replace an adder used as a counter with an incrementer: count = count + 1; • Replace an adder and a separate subtractor with one adder/subtractor if they are not used simultaneously: if (~sub) z = a + b; else z = a - b; • Performs selection of pre-designed components (Synopsys DesignWare) • adders, multipliers, shifters, comparators, muxes, etc. • Need good code for the synthesizer to do this • The designer knows more about the project than the tool does! It can only do so much on its own.
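
The adder/subtractor merge relies on the two's-complement identity a - b = a + ~b + 1. The Python model below is a minimal sketch of that idea, not the circuit a particular tool produces; `WIDTH` and `add_sub` are hypothetical names chosen for illustration.

```python
WIDTH = 8                     # hypothetical bit width, for illustration only
MASK = (1 << WIDTH) - 1


def add_sub(a, b, sub):
    """Model one WIDTH-bit adder shared between a + b (sub=0) and a - b (sub=1)."""
    b_eff = (b ^ (MASK if sub else 0)) & MASK   # XOR stage: invert b only when subtracting
    return (a + b_eff + sub) & MASK             # single adder, carry-in tied to sub
```

Only one adder appears in the model; the subtraction path adds a row of XOR gates and reuses the carry-in, which is why the merged unit is smaller than an adder plus a separate subtractor.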

  19. Logic/Gate-Level Optimization • Works on the generic netlist created by logic synthesis • Produces a technology-specific netlist. • In Synopsys, it consists of four stages: • Mapping • Delay optimization • Design rule fixing • Area optimization • This phase often runs in multiple iterations if constraints are not met on the first try

  20. Logic/Gate-Level Optimization • Mapping • Generates a gate level implementation using tech library • Tries to meet timing and area goals • Delay optimization • Tries to fix delay violations from mapping phase. • Does not fix design rule violations or meet area constraints. • Design rule fixing • Tries to correct design rule violations • Inserting buffers or resizing existing cells • If necessary, violates optimization constraints • Area optimization • Tries to meet area constraints, which have lowest priority

  21. Combinational Optimization

  22. Gate-Level Optimization

  23. Boolean Logic-Level Optimizations • [Flow diagram] Verilog Description -> TRANSLATION ENGINE -> Two-level Logic Functions -> OPTIMIZATION ENGINE -> Optimized Multi-level Logic Functions -> MAPPING ENGINE (reads Technology Libraries) -> Technology Implementation

  24. Logic Optimizations • Area • Number of gates: fewer == smaller • Size of gates (# inputs): fewer == smaller • Delay • Number of logic levels: fewer == faster • Size of gates (# inputs): fewer == faster • Note that the examples that follow ignore NOT gates when counting gates and levels • This is because many libraries offer gate cells with one or more inputs already inverted.

  25. Logic Optimizations • Decomposition • Extraction • Factoring • Substitution • Elimination • You don’t have to remember the names of these • But should understand logic optimization • Different techniques targeting area vs. delay

  26. Decomposition • Find common expressions in a single function • Reduce redundancy • Reduce area (number/size of gates) • May increase delay • More levels of logic • Define a G(x) cost function to compare expressions • G(inverter) = 0 • G(basic gate) = #inputs to the gate • Basic gates: AND, OR, NAND, NOR • Based on the concept that the size of a gate is proportional to the number of inputs

  27. Decomposition Example • F = abc + abd + a’c’d’ + b’c’d’ • F = ab(c + d) + c’d’(a’ + b’) • F = ab(c + d) + (c + d)’(ab)’ • X = ab (1 gate, 1 level) • Y = c + d (1 gate, 1 level) • F = XY + X’Y’ (3 gates, 2 levels; 5 gates, 3 levels in total) • G(Original) = 16 (four 3-input gates, one 4-input gate) • G(Decomposed) = 10 (five 2-input gates)
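
The decomposition can be checked by brute force. The Python sketch below compares the original and decomposed forms over all 16 input combinations and recomputes the cost G from the gate input counts on the slide. The function names are hypothetical.

```python
from itertools import product


def f_original(a, b, c, d):
    # F = abc + abd + a'c'd' + b'c'd'
    return ((a and b and c) or (a and b and d)
            or (not a and not c and not d) or (not b and not c and not d))


def f_decomposed(a, b, c, d):
    x = a and b                              # X = ab
    y = c or d                               # Y = c + d
    return (x and y) or (not x and not y)    # F = XY + X'Y'


def cost(gate_input_counts):
    """G = total number of gate inputs; inverters count as zero."""
    return sum(gate_input_counts)


def decomposition_is_equivalent():
    return all(bool(f_original(*v)) == bool(f_decomposed(*v))
               for v in product([False, True], repeat=4))
```

With the slide's inventories, `cost([3, 3, 3, 3, 4])` gives the original G and `cost([2] * 5)` the decomposed G.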

  28. Extraction • Find common sub-expressions between functions • Like decomposition, but across more than one function • Reduce redundancy • Reduce area (number/size of gates) • May increase delay if more logic levels introduced

  29. Extraction Example • Before extraction: • F = (a + b)cd + e (3 gates, 3 levels) • G = (a + b)e’ (2 gates, 2 levels) • H = cde (1 gate, 1 level) • Common subexpressions: X = a + b, Y = cd (1 gate, 1 level each) • After extraction (counts include X and Y): • F = XY + e (4 gates, 3 levels) • G = Xe’ (2 gates, 2 levels) • H = Ye (2 gates, 2 levels) • Gate inventory before: (3) 2-input ORs, (2) 3-input ANDs, (1) 2-input AND • G(original) = 6 + 6 + 2 = 14 • Gate inventory after: (2) 2-input ORs, (4) 2-input ANDs • G(extracted) = 4 + 8 = 12
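
The extraction bookkeeping can be reproduced with a tiny netlist model. This is only a sketch: the dictionary format, the internal signal names (`f_or`, `xy`, and so on) and the helper functions are made up for illustration. A trailing apostrophe on an input name means a free inversion, matching the convention that NOT gates are not counted.

```python
from itertools import product

# signal name -> (gate type, list of input signal names)
BEFORE = {
    "f_or":  ("OR",  ["a", "b"]),
    "f_and": ("AND", ["f_or", "c", "d"]),
    "F":     ("OR",  ["f_and", "e"]),
    "g_or":  ("OR",  ["a", "b"]),
    "G":     ("AND", ["g_or", "e'"]),
    "H":     ("AND", ["c", "d", "e"]),
}
AFTER = {
    "X":  ("OR",  ["a", "b"]),      # shared by F and G
    "Y":  ("AND", ["c", "d"]),      # shared by F and H
    "xy": ("AND", ["X", "Y"]),
    "F":  ("OR",  ["xy", "e"]),
    "G":  ("AND", ["X", "e'"]),
    "H":  ("AND", ["Y", "e"]),
}


def cost(netlist):
    """Cost G of a netlist: total number of gate inputs."""
    return sum(len(inputs) for _, inputs in netlist.values())


def evaluate(netlist, signal, env):
    """Evaluate one signal given values for the primary inputs in env."""
    if signal.endswith("'"):
        return not evaluate(netlist, signal[:-1], env)
    if signal in env:
        return env[signal]
    kind, inputs = netlist[signal]
    values = [evaluate(netlist, s, env) for s in inputs]
    return all(values) if kind == "AND" else any(values)


def same_outputs(n1, n2, outputs=("F", "G", "H"), primary="abcde"):
    """True if both netlists agree on every output for every input combination."""
    for bits in product([False, True], repeat=len(primary)):
        env = dict(zip(primary, bits))
        if any(evaluate(n1, o, env) != evaluate(n2, o, env) for o in outputs):
            return False
    return True
```

`cost(BEFORE)` and `cost(AFTER)` reproduce the two G values on the slide, and `same_outputs(BEFORE, AFTER)` confirms that extraction did not change F, G, or H.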

  30. Factoring • Traditional two-level logic is sum-of-products • Sometimes better expressed by product-of-sums • Fewer literals => less area • May increase delay if logic equation not completely factored (becomes multi-level)

  31. Factoring Example • Definitely good: • F = ac + ad + bc + bd (7 gates, 3 levels*) • F = (a + b)(c + d) (3 gates, 2 levels) • Maybe good: • F = ac + ad + e (3 gates, 2 levels, G = 7) • F = a(c + d) + e (3 gates, 3 levels, G = 6) • This one might improve area... • But will likely increase delay (tradeoff) *Assuming 2-input gates
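
The gate and level counts for the "definitely good" case follow from building both forms out of 2-input gates. The Python sketch below represents each form as a nested `(op, left, right)` tuple, a made-up format used only here, and counts gates and levels directly.

```python
from itertools import product


def gates(expr):
    """Number of 2-input gates in a nested (op, left, right) expression."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + gates(left) + gates(right)


def levels(expr):
    """Longest chain of gates from any input to the output."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(levels(left), levels(right))


def value(expr, env):
    """Evaluate the expression for the input values in env."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lv, rv = value(left, env), value(right, env)
    return (lv and rv) if op == "AND" else (lv or rv)


# F = ac + ad + bc + bd, ORs arranged as a balanced tree
sop = ("OR",
       ("OR", ("AND", "a", "c"), ("AND", "a", "d")),
       ("OR", ("AND", "b", "c"), ("AND", "b", "d")))
# F = (a + b)(c + d)
factored = ("AND", ("OR", "a", "b"), ("OR", "c", "d"))


def factoring_is_equivalent():
    return all(value(sop, dict(zip("abcd", v))) == value(factored, dict(zip("abcd", v)))
               for v in product([False, True], repeat=4))
```

The second ("maybe good") pair on the slide uses a 3-input OR in its unfactored form, so it is not covered by this 2-input model.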

  32. Substitution • Similar to Extraction • When one function is a sub-function of another • Reduce area • Fewer gates • Can increase delay if more logic levels

  33. Substitution Example • G = a + b (1 gate, 1 level) • F = a + b + c (1 gate, 1 level) • After substitution: F = G + c (2 gates, 2 levels) • Before: (1) 2-input OR, (1) 3-input OR • After: (2) 2-input ORs (better area but more levels) • With compile_ultra, the sub-expressions do not have to match explicitly, i.e. a + b would still be identified if F = b + c + a

  34. Elimination (Flattening) • Opposite of previous optimizations • Goal is to reduce delay • Make signals travel through as few logic levels as possible • But will likely increase area • Gate replication / redundant logic • Can force/disable this step using set_flatten true / set_flatten false

  35. Elimination Example • Before elimination: • G = c + d (1 gate, 1 level) • F = Ga + G’b (3 gates, 3 levels) • After elimination: • G = c + d (1 gate, 1 level) • F = ac + ad + bc’d’ (4 gates, 2 levels) • Gate inventory before: (2) 2-input ORs, (2) 2-input ANDs • Gate inventory after: (1) 2-input OR, (1) 3-input OR, (2) 2-input ANDs, (1) 3-input AND (worse area, but fewer levels)
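
Flattening substitutes G into F, so Ga becomes ac + ad and G’b becomes bc’d’. The short Python sketch below, with hypothetical function names, confirms the two forms agree and tallies the input-count cost from the slide's inventories.

```python
from itertools import product


def f_before(a, b, c, d):
    g = c or d                           # G = c + d
    return (g and a) or (not g and b)    # F = Ga + G'b


def f_after(a, b, c, d):
    # F = ac + ad + bc'd', with G substituted into F
    return (a and c) or (a and d) or (b and not c and not d)


def flattening_preserves_function():
    return all(bool(f_before(*v)) == bool(f_after(*v))
               for v in product([False, True], repeat=4))


# Gate input counts taken from the before/after inventories.
cost_before = 2 + 2 + 2 + 2       # two 2-input ORs, two 2-input ANDs
cost_after = 2 + 3 + 2 + 2 + 3    # 2-input OR, 3-input OR, two 2-input ANDs, 3-input AND
```

The cost rises from 8 to 12 while F drops from three levels to two, which is the area-for-delay trade that flattening makes.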

  36. compile_ultra Optimizations • Ultra-high mapping effort, 2-pass Compilation • Automatic hierarchical ungrouping • Ungroups small modules before mapping • Ungroups critical path based on delay • Automatic datapath extraction * • E.g. carry-save adders, sharing/unsharing • Boundary optimization • Propagates logic across hierarchical boundaries (constants, NC inputs/outputs, NOT) • Sequential inversion * • Sequential elements can have their outputs inverted

  37. Datapath Extraction Optimizations • Uses carry-save adders where beneficial • Carry-propagate adders only when result is needed
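
A carry-save adder reduces three operands to two (a sum word and a carry word) without rippling carries across bit positions, so a multi-operand sum needs only one slow carry-propagate addition at the end. The Python model below is a sketch of that arithmetic for non-negative integers; `carry_save` and `add_many` are hypothetical names.

```python
def carry_save(a, b, c):
    """3:2 compression: return (sum_word, carry_word) with a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                            # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted to the next weight
    return sum_word, carry_word


def add_many(operands):
    """Add non-negative operands with carry-save stages and one final carry-propagate add."""
    terms = list(operands)
    while len(terms) > 2:
        s, c = carry_save(terms[0], terms[1], terms[2])
        terms = [s, c] + terms[3:]
    return sum(terms)   # the only carry-propagate addition
```

For example, `add_many([3, 5, 7, 9])` goes through two carry-save stages before the final addition.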

  38. Datapath Extraction Optimizations • Comparator sharing • A>B, A=B, A<B use a single subtractor with multiple outputs • Optimization of parallel constant multipliers • SOP to POS transformation • Operand reordering • Explores trade-offs of common sub-expression sharing and mutually exclusive resource sharing
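
Comparator sharing works because a single subtraction already contains all three answers for unsigned operands: the borrow bit says whether A < B, and a zero difference says A = B. The Python sketch below models this with a hypothetical `WIDTH` and function name.

```python
WIDTH = 8   # hypothetical operand width, for illustration only


def compare(a, b):
    """Return (lt, eq, gt) for unsigned a and b from one (WIDTH + 1)-bit subtraction."""
    diff = (a - b) & ((1 << (WIDTH + 1)) - 1)   # one subtractor, extra bit holds the borrow
    lt = bool(diff >> WIDTH)                    # borrow set means a < b
    eq = diff == 0                              # all-zero difference means a == b
    gt = not lt and not eq
    return lt, eq, gt
```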

  39. Sharing and Unsharing • Expression sharing may be overridden later due to timing • Z1 <= A + B + C • Z2 <= A + B + D • Arrival time is A < B < D < C
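
Whether sharing A + B hurts timing depends on when each operand arrives. The Python sketch below uses a deliberately simple model: every two-input adder has the same made-up delay, and the arrival times are hypothetical numbers chosen only to respect the stated ordering. It does not reproduce the slide's figure.

```python
ADDER_DELAY = 1.0   # hypothetical delay of one two-input adder


def chain_ready(order, arrival):
    """Time at which a left-to-right chain of two-input adds produces its result."""
    t = arrival[order[0]]
    for name in order[1:]:
        t = max(t, arrival[name]) + ADDER_DELAY
    return t


# Hypothetical arrival times satisfying A < B < D < C.
arrival = {"A": 0.0, "B": 1.0, "D": 2.0, "C": 5.0}
z1_shared = chain_ready(["A", "B", "C"], arrival)   # (A + B) + C, A + B shared with Z2
z2_shared = chain_ready(["A", "B", "D"], arrival)   # (A + B) + D
z1_other = chain_ready(["A", "C", "B"], arrival)    # (A + C) + B, nothing shared

# Contrasting hypothetical case: B is now the last operand to arrive.
late_b = {"A": 0.0, "C": 1.0, "D": 2.0, "B": 5.0}
z1_shared_late_b = chain_ready(["A", "B", "C"], late_b)     # sharing puts B early in the chain
z1_unshared_late_b = chain_ready(["A", "C", "B"], late_b)   # unsharing adds B last
```

Under these assumed numbers, sharing costs nothing when A and B are the earliest operands, but with a late B the shared form finishes one adder delay later than the unshared form. That is the kind of case in which a timing-driven tool may undo the sharing.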

  40. Sharing and Unsharing • Mutually exclusive operations can share resources • if(SEL) Z = A + B • else Z = C + D • When would this kind of sharing be a bad idea?

  41. Sequential Inversion • set compile_seqmap_enable_output_inversion true • Useful if the available flip-flops do not have the same asynchronous input (preset or clear) as required in the design

  42. Register Retiming • At the HDL level, determining the optimal placement of registers is difficult and tedious at best, or just plain impossible at worst • The register retiming tool moves registers through the synthesized combinational logic network to improve timing and/or area • Equalize delay (i.e. reduce critical path delay by increasing delay in other paths) • Reduce the number of flip-flops if timing criteria are met • Usually propagate registers forward • Be aware that this may change the values of some internal signals compared to pre-synthesis.
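
The delay-equalizing effect can be seen in a toy model: with one register on a path, the clock period is set by the slower of the two resulting stages, so moving the register changes the period without changing the logic. The gate delays and helper names below are hypothetical.

```python
def min_clock_period(stage_delays):
    """The clock period is bounded below by the slowest register-to-register stage."""
    return max(stage_delays)


def split(delays, register_after):
    """Stage delays when one register is placed after the given number of gates."""
    return [sum(delays[:register_after]), sum(delays[register_after:])]


# Hypothetical delays (ns) of four gates in series along one path.
gate_delays = [3.0, 3.0, 2.0, 2.0]

before = split(gate_delays, 1)   # register right after the first gate
after = split(gate_delays, 2)    # register moved forward through one more gate
```

The total logic delay is unchanged, but the signal captured by the register is now the output of the second gate rather than the first, which is why internal register values can differ from pre-synthesis simulation.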

  43. Register Retiming Example (1)

  44. Register Retiming Example (2)

  45. DC Topographical Mode • When optimizing for delay, the synthesis engine is not aware of the net delays, because place-and-route has not yet been performed • Delays can be back-annotated and synthesis repeated after place-and-route, until closure is reached • Layout-aware synthesis attempts to reach timing closure faster by predicting the physical design and using that information in synthesis and optimization, particularly with respect to delay • Estimates the placement and routing • Predicts and uses net capacitances in synthesis and optimization

  46. Further Reading • There are many more commands out there to give you greater control over the synthesis process if you want it. • See: • Synopsys Online Documentation (SOLD) • Design Compiler man pages
