ECE 681 VLSI Design Automation

Khurram Kazi

Thanks to automation: press THE button and out comes the chip!!! Reality or myth?

Fundamental Steps to a Good Design
• Partitioning the design is a good start
• Partition by:
• Functionality
• Don't mix two different clock domains in a single block
• Don't make the blocks too large
• Optimize for synthesis

Recommended Rules for Synthesis
• Do not split combinational paths across hierarchy boundaries
• Register all outputs
• Do not implement glue logic between blocks; partition them well
• Separate designs on functional boundaries
• Keep blocks to a reasonable size
• Separate core logic, pads, clock and JTAG

Compilation of the Design
• The compile command maps HDL code to gates from the targeted library. Design Compiler (Synopsys) provides a range of options for this command to control the mapping optimization of the design
• compile -map_effort <low|medium|high> -incremental_mapping -in_place -no_design_rule | -only_design_rule -scan

Compile options explained
-map_effort: medium is the default. If timing is not met, use -map_effort high (DC maximizes its effort to meet the required constraints; compile times are long).
-incremental_mapping is used after the initial compile (i.e. once the design has been mapped to gates of the targeted library). It is used to improve the timing of the logic and to fix DRC (Design Rule Check) violations.
DRC conditions are based on the vendor process technology and should not be violated. These attributes define the conditions under which the library cells operate safely (e.g. output loading). If DRC violations occur, Design Compiler replaces the driving cell with another cell that has a higher drive strength.

Compile options explained (continued)
-in_place provides the capability of resizing gates (can control the buffering of the logic).
-no_design_rule is not used frequently. It tells Design Compiler not to fix DRC violations.
-scan uses the test-ready compile features of Design Compiler. This instructs DC to map the design directly to scan flops as opposed to normal flip-flops. Normally RTL code is synthesized using normal flops. Close to the end of the design cycle, these flops are replaced by scan flops.

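Constraint Based Optimization
[Figure: optimization flow. HDL description -> high-level synthesis (architecture level) -> flatten and structure (logic level) -> map (gate level) -> optimized gate-level netlist. An existing gate-level netlist enters the flow at the logic level.]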

Architectural Level Optimization
High-level synthesis:
• Resource Sharing
• DesignWare Implementation Selection (Synopsys specific)
• Sharing Common Sub-Expressions
• Inferencing Adders with Carry-In
• Re-ordering Operators

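Resource Sharing
HDL description:
    if (select) then
        sum <= A + B;
    else
        sum <= C + D;
    end if;
One possible implementation: two adders compute A + B and C + D, and a mux controlled by select picks one of the two results as sum.
Another implementation (shared resource, area-efficient): one mux controlled by select chooses between A and C, a second mux chooses between B and D, and a single adder produces sum.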
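The two implementations of the if/else above can be checked against each other with a small behavioural model. This is only a Python sketch, not synthesis output: the function names and the 3-bit operand width are assumptions made for the illustration.

```python
from itertools import product


def two_adders_then_mux(select, a, b, c, d):
    """One possible implementation: compute both sums, then select one."""
    sum_ab = a + b  # adder 1
    sum_cd = c + d  # adder 2
    return sum_ab if select else sum_cd  # output mux


def muxes_then_one_adder(select, a, b, c, d):
    """Shared-resource implementation: select the operands, then add once."""
    left = a if select else c   # mux on the first operand
    right = b if select else d  # mux on the second operand
    return left + right         # the single shared adder


def implementations_agree(width=3):
    """Exhaustively compare both models for all width-bit operands."""
    values = range(2 ** width)
    return all(
        two_adders_then_mux(s, a, b, c, d) == muxes_then_one_adder(s, a, b, c, d)
        for s in (False, True)
        for a, b, c, d in product(values, repeat=4)
    )
```

Both models return the same sum for every input, so the choice between them is purely an area versus timing trade-off: the shared version saves an adder but places a mux in front of it.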

Sharable HDL Operators
• The following HDL (VHDL and Verilog) synthetic operators can result in a shared implementation:
  * + - >= < <= = /= ==
• Operators can be shared only within the same block (i.e. when they are in the same process)

DesignWare Implementation Selection
• DesignWare implementation selection depends on area and timing goals
• The smallest implementation that meets the timing goals is selected
• Example for the synthetic module +: ripple carry is the smallest implementation, carry look-ahead is the fastest
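The selection rule (smallest implementation that still meets timing) can be sketched as follows. The area and delay numbers are hypothetical placeholders, not library data, and falling back to the fastest candidate when nothing meets timing is an assumption of this sketch rather than documented tool behaviour.

```python
from typing import NamedTuple, Sequence


class AdderImpl(NamedTuple):
    name: str
    area: float   # arbitrary area units (hypothetical value)
    delay: float  # nanoseconds (hypothetical value)


# Hypothetical candidates for the synthetic "+" module.
CANDIDATES = (
    AdderImpl("ripple_carry", area=100.0, delay=4.0),
    AdderImpl("carry_look_ahead", area=180.0, delay=1.5),
)


def select_implementation(candidates: Sequence[AdderImpl], required_delay: float) -> AdderImpl:
    """Pick the smallest candidate whose delay meets the timing goal."""
    meeting_timing = [c for c in candidates if c.delay <= required_delay]
    if not meeting_timing:
        # Assumption: with no candidate meeting timing, take the fastest one.
        return min(candidates, key=lambda c: c.delay)
    return min(meeting_timing, key=lambda c: c.area)
```

With a relaxed timing goal the ripple-carry candidate wins on area; once the goal drops below its delay, only the carry look-ahead candidate qualifies.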

Sharing Common Sub-Expressions
Design Compiler tries to share common sub-expressions to reduce the number of resources necessary to implement the design -> area savings while timing goals are met
    SUM1 <= A + B + C;
    SUM2 <= A + B + D;
    SUM3 <= A + B + E;
[Figure: one adder computes A + B; three further adders add C, D and E to that shared result to produce SUM1, SUM2 and SUM3]

Sharing Common Sub-Expressions: Limitations
• Sharable terms must be in the same order within each expression
    sum1 <= A + B + C;
    sum2 <= B + A + D; -> not sharable
    sum3 <= A + B + E; -> sharable
• Sharable terms must occur in the same position (or use parentheses to maintain ordering)
    sum1 <= A + B + C;
    sum2 <= D + A + B; -> not sharable
    sum3 <= E + (A + B); -> sharable
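The ordering and position rules above follow from expressions being parsed left to right. A minimal sketch, assuming each expression becomes a tree of two-input additions and that only identical sub-trees are shared (the helper names are hypothetical, and operand order is deliberately treated as significant to mirror the rule on the slide):

```python
def left_to_right(*terms):
    """Build the adder tree for t0 + t1 + t2 ... as parsed left to right."""
    tree = terms[0]
    for term in terms[1:]:
        tree = (tree, term)  # ((t0 + t1) + t2) + ...
    return tree


def adder_nodes(tree, found=None):
    """Collect every distinct two-input addition that appears in the tree."""
    if found is None:
        found = set()
    if isinstance(tree, tuple):
        left, right = tree
        adder_nodes(left, found)
        adder_nodes(right, found)
        found.add(tree)
    return found


def adders_needed(*trees):
    """Number of adders when identical sub-trees are built only once."""
    nodes = set()
    for tree in trees:
        nodes |= adder_nodes(tree)
    return len(nodes)
```

For example, `adders_needed(left_to_right("A", "B", "C"), left_to_right("A", "B", "D"), left_to_right("A", "B", "E"))` gives 4 instead of 6, because A + B is built once. Pairing `A + B + C` with `D + A + B` gives 4 (nothing shared), while pairing it with `left_to_right("E", left_to_right("A", "B"))`, the parenthesized form, gives 3.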

How to Infer a Specific Implementation (Adder with Carry-In)
• The following expression infers an adder with carry-in:
    sum <= A + B + Cin;
  where A and B are vectors, and Cin is a single bit
[Figure: a single adder with inputs A, B and Cin producing sum]
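What the inferred adder computes can be modelled in a few lines. The sketch below assumes unsigned operands and a result truncated to the operand width with a separate carry-out; both are assumptions of this illustration, as is the bit-level ripple model used to show where Cin enters.

```python
def add_with_carry_in(a, b, cin, width):
    """Behavioural model of sum <= A + B + Cin for width-bit vectors."""
    if cin not in (0, 1):
        raise ValueError("Cin must be a single bit")
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + cin
    return total & mask, total >> width  # (sum, carry_out)


def ripple_add(a, b, cin, width):
    """Bit-level model: Cin feeds the carry input of the least significant bit."""
    carry = cin
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        result |= (a_bit ^ b_bit ^ carry) << i
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
    return result, carry
```

The point of the coding style is that Cin uses the carry input the adder already has, so no second adder is needed for the extra bit.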

Operator Reordering
• Design Compiler can reorder arithmetic operators to produce the fastest design
• For example: Z <= A + B + C + D; (Z is time constrained)
• Initially the ordering is from left to right
[Figure: chain of three adders: A + B, then + C, then + D, producing Z]

Reordering of the Operator for a Fast Design
If the arrival time of all the signals A, B, C and D is the same, Design Compiler will reorder the operators using a balanced-tree architecture
[Figure: balanced tree: A + B and C + D are computed in parallel and a third adder combines them into Z]

Reordering of the Operator for a Fast Design
If the arrival time of signal A is the latest, Design Compiler will reorder the operators so that the late-arriving signal is accommodated
[Figure: chain of three adders: C + B, then + D, then + A at the last adder, producing Z]
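The effect of the three orderings on the arrival time of Z can be estimated with a simple delay model. This sketch assumes every adder has the same fixed delay and ignores wiring; the unit delay and the arrival times used below are made up for the illustration.

```python
def output_arrival(tree, arrivals, adder_delay=1.0):
    """Arrival time at the output of an adder tree.

    A tree is either a signal name or a pair (left, right) of sub-trees.
    """
    if isinstance(tree, str):
        return arrivals[tree]
    left, right = tree
    return max(
        output_arrival(left, arrivals, adder_delay),
        output_arrival(right, arrivals, adder_delay),
    ) + adder_delay


LEFT_TO_RIGHT = ((("A", "B"), "C"), "D")  # initial ordering
BALANCED = (("A", "B"), ("C", "D"))       # balanced tree
LATE_A = ((("C", "B"), "D"), "A")         # A enters at the last adder
```

With all inputs arriving at time 0, the left-to-right chain gives Z at 3.0 and the balanced tree at 2.0. If A arrives at 2.0 and the others at 0, the chain gives 5.0, the balanced tree 4.0, and the ordering with A last gives 3.0.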

Summarizing: High-level synthesis is constraint driven
• Resource sharing, sharing common sub-expressions and implementation selection all depend on design constraints and coding style
• Based on the timing constraints, Design Compiler decides what to share, how to implement it and what ordering to use
• If no constraints are given, area-based optimization is performed (may be a good start to get an idea of the synthesized circuit)
• It is imperative that realistic constraints are set prior to compilation
• High-level synthesis takes place only when optimizing an HDL description


Flattening and Structuring
• Flattening creates a two-level, sum-of-products (SOP) implementation of the design
• It can create a faster design (only 2 levels of logic) with a possible area penalty

Structuring
• Structuring is used for designs containing regular logic, e.g. a carry look-ahead adder
• It creates multi-level logic in implementing the design
• The result can be area and time optimized
• Structuring is what is usually used

Structuring Example
Before structuring:
    P = ax + ay + c
    Q = x + y + z
After structuring:
    P = aI + c
    Q = I + z
    I = x + y
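The structured form can be checked against the original equations by exhaustive truth-table comparison. A short sketch, reading juxtaposition as AND and + as OR (the function names are hypothetical):

```python
from itertools import product


def before_structuring(a, c, x, y, z):
    p = (a and x) or (a and y) or c
    q = x or y or z
    return bool(p), bool(q)


def after_structuring(a, c, x, y, z):
    i = x or y           # shared intermediate term I = x + y
    p = (a and i) or c
    q = i or z
    return bool(p), bool(q)


def structuring_is_equivalent():
    """Compare both forms over all 32 input combinations."""
    return all(
        before_structuring(*values) == after_structuring(*values)
        for values in product((False, True), repeat=5)
    )
```

Counting literals gives a rough feel for the saving: the original equations use 8 literals (5 in P, 3 in Q), the structured ones 7 (3 in P, 2 in Q, 2 in I), at the cost of an extra logic level through I.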

Comparison between Flattening and Structuring
Flattening
• Removes intermediate structures; reduces the design to SOP
• Is done independent of constraints
• Can be area intensive (e.g. usage of XORs)
• There is no guarantee that flattening will map into 2-level SOP
Structuring
• Creates intermediate structures to realize the design
• Is constraint based
• Can speed up the design along with area reduction
• Optimizes area if no timing constraints are set

How to control flattening and structuring
Using
set_flatten <true|false> -design <list of designs> -effort <low|medium|high> -phase <true|false>
-phase true advises DC to compare the logic produced by inverting the equation against the non-inverted form of the equation. This can help with timing closure (i.e. it may result in logic that meets the timing requirements).

set_flatten true -phase true

Example of SONET Framing block in a framer ASIC
[Figure: STS-1 frame drawn as 9 rows by 90 byte columns; A1 and A2 are the first two bytes of row 1]
A1 = hex F6, A2 = hex 28 is the framing pattern used in SONET networks. Order of transmission: F6 (11110110), msb transmitted first. All bytes other than A1 and A2 are scrambled.
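Finding the A1/A2 pattern in a serial stream amounts to sliding a 16-bit window over the incoming bits. The sketch below is only a software model of that idea, with hypothetical function names; a real framer would also confirm the pattern over several consecutive frames before declaring alignment, which is left out here.

```python
A1 = 0xF6
A2 = 0x28
FRAMING_PATTERN = (A1 << 8) | A2  # 0xF628, A1 transmitted first


def bytes_to_bits(data):
    """Serialise bytes into a list of bits, msb first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def find_framing(bits):
    """Return the index of the first bit of A1 in the first A1 A2 match, or -1."""
    window = 0
    for index, bit in enumerate(bits):
        window = ((window << 1) | bit) & 0xFFFF  # 16-bit shift register
        if index >= 15 and window == FRAMING_PATTERN:
            return index - 15
    return -1
```

For example, three stray bits followed by F6 28 55 are located at bit offset 3: `find_framing([1, 0, 1] + bytes_to_bits(bytes([A1, A2, 0x55])))` returns 3.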

Data Com bytes (D1-D3)
• The D1-D3 bytes are the first three bytes in the 3rd row of the STS-1 frame. These bytes are used as a 192 kbps data channel for Operations, Administration, Maintenance and Provisioning (OAM&P) functions. They are used between two pieces of "section" type equipment (such as regenerators)

Data Com bytes (D4-D12)
• These bytes (the first three bytes of rows 6, 7 and 8) represent a 576 kbps message-based channel used for OAM&P messages between SONET line-level network equipment.
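The 192 kbps and 576 kbps figures follow from the number of bytes per frame and the frame rate. The calculation below assumes the standard SONET frame rate of 8000 frames per second (one frame every 125 microseconds), which the slides do not state; the byte offsets use the 90-column row from the framing slide and count from 0 at A1.

```python
FRAMES_PER_SECOND = 8000  # assumption: standard 125 microsecond SONET frame
BITS_PER_BYTE = 8
COLUMNS_PER_ROW = 90      # STS-1 row length in bytes


def channel_rate_bps(bytes_per_frame):
    """Bit rate of a channel that owns bytes_per_frame bytes in every frame."""
    return bytes_per_frame * BITS_PER_BYTE * FRAMES_PER_SECOND


def first_three_byte_offsets(rows):
    """Frame byte offsets (0-based) of the first three bytes of each 1-based row."""
    return [(row - 1) * COLUMNS_PER_ROW + column for row in rows for column in range(3)]


section_dcc_rate = channel_rate_bps(3)  # D1-D3
line_dcc_rate = channel_rate_bps(9)     # D4-D12
```

This gives 192000 bit/s for D1-D3 and 576000 bit/s for D4-D12. `first_three_byte_offsets([3])` places D1-D3 at offsets 180 to 182, and `first_three_byte_offsets([6, 7, 8])` places D4-D12 at 450 to 452, 540 to 542 and 630 to 632.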

Assignment 1 continued
• Use the SONET scrambler to scramble the data (except the A1 and A2 bytes)
• Calculate B1 and insert it in the next SONET frame
• Use a PRBS pattern generator to fill the three Data Com byte positions D1-D3
• Take the first 9 characters of your name and convert them into ASCII (bit values). Insert those values in the D4-D12 byte positions.
• The D1-D3 and D4-D12 data should come out on separate serial ports along with 192 kbps and 576 kbps clocks. The first byte should be indicated by a start-of-frame signal. (Each Data Com channel therefore needs three output ports from your block: data, clock and start of frame, for a total of six output ports.)
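A software reference model can help when checking the HDL for the scrambling, B1 and ASCII steps. The sketch below is not a solution to the assignment. It assumes the standard SONET frame-synchronous scrambler (generator polynomial 1 + x^6 + x^7, register preset to all ones) and B1 as bit-interleaved even parity (BIP-8), i.e. the XOR of all bytes of a frame; these details come from the SONET standard rather than from the slides, and the exact reset point and B1 coverage should be taken from the assignment handout.

```python
from functools import reduce


def scrambler_sequence(n_bytes):
    """Bytes of the 1 + x^6 + x^7 sequence, starting from the all-ones state."""
    state = 0x7F  # 7-bit register preset to 1111111
    sequence = []
    for _byte_index in range(n_bytes):
        byte = 0
        for _bit_index in range(8):
            bit = (state >> 6) & 1                        # output of stage 7
            feedback = ((state >> 6) ^ (state >> 5)) & 1  # stage 7 XOR stage 6
            state = ((state << 1) | feedback) & 0x7F
            byte = (byte << 1) | bit                      # msb first
        sequence.append(byte)
    return bytes(sequence)


def scramble(data):
    """XOR data with the scrambler sequence; applying it twice restores data."""
    return bytes(d ^ s for d, s in zip(data, scrambler_sequence(len(data))))


def bip8(frame_bytes):
    """Even parity per bit position over all bytes, computed as a running XOR."""
    return reduce(lambda parity, byte: parity ^ byte, frame_bytes, 0)


def name_to_dcc_bytes(name):
    """First 9 characters of a name as ASCII bytes for the D4-D12 positions."""
    return name[:9].encode("ascii")
```

The first two bytes of the sequence are FE and 04 (hex), which makes a convenient first check for an HDL implementation, and the sequence repeats every 127 bits.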

I/Os of the SONET framer
