
ECE 551 Digital System Design & Synthesis



Presentation Transcript


  1. ECE 551 Digital System Design & Synthesis • Lecture 11: Verilog Design for Synthesis

  2. Topics • Optimization from the Design Level • Interaction of Description and Synthesis • Critical Path Optimization • High-Level Architectures for Datapaths

  3. Overview • In the previous lecture, we looked at ways the synthesis tool can automatically optimize our logic • In this lecture, we will look at the ways the designer who is writing the HDL code can optimize and manage trade-offs.

  4. Overview • How you implement something in Verilog can have a profound effect on what is actually synthesized (and the effort required to do it!) • Functionally identical ≠ identical hardware • To be effective, you need to • Know what it is that you are trying to describe (i.e. not viewing Verilog as an abstract language) • Know how the desired hardware should be organized • Know how the synthesis tools will be likely to implement a given description • Describe the hardware in a way that causes the synthesis tools to do what you want

  5. Knowing what you want to describe • Case Study: Multiplier

  6. 4-Input Multiplier • What does the code below describe?

         module mult(output reg [31:0] out,
                     input [31:0] a, b, c, d);
           always @(*) begin
             out = ((a * b) * c) * d;
           end
         endmodule

  7. Multiplier Implementation • Area: 47381 • Delay: 8.37 • How can we improve the delay and/or area?

  8. Multiplier Redux • What are we describing? • How will it compare in speed and area?

         module multtree(output reg [31:0] out,
                         input [31:0] a, b, c, d);
           always @(*) begin
             out = (a * b) * (c * d);
           end
         endmodule

  9. Tree Multiplier • Area: 47590 vs. 47381 • Delay: 5.75 vs. 8.37

  10. Multiplier – once again... • How can we reduce the area?

         module multtree(output reg [31:0] out,
                         input [31:0] a, b, c, d);
           always @(*) begin
             out = (a * b) * (c * d);
           end
         endmodule

  11. Shared Multiplier [1]

         module multshare(output reg [31:0] out,
                          input [31:0] in, input clk, rst);
           reg [31:0] multval;
           reg [1:0] cycle;
           always @(posedge clk) begin
             if (rst) cycle <= 0;
             else cycle <= cycle + 1;
             out <= multval;
           end
           always @(*) begin
             if (cycle == 2'b0) multval = in;
             else multval = in * out;
           end
         endmodule

  12. Shared Multiplier [2] • Area: 15990 vs. 47590 • Critical Path Delay: 3.14 • Latency: 3.14 * 4 = 12.56 vs. 5.75

  13. Shared Multiplier (cont) • Given that only one multiplier is allowed in the implementation, could we have achieved a lower latency than the previous example? At what cost?

         module multtree(output reg [31:0] out,
                         input [31:0] a, b, c, d);
           always @(*) begin
             out = (a * b) * (c * d);
           end
         endmodule
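One possible direction, sketched under assumptions that are not from the lecture: if all four operands are available in parallel and held stable, the separate load cycle can be dropped, so the product is ready after three multiply cycles. The module name `multshare3` and its port list are hypothetical. The likely cost is a pair of operand muxes in front of the multiplier (which may lengthen the critical path) and four input ports instead of one.

```verilog
// Hypothetical sketch (not the lecture's code): one shared multiplier,
// three cycles, assuming a, b, c, d are all present and stable.
module multshare3(output reg [31:0] out,
                  input [31:0] a, b, c, d,
                  input clk, rst);
  reg [1:0]  cycle;
  reg [31:0] op1, op2;

  // Operand muxes feeding the single multiplier
  always @(*) begin
    case (cycle)
      2'd0:    begin op1 = a;   op2 = b; end  // a * b
      2'd1:    begin op1 = out; op2 = c; end  // (a * b) * c
      default: begin op1 = out; op2 = d; end  // ((a * b) * c) * d
    endcase
  end

  always @(posedge clk) begin
    if (rst)                cycle <= 2'd0;
    else if (cycle == 2'd2) cycle <= 2'd0;
    else                    cycle <= cycle + 2'd1;
    out <= op1 * op2;       // the only multiplier in the design
  end
endmodule
```

With this arrangement, `out` holds the full product after the clock edge that ends cycle 2; whether the extra mux delay is acceptable depends on the target clock period.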

  14. Knowing what you want to describe • Lesson: You need to think about what sort of hardware you want to design from the very beginning of the process. Synthesis tools will only do so much with the descriptions you give them.

  15. Knowing what you are describing • Case Study: Mixed Flip-Flops

  16. Mixing Flip-Flop Styles (1) • Say we don’t need to reset q2 • What will this synthesize to?

         module badFFstyle (output reg q2, input d, clk, rst_n);
           reg q1;
           always @(posedge clk)
             if (!rst_n) q1 <= 1'b0;
             else begin
               q1 <= d;
               q2 <= q1;
             end
         endmodule

  17. Flip-Flop Synthesis (1) • Area = 59.0 • Slack = 0.53 (clock = 1ns, input delay 0.2) • Q2 now has to implement a load enable that is connected to the reset

  18. Mixing Flip-Flop Styles (2)

         module goodFFstyle (output reg q2, input d, clk, rst_n);
           reg q1;
           always @(posedge clk)
             if (!rst_n) q1 <= 1'b0;
             else q1 <= d;
           always @(posedge clk)
             q2 <= q1;
         endmodule

  19. Flip-Flop Synthesis (2) • Area = 50.2 (85% of original area!) • Slack = 0.53 (unchanged) • Without the load enable function, flip flop Q2 is smaller. • Use reset and enable only when you need them!

  20. Mixing Flip-Flop Styles • Would an asynchronous reset have fixed it?

         module badFFstyle2 (output reg q2, input d, clk, rst_n);
           reg q1;
           always @(posedge clk, negedge rst_n)
             if (!rst_n) q1 <= 1'b0;
             else begin
               q1 <= d;
               q2 <= q1;
             end
         endmodule

  21. Flip-Flop Synthesis (3) • Using asynchronous reset instead • Bad: Area = 58.0, slack = 0.57 • Good: Area = 49.1, slack = 0.57

  22. Knowing what you are describing • Lesson: If you don’t know the rules of the language, it’s easy to describe something different than what you intended. Following coding style guidelines makes this easier.

  23. Knowing the interpretation • Case Study: Conditional Multiplier

  24. Conditional Multiplier [1] • What would you expect this to generate?

         module multcond1(output reg [31:0] out,
                          input [31:0] a, b, c, d, input sel);
           always @(*) begin
             if (sel) out = a * b;
             else out = c * d;
           end
         endmodule

  25. Conditional Multiplier [2] • Area: 15565 • Delay: 3.14 • Two 32-bit muxes and one multiplier!

  26. Selected Conditional Multiplier [1] • What do you expect here compared to the previous one?

         module multcond2(output reg [31:0] out,
                          input [31:0] a, b, c, d, input sel);
           wire [31:0] m1, m2;
           assign m1 = a * b;
           assign m2 = c * d;
           always @(*) begin
             if (sel) out = m1;
             else out = m2;
           end
         endmodule

  27. Selected Cond. Mult. [2] • Area: 30764 vs. 15565 • Delay: 3.02 vs. 3.14 • Why is the area larger and delay lower? • 2 multipliers and a 64-bit mux! • So why did that happen?

  28. Resource Sharing Rules • Can happen automatically if variable is assigned by multiple expressions (if/else) with the same operation and bit widths • NO combinational feedback can be caused • Inputs may be reordered to reduce mux area • The Verilog HDL Compiler operates according to the following rules for automatic sharing • No sharing in conditional operators • x = s ? (a+b) : (a+c); //will use two adders • If/else will permit sharing • Manual control is also available – see reading.
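The sharing rules above can be illustrated with a small adder example. This is a sketch only: the module names `add_cond` and `add_ifelse` are hypothetical, and what a given tool actually builds depends on its version, settings, and constraints.

```verilog
// Illustrative sketch; module and port names are hypothetical.

// Conditional operator: per the rules above, no automatic sharing,
// so two adders followed by an output mux are expected.
module add_cond(output [7:0] x, input [7:0] a, b, c, input s);
  assign x = s ? (a + b) : (a + c);
endmodule

// if/else form of the same function: same operation and bit widths in
// both branches, so the tool is permitted to share a single adder and
// place a mux on the differing operand (b or c).
module add_ifelse(output reg [7:0] x, input [7:0] a, b, c, input s);
  always @(*) begin
    if (s) x = a + b;
    else   x = a + c;
  end
endmodule
```

Both modules compute the same value of `x`; the difference is only in how much freedom the description gives the tool to share the adder.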

  29. Conditional Multiplier – One More Time • If you know ahead of time that you want two muxes and one multiplier, describe that directly! • Don’t rely on the synthesis tool to improve inefficient HDL; describe what you want first. • Caveat: You have to know what you want.

         module multcond2(output reg [31:0] out,
                          input [31:0] a, b, c, d, input sel);
           wire [31:0] op1, op2;
           assign op1 = sel ? a : c;
           assign op2 = sel ? b : d;
           always @(*) begin
             out = op1 * op2;
           end
         endmodule

  30. Knowing the interpretation • Lesson: Different ways of describing the same behavior in Verilog may lead to different results. Understanding how the synthesis tool interprets different Verilog constructs is a valuable skill on the way to becoming an expert designer.

  31. Knowing the Synthesis Tool • Case Study: Decoder Synthesis

  32. Decoder Synthesis • Parameterized decoders are commonly written in one of two ways in behavioral Verilog • Use the select input as an index to assert only the desired output after negating all outputs • Test the select input in a loop over all decoder outputs, and assert only the matching output • Will this choice affect • Circuit delay? • Circuit area? • Compile time? • Surprisingly, the answer is: Yes, quite a lot, even though we are trying to describe the exact same hardware!

  33. Decoder Using Indexing

  34. Decoder Using Loop
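A minimal sketch of the two decoder styles described in the Decoder Synthesis slide, assuming an `N`-to-2^N decoder with an active-high one-hot output. This is not the code from the slides; the module names `dec_index` and `dec_loop`, the parameter `N`, and the port names are all hypothetical.

```verilog
// Style 1: use the select input as an index.
module dec_index #(parameter N = 3)
                  (output reg [(1<<N)-1:0] y, input [N-1:0] sel);
  always @(*) begin
    y = {(1<<N){1'b0}};   // negate all outputs
    y[sel] = 1'b1;        // assert only the selected output
  end
endmodule

// Style 2: test the select input in a loop over all outputs.
module dec_loop #(parameter N = 3)
                 (output reg [(1<<N)-1:0] y, input [N-1:0] sel);
  integer i;
  always @(*) begin
    y = {(1<<N){1'b0}};
    for (i = 0; i < (1<<N); i = i + 1)
      if (sel == i)
        y[i] = 1'b1;      // assert only the matching output
  end
endmodule
```

The two modules are functionally identical for any known value of `sel`; the comparisons that follow concern how differently a synthesis tool may handle them.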

  35. Decoder Verilog: Timing Comparison

  36. Decoder Verilog: Area Comparison

  37. Decoder Verilog: Compile Time Comparison

  38. Knowing the Synthesis Tool • Lesson: Never forget that in the end, you are at the mercy of the synthesis tool. Even when something is part of the Verilog Standard, you can’t always be sure it will be supported (or supported well) by every tool. This knowledge comes with time.

  39. Putting it all Together • If we • Know what hardware we want • Know how to describe what we want • Can interpret the results we get from the synthesis tool • then we can begin making low-level optimizations

  40. Late-Arriving Signals • After synthesis, we can identify the critical path(s) that are controlling the overall circuit speed, and which signals are responsible for those path(s). • Assume that one signal to a block of logic is known to arrive after the others. To deal with this: • Circuit reorganization • Rewrite the code to restructure the circuit in a way that minimizes the delay with respect to the late arriving signal • Logic duplication • This is the classic speed-area trade-off. By duplicating logic, we can move signal dependencies ahead in the logic chain.

  41. Original Code

  42. Original Synthesis • What can we do if A is the late-arriving signal?

  43. Reorganized: Operator in if • Changed the operation from (A + B) < 24 to A < (24 - B)
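A sketch of this reorganization, assuming 8-bit unsigned `A` and `B` and a one-bit result `z` (the widths, the output, and the module names `cmp_orig` and `cmp_reorg` are assumptions, not taken from the slides). One detail worth noting: `24 - B` goes negative when `B` exceeds 24, so the rewritten comparison is only equivalent if the subtraction and the comparison are done in signed arithmetic with enough bits.

```verilog
// Hypothetical sketch; widths and names are assumptions.

// Original form: late-arriving A passes through the adder and then
// the comparator.
module cmp_orig(output reg z, input [7:0] A, B);
  wire [8:0] sum = A + B;            // 9 bits, so the sum cannot wrap
  always @(*) begin
    if (sum < 9'd24) z = 1'b1;
    else             z = 1'b0;
  end
endmodule

// Reorganized form: 24 - B is computed from the early signal B, so
// late-arriving A only passes through the comparator.
module cmp_reorg(output reg z, input [7:0] A, B);
  // Signed 10-bit values so that a negative 24 - B compares correctly.
  wire signed [9:0] limit = 10'sd24 - $signed({2'b00, B});
  wire signed [9:0] a_ext = $signed({2'b00, A});
  always @(*) begin
    if (a_ext < limit) z = 1'b1;
    else               z = 1'b0;
  end
endmodule
```

The subtractor replaces the adder, so area should stay roughly comparable while `A` sees one fewer arithmetic stage; the actual gain depends on the tool and library.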

  44. Reorganized: New Hardware • What’s going on here?

  45. Duplication Example: Original Design

  46. Original Hardware • [Figure: datapath with signals ADDRESS, PTR, OFFSET, ADDR] • What if control is the late-arriving signal?

  47. Data Duplication : New HDL Code

  48. Duplication: New Hardware • [Figure: duplicated datapath with signals OFFSET1, ADDR1, COUNT1 and OFFSET2, ADDR2, COUNT2]
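A sketch of what logic duplication can look like, using the signal names that appear in the hardware figures (ADDRESS, PTR, OFFSET, ADDR, COUNT and their numbered copies). Everything else is an assumption made for illustration: the inputs `PTR1`, `PTR2`, and `B`, the constant `BASE`, the bit widths, and the module names `dup_before` and `dup_after`. It is not the code from the slides.

```verilog
// Hypothetical reconstruction; PTR1, PTR2, B, BASE, and all widths
// are assumed for illustration.

// Before: late-arriving CONTROL sits at the head of the logic chain.
module dup_before(output [15:0] COUNT,
                  input  [15:0] ADDRESS, B,
                  input  [7:0]  PTR1, PTR2,
                  input         CONTROL);
  localparam [7:0] BASE = 8'h80;
  wire [7:0]  PTR    = CONTROL ? PTR1 : PTR2;
  wire [7:0]  OFFSET = BASE - PTR;
  wire [15:0] ADDR   = ADDRESS - {8'h00, OFFSET};
  assign COUNT = ADDR + B;
endmodule

// After: both candidate results are computed in parallel, and CONTROL
// only drives the final mux.
module dup_after(output [15:0] COUNT,
                 input  [15:0] ADDRESS, B,
                 input  [7:0]  PTR1, PTR2,
                 input         CONTROL);
  localparam [7:0] BASE = 8'h80;
  wire [7:0]  OFFSET1 = BASE - PTR1;
  wire [7:0]  OFFSET2 = BASE - PTR2;
  wire [15:0] ADDR1   = ADDRESS - {8'h00, OFFSET1};
  wire [15:0] ADDR2   = ADDRESS - {8'h00, OFFSET2};
  wire [15:0] COUNT1  = ADDR1 + B;
  wire [15:0] COUNT2  = ADDR2 + B;
  assign COUNT = CONTROL ? COUNT1 : COUNT2;
endmodule
```

The duplicated version roughly doubles the arithmetic logic in exchange for moving the `CONTROL` dependency to the last stage, which is the speed-area trade-off described for late-arriving signals.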

  49. Exercise • Assume we are implementing the code below, and cin is the late-arriving signal. How can we optimize the resulting hardware for speed? At what cost?

         reg [30:0] a, b;
         reg [31:0] y;
         reg cin;
         always @(*)
           y = a + b + cin;

  50. Exercise • Rewrite the code below to • 1. Minimize area • 2. Get the best performance if sel is late-arriving

         reg [3:0] x [3:0];
         reg [1:0] sel;
         reg [3:0] y, sum;
         always @(*)
           y = sum + x[sel];
