1. EECS 470: Computer Architecture Discussion #2
Friday, Sept. 17, 2010
2. Administrative Homework 1 due right now
Project 1 due Monday night (9/20)
Directions to submit in handout
3. Simple Example
4. Simple Example - Diagram
5. Questions Any questions about project 1?
6. Array Connections Make a simple module and duplicate it a bunch
Assume we have a module definition:
module one_bit_addr(a,b,cin,sum,cout);
All ports are 1 bit wide: the first three are inputs, the last two are outputs
How do we build an eight bit adder?
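The slides only give the port list of one_bit_addr. As a minimal sketch, assuming it is an ordinary full adder (the body below is our assumption, not the course's code), it could look like this:

```verilog
// Assumed implementation of the 1-bit full adder; only the module name
// and the port list come from the slide.
module one_bit_addr(a, b, cin, sum, cout);
  input  a, b, cin;
  output sum, cout;

  assign sum  = a ^ b ^ cin;                      // 1 when an odd number of inputs are 1
  assign cout = (a & b) | (a & cin) | (b & cin);  // 1 when at least two inputs are 1
endmodule
```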
7. The Error Prone Way
module eight_bit_addr(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  wire [6:0] carries;
  one_bit_addr a0(a[0],b[0],cin,sum[0], carries[0]);
  one_bit_addr a1(a[1],b[1],carries[0],sum[1], carries[1]);
  one_bit_addr a2(a[2],b[2],carries[1],sum[2], carries[2]);
  one_bit_addr a3(a[3],b[3],carries[2],sum[3], carries[3]);
  one_bit_addr a4(a[4],b[4],carries[3],sum[4], carries[4]);
  one_bit_addr a5(a[5],b[5],carries[4],sum[5], carries[5]);
  one_bit_addr a6(a[6],b[6],carries[5],sum[6], carries[6]);
  one_bit_addr a7(a[7],b[7],carries[6],sum[7], cout);
endmodule
8. The Error Prone Way Continued Lots of duplicated code
If you miss replacing one number, the mistake is hard to find
Especially if the design were much bigger and had even more connections
Your tests might not catch it
There is a one-line substitute
9. The Better Way
module eight_bit_addr(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  wire [6:0] carries;
  one_bit_addr addr [7:0] (.a(a),.b(b),.cin({carries,cin}),.sum(sum),.cout({cout,carries}));
endmodule
We instantiate 8 copies of one_bit_addr. Its ports are all 1 bit wide and the signals connected to them are 8 bits wide, so instance i gets bit i of each 8-bit signal.
10. Array Connections Summary If the port width matches the wire width, then each module in the array is connected to the same wire
Example: a clock signal attached to an array of modules
If the port is 1/N as wide as the wire (N=array size), then each module is connected to a chunk of the wire
Example: The 8-bit inputs from the previous slide connected to array of modules with 1-bit inputs.
Note the concatenation operator in the previous example
It makes the carry connections 8 bits wide and takes care of the boundary conditions (cin goes into bit 0, cout comes out of bit 7)
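A minimal sketch of both rules in one array, using a hypothetical one_bit_reg module that is not part of the project files. clock and reset match the 1-bit port width, so every instance shares them; d and q are 8 bits wide, so each instance gets one bit.

```verilog
// Hypothetical 1-bit register, used only to illustrate the two rules.
module one_bit_reg(clock, reset, d, q);
  input  clock, reset, d;
  output q;
  reg    q;

  always @(posedge clock) begin
    if (reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule

module eight_bit_reg(clock, reset, d, q);
  input        clock, reset;
  input  [7:0] d;
  output [7:0] q;

  // clock, reset: same width as the port, so all 8 instances share them.
  // d, q: 8 times the port width, so instance i connects to d[i] and q[i].
  one_bit_reg bits [7:0] (.clock(clock), .reset(reset), .d(d), .q(q));
endmodule
```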
11. Synthesis Translate verilog to gates
Optimize translation to meet certain constraints
Extremely complex process
If you follow all the directions we’ve given you everything will probably work
I’m not guaranteeing it though
All your designs will need to synthesize
That way you’ll know you’re not doing anything that would be hard to implement in gates
Clock period reported by synthesis tool isn’t truly accurate
No global placement and routing
We fake the capacitance of wires
12. Hints to Synthesis Tool
//synopsys sync_set_reset "<signal>"
Goes right before a synchronous always block
Tells Design Compiler that <signal> is a synchronous reset
Helps the synthesis tool choose a flip-flop with a synchronous reset
//synopsys parallel_case
Placed on the case statement, right after the case expression
Tells the tool that only one branch of the case can be true at a time
//synopsys full_case
Placed on the case statement, right after the case expression
Tells the tool that any unspecified cases cannot occur
You can also put a default: in the case for good measure
//synopsys one_hot "<signal>"
Placed after the signals are declared
Tells the tool that only one signal of the group will be 1 at a given time
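A rough sketch of where these directives sit in a source file. The module hint_demo and its signals are made up for illustration. The case directives are written on the case line, right after the case expression, which is where Design Compiler looks for them as far as we know. one_hot is left out.

```verilog
// Hypothetical module, only to show directive placement.
module hint_demo(clock, reset, go, sel, out);
  input        clock, reset, go;
  input  [1:0] sel;
  output       out;
  reg          out;
  reg          state;

  //synopsys sync_set_reset "reset"
  always @(posedge clock) begin
    if (reset)
      state <= 1'b0;
    else
      state <= go;
  end

  always @* begin
    case (sel) //synopsys full_case parallel_case
      2'b01:   out = state;
      2'b10:   out = ~state;
      default: out = 1'b0;   // default added for good measure
    endcase
  end
endmodule
```

A simulator treats the directives as ordinary comments, so only the synthesized result is affected by them.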
13. Synthesis Scripts
#/***********************************************************/
#/* The following five lines must be updated for every */
#/* new design */
#/***********************************************************/
read_file -f verilog [list "inout.v"]
set design_name tinout
set clock_name clock
set CLK_PERIOD 6
set reset_name reset
#/***********************************************************/
#/* The rest of this file may be left alone for most small */
#/* to moderate sized designs. You may need to alter it */
#/* when synthesizing your final project. */
#/***********************************************************/
set SYN_DIR ./
set search_path "/afs/engin.umich.edu/caen/generic/mentor_lib-D.1/public/eecs470/synopsys/"
set target_library "lec25dscc25_TT.db"
…
14. Synthesis Script A bunch of directives to tell Design Compiler what to do
Minimally you need to be familiar with the following 5 lines
read_file -f verilog [list "myfile.v"]
Read the verilog file myfile.v
set design_name mydesign
Synthesize the module mydesign and all modules it instantiates
set clock_name clock
The name of the clock signal
set CLK_PERIOD 6
Set the clock period to 6ns
set reset_name reset
The name of the reset signal
15. More Advanced Synthesis As designs get bigger you may want to break up the synthesis into multiple parts
In this case you may compile lower level modules separately and work your way up
Although not strictly necessary, you’ll need to do it for the multiplier
We’ll talk more about it for your final project
The lowest level will be just like this
Higher levels will include the lower level’s output and the file to synthesize
Look at .tcl in project 2 to see how the higher level includes the lower level
You should familiarize yourself with the tcl files. If you would like to look at the documentation for VCS or Design Compiler, execute: sold
16. Synthesis Output xxxx_synth.out — The output that scrolls across the screen at high speed
<designname>.chk — The synthesis tool places warnings in here
<designname>.rep — Timing report
<designname>.vg — Structural verilog output
<designname>.db/xg — Compiled output for including in other designs
17. synth.out Prints all the lines in the tcl file as it executes them
If you have a problem with synthesis this is a good first place to look
*** Presto compilation terminated with 2 errors. ***
Also contains information about what flip-flops/latches it found
18. synth.out – Good output
Inferred memory devices in process
in routine <design_name> line XXX in file
'<path to file>/<file>.v'.
=========================================================================
| Register Name | Type      | Width | Bus | MB | AR | AS | SR | SS | ST |
=========================================================================
| state_reg     | Flip-flop |   2   |  Y  | N  | N  | N  | Y  | N  | N  |
=========================================================================
All the Types are: Flip-flop
Every register we think we should have, should be listed along with the correct width
19. synth.out – Bad output
Inferred memory devices in process
in routine <design_name> line XXX in file
'<path to file>/<file>.v'.
======================================================================
| Register Name  | Type  | Width | Bus | MB | AR | AS | SR | SS | ST |
======================================================================
| next_state_reg | Latch |   2   |  Y  | N  | N  | N  | -  | -  | -  |
======================================================================
You should never see a Latch
It means you have some state in one of your combinational blocks
Gives you the line number to go find the error
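A minimal sketch of the usual cause, using a hypothetical next-state block. If a combinational always block does not assign a signal on every path, the signal has to hold its old value, and the tool infers a latch. A default assignment at the top of the block is one common fix.

```verilog
// Hypothetical next-state logic for a 2-bit FSM.
module latch_demo(state, go, next_state);
  input  [1:0] state;
  input        go;
  output [1:0] next_state;
  reg    [1:0] next_state;

  // Latch-prone version, kept as a comment: when go is 0, next_state is
  // never assigned, so it must remember its previous value.
  //   always @* begin
  //     if (go)
  //       next_state = state + 2'b01;
  //   end

  // Fixed version: next_state is assigned on every path through the block.
  always @* begin
    next_state = state;            // default assignment
    if (go)
      next_state = state + 2'b01;
  end
endmodule
```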
20. <design name>.chk Prints warnings that may or may not be a problem
Good to look at and verify that you don’t have a problem
Warning: In design 'icache', port 'proc2Icache_addr[0]' is not connected to any nets. (LINT-28)
That is fine if you didn’t connect those bits to anything
Or they are always 0 because you can’t have an unaligned access
Will give you places to look if you have problems with your synthesized code
21. <design name>.rep Lists critical paths through your design
All slacks should be “MET”
If any are “VIOLATED”, your clock period is too aggressive or your design is bad
22. <design name>.rep
Startpoint: state_reg[1]
(rising edge-triggered flip-flop clocked by clock)
Endpoint: gnt_b (output port clocked by clock)
...
Point                         Fanout   Trans    Incr    Path
---------------------------------------------------------------------
state_reg[1]/CLK (dffcs1)               0.00    0.00    0.00 r
state_reg[1]/QN (dffcs1)                0.15    0.16    0.16 f
n5 (net)                           1            0.00    0.16 f
state_reg[1]/Q (dffcs1)                 0.59    0.24    0.40 r
gnt_b (net)                        2            0.00    0.40 r
gnt_b (out)                             0.59    0.02    0.42 r
data arrival time                                       0.42
max_delay                                       6.00    6.00
clock uncertainty                              -0.10    5.90
output external delay                          -0.10    5.80
data required time                                      5.80
---------------------------------------------------------------------
data required time                                      5.80
data arrival time                                      -0.42
---------------------------------------------------------------------
slack (MET)                                             5.38
23. <design name>.rep Trans – Time for a logic transition to occur
Incr – Delay that this point adds to the path
Path – Total path delay so far
Slack needs to be positive: the closer it is to 0, the closer you are to the clock period limit
Just because you have Xns of slack doesn’t mean that you can’t do better
If there is a lot of slack, DC won’t try very hard
The closer you are to the limit, the harder it will try (and the longer it will take)
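As a worked check, the slack in the report on the previous slide follows from its own numbers (assuming the times are in ns, as with the 6ns clock period):

```latex
\begin{align*}
\text{data required time} &= 6.00 - 0.10 - 0.10 = 5.80 \\
\text{slack} &= \text{data required time} - \text{data arrival time} = 5.80 - 0.42 = 5.38
\end{align*}
```

The slack is positive, so the report marks it as MET.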
24. <design name>.vg
module a1 ( clock, reset, req_a, gnt_a, req_b, gnt_b );
  input clock, reset, req_a, req_b;
  output gnt_a, gnt_b;
  wire N19, N20, N21, n2, n3, n5;
  wire [1:0] next_state;
  hib1s1 U9 ( .Q(n2), .DIN(reset) );
  dffcs2 \state_reg[0] ( .Q(gnt_a), .CLK(clock), .CLRB(next_state[0]), .DIN(n2) );
  dffcs1 \state_reg[1] ( .Q(gnt_b), .QN(n5), .CLK(clock), .CLRB(next_state[1]), .DIN(n3) );
  and2s1 U10 ( .Q(N19), .DIN1(req_a), .DIN2(n5) );
  nor3s1 U11 ( .Q(N21), .DIN1(N19), .DIN2(gnt_a), .DIN3(n3) );
  ib1s1 U12 ( .Q(n3), .DIN(req_b) );
  or4s1 U13 ( .Q(N20), .DIN1(gnt_a), .DIN2(gnt_b), .DIN3(req_a), .DIN4(req_b) );
endmodule
25. Multiplying by partial products Most hardware multipliers involve computing a number of partial products and then summing them
Very similar to how you learned to multiply in second grade
Do each bit at a time and then sum all the partial products to get your answer
26. Second Grade Way
    000111        7
  *   0101      * 5
  --------      ---
    000111
     0000
    0111
+  0000
  --------      ---
    100011       35
27. 2 bits at a time – partial products
  000111          xx011100   << 2
* xx01          *     01xx   >> 2
  000111            011100
+ 000000        +   000000
  ------            ------
  000111            011100
28. 2 bits at a time – add products
  000111
+ 011100
  100011
29. 2-stage multiplication – 2 bits at a time
  00001011              multiplicand:    00001011 (11)
* 00000011              multiplier:      00000111 (7)
                        partial product: 00000000
30. 2-stage multiplication – 2 bits at a time
  00001011              multiplicand:    00001011
* 00000011              multiplier:      00000111
  00100001 (33)       + partial product: 00000000 (0)
31. 2-stage multiplication – 2 bits at a time
  00001011              multiplicand:    00001011
* 00000011              multiplier:      00000111
  00100001              partial product: 00100001 (33)
32. 2-stage multiplication – 2 bits at a time
  00001011              multiplicand:    00001011 << 2
* 00000011              multiplier:      00000111 >> 2
  00100001              partial product: 00100001
33. 2-stage multiplication – 2 bits at a time
  00101100              multiplicand:    00101100 (44)
* 00000001              multiplier:      00000001 (1)
                        partial product: 00100001
34. 2-stage multiplication – 2 bits at a time
  00101100              multiplicand:    00101100
* 00000001              multiplier:      00000001
  00101100 (44)       + partial product: 00100001 (33)
35. 2-stage multiplication – 2 bits at a time
  00101100              multiplicand:    00101100
* 00000001              multiplier:      00000001
  00101100              partial product: 01001101 (77)
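The same walk-through can be written as a small simulation-only loop. This is a behavioral sketch for checking the arithmetic, not the pipelined multiplier from the project; the module name and the 8-bit widths are our own choices.

```verilog
// Simulation-only sketch: multiply 11 by 7, two multiplier bits per step.
module partial_product_demo;
  reg [7:0] mcand, mplier, product;
  integer   i;

  initial begin
    mcand   = 8'd11;
    mplier  = 8'd7;
    product = 8'd0;
    for (i = 0; i < 4; i = i + 1) begin
      // Use the low 2 bits of the multiplier, then accumulate.
      product = product + mplier[1:0] * mcand;
      mcand   = mcand << 2;    // next step is worth 4x as much
      mplier  = mplier >> 2;   // move on to the next 2 bits
    end
    $display("product = %0d", product);  // prints: product = 77
  end
endmodule
```

Only the first two iterations add anything here (33, then 44), which is why the slides stop after two steps.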
36. Project 2 – Part 1 Supplied with an 8-stage multiplier
Make a 4-stage multiplier
Make a 2-stage multiplier
Synthesize each and answer some questions
Make sure you set an aggressive clock period
37. Part 1 – pipe_mult.v
module mult(clock, reset, mplier, mcand, start, product, done);
  input clock, reset, start;
  input [63:0] mcand, mplier;
  output [63:0] product;
  output done;
  wire [63:0] mcand_out, mplier_out;
  wire [(7*64)-1:0] internal_products, internal_mcands, internal_mpliers;
  wire [6:0] internal_dones;
  mult_stage mstage [7:0]
    (.clock(clock),
     .reset(reset),
     .product_in({internal_products,64'h0}),
     .mplier_in({internal_mpliers,mplier}),
     .mcand_in({internal_mcands,mcand}),
     .start({internal_dones,start}),
     .product_out({product,internal_products}),
     .mplier_out({mplier_out,internal_mpliers}),
     .mcand_out({mcand_out,internal_mcands}),
     .done({done,internal_dones})
    );
endmodule
38. Part 1 – mult_stage.v
module mult_stage(clock, reset, product_in, mplier_in, mcand_in, start,
                  product_out, mplier_out, mcand_out, done);
  ....
  reg [63:0] prod_in_reg, partial_prod_reg;
  wire [63:0] partial_product, next_mplier, next_mcand;
  assign product_out = prod_in_reg + partial_prod_reg;
  assign partial_product = mplier_in[7:0] * mcand_in;
  assign next_mplier = {8'b0,mplier_in[63:8]};
  assign next_mcand = {mcand_in[55:0],8'b0};
  always @(posedge clock)
  begin
    prod_in_reg <= #1 product_in;
    partial_prod_reg <= #1 partial_product;
    mplier_out <= #1 next_mplier;
    mcand_out <= #1 next_mcand;
  end
  always @(posedge clock)
  begin
    if(reset)
      done <= #1 1'b0;
    else
      done <= #1 start;
  end
endmodule
39. Part 2 – Integer Square Root Conceptually it’s a loop
Propose that the highest bit of the answer is set, and square the proposed answer
If the result <= value, keep the bit set
Otherwise clear the bit
Now try the next most significant bit
You won’t use a loop primitive to implement it though
40. Part 2 – ISR state machine Set the highest bit of the solution
Start a multiply
Wait until the multiply completes
Check the result against the value that you’re computing the ISR of
If it is less than or equal, keep the bit; if it is greater, clear the bit
Repeat with the next most significant bit until you’ve tested all 32 bits
When done with all 32 bits, raise the done signal for 1 cycle
If at any time you receive a reset signal, start over
41. Part 2 – ISR state machine See Verilog Overview for example FSM
Look at the first two_bit_pred module
ISR is more complicated, but the general design of two_bit_pred is a good model
42. Predictor FSM snippet
output prediction;
reg [1:0] state, next_state;
assign prediction = state[1];
always @* begin
  case(state)
    2'b00 : next_state = taken ? 2'b01 : 2'b00;
    2'b01, 2'b10 : next_state = taken ? 2'b11 : 2'b00;
    2'b11 : next_state = taken ? 2'b11 : 2'b10;
  endcase
end
always @(posedge clock) begin
  if(reset)
    state <= #1 2'b01;
  else if(transition)
    state <= #1 next_state;
end
43. Part 2 – Warnings When you’re dealing with 64-bit numbers in Verilog, you need to specify them as 64'hXXXX or 64'dXXXXX
If you leave off the 64' you won’t get the number you wanted
Default width is 32 bits
Pay attention to how the reset operates
If your device receives a reset during its calculation, it should start over with the new value
The reset causes the input value to be flopped (stored by the ISR module)
The value can change after the reset goes low
Your testbenches should also be testing for these conditions
Must not take more than 600 cycles to complete one ISR
Average is between 300-400 cycles
44. Part 2 – Simple Example
Input     10101101 (173)
Proposed  0000 (0)
Proposed² 00000000 (0)
45. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1000 (8)
Proposed² 00000000 (0)
46. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1000 (8)
Proposed² 01000000 (64)
47. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1100 (12)
Proposed² 00000000 (0)
48. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1100 (12)
Proposed² 10010000 (144)
49. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1110 (14)
Proposed² 00000000 (0)
50. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1110 (14)
Proposed² 11000100 (196)
51. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1100 (12)
Proposed² 11000100 (196)
52. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1101 (13)
Proposed² 00000000 (0)
53. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1101 (13)
Proposed² 10101001 (169)
54. Part 2 – Simple Example
Input     10101101 (173)
Proposed  1101 (13)
Proposed² 10101001 (169)
√173 ≈ 13.153
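The trace above can be reproduced with a small simulation-only reference function. This is a sketch of the conceptual loop for an 8-bit input, the kind of thing that is handy for checking a testbench. It is not the state machine the project asks for, and isr8 and the module name are hypothetical.

```verilog
// Simulation-only sketch of the conceptual ISR loop for an 8-bit value.
module isr_reference_demo;

  // Hypothetical reference function: integer square root of an 8-bit input.
  function [3:0] isr8;
    input [7:0] value;
    reg   [3:0] guess;
    reg   [7:0] square;
    integer     i;
    begin
      guess = 4'b0000;
      for (i = 3; i >= 0; i = i - 1) begin
        guess[i] = 1'b1;            // propose this bit
        square   = guess * guess;   // evaluated at 8 bits, so 15*15 still fits
        if (square > value)
          guess[i] = 1'b0;          // too big: clear the bit again
      end
      isr8 = guess;
    end
  endfunction

  initial begin
    $display("isr8(173) = %0d", isr8(8'd173));  // prints: isr8(173) = 13
  end
endmodule
```

The proposals go 8, 12, 14 (rejected, since 196 > 173), then 13, matching the slides.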
55. Part 3 – Synthesize ISR Synthesize the ISR module you made in Part 2
Answer some more questions