
EECS 470: Computer Architecture


Presentation Transcript


    1. EECS 470: Computer Architecture Discussion #2 Friday, Sept. 17, 2010

    2. Administrative: Homework 1 is due right now. Project 1 is due Monday night (9/20). Directions to submit are in the handout.

    3. Simple Example

    4. Simple Example - Diagram

    5. Questions Any questions about project 1?

    6. Array Connections: Make a simple module and duplicate it many times. Assume we have a module definition: one_bit_addr(a,b,cin,sum,cout); All ports are 1 bit; the first three are inputs and the last two are outputs. How do we build an eight-bit adder?

    7. The Error Prone Way
        module eight_bit_addr(a,b,cin,sum,cout);
          input [7:0] a,b;
          input cin;
          output [7:0] sum;
          output cout;
          wire [6:0] carries;
          one_bit_addr a0(a[0],b[0],cin,sum[0], carries[0]);
          one_bit_addr a1(a[1],b[1],carries[0],sum[1], carries[1]);
          one_bit_addr a2(a[2],b[2],carries[1],sum[2], carries[2]);
          one_bit_addr a3(a[3],b[3],carries[2],sum[3], carries[3]);
          one_bit_addr a4(a[4],b[4],carries[3],sum[4], carries[4]);
          one_bit_addr a5(a[5],b[5],carries[4],sum[5], carries[5]);
          one_bit_addr a6(a[6],b[6],carries[5],sum[6], carries[6]);
          one_bit_addr a7(a[7],b[7],carries[6],sum[7], cout);
        endmodule

    8. The Error Prone Way Continued: Lots of duplicated code. If you miss replacing one number, it’s hard to find, especially if the design is much bigger and has even more connections. Your tests might not catch the case. There is a one-line substitute.

    9. The Better Way
        module eight_bit_addr(a,b,cin,sum,cout);
          input [7:0] a,b;
          input cin;
          output [7:0] sum;
          output cout;
          wire [6:0] carries;
          one_bit_addr addr [7:0] (.a(a),.b(b),.cin({carries,cin}),.sum(sum),.cout({cout,carries}));
        endmodule
        Since the one_bit_addr ports are all 1 bit, we are instantiating 8 of them, and the eight_bit_addr ports are 8 bits, each one-bit port gets one bit of the 8-bit value.

    10. Array Connections Summary: If the port width matches the wire width, then each module in the array is connected to the same wire. Example: a clock signal attached to an array of modules. If the port is 1/N as wide as the wire (N = array size), then each module is connected to its own chunk of the wire. Example: the 8-bit inputs from the previous slide connected to an array of modules with 1-bit inputs. Note the concatenation operator in the previous example: it makes the carries width correct and takes care of the boundary conditions.
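The slicing rule can be checked with a small behavioral model. The Python sketch below is not Verilog and is not part of the course material; it reuses the slide names `one_bit_addr` and `eight_bit_addr` for illustration only, and it assumes instance i of the array receives bit i of every 8-bit connection.

```python
def one_bit_addr(a, b, cin):
    """Full adder on single bits; returns (sum, cout)."""
    total = a + b + cin
    return total & 1, total >> 1


def eight_bit_addr(a, b, cin):
    """Model of the instance array: instance i takes bit i of each 8-bit port.

    The carry-in vector is {carries, cin} and the carry-out vector is
    {cout, carries}, so instance i reads the carry that instance i-1 wrote.
    """
    carry_in = cin                      # bit 0 of {carries, cin}
    result = 0
    for i in range(8):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        s, carry_out = one_bit_addr(bit_a, bit_b, carry_in)
        result |= s << i
        carry_in = carry_out            # bit i of {cout, carries} feeds instance i+1
    return result, carry_in             # the last carry out is cout


print(eight_bit_addr(200, 100, 1))      # (45, 1): 301 = 256 + 45
```

The same off-by-one shift between the two concatenations is what lets a single instantiation line replace the eight hand-written ones.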

    11. Synthesis: Translate Verilog to gates. Optimize the translation to meet certain constraints. Extremely complex process. If you follow all the directions we’ve given you, everything will probably work; I’m not guaranteeing it, though. All your designs will need to synthesize. That way you’ll know you’re not doing anything that would be hard to implement in gates. The clock period reported by the synthesis tool isn’t truly accurate: there is no global placement and routing, and we fake the capacitance of wires.

    12. Hints to Synthesis Tool
        //synopsys sync_set_reset "<signal>"
            Goes right before a synchronous always block. Tells Design Compiler that <signal> is a synchronous reset, which helps the synthesis tool choose a synchronous reset.
        //synopsys parallel_case
            Placed before a case statement. Only one branch of the case can be true at a time.
        //synopsys full_case
            Placed before a case statement. Any unspecified cases are invalid. You can also put a default: in the case for good measure.
        //synopsys one_hot "<signal>"
            Placed after the signal is declared. Only one signal of the group will be 1 at a given time.

    13. Synthesis Scripts
        #/***********************************************************/
        #/* The following five lines must be updated for every     */
        #/* new design                                             */
        #/***********************************************************/
        read_file -f verilog [list "inout.v"]
        set design_name tinout
        set clock_name clock
        set CLK_PERIOD 6
        set reset_name reset
        #/***********************************************************/
        #/* The rest of this file may be left alone for most small */
        #/* to moderate sized designs. You may need to alter it    */
        #/* when synthesizing your final project.                  */
        #/***********************************************************/
        set SYN_DIR ./
        set search_path "/afs/engin.umich.edu/caen/generic/mentor_lib-D.1/public/eecs470/synopsys/"
        set target_library "lec25dscc25_TT.db"
        …

    14. Synthesis Script
        A bunch of directives to tell Design Compiler what to do. At a minimum you need to be familiar with the following 5 lines:
        read_file -f verilog [list "myfile.v"]
            Read the Verilog file myfile.v
        set design_name mydesign
            Synthesize the module mydesign and all modules it instantiates
        set clock_name clock
            The name of the clock signal
        set CLK_PERIOD 6
            Set the clock period to 6ns
        set reset_name reset
            The name of the reset signal

    15. More Advanced Synthesis: As designs get bigger you may want to break up the synthesis into multiple parts. In this case you may compile lower-level modules separately and work your way up. Although not strictly necessary, you’ll need to do it for the multiplier. We’ll talk more about it for your final project. The lowest level will be just like this. Higher levels will include the lower level’s output and the file to synthesize. Look at the .tcl in Project 2 to see how the higher level includes the lower level. You should familiarize yourself with the tcl files. If you would like to look at the documentation for VCS or Design Compiler, execute: sold

    16. Synthesis Output
        xxxx_synth.out: The output that scrolls across the screen at high speed
        <designname>.chk: The synthesis tool places warnings in here
        <designname>.rep: Timing report
        <designname>.vg: Structural Verilog output
        <designname>.db/xg: Compiled output for including in other designs

    17. synth.out: Prints all the lines in the tcl file as it executes them. If you have a problem with synthesis, this is a good first place to look, for example:
        *** Presto compilation terminated with 2 errors. ***
        Also contains information about what flip-flops/latches it found.

    18. synth.out – Good output
        Inferred memory devices in process in routine <design_name> line XXX in file '<path to file>/<file>.v'.
        =========================================================================
        | Register Name | Type      | Width | Bus | MB | AR | AS | SR | SS | ST |
        =========================================================================
        | state_reg     | Flip-flop |   2   |  Y  | N  | N  | N  | Y  | N  | N  |
        =========================================================================
        All the Types are: Flip-flop. Every register we think we should have should be listed, along with the correct width.

    19. synth.out – Bad output
        Inferred memory devices in process in routine <design_name> line XXX in file '<path to file>/<file>.v'.
        ==========================================================================
        | Register Name  | Type      | Width | Bus | MB | AR | AS | SR | SS | ST |
        ==========================================================================
        | next_state_reg | Latch     |   2   |  Y  | N  | N  | N  | -  | -  | -  |
        ==========================================================================
        You should never see a Latch. It means you have some state in one of your combinational blocks. The report gives you the line number to go find the error.

    20. <design name>.chk: Prints warnings that may or may not be a problem. Good to look at and verify that you don’t have a problem.
        Warning: In design 'icache', port 'proc2Icache_addr[0]' is not connected to any nets. (LINT-28)
        That is fine if you didn’t connect those bits to anything, or if they are always 0 because you can’t have an unaligned access. The file will give you places to look if you have problems with your synthesized code.

    21. <design name>.rep: Lists critical paths through your design. All slacks should be “MET”. If any are “VIOLATED”, you have too aggressive a clock period or a bad design.

    22. <design name>.rep
        Startpoint: state_reg[1] (rising edge-triggered flip-flop clocked by clock)
        Endpoint: gnt_b (output port clocked by clock)
        ...
        Point                        Fanout     Trans      Incr       Path
        ---------------------------------------------------------------------
        state_reg[1]/CLK (dffcs1)                0.00      0.00       0.00 r
        state_reg[1]/QN (dffcs1)                 0.15      0.16       0.16 f
        n5 (net)                       1                   0.00       0.16 f
        state_reg[1]/Q (dffcs1)                  0.59      0.24       0.40 r
        gnt_b (net)                    2                   0.00       0.40 r
        gnt_b (out)                              0.59      0.02       0.42 r
        data arrival time                                             0.42
        max_delay                                          6.00       6.00
        clock uncertainty                                 -0.10       5.90
        output external delay                             -0.10       5.80
        data required time                                            5.80
        ---------------------------------------------------------------------
        data required time                                            5.80
        data arrival time                                            -0.42
        ---------------------------------------------------------------------
        slack (MET)                                                   5.38

    23. <design name>.rep
        Trans: Time for a logic transition to occur
        Incr: Time that this point adds to the critical path
        Path: Total path delay so far
        Slack needs to be positive: the closer it is to 0, the closer you are to the clock period limit. Just because you have X ns of slack doesn’t mean that you can’t do better. If there is a lot of slack, DC won’t try very hard. The closer to the limit you are, the harder it will try (and the longer it will take).
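The arithmetic behind the report can be reproduced by hand. The sketch below is a plain Python calculation, not a Design Compiler feature; the function name `path_slack` is made up for illustration, and the numbers are copied from the example report for the path ending at gnt_b.

```python
from decimal import Decimal


def path_slack(incrs, max_delay, clock_uncertainty, output_external_delay):
    """Recompute arrival time, required time, and slack for one path."""
    arrival = sum(incrs, Decimal("0"))          # the Path column is a running sum of Incr
    required = max_delay - clock_uncertainty - output_external_delay
    slack = required - arrival
    status = "MET" if slack >= 0 else "VIOLATED"
    return arrival, required, slack, status


# Incr column of the example report, top to bottom.
incrs = [Decimal(x) for x in ("0.00", "0.16", "0.00", "0.24", "0.00", "0.02")]
arrival, required, slack, status = path_slack(
    incrs, Decimal("6.00"), Decimal("0.10"), Decimal("0.10")
)
print(arrival, required, slack, status)         # 0.42 5.80 5.38 MET
```

`Decimal` is used instead of floats so the two-decimal values print exactly as they appear in the report.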

    24. <design name>.vg
        module a1 ( clock, reset, req_a, gnt_a, req_b, gnt_b );
          input clock, reset, req_a, req_b;
          output gnt_a, gnt_b;
          wire N19, N20, N21, n2, n3, n5;
          wire [1:0] next_state;
          hib1s1 U9 ( .Q(n2), .DIN(reset) );
          dffcs2 \state_reg[0] ( .Q(gnt_a), .CLK(clock), .CLRB(next_state[0]), .DIN( n2) );
          dffcs1 \state_reg[1] ( .Q(gnt_b), .QN(n5), .CLK(clock), .CLRB(next_state[.DIN(n3) );
          and2s1 U10 ( .Q(N19), .DIN1(req_a), .DIN2(n5) );
          nor3s1 U11 ( .Q(N21), .DIN1(N19), .DIN2(gnt_a), .DIN3(n3) );
          ib1s1 U12 ( .Q(n3), .DIN(req_b) );
          or4s1 U13 ( .Q(N20), .DIN1(gnt_a), .DIN2(gnt_b), .DIN3(req_a), .DIN4(req_b) );
        endmodule

    25. Multiplying by partial products: Most hardware multipliers compute a number of partial products and then sum them. This is very similar to how you learned to multiply in second grade: handle one bit at a time, then sum all the partial products to get your answer.

    26. Second Grade Way
            000111       7
        *     0101     * 5
        ----------
            000111
             0000
            0111
        +  0000
        ----------
            100011      35

    27. 2 bits at a time – partial products
            000111              xx011100   << 2
          *   xx01            *     01xx   >> 2
            000111                011100
          + 000000              + 000000
            ------                ------
            000111                011100

    28. 2 bits at a time – add products
            000111
          + 011100
            ------
            100011

    29. 2-stage multiplication – 2 bits at a time
          00001011          multiplicand:    00001011 (11)
        * 00000011          multiplier:      00000111 (7)
                            partial product: 00000000

    30. 2-stage multiplication – 2 bits at a time
          00001011          multiplicand:    00001011
        * 00000011          multiplier:      00000111
          00100001 (33)   + partial product: 00000000 (0)

    31. 2-stage multiplication – 2 bits at a time
          00001011          multiplicand:    00001011
        * 00000011          multiplier:      00000111
          00100001          partial product: 00100001 (33)

    32. 2-stage multiplication – 2 bits at a time
          00001011          multiplicand:    00001011 << 2
        * 00000011          multiplier:      00000111 >> 2
          00100001          partial product: 00100001

    33. 2-stage multiplication – 2 bits at a time
          00101100          multiplicand:    00101100 (44)
        * 00000001          multiplier:      00000001 (1)
                            partial product: 00100001

    34. 2-stage multiplication – 2 bits at a time
          00101100          multiplicand:    00101100
        * 00000001          multiplier:      00000001
          00101100 (44)   + partial product: 00100001 (33)

    35. 2-stage multiplication – 2 bits at a time
          00101100          multiplicand:    00101100
        * 00000001          multiplier:      00000001
          00101100          partial product: 01001101 (77)
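The walkthrough above can be replayed in software. This is a minimal Python sketch of the chunk-at-a-time idea, not the project's Verilog; the function name `multiply_by_chunks` and its parameters are invented for illustration, and it loops until the multiplier is exhausted, whereas hardware would use a fixed number of stages.

```python
def multiply_by_chunks(mcand, mplier, bits_per_step=2, width=8):
    """Multiply by consuming `bits_per_step` multiplier bits per step.

    Returns the final product and a trace of
    (multiplicand, multiplier chunk, partial product, running sum) per step.
    All values are truncated to `width` bits, as fixed-width hardware would.
    """
    mask = (1 << width) - 1
    chunk_mask = (1 << bits_per_step) - 1
    product = 0
    trace = []
    while mplier:
        chunk = mplier & chunk_mask                  # low bits of the multiplier
        partial = (mcand * chunk) & mask             # this step's partial product
        product = (product + partial) & mask         # accumulate
        trace.append((mcand, chunk, partial, product))
        mcand = (mcand << bits_per_step) & mask      # multiplicand << 2
        mplier >>= bits_per_step                     # multiplier >> 2
    return product, trace


product, trace = multiply_by_chunks(11, 7)
print(product)   # 77
print(trace)     # [(11, 3, 33, 33), (44, 1, 44, 77)]
```

The two trace entries correspond to the two stages on the slides: 11 × 3 = 33 first, then 44 × 1 = 44 added to give 77.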

    36. Project 2 – Part 1: You are supplied with an 8-stage multiplier. Make a 4-stage multiplier. Make a 2-stage multiplier. Synthesize each and answer some questions. Make sure you set an aggressive clock period.

    37. Part 1 – pipe_mult.v
        module mult(clock, reset, mplier, mcand, start, product, done);
          input clock, reset, start;
          input [63:0] mcand, mplier;
          output [63:0] product;
          output done;
          wire [63:0] mcand_out, mplier_out;
          wire [(7*64)-1:0] internal_products, internal_mcands, internal_mpliers;
          wire [6:0] internal_dones;
          mult_stage mstage [7:0] (
            .clock(clock),
            .reset(reset),
            .product_in({internal_products,64'h0}),
            .mplier_in({internal_mpliers,mplier}),
            .mcand_in({internal_mcands,mcand}),
            .start({internal_dones,start}),
            .product_out({product,internal_products}),
            .mplier_out({mplier_out,internal_mpliers}),
            .mcand_out({mcand_out,internal_mcands}),
            .done({done,internal_dones})
          );
        endmodule

    38. Part 1 – mult_stage.v
        module mult_stage(clock, reset, product_in, mplier_in, mcand_in, start, product_out, mplier_out, mcand_out, done);
          ....
          reg [63:0] prod_in_reg, partial_prod_reg;
          wire [63:0] partial_product, next_mplier, next_mcand;
          assign product_out = prod_in_reg + partial_prod_reg;
          assign partial_product = mplier_in[7:0] * mcand_in;
          assign next_mplier = {8'b0,mplier_in[63:8]};
          assign next_mcand = {mcand_in[55:0],8'b0};
          always @(posedge clock) begin
            prod_in_reg <= #1 product_in;
            partial_prod_reg <= #1 partial_product;
            mplier_out <= #1 next_mplier;
            mcand_out <= #1 next_mcand;
          end
          always @(posedge clock) begin
            if(reset) done <= #1 1'b0;
            else done <= #1 start;
          end
        endmodule
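To see how the eight chained stages behave over time, here is a rough cycle-level model in Python. It is a sketch under assumptions, not a translation of the supplied files: the class name `PipelinedMult` and its `clock` method are invented, each stage is collapsed into a single registered tuple, and reset is left out.

```python
MASK64 = (1 << 64) - 1
STAGES = 8


class PipelinedMult:
    """Eight stages, each consuming 8 multiplier bits per clock edge."""

    def __init__(self):
        # One register set per stage: (product, mplier, mcand, done).
        self.regs = [(0, 0, 0, False)] * STAGES

    def clock(self, mplier, mcand, start):
        """Advance one rising edge; return (product, done) visible after it."""
        first_in = (0, mplier & MASK64, mcand & MASK64, start)
        new_regs = []
        for i in range(STAGES):
            # Stage 0 reads the module inputs, stage i reads stage i-1's registers.
            product_in, mplier_in, mcand_in, start_in = (
                first_in if i == 0 else self.regs[i - 1]
            )
            partial = ((mplier_in & 0xFF) * mcand_in) & MASK64   # mplier_in[7:0] * mcand_in
            new_regs.append((
                (product_in + partial) & MASK64,                 # product_out
                mplier_in >> 8,                                  # {8'b0, mplier_in[63:8]}
                (mcand_in << 8) & MASK64,                        # {mcand_in[55:0], 8'b0}
                start_in,                                        # done follows start
            ))
        self.regs = new_regs
        product, _, _, done = self.regs[-1]
        return product, done


m = PipelinedMult()
outs = [m.clock(7, 11, True)] + [m.clock(0, 0, False) for _ in range(7)]
print(outs[-1])   # (77, True), eight edges after start
```

Because every stage holds its own operands, a new multiplication can be started on each clock edge while earlier ones are still in flight.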

    39. Part 2 – Integer Square Root: Conceptually it’s a loop. Propose that the highest bit of the answer is set and square the proposed answer. If the result is less than or equal to the value, keep the bit set; otherwise clear the bit. Then try the next most significant bit. You won’t use a loop primitive to implement it, though.

    40. Part 2 – ISR state machine: Set the highest bit of the solution. Start a multiply. Wait until the multiply completes. Check the result against the value that you’re computing the ISR of. If it is less than or equal, keep the bit; if greater, clear the bit. Continue with the next most significant bit until you’ve tested all 32 bits. When done with all 32 bits, raise the done signal for 1 cycle. If at any time you receive a reset signal, start over.
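The steps above can be prototyped as a software state machine before writing any Verilog. The Python sketch below is only an illustration: the state names, the `mult_latency` parameter, and the one-cycle cost per non-waiting state are assumptions, not the required design, and reset handling is omitted.

```python
def isr_fsm(value, result_bits=32, mult_latency=8):
    """Bit-by-bit integer square root as a state machine; returns (root, cycles)."""
    proposed = 0
    bit = result_bits - 1
    square = 0
    wait = 0
    cycles = 0
    state = "SET_BIT"
    while state != "DONE":
        cycles += 1
        if state == "SET_BIT":
            proposed |= 1 << bit           # propose that this bit is set
            wait = mult_latency            # start a multiply
            state = "WAIT_MULT"
        elif state == "WAIT_MULT":
            wait -= 1
            if wait == 0:                  # multiply completes
                square = proposed * proposed
                state = "COMPARE"
        elif state == "COMPARE":
            if square > value:             # too big: clear the bit (equal keeps it)
                proposed &= ~(1 << bit)
            if bit == 0:
                state = "DONE"             # all bits tested
            else:
                bit -= 1
                state = "SET_BIT"
    return proposed, cycles


print(isr_fsm(173, result_bits=4))   # (13, 40)
print(isr_fsm(2**64 - 1))            # (4294967295, 320)
```

Under these assumed costs every bit takes 10 cycles, so a 32-bit result takes 320 cycles; a real design's count depends on the multiplier used and on how the states are arranged.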

    41. Part 2 – ISR state machine: See the Verilog Overview for an example FSM. Look at the first two_bit_pred module. The ISR is more complicated, but the general design of two_bit_pred is good.

    42. Predictor FSM snippet
        output prediction;
        reg [1:0] state, next_state;
        assign prediction = state[1];
        always @* begin
          case(state)
            2'b00 : next_state = taken ? 2'b01 : 2'b00;
            2'b01, 2'b10 : next_state = taken ? 2'b11 : 2'b00;
            2'b11 : next_state = taken ? 2'b11 : 2'b10;
          endcase
        end
        always @(posedge clock) begin
          if(reset) state <= #1 2'b01;
          else if(transition) state <= #1 next_state;
        end
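A quick way to sanity-check the transition table is to mirror it in software. The following Python sketch copies the case statement above; the helper names `next_state` and `run` are invented, and it assumes `transition` is asserted on every step.

```python
def next_state(state, taken):
    """Combinational next-state logic, mirroring the case statement."""
    if state == 0b00:
        return 0b01 if taken else 0b00
    if state in (0b01, 0b10):
        return 0b11 if taken else 0b00
    return 0b11 if taken else 0b10         # state == 0b11


def run(outcomes, state=0b01):
    """Feed branch outcomes from the reset state; return (predictions, final state)."""
    predictions = []
    for taken in outcomes:
        predictions.append(state >> 1)     # prediction = state[1]
        state = next_state(state, taken)   # state register update on the clock edge
    return predictions, state


print(run([True, True, False, False, True]))   # ([0, 1, 1, 1, 0], 1)
```

Keeping the next-state function separate from the state update is the same split the Verilog uses between the combinational block and the clocked block.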

    43. Part 2 – Warnings: When you’re dealing with 64-bit numbers in Verilog you need to specify them as 64’hXXXX or 64’dXXXXX. If you leave off the 64’ you won’t get the number you wanted, because the default width is 32 bits. Pay attention to how the reset operates. If your device receives a reset during its calculation, it should start over with the new value. The reset causes the input value to be flopped (stored by the ISR module). The value can change after the reset goes low. Your testbenches should also be testing for these conditions. One ISR must not take more than 600 cycles to complete; the average is between 300 and 400 cycles.

    44. Part 2 – Simple Example: Input 10101101 (173); Proposed 0000 (0); Proposed² 00000000 (0)

    45. Part 2 – Simple Example: Input 10101101 (173); Proposed 1000 (8); Proposed² 00000000 (0)

    46. Part 2 – Simple Example: Input 10101101 (173); Proposed 1000 (8); Proposed² 01000000 (64)

    47. Part 2 – Simple Example: Input 10101101 (173); Proposed 1100 (12); Proposed² 00000000 (0)

    48. Part 2 – Simple Example: Input 10101101 (173); Proposed 1100 (12); Proposed² 10010000 (144)

    49. Part 2 – Simple Example: Input 10101101 (173); Proposed 1110 (14); Proposed² 00000000 (0)

    50. Part 2 – Simple Example: Input 10101101 (173); Proposed 1110 (14); Proposed² 11000100 (196)

    51. Part 2 – Simple Example: Input 10101101 (173); Proposed 1100 (12); Proposed² 11000100 (196)

    52. Part 2 – Simple Example: Input 10101101 (173); Proposed 1101 (13); Proposed² 00000000 (0)

    53. Part 2 – Simple Example: Input 10101101 (173); Proposed 1101 (13); Proposed² 10101001 (169)

    54. Part 2 – Simple Example: Input 10101101 (173); Proposed 1101 (13); Proposed² 10101001 (169); √173 ≈ 13.153
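The sequence of proposals in this example can be regenerated with a few lines of Python, which may also be handy for producing expected values in a testbench. This is an illustrative sketch only; `isr_trace` is an invented name, and the loop stands in for what the hardware does one multiply at a time.

```python
def isr_trace(value, result_bits):
    """Return (root, steps); each step is (candidate, candidate squared, kept?)."""
    proposed = 0
    steps = []
    for bit in reversed(range(result_bits)):
        candidate = proposed | (1 << bit)   # set the next most significant bit
        square = candidate * candidate
        keep = square <= value              # keep the bit unless the square is too big
        if keep:
            proposed = candidate
        steps.append((candidate, square, keep))
    return proposed, steps


root, steps = isr_trace(173, 4)
for candidate, square, keep in steps:
    print(f"{candidate:04b} ({candidate})  {square:08b} ({square})  "
          f"{'keep' if keep else 'clear'}")
print(root)   # 13
```

For the input 173 the candidates are 8, 12, 14, and 13, with 14 rejected because 196 exceeds 173.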

    55. Part 3 – Synthesize ISR Synthesize the ISR module you made in Part 2 Answer some more questions
