
October 3rd, 2007 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f07/

CS-447– Computer Architecture M,W 10-11:20am Lecture 11 Single Cycle Datapath. October 3rd, 2007 Majd F. Sakr msakr@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/. Lecture Objectives. Learn what a datapath is, and how it provides the required functions.


Presentation Transcript


  1. CS-447– Computer Architecture M,W 10-11:20am Lecture 11 Single Cycle Datapath October 3rd, 2007 Majd F. Sakr msakr@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/

  2. Lecture Objectives • Learn what a datapath is, and how it provides the required functions. • Appreciate why different implementation strategies affect the clock rate and CPI of a machine. • Understand how the ISA determines many aspects of the hardware implementation.

  3. Implementation vs. Performance Performance of a processor is determined by • Instruction count of a program • CPI • Clock cycle time (clock rate) The compiler & the ISA determine the instruction count. The implementation of the processor determines the CPI and the clock cycle time.
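The three factors listed above multiply together to give the execution (CPU) time of a program. A minimal Python sketch of that product; the function names and the sample numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


def cpu_time_from_rate_ns(instruction_count: int, cpi: float, clock_rate_ghz: float) -> float:
    """Equivalent form using the clock rate: a 1 GHz clock has a 1 ns cycle."""
    return instruction_count * cpi / clock_rate_ghz


# Hypothetical numbers: 100 instructions, CPI = 1, 8 ns cycle (125 MHz).
print(cpu_time_ns(100, 1.0, 8.0))              # 800.0
print(cpu_time_from_rate_ns(100, 1.0, 0.125))  # 800.0
```

In a single-cycle implementation the CPI is fixed at 1, so the implementation influences performance only through the clock cycle time.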

  4. Possible Execution Steps of Any Instruction • Instruction Fetch • Instruction Decode and Register Fetch • Execution of a Memory Reference Instruction • Execution of an Arithmetic-Logical Operation • Branch Instruction • Jump Instruction

  5. Instruction Processing • Five steps: • Instruction fetch (IF) • Instruction decode and operand fetch (ID) • ALU/execute (EX) • Memory (MEM), not required by every instruction • Write-back (WB)

  6. Datapath & Control

  7. Datapath Elements The datapath contains two types of logic elements: • Combinational (e.g., the ALU): elements that operate on data values. Their outputs depend only on their current inputs. • State (e.g., registers and memory): elements with internal storage. Their state is defined by the values they contain.
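The difference between the two element types can be sketched in a few lines of Python. This is a software model under stated assumptions (32-bit values, a write-enable input on the register, names `alu_add` and `Register` invented for illustration), not a hardware description: the combinational function has no memory, while the state element changes only at a modeled clock edge.

```python
MASK32 = 0xFFFFFFFF


def alu_add(a: int, b: int) -> int:
    """Combinational element: the output is a pure function of the current inputs."""
    return (a + b) & MASK32


class Register:
    """State element: keeps its value until a clock edge stores a new one."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def clock_edge(self, d: int, write_enable: bool = True) -> None:
        # The stored value changes only here, and only when writing is enabled.
        if write_enable:
            self.value = d & MASK32


acc = Register()
total = alu_add(acc.value, 5)  # the ALU output is available immediately...
print(acc.value)               # 0: ...but the register still holds its old state
acc.clock_edge(total)
print(acc.value)               # 5
```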

  8. State Elements

  9. Pentium Processor Die • State • Registers • Memory • Control ROM • Combinational logic (Compute)

  10. Abstract View of the Datapath

  11. Single Cycle Implementation • This simple processor can execute an ALU instruction, access memory, or compute the next instruction's address, all within a single clock cycle.

  12. Program Counter If each instruction occupies 4 bytes (4 byte-addressed memory locations), then Next PC <= PC + 4

  13. PC Datapath – Branch Offset PC <= PC + Branch Offset
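The two PC updates on slides 12 and 13 can be sketched as two adders feeding a select. A minimal Python model, assuming 4-byte instructions, 32-bit addresses, and a `branch_offset` that is already a signed byte offset, exactly as the slide writes `PC + Branch Offset`; the function name `next_pc` is hypothetical.

```python
MASK32 = 0xFFFFFFFF
INSTRUCTION_BYTES = 4  # assumed: every instruction is one 4-byte word


def next_pc(pc: int, take_branch: bool, branch_offset: int) -> int:
    """Select the next PC between the two slides' cases.

    branch_offset is assumed to be a signed offset already expressed in bytes.
    """
    sequential = (pc + INSTRUCTION_BYTES) & MASK32  # Next PC <= PC + 4
    branch_target = (pc + branch_offset) & MASK32   # PC <= PC + Branch Offset
    return branch_target if take_branch else sequential


print(hex(next_pc(0x1000, False, 0x40)))  # 0x1004
print(hex(next_pc(0x1000, True, 0x40)))   # 0x1040
print(hex(next_pc(0x1000, True, -0x10)))  # 0xff0
```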

  14. Abstract View After PC Basic Implementation

  15. The Register File • Arithmetic and logical instructions (R-type) read the contents of two registers, perform an ALU operation, and write the result back to a register. • Registers are stored in the register file. The register file has inputs that specify the registers to read and the register to write, outputs for the data read, an input for the data to be written, and one control signal that decides whether the data is written. In addition, we need an ALU to perform the operations.
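A minimal Python model of the register file described above, assuming MIPS-style conventions that the slide does not state: 32 registers of 32 bits, with register 0 always reading as zero. The write-control parameter `reg_write` is a hypothetical name (Patterson & Hennessy call this signal RegWrite).

```python
class RegisterFile:
    """Register file with two read ports and one write port.

    Assumptions (MIPS-style): 32 registers of 32 bits, register 0 always reads 0.
    """

    def __init__(self, num_regs: int = 32) -> None:
        self.regs = [0] * num_regs

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Reading is combinational: the outputs follow the register numbers.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, write_data: int, reg_write: bool) -> None:
        # Writing happens at the clock edge, and only if the control signal is set.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(8, 5, reg_write=True)
rf.write(9, 7, reg_write=False)  # control signal low: nothing is written
print(rf.read(8, 9))             # (5, 0)
```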

  16. The Register File

  17. R-Type Instructions • Assembly (e.g., register-register signed addition) • ADD rdreg rsreg rtreg • Machine encoding • Semantics • if MEM[PC] == ADD rd rs rt • GPR[rd] ← GPR[rs] + GPR[rt] • PC ← PC + 4
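The slide leaves the machine encoding as a figure. The sketch below assumes the standard MIPS R-type layout (op | rs | rt | rd | shamt | funct, with op = 0 and funct = 0x20 for ADD) and applies the semantics listed above. One simplification: MIPS ADD raises an exception on signed overflow, whereas this model simply wraps to 32 bits (the behavior of ADDU). Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def decode_rtype(word: int) -> dict:
    """Split a 32-bit MIPS R-type word into op | rs | rt | rd | shamt | funct."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def execute_add(word: int, gpr: list, pc: int) -> int:
    """GPR[rd] <- GPR[rs] + GPR[rt]; returns the new PC (PC + 4)."""
    f = decode_rtype(word)
    if f["op"] != 0 or f["funct"] != 0x20:
        raise ValueError("not an ADD instruction")
    if f["rd"] != 0:  # register 0 stays 0
        gpr[f["rd"]] = (gpr[f["rs"]] + gpr[f["rt"]]) & MASK32
    return (pc + 4) & MASK32


gpr = [0] * 32
gpr[8], gpr[9] = 3, 4
# add $10, $8, $9  ->  rs=8, rt=9, rd=10, funct=0x20
pc = execute_add(0x01095020, gpr, 0x1000)
print(gpr[10], hex(pc))  # 7 0x1004
```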

  18. ADD rd rs rt

  19. Datapath for Add

  20. I-Type ALU Instructions • Assembly (e.g., register-immediate signed addition) ADDI rtreg rsreg immediate16 • Machine encoding • Semantics if MEM[PC] == ADDI rt rs immediate GPR[rt] ← GPR[rs] + sign-extend (immediate) PC ← PC + 4
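The key new step for I-type instructions is sign-extending the 16-bit immediate before it reaches the ALU. A Python sketch assuming the standard MIPS I-type layout (op | rs | rt | immediate16, op = 0x08 for ADDI) and, as in the R-type case, wrapping on overflow instead of trapping. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Copy bit 15 of a 16-bit immediate into bits 31..16."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def execute_addi(word: int, gpr: list, pc: int) -> int:
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); returns PC + 4."""
    op = (word >> 26) & 0x3F
    if op != 0x08:
        raise ValueError("not an ADDI instruction")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if rt != 0:  # register 0 stays 0
        gpr[rt] = (gpr[rs] + sign_extend16(word & 0xFFFF)) & MASK32
    return (pc + 4) & MASK32


gpr = [0] * 32
gpr[8] = 5
# addi $8, $8, -1  ->  immediate16 = 0xFFFF
pc = execute_addi(0x2108FFFF, gpr, 0x1000)
print(gpr[8], hex(pc))  # 4 0x1004
```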

  21. ADDI rtreg rsreg immediate16

  22. Datapath for R and I-Type ALU Instructions

  23. Data Memory • The element needed to implement load and store instructions is the data memory. In addition, we use the existing ALU to compute the address to access. • The data memory has two x-bit inputs (the address and the write data) and one x-bit output (the read data). In addition, it has two control lines: MemWrite and MemRead.
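A minimal Python model of the data memory's interface: address and write-data inputs, a read-data output, and the MemRead / MemWrite control lines. It assumes byte addresses, 32-bit word-aligned accesses, and that unwritten words read as 0; the class and method names are hypothetical.

```python
from typing import Dict, Optional


class DataMemory:
    """Data memory with MemRead / MemWrite control lines.

    Assumptions: byte addresses, 32-bit word-aligned accesses, unwritten words read as 0.
    """

    def __init__(self) -> None:
        self.words: Dict[int, int] = {}

    def access(self, address: int, write_data: int,
               mem_read: bool, mem_write: bool) -> Optional[int]:
        if address % 4:
            raise ValueError("unaligned word access")
        if mem_read and mem_write:
            raise ValueError("MemRead and MemWrite should not both be asserted")
        if mem_write:  # store: write data goes into memory
            self.words[address] = write_data & 0xFFFFFFFF
            return None
        if mem_read:   # load: the read-data output is driven
            return self.words.get(address, 0)
        return None    # neither asserted: memory is idle


mem = DataMemory()
mem.access(0x100, 42, mem_read=False, mem_write=True)
print(mem.access(0x100, 0, mem_read=True, mem_write=False))  # 42
```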

  24. Data Memory

  25. Load Instruction • Assembly (e.g., load 4-byte word) LW rtreg offset16 (basereg) • Machine encoding • Semantics if MEM[PC]==LW rt offset16 (base) EA = sign-extend(offset) + GPR[base] GPR[rt] ← MEM[ translate(EA) ] PC ← PC + 4
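The LW semantics above can be traced step by step in Python. This sketch assumes the standard MIPS encoding (op = 0x23 | base | rt | offset16), models memory as a dict from byte address to 32-bit word, and replaces `translate(EA)` with a hypothetical identity function (no virtual memory).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Copy bit 15 of a 16-bit field into bits 31..16."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def translate(ea: int) -> int:
    """Hypothetical stand-in for translate(EA): identity, i.e. no virtual memory."""
    return ea


def execute_lw(word: int, gpr: list, memory: dict, pc: int) -> int:
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[translate(EA)]."""
    op = (word >> 26) & 0x3F
    if op != 0x23:
        raise ValueError("not an LW instruction")
    base = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    ea = (sign_extend16(word & 0xFFFF) + gpr[base]) & MASK32  # the ALU adds
    if rt != 0:
        gpr[rt] = memory.get(translate(ea), 0)
    return (pc + 4) & MASK32


gpr = [0] * 32
gpr[8] = 0x100
memory = {0x104: 99}
# lw $9, 4($8)  ->  op=0x23, base=8, rt=9, offset=4
pc = execute_lw(0x8D090004, gpr, memory, 0x1000)
print(gpr[9], hex(pc))  # 99 0x1004
```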

  26. LW Datapath

  27. Branch Equal • The beq (branch if equal) instruction has three operands: two registers that are compared for equality, and an n-bit offset used to compute the branch target address relative to the PC.
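The slide leaves the offset width and the target formula abstract. The sketch below fills them in with the MIPS convention, which is an assumption here: the offset is 16 bits, counts words rather than bytes, and is relative to PC + 4, so target = PC + 4 + (sign-extend(offset) << 2). Equality is tested the way a datapath typically does it, by subtracting in the ALU and checking for a zero result. The opcode 0x04 and the function name are likewise MIPS-based assumptions.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Copy bit 15 of a 16-bit field into bits 31..16."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def execute_beq(word: int, gpr: list, pc: int) -> int:
    """Return the next PC for BEQ rs, rt, offset16 (MIPS conventions assumed)."""
    op = (word >> 26) & 0x3F
    if op != 0x04:
        raise ValueError("not a BEQ instruction")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    pc_plus_4 = (pc + 4) & MASK32
    zero = ((gpr[rs] - gpr[rt]) & MASK32) == 0  # ALU subtracts; Zero flag
    if zero:
        offset_bytes = sign_extend16(word & 0xFFFF) << 2  # words -> bytes
        return (pc_plus_4 + offset_bytes) & MASK32
    return pc_plus_4


gpr = [0] * 32
gpr[8] = gpr[9] = 5
# beq $8, $9, 3
print(hex(execute_beq(0x11090003, gpr, 0x100)))  # 0x110 (taken: 0x104 + 3*4)
gpr[9] = 6
print(hex(execute_beq(0x11090003, gpr, 0x100)))  # 0x104 (not taken)
```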

  28. Branch Equal

  29. Unconditional Jump • Assembly J immediate26 • Machine encoding • Semantics if MEM[PC]==J immediate26 target = { PC[31:28], immediate26, 2’b00 } PC ← target
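The concatenation `{ PC[31:28], immediate26, 2’b00 }` maps directly onto masks and shifts. A Python sketch that follows the slide's formula; the opcode value 0x02 is the MIPS encoding for J and is an assumption here. (The MIPS ISA itself takes the upper four bits from PC + 4 rather than PC; the two differ only when the jump sits in the last word of a 256 MB region.)

```python
def jump_target(pc: int, word: int) -> int:
    """target = { PC[31:28], immediate26, 2'b00 }, as written on the slide."""
    op = (word >> 26) & 0x3F
    if op != 0x02:
        raise ValueError("not a J instruction")
    imm26 = word & 0x03FFFFFF
    return (pc & 0xF0000000) | (imm26 << 2)


# j with immediate26 = 0x0100000 (opcode 2 in bits 31..26)
word = (0x02 << 26) | 0x0100000
print(hex(jump_target(0x30000010, word)))  # 0x30400000
```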

  30. Unconditional Jump Datapath

  31. Combining ALU and Memory Instructions • The ALU datapath and the memory datapath are similar. The differences are: • The second input to the ALU is a register (R-type) or the offset (I-type). • The value stored into the destination register comes from the ALU (R-type) or from memory (load). • Using two multiplexers (muxes), we can combine both datapaths.
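The two multiplexers can be modeled as a single select function used twice: once in front of the ALU's second input and once in front of the register write data. In this sketch the control names `alu_src` and `mem_to_reg` are borrowed from Patterson & Hennessy's textbook datapath (ALUSrc, MemtoReg), not from these slides; the ALU is fixed to addition and memory is a dict from byte address to word, both simplifying assumptions.

```python
MASK32 = 0xFFFFFFFF


def mux2(select: bool, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: passes in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def write_back_value(read_data1: int, read_data2: int, imm_ext: int,
                     memory: dict, alu_src: bool, mem_to_reg: bool) -> int:
    """Value headed for the destination register; the ALU is fixed to add here."""
    alu_b = mux2(alu_src, read_data2, imm_ext)          # mux 1: register or offset
    alu_result = (read_data1 + alu_b) & MASK32
    mem_read_data = memory.get(alu_result, 0)           # only meaningful for loads
    return mux2(mem_to_reg, alu_result, mem_read_data)  # mux 2: ALU or memory


memory = {0x104: 99}
# R-type add: second ALU input from a register, result from the ALU.
print(hex(write_back_value(0x100, 7, 4, memory, alu_src=False, mem_to_reg=False)))  # 0x107
# Load: second ALU input is the offset, result comes from memory.
print(write_back_value(0x100, 7, 4, memory, alu_src=True, mem_to_reg=True))  # 99
```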

  32. Combining ALU and Memory Instructions

  33. The Complete Datapath

  34. Complete Datapath

  35. What’s Wrong with Single Cycle? • All instructions run at the speed of the slowest instruction. • Adding a long instruction can hurt performance. • What if you wanted to include multiply? • You cannot reuse any part of the processor within an instruction. • We have 3 different adders to calculate PC+4, PC+4+offset, and the ALU result. • There is no profit in making the common case fast, • since every instruction runs at the speed of the slowest instruction. • This is particularly important for loads, as we will see later.

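  36. What’s Wrong with Single Cycle? Assumed delays: 1 ns – register read/write; 2 ns – ALU/adder; 2 ns – memory access; 0 ns – MUX, PC access, sign extend, ROM. Each instruction's time is the sum of the steps it uses (Get Instr, read reg, ALU operation, mem, write reg):
      add: 2ns + 1ns + 2ns + 1ns = 6 ns (Get Instr, read reg, ALU operation, write reg)
      beq: 2ns + 1ns + 2ns = 5 ns (Get Instr, read reg, ALU operation)
      sw: 2ns + 1ns + 2ns + 2ns = 7 ns (Get Instr, read reg, ALU operation, mem)
      lw: 2ns + 1ns + 2ns + 2ns + 1ns = 8 ns (Get Instr, read reg, ALU operation, mem, write reg)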
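The per-instruction sums above, and the clock period they force on a single-cycle design, can be recomputed from the component delays. A Python sketch using exactly the slide's delays; the stage names are labels chosen for illustration.

```python
# Component delays from the slide, in ns (MUX, PC access, sign extend, ROM count as 0).
STAGE_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Stages each instruction class uses, following the slide's breakdown.
USES = {
    "add": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
    "sw": ["fetch", "reg_read", "alu", "mem"],
    "lw": ["fetch", "reg_read", "alu", "mem", "reg_write"],
}

latency = {op: sum(STAGE_NS[s] for s in stages) for op, stages in USES.items()}
clock_period = max(latency.values())  # single cycle: every instruction waits for the slowest

print(latency)       # {'add': 6, 'beq': 5, 'sw': 7, 'lw': 8}
print(clock_period)  # 8
```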

  37. Computing Execution Time Assume 100 instructions are executed: 25% are loads, 10% are stores, 45% are adds, and 20% are branches. Single-cycle execution (every instruction takes the 8 ns of a load): 100 * 8ns = 800ns. Optimal execution (each instruction takes only its own time): 25*8ns + 10*7ns + 45*6ns + 20*5ns = 640ns.
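The two totals can be checked with a short Python calculation using the slide's instruction mix and per-instruction times.

```python
MIX = {"lw": 25, "sw": 10, "add": 45, "beq": 20}     # instruction counts out of 100
LATENCY_NS = {"lw": 8, "sw": 7, "add": 6, "beq": 5}  # per-instruction times from slide 36

# Single cycle: every instruction pays the slowest instruction's time.
single_cycle_ns = sum(MIX.values()) * max(LATENCY_NS.values())
# Optimal: each instruction pays only its own time.
optimal_ns = sum(count * LATENCY_NS[op] for op, count in MIX.items())

print(single_cycle_ns, optimal_ns)   # 800 640
print(single_cycle_ns / optimal_ns)  # 1.25
```

The 1.25 ratio is the cost, for this mix, of forcing every instruction to wait as long as a load.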
