Chapter 4 PowerPoint Presentation

Chapter 4

Presentation Transcript

  1. Chapter 4 The Processor

  2. Introduction §4.1 Introduction • CPU performance factors • Instruction count • Determined by ISA and compiler • CPI and Cycle time • Determined by CPU hardware • We will examine two MIPS implementations • A simplified version • A more realistic pipelined version • Simple subset, shows most aspects • Memory reference: lw, sw • Arithmetic/logical: add, sub, and, or, slt • Control transfer: beq, j Chapter 4 — The Processor — 2

  3. Instruction Execution • PC → instruction memory, fetch instruction • Register numbers → register file, read registers • Depending on instruction class • Use ALU to calculate • Arithmetic/logical result • Memory address for load/store • Branch target address • Access data memory for load/store • PC ← target address or PC + 4

  4. CPU Overview

  5. Multiplexers • Can’t just join wires together • Use multiplexers

  6. Control

  7. Logic Design Basics (§4.2 Logic Design Conventions) • Information encoded in binary • Low voltage = 0, high voltage = 1 • One wire per bit • Multi-bit data encoded on multi-wire buses • Combinational elements • Operate on data • Output is a function of input • State (sequential) elements • Store information

  8. Combinational Elements • AND gate: Y = A & B • Adder: Y = A + B • Arithmetic/Logic Unit: Y = F(A, B) • Multiplexer: Y = S ? I1 : I0
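
The four elements on this slide can be modelled as pure functions, since a combinational output depends only on the current inputs. This is a minimal sketch: the 32-bit bus width (`MASK32`) and the operation names accepted by `alu` are assumptions made for illustration, not something the slide specifies.

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit bus width


def and_gate(a: int, b: int) -> int:
    """AND gate: Y = A & B."""
    return a & b


def adder(a: int, b: int) -> int:
    """Adder: Y = A + B, truncated to the assumed bus width."""
    return (a + b) & MASK32


def mux(s: int, i0: int, i1: int) -> int:
    """Multiplexer: Y = S ? I1 : I0."""
    return i1 if s else i0


def alu(f: str, a: int, b: int) -> int:
    """ALU: Y = F(A, B). The operation names are hypothetical labels."""
    operations = {
        "and": a & b,
        "or": a | b,
        "add": (a + b) & MASK32,
        "sub": (a - b) & MASK32,
    }
    return operations[f]


if __name__ == "__main__":
    print(bin(and_gate(0b1100, 0b1010)))
    print(mux(1, 5, 9))
    print(hex(alu("sub", 3, 5)))
```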

  9. State Elements • Unclocked vs. clocked • Clocks are used in synchronous logic • When should an element that contains state be updated? • [Figure: clock waveform labelled with rising edge, falling edge, and cycle time]

  10. An Unclocked State Element • The set-reset (SR) latch • Output depends on present inputs and also on past inputs • [State table: inputs R and S, outputs Qn+1 and its complement, with rows for holding the state (outputs stay at Qn), set, reset, and initialization] • Both inputs asserted at once: don’t use this mode!
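
To make "depends on present inputs and also on past inputs" concrete, here is a small sketch of an SR latch. It assumes the common cross-coupled NOR construction (the slide does not say which gates are used), and the class and method names are hypothetical. The mode with both inputs asserted is rejected rather than simulated.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate."""
    return 0 if (a or b) else 1


class SRLatch:
    """Set-reset latch built from two cross-coupled NOR gates (assumed)."""

    def __init__(self, q: int = 0):
        self.q = q
        self.q_bar = 1 - q

    def apply(self, s: int, r: int) -> int:
        """Apply inputs S and R, let the feedback loop settle, return Q."""
        if s and r:
            raise ValueError("S = R = 1 is the mode the slide says not to use")
        for _ in range(4):  # a few passes are enough for the loop to settle
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


if __name__ == "__main__":
    latch = SRLatch()
    for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
        print(f"S={s} R={r} -> Q={latch.apply(s, r)}")
```

With S = R = 0 the output simply repeats whatever the last set or reset left behind, which is the "past inputs" part of the behaviour.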

  11. Latches and Flip-flops • Output is equal to the stored value inside the element (don't need to ask for permission to look at the value) • Change of state (value) is based on the clock • Latches: state changes whenever the inputs change and the clock is asserted (level-triggered methodology); "asserted" means logically true, which could mean electrically low • Flip-flops: state changes only on a clock edge (edge-triggered methodology) • A clocking methodology defines when signals can be read and written • You wouldn't want to read a signal at the same time it is being written

  12. D Latch (Transparent Latch) • Two inputs: • the data value to be stored (D) • the clock signal (C) indicating when to read and store D • Two outputs: • the value of the internal state (Q) and its complement • [Figure label: propagation delay]

  13. D flip-flop (1-bit register) • Output changes only on the clock edge • Negative (falling) edge in this example

  14. Comparison between D latch and D flip-flop • A timing diagram for normal operation of a D latch and a D flip-flop (positive edge triggered): • Latch: while clock = 1 the latch is open and the input propagates to the output • Flip-flop: on the clock transition 0 → 1 (rising edge) the input signal is captured
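
The difference between the two elements shows up when both are driven with the same waveforms. The sketch below samples the clock and data once per time step; the class names and the sample waveforms `CLK` and `D` are made up for illustration.

```python
class DLatch:
    """Level-triggered: transparent while the clock is 1."""

    def __init__(self):
        self.q = 0

    def step(self, clk: int, d: int) -> int:
        if clk == 1:
            self.q = d  # latch is open, input propagates
        return self.q


class DFlipFlop:
    """Positive-edge-triggered: captures D only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if self._prev_clk == 0 and clk == 1:
            self.q = d  # rising edge, capture the input
        self._prev_clk = clk
        return self.q


def simulate(clk_wave, d_wave):
    """Drive both elements with the same waveforms and collect Q per step."""
    latch, flip_flop = DLatch(), DFlipFlop()
    latch_out, ff_out = [], []
    for clk, d in zip(clk_wave, d_wave):
        latch_out.append(latch.step(clk, d))
        ff_out.append(flip_flop.step(clk, d))
    return latch_out, ff_out


# Hypothetical sample waveforms, one value per time step.
CLK = [0, 1, 1, 1, 0, 0, 1, 1]
D = [1, 1, 0, 1, 0, 1, 0, 1]

if __name__ == "__main__":
    latch_q, ff_q = simulate(CLK, D)
    print("latch    :", latch_q)
    print("flip-flop:", ff_q)
```

The latch output follows every change of D while the clock is high, whereas the flip-flop output changes only at the two rising edges.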

  15. Sequential Elements • Register: stores data in a circuit • Uses a clock signal to determine when to update the stored value • Edge-triggered: update when Clk changes from 0 to 1 (positive edge) or from 1 to 0 (negative edge)

  16. Sequential Elements • Register with write control • Only updates on clock edge when write control input is 1 • Used when stored value is required several cycles later

  17. Clocking Methodology • Combinational logic transforms data during clock cycles • Between clock edges • Input from state elements, output to state element • Longest delay determines clock period

  18. Register File (read operation) • Built using D flip-flops

  19. Register File (write operation) • Note: we still use the real clock to determine when to write
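
A behavioural sketch of the register file's two operations: reads are combinational, and a write takes effect only at a clock edge when the write control is asserted. The port names (`read_reg1`, `write_reg`, `reg_write`) are hypothetical, the 32-register size follows MIPS, and keeping register 0 at zero is the MIPS convention rather than something these slides state.

```python
class RegisterFile:
    """Two read ports, one write port (behavioural model, not gate-level)."""

    def __init__(self, num_regs: int = 32):
        self.regs = [0] * num_regs

    def read(self, read_reg1: int, read_reg2: int):
        """Read operation: no clock involved, just select two registers."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg: int, write_data: int, reg_write: bool) -> None:
        """Write operation: called once per clock edge.

        The register changes only if reg_write is asserted.
        Register 0 stays zero (MIPS convention, assumed here).
        """
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFF_FFFF


if __name__ == "__main__":
    rf = RegisterFile()
    rf.clock_edge(write_reg=17, write_data=42, reg_write=True)
    print(rf.read(17, 0))
```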

  20. Simple Implementation • Include the functional units we need for each instruction • Why do we need this stuff?

  21. Building a Datapath (§4.3) • Datapath • Elements that process data and addresses in the CPU • Registers, ALUs, muxes, memories, … • Structure of the datapath • Similar to a flow chart • We will build a MIPS datapath incrementally • Refining the overview design

  22. So far: • Instructions and their meaning (Register Transfer Language, RTL): • add $s1,$s2,$s3: $s1 = $s2 + $s3 • sub $s1,$s2,$s3: $s1 = $s2 - $s3 • lw $s1,100($s2): $s1 = Memory[$s2+100] • sw $s1,100($s2): Memory[$s2+100] = $s1 • bne $s4,$s5,L: next instruction is at Label if $s4 ≠ $s5 • beq $s4,$s5,L: next instruction is at Label if $s4 = $s5 • j Label: next instruction is at Label • Formats: • R: op, rs, rt, rd, shamt, funct • I: op, rs, rt, 16-bit address • J: op, 26-bit address

  23. Information in Instruction Formats • R-type: 0 [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0] • Load/Store: 35 or 43 [31:26], rs [25:21], rt [20:16], address [15:0] • Branch: 4 [31:26], rs [25:21], rt [20:16], address [15:0] • Bits 31:26: opcode • rs: always read • rt: read, except for load • rd (R-type) or rt (load): written • address: sign-extend and add
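
The bit ranges listed above translate directly into shifts and masks. The following sketch extracts every field from a 32-bit instruction word; `decode` and `encode_r` are hypothetical helper names, and the register numbers used in the example ($s1 = 17, $s2 = 18, $s3 = 19) follow the standard MIPS register numbering.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields by bit position."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31:26
        "rs": (word >> 21) & 0x1F,      # bits 25:21
        "rt": (word >> 16) & 0x1F,      # bits 20:16
        "rd": (word >> 11) & 0x1F,      # bits 15:11 (R-type only)
        "shamt": (word >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct": word & 0x3F,           # bits 5:0   (R-type only)
        "address": word & 0xFFFF,       # bits 15:0  (load/store/branch)
    }


def encode_r(rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Assemble an R-type word (opcode 0) from its fields."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


if __name__ == "__main__":
    # add $s1, $s2, $s3  ->  rs = $s2 (18), rt = $s3 (19), rd = $s1 (17)
    add_word = encode_r(rs=18, rt=19, rd=17, shamt=0, funct=0x20)
    print(hex(add_word), decode(add_word))
```

All fields are extracted unconditionally; which of them are meaningful depends on the opcode, which is the job of the control unit.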

  24. Instruction Fetch • 32-bit register (PC) • Increment by 4 for next instruction

  25. R-Format Instructions • Read two register operands • Perform arithmetic/logical operation • Write register result

  26. Load/Store Instructions • Read register operands • Calculate address using 16-bit offset • Use ALU, but sign-extend offset • Load: Read memory and update register • Store: Write register value to memory

  27. Branch Instructions • Read register operands • Compare operands • Use ALU, subtract and check Zero output • Calculate target address • Sign-extend displacement • Shift left 2 places (word displacement) • Add to PC + 4 • Already calculated by instruction fetch
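
The branch steps listed above (subtract and check Zero, sign-extend, shift left 2, add to PC + 4) can be sketched in a few lines. The function names are hypothetical, and a 32-bit address space is assumed.

```python
MASK32 = 0xFFFF_FFFF  # assumed 32-bit addresses


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x1_0000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended displacement << 2)."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_beq(pc: int, offset16: int, rs_value: int, rt_value: int) -> int:
    """beq: subtract the operands in the ALU and branch if the result is zero."""
    zero = ((rs_value - rt_value) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32


if __name__ == "__main__":
    print(hex(branch_target(0x1000, 3)))       # forward branch
    print(hex(branch_target(0x1000, 0xFFFF)))  # displacement of -1 word
```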

  28. Branch Instructions • Shift left 2: just re-routes wires • Sign-extend: sign-bit wire replicated

  29. Composing the Elements • First-cut datapath does an instruction in one clock cycle • Each datapath element can only do one function at a time • Hence, we need separate instruction and data memories • Use multiplexers where alternate data sources are used for different instructions

  30. • Figure 5.7: Data path for the R-type instructions • Register access • Arithmetic operations: R3 <- R1 + R2; R3 <- R1 - R2 • Figure 5.9: Data path for load/store • Store path: M[R1+Immed] <- R2 • Load path: R2 <- M[R1+Immed]

  31. R-Type/Load/Store Datapath

  32. [Figure 5.14: data path for ALU instructions, load/store, and branch in a single-cycle implementation. Callouts: PC increment (PC++); branch target address; PC values; major system state; register reads; mux: multiple sources; fan out: multiple destinations; parallelism: speculative execution]

  33. Full Datapath

  34. 32-bit ALU (review)

  35. ALU Control Signals (Appendix C; §4.4 A Simple Implementation Scheme) • ALU used for • Load/Store: F = add • Branch: F = subtract • R-type: F depends on funct field

  36. ALU Control • Use ALUOp to classify instructions • LW/SW/BEQ: ALU function depends solely on the opcode field • R-type: ALU function depends on both the opcode and funct fields
      opcode   ALUOp   Operation          funct    ALU function       ALU control
      lw       00      load word          XXXXXX   add                0010
      sw       00      store word         XXXXXX   add                0010
      beq      01      branch equal       XXXXXX   subtract           0110
      R-type   10      add                100000   add                0010
      R-type   10      subtract           100010   subtract           0110
      R-type   10      AND                100100   AND                0000
      R-type   10      OR                 100101   OR                 0001
      R-type   10      set-on-less-than   101010   set-on-less-than   0111
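
The ALU control mapping on this slide is a small truth table, so it can be written as a lookup. This is a sketch of the mapping only, not of the gate-level logic; the function and dictionary names are hypothetical, while the bit patterns are the ones given on the slide.

```python
# funct field -> 4-bit ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:  # lw / sw: always add, funct is a don't-care
        return 0b0010
    if alu_op == 0b01:  # beq: always subtract, funct is a don't-care
        return 0b0110
    if alu_op == 0b10:  # R-type: decided by the funct field
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


if __name__ == "__main__":
    print(f"{alu_control(0b10, 0b101010):04b}")
```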

  37. The Main Control Unit • Control signals derived from instruction • R-type: 0 [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0] • Load/Store: 35 or 43 [31:26], rs [25:21], rt [20:16], address [15:0] • Branch: 4 [31:26], rs [25:21], rt [20:16], address [15:0] • Bits 31:26: opcode • rs: always read • rt: read, except for load • rd (R-type) or rt (load): written • address: sign-extend and add

  38. Datapath With Control

  39. R-Type Instruction

  40. Load Instruction

  41. Branch-on-Equal Instruction

  42. Implementing Jumps • Jump format: 2 [31:26], address [25:0] • Jump uses word address • Update PC with concatenation of • Top 4 bits of old PC • 26-bit jump address • 00 • Need an extra control signal decoded from opcode
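
The concatenation described on this slide amounts to one mask, one shift, and one OR. In the sketch below the function name is hypothetical, and `pc` stands for whichever PC value the datapath supplies for the top 4 bits.

```python
def jump_target(pc: int, address26: int) -> int:
    """Concatenate: top 4 bits of PC | 26-bit jump address | 00."""
    top4 = pc & 0xF000_0000                       # bits 31:28 of the PC
    word_address = (address26 & 0x03FF_FFFF) << 2  # append two zero bits
    return top4 | word_address


if __name__ == "__main__":
    print(hex(jump_target(0x1000_0008, 0x100)))
```

Shifting left by 2 is what turns the word address stored in the instruction into a byte address, so the low two bits of the result are always 00.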

  43. Datapath With Jumps Added

  44. Control

  45. Single-Cycle Implementation • Calculate cycle time assuming negligible delays except: • memory (2 ns), ALU and adders (2 ns), register file access (1 ns) • Find the critical path… R-type: 5 ns; LW: 7 ns; SW: 5 ns; BEQ: 5 ns, …

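  46. The single-cycle datapath of an add instruction • Instruction at address 100: add rd, rs, rt • Instruction fields: 0, rs, rt, rd, 0, 0x20 • Delays: memory (2 ns), ALU and adders (2 ns), register file access (1 ns) • [Figure: path through the datapath, marked from start to end]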

  47. The single-cycle datapath of a load instruction • Instruction at address 104: lw $rt, offset($rs) • Instruction fields: 0x23, rs, rt, offset

  48. Performance Issues • Longest delay determines clock period • Critical path: load instruction • Instruction memory → register file → ALU → data memory → register file • Not feasible to vary period for different instructions • Violates design principle • Making the common case fast • We will improve performance by pipelining

  49. Pipelining Analogy (§4.5 An Overview of Pipelining) • Pipelined laundry: overlapping execution • Parallelism improves performance • Four loads: • Speedup = 8 hr / 3.5 hr = 2.3 • Non-stop: • Speedup = 2n / (0.5n + 1.5) ≈ 4 = number of stages
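
The two speedup figures follow from one formula. The sketch below assumes what the slide's expression 2n/(0.5n + 1.5) implies: four laundry stages of half an hour each, so one load takes 2 hours on its own and a new load can start every half hour once the pipeline is running. The constant and function names are hypothetical.

```python
NUM_STAGES = 4      # stages in the laundry pipeline
STAGE_HOURS = 0.5   # time per stage


def sequential_hours(n: int) -> float:
    """Each load runs all stages before the next one starts: 2n hours."""
    return n * NUM_STAGES * STAGE_HOURS


def pipelined_hours(n: int) -> float:
    """First load takes all stages, each later load adds one stage time.

    (NUM_STAGES + n - 1) * 0.5 = 0.5n + 1.5 hours.
    """
    return (NUM_STAGES + n - 1) * STAGE_HOURS


def speedup(n: int) -> float:
    return sequential_hours(n) / pipelined_hours(n)


if __name__ == "__main__":
    print(round(speedup(4), 1))        # four loads: 8 / 3.5
    print(round(speedup(1_000_000), 3))  # many loads: approaches 4
```

The speedup only approaches the number of stages for large n, because the 1.5 hours spent filling the pipeline become negligible.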

  50. MIPS Pipeline • Five stages, one step per stage • IF: Instruction fetch from memory • ID: Instruction decode & register read • EX: Execute operation or calculate address • MEM: Access memory operand • WB: Write result back to register
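
To see how the five stages overlap, the sketch below builds a cycle-by-cycle occupancy table. It assumes an ideal pipeline in which one instruction enters per cycle and nothing ever stalls; the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions: int):
    """Return one row per instruction and one column per clock cycle.

    Instruction i enters IF in cycle i and moves one stage per cycle
    (ideal case, no stalls assumed).
    """
    total_cycles = num_instructions + len(STAGES) - 1
    table = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        table.append(row)
    return table


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(3)):
        print(f"instr {i}: " + " ".join(f"{s:>3}" for s in row))
```

Under these assumptions, n instructions finish in n + 4 cycles instead of 5n.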