
Chapter 4

Chapter 4. The Processor. A basic MIPS implementation covering the memory-reference instructions load word (lw) and store word (sw) • the arithmetic-logical instructions add, sub, AND, OR, and slt • the instructions branch equal (beq) and jump (j).

bart

Presentation Transcript


  1. Chapter 4 The Processor

  2. A Basic MIPS Implementation • The memory-reference instructions: load word (lw) and store word (sw) • The arithmetic-logical instructions: add, sub, AND, OR, and slt • The branch and jump instructions: branch equal (beq) and jump (j) • [Figure labels: PC, Memory, Read registers, Completing the action]

  3. CPU Overview Chapter 4 — The Processor — 3

  4. Multiplexers • Can’t just join wires together • Use multiplexers

  5. An overview of implementation

  6. An overview of implementation

  7. Logic Design Conventions • Datapath elements • Combinational elements • Outputs depend only on current inputs • State elements (sequential) • At least two inputs (data value and clock) • One output • Signal values • Asserted: logically high (true) • Deasserted: logically low (false)

  8. Combinational Elements • AND-gate • Y = A & B • Adder • Y = A + B • Arithmetic/Logic Unit • Y = F(A, B) • Multiplexer • Y = S ? I1 : I0
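The four combinational elements above can be modeled as pure functions: the output depends only on the current inputs and no state is kept. This is a minimal sketch that assumes 32-bit unsigned words. The function names are hypothetical and chosen for illustration.

```python
# Combinational elements as pure functions (no stored state).
# Assumption: 32-bit unsigned words; names are illustrative only.
MASK32 = 0xFFFFFFFF

def and_gate(a, b):
    """Y = A & B"""
    return a & b & MASK32

def adder(a, b):
    """Y = A + B, wrapping around like 32-bit hardware"""
    return (a + b) & MASK32

def alu(f, a, b):
    """Y = F(A, B): the operation F is chosen by the caller"""
    return f(a, b) & MASK32

def mux(s, i0, i1):
    """Y = S ? I1 : I0"""
    return i1 if s else i0

print(and_gate(0b1100, 0b1010))                  # 8
print(adder(MASK32, 1))                          # 0 (carry out is dropped)
print(alu(lambda x, y: x | y, 0b1100, 0b1010))   # 14
print(mux(1, 5, 7))                              # 7
```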

  9. Sequential Elements • Register: stores data in a circuit • Uses a clock signal to determine when to update the stored value • Edge-triggered: update when Clk changes from 0 to 1

  10. Sequential Elements • Register with write control • Only updates on clock edge when write control input is 1 • Used when stored value is required later
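The behavior of an edge-triggered register with write control can be sketched in software by remembering the previous clock level. The register captures D only on a 0 → 1 transition of Clk, and only while Write is 1. The `Register` class and its `tick` method are hypothetical names, and real hardware is not driven by function calls.

```python
class Register:
    """Edge-triggered register with a write-enable input (software sketch)."""

    def __init__(self, value=0):
        self.q = value        # stored value, visible on output Q
        self._last_clk = 0    # previous clock level, used to detect a rising edge

    def tick(self, clk, d, write=1):
        # Capture D only on a rising edge (Clk 0 -> 1) and only if Write is 1.
        if self._last_clk == 0 and clk == 1 and write:
            self.q = d
        self._last_clk = clk
        return self.q

r = Register()
print(r.tick(clk=1, d=42, write=0))  # 0: rising edge, but Write is 0
print(r.tick(clk=0, d=42))           # 0: falling edge, no update
print(r.tick(clk=1, d=42))           # 42: rising edge with Write = 1
print(r.tick(clk=1, d=99))           # 42: clock stays high, no new edge
```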

  11. Logic Design Conventions • Clocking methodology: defines when signals can be read and when they can be written • Edge-triggered clocking

  12. Building datapath • What are the major components? • Instruction memory • Program counter (PC) register • Adder to increment the PC by 4

  13. Building datapath • Register file (for R-format instructions) • Reads two registers • Two register-number inputs • Two data outputs • Writes one register • One register-number input • One write-data input • A write control signal (RegWrite)
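The register file's ports map directly onto a small class. `read` takes two register numbers and returns two values. `write` takes one register number and one data value, and it changes state only when RegWrite is asserted. This is a sketch: the class and method names are hypothetical, and it ignores the clock edge at which a real write takes effect. It does enforce MIPS register 0 ($zero) always reading as 0.

```python
class RegisterFile:
    """Sketch of a 32 x 32-bit MIPS register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Two register numbers in, two data values out.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        # One register number and one data value in; written only when the
        # RegWrite control signal is 1. Register 0 ($zero) always stays 0.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.write(10, 5, reg_write=1)   # $t2 = 5
rf.write(11, 7, reg_write=1)   # $t3 = 7
rf.write(9, 99, reg_write=0)   # ignored: RegWrite deasserted
print(rf.read(10, 11))         # (5, 7)
print(rf.read(9, 0))           # (0, 0)
```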

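  14. Building datapath • Data memory • Examples: lw $t1, offset_value($t2) and sw $t1, offset_value($t2) • Address = base register ($t2) + sign-extended 16-bit offset, computed with the register file and ALU • lw reads from data memory into $t1; sw writes $t1 to data memory • Data memory has an address input, a write-data input, a read-data output, and read/write control signals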
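The address calculation for lw/sw can be written out concretely. The 16-bit offset is sign-extended to 32 bits and added to the base register, and the data memory is then read (lw) or written (sw) at that address. This sketch assumes data memory is a Python dictionary keyed by byte address, with unwritten locations reading as 0. `effective_address`, `load_word` and `store_word` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Sign-extend a 16-bit offset field to a 32-bit two's-complement word."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def effective_address(base, offset16):
    """lw/sw address: base register value + sign-extended offset (an ALU add)."""
    return (base + sign_extend16(offset16)) & MASK32

def store_word(mem, base, offset16, value):
    # sw $t1, offset($t2): write-data input is the value of $t1
    mem[effective_address(base, offset16)] = value & MASK32

def load_word(mem, base, offset16):
    # lw $t1, offset($t2): read-data output goes to $t1 (unwritten -> 0 here)
    return mem.get(effective_address(base, offset16), 0)

data_memory = {}
store_word(data_memory, 0x1000, 8, 42)         # offset +8  -> address 0x1008
print(load_word(data_memory, 0x1010, 0xFFF8))  # offset -8  -> address 0x1008: 42
```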

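  15. Building datapath • Example: beq $t1, $t2, offset • Branch target address = (PC + 4) + (offset field, sign-extended and shifted left by 2) • The ALU compares the two registers to decide whether the branch is taken or not taken • Jump instruction: the low-order 28 bits of the PC are replaced by the 26-bit target field shifted left by 2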
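The beq and j target calculations reduce to a few bit operations. The beq target adds the sign-extended, word-shifted offset to PC + 4. The j target keeps the upper 4 bits of PC + 4 and replaces the low 28 bits with the 26-bit field shifted left by 2. This is a sketch with hypothetical function names, assuming 32-bit addresses.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def branch_target(pc, offset16):
    """beq target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32

def next_pc_beq(pc, rs_val, rt_val, offset16):
    """Branch taken when the registers are equal (ALU subtraction gives zero)."""
    if rs_val - rt_val == 0:
        return branch_target(pc, offset16)
    return (pc + 4) & MASK32

def jump_target(pc, target26):
    """j target: upper 4 bits of PC + 4, then the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)

print(hex(branch_target(0x00400000, 3)))       # 0x400010 (3 words past PC + 4)
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010 (offset -1: loops back)
print(hex(jump_target(0x10000000, 1)))         # 0x10000004
```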

  16. R-Type/Load/Store Datapath

  17. Building datapath

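  18. A simple Implementation Scheme • Load and store: add (to compute the memory address) • R-type instructions: AND, OR, subtract, add, or slt, selected by the funct field • Branch equal: subtract (to test the two registers for equality) • ALUOp: a 2-bit field from the main control unit that selects among these cases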
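The ALU control's job can be sketched as a small lookup. ALUOp 00 (lw/sw) forces add, ALUOp 01 (beq) forces subtract, and ALUOp 10 (R-type) decodes the 6-bit funct field. The 4-bit ALU control encodings and funct values below follow the usual Patterson & Hennessy convention. They are not shown on these slides, so treat them as an assumption. `alu_control` is a hypothetical name.

```python
# 4-bit ALU control inputs (assumed textbook encoding).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field (instruction bits 5:0) for the R-type instructions covered here.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}

def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp from main control (plus funct) to an ALU operation."""
    if alu_op == 0b00:        # lw / sw: add to form the address
        return ALU_ADD
    if alu_op == 0b01:        # beq: subtract to compare registers
        return ALU_SUB
    if alu_op == 0b10:        # R-type: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")

print(bin(alu_control(0b10, 0b101010)))  # 0b111 (slt)
```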

  19. A simple Implementation Scheme

  20. A simple Implementation Scheme

  21. A simple Implementation Scheme

  22. Designing the Main Control Unit

  23. Designing the Main Control Unit

  24. Designing the Main Control Unit

  25. Setting the control lines
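The main control unit decodes the 6-bit opcode into a fixed set of control lines. The truth table for R-format, lw, sw and beq can be sketched as a dictionary. The signal names, opcode values (0, 35, 43, 4) and table entries follow the common Patterson & Hennessy single-cycle design. The slides show this table only as a figure, so treat it as an assumption. `None` marks a "don't care" (X) entry, and `main_control` is a hypothetical helper.

```python
R_FORMAT, LW, SW, BEQ = 0, 35, 43, 4   # opcode field values (bits 31:26)
X = None                               # "don't care"

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

CONTROL_TABLE = {
    #          RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp
    R_FORMAT: (1,     0,     0,       1,       0,      0,       0,     0b10),
    LW:       (0,     1,     1,       1,       1,      0,       0,     0b00),
    SW:       (X,     1,     X,       0,       0,      1,       0,     0b00),
    BEQ:      (X,     0,     X,       0,       0,      0,       1,     0b01),
}

def main_control(instr):
    """Decode the opcode of a 32-bit instruction into named control signals."""
    opcode = (instr >> 26) & 0x3F
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))

print(main_control(LW << 26))  # control line settings for a load word
```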

  26. Operation of the Datapath (R-Format) add $t1, $t2, $t3

  27. Operation of the Datapath (load) lw $t1, offset($t2)

  28. Operation of the Datapath (branching) beq $t1, $t2, offset
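To tie the three walk-throughs together (R-format add, load, branch), the following is a minimal single-cycle interpreter. Each call to `step` does the whole job of one clock cycle: decode, register read, ALU, optional memory access, register write and PC update. This is a sketch under stated assumptions: only add, lw, sw and beq are decoded, registers are a 32-entry list, and data memory is a dictionary keyed by byte address. `step`, `encode_r` and `encode_i` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def encode_r(rs, rt, rd, funct):
    # R-format: opcode 0 | rs | rt | rd | shamt 0 | funct
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def encode_i(opcode, rs, rt, imm):
    # I-format: opcode | rs | rt | 16-bit immediate
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def step(pc, instr, regs, mem):
    """Execute one instruction in one clock cycle and return the next PC."""
    opcode = instr >> 26
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    funct = instr & 0x3F
    imm = sign_extend16(instr & 0xFFFF)      # sign-extend unit
    a, b = regs[rs], regs[rt]                # register file: two reads
    next_pc = (pc + 4) & MASK32              # PC + 4 adder
    if opcode == 0 and funct == 0x20:        # add rd, rs, rt
        dest, result = rd, (a + b) & MASK32
    elif opcode == 35:                       # lw rt, imm(rs)
        dest, result = rt, mem.get((a + imm) & MASK32, 0)
    elif opcode == 43:                       # sw rt, imm(rs)
        mem[(a + imm) & MASK32] = b
        return next_pc
    elif opcode == 4:                        # beq rs, rt, imm
        if a == b:                           # ALU subtraction result is zero
            next_pc = (next_pc + (imm << 2)) & MASK32
        return next_pc
    else:
        raise NotImplementedError(f"unsupported instruction {instr:#010x}")
    if dest != 0:                            # $zero is never written
        regs[dest] = result
    return next_pc

T1, T2, T3 = 9, 10, 11                       # register numbers of $t1, $t2, $t3
regs, mem = [0] * 32, {0x108: 99}
regs[T2], regs[T3] = 5, 7
pc = step(0, encode_r(T2, T3, T1, 0x20), regs, mem)   # add $t1, $t2, $t3
regs[T2] = 0x100
pc = step(pc, encode_i(35, T2, T1, 8), regs, mem)     # lw $t1, 8($t2)
pc = step(pc, encode_i(4, T1, T2, 2), regs, mem)      # beq $t1, $t2, 2 (not taken)
print(regs[T1], pc)                                   # 99 12
```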

  29. Finalizing control

  30. Finalizing control

  31. Implementing Jumps

  32. Why a Single-Cycle Implementation Is Not Used Today

  33. Performance Issues • Longest delay determines clock period • Critical path: load instruction • Instruction memory → register file → ALU → data memory → register file • Not feasible to vary period for different instructions • Violates design principle • Making the common case fast • We will improve performance by pipelining

  34. An Overview of Pipelining • Laundry • 1. place one dirty load of clothes in the washer. • 2. place the wet load in the dryer. • 3. place the dry load on a table and fold. • 4. ask your roommate to put the clothes away.

  35. An Overview of Pipelining • Laundry • 1. place one dirty load of clothes in the washer. • 2. place the wet load in the dryer. • 3. place the dry load on a table and fold. • 4. ask your roommate to put the clothes away.

  36. datapath

  37. An Overview of Pipelining • Pipelining improves the throughput of our laundry system • If all stages take about the same time and there is enough work to do, the speed-up due to pipelining equals the number of stages in the pipeline • MIPS uses five stages, one step per stage • IF: Instruction fetch from memory • ID: Instruction decode & register read • EX: Execute operation or calculate address • MEM: Access memory operand • WB: Write result back to register
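The overlap of the five stages is easiest to see in a pipeline diagram, with one row per instruction and one column per clock cycle. This sketch assumes an ideal pipeline with no hazards, so instruction i enters stage s in cycle i + s. `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instr):
    """Row i lists the stage instruction i occupies in each cycle ("" = idle)."""
    total_cycles = len(STAGES) + n_instr - 1   # fill time + one cycle per extra instruction
    table = []
    for i in range(n_instr):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name                   # instruction i is in stage s during cycle i + s
        table.append(row)
    return table

for i, row in enumerate(pipeline_diagram(3)):
    print(f"instr {i}: " + " ".join(f"{cell:>3}" if cell else "  ." for cell in row))
```

With three instructions the diagram spans 7 cycles instead of 15, because a new instruction starts every cycle once the pipeline is full.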

  38. Pipeline Performance • Assume the stage times are • 100 ps for register read or write • 200 ps for the other stages • Compare the pipelined datapath with the single-cycle datapath

  39. Pipeline Performance • Single-cycle (Tc = 800 ps) • Pipelined (Tc = 200 ps)
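The two clock periods follow from the stage times on the previous slide. The single-cycle clock must fit the slowest instruction (lw), while the pipelined clock only needs to fit the slowest stage. This sketch assumes a particular mapping of instruction classes to stages. In it, sw skips the register write, R-format skips memory, and beq skips both. That mapping follows the usual textbook breakdown and is not stated on the slides.

```python
# Stage latencies from the slide: 100 ps for register read/write, 200 ps otherwise.
STAGE_PS = {"IF": 200, "RegRead": 100, "ALU": 200, "MEM": 200, "RegWrite": 100}

# Assumed: which stages each instruction class actually uses.
USES = {
    "lw":  ["IF", "RegRead", "ALU", "MEM", "RegWrite"],
    "sw":  ["IF", "RegRead", "ALU", "MEM"],
    "R":   ["IF", "RegRead", "ALU", "RegWrite"],
    "beq": ["IF", "RegRead", "ALU"],
}

def instr_time(kind):
    """Total latency of one instruction class in picoseconds."""
    return sum(STAGE_PS[stage] for stage in USES[kind])

single_cycle_tc = max(instr_time(kind) for kind in USES)  # set by the slowest instruction
pipelined_tc = max(STAGE_PS.values())                     # set by the slowest stage

for kind in USES:
    print(kind, instr_time(kind), "ps")
print(single_cycle_tc, pipelined_tc)                      # 800 200
```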

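  40. An Overview of Pipelining • For the last example (3 instructions) • Nonpipelined takes 3 × 800 = 2400 ps • Pipelined takes 1400 ps (1000 ps for the first instruction, then 200 ps for each of the other two) • For 1,000,003 instructions (1,000,000 more) • Nonpipelined takes 1,000,000 × 800 + 2400 = 800,002,400 ps • Pipelined takes 1,000,000 × 200 + 1400 = 200,001,400 ps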
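The arithmetic above generalizes to a pair of formulas. A non-pipelined machine takes n × 800 ps. A pipelined machine needs 5 × 200 ps to finish the first instruction and then completes one more instruction every 200 ps. This is a sketch that assumes no stalls or hazards. The function names are hypothetical.

```python
def nonpipelined_ps(n, tc=800):
    """Single-cycle: every instruction takes one full 800 ps cycle."""
    return n * tc

def pipelined_ps(n, tc=200, stages=5):
    """Ideal pipeline: fill time for the first instruction, then one per cycle."""
    return stages * tc + (n - 1) * tc

print(nonpipelined_ps(3), pipelined_ps(3))                  # 2400 1400
print(nonpipelined_ps(1_000_003), pipelined_ps(1_000_003))  # 800002400 200001400
```

For long instruction streams the ratio approaches 800 / 200 = 4 rather than 5. This happens because the stages are not balanced: the 100 ps register stages still occupy a full 200 ps cycle.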

  41. An Overview of Pipelining • Pipelining improves performance by increasing instruction throughput • as opposed to decreasing the execution time of an individual instruction • Instruction throughput is the important metric because real programs execute billions of instructions

  42. Designing Instruction Sets for Pipelining • In MIPS, all instructions are the same length (unlike x86) • This makes it easier to fetch and decode instructions • MIPS has only a few instruction formats, which are symmetric • The source register fields are located in the same place in each format • Memory operands appear only in loads and stores • So the execute stage can calculate the address and the following stage can access memory • Operands must be aligned in memory • So each data transfer needs only one data memory access

  43. Structure Hazards • Conflict for use of a resource • In MIPS pipeline with a single memory • Load/store requires data access • Instruction fetch would have to stall for that cycle • Would cause a pipeline “bubble” • Hence, pipelined datapaths require separate instruction/data memories • Or separate instruction/data caches

  44. Data Hazards • An instruction depends on completion of data access by a previous instruction • add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 • sub needs $s0 before add has written it back to the register file

  45. Forwarding (Bypassing) • Use result when it is computed • Don’t wait for it to be stored in a register • Requires extra connections in the datapath
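Forwarding needs a small decision circuit. For each ALU source operand it checks whether an instruction further down the pipeline is about to write that register, and if so it steers the result to the ALU input instead of using the stale register-file value. The conditions below follow the common textbook forwarding unit. The pipeline-register names (EX/MEM, MEM/WB) and the 2-bit select codes are assumptions, not taken from these slides, and `forward_select` is a hypothetical name. EX/MEM is checked first so that the most recent result wins.

```python
def forward_select(src_reg, exmem_regwrite, exmem_rd, memwb_regwrite, memwb_rd):
    """Mux select for one ALU source operand in the EX stage:
    0b10 = forward from EX/MEM, 0b01 = forward from MEM/WB, 0b00 = register file."""
    if exmem_regwrite and exmem_rd != 0 and exmem_rd == src_reg:
        return 0b10       # result computed by the previous instruction
    if memwb_regwrite and memwb_rd != 0 and memwb_rd == src_reg:
        return 0b01       # result from two instructions earlier
    return 0b00           # no hazard: use the value read in ID

S0 = 16  # register number of $s0
# add $s0, $t0, $t1 is in EX/MEM while sub $t2, $s0, $t3 is in EX:
print(bin(forward_select(S0, 1, S0, 0, 0)))  # 0b10
```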

  46. Load-Use Data Hazard • Can’t always avoid stalls by forwarding • If value not computed when needed • Can’t forward backward in time!
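Because a load's data only exists after the MEM stage, the pipeline must detect a load followed immediately by a use of its result and insert one bubble. The check below follows the common textbook hazard-detection condition: the instruction in EX is a load, and its destination (rt) matches either source register of the instruction in ID. It is conservative when the ID instruction's rt is actually a destination. The signal names and `load_use_stall` are assumptions.

```python
def load_use_stall(idex_memread, idex_rt, ifid_rs, ifid_rt):
    """True if the load in EX writes a register the instruction in ID reads."""
    return bool(idex_memread) and idex_rt in (ifid_rs, ifid_rt)

S0, T1, T2, T3 = 16, 9, 10, 11  # register numbers
# lw $s0, 20($t1) in EX, sub $t2, $s0, $t3 in ID -> must stall one cycle
print(load_use_stall(1, S0, S0, T3))  # True
```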

  47. Reordering Code to Avoid Pipeline Stalls • Consider the following C code segment: • a = b + e; • c = b + f; • Find the hazards in the corresponding MIPS code

  48. Code Scheduling to Avoid Stalls • Reorder code to avoid using a load result in the next instruction • C code for A = B + E; C = B + F;
     Original order (13 cycles): lw $t1, 0($t0) • lw $t2, 4($t0) • stall • add $t3, $t1, $t2 • sw $t3, 12($t0) • lw $t4, 8($t0) • stall • add $t5, $t1, $t4 • sw $t5, 16($t0)
     Reordered (11 cycles): lw $t1, 0($t0) • lw $t2, 4($t0) • lw $t4, 8($t0) • add $t3, $t1, $t2 • sw $t3, 12($t0) • add $t5, $t1, $t4 • sw $t5, 16($t0)
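The 13 vs. 11 cycle counts can be reproduced mechanically. With five stages and no stalls, n instructions take 5 + (n - 1) cycles, and each load whose result is used by the very next instruction adds one stall cycle. This sketch assumes full forwarding, so load-use is the only hazard that costs cycles. Each instruction is represented as an (op, destination, sources) tuple, which is a hypothetical encoding chosen for illustration.

```python
def count_cycles(program, stages=5):
    """Cycles for an ideal 5-stage pipeline plus one stall per load-use pair."""
    stalls = 0
    for (op, dest, _), (_, _, sources) in zip(program, program[1:]):
        if op == "lw" and dest in sources:
            stalls += 1                       # result not ready: insert one bubble
    return stages + (len(program) - 1) + stalls

original = [
    ("lw", "t1", ["t0"]), ("lw", "t2", ["t0"]), ("add", "t3", ["t1", "t2"]),
    ("sw", None, ["t3", "t0"]), ("lw", "t4", ["t0"]), ("add", "t5", ["t1", "t4"]),
    ("sw", None, ["t5", "t0"]),
]
reordered = [
    ("lw", "t1", ["t0"]), ("lw", "t2", ["t0"]), ("lw", "t4", ["t0"]),
    ("add", "t3", ["t1", "t2"]), ("sw", None, ["t3", "t0"]),
    ("add", "t5", ["t1", "t4"]), ("sw", None, ["t5", "t0"]),
]
print(count_cycles(original), count_cycles(reordered))  # 13 11
```

Moving `lw $t4` up puts an independent instruction between each load and its first use. That removes both stalls without changing the result.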

  49. Control Hazards • Branch determines flow of control • Fetching next instruction depends on branch outcome • Pipeline can’t always fetch correct instruction • Still working on ID stage of branch • In MIPS pipeline • Need to compare registers and compute target early in the pipeline • Add hardware to do it in ID stage
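The value of resolving the branch in ID is easiest to see as a cost per instruction. This is a back-of-the-envelope sketch under one assumption: the simplest policy, where the pipeline stalls on every branch until the ID-stage hardware decides it, costing one bubble per branch. The branch frequency in the usage line is a hypothetical input, not a figure from these slides, and `cpi_with_branch_stalls` is a hypothetical name.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles=1, base_cpi=1.0):
    """Effective CPI when every branch stalls the pipeline for stall_cycles cycles."""
    return base_cpi + branch_fraction * stall_cycles

# Hypothetical workload: 17% of instructions are branches, one bubble each.
print(round(cpi_with_branch_stalls(0.17), 2))  # 1.17
```

Each cycle by which the branch decision moves later in the pipeline adds another `branch_fraction` to the CPI. That is the motivation for doing the comparison and the target calculation as early as ID.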
