
Chapter 5: Datapath and Control (Part 3)

Chapter 5: Datapath and Control (Part 3). CS 447, Jason Bakos. Single-Cycle CPU: the single-cycle CPU from the last lecture had a CPI of 1. The clock cycle is determined by the longest possible path in the machine. Loads are the worst: they use 5 functional units in series.



Presentation Transcript


  1. Chapter 5: Datapath and Control (Part 3) • CS 447 • Jason Bakos

  2. Single-Cycle CPU • The single-cycle CPU from the last lecture had a CPI of 1 • The clock cycle is determined by the longest possible path in the machine • Loads are the worst: they use 5 functional units in series • Performance, utilization, and efficiency will be poor, because most instructions don’t need such a long clock cycle • A variable-speed clock could solve this problem, but it hinders parallelism • Pipelining overlaps instruction executions
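
The longest-path argument on this slide can be sketched numerically. The latencies below are illustrative assumptions, not figures from the lecture, and the names `LATENCY_PS`, `PATHS`, `path_delay`, and `single_cycle_period` are hypothetical. Only the idea that a load passes through five units in series comes from the slide.

```python
# Hypothetical functional-unit latencies in picoseconds (assumed values).
LATENCY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional units each instruction class uses in series (assumed breakdown).
PATHS = {
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "r_type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq": ["instr_mem", "reg_read", "alu"],
    "j": ["instr_mem"],
}


def path_delay(instr):
    """Total delay of the units one instruction class passes through."""
    return sum(LATENCY_PS[unit] for unit in PATHS[instr])


def single_cycle_period():
    """A single-cycle clock must be at least as long as the slowest path."""
    return max(path_delay(instr) for instr in PATHS)
```

Under these assumed numbers the load path sets the clock period, so every other instruction class spends part of each cycle idle.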

  3. Multicycle Implementation • Break instructions into steps, where each step requires one clock cycle • We want to reuse functional units within an instruction instead of just across instructions • Reduces hardware • Use a single memory for instructions and data • Use a single ALU instead of one ALU and two adders • Add registers to functional units to hold intermediate results (state data) for future cycles • These are used within an instruction's execution • The register file and memory hold state data to be used across instruction executions • These are programmer-visible • We will need an FSM to control the CPU

  4. Registers • The locations of the registers are determined by the following: • Which combinational units will fit in one clock cycle • Assume one memory access, one regfile access (two reads or one write), or one ALU operation • Any data needed by these operations must be stored in a temporary register • Instruction Register, Memory Data Register, A, B, and ALUOut registers are added to the design • All of these except IR only need to hold data between two adjacent clock cycles • Which data are needed in later cycles of the instruction

  5. Multiplexors • Need to add extra multiplexors (or expand existing muxes) to facilitate the reuse of the ALU within instructions • Add mux to first ALU input • Expand mux to second ALU input

  6. Multicycle CPU

  7. Breaking Instruction Execution into Clock Cycles • Goal is to balance the latency of the operations performed during each clock cycle • At most one of the following can occur in series: • One ALU operation • One register file access (or multiple in parallel) • One memory access (a one-cycle memory access is unrealistic, but we’ll accept it for now)

  8. Execution Stages • To define clearly what the CPU does in each step, we’ll use RTL (register transfer language) • Architecture research has defined 5 standard phases of instruction execution • Instruction fetch • Decode: fetch register values from the register file • Execute: perform the arithmetic/logic operation • Memory: load from or store to memory • Write back: write the register result back to the register file

  9. Execution Stages • Fetch: IR=Memory[PC]; PC=PC+4 • Decode: A=Reg[IR[25..21]]; B=Reg[IR[20..16]]; ALUOut=PC+(sign_extend(IR[15..0])<<2)
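
The fetch and decode RTL on this slide can be mirrored in a few lines of Python. This is a minimal sketch, assuming memory is a dict keyed by byte address and the register file is a 32-entry list. The helper names `bits`, `sign_extend16`, and `fetch_decode` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def bits(word, hi, lo):
    """Return the field word[hi..lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fetch_decode(memory, regs, pc):
    """Perform the fetch and decode steps; return (IR, PC, A, B, ALUOut)."""
    ir = memory[pc]                      # IR = Memory[PC]
    pc = (pc + 4) & MASK32               # PC = PC + 4
    a = regs[bits(ir, 25, 21)]           # A = Reg[IR[25..21]]
    b = regs[bits(ir, 20, 16)]           # B = Reg[IR[20..16]]
    # ALUOut = PC + (sign_extend(IR[15..0]) << 2), computed for every
    # instruction so that a branch target is ready if it turns out to be needed.
    alu_out = (pc + (sign_extend16(bits(ir, 15, 0)) << 2)) & MASK32
    return ir, pc, a, b, alu_out
```

Note that the branch target uses the already incremented PC, which matches the order of the two steps on the slide.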

  10. Execution Stages • Execute • Memory access: ALUOut=A+sign_extend(IR[15..0]) • R-type: ALUOut=A op B • Branch (beq): if (A==B) PC=ALUOut • Jump: PC=PC[31..28] || (IR[25..0]<<2)
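
The execute step depends on the instruction class. The sketch below is one way to express the four cases in Python, taking the final PC update on the slide to be the jump case. The function name `execute`, the `kind` strings, and the `op` callable standing in for the ALU operation are all hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute(kind, pc, ir, a, b, alu_out, op=None):
    """Apply the execute step for one instruction class; return (PC, ALUOut)."""
    if kind == "mem":
        # ALUOut = A + sign_extend(IR[15..0])
        alu_out = (a + sign_extend16(ir)) & MASK32
    elif kind == "r_type":
        # ALUOut = A op B, where op is supplied by the caller
        alu_out = op(a, b) & MASK32
    elif kind == "beq":
        # if (A == B) PC = ALUOut (the target computed during decode)
        if a == b:
            pc = alu_out
    elif kind == "jump":
        # PC = PC[31..28] || (IR[25..0] << 2)
        pc = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"unknown instruction class: {kind}")
    return pc, alu_out
```

Branches and jumps finish in this step, which is why they need fewer cycles than loads, stores, and R-type instructions.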

  11. Execution Stages • Memory Access/Write Back • Load: MDR=Memory[ALUOut] • Store: Memory[ALUOut]=B • R-type: Reg[IR[15..11]]=ALUOut • Memory Read Completion • Load: Reg[IR[20..16]]=MDR
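
The last two steps can be sketched the same way. This is an illustrative sketch, assuming memory is a dict keyed by address and the register file is a list. The function names `memory_or_writeback` and `memory_read_completion` are hypothetical.

```python
def memory_or_writeback(kind, ir, b, alu_out, memory, regs):
    """Fourth step. Returns the MDR value for a load, otherwise None."""
    if kind == "load":
        return memory[alu_out]                 # MDR = Memory[ALUOut]
    if kind == "store":
        memory[alu_out] = b                    # Memory[ALUOut] = B
    elif kind == "r_type":
        regs[(ir >> 11) & 0x1F] = alu_out      # Reg[IR[15..11]] = ALUOut
    else:
        raise ValueError(f"unknown instruction class: {kind}")
    return None


def memory_read_completion(ir, mdr, regs):
    """Fifth step, used only by loads: Reg[IR[20..16]] = MDR."""
    regs[(ir >> 16) & 0x1F] = mdr
```

Only a load reaches the fifth step, because the value read from memory has to sit in MDR for one cycle before it can be written to the register file.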

  12. Control Signals • Control Unit signals • Refer to figure 5.34 (pg. 384) in the book • ALU Control signals • Provide an appropriate ALUOp signal based on what the ALU is being used for (if for an R-type, perform lookup based on function code)
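
The ALU control lookup described here can be sketched as a small table. This assumes the usual textbook 2-bit ALUOp convention (00 for address calculation, 01 for branch comparison, 10 for R-type) and the standard MIPS function codes. It returns operation names rather than control-line encodings, since the slide does not give those. The names `FUNCT_TO_OP` and `alu_control` are hypothetical.

```python
# Standard MIPS function codes for the R-type instructions handled here.
FUNCT_TO_OP = {
    0x20: "add",
    0x22: "sub",
    0x24: "and",
    0x25: "or",
    0x2A: "slt",
}


def alu_control(alu_op, funct=None):
    """Choose the ALU operation from ALUOp and, for R-type, the function code."""
    if alu_op == 0b00:
        return "add"                      # lw/sw address calculation
    if alu_op == 0b01:
        return "sub"                      # beq comparison
    if alu_op == 0b10:
        return FUNCT_TO_OP[funct & 0x3F]  # R-type: look up IR[5..0]
    raise ValueError(f"unsupported ALUOp: {alu_op:#04b}")
```

Splitting the decision this way keeps the main control unit small: it only has to produce ALUOp, and the function code is examined only for R-type instructions.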

  13. Control Signals • All that’s left is for us to build the control unit as an FSM and the ALU control as a lookup table

  14. Control Unit • The fetch and decode stages are the same for every instruction...

  15. Control Unit • Here are the states and transitions for the memory-reference instructions

  16. Control Unit • Here are the states and transitions for R-type, branch, and jump instructions

  17. Control Unit • Final control unit FSM...
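
The state diagrams themselves did not survive in this transcript, so the following is only a sketch of what such an FSM might look like, reconstructed from the execution steps on the earlier slides. The state names are hypothetical; the opcodes are the standard MIPS values for lw, sw, R-type, beq, and j.

```python
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02

# After decode, the opcode selects the first instruction-specific state.
AFTER_DECODE = {
    LW: "mem_addr",
    SW: "mem_addr",
    RTYPE: "execute",
    BEQ: "branch",
    J: "jump",
}


def next_state(state, opcode):
    """Return the state that follows `state` for the given opcode."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return AFTER_DECODE[opcode]
    if state == "mem_addr":
        return "mem_read" if opcode == LW else "mem_write"
    if state == "mem_read":
        return "load_writeback"
    if state == "execute":
        return "r_writeback"
    # mem_write, load_writeback, r_writeback, branch, and jump all end
    # the instruction, so the machine returns to fetch.
    return "fetch"


def cycles_for(opcode):
    """Count the clock cycles (states visited) for one instruction."""
    state, cycles = "fetch", 0
    while True:
        cycles += 1
        state = next_state(state, opcode)
        if state == "fetch":
            return cycles
```

Walking the table gives 5 cycles for a load, 4 for a store or R-type instruction, and 3 for a branch or jump.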

  18. Problems to Think About • How could we add bne, blt, and bgez instructions to our CPU? • How do you calculate CPI for our CPU if we are given instruction-type distributions?
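
One way to approach the CPI question is a weighted average of per-class cycle counts. In the sketch below, the cycle counts follow from the execution steps on the earlier slides (a load uses all five steps, a store or R-type instruction four, a branch or jump three). The instruction mix is a made-up example, and the names `CYCLES`, `EXAMPLE_MIX`, and `average_cpi` are hypothetical.

```python
# Clock cycles per instruction class in the multicycle design.
CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}

# Hypothetical instruction-type distribution (fractions must sum to 1).
EXAMPLE_MIX = {
    "load": 0.25,
    "store": 0.10,
    "r_type": 0.45,
    "branch": 0.15,
    "jump": 0.05,
}


def average_cpi(mix):
    """CPI = sum over classes of (fraction of instructions * cycles)."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * CYCLES[kind] for kind, fraction in mix.items())
```

For the example mix this works out to 0.25·5 + 0.10·4 + 0.45·4 + 0.15·3 + 0.05·3 = 4.05 cycles per instruction.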
