1 / 14

EE204 Computer Architecture

EE204 Computer Architecture. Single Cycle Data path Performance. Performance of Single-Cycle Machines.

cardea
Download Presentation

EE204 Computer Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EE204Computer Architecture Single Cycle Data path Performance Hina Anwar Khan 2011

  2. Performance of Single-Cycle Machines • Let's assume that the operation time for the following units is: Memory - 2 nanoseconds (ns), ALU and adders - 2 ns, Register file - 1 ns. We will assume that MUXs, control, sign-extension, PC accesses, and wires have no delays. • Which implementation is faster? 1. Every instruction operates in 1 clock cycle of fixed length.2. Every instruction operates in a varying length clock cycle. • Lets look at the time needed by each instruction: Inst. Fetch Reg. Rd ALU op Memory Reg. Wr TotalR-Type 2 1 2 0 1 6nsLoad 2 1 2 2 1 8nsStore 2 1 2 2 7nsBranch 2 1 2 5nsJump 2 2ns Hina Anwar Khan Spring 2011

  3. Fixed vs. Variable Cycle Length • Lets Assume a program has the following instruction mix: 24% loads, 12% stores, 44% R-type, 18% branches, 2% jumps. • For the fixed cycle length the cycle time is 8 ns, long enough for the longest instruction (load). Thus each instruction takes 8 ns to execute. • For the variable cycle time the average CPU clock cycle is:8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.3 ns • It is obvious that the variable clock implementation is faster but it is extremely hard to implement. • Variable clock implementation is 8/6.3 = 1.27 times faster • When adding instructions such as multiply and divide which can take tens of cycles this scheme is too slow. Hina Anwar Khan Spring 2011

  4. Observations on the Single Cycle Design • The single-cycle datapath is straightforward, but... • It has to use 3 separate ALU’s • It has separate Instruction and Data memories • Cycle time is determined by worst-case path • A multi-cycle datapath might be better • We can reuse some of the hardware • We can combine the memories • Cycle time is still constant, but instructions may take differing numbers of cycles Hina Anwar Khan Spring 2011

  5. Multi-Cycle Implementation • Multi-Cycle Implementation • Each step in execution = 1 clock • Each Instruction of different clock cycles • Functional unit can be used more than once per instruction as long as it is used on different clock cycles • Reduce and Share Hardware units Hina Anwar Khan Spring 2011

  6. Multicycle Datapath Single Instruction & Data Memory Single ALU Registers Hina Anwar Khan Spring 2011

  7. Multicycle Execution • Instruction Register (IR) • Holds instruction until end of execution • Memory Data Register (MDR) • A Register • B Register • ALUOut Register Hina Anwar Khan Spring 2011

  8. Multicycle Datapath Branch target address Address Register Block Address Inst/Data Memory Instruction PC = PC +4 ALU Data Arithmetic/branch Instruction lw/sw Instruction Hina Anwar Khan Spring 2011

  9. Multicycle Datapath Hina Anwar Khan Spring 2011

  10. MultiCycle Datapath & Control Signals Hina Anwar Khan Spring 2011

  11. One Single ALU • One single ALU is used to perform all of the necessary functions: • An arithmetic operation on two register operands • Add a register to a sign-extended constant, for computing memory addresses in lw/sw instructions • Compute PC+4 to increment the PC • Add a sign-extended, shifted offset to (PC+4) for branches Hina Anwar Khan Spring 2011

  12. Implications of Shared Functional Units • Need to add multiplexors or expand existing multiplexors • e.g. Memory unit now contains both instructions (address in PC) and data (address in ALUOut) • e.g. ALU now must accommodate all inputs from previous ALU and adders. Hina Anwar Khan Spring 2011

  13. Two extra multiplexers • To enable all the actions listed for the ALU, two extra multiplexers are needed • A 2-to-1 mux, ALUsrcA, selects whether the first ALU input is the PC or a register • A 4-to-1 mux, ALUSrcB, selects the 2nd input from among • the register file • a constant 4 • a sign-extended constant, and • a sign-extended and shifted constant Hina Anwar Khan Spring 2011

  14. One single memory • One single memory is used in both the instruction fetch and data access stages. • The address for this memory may come from: • the PC register, when fetching an instruction • the ALU output, when doing a lw/sw instruction and need the effective memory address. • => add a 2-to-1 mux, IorD, to select whether the memory is being accessed for instructions or for data. Hina Anwar Khan Spring 2011

More Related