
Outline Introduction Version 0 MIPS CPU : Unpipelined MIPS CPU


Presentation Transcript


  1. Outline • Introduction • Version 0 MIPS CPU : Unpipelined MIPS CPU • It executes integer instructions • Version 1 MIPS CPU : Pipelined MIPS CPU • It executes integer instructions • Handout to use • MIPS CPU CS 6143

  2. Getting ready for CS6143 • The prerequisite for CS6143 • CS6133 for graduate students • CS2214 for undergraduate students • CS6143 students who took the prerequisite course and did not use the Hennessy & Patterson book must realize that they will have to put in more effort than the other CS6143 students • They will have to learn the MIPS assembly language and the MIPS pipeline by themselves! • If you are not sure you are ready for CS6143, you can work on the execution timing of the pipelined MIPS CPU on the next slide • You learned about it when you took the prerequisite course CS6133 or CS2214 • If you do not understand the timing, you need to take CS6133 • If you understand the timing, then study the remaining slides to refresh your memory on CPU design and pipelining

  3. Test Program • Determine when the execution of the second iteration ends if L1 cache memories take one clock period and there is no cache miss • Show all forwardings and write-in-the-first-half-read-in-the-second-half cases • Clock period in which each instruction occupies each stage (a/b means the instruction stays in that stage during clock periods a and b; - means the stage is not used):

     Instruction            IF    ID     EX   MEM  WB    |  IF     ID     EX   MEM  WB
                            (first iteration)            |  (second iteration)
     LD   R1, 500(R8)       1     2      3    4    5     |  10     11     12   13   14
     DADD R2, R3, R1        2     3/4    5    6    7     |  11     12/13  14   15   16
     DSUB R5, R2, R1        3/4   5      6    7    8     |  12/13  14     15   16   17
     XOR  R8, R5, R2        5     6      7    8    9     |  14     15     16   17   18
     SLT  R11, R2, R5       6     7      8    9    10    |  15     16     17   18   19
     OR   R14, R11, R15     7     8      9    10   11    |  16     17     18   19   20
     BNEZ R14, (-7)₁₀       8     9      -    -    -     |  17     18     -    -    -
     SD   R11, 600(R14)     9     10     11   12   -     |  18     19     20   21   -

     • The second iteration ends in clock period 21 • All data hazards are RAW

  4. Introduction • On the microarchitecture layer, a computer is a collection of at least three interconnected digital systems • A central processing unit (CPU) • A (main) memory • An I/O controller to control an I/O device, such as the disk • There can be several I/O controllers to control different I/O devices • [Figure: CPU, memory, and disk I/O controller connected by an interconnection system]

  5. Digital Systems • A digital system performs microoperations • It consists of a datapath (data unit) and a control unit • The datapath actually performs the microoperations • The control unit determines which microoperation happens when • [Figure: the datapath (ALUs, registers, buses) sends status signals to the control unit (sequencer), which sends control signals back to the datapath]

  6. Digital Systems • The datapath (data unit) has registers, ALUs and buses to perform the microoperations • Registers keep information temporarily • ALUs perform arithmetic/logic operations • Buses interconnect the registers and ALUs • Other components used include • Multiplexers (MUXes), decoders, encoders, comparators, counters, etc.

  7. Digital Systems • The control unit has a sequencer circuit that determines the sequence of microoperations • The sequencer needs status signals from the data unit to know what is happening there • Then, it determines which microoperations are to be performed and indicates them to the datapath by means of control signals

  8. Designing Digital Systems • Datapath design is simpler than control unit design since the datapath has highly regular (duplicated) circuits • A 64-bit ADDer is composed of 4 identical 16-bit ADDers • A 64-bit comparator consists of 8 identical 8-bit comparators, etc. • Control unit design is more difficult due to • Large amounts of random logic • The effort needed to make sure there are no timing problems • Microoperations must start at the right time and end at the right time!
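The regularity of the datapath can be illustrated with a small sketch. The Python model below builds a 64-bit addition out of four identical 16-bit adder slices chained by their carries. The ripple-carry arrangement and the names `add16` and `add64` are assumptions made for illustration; the slides do not say how the four 16-bit ADDers are interconnected.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def add16(a, b, carry_in):
    """One 16-bit adder slice: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add64(a, b):
    """64-bit addition from four identical 16-bit slices (assumed ripple carry)."""
    result, carry = 0, 0
    for i in range(4):
        shift = 16 * i
        # Each slice adds its own 16 bits plus the carry from the slice below.
        partial, carry = add16(a >> shift, b >> shift, carry)
        result |= partial << shift
    return result, carry
```

The same slice is reused four times, which is the kind of duplication that makes datapath design simpler than control unit design.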

  9. Designing digital systems • We will use the finite-state machine (FSM) technique to design the MIPS CPU where the FSM state diagram will have states with microoperations • The state diagram shows which state follows which state precisely • Each state indicates which microoperations to perform • The state diagram shows which states are needed when for which machine language instruction

  10. Designing the microarchitecture level of a computer • There are two tasks in this design • Develop the CPU and memory digital systems so that instructions can be run • Develop the memory and I/O controller digital systems so that I/O can happen • We will concentrate on the CPU and memory digital systems

  11. Designing the CPU and memory digital systems • First we focus on the CPU digital system while we quickly make a few design decisions on the memory hierarchy • We will design the CPU as a slow CPU running only integer instructions : No pipelining • This is Version 0 • Then, we will improve the CPU speed by using pipelining, but still running integer instructions • This is Version 1 • For both versions the memory will be a black box with a few details

  12. Designing the CPU as a Digital System • The MIPS CPU digital system • We will concentrate on • FSM state diagram of the MIPS CPU • The FSM state diagram describes both the datapath and the control unit • Datapath of the CPU • Datapath hardware for the execution of integer MIPS instructions will be covered • We will not concentrate on the MIPS CPU control unit • It can be implemented by hardwiring and/or microprogramming

  13. Designing the CPU digital system • To design the MIPS CPU, we will start with the MIPS architecture • What is the connection between the architecture and the CPU? • A computer processes digital information by running machine language instructions • A program is a list of instructions, each of which specifies operations on data (arguments) • An instruction specifies architectural operations • Each architectural operation is implemented by microoperations

  14. Designing the CPU Digital System • In order to perform an architectural operation, the CPU performs a series of microoperations in a number of clock periods • That is, an architectural operation is broken down into smaller operations called microoperations • That is, to run a machine language instruction, the CPU performs microoperations • The CPU performs some microoperations alone and some in cooperation with the memory and the I/O controllers

  15. Designing the CPU Digital System • Architectural operations • An architectural operation is what we describe as the semantics of the instruction • The architectural operation specified by the DADD instruction • Rd ← Rs + Rt • The architectural operation specified by the DSLLV instruction • Rd ← Rs << Rt • The architectural operation specified by the MOVN instruction • If Rt ≠ 0 then Rd ← Rs • The architectural operation specified by the J instruction • PC[36-63] ← (4 x Offset) • It is the CPU that contributes the most to the execution of an instruction since it performs most of the microoperations needed for an architectural operation
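As a rough illustration of what "semantics of the instruction" means, the sketch below models three of these architectural operations on a Python list standing in for the 64-bit register file. Several things here are assumptions rather than statements from the slides: overflow trapping is ignored, the DSLLV shift amount is taken from the low 6 bits of Rt, and PC bits 36-63 are read as the low 28 bits (bit 0 being the most significant bit). The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
LOW28 = 0x0FFF_FFFF


def dadd(regs, rd, rs, rt):
    """DADD: Rd <- Rs + Rt, wrapped to 64 bits (overflow trap not modeled)."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK64


def dsllv(regs, rd, rs, rt):
    """DSLLV: Rd <- Rs << Rt (assumption: only the low 6 bits of Rt are used)."""
    regs[rd] = (regs[rs] << (regs[rt] & 0x3F)) & MASK64


def jump(pc, offset26):
    """J: PC[36-63] <- 4 x Offset; the upper 36 bits of PC are left unchanged."""
    low_bits = (offset26 << 2) & LOW28
    return (pc & ~LOW28 & MASK64) | low_bits
```

Each function is one architectural operation; the CPU implements each of them as a sequence of microoperations spread over several clock periods.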

  16. Designing the CPU Digital System • Typical CPU digital system microoperations • Add, subtract, multiply • In the past, a 32-bit addition was completed in 1 clock period • Today, a 64-bit addition is completed in several clock periods • AND, OR, XOR • Shift right, Shift left • Read data from memory, write data to memory • In the past, a memory access was completed in 1 clock period • Today, it is completed in several clock periods • Read instructions from memory (fetch) • Increment the program counter • Transfer a register to another register • …

  17. Designing the CPU as a Digital System • Other machines, especially CISC machines, require other microoperations such as • Reading indirect address(es) from the memory • Effective address calculation for • Indexing • Autoincrement • Autodecrement • Alignment for • Instructions • Data • Addresses

  18. Designing the CPU Digital System • Architecture’s effect on microoperations • The decisions made on architecture determine the microoperations needed for the execution of the instructions • General microoperations found on most CPUs • The ones mentioned on previous slides • Specific microoperations for certain CPUs • Specific microoperations for MMUs, caches, I/O controllers • The architecture also determines the characteristics of each microoperation • If the autoincrement addressing mode is used, the number to be automatically added to the base register can be 4 or 8 depending on the memory location size and the word length • Whether to attach 16 bits or 32 bits during sign extension • Thus, each machine language instruction requires a certain number of microoperations taking a certain time : the CPIi

  19. Designing the CPU Digital System • Microoperations • The CPU can perform one or more microoperations per clock period, depending on the complexity of the microoperation and the availability of the hardware resources • Most often a microoperation can be completed in one clock period unless it is a complex microoperation • If a complex microoperation is to be run in one clock period, the clock period needs to be longer • The more numerous and complex the microoperations are, the longer it takes to run the machine language instruction • CISC instructions take a longer time to execute (larger CPIi) for this reason

  20. Designing the CPU Digital System • Calculating CPIi • The time it takes to run an instruction, CPIi, is then determined by • The number of microoperations needed for it • The complexity of the microoperations • The number of clock periods for an instruction, CPIi, becomes a matter of figuring out the microoperations and how to distribute them to individual clock periods • One can come up with 5-10 simple microoperations to be performed one after another, resulting in a CPIi of 5-10 • But, since microoperations are simple, the clock period is short • Alternatively, one can come up with 2-4 complex microoperations, resulting in a CPIi of 2-4 • But, the clock period is longer
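The trade-off can be made concrete with a little arithmetic: the time to run one instruction is its CPIi multiplied by the clock period. The numbers below (8 simple microoperations at 1 ns each versus 3 complex ones at 3 ns each) are hypothetical values chosen only to illustrate the comparison.

```python
def instruction_time_ns(cpi_i, clock_period_ns):
    """Execution time of one instruction = clock periods needed x clock period."""
    return cpi_i * clock_period_ns


# Hypothetical design A: many simple microoperations, short clock period.
many_short = instruction_time_ns(cpi_i=8, clock_period_ns=1.0)

# Hypothetical design B: few complex microoperations, long clock period.
few_long = instruction_time_ns(cpi_i=3, clock_period_ns=3.0)

print(f"many short periods: {many_short} ns, few long periods: {few_long} ns")
```

With these made-up figures the two designs end up close (8 ns versus 9 ns), which shows why CPIi alone does not decide which design is faster.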

  21. Designing the CPU Digital System • Calculating CPIi • What can we do? • Few long clock periods vs. many but shorter clock periods? • Since increasing the clock frequency is important for marketing purposes, the second option would weigh in substantially • It turns out that if pipelining is implemented, having many shorter clock periods would not matter, as we will see • CPIi figures will be large but CPIave will be close to 1 (one)! • Today’s microprocessors have instruction CPIi values in the range of 10-30, but CPIave figures for their targeted applications can be even less than 1 (one)!
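A minimal sketch of why CPIave approaches 1 under pipelining, assuming an idealized single-issue pipeline: the first instruction needs as many clock periods as there are stages, and each later instruction completes one clock period after the previous one, plus any stall cycles. The function name and the model are illustrative assumptions, not taken from the slides.

```python
def average_cpi(num_instructions, num_stages, stall_cycles=0):
    """CPIave for an idealized single-issue pipeline.

    Total clock periods = fill time (num_stages) + one period per remaining
    instruction + stall cycles.
    """
    total_cycles = num_stages + (num_instructions - 1) + stall_cycles
    return total_cycles / num_instructions


for stages in (5, 20):
    print(stages, average_cpi(num_instructions=1000, num_stages=stages))
```

In this model a deeper pipeline (larger CPIi) barely changes CPIave for a long instruction stream. The model cannot produce values below 1; that requires completing more than one instruction per clock period, which this sketch does not capture.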

  22. Designing the CPU Digital System • Determining microoperations for a machine language instruction • Some microoperations are performed for all the instructions • Usually at the same point in time during the execution of every instruction • Fetching the instruction is always the first microoperation to perform for all CPUs • Updating PC (PC ← PC + 4) so that it points at the next instruction is also universal • The other microoperations depend on the instruction, the addressing mode, where the arguments are, the length of the arguments, etc.

  23. Designing the CPU Digital System • Determining microoperations for a machine language instruction • We would list all the microoperations for each instruction, by making sure that we are consistent in terms of • Bus usage • We often decide an approximate number of buses we need for our datapath • Today’s CPUs have at least three internal buses to complete an integer arithmetic microoperation in one clock period • Two buses carry the numbers from two registers and the third bus carries the result to a register • ALU usage • An ALU is expensive and so we try to limit the number of them

  24. Designing the CPU Digital System • Determining microoperations for a machine language instruction • We would list all the microoperations for each instruction, by making sure that we are consistent in terms of • Register usage • Additional registers not visible to the architecture level are used to keep temporary values : microarchitecture registers • Typically, the more registers are used, the more clock periods we spend for an instruction since temporary values will be passed from one clock period to another • But, sometimes we have to use microarchitecture registers, such as the instruction register that keeps the current instruction • Control unit usage

  25. Designing the CPU Digital System • Designing the MIPS CPU digital system • Determine how each MIPS architectural operation is implemented by microoperations • Most microoperations must be simple enough to be completed in less than one clock period • A few microoperations may not be completed in a clock period • For example a memory read may take several clock periods • These microoperations should be accommodated in the FSM state diagram, the datapath and the control unit

  26. Designing the CPU Digital System • The MIPS microoperations implied by the MIPS machine language instructions are • Instruction fetch, performed always • Update PC for next instruction, performed always • Effective address calculation for Displacement and relative addressing modes • Sign extension or catenation of 0s for data/addresses • Reading data from the memory • Writing data to the memory • Performing an arithmetic/logic operation • Register transfer • Testing a condition

  27. Unpipelined MIPS CPU : Version 0 • By using the MIPS CPU Handout • The most interesting component of a computer is the CPU • We know that the CPU has registers, buses, ALUs and a sequencer, among other components • Note that whether hardwiring or microprogramming is used, the datapath stays the same, at least theoretically • The textbook gives the description of the datapath, not the control unit • We will do the same thing • The datapath performs microoperations on data • It uses registers, buses and the ALU for that purpose • The microoperations are in turn controlled by the control unit

  28. Overview • We are now ready for the organizational design of the MIPS • We know the architecture of MIPS • We will design • The MIPS CPU that will have • A control unit with a sequencer • A datapath containing registers, buses and the ALU • The datapath performs the microoperations and the control unit determines the timing and sequence of these microoperations

  29. Overview • The way the MIPS computer is covered indicates that the authors organized the computer similarly to the commercial MIPS systems where • There is an integer MIPS CPU • A system control coprocessor (CP0) responsible for memory management and cache control • An FP coprocessor (CP1) • The integer MIPS CPU registers are either architectural or microarchitectural (temporary registers) • There are two other coprocessors, CP2 and CP3, that are reserved for future use

  30. Overview • Designing the MIPS CPU for all of its instructions is prohibitive • First, we will design a MIPS CPU to execute only integer instructions that include • LD, SD • DADD, DSUB • DADDI • AND, OR, XOR • ANDI, ORI, XORI • SLT • SLTI • BEQZ, BNEZ • All these integer instructions use either the I-format or the R-format • We will not cover the execution of J-format instructions • Their execution hardware can be derived after learning how the hardware for R-format and I-format instructions is constructed

  31. Overview • The MIPS CPU will have all the architectural registers • 32 64-bit GPRs • 64-bit PC • FP registers are to be added later in the semester

  32. New Microarchitectural registers • These (temporary) registers are not a part of the state (hence architecture) • 32-bit instruction register, IR, to keep the current instruction • IR contains the instruction until it is completely executed • 64-bit A and B registers • They keep the content of Rs and Rt registers of the current instruction • 64-bit register Imm • It contains the sign extended value of the 16-bit Displacement/Offset/Immediate (DOImm) field of I-type instructions

  33. New Microarchitectural registers • 64-bit Load Memory Data, LMD, register • It keeps the data read from the memory for Load instructions • 64-bit ALUoutput register • It keeps the result of the ALU operation temporarily • 1-bit Cond register • It keeps the result of the compare operation between register A and 0 • This is needed for the BEQZ and BNEZ instructions that compare register Rs with 0

  34. New Microarchitectural registers • 64-bit A and B registers • [Figure: I format = Opcode (6 bits), Rs (5), Rt (5), Displacement/Offset/Immediate (16); R format = Opcode (6), Rs (5), Rt (5), Rd (5), Shamt (5), Function (6); in both formats the register selected by the Rs field goes to register A and the register selected by the Rt field goes to register B]
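The field extraction behind registers A and B can be sketched as follows. The bit positions are the standard 32-bit MIPS layout (opcode in the top 6 bits, then Rs, Rt, Rd, Shamt and Function, with the 16-bit immediate overlapping the low half); the function names and the dictionary representation are assumptions made for illustration.

```python
def decode_fields(instruction):
    """Slice a 32-bit MIPS instruction word into its fields.

    All fields are extracted unconditionally; which ones are meaningful
    depends on the instruction format.
    """
    return {
        "opcode": (instruction >> 26) & 0x3F,
        "rs": (instruction >> 21) & 0x1F,
        "rt": (instruction >> 16) & 0x1F,
        "rd": (instruction >> 11) & 0x1F,
        "shamt": (instruction >> 6) & 0x1F,
        "funct": instruction & 0x3F,
        "imm16": instruction & 0xFFFF,
    }


def load_a_b(regs, instruction):
    """A <- Regs[Rs field], B <- Regs[Rt field], regardless of the format."""
    fields = decode_fields(instruction)
    return regs[fields["rs"]], regs[fields["rt"]]
```

Because Rs and Rt sit at the same bit positions in both formats, A and B can be loaded before the opcode has been decoded.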

  35. New Microarchitectural registers • Even if an instruction does not have Rs and Rt fields, such as a J-format instruction, the bits in the Rs and Rt field positions are used to move register content to A and B, respectively • The values of A and B registers will not be used! • The reason for moving to A and B is to make the common case fast where we think most instructions are R-format or I-format and require this move! • [Figure: J format = Opcode (6 bits), Offset26 (26 bits); the offset bits that fall in the Rs and Rt field positions (5 bits each) select the registers moved to A and B for a Jump]

  36. New Microarchitectural registers • 64-bit register Imm • Even if the current instruction is not an I-format instruction, such as an R-format or J-format instruction, DOImm field bits are used to move DOImm+ to Imm • The value of the Imm register will not be used! • The reason for moving to Imm is to make the common case fast where we think many instructions are I-format and require this move! • [Figure: I format = Opcode (6 bits), Rs (5), Rt (5), Displacement/Offset/Immediate (16); the 16-bit field goes to register Imm after sign extension]

  37. New Microarchitectural registers • The textbook implies in Appendix A that the Displacement used for loads and stores is signed • Similarly, the textbook sign extends the immediate data elements of the ANDI, ORI and XORI instructions • Instead of attaching zeros to the left • In order not to complicate the coverage of textbook CPU design, we will accept these and assume the 16-bit value is signed for the integer instructions we will work on • We will use DOImm+ to indicate a sign-extended value from now on
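The difference between sign extension (the DOImm+ convention adopted here) and attaching zeros to the left can be shown with a short sketch. The function names are hypothetical, and 64-bit values are represented as non-negative Python integers.

```python
MASK64 = (1 << 64) - 1


def sign_extend_16_to_64(value16):
    """DOImm+: copy bit 15 of the 16-bit field into bits 16..63."""
    value16 &= 0xFFFF
    if value16 & 0x8000:
        return (value16 | ~0xFFFF) & MASK64
    return value16


def zero_extend_16_to_64(value16):
    """The alternative the slide mentions: attach zeros to the left."""
    return value16 & 0xFFFF
```

For example, the 16-bit pattern 0xFFF9 (decimal -7) becomes 0xFFFFFFFFFFFFFFF9 under sign extension but stays 0x000000000000FFF9 under zero extension, while a positive field such as 500 is the same under both.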

  38. The MIPS CPU state diagram • The design of a CPU is very complex • We have to consider the space (hardware) and time (speed) • The design, analysis, description, testing, modification, optimization, servicing and maintenance can be more efficient if there are efficient tools around • These include HDLs and CAD tools • The textbook uses a typical register transfer language (RTL) notation in Appendix A to describe the execution of instructions • We will use the same RTL notation, which is also used in the handout • To quickly see the execution steps of the integer machine language instructions, an FSM state diagram and a CPU datapath figure are developed in the handout • Additionally, timing diagrams and tables are provided to understand the CPU design

  39. The MIPS CPU state diagram • An instruction goes through several phases when executed • We give a name to each phase of an instruction execution • A phase is also called major cycle • Each major cycle will take one or more minor cycles (clock periods) • Each minor cycle is a state • Each minor cycle takes typically one clock period • Each major cycle often has at least one microoperation • Often the name of a major cycle is derived from the major microoperation of the cycle

  40. The MIPS CPU state diagram • The number of major cycles and their complexity are small for RISC systems and larger for CISC systems • Often for RISC systems, the CPIi for most frequently used instructions is between 4 and 6 • However, this number has to be larger to have deep pipelining and high clock frequencies • In simple systems like RISC systems, sharing of hardware among different major cycles is not necessary • A hardware resource is often needed in one major cycle only • The hardware for each major cycle can then be easily identified and is often called a stage • So, the execution of an instruction is the movement of the instruction through some or all of the stages of the CPU!

  41. The MIPS CPU state diagram • The MIPS integer instructions go through at most five major cycles during the execution • However, even for this RISC machine, it is difficult to name the 5 cycles because not all instructions do similar things in a major cycle • Some microoperations will be performed in advance in anticipation of a frequent operation • The early operations will not alter the state and will not cause longer clock periods, but will slightly increase the hardware

  42. The MIPS CPU state diagram • The MIPS CPU major cycles for integer instructions (pages A-27 – A-28) • Instruction fetch cycle • Abbreviated as IF, standing for instruction fetch • Same for all MIPS instructions • Instruction decode/Register fetch cycle • Abbreviated as ID, standing for instruction decode • Same for all MIPS instructions • Execution/effective address cycle • Abbreviated as EX, standing for execution • Memory access/branch completion cycle • Abbreviated as MEM, standing for memory • Write-back cycle • Abbreviated as WB, standing for write-back

  43. The MIPS CPU state diagram • To emphasize again, designing a CPU is determining which microoperation happens when for each architectural operation (the semantics of the instruction) • For the MIPS, like many other CPUs, the IF and ID stages are identical for all instructions • The same microoperations are performed for all instructions • These microoperations implement portions of the architectural operation • For the MIPS, the remaining portions of the architectural operation are performed in the EX, MEM and WB stages

  44. The MIPS CPU state diagram • Architectural operations of I-format instructions among the integer instructions • Load/Store instructions • LD Rt, Disp(Rs) : Rt ← M[Rs + Disp+] • SD Rt, Disp(Rs) : M[Rs + Disp+] ← Rt • Superscript + indicates sign extension • [Figure: I format = Opcode (6 bits), Rs (5), Rt (5), Displacement/Offset/Immediate (16)]

  45. The MIPS CPU state diagram • Architectural operations of I-format instructions among the integer instructions • Arithmetic/Logic instructions • DADDI Rt, Rs, Imm+ : Rt ← Rs + Imm+ • ANDI Rt, Rs, Imm+ : Rt ← Rs ∧ Imm+ • ORI Rt, Rs, Imm+ : Rt ← Rs ∨ Imm+ • XORI Rt, Rs, Imm+ : Rt ← Rs ⊕ Imm+ • SLTI Rt, Rs, Imm+ : If Rs < Imm+ then Rt ← 1 else Rt ← 0 • [Figure: I format = Opcode (6 bits), Rs (5), Rt (5), Displacement/Offset/Immediate (16)]

  46. The MIPS CPU state diagram • Architectural operations of I-format instructions among the integer instructions • Branch instructions • BEQZ Rs, Offset : If Rs = 0, then PC ← PC + (4 x Offset+) • BNEZ Rs, Offset : If Rs ≠ 0, then PC ← PC + (4 x Offset+) • [Figure: I format = Opcode (6 bits), Rs (5), Rt (5), Displacement/Offset/Immediate (16)]
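The branch semantics can be sketched as a single function that returns the next PC value. This is an illustration under assumptions: the `pc` argument is whatever PC value the formula on the slide refers to (the slide does not say whether it has already been advanced past the branch), and the function and parameter names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed16(value16):
    """Interpret a 16-bit field as a signed number (Offset+)."""
    value16 &= 0xFFFF
    return value16 - 0x10000 if value16 & 0x8000 else value16


def branch_next_pc(pc, rs_value, offset16, branch_if_zero):
    """BEQZ (branch_if_zero=True) or BNEZ (False): PC <- PC + 4 x Offset+ if taken."""
    taken = (rs_value == 0) if branch_if_zero else (rs_value != 0)
    if taken:
        return (pc + 4 * to_signed16(offset16)) & MASK64
    return pc
```

With an offset field of -7 and `pc` equal to 100, a taken branch yields 100 - 28 = 72; an untaken branch leaves `pc` unchanged.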

  47. The MIPS CPU state diagram • Architectural operations of R-format instructions among the integer instructions • Arithmetic/Logic instructions • DADD Rd, Rs, Rt : Rd ← Rs + Rt • DSUB Rd, Rs, Rt : Rd ← Rs - Rt • AND Rd, Rs, Rt : Rd ← Rs ∧ Rt • OR Rd, Rs, Rt : Rd ← Rs ∨ Rt • XOR Rd, Rs, Rt : Rd ← Rs ⊕ Rt • SLT Rd, Rs, Rt : If Rs < Rt then Rd ← 1 else Rd ← 0 • [Figure: R format = Opcode (6 bits), Rs (5), Rt (5), Rd (5), Shamt (5), Function (6)]
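The R-format operations map naturally onto a table of ALU functions. The sketch below is illustrative only: the dictionary `ALU_OPS` and the helper names are hypothetical, overflow trapping is not modeled, and SLT is assumed to compare its operands as signed 64-bit numbers and to write Rd like the other R-format instructions.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit pattern as a signed two's-complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


# One entry per R-format integer instruction covered on the slide.
ALU_OPS = {
    "DADD": lambda a, b: (a + b) & MASK64,
    "DSUB": lambda a, b: (a - b) & MASK64,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SLT": lambda a, b: 1 if to_signed64(a) < to_signed64(b) else 0,
}


def execute_r_format(regs, op, rd, rs, rt):
    """Rd <- Rs op Rt for the selected operation."""
    regs[rd] = ALU_OPS[op](regs[rs], regs[rt])
```

All six instructions read Rs and Rt and write Rd; only the ALU function differs, which is why they can share the same datapath.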

  48. The MIPS CPU state diagram • No J-format instruction is executed by the CPU we are designing • However, one can incorporate them into the CPU design after the design for the R-format and I-format instructions is completed • [Figure: J format = Opcode (6 bits), Offset26 (26 bits)]

  49. The MIPS CPU state diagram • The major cycles of the MIPS CPU are shown by the state diagram given in the MIPS CPU handout • Registers A and B are used to prepare operands for an ALU operation • Each state takes 1 clock period • Later, we will change it to one or more clock periods • Memory accesses and complex arithmetic operations will take more than one clock period to perform • The state that has a memory access or a complex arithmetic operation will take more than one clock period • All microoperations mentioned in a state are performed in parallel, so their order does not matter • If a state takes more than one clock period, one has to be careful about the parallel operations • We now obtain the state diagram and the datapath hardware of the MIPS CPU
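To tie the major cycles to the microarchitectural registers, here is a minimal sketch that walks one instruction through IF, ID, EX, MEM and WB. It is not the handout's state diagram: the assignment of microoperations to cycles follows the general textbook-style scheme (for example, computing the branch target from the incremented PC in EX and completing the branch in MEM), the instruction is passed in pre-decoded instead of being fetched into IR, and which states a given instruction skips is not modeled. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(value16):
    value16 &= 0xFFFF
    return (value16 | ~0xFFFF) & MASK64 if value16 & 0x8000 else value16


def run_instruction(cpu, kind, rs=0, rt=0, rd=0, imm16=0):
    """Run one instruction; kind is "LD", "SD", "DADD" or "BEQZ"."""
    regs, mem = cpu["regs"], cpu["mem"]
    # IF: IR <- M[PC] is abstracted away; only the incremented PC is modeled.
    npc = (cpu["pc"] + 4) & MASK64
    # ID: A <- Regs[Rs], B <- Regs[Rt], Imm <- sign-extended 16-bit field.
    a, b, imm = regs[rs], regs[rt], sign_extend16(imm16)
    # EX: ALU operation, effective address, or branch target and condition.
    cond = False
    if kind == "DADD":
        alu_output = (a + b) & MASK64
    elif kind in ("LD", "SD"):
        alu_output = (a + imm) & MASK64
    elif kind == "BEQZ":
        alu_output = (npc + 4 * imm) & MASK64
        cond = a == 0
    else:
        raise ValueError(f"unsupported instruction kind: {kind}")
    # MEM: memory access or branch completion.
    lmd = None
    if kind == "LD":
        lmd = mem.get(alu_output, 0)
    elif kind == "SD":
        mem[alu_output] = b
    cpu["pc"] = alu_output if cond else npc
    # WB: write the result into the register file.
    if kind == "DADD":
        regs[rd] = alu_output
    elif kind == "LD":
        regs[rt] = lmd
```

A usage example under the same assumptions: with `cpu = {"pc": 0, "regs": [0] * 32, "mem": {508: 42}}` and R8 set to 8, calling `run_instruction(cpu, "LD", rs=8, rt=1, imm16=500)` loads 42 into R1 and advances `cpu["pc"]` to 4.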
