Outline introduction version 0 emy cpu unpipelined emy cpu it executes only integer instructions how a memory hierarch

  • Outline

    • Introduction

    • Version 0 EMY CPU : Unpipelined EMY CPU

      • It executes only integer instructions

        • How a memory hierarchy can be attached to the unpipelined CPU is also studied

  • Handout to use

    • EMY CPU

CS 2214



  • Introduction

    • On the microarchitecture layer, a computer is a collection of at least three interconnected digital systems

      • A central processing unit (CPU)

      • A (main) memory

      • An I/O controller to control an I/O device, such as the disk

        • There can be several I/O controllers to control different I/O devices

[Figure: a CPU, a memory, and an I/O controller (attached to a disk) connected through an interconnection system]

  • Digital Systems

    • A digital system performs microoperations

      • It consists of a datapath (data unit) and a control unit

        • The datapath actually performs the microoperations

        • The control unit determines which microoperation happens when

[Figure: the datapath (ALUs, registers, buses) and the control unit (sequencer); status signals go from the datapath to the control unit, and control signals go from the control unit to the datapath]

  • Digital Systems

    • The datapath (data unit) has registers, ALUs and buses to perform the microoperations

      • Registers keep information temporarily

      • ALUs perform arithmetic/logic operations

      • Buses interconnect the registers and ALUs

      • Other components used include

        • Multiplexers (MUXes), decoders, encoders, comparators, counters, etc.


  • Digital Systems

    • The control unit has a sequencer circuit that determines the sequence of microoperations

      • The sequencer needs status signals from the data unit to know what is happening there

      • Then, it determines which microoperations are to be performed and signals them to the datapath by means of control signals


  • Designing Digital systems

    • Datapath design is simpler than control unit design since the datapath has highly regular (duplicated) circuits

      • A 32-bit ADDer is composed of 2 16-bit identical ADDers

      • A 32-bit comparator consists of 4 8-bit identical comparators, etc.

    • Control unit design is more difficult due to

      • Large amounts of random logic

      • A lot of effort is needed to make sure there are no timing problems

        • Microoperations must start at the right time and end at the right time !


  • Designing digital systems

    • We will use the finite-state machine (FSM) technique to design the EMY CPU, where the states of the FSM state diagram contain microoperations

      • The state diagram shows precisely which state follows which state

        • Each state indicates which microoperations to perform

      • The state diagram shows which states are needed, and when, for each machine language instruction


  • Designing digital systems

    • We will design the EMY CPU by using the finite-state machine (FSM) technique

      • More specifically, we will obtain the following for the complete EMY CPU design

        • A high-level-state diagram to show which microoperation happens when

        • The datapath from the high-level state diagram

        • The low-level state diagram from the high-level state diagram and the datapath

        • The control unit from the low-level state diagram

          • It can be implemented by hardwiring and/or microprogramming


  • Designing the microarchitecture level of a computer

    • There are two tasks in this design

      • Develop the CPU and memory digital systems so that instructions can be run

      • Develop the memory and I/O controller digital systems so that I/O can happen

    • We will concentrate on the CPU and memory digital systems


  • Designing the CPU and memory digital systems

    • First we focus on the CPU digital system while we make a few design decisions on the memory quickly

      • We will design the CPU as a slow CPU running only integer instructions: no pipelining

        • This is Version 0

          • We will assume the memory is fast, which is not realistic today

          • Then, we will see how a memory hierarchy with cache memories, etc. can be incorporated

      • Then, we will improve the CPU speed by using pipelining, but still running integer instructions

        • This is Version 1

          • We will assume the memory is fast, which is not realistic today

          • Then, we will see how a memory hierarchy with cache memories, etc. can be incorporated

        • This CPU coverage will be in another PowerPoint presentation

    • For both versions the memory will be a black box with a few details


  • Designing the CPU as a Digital System

    • The EMY CPU digital system

      • We will concentrate on designing the EMY CPU for nine integer instructions in the beginning

        • High-level state diagram of the EMY CPU

        • Datapath of the CPU

        • Low-level state diagram of the CPU

        • Control unit of the CPU


  • Designing the CPU digital system

    • To design the EMY CPU, we will start with the EMY architecture

      • What is the connection between the architecture and the CPU?

        • A computer processes digital information, by running machine language instructions

        • A program is a list of instructions each of which specifies operations on data (arguments)

          • An instruction specifies architectural operations

          • Each architectural operation is implemented by microoperations


  • Designing the CPU Digital System

    • In order to perform an architectural operation, the CPU performs a series of microoperations in a number of clock periods

      • That is, an architectural operation is broken down into smaller operations called microoperations

    • That is, to run a machine language instruction, the CPU performs microoperations

      • The CPU performs some microoperations alone and some in cooperation with the memory and the I/O controllers


  • Designing the CPU Digital System

    • Architectural operations

      • An architectural operation is what we describe as the semantics of the instruction, such as

        • The architectural operation specified by the ADD instruction

          • Rd ← Rs + Rt

        • The architectural operation specified by the SUB instruction

          • Rd ← Rs - Rt

        • The architectural operation specified by the SLT instruction

          • If Rs < Rt then Rd ← 1 else Rd ← 0

        • The architectural operation specified by the J instruction

          • PC[27-0] ← (Address * 4)

      • It is the CPU that contributes the most to the execution of an instruction since it performs most of the microoperations needed for an architectural operation
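The register-to-register architectural operations listed above can be sketched in a few lines of Python. This is only an illustration, not the EMY handout's definition: the function names `to_signed` and `execute`, the list-based register file, and the choice of a signed comparison for SLT are all assumptions made here.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def execute(op, regs, rs, rt, rd):
    """Apply one architectural operation to a hypothetical register file."""
    a, b = regs[rs], regs[rt]
    if op == "ADD":
        regs[rd] = (a + b) & MASK32          # Rd <- Rs + Rt
    elif op == "SUB":
        regs[rd] = (a - b) & MASK32          # Rd <- Rs - Rt
    elif op == "SLT":
        # Assumption: the comparison is signed.
        regs[rd] = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unsupported operation: {op}")
    return regs


regs = [0] * 32
regs[1], regs[2] = 5, 7
execute("ADD", regs, 1, 2, 3)   # regs[3] holds 5 + 7
execute("SUB", regs, 1, 2, 4)   # regs[4] holds 5 - 7 as a 32-bit pattern
execute("SLT", regs, 4, 1, 5)   # regs[5] is 1 because -2 < 5 when signed
```

Each call corresponds to one architectural operation; the CPU itself would spread that work over several microoperations.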


  • Designing the CPU Digital System

    • Typical CPU digital system microoperations

      • Add, subtract, multiply

        • In the past, a 32-bit addition was completed in 1 clock period.

          • Today, a 32-bit addition is completed in several clock periods

      • AND, OR, XOR

      • Shift right, Shift left

      • Read data from memory, write data to memory

        • In the past, a memory access was completed in 1 clock period.

          • Today, it is completed in several clock periods

      • Read instructions from memory (fetch)

      • Increment the program counter

      • Transfer a register to another register


  • Designing the CPU as a Digital System

    • Other machines, especially CISC machines, require other microoperations such as

      • Reading indirect address(es) from the memory

      • Effective address calculation for

        • Indexing

        • Autoincrement

        • Autodecrement

      • Alignment for

        • Instructions

        • Data

        • Addresses


  • Designing the CPU Digital System

    • Architecture’s effect on microoperations

      • The decisions made on architecture determine the microoperations needed for the execution of the instructions

        • General microoperations found on most CPUs

          • The ones mentioned on previous slides

        • Specific microoperations for certain CPUs

        • Specific microoperations for Memory Management Units (MMUs), caches, I/O controllers

      • The architecture also determines the characteristics of each microoperation

        • If the 26-bit PC-direct addressing mode is used, the rightmost 26 bits of IR are concatenated to the leftmost 4 bits of PC, and the resulting 30 bits are shifted to the left by 2 to form a 32-bit address

      • Thus, each machine language instruction requires a certain number of microoperations taking a certain time: the CPIi
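The 26-bit PC-direct address formation described above can be written out as bit manipulation. The following is a minimal sketch, assuming a 32-bit PC and IR; the function name `jump_target` is hypothetical.

```python
def jump_target(pc, ir):
    """Form a 32-bit jump address from PC and IR.

    The leftmost 4 bits of PC are kept, the rightmost 26 bits of IR
    are appended, and the 30-bit result is shifted left by 2.
    """
    upper4 = pc & 0xF0000000        # leftmost 4 bits of PC, left in place
    addr26 = ir & 0x03FFFFFF        # rightmost 26 bits of IR
    return upper4 | (addr26 << 2)   # the shift by 2 multiplies the address by 4


target = jump_target(0x40000008, 0x08000010)
```

Here the upper bits `0x4` come from PC and the 26-bit field `0x10` becomes `0x40` after the shift, so `target` is `0x40000040`.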


  • Designing the CPU Digital System

    • Microoperations

      • The CPU can perform one or more microoperations per clock period, depending on the complexity of the microoperation and the availability of the hardware resources

        • Most often a microoperation can be completed in one clock period unless it is a complex microoperation

          • If a complex microoperation is to be run in one clock period, the clock period needs to be longer

      • The more numerous and complex the microoperations are, the longer it takes to run the machine language instruction

        • CISC instructions take a longer time to execute (larger CPIi)


  • Designing the CPU Digital System

    • Calculating CPIi

      • The time it takes to run an instruction, expressed in clock periods as CPIi, is then determined by

        • The number of microoperations needed for it

        • The complexity of the microoperations

      • The number of clock periods for an instruction, CPIi, becomes a matter of figuring out the microoperations and distributing them to individual clock periods

        • One can come up with 5-10 simple microoperations to be performed one after another, resulting in a CPIi of 5-10

          • But, since microoperations are simple, the clock period is short

        • Alternatively, one can come up with 2-4 complex microoperations, resulting in a CPIi of 2-4

          • But, the clock period is longer
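The trade-off above can be made concrete with the relation: instruction time = CPIi × clock period. The numbers below are invented purely for illustration (they are not measurements of any CPU), and the function name `instruction_time_ns` is hypothetical.

```python
def instruction_time_ns(cpi, clock_period_ns):
    """Time to run one instruction: clock periods needed times period length."""
    return cpi * clock_period_ns


# Hypothetical design 1: eight simple microoperations, short clock period.
many_short = instruction_time_ns(cpi=8, clock_period_ns=1.0)

# Hypothetical design 2: three complex microoperations, longer clock period.
few_long = instruction_time_ns(cpi=3, clock_period_ns=2.5)
```

With these made-up figures the two designs land close together (8.0 ns versus 7.5 ns), which shows that a smaller CPIi alone does not decide which design runs an instruction faster.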


  • Designing the CPU Digital System

    • Calculating CPIi

      • What can we do ?

        • Few long clock periods vs. many but shorter clock periods ?

          • Since increasing the clock frequency is important for marketing purposes, the second option weighs in substantially

          • It turns out that if pipelining is implemented, having many shorter clock periods would be beneficial as we will see

          • CPIi figures will be large but CPIave will be close to 1 (one) !

      • Today’s microprocessors have instruction CPIi values in the range of 10-30, but CPIave figures for their targeted applications are even less than 1 (one) !

        • Because they employ advanced pipelining techniques, such as superscalar execution, hyperthreading, etc.


  • Designing the CPU Digital System

    • Determining microoperations for a machine language instruction

      • Some microoperations are performed for all the instructions

        • Usually at the same point in time during the execution of every instruction

          • Fetching the instruction is always the first microoperation to perform for all CPUs

          • Updating PC (PC ← PC + 4) so that it points at the next instruction is also universal

      • The other microoperations depend on the instruction, the addressing mode, where the arguments are, the length of the arguments, etc.


  • Designing the CPU Digital System

    • Determining microoperations for a machine language instruction

      • We would list all the microoperations for each instruction, by making sure that we are consistent in terms of

        • Bus usage

          • We often decide an approximate number of buses we need for our datapath

          • Today’s CPUs have at least three internal buses to complete an integer arithmetic microoperation in one clock period

          • Two buses carry the numbers from two registers and the third bus carries the result to a register

        • ALU usage

          • An ALU is expensive and so we try to limit the number of them


  • Designing the CPU Digital System

    • Determining microoperations for a machine language instruction

      • We would list all the microoperations for each instruction, by making sure that we are consistent in terms of

        • Register usage

          • Additional registers not visible to the architecture level are used to keep temporary values : microarchitecture registers

          • Typically, the more registers are used, the more clock periods we spend for an instruction since temporary values will be passed from one register in one clock period to another register to be used the following clock period

          • But, sometimes we have to use microarchitecture registers, such as the instruction register that keeps the current instruction

        • Control unit usage


  • Designing the CPU Digital System

    • Determine how each EMY architectural operation is implemented by microoperations

      • Most microoperations must be simple enough to be completed within one clock period

        • A few microoperations may not be completed in a clock period

          • For example, a memory read may take several clock periods since the memory is slower

          • These long microoperations should be accommodated in the high-level state diagram, the datapath, the low-level state diagram and the control unit

        • We will assume in the beginning that every microoperation is completed in one clock period


  • Designing the CPU Digital System

    • The EMY microoperations implied by the EMY machine language instructions include

      • Instruction fetch, performed always

      • Update PC for next instruction, performed always

      • Effective address calculation for displacement and relative addressing modes

      • Sign extension or catenation of 0s for data/addresses

      • Reading data from the memory

      • Writing data to the memory

      • Performing an arithmetic/logic operation

      • Register transfer

      • Testing a condition


  • Unpipelined EMY CPU : Version 0

    • By using the EMY CPU Handout

      • The most interesting component of a computer is the CPU

        • We know that the CPU has registers, buses, ALUs and a sequencer, among other components

        • Note that whether hardwiring or microprogramming is used, the datapath stays the same, at least theoretically

        • The datapath performs microoperations on data

          • It uses registers, buses and the ALU for that purpose

        • The microoperations are in turn controlled by the control unit.


  • Overview

    • We are now ready for the organizational design of the EMY

      • We know the architecture of EMY

    • We will design

      • The EMY CPU that will have

        • A control unit with a sequencer

        • A datapath containing registers, buses and the ALU

      • The datapath performs the microoperations and the control unit determines the timing and sequence of these microoperations


  • Overview

    • The way the EMY computer is covered indicates that the authors organized the computer similarly to the commercial EMY systems, where

      • There is an integer EMY CPU

      • A system control coprocessor (CP0) responsible for memory management and cache control.

      • A FP coprocessor (CP1)

    • The integer EMY CPU registers are either architectural or microarchitectural (temporary registers)

    • There are two other coprocessors, CP2 and CP3, that are reserved for future use


  • Overview

    • Designing the EMY CPU for all of the instructions is prohibitive

    • First, we will design an EMY CPU to execute only integer instructions that include

      • LW, SW

      • ADD, SUB, SLT, AND, OR

      • BEQ, J

    • These integer instructions use the three formats: R, I and J


  • Overview

    • The EMY CPU will have all the architectural registers needed by these nine integer instructions

      • 32 32-bit GPRs

      • 32-bit PC


  • New Microarchitectural registers

    • These (temporary) registers are not a part of the state (hence architecture)

    • 32-bit instruction register, IR, to keep the current instruction

      • IR contains the instruction until it is completely executed

    • 32-bit A and B registers

      • They keep the content of Rs and Rt registers of the current instruction

    • 32-bit register ALUout

      • It contains a memory address or A/L operation result

    • 32-bit Memory Data Register, MDR

      • It keeps the data read from the memory for Load instructions


  • New Microarchitectural registers

    • 32-bit A and B registers

[Figure: I format fields: Opcode (6 bits), Rs (5 bits), Rt (5 bits), Displacement/Offset/Immediate (16 bits). R format fields: Opcode (6 bits), Rs (5 bits), Rt (5 bits), Rd (5 bits), Shamt (5 bits), Function (6 bits). In both formats the register selected by Rs goes to register A and the register selected by Rt goes to register B.]

  • New Microarchitectural registers

    • Even if an instruction does not have Rs and Rt fields, such as a J-format instruction, the bits in the Rs and Rt field positions are still used to move the contents of the selected registers to A and B, respectively

      • In that case, the values of the A and B registers will not be used!

      • The reason for always moving to A and B is to make the common case fast: we think most instructions are R-format or I-format and require this move!

[Figure: J format fields: Opcode (6 bits), Offset26 (26 bits). For a jump, the bits in the Rs and Rt positions (5 bits each) are still routed to registers A and B.]

  • New Microarchitectural registers

    • Note that the Displacement used for loads and stores is signed

    • The offset of BEQ is also signed

    • We have to sign extend the 16-bit Displacement, Offset and Immediate (DOImm) value for some of the integer instructions

      • These include LW, SW, BEQ

      • We will use DOImm+ to indicate a sign-extended value from now on
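Sign extension of the 16-bit DOImm field can be sketched as follows. This is a minimal illustration; the function names `sign_extend16` and `effective_address` are hypothetical, and the wrap-around to 32 bits is an assumption about how the address adder behaves.

```python
def sign_extend16(value):
    """Extend a 16-bit field to 32 bits by copying its sign bit (bit 15)."""
    value &= 0xFFFF
    if value & 0x8000:              # negative in two's complement
        value |= 0xFFFF0000         # fill the upper 16 bits with 1s
    return value


def effective_address(rs_value, disp16):
    """Rs + Disp+ for loads and stores, kept to 32 bits."""
    return (rs_value + sign_extend16(disp16)) & 0xFFFFFFFF


positive = sign_extend16(0x0004)              # stays 4
negative = sign_extend16(0xFFFC)              # 16-bit pattern for -4
address = effective_address(0x1000, 0xFFFC)   # 0x1000 - 4
```

A negative displacement such as `0xFFFC` therefore moves the effective address backwards, here to `0x0FFC`.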


  • The EMY CPU state diagram

    • The design of a CPU is very complex

      • We have to consider the space (hardware) and time (speed)

      • The design, analysis, description, testing, modification, optimization, servicing and maintenance can be more efficient if there are efficient tools around

      • These include HDLs and CAD tools

      • The textbook uses a typical register transfer language (RTL) notation in Appendix A to describe the execution of instructions

        • We will use the same RTL notation which is also used in the handout

    • To quickly see the execution steps of the integer machine language instructions, a high-level state diagram, a CPU datapath, and a low-level state diagram are developed in the handout

      • Additionally, timing diagrams and tables need to be studied to understand the CPU design


  • The EMY CPU state diagram

    • An instruction goes through several phases when executed

      • We give a name to each phase of an instruction execution

        • A phase is also called a major cycle

      • Each major cycle will take one or more minor cycles (clock periods)

        • Each minor cycle is a state

        • Each minor cycle typically takes one clock period

      • Each major cycle often has at least one microoperation

        • Often the name of a major cycle is derived from the major microoperation of the cycle


  • The EMY CPU state diagram

    • The number of major cycles and their complexity are small for RISC systems and larger for CISC systems

    • Often for RISC systems, the CPIi for most frequently used instructions is between 4 and 6

      • However, this number has to be larger to have deep pipelining and high clock frequencies

    • In simple systems like RISC systems, sharing of hardware among different major cycles is not necessary

      • A hardware resource is often needed in one major cycle only

        • The hardware for each major cycle can then be easily identified and is often called a stage

        • So, the execution of an instruction is the movement of the instruction through some or all of the stages of the CPU!


  • The EMY CPU state diagram

    • The EMY integer instructions go through at most five major cycles during the execution

      • However, even for this RISC machine, it is difficult to name the 5 cycles because not all instructions do similar things in a major cycle

    • Some microoperations will be performed in advance in anticipation of a frequent operation

      • The early operations will not alter the state and will not cause longer clock periods, but will slightly increase the hardware


  • The EMY CPU state diagram

    • The EMY CPU major cycles for integer instructions

      • Instruction fetch cycle

        • Abbreviated as IF, standing for instruction fetch

        • Same for all EMY instructions.

      • Instruction decode/Register fetch cycle

        • Abbreviated as ID, standing for instruction decode

        • Same for all EMY instructions.

      • Execution/effective address cycle

        • Abbreviated as EX, standing for execution

      • Memory access cycle

        • Abbreviated as MEM, standing for memory

      • Write-back cycle

        • Abbreviated as WB, standing for write-back
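The five major cycles can be pictured as a per-instruction path through named states. The sketch below is illustrative only: which cycles each of the nine instructions visits is an assumption borrowed from common textbook multicycle designs, not taken from the EMY handout, and the names `ASSUMED_PATHS` and `cpi` are hypothetical. It also assumes one clock period per state.

```python
# Assumed cycle paths (NOT from the EMY handout): every instruction starts
# with IF and ID; the remaining cycles depend on the instruction class.
ALU_PATH = ["IF", "ID", "EX", "WB"]

ASSUMED_PATHS = {
    "LW": ["IF", "ID", "EX", "MEM", "WB"],
    "SW": ["IF", "ID", "EX", "MEM"],
    "ADD": ALU_PATH, "SUB": ALU_PATH, "SLT": ALU_PATH,
    "AND": ALU_PATH, "OR": ALU_PATH,
    "BEQ": ["IF", "ID", "EX"],
    "J": ["IF", "ID", "EX"],
}


def cpi(instruction, clocks_per_state=1):
    """Clock periods for one instruction under the assumed paths."""
    return len(ASSUMED_PATHS[instruction]) * clocks_per_state


lw_cpi = cpi("LW")
beq_cpi = cpi("BEQ")
```

Under these assumptions a load visits all five cycles while a branch finishes after three; the handout's state diagram remains the authoritative source for the actual paths.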


  • The EMY CPU state diagram

    • We emphasize again that designing a CPU means determining which microoperation happens when for each architectural operation (the semantics of the instruction)

    • For the EMY, like many other CPUs, the IF and ID stages are identical for all instructions

      • The same microoperations are performed for all instructions

      • These microoperations implement portions of the architectural operation

    • For the EMY, the remaining portions of the architectural operation are performed in the EX, MEM and WB cycles


  • The EMY CPU state diagram

    • Architectural operations of I-format instructions among the integer instructions

      • Load/Store instructions

        • LW Rt, Disp(Rs) ≡ Rt ← M[Rs + Disp+]

        • SW Rt, Disp(Rs) ≡ M[Rs + Disp+] ← Rt

[Figure: I format fields: Opcode (6 bits), Rs (5 bits), Rt (5 bits), Displacement/Offset/Immediate (16 bits)]

Superscript + indicates sign extension. The architectural operations of the Load/Store instructions are their semantics.

  • The EMY CPU state diagram

    • Architectural operations of I-format instructions among the integer instructions

      • Branch instruction

        • BEQ Rs, Rt, Offset ≡ If Rs = Rt, then PC ← PC + (Offset+ x 4)

[Figure: I format fields: Opcode (6 bits), Rs (5 bits), Rt (5 bits), Displacement/Offset/Immediate (16 bits)]

  • The EMY CPU state diagram

    • Architectural operations of R-format instructions among the integer instructions

      • Arithmetic/Logic instructions

        • ADD Rd, Rs, Rt ≡ Rd ← Rs + Rt

        • SUB Rd, Rs, Rt ≡ Rd ← Rs - Rt

        • AND Rd, Rs, Rt ≡ Rd ← Rs & Rt

        • OR Rd, Rs, Rt ≡ Rd ← Rs | Rt

        • SLT Rd, Rs, Rt ≡ If Rs < Rt then Rd ← 1 else Rd ← 0

[Figure: R format fields: Opcode (6 bits), Rs (5 bits), Rt (5 bits), Rd (5 bits), Shamt (5 bits), Function (6 bits)]

  • The EMY CPU state diagram

    • Architectural operations of J-format instructions among the integer instructions

    • Jump instruction

      • PC[27-0] ← (Address x 4)

[Figure: J format fields: Opcode (6 bits), Offset26 (26 bits); the bit positions that hold Rs and Rt in the other formats (5 bits each) are marked inside the Offset26 field]

  • The EMY CPU state diagram

    • The major cycles of the EMY CPU are shown by the high-level state diagram given in the EMY CPU handout

    • Registers A and B are used to prepare operands for an ALU operation

    • Each state takes 1 clock period

      • Later, we will change it to one or more clock periods

        • Memory accesses and complex arithmetic operations can take more than one clock period to perform

          • The state that has a memory access or a complex arithmetic operation will take more than one clock period

    • All microoperations mentioned in a state are performed in parallel, so their order does not matter

      • If a state takes more than one clock period, one has to be careful about the parallel operations

    • We now obtain the state diagram and the datapath hardware of the EMY CPU


  • The EMY major cycles and states

    • The instruction fetch cycle

      • It is performed for all the instructions

      • There are two microoperations performed

      • In general, all CPUs, regardless of their architecture, perform these two microoperations

        • Read the machine language instruction pointed to by the program counter (PC) into the instruction register (IR)

        • Update the program counter so that it points at the instruction that follows the instruction being read from the memory


  • The EMY major cycles and states

    • The instruction fetch cycle

      • Read the machine language instruction pointed to by the program counter (PC) into the instruction register (IR)

        • IR ← M[PC]

          • Note the RTL notation: we use an equal sign (=) if the destination is a wire or a bus, and an arrow sign (←) if the destination is a register, such as IR


  • The EMY major cycles and states

    • The instruction fetch cycle

      • Read the machine language instruction pointed to by the program counter (PC) into the instruction register (IR)

        • IR ← M[PC]

        • Then, the read of the instruction in terms of buses is as follows :

          • MABUS = PC ; MemRead = 1 ; IR ← MRBUS

        • Note again that these three microoperations implement the instruction read; they happen at the same time, so their order does not matter

        • Note the RTL notation: we use an equal sign (=) if the destination is a wire or a bus, such as MABUS, and an arrow sign (←) if the destination is a register, such as IR


  • The EMY major cycles and states

    • The instruction fetch cycle

      • Update the program counter so that it points at the next instruction

        • PC ← PC + 4

          • Since an instruction is four bytes long, we need to add 4 to PC

          • We will use the general ALU to do the addition, at the expense of increasing the complexity of the ALU input logic


  • The instruction fetch cycle

    • The two microoperations of the IF cycle can be shown in state 0 as follows

      • The two microoperations are simply shown without using buses to save space

      • The instruction read and PC update microoperations happen simultaneously and complete before the end of the clock period

State 0 (IF) : IR ← M[PC] ; PC ← PC + 4
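
The simultaneity of the two state 0 microoperations can be sketched in Python as follows. This is only a behavioral model: the dictionary-based CPU state, the word-per-byte-address memory, and the function name are assumptions made for the example, not part of the EMY handout.

```python
MASK32 = 0xFFFFFFFF


def fetch(state, memory):
    """State 0 (IF): both transfers read the OLD PC, then commit together."""
    old_pc = state["PC"]
    next_ir = memory[old_pc]             # IR <- M[PC]
    next_pc = (old_pc + 4) & MASK32      # PC <- PC + 4 (instructions are 4 bytes long)
    # Commit at the end of the clock period, so the order above does not matter.
    state["IR"], state["PC"] = next_ir, next_pc
    return state


# Hypothetical memory image: two instruction words at byte addresses 0 and 4.
memory = {0: 0x00430820, 4: 0x00000000}
state = fetch({"PC": 0, "IR": 0}, memory)
```

Computing both next values from the old PC before committing them mirrors how registers latch their inputs at the end of the clock period.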


  • The EMY major cycles and states

    • The instruction decode cycle

      • The most important goal in this cycle is to decode the instruction

        • Decoding the instruction means the CPU determines what the current instruction is

        • Decoding is performed for all the instructions, and all CPUs do it regardless of their architecture

        • Decoding is done by the control unit that checks the opcode and function bits of IR

          • They are input as status signals to the control unit

      • During this time the datapath does not do anything


  • The instruction decode cycle

    • During the Decode cycle the control unit determines what the next state will be based on the type of the instruction

      • If it is a memory reference instruction (LW, SW), the next state is state 2 in the EX cycle

      • If it is an R-format A/L instruction, the next state is state 6 in the EX cycle

      • If it is a BEQ instruction, the next state is state 8 in the EX cycle

      • If it is a J instruction, the next state is state 9 in the EX cycle
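
The next-state choice made in the Decode cycle can be sketched as a lookup table. The sketch keys the table on mnemonics rather than on opcode bits, since the actual opcode and function encodings are not given here; the table and function names are hypothetical. State 10 is the invalid-instruction state of the complete state diagram.

```python
# Next state after state 1 (ID), per instruction type.
NEXT_STATE_AFTER_DECODE = {
    "LW": 2, "SW": 2,                                    # memory reference instructions
    "ADD": 6, "SUB": 6, "AND": 6, "OR": 6, "SLT": 6,     # R-format A/L instructions
    "BEQ": 8,
    "J": 9,
}


def next_state_after_decode(mnemonic):
    """Return the EX-cycle state for a decoded instruction, or 10 if it is invalid."""
    return NEXT_STATE_AFTER_DECODE.get(mnemonic, 10)
```

In hardware the control unit makes the same decision from the opcode and function bits of IR, either with gate networks or with a microprogram.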


  • The EMY major cycles and states

    • The instruction decode cycle

      • We realize that we can perform a number of microoperations needed for some instructions in the datapath since it is not used

        • Performing these microoperations in advance can help run those instructions faster

      • Which microoperations to perform in order to be prepared ?

        • We can transfer the GPR register selected by the Rs field of I-format and R-format instructions to register A

        • We can transfer the GPR register selected by the Rt field of I-format and R-format instructions to register B

      • We realize that, in order to save hardware, we can transfer Rs and Rt to the A and B registers in every clock period

        • This will not cause any problem, and it will simplify the Control Unit since it would otherwise have to generate Store signals for the A and B registers

      • We realize that we can transfer the output of the ALU to a microarchitectural register, ALUout, in every clock period

        • We will later see it will simplify the Control Unit


  • The EMY major cycles and states

    • The instruction decode cycle

      • When we discuss BEQ, we will add one more microoperation to perform in the Decode cycle

      • By doing these in advance, we save time

        • But, not all instructions need them : J-format instructions do not need them and some I-format instructions do not need the transfer to register B

        • This is fine since A, B and ALUout registers are not architectural registers and so changing them will not result in program errors

      • These microoperations transferring to A, B and ALUout are performed for all the instructions in every clock period

      • In general, RISC CPUs do these microoperations


  • The EMY major cycles and states

    • The instruction decode cycle

      • A ← GPR[Rs] ; B ← GPR[Rt]

        • The GPR register file is designed so that two GPRs can be read simultaneously, by using the Rs and Rt fields of IR

          • This means the GPR register file has two read ports controlled by Rs and Rt

          • Note that the order of these microoperations does not matter as they happen simultaneously

          • There is also a write port to the GPR register file controlled by Rt and Rd fields : 10 bits are connected to the GPR file to determine the destination register


  • The instruction decode cycle

    • These three microoperations happen in every clock period

      • The GPR read ports are directly connected to register A and B and so no buses are used

      • The three microoperations happen simultaneously in every clock period and complete before the end of the clock period

    • Since these microoperations happen every clock period, we will not show them in our states

A ← GPR[Rs] ; B ← GPR[Rt] ; ALUout ← OBUS

  • The instruction decode cycle

    • For the time being the decode cycle state will be as follows

      • No microoperation happens other than the three microoperations that happen every clock period

      • However, we will change state 1 and place a microoperation there when we discuss BEQ instructions

State 1 (ID) : no state-specific microoperation ; in every clock period : A ← GPR[Rs] ; B ← GPR[Rt] ; ALUout ← OBUS

  • Completing the execution of LW and SW

    • The LW instruction

      • LW Rt, Disp(Rs) : Rt ← M[Rs + Disp+]

      • We see that to execute the LW we need to

        • Calculate the effective address, the address of the memory location we want to load from : Rs + Disp+

        • Read the cache memory location pointed to by the effective address

        • Transfer the value to GPR register Rt

    • The SW instruction

      • SW Rt, Disp(Rs) : M[Rs + Disp+] ← Rt

      • We see that to execute the SW we need to

        • Calculate the effective address, the address of the memory location we want to store to : Rs + Disp+

        • Write to the cache memory location pointed to by the effective address

        • Transfer the value from GPR register Rt to the memory location pointed to by the effective address


  • Completing the execution of LW and SW

    • LW and SW both have a microoperation in common : calculating the effective address

      • Then their microoperations differ

    • In order to calculate the effective address, we need to sign extend the DOImm field and add it to GPR Rs

      • Register GPR Rs has been transferred to A

      • We also realize that GPR register Rt has been transferred to register B

        • Register B will be written to the memory for the SW instruction then

    • LW requires one more microoperation than SW, as we will soon see


  • Completing the execution of LW and SW

    • We decide to have the effective address calculation of LW and SW in the Execution/Effective address cycle

      • The effective address is stored in a microarchitectural register called ALUout

      • Then, we separate LW and SW execution in the Memory Access/Branch completion cycle : Both access the memory

        • LW reads the memory location pointed by the effective address to a microarchitectural register called MDR

        • SW writes microarchitectural register B to a memory location pointed by the effective address and completes its execution

      • LW completes its execution by transferring the data in MDR to GPR register Rt


  • Completing the execution of LW and SW

    • The effective address calculation

      • Rs + Disp+

        • Rs is now in register A

        • Sign extend DOImm and then add

    • As we will see shortly, A/L instructions will have their arithmetic/logic operation performed in this cycle as well

      • They need the ALU in this cycle

      • Therefore, we decide to use the adder of the ALU to do the addition for the effective address

ALUout ← A + DOImm+

  • Completing the execution of LW and SW

    • Reading from the memory

      • MABUS = ALUout ; MemRead = 1 ; MDR ← MRBUS

      • Note that the microoperations are performed in parallel and the order does not matter

    • This microoperation can be stated without giving the bus detail

      • MDR ← M[ALUout]

    • Note that the memory access can take more than one clock period and so we may stay in this state more than one clock period

  • Completing the execution of LW and SW

    • The LW instruction completes by transferring MDR to GPR register Rt

      • The Rt field of IR is used by the GPR register file to select the register that receives the value from MDR

      • The value read from the memory is first stored in the microarchitectural register MDR

        • Though, we could store the value read from the memory directly in GPR register Rt

        • We decide to store to MDR and transfer from MDR to the GPR write port

          • This decision will help pipelining as we will see later !

    • We then go back to state 0, the IF cycle, to start executing the next instruction

GPR[Rt] ← MDR

  • Completing the execution of LW and SW

    • Storing to the data memory

      • MABUS = ALUout ; MemWrite = 1 ; MWBUS = B

      • Note that the microoperations are performed in parallel and the order does not matter

    • This microoperation can be stated without giving the bus detail

      • M[ALUout] ← B

      • Note that the memory access can take more than one clock period and so we may stay in this state more than one clock period

    • SW completes its execution !

    • We then go back to state 0, the IF cycle, to start executing the next instruction

  • Completing the execution of LW and SW

    • The portion of the state diagram for LW and SW

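      • From the ID cycle, on LW or SW, go to state 2

      • State 2 (EX) : ALUout ← A + DOImm+ ; then go to state 3 for LW or state 5 for SW

      • State 3 (MEM, LW) : MDR ← M[ALUout] ; then go to state 4

      • State 4 (WB, LW) : GPR[Rt] ← MDR ; then go back to state 0

      • State 5 (MEM, SW) : M[ALUout] ← B ; then go back to state 0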
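
The LW/SW portion of the state diagram can be traced with a short Python sketch. It is a behavioral model under stated assumptions: the register file is a list, the memory is a dictionary keyed by byte address, the displacement is passed in already sign extended, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def run_lw_sw(mnemonic, gpr, memory, rs, rt, disp):
    """Walk the EX/MEM/WB states for one LW or SW and return the states visited."""
    a, b = gpr[rs], gpr[rt]          # latched during ID: A <- GPR[Rs] ; B <- GPR[Rt]
    aluout = (a + disp) & MASK32     # state 2 (EX): ALUout <- A + DOImm+
    if mnemonic == "LW":
        mdr = memory[aluout]         # state 3 (MEM): MDR <- M[ALUout]
        gpr[rt] = mdr                # state 4 (WB): GPR[Rt] <- MDR
        return [2, 3, 4]
    if mnemonic == "SW":
        memory[aluout] = b           # state 5 (MEM): M[ALUout] <- B
        return [2, 5]
    raise ValueError("only LW and SW take this path")


# Hypothetical register and memory contents for the example.
gpr = [0] * 32
gpr[2], gpr[3] = 0x100, 77
memory = {}
sw_states = run_lw_sw("SW", gpr, memory, rs=2, rt=3, disp=4)   # M[0x104] <- 77
lw_states = run_lw_sw("LW", gpr, memory, rs=2, rt=5, disp=4)   # GPR[5] <- M[0x104]
```

Counting the two earlier states (0 and 1), LW visits five states and SW visits four.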


  • Completing the execution of R-format A/L instructions

    • The R-format A/L instructions

      • ADD Rd, Rs, Rt : Rd ← Rs + Rt

      • SUB Rd, Rs, Rt : Rd ← Rs - Rt

      • AND Rd, Rs, Rt : Rd ← Rs & Rt

      • OR Rd, Rs, Rt : Rd ← Rs | Rt

      • SLT Rd, Rs, Rt : If Rs < Rt then Rd ← 1 else Rd ← 0

    • We see that to execute these instructions we need to perform an operation specified by the Opcode and Function fields

    • Then, we transfer the result to GPR register Rd


  • Completing the execution of R-format A/L instructions

    • We see that we can perform all the required operations for R-format instructions in one state

      • Which one to perform would be determined by the Opcode and Function fields

        • The inputs are Rs and Rt

        • Rs is already transferred to register A and Rt is already transferred to register B

        • We see we save time by moving them in the ID stage !


  • Completing the execution of R-format A/L instructions

    • We see that we can perform all the required operations for R-format instructions in one state

      • The result would be stored in the microarchitectural register ALUout

        • Though, we could store the result of the operation directly on GPR register Rd

        • This would require a separate bus from the output of the ALU to the write port of the GPR file

        • We decide to store to ALUout and transfer from ALUout to the GPR write port

          • This decision will help pipelining as we will see later !


  • Completing the execution of R-format A/L instructions

    • The microoperation for the current R-format A/L operation

      • The meaning of “op” is that the type of the operation is indicated by the Opcode and Function fields of IR

        • What happens is that the control unit uses the Opcode and Function fields to generate a set of control signals

        • These control signals are connected to the ALU, telling which operation to perform

ALUout ← A op B

  • Completing the execution of R-format A/L instructions

    • The ALU output for the five A/L instructions is straightforward to understand, except when the SLT instruction is executed

    • The ALU has to output

      • 1 (31 zeros and a one) if Rs < Rt

      • 0 (32 zeros) otherwise

    • The ALU will have a functional unit called SLT that will output as such
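
A behavioral sketch of the five A/L operations, including the SLT output, is given below in Python. It assumes 32-bit registers and that SLT compares its operands as signed 2's complement numbers, which the slides do not state explicitly; the function names and the use of mnemonic strings to select the operation are hypothetical stand-ins for the ALU control signals.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Return the 32-bit ALU output for ALUout <- A op B."""
    if op == "ADD":
        return (a + b) & MASK32
    if op == "SUB":
        return (a - b) & MASK32
    if op == "AND":
        return a & b & MASK32
    if op == "OR":
        return (a | b) & MASK32
    if op == "SLT":
        # 31 zeros and a one if A < B (signed comparison assumed), 32 zeros otherwise
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError("unsupported ALU operation: " + op)
```

For example, `alu("SLT", 0xFFFFFFFF, 1)` treats the first operand as -1 under the signed assumption and therefore outputs 1.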

SLT Rd, Rs, Rt : If Rs < Rt then Rd ← 1 else Rd ← 0

  • Completing the execution of R-format A/L instructions

    • The result that is in ALUout is moved to GPR register Rd

      • The Rd field of IR is used by the GPR register file to select the register that receives the value from ALUout

    • We then go back to state 0, the IF cycle to start executing the next instruction

GPR[Rd] ← ALUout

  • Completing the execution of R-format A/L instructions

    • The portion of the state diagram for R-format A/L instructions

      • From the ID cycle, on an R-format A/L instruction, go to state 6

      • State 6 (EX) : ALUout ← A op B ; then go to state 7

      • State 7 (MEM) : GPR[Rd] ← ALUout ; then go back to state 0

  • Completing the execution of Control instructions

    • The BEQ instruction

      • BEQ Rs, Rt, Offset : If Rs = Rt, then PC ← PC + (Offset+ x 4)

    • The J instruction

      • J Address : PC[27-0] ← (Address x 4)


  • Completing the execution of Control instructions

    • We see that to execute these instructions we need to

      • Calculate the effective address, that is, the address to branch/jump to

        • For BEQ : add PC to the result of the multiplication of the sign-extended Offset by 4

        • For J : move to the rightmost 28 bits of PC the result of the multiplication of the Address by 4

          • This completes the J instruction execution

      • For BEQ : test if Rs is equal to Rt and, if yes, transfer the effective address to PC

        • This completes the BEQ instruction execution

        • This completes the BEQ instruction execution


  • Completing the execution of BEQ instruction

    • In order to calculate the effective address, we need to

      • Sign extend the DOImm field

      • Then multiply it by 4

      • Then add it to PC

    • That is we need to do the following

      PC + (Offset+ x4)

    • We realize that the Datapath is free to do this microoperation in the ID cycle while it is doing the other microoperations as it does in every clock period

A ← GPR[Rs] ; B ← GPR[Rt] ; ALUout ← OBUS

  • Completing the execution of BEQ instruction

    • We then decide to calculate the effective address of the BEQ instruction in ID cycle

      ALUout ← PC + (Offset+ x 4)

    • Note that we are calculating the BEQ effective address while we are determining which instruction we have

      • If we do not have a BEQ instruction, the result is not used

      • Otherwise, we save time since we perform this operation in advance

    • Note that executing BEQ fast, by performing its microoperation in advance, is important since this will help pipelining

      • As we shall see later, control instructions, that is, all Branch and Jump instructions, slow down the pipelined CPU

        • It is therefore critical to complete their execution as quickly as possible to reduce the negative effect of these instructions on the pipeline


  • Completing the execution of BEQ instruction

    • The BEQ effective address calculation

      • ALUout ← PC + (Offset+ x 4)

        • Sign extending DOImm requires a simple combinational circuit

        • Multiplying by 4 requires no logic at all

          • We just have to catenate two zeros to the right of DOImm

          • We know that shifting a number to the left by two bit positions is multiplying it by four

      • We decide to use the adder of the ALU to do the addition since it is free to use
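
The BEQ effective address calculation can be sketched in Python as below. The sketch assumes a 16-bit offset field (the 32 instruction bits minus a 6-bit Opcode and two 5-bit register fields) and that PC has already been incremented by 4 in state 0; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc, offset16):
    """Compute ALUout <- PC + (sign-extended Offset << 2) as a 32-bit value."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # sign extension: bit 15 is the sign bit
        offset -= 0x10000
    # Shifting left by 2 appends two zero bits, i.e. multiplies by 4.
    return (pc + (offset << 2)) & MASK32
```

With PC = 0x1004, an offset of 3 gives 0x1010 and an offset of 0xFFFF (that is, -1) gives 0x1000.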

ALUout ← PC + ([Offset+] << 2)

  • Completing the execution of BEQ instruction

    • Since the BEQ effective address calculation is done in the decode cycle, state 1 has been modified

    • We now perform a microoperation in the decode cycle besides the three microoperations we perform every clock period

State 1 (ID) : ALUout ← PC + ([Offset+] << 2) ; in every clock period : A ← GPR[Rs] ; B ← GPR[Rt] ; ALUout ← OBUS

  • Completing the execution of BEQ instruction

    • Testing if Rs is equal to Rt and storing to PC based on the result

      • We know that GPR register Rs has been transferred to register A

      • We know that GPR register Rt has been transferred to register B

      • We will then compare register A with register B !

      • We decide to have a Zero circuit in the ALU to compare registers A and B

        • The ALU will have a new 1-bit output named Zero showing the result of the compare so that Zero is

          • 1 if the two registers are equal

          • 0 if the two registers are not equal


  • Completing the execution of BEQ instruction

    • Changing PC if Zero is 1

      • This means we branch to a memory location

        • That is we take the branch

    • We then go back to state 0, the IF cycle to start executing the next instruction

If (A == B) then PC ← ALUout

  • Completing the execution of BEQ instruction

    • How do we handle conditional and unconditional stores on PC ?

      • If PC has to be stored unconditionally, such as in state 0, we use the control signal, PCWrite, to store on PC

      • The control unit generates another control signal, PCWriteCond, to conditionally store on PC

      • PCWriteCond is ANDed with Zero to conditionally store on PC

        • If Zero = 1 it means the condition is true, we will branch which means we store the effective address on PC

        • If Zero = 0 it means the condition is not true, we will not branch which means we will not store the effective address on PC
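
The gating of the PC store signal can be sketched in Python. The text only states that PCWriteCond is ANDed with Zero; combining that result with PCWrite through an OR gate is an assumption made here (it is the usual arrangement), and the function name is hypothetical.

```python
def pc_store_enable(pc_write, pc_write_cond, zero):
    """Return True when PC is stored at the end of the clock period.

    pc_write      : unconditional store signal (e.g. state 0)
    pc_write_cond : conditional store signal (BEQ)
    zero          : ALU Zero output, 1 when registers A and B are equal
    The OR between the two paths is assumed, not taken from the slides.
    """
    return bool(pc_write or (pc_write_cond and zero))
```

A BEQ whose operands differ produces `pc_store_enable(0, 1, 0) == False`, so PC keeps the already incremented value and the branch is not taken.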

Figure : PCWriteCond is ANDed with Zero ; the result, together with PCWrite, controls the store on PC

  • Completing the execution of BEQ instruction

    • The portion of the state diagram for BEQ instruction

    • Note again that we complete the execution of BEQ fast since this will help pipelining

      • As mentioned earlier, control instructions slow down the pipelined CPU

        • It is therefore critical to complete their execution as quickly as possible to reduce the negative effect of these instructions on the pipeline

    • State diagram portion (figure) : from the ID cycle, on a BEQ instruction, go to state 8 ; State 8 (EX) : If A == B then PC ← ALUout ; then go back to state 0

  • Completing the execution of J instruction

    • We see that all we need to do is store the effective address on PC unconditionally

      • Move to the rightmost 28 bits of PC the result of the multiplication of the Address by 4

        • PC[27-0] ← (Address x 4)

      • This is equivalent to

        • PC ← PC[31-28], (Address x 4)

        • The leftmost 4 bits of PC are not changed !

      • The above microoperation has the same effect as the following one

        • PC ← (PC[31-28], Address) x 4

  • Completing the execution of J instruction

    • In order to calculate the effective address, we need to

      • Multiply DOImm by 4

        • Multiplying by 4 requires no logic at all

          • We just have to catenate two zeros to the right of DOImm

          • We know that shifting a number to the left by two bit positions is multiplying it by four

    • We then go back to state 0, the IF cycle to start executing the next instruction
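
The J effective address can be sketched in Python in two ways that should agree: replacing the rightmost 28 bits of PC, or concatenating PC[31-28] with the 26-bit Address and shifting left by 2. The function names are hypothetical.

```python
def jump_target(pc, address26):
    """PC[27-0] <- Address x 4, with PC[31-28] left unchanged."""
    return (pc & 0xF0000000) | ((address26 & 0x03FFFFFF) << 2)


def jump_target_concat(pc, address26):
    """PC <- (PC[31-28], Address) << 2, the concatenate-then-shift form."""
    upper4 = (pc >> 28) & 0xF
    concatenated = (upper4 << 26) | (address26 & 0x03FFFFFF)   # 30-bit value
    return (concatenated << 2) & 0xFFFFFFFF
```

For instance, with PC = 0x80001234 and Address = 0x10, both forms give 0x80000040.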

PC ← (PC[31-28], Address) << 2

  • Completing the execution of J instruction

    • The portion of the state diagram for J instruction

    • Note again that we complete the execution of J fast since this will help pipelining

      • As mentioned earlier, control instructions slow down the pipelined CPU

        • It is therefore critical to complete their execution as quickly as possible to reduce the negative effect of these instructions on the pipeline

    • State diagram portion (figure) : from the ID cycle, on a J instruction, go to state 9 ; State 9 (EX) : PC ← (PC[31-28], Address) << 2 ; then go back to state 0

  • Completing the execution of J instruction

    • How do we handle conditional and unconditional stores on PC ?

      • If PC has to be stored unconditionally, such as in state 0 and state 9 , we use the control signal, PCWrite, to store on PC

        • The J instruction uses PCWrite

      • The control unit generates another control signal, PCWriteCond, to conditionally store on PC

        • The BEQ instruction uses PCWriteCond

Figure : PCWriteCond is ANDed with Zero ; the result, together with PCWrite, controls the store on PC

  • The complete state diagram

    • The high-level state diagram for integer instructions and the datapath are given in the EMY CPU handout

    • They will be modified to implement a pipelined EMY CPU

      • But, the overall CPU structure will be similar


  • The complete state diagram

    • The high-level state diagram for integer instructions has a branch out from state 1

      • If the CPU receives an instruction that is not one of the nine (LW, SW, ADD, SUB, AND, OR, SLT, BEQ, J), it will generate an internal interrupt (an exception) since it does not know what to do

        • In order to generate the internal interrupt, it will go to state 10 to prepare the CPU for the interrupt

          • In state 10, it will perform a number of microoperations, including moving the internal interrupt handler address 80000180 to PC

State 1 (ID) : ALUout ← PC + ([Offset+] << 2) ; on an invalid instruction exception (invalid opcode, not one of the nine instructions), go to state 10

  • The complete state diagram

    • The high-level state diagram for integer instructions has a branch out from state 6

      • If the CPU performs a 2’s Complement addition or a subtraction and there is an overflow, it will generate an internal interrupt (an exception) since the result is not correct

        • In order to generate the internal interrupt, it will go to state 11 to prepare the CPU for the interrupt

          • In state 11, it will perform a number of microoperations, including moving the internal interrupt handler address 80000180 to PC
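
Signed overflow detection for 2's complement addition and subtraction can be sketched in Python as follows. The sketch assumes 32-bit operands and uses the common sign-bit test; how the EMY ALU actually derives its overflow signal is not described here, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000


def add_overflows(a, b):
    """True if a + b overflows: both operands share a sign that the sum does not."""
    s = (a + b) & MASK32
    return ((a ^ s) & (b ^ s) & SIGN) != 0


def sub_overflows(a, b):
    """True if a - b overflows: operand signs differ and the result sign differs from a."""
    d = (a - b) & MASK32
    return ((a ^ b) & (a ^ d) & SIGN) != 0
```

When either function returns True in state 6, the control unit would take the branch to state 11 instead of going on to state 7.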

State 6 (EX) : ALUout ← A op B ; on an arithmetic exception (signed overflow), go to state 11

  • CPI_i of Integer Instructions

    • With this implementation, the CPI_i of the instructions can be calculated as

      • CPI_LW = 5 because we trace states 0, 1, 2, 3, 4

      • CPI_SW = 4 because we trace states 0, 1, 2, 5

      • CPI_(A/L R-format) = 4 because we trace states 0, 1, 6, 7

      • CPI_BEQ = 3 because we trace states 0, 1, 8

      • CPI_J = 3 because we trace states 0, 1, 9
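
Since each state takes one clock period, the CPI of an instruction is simply the number of states it traces. The Python sketch below derives the CPI values from the traces and combines them into an average CPI; the instruction mix in the example is purely hypothetical and is not measured data, and the names are made up for the illustration.

```python
# State traces taken from the state diagram (one clock period per state).
STATE_TRACES = {
    "LW": [0, 1, 2, 3, 4],
    "SW": [0, 1, 2, 5],
    "A/L": [0, 1, 6, 7],
    "BEQ": [0, 1, 8],
    "J": [0, 1, 9],
}


def cpi(instruction):
    """Clock periods per instruction = number of states visited."""
    return len(STATE_TRACES[instruction])


def average_cpi(mix):
    """Weighted average CPI for a mix given as {instruction: fraction}."""
    return sum(fraction * cpi(name) for name, fraction in mix.items())


# Hypothetical mix: equal shares of LW, SW, A/L and BEQ instructions.
example = average_cpi({"LW": 0.25, "SW": 0.25, "A/L": 0.25, "BEQ": 0.25})
```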


  • Control Signals

    • The semantics of each state is that the control unit determines which microoperations to perform by turning on and off a few control signals : MUX select, register store, ALU control and enable signals

      • Control signals are connected to MUXes, registers and ALUs

        • They are shown as angled signals in the handout

    • The EMY low-level state diagram describes which control signal is 1 when

Figure : the IR register with its store control signal, IRWrite

  • The Clock Signal

    • The clock period duration is determined by the slowest but important microoperation in the CPU

      • All the signal delays in the datapath and control unit are added up to calculate the time for this important operation

        • It is usually the integer add microoperation

        • Though it could be the memory access time if it was a little longer than the integer addition time

          • Usually, the memory is much slower than the CPU in commercial systems but we will not consider it when we calculate the clock period duration since we will deal with slow memory when we cover the memory hierarchy topic


  • The Clock Signal

    • Thus, we will assume the integer addition and memory access both take one clock period each !

      • For today’s microprocessors this is not the case though !

    • If a microoperation takes more than one clock period, we draw a loop-back arrow to indicate so

      • For example, if the memory takes more than one clock period, there will be loop back lines drawn for states 0, 3 and 5 to indicate that the CPU spends more than one clock period

[Figure: the memory-access states 0, 3 and 5]

State 0 (IF) : IR ← M[PC] ; PC ← PC + 4 ;

State 3 (MEM, LW) : MDR ← M[ALUout]

State 5 (MEM, SW) : M[ALUout] ← B

  • Clock Signal

    • The clock period duration is determined by adding all the delays in the control unit and all the delays in the datapath for the integer add microoperation

      • The delays in the control unit include the delays to generate the MUX select, register clock input, ALU control and enable control signals

        • Gate networks generate these select and clock control signals if hardwiring is used

        • The micromemory and additional circuits generate these select and clock control signals if microprogramming is used

      • The delays in the datapath include

        • Delay of data travel from registers to the ALU inputs

        • Delay of the adder in the ALU

        • Delay of the data travel from the ALU to the destination register in the datapath : ALUout
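The sum described above can be sketched in Python. Every delay value below is a made-up placeholder chosen only to make the arithmetic easy to follow; none of them comes from the EMY handout, and the names `CONTROL_DELAYS_NS`, `DATAPATH_DELAYS_NS` and `clock_period_ns` are hypothetical.

```python
# Hypothetical delays in nanoseconds (placeholders, NOT EMY figures).
CONTROL_DELAYS_NS = {
    "generate MUX select, register clock, ALU control and enable signals": 0.25,
}
DATAPATH_DELAYS_NS = {
    "data travel from registers to the ALU inputs": 0.125,
    "adder in the ALU": 0.5,
    "data travel from the ALU to ALUout": 0.125,
}


def clock_period_ns(control_delays, datapath_delays):
    """Clock period = control unit delays + datapath delays of the integer add."""
    return sum(control_delays.values()) + sum(datapath_delays.values())


period_ns = clock_period_ns(CONTROL_DELAYS_NS, DATAPATH_DELAYS_NS)
frequency_ghz = 1.0 / period_ns  # 1 / ns = GHz

print(f"clock period = {period_ns} ns, clock frequency = {frequency_ghz} GHz")
```

With these placeholder delays the period works out to 1 ns, i.e. a 1 GHz clock; real values would come from the gate-level design.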


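[Figure: instruction formats, field widths in bits]

R-format : Opcode (6) | Rs (5) | Rt (5) | Rd (5) | Shamt (5) | Function (6)

I-format : Opcode (6) | Rs (5) | Rt (5) | Displacement/Offset/Immediate (16)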

  • Architecture-Microarchitecture Interaction

    • An example of how architectural decisions can affect the microarchitecture design is the following

      • The Rs and Rt fields of R-format and I-format instructions are in the same position

        • Therefore, we do not need separate register file read ports for each instruction format

          • We have one read port for Rs and one read port for Rt
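Because Rs and Rt sit in the same bit positions in both formats, the same two bit slices can feed the two read ports before the format is even known. The Python sketch below assumes a 32-bit word laid out from the most significant bit as Opcode, Rs, Rt, then either Rd/Shamt/Function or a 16-bit immediate; the exact bit positions, the function name `decode_fields` and the opcode value used in the usage lines are assumptions for illustration.

```python
def decode_fields(word):
    """Slice a 32-bit instruction word into its R-format and I-format fields.

    Assumed layout (most significant bit first):
      Opcode 6 | Rs 5 | Rt 5 | Rd 5 | Shamt 5 | Function 6   (R-format)
      Opcode 6 | Rs 5 | Rt 5 | Immediate 16                  (I-format)
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,      # same position in both formats
        "rt": (word >> 16) & 0x1F,      # same position in both formats
        "rd": (word >> 11) & 0x1F,      # meaningful for R-format only
        "shamt": (word >> 6) & 0x1F,    # meaningful for R-format only
        "function": word & 0x3F,        # meaningful for R-format only
        "immediate": word & 0xFFFF,     # meaningful for I-format only
    }


# Hypothetical encodings: the opcode/function numbers are placeholders.
r_word = (0 << 26) | (8 << 21) | (11 << 16) | (10 << 11) | (0 << 6) | 0x20
i_word = (0x23 << 26) | (9 << 21) | (8 << 16) | 0

print(decode_fields(r_word))
print(decode_fields(i_word))
```

The register file read ports only ever need the `rs` and `rt` slices, which is the point made on the slide.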


  • Using the state diagram

    • Consider the following piece of program in the EMY memory

---

400000 LW R8, 0(R9) ; R8 ← M[R9 + 0] ; M[R9] has C

400004 ADD R10, R8, R11 ; R10 ← R8 + R11

400008 ADD R12, R13, R14 ; R12 ← R13 + R14

40000C SW R12, 0(R15) ; M[R15 + 0] ← R12 ; M[R15] ← R12

400010 BEQ R12, R0, 3 ; If R12 is equal to R0, branch to 400020

---

10000150 C ; The content of this location is C

---

1000A200 ?

Assume that R9 has 10000150 and R15 has 1000A200 initially

  • Using the state diagram

    • This piece of program takes 20 clock periods (5 for LW, 4 for each ADD, 4 for SW and 3 for BEQ), as the table below shows by tracing the execution of the program with respect to time


  • Using the state diagram

    • If the clock frequency is 1 GHz, the clock period is 1 ns, so the 20 clock periods of this piece of program take 20 ns
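The clock-period count and the resulting execution time can be checked with a short Python sketch. The CPI values are the ones derived from the state diagram; the names `PROGRAM`, `total_clock_periods` and `execution_time_ns` are illustrative.

```python
# CPI of each instruction with one-clock-period memory (from the state diagram).
CPI = {"LW": 5, "SW": 4, "ADD": 4, "BEQ": 3}

# The five-instruction program at addresses 400000 through 400010.
PROGRAM = ["LW", "ADD", "ADD", "SW", "BEQ"]


def total_clock_periods(program, cpi):
    """Sum the clock periods of every instruction executed."""
    return sum(cpi[op] for op in program)


def execution_time_ns(clock_periods, frequency_hz):
    """Execution time = clock periods x clock period duration (in ns)."""
    return clock_periods * 1e9 / frequency_hz


clock_periods = total_clock_periods(PROGRAM, CPI)
time_ns = execution_time_ns(clock_periods, 1e9)  # 1 GHz clock

print(f"{clock_periods} clock periods, {time_ns} ns at 1 GHz")
```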


  • We have covered unpipelined CPU

    • The remaining slides will be used when we cover the memory hierarchy topic

      • So far we have assumed that

        • The memory takes one clock period to access

        • There is a single memory for both instructions and data

      • What if it took two clock periods or more ?

      • What if there were instruction and data cache memories ?

        • Then we need to take a look at the execution timing in detail again


  • If we assume that

    • Instruction and data cache memories take one clock period each and there is no cache miss, the execution timing will be as before

  • What if the two cache memories took two clock periods each ?

    • The execution timing will be identical to the clock doubling case studied in class

      • LW would take 7 clock periods since we trace states 0, 0, 1, 2, 3, 3, 4

        • States 0 and 3 are repeated twice since the cache memories take two clock periods each

      • SW would take 6 clock periods since we trace states 0, 0, 1, 2, 5, 5

        • States 0 and 5 are repeated twice since the cache memories take two clock periods each

      • ADD would take 5 clock periods since we trace states 0, 0, 1, 6, 7

        • State 0 is repeated twice since the cache memory takes two clock periods

      • BEQ would take 4 clock periods since we trace states 0, 0, 1, 8

        • State 0 is repeated twice since the cache memory takes two clock periods
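The repeated states can be generated mechanically: every state that accesses memory (0, 3 and 5) is repeated once per clock period the access takes. The Python sketch below starts from the one-clock-period traces of the CPI slide (so ADD starts from 0, 1, 6, 7); the names `MEMORY_STATES`, `expand_trace` and `program_clock_periods` are illustrative.

```python
# States that access memory: instruction fetch (0), LW read (3), SW write (5).
MEMORY_STATES = {0, 3, 5}

# Traces with one-clock-period memory, as on the CPI slide.
BASE_TRACES = {
    "LW": [0, 1, 2, 3, 4],
    "SW": [0, 1, 2, 5],
    "ADD": [0, 1, 6, 7],
    "BEQ": [0, 1, 8],
}


def expand_trace(trace, memory_clock_periods):
    """Repeat each memory-access state once per clock period of the access."""
    expanded = []
    for state in trace:
        repeats = memory_clock_periods if state in MEMORY_STATES else 1
        expanded.extend([state] * repeats)
    return expanded


def program_clock_periods(program, memory_clock_periods):
    """Total clock periods of a program when there are no cache misses."""
    return sum(
        len(expand_trace(BASE_TRACES[op], memory_clock_periods)) for op in program
    )


PROGRAM = ["LW", "ADD", "ADD", "SW", "BEQ"]

for op in BASE_TRACES:
    print(op, expand_trace(BASE_TRACES[op], 2))
print("total:", program_clock_periods(PROGRAM, 2))
```

Setting `memory_clock_periods` to 1 gives back the original traces, which is a quick sanity check on the expansion.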


  • Using the state diagram

    • If the cache memories are slow (they take two clock periods per access) and there is no cache miss, then this piece of program will take 27 clock periods (7 for LW, 5 for each ADD, 6 for SW and 4 for BEQ), as the table below shows by tracing the execution of the program with respect to time


  • We have so far assumed that the cache memories do not have misses !

    • What if both instruction and data cache memories result in cache misses ?

      • That is, there is a cold start !

        • What is the new execution time ?

    • To calculate the new execution time we have to study the structure of the cache memories

      • The size of the physical (main) memory, the size of the cache memories, the size of cache blocks, the type of mapping (direct, associative, block-set associative), the block replacement strategy, etc.


  • What if both instruction and data cache memories result in cache misses ?

    • For this semester

      • We will concentrate on Level 1 cache memories, i.e. instruction and data cache memories

      • We will assume that there is no Level 2 cache memory miss !

      • We will indicate all physical addresses used

      • For this presentation assume that

        • The physical (main) memory has 256 MBytes

        • The physical memory has 4 Bytes per location

        • The bus width between the physical memory and the lowest-level cache is 4 Bytes

        • The instruction cache is 8KBytes

        • The data cache is 16KBytes

        • Both cache block sizes are 32 bytes

        • Both cache memories use direct mapping

        • Both caches use write-back with write-allocate

        • Both cache memories access the needed item first

        • The physical memory latency is 4 clock periods and transferring each 4-Byte item takes one clock period


  • Instruction and data cache misses ?

    • The physical memory has 256 MBytes or 2²⁸ Bytes

      • The physical address is 28 bits long

      • The physical memory has 2²⁸/32 = 2²⁸/2⁵ = 2²³ blocks

      • The instruction cache has 8 KB/32 = 2¹³/2⁵ = 2⁸ = 256 blocks

      • The data cache has 16 KB/32 = 2¹⁴/2⁵ = 2⁹ = 512 blocks
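The block counts above, and the address field widths that follow from them for direct mapping, can be recomputed with a small Python sketch. The sizes are the ones assumed for this presentation; the names `log2_exact` and `geometry` are illustrative.

```python
MEMORY_BYTES = 256 * 2**20   # 256 MBytes of physical memory
BLOCK_BYTES = 32             # block size of both caches
ICACHE_BYTES = 8 * 2**10     # 8 KBytes
DCACHE_BYTES = 16 * 2**10    # 16 KBytes


def log2_exact(n):
    """Return log2(n) for an exact power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def geometry(cache_bytes):
    """Block count and direct-mapped address field widths for one cache."""
    address_bits = log2_exact(MEMORY_BYTES)
    offset_bits = log2_exact(BLOCK_BYTES)
    cache_blocks = cache_bytes // BLOCK_BYTES
    index_bits = log2_exact(cache_blocks)
    return {
        "address_bits": address_bits,
        "memory_blocks": MEMORY_BYTES // BLOCK_BYTES,
        "cache_blocks": cache_blocks,
        "tag_bits": address_bits - index_bits - offset_bits,
        "index_bits": index_bits,
        "offset_bits": offset_bits,
    }


print("instruction cache:", geometry(ICACHE_BYTES))
print("data cache:       ", geometry(DCACHE_BYTES))
```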


  • Instruction and data cache misses ?

    • The physical address is used by the physical memory and instruction cache as follows

      • Main memory block number (23 bits) = Address tag (15 bits) + Instruction cache block # (8 bits)

      • Byte offset (5 bits)

    • The physical address is used by the physical memory and data cache as follows

      • Main memory block number (23 bits) = Address tag (14 bits) + Data cache block # (9 bits)

      • Byte offset (5 bits)

  • Instruction and data cache misses ?

    • The instruction cache has 32-Byte blocks

      • Each block contains 8 instructions since each instruction is 4 Bytes long

      • Instructions in physical memory locations 100 through 110 are in one instruction cache block

[Figure: one instruction cache block, 4 bytes per row]

00000100 LW R8, 0(R9)

00000104 ADD R10, R8, R11

00000108 ADD R12, R13, R14

0000010C SW R12, 0(R15)

00000110 BEQ R12, R0, 3

00000114 ?

00000118 ?

0000011C ?

Instruction cache blocks have 32 bytes and so each block holds 8 instructions !

Instructions in 100, 104, 108, 10C, 110, 114, 118 and 11C are in one instruction cache block !

Which instruction cache block is this ?

  • Instruction and data cache misses ?

    • The instruction cache has 32-Byte blocks

      • Instructions in physical memory locations 100 through 110 are in main memory block number 8 and in instruction cache memory block number 8

0000100 LW R8, 0(R9)

Physical address 0000100 in binary : 0000 0000 0000 0000 0001 0000 0000

  • Main memory block number (upper 23 bits) : 8

  • Address tag (upper 15 bits) : 0

  • Instruction cache block # (next 8 bits) : 8 since 00001000 is 8 in decimal

  • Byte offset (lowest 5 bits) : 0. The LW instruction has 0 offset from the beginning of the block, i.e. it is the first instruction of the block

Instructions in 100, 104, 108, 10C, 110 are in instruction cache block 8 !
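The mapping from a physical address to its main memory block, cache block, tag and byte offset can be sketched in Python for a direct-mapped cache with 32-Byte blocks. The function name `locate` and its parameters are illustrative; the instruction cache is assumed to have 256 blocks and the data cache 512 blocks, as computed earlier in this presentation.

```python
BLOCK_BYTES = 32  # block size of both caches


def locate(address, cache_blocks):
    """Split a physical address for a direct-mapped cache.

    memory_block is the main memory block number; under direct mapping
    the cache block is memory_block modulo the number of cache blocks,
    and the tag is what remains above it.
    """
    memory_block = address // BLOCK_BYTES
    return {
        "memory_block": memory_block,
        "cache_block": memory_block % cache_blocks,
        "tag": memory_block // cache_blocks,
        "byte_offset": address % BLOCK_BYTES,
    }


# The five instructions at 100 through 110 (hexadecimal), instruction cache.
for address in (0x100, 0x104, 0x108, 0x10C, 0x110):
    print(hex(address), locate(address, cache_blocks=256))
```

The same function applies to the data cache by passing `cache_blocks=512`.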


  • Instruction and data cache misses ?

    • How long does it take to access individual instructions ?

      • Both cache memories access the needed item first

      • The physical memory latency is 4 clock periods and transferring each 4-Byte item takes one clock period
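The two assumptions above determine when each 4-Byte item of a block arrives in the cache. The Python sketch below assumes that, after the needed item, the remaining items are transferred in address order and wrap around to the start of the block; the function name `arrival_times` and the constants are illustrative.

```python
BLOCK_BYTES = 32      # cache block size
ITEM_BYTES = 4        # bus width: one 4-Byte item per transfer
LATENCY = 4           # physical memory latency in clock periods
TRANSFER = 1          # clock periods per 4-Byte transfer


def arrival_times(needed_address):
    """Clock period at which each item of the block is available.

    The needed item is transferred first; the rest follow in address
    order, wrapping around to the start of the block (an assumption).
    """
    block_start = needed_address - needed_address % BLOCK_BYTES
    items = BLOCK_BYTES // ITEM_BYTES
    first = (needed_address - block_start) // ITEM_BYTES
    times = {}
    for k in range(items):
        address = block_start + ((first + k) % items) * ITEM_BYTES
        times[address] = LATENCY + (k + 1) * TRANSFER
    return times


for address, clock_periods in sorted(arrival_times(0x100).items()):
    print(f"M[{address:X}] available after {clock_periods} clock periods")
```

The largest value returned is the block fill time, 12 clock periods for a 32-Byte block under these assumptions.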

[Figure: timing of the instruction cache block fill, measured from "Start access"]

Latency (4 clock periods), then one transfer per clock period :

00000100 LW R8, 0(R9) : five clock periods ! M[100] is the needed item and accessed & transferred first !

00000104 ADD R10, R8, R11 : six clock periods !

00000108 ADD R12, R13, R14 : seven clock periods !

0000010C SW R12, 0(R15) : eight clock periods !

00000110 BEQ R12, R0, 3 : nine clock periods !

00000114 ? : ten clock periods !

00000118 ? : eleven clock periods !

0000011C ? : twelve clock periods !

Block fill time = 12 clock periods

  • Instruction and data cache misses ?

    • The data cache has 32-Byte blocks

      • Each block contains 8 data elements since each data element is 4 Bytes long

      • The data element in physical memory location 1150 is in one data cache block

[Figure: one data cache block, 4 bytes per row]

00001140

00001144

00001148

0000114C

00001150 C

00001154

00001158

0000115C

Data cache blocks have 32 bytes and so each holds 8 data elements !

Data elements in 1140, 1144, 1148, 114C, 1150, 1154, 1158 and 115C are in one data cache block !

Which data cache block is this ?

  • Instruction and data cache misses ?

    • The data cache has 32-Byte blocks

      • The data element in physical memory location 1150 is in main memory block number 138 and in data cache block number 138

0001150 C

Physical address 0001150 in binary : 0000 0000 0000 0001 0001 0101 0000

  • Main memory block number (upper 23 bits) : 138

  • Address tag (upper 14 bits) : 0

  • Data cache block # (next 9 bits) : 138 since 010001010 is 138 in decimal

  • Byte offset (lowest 5 bits) : 16. The data element has 16-Byte offset from the beginning of the block, i.e. it is the fifth data element of the block

Data element in 1150 is in data cache block 138 !

  • Instruction and data cache misses ?

    • How long does it take to access an individual data element ?

      • Both cache memories access the needed item first

      • The physical memory latency is 4 clock periods and transferring each 4-Byte item takes one clock period

[Figure: timing of the data cache block fill, measured from "Start access"]

Latency (4 clock periods), then one transfer per clock period, starting with the needed item and wrapping around the block :

00001140 : nine clock periods !

00001144 : ten clock periods !

00001148 : eleven clock periods !

0000114C : twelve clock periods !

00001150 C : five clock periods ! M[1150] is the needed item and accessed & transferred first !

00001154 : six clock periods !

00001158 : seven clock periods !

0000115C : eight clock periods !

Block fill time = 12 clock periods

  • Instruction and data cache misses ?

    • The data cache has 32-Byte blocks

      • Each block contains 8 data elements since each data element is 4 Bytes long

      • The data element in physical memory location 2200 is in one data cache block

[Figure: one data cache block, 4 bytes per row]

00002200 ?

00002204

00002208

0000220C

00002210

00002214

00002218

0000221C

Data cache blocks have 32 bytes and so each holds 8 data elements ! Data elements in 2200, 2204, 2208, 220C, 2210, 2214, 2218 and 221C are in one data cache block !

Which data cache block is this ?

  • Instruction and data cache misses ?

    • The data cache has 32-Byte blocks

      • The data element in physical memory location 2200 is in main memory block number 272 and in data cache block number 272

0002200 ?

Physical address 0002200 in binary : 0000 0000 0000 0010 0010 0000 0000

  • Main memory block number (upper 23 bits) : 272

  • Address tag (upper 14 bits) : 0

  • Data cache block # (next 9 bits) : 272 since 100010000 is 272 in decimal

  • Byte offset (lowest 5 bits) : 0. The data element has 0 offset from the beginning of the block, i.e. it is the first data element of the block

Data element in 2200 is in data cache block 272 !

  • Instruction and data cache misses ?

    • How long does it take to access an individual data element ?

      • Both cache memories access the needed item first

      • The physical memory latency is 4 clock periods and transferring each 4-Byte item takes one clock period

[Figure: timing of the data cache block fill, measured from "Start access"]

Latency (4 clock periods), then one transfer per clock period :

00002200 ? : five clock periods ! M[2200] is the needed item and accessed & transferred first !

00002204 : six clock periods !

00002208 : seven clock periods !

0000220C : eight clock periods !

00002210 : nine clock periods !

00002214 : ten clock periods !

00002218 : eleven clock periods !

0000221C : twelve clock periods !

Block fill time = 12 clock periods

  • Instruction and data cache misses ?

    • How long does it take to run the program with a cold start ?

      • This piece of program will take 32 clock periods as the table below shows the execution of the program with respect to time
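One accounting that reproduces the 32 clock periods is sketched below in Python. It rests on assumptions that are not spelled out on the slide: a cache hit takes one clock period, a miss delivers the needed item after 5 clock periods (4 extra), the CPU resumes as soon as the needed item arrives, and the rest of the block fill does not delay later accesses. Under those assumptions there are three misses: the fetch of LW at 100, the LW data read at 1150, and the SW write at 2200 (a write miss with write-allocate). The names used are illustrative.

```python
# CPI with one-clock-period cache hits (from the state diagram).
BASE_CPI = {"LW": 5, "SW": 4, "ADD": 4, "BEQ": 3}

# Assumed extra clock periods per miss: needed item after 5 instead of 1.
MISS_PENALTY = 4

# (instruction, cold-start misses it suffers). All five instructions share
# instruction cache block 8, so only the first fetch misses.
PROGRAM = [
    ("LW", 2),   # instruction fetch miss at 100 + data read miss at 1150
    ("ADD", 0),
    ("ADD", 0),
    ("SW", 1),   # data write miss at 2200 (write-allocate)
    ("BEQ", 0),
]


def clock_periods(op, misses):
    """Clock periods of one instruction including its miss penalties."""
    return BASE_CPI[op] + misses * MISS_PENALTY


per_instruction = [clock_periods(op, misses) for op, misses in PROGRAM]
total = sum(per_instruction)

print(per_instruction, "total:", total)
```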


  • Unpipelined EMY CPU is complete

    • We studied how the architecture affects the organization

    • We designed the EMY CPU for nine integer instructions

    • We considered how cache memories affect the unpipelined EMY CPU execution

    • Another PowerPoint presentation will cover

      • The pipelined EMY CPU

      • How cache memories affect the pipelined EMY CPU execution
