CMPT 250 Computer Architecture

Instructor: Yuzhuang Hu (yhu1@cs.sfu.ca)

Assembly Lines • An assembly line is a manufacturing process in which parts are added to a product in a sequential manner, using optimally planned logistics, to create a finished product much faster than handcrafting-type methods. • The Ford Motor Company built the world’s first assembly line between 1908 and 1915. • This assembly line made the Ford Model T affordable and brought high wages to Ford workers.

Some Pictures of the Ford 1913 Assembly Line

A Calculation • Consider assembling a car in three steps: install the hood, install the engine, and install the wheels. • One car takes 35 minutes. If only one car can be worked on at a time, three cars take 105 minutes. • Step times: install the hood, 5 minutes; install the engine, 20 minutes; install the wheels, 10 minutes.

A Calculation contd. • What if we have three workers, one for each step? • Ideally, one car is completed every 20 minutes, the duration of the slowest step. • Timeline (minutes): 1st car: hood 0-5, engine 5-25, wheels 25-35; 2nd car: engine 25-45, wheels 45-55; 3rd car: engine 45-65, wheels 65-75.
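The timeline above can be reproduced with a small scheduling sketch. It assumes one worker per step, that every car is available at time 0, and that a car moves to the next step as soon as both the car and that worker are free; the function name `pipeline_finish_times` is illustrative and not from the slides.

```python
def pipeline_finish_times(step_minutes, num_cars):
    """Finish time of each car on a line with one worker per step."""
    worker_free = [0] * len(step_minutes)  # time at which each worker is next free
    finish_times = []
    for _ in range(num_cars):
        ready = 0  # time at which this car is ready for its next step
        for step, duration in enumerate(step_minutes):
            start = max(ready, worker_free[step])  # wait for the worker if busy
            ready = start + duration
            worker_free[step] = ready
        finish_times.append(ready)
    return finish_times


steps = [5, 20, 10]                      # hood, engine, wheels (minutes)
sequential_total = sum(steps) * 3        # one car at a time
finish = pipeline_finish_times(steps, 3)
print(sequential_total, finish)
```

After the first car, the gap between finished cars equals the longest step (20 minutes), which is why stages of nearly equal length matter.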

Pipeline Design • Separate the process into different stages of almost the same length. • These stages are separated by registers. • These registers provide temporary storage for data passing through the pipeline and are called pipeline platforms.

A Pipelined Datapath • Conventional: delays of 0.6, 0.6, 0.2, 0.8, and 0.2 ns; total delay 2.4 ns; clock rate 416.7 MHz. • Pipelined (new version): delays of 0.6, 0.6, 0.2, 0.2, 0.8, 0.2, and 0.2 ns; clock period 1 ns, set by the slowest stage; clock rate 1 GHz.
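The two clock rates follow directly from the clock periods. A minimal sketch of the arithmetic, assuming the conventional datapath needs the full 2.4 ns per operation while the pipelined datapath is clocked at 1 ns (the helper name `clock_rate_mhz` is illustrative):

```python
def clock_rate_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


conventional_delays_ns = [0.6, 0.6, 0.2, 0.8, 0.2]
conventional_period_ns = sum(conventional_delays_ns)  # about 2.4 ns
pipelined_period_ns = 1.0                             # slowest pipeline stage

conventional_mhz = clock_rate_mhz(conventional_period_ns)
pipelined_mhz = clock_rate_mhz(pipelined_period_ns)
print(round(conventional_mhz, 1), round(pipelined_mhz, 1))
```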

D Latch • Eliminates the undesirable undefined state of the SR latch by ensuring that S and R are never 1 at the same time. • [Figure: D latch with inputs D and C and outputs Q and Q'.]

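Negative-Edge-Triggered D Flip-Flop • The 1s-catching behaviour is eliminated because S and R cannot both be 0 in a D flip-flop. • [Figure: negative-edge-triggered D flip-flop built from a D latch followed by an SR latch, both driven by the clock C.]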
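A behavioural sketch can make the edge-triggered behaviour concrete. It assumes a master latch that is transparent while C = 1 and a slave latch that is transparent while C = 0; modelling the slave as a D latch rather than an SR latch is a simplification, and the class names are illustrative.

```python
class DLatch:
    """Level-sensitive D latch: Q follows D while C is 1 and holds while C is 0."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c == 1:
            self.q = d
        return self.q


class NegativeEdgeDFlipFlop:
    """Master-slave pair: the output changes only after C falls from 1 to 0."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, c):
        y = self.master.update(d, c)        # master enabled while C = 1
        return self.slave.update(y, 1 - c)  # slave enabled while C = 0


ff = NegativeEdgeDFlipFlop()
# (D, C) pairs: D pulses to 1 and back to 0 while C = 1, then D = 1 is held
# across a falling edge of C.
inputs = [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1), (1, 0)]
outputs = [ff.update(d, c) for d, c in inputs]
print(outputs)
```

In this trace the short pulse on D during the first clock-high period is not captured; only the value of D present when C falls reaches the output.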

Assume no data hazards.

How much can we gain? • Conventional: 2.4 × 7 = 16.8 ns • Pipelined: 9 × 1 = 9 ns
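The comparison can be sketched as follows, assuming seven operations pass through a three-stage pipeline with no hazards (the stage count is an assumption chosen to match the nine cycles on the slide). A k-stage pipeline needs k + n - 1 cycles for n operations.

```python
def pipelined_cycles(num_stages, num_ops):
    """Cycles needed to push num_ops operations through a hazard-free pipeline."""
    return num_stages + num_ops - 1


num_ops = 7
conventional_ns = num_ops * 2.4           # one 2.4 ns cycle per operation
cycles = pipelined_cycles(3, num_ops)     # assumed three stages
pipelined_ns = cycles * 1.0               # 1 ns clock period
speedup = conventional_ns / pipelined_ns
print(cycles, round(conventional_ns, 1), pipelined_ns, round(speedup, 2))
```

As the number of operations grows, the fill and drain cycles matter less and the speedup approaches the ratio of the two clock periods, 2.4.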

Assume no data and control hazards.

Pipeline contd. • In the first four clock cycles, the pipeline is filling. • In the next four clock cycles, all stages of the pipeline are active. The pipeline is fully utilized. • In the last three clock cycles, not all stages of the pipeline are active, since the pipeline is emptying.

The Reduced Instruction Set Computer (RISC) • The goal of a RISC architecture is high throughput and fast execution. To achieve these goals, accesses to memory are to be avoided. • A RISC architecture has the following properties: • Memory accesses are restricted to load and store instructions, and data-manipulation instructions are register-to-register. • Addressing modes are limited in number. • Instruction formats are all of the same length. • Instructions perform elementary operations.

A RISC Instruction Set Architecture • 32 registers R0 through R31. R0 is a special register storing the value zero.

Datapath Organization • The new datapath has 32 32-bit registers. The address inputs are therefore five bits wide. • The single-bit-position shifter is replaced with a barrel shifter to permit multiple-position shifting, with the shift amount given by SH. • In the function unit, the ALU is expanded to 32 bits. • The constant unit performs zero fill for CS=0 and sign extension for CS=1. • MUX A is added to provide a path from the updated PC, PC-1, for implementation of the JML instruction.
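The behaviour of the constant unit can be sketched in a few lines. The 15-bit immediate width used here is an assumption for illustration, since the slides do not state the width of the immediate field; the function name is also illustrative.

```python
def constant_unit(im, cs, im_bits=15, word_bits=32):
    """Extend an immediate to the word size: zero fill if cs == 0, sign extend if cs == 1."""
    im &= (1 << im_bits) - 1                 # keep only the immediate field
    sign_bit = 1 << (im_bits - 1)
    if cs == 1 and im & sign_bit:
        word_mask = (1 << word_bits) - 1
        upper_ones = word_mask & ~((1 << im_bits) - 1)
        im |= upper_ones                     # copy the sign bit into the upper bits
    return im


print(hex(constant_unit(0x4000, 0)), hex(constant_unit(0x4000, 1)))
```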

Datapath Organization contd. • An additional input is added to MUX D to implement the Set if Less Than (SLT) instruction. This input is 1 when N is 1 and V is 0, or when N is 0 and V is 1, that is, when N XOR V = 1. • A final difference is that the register file is no longer edge triggered and is no longer part of a pipeline platform at the end of the write-back (WB) stage. • In the second half of a clock cycle, it is possible to read data written into the register file during the first half of the same cycle. Such a register file is called a read-after-write register file.
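The N XOR V rule for SLT can be checked with a short sketch. It assumes the flags come from the two's-complement subtraction A - B on 32-bit words; the function name `slt` and the way the flags are derived here are illustrative rather than taken from the slides.

```python
def slt(a, b, bits=32):
    """Return 1 if a < b as signed integers, computed as N XOR V of a - b."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a_u, b_u = a & mask, b & mask
    diff = (a_u - b_u) & mask
    n = 1 if diff & sign else 0              # N: sign bit of the result
    sign_a = 1 if a_u & sign else 0
    sign_b = 1 if b_u & sign else 0
    # V: subtraction overflows when the operands have different signs and
    # the result's sign differs from the sign of a.
    v = 1 if (sign_a != sign_b and n != sign_a) else 0
    return n ^ v


print(slt(3, 5), slt(5, 3), slt(-2**31, 1))
```

The last case shows why N alone is not enough: the subtraction overflows, so N is 0 even though the first operand is smaller.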

Control Organization • SH is added to IR, CS is added to the instruction decoder, and MD is expanded to two bits. • MUX C selects from three different sources for the next value of PC. • BrA is formed from the sum of the updated PC value for the branch instruction and the target offset. • RAA is used for the register jump. • BS, PS and Z are used to select the next PC value.

Control Organization contd. • To determine the control codes, the pipelined CPU is treated much like the single-cycle CPU. • However, it is important to examine the timing carefully to be sure that the various parts of each register transfer statement take place in the right stage of the pipeline. • Note that BrA and RAA are obtained in the EX stage.
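The role of MUX C can be sketched as a selection among the updated PC, BrA, and RAA. The slides do not give the encoding of BS and PS, so the encoding below is purely an assumption for illustration: BS = 00 continues sequentially, 01 branches conditionally on Z (with PS choosing between branch-on-zero and branch-on-nonzero), 10 jumps to RAA, and 11 branches unconditionally.

```python
def next_pc(bs, ps, z, updated_pc, bra, raa):
    """Select the next PC value (hypothetical BS/PS encoding, see lead-in)."""
    if bs == 0b00:
        return updated_pc                     # sequential execution
    if bs == 0b01:
        taken = (z == 1) if ps == 0 else (z == 0)
        return bra if taken else updated_pc   # conditional branch
    if bs == 0b10:
        return raa                            # register jump
    return bra                                # unconditional branch


print(next_pc(0b01, 0, 1, updated_pc=101, bra=150, raa=400))
```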

More on Instruction Set Architecture • The format of an instruction is depicted in a rectangular box symbolizing the bits of the binary instruction. • The bits are divided into groups called fields. • An opcode field. • An address field. • A mode field, which specifies the way the address field is to be interpreted.

Operand Addressing • To illustrate the influence of the number of operands on computer programs, we will evaluate the arithmetic statement X = (A+B)*(C+D).
• Three-address instructions, with all operands in memory:
    ADD T1, A, B      M[T1] <- M[A] + M[B]
    ADD T2, C, D      M[T2] <- M[C] + M[D]
    MUL X, T1, T2     M[X] <- M[T1] * M[T2]
• Or, with registers holding the intermediate results:
    ADD R1, A, B      R1 <- M[A] + M[B]
    ADD R2, C, D      R2 <- M[C] + M[D]
    MUL X, R1, R2     M[X] <- R1 * R2

Operand Addressing contd.
• Two-address instructions:
    MOVE T1, A        M[T1] <- M[A]
    ADD T1, B         M[T1] <- M[T1] + M[B]
    MOVE X, C         M[X] <- M[C]
    ADD X, D          M[X] <- M[X] + M[D]
    MUL X, T1         M[X] <- M[X] * M[T1]
• One-address instructions:
    LD A              ACC <- M[A]
    ADD B             ACC <- ACC + M[B]
    ST X              M[X] <- ACC
    LD C              ACC <- M[C]
    ADD D             ACC <- ACC + M[D]
    MUL X             ACC <- ACC * M[X]
    ST X              M[X] <- ACC
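The one-address program can be traced with a small accumulator-machine sketch. The interpreter, the function name `run_accumulator`, and the sample values A=2, B=3, C=4, D=5 are assumptions for illustration.

```python
def run_accumulator(program, memory):
    """Execute one-address instructions against a single accumulator."""
    acc = 0
    for op, addr in program:
        if op == "LD":
            acc = memory[addr]
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "MUL":
            acc = acc * memory[addr]
        elif op == "ST":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


memory = {"A": 2, "B": 3, "C": 4, "D": 5}
program = [("LD", "A"), ("ADD", "B"), ("ST", "X"),
           ("LD", "C"), ("ADD", "D"), ("MUL", "X"), ("ST", "X")]
result = run_accumulator(program, memory)
print(result["X"])
```

Note that X doubles as temporary storage for A+B, which is why the program needs seven instructions where the three-address version needs three.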

Zero-Address Instructions • We use a stack. The top of the stack is referred to as TOS. The word below it is called TOS-1 (a name, not a subtraction).
    PUSH A            TOS <- M[A]
    PUSH B            TOS <- M[B]
    ADD               TOS <- TOS + TOS-1
    PUSH C            TOS <- M[C]
    PUSH D            TOS <- M[D]
    ADD               TOS <- TOS + TOS-1
    MUL               TOS <- TOS * TOS-1
    POP X             M[X] <- TOS
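A stack-machine sketch shows how the same statement is evaluated without any operand addresses on the arithmetic instructions. It assumes that ADD and MUL pop the top two words and push the result; the interpreter and the sample values are illustrative.

```python
def run_stack_machine(program, memory):
    """Execute a zero-address (stack) program; only PUSH and POP name memory."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])
        elif op == "POP":
            memory[instr[1]] = stack.pop()
        elif op in ("ADD", "MUL"):
            top = stack.pop()        # TOS
            below = stack.pop()      # the word below TOS
            stack.append(top + below if op == "ADD" else top * below)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


memory = {"A": 2, "B": 3, "C": 4, "D": 5}
program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
           ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
           ("MUL",), ("POP", "X")]
result = run_stack_machine(program, memory)
print(result["X"])
```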

Addressing Modes • The addressing mode of an instruction specifies a rule for interpreting or modifying the address field of the instruction. • The address of the operand produced by such a rule is called the effective address. • Addressing modes serve two purposes: to give programming flexibility to the user, and to reduce the number of bits in the address fields of the instruction.

Addressing Modes contd. • Implied Mode: the operand is specified implicitly in the opcode, e.g. ADD in a stack computer. • Immediate Mode: the operand is given in the instruction itself, e.g. LDI R0, 3. • Register and Register-Indirect Modes • Register Mode: the address field specifies a register. • Register-Indirect Mode: the address field specifies a register whose content gives the address of the operand in memory. • Auto Increment/Decrement Mode: e.g. ADD (R1)+, 3 performs M[R1] <- M[R1] + 3, then R1 <- R1 + 1.

Addressing Mode contd. • Direct Addressing Mode: the address field of the instruction gives the address of the operand in memory. • Indirect Addressing Mode: the address field of the instruction gives the address at which the effective address is stored in memory. • Relative Addressing Mode: Effective address = Address part of the instruction + PC

Addressing Mode contd. • Index Addressing Mode: the content of an index register is added to the address part of the instruction to obtain the effective address. • The index register may be a special CPU register or simply a register in a register file, e.g. for arrays. • The Base-Register Mode: the contents of a base register are added to the address part of the instruction to obtain the effective address.

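Addressing Modes Examples • [Figure: memory map for a "Load to ACC" instruction stored at address 250, with PC = 250 and R1 = 400, showing memory locations 250, 251, 252, 400, 500, 752, 800, and 900 and the ACC register.]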

Addressing Modes Examples contd.
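The rules for forming the effective address can be sketched as below. PC = 250 and R1 = 400 come from the example slide; the address-field value 500, the memory contents, and the choice to add the updated PC value 252 in relative mode are assumptions for illustration, as are the function and mode names.

```python
def effective_address(mode, address, pc, r1, memory):
    """Compute the effective address for a few of the modes described above."""
    if mode == "direct":
        return address                 # address field is the operand address
    if mode == "indirect":
        return memory[address]         # address field points to the address
    if mode == "relative":
        return address + pc            # address field plus PC
    if mode == "index":
        return address + r1            # address field plus index register
    if mode == "register_indirect":
        return r1                      # register holds the operand address
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical memory contents, keyed by address.
memory = {400: 700, 500: 800, 752: 600, 800: 300, 900: 200}
address_field = 500    # assumed value of the instruction's address field
updated_pc = 252       # assumed PC after fetching a two-word instruction at 250
r1 = 400

for mode in ("direct", "indirect", "relative", "index", "register_indirect"):
    ea = effective_address(mode, address_field, updated_pc, r1, memory)
    print(mode, ea, memory[ea])        # the value at ea is what ACC would load
```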

CISC Architecture • The goal of the CISC architecture is to match more closely the operations used in programming languages and to provide instructions that facilitate compact programs and conserve memory. • A purely CISC architecture has the following properties: • Memory access is directly available to most types of instructions. • Addressing modes are substantial in number. • Instruction formats are of different lengths. • Instructions perform both elementary and complex operations.

THANKS!
