
Processor architectures

SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications

Miodrag Bolic

Overview
  • Introduction
    • Basic structure of a processor
    • Basic Operations
    • Pipelining
    • Registers
  • Example design of an application-specific processor
  • General purpose processors
    • Example of FIR on a general purpose processor
    • Datapath of a MIPS processor
What is “Computer Architecture”?
  • Coordination of many levels of abstraction
  • Under a rapidly changing set of forces

Figure: levels of abstraction, from top to bottom: Application; Operating System; Compiler; Firmware; Instruction Set Architecture; Instruction Set Processor and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout.

Levels of abstraction
  • Delving into the depths reveals more information
  • An abstraction omits unneeded detail and helps us cope with complexity
Basic Structure of a Computer

Reference: [Patterson04]

Basic Structure of a Computer (2)
  • Input Unit
    • Keyboards, joysticks, trackballs, microphones and mice
  • Output Unit
    • Printers and graphic displays
  • Memory Unit
    • Primary (cache, main memory/RAM) and secondary (hard disks, CD-ROM, tape drives)
  • Arithmetic and Logic Unit (ALU)
    • Operations are executed here; operands and results are held in fast-access registers
  • Control Unit (CU)
    • Provides control to all other units, including timing signals

Reference: [Patterson04]

Basic Operation of a Computer
  • The computer accepts information in the form of programs and data through an input unit and stores it in memory
  • The information stored in memory is fetched, under program control, and processed in an ALU
  • The processed information leaves the computer through an output unit
  • All activities inside the computer are directed by the control unit

Reference: [Patterson04]

Detailed Instruction Cycle

Copied from [Patterson04]

Detailed Instruction Cycle (2)
  • Instruction address calculation
    • Determines the address of the next instruction to be executed
  • Instruction fetch
    • Reads the instruction from its memory location into the processor
  • Instruction operation decoding
    • Analyzes the instruction to determine the type of operation to be performed and the operand(s) to be used
  • Operand address calculation
    • Determines the address of the operand (if needed)
  • Operand fetch
    • Fetches the operand from memory or reads it from I/O
  • Data operation
    • Performs the operation indicated in the instruction
  • Operand store
    • Writes the result into memory or out to I/O

Reference: [Patterson04]
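The instruction cycle above can be illustrated with a small C interpreter. This is a minimal sketch assuming a hypothetical accumulator machine (`enum opcode`, `struct instr` and `run` are illustrative names, not part of the slides) with direct addressing, so the operand address calculation simply takes the address from the instruction.

```c
#include <stddef.h>

/* Hypothetical accumulator machine used only to illustrate the cycle. */
enum opcode { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

struct instr {
    enum opcode op;   /* operation code */
    size_t addr;      /* operand address (direct addressing) */
};

/* Executes prog until OP_HALT and returns the number of instructions run. */
size_t run(const struct instr *prog, int *data)
{
    size_t pc = 0;      /* address of the next instruction */
    int acc = 0;        /* accumulator register */
    size_t count = 0;

    for (;;) {
        struct instr ir = prog[pc];   /* instruction fetch */
        pc = pc + 1;                  /* instruction address calculation */
        count++;

        switch (ir.op) {              /* instruction operation decoding */
        case OP_LOAD:                 /* operand address = ir.addr */
            acc = data[ir.addr];      /* operand fetch */
            break;
        case OP_ADD:
            acc += data[ir.addr];     /* operand fetch + data operation */
            break;
        case OP_STORE:
            data[ir.addr] = acc;      /* operand store */
            break;
        case OP_HALT:
            return count;
        }
    }
}
```

For example, the program LOAD 0, ADD 1, STORE 2, HALT adds data[0] and data[1] and stores the sum in data[2].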

Fast, Pipelined Instruction Interpretation

Figure: five instructions overlapped in time. Each instruction passes through the stages Next Instruction (NI), Instruction Fetch (IF), Decode & Operand Fetch (D), Execute (E) and Store Results (W). The stages are separated by the Instruction Address, Instruction Register, Operand Registers and Result Registers; results are stored to registers or memory.

Copied from [Culler-Slides]

Visualizing Pipelining

Figure: four instructions in program order (top to bottom) plotted against time in clock cycles (Cycle 1 to Cycle 7). Each instruction passes through Ifetch, Reg, ALU, DMem and Reg, and each starts one cycle after the previous one.

Copied from [Culler-Slides]
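The overlap in the figure follows a simple rule: each instruction enters the pipeline one cycle after the previous one. The sketch below, assuming one instruction issued per cycle and no stalls, returns the stage an instruction occupies in a given clock cycle; `stage_at` is a hypothetical helper name.

```c
#include <stddef.h>

/* Stage names of the five-stage pipeline in the figure. */
static const char *const stage_names[5] = { "Ifetch", "Reg", "ALU", "DMem", "Reg" };

/* Stage occupied by instruction i (0 = first in program order) during
   clock cycle `cycle` (1-based), assuming one instruction issued per
   cycle and no stalls; NULL if the instruction is not in the pipeline. */
const char *stage_at(int i, int cycle)
{
    int s = cycle - 1 - i;   /* cycles spent in the pipeline so far */
    if (s < 0 || s >= 5)
        return NULL;
    return stage_names[s];
}
```

Reading the figure column by column corresponds to calling `stage_at` for every instruction with a fixed cycle number.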

Terminology
  • Performance (time)
    • MIPS: millions of instructions per second
    • MFLOPS: millions of floating-point operations per second
    • Cycles per instruction (CPI)
  • Architectures
    • RISC – Reduced Instruction Set Computer
    • CISC – Complex Instruction Set Computer
    • Scalar
    • Superscalar
    • Very long instruction word (VLIW)
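The performance terms above are related by the usual equations: CPU time = instruction count × CPI / clock rate, MIPS = clock rate / (CPI × 10^6), and MFLOPS = floating-point operations / (execution time × 10^6). A minimal sketch in C, with hypothetical function names:

```c
/* CPU time in seconds = instruction count x CPI / clock rate (Hz). */
double cpu_time_s(double instr_count, double cpi, double clock_hz)
{
    return instr_count * cpi / clock_hz;
}

/* Native MIPS rating = clock rate / (CPI x 10^6). */
double mips_rating(double clock_hz, double cpi)
{
    return clock_hz / (cpi * 1e6);
}

/* MFLOPS = floating-point operations / (execution time x 10^6). */
double mflops(double fp_ops, double time_s)
{
    return fp_ops / (time_s * 1e6);
}
```

For example, 10^9 instructions at CPI = 2 on a 500 MHz clock take 4 s, which corresponds to a rating of 250 MIPS.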
Comparison: CISC, RISC, VLIW

Copied from [Philips]

Sequential application specific processor
  • A processor tuned only for a particular application
  • Can be used for low-power implementations
  • Word lengths can be adjusted to the current problem.
  • Example: FIR filter
Direct form FIR filter

Copied from [Wanhammer99]

Transposed FIR

Copied from [Wanhammer99]
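Assuming the standard transposed structure, the filter computes y(n) = h[0]·x(n) + PS_1 and updates each partial sum as PS_k = h[k]·x(n) + PS_{k+1}, so every product uses the current input sample. The C sketch below processes one input sample per call; a sequential application-specific processor with a single multiplier would perform these N multiplications one after another. `NTAPS` and `fir_transposed_step` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define NTAPS 4   /* hypothetical filter length for this sketch (>= 2) */

/* One output sample of a transposed-form FIR filter.
   ps[1..NTAPS-1] hold the partial sums PS_1..PS_{N-1} from the previous
   sample (ps[0] is unused); every product uses the current input x. */
double fir_transposed_step(const double h[NTAPS], double ps[NTAPS], double x)
{
    double y = h[0] * x + ps[1];          /* output y(n) */
    for (size_t k = 1; k < NTAPS - 1; k++)
        ps[k] = h[k] * x + ps[k + 1];     /* PS_k uses the old PS_{k+1} */
    ps[NTAPS - 1] = h[NTAPS - 1] * x;     /* last partial sum */
    return y;
}
```

Updating the partial sums in ascending order of k ensures that each PS_k reads PS_{k+1} before it is overwritten for the current sample. Feeding a unit impulse returns the coefficients h[0], h[1], ... one per call.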

Assignment
  • Design an N-tap transposed linear-phase FIR filter as a sequential application-specific processor. Use only one multiplier and show how the processing time can be halved.

Hint: design a transposed FIR filter structure as in the previous slide, but allow the partial sums to be generated in reversed order: $PS_{N-1}, PS_{N-2}, \ldots, PS_1, y(n)$.

Copied from [Wanhammer99]

General purpose processor architecture
  • FIR example
  • We will study RISC architectures
  • Single-cycle processor
    • Implementation of add and load instructions
  • Pipelined implementation
    • Why do all instructions take the same number of cycles?
Example: Digital Filtering
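  • The basic FIR filter equation is

$$y[n] = \sum_{k=0}^{N-1} h[k]\,x[n-k]$$

where h[k] is an array of N constant coefficients.

In C language:

for (n = 0; n < N; n++)
{
    y[n] = 0;
    for (k = 0; k < N && k <= n; k++)    // inner loop; x[n-k] = 0 for n-k < 0
        y[n] = y[n] + h[k] * x[n - k];
}

Only multiply-and-accumulate (MAC) operations are needed!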
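The loop above can be packaged as a complete C function. This is a sketch that assumes floating-point samples and zero input before x[0]; it uses a separate output length `len` (a hypothetical parameter) so that the number of output samples need not equal the number of taps, and it counts the MAC operations to show that each output needs at most N of them.

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum_{k=0}^{ntaps-1} h[k] * x[n-k] for
   n = 0..len-1, treating samples before x[0] as zero.
   Returns the number of multiply-accumulate (MAC) operations performed. */
size_t fir_direct(const double *h, size_t ntaps,
                  const double *x, double *y, size_t len)
{
    size_t macs = 0;
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;                          /* accumulator */
        for (size_t k = 0; k < ntaps && k <= n; k++) {
            acc += h[k] * x[n - k];                /* one MAC */
            macs++;
        }
        y[n] = acc;
    }
    return macs;
}
```

With h = {1, 2, 3} and a constant input of ones, the outputs are 1, 3, 6, 6, ...: once the delay line is full, every output costs exactly N = 3 MACs.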

The MIPS Instruction Formats

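R-type: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)

I-type: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)

J-type: op (6 bits, 31-26) | target address (26 bits, 25-0)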

  • All MIPS instructions are 32 bits long. The three instruction formats are:
    • R-type
    • I-type
    • J-type
  • The different fields are:
    • op: the opcode (basic operation of the instruction)
    • rs, rt, rd: register specifiers; rs and rt name source registers and rd the destination register (in I-type instructions such as lw and addi, rt is the destination)
    • shamt: shift amount
    • funct: selects the variant of the operation in the “op” field
    • address / immediate: address offset or immediate value
    • target address: target address of the jump instruction

Copied from [Shulte-Slides]
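Because the fields sit at fixed bit positions, decoding is a matter of shifts and masks. A minimal C sketch (the `struct mips_fields` type and `mips_decode` function are illustrative names); all fields are extracted, and the op field determines which of them are meaningful for a given format.

```c
#include <stdint.h>

/* All fields of a 32-bit MIPS instruction word; which ones are
   meaningful depends on the format selected by op. */
struct mips_fields {
    unsigned op, rs, rt, rd, shamt, funct;
    uint16_t imm;      /* I-type immediate, bits 15-0 */
    uint32_t target;   /* J-type target address, bits 25-0 */
};

struct mips_fields mips_decode(uint32_t instr)
{
    struct mips_fields f;
    f.op     = (instr >> 26) & 0x3Fu;   /* bits 31-26 */
    f.rs     = (instr >> 21) & 0x1Fu;   /* bits 25-21 */
    f.rt     = (instr >> 16) & 0x1Fu;   /* bits 20-16 */
    f.rd     = (instr >> 11) & 0x1Fu;   /* bits 15-11 */
    f.shamt  = (instr >> 6) & 0x1Fu;    /* bits 10-6 */
    f.funct  = instr & 0x3Fu;           /* bits 5-0 */
    f.imm    = (uint16_t)(instr & 0xFFFFu);
    f.target = instr & 0x03FFFFFFu;
    return f;
}
```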

Translating MIPS Assembly into Machine Language
  • Humans see instructions as words (assembly language), but the computer sees them as ones and zeros (machine language).
  • An assembler translates from assembly language to machine language.
  • For example, the MIPS instruction add $t0, $s1, $s2 is translated as follows:

Assembly | Comment
add | op = 0, shamt = 0, funct = 32
$t0 | rd = 8
$s1 | rs = 17
$s2 | rt = 18

op | rs | rt | rd | shamt | funct
000000 | 10001 | 10010 | 01000 | 00000 | 100000

Copied from [Shulte-Slides]
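The translation in this example can be checked by packing the fields with shifts. This sketch covers only the R-type format; `mips_encode_r` is a hypothetical helper, and the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) are the ones given above.

```c
#include <stdint.h>

/* Packs R-type fields into a 32-bit MIPS instruction word:
   op | rs | rt | rd | shamt | funct (6, 5, 5, 5, 5, 6 bits). */
uint32_t mips_encode_r(unsigned op, unsigned rs, unsigned rt,
                       unsigned rd, unsigned shamt, unsigned funct)
{
    return ((uint32_t)(op & 0x3Fu) << 26) |
           ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) |
           ((uint32_t)(rd & 0x1Fu) << 11) |
           ((uint32_t)(shamt & 0x1Fu) << 6) |
           (uint32_t)(funct & 0x3Fu);
}
```

`mips_encode_r(0, 17, 18, 8, 0, 32)` yields 0x02324020, the hexadecimal form of the bit pattern 000000 10001 10010 01000 00000 100000.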

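MIPS Addressing Modes/Instruction Formats
  • All MIPS instructions are 32 bits wide (fixed length)
  • Register (direct): add $s1, $s2, $s3
    • Fields: op | rs | rt | rd; all operands are registers
  • Immediate: addi $s1, $s2, 200
    • Fields: op | rs | rt | immed; one operand is the constant in the instruction
  • Base + displacement: lw $s1, 200($s2)
    • Fields: op | rs | rt | immed; memory address = register rs + sign-extended immed
  • PC-relative: beq $s1, $s2, 200
    • Fields: op | rs | rt | immed; branch target = PC + sign-extended immed (in MIPS the offset counts words and is added to PC + 4)

Copied from [Shulte-Slides]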
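The address calculations for the base + displacement and PC-relative modes can be sketched in C as follows. The sketch assumes the standard MIPS conventions: the 16-bit immediate is sign-extended, and a branch offset counts words relative to PC + 4. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extends a 16-bit immediate to 32 bits without relying on
   implementation-defined narrowing conversions. */
int32_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}

/* Base + displacement (lw/sw): address = R[rs] + sign-extended immediate. */
uint32_t base_disp_address(uint32_t base_reg, uint16_t imm)
{
    return base_reg + (uint32_t)sign_extend16(imm);
}

/* PC-relative (beq/bne): word offset added to PC + 4. */
uint32_t branch_target(uint32_t pc, uint16_t imm)
{
    return pc + 4u + ((uint32_t)sign_extend16(imm) << 2);
}
```

Unsigned arithmetic wraps modulo 2^32, so a negative offset correctly produces a backward branch or a lower address.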

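Architecture of the MIPS core

Figure: single-cycle MIPS datapath. The PC supplies the instruction address to the instruction memory. The instruction fields Rs, Rt and Rd (5 bits each) select registers in a register file of 32 32-bit registers (ports Ra, Rb, Rw), and the 16-bit Imm field is also taken from the instruction. 32-bit buses connect the register file to the data memory (data address, Data in, Data out). The PC, register file and data memory are clocked by Clk.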

Copied from [Meerbergen-Slides]

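Example 1: R-type: add instruction

R-type format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)

  • add rd, rs, rt
    • mem[PC]: fetch the instruction
    • R[rd] = R[rs] + R[rt]
    • PC = PC + 4

Figure: Rs and Rt select the register-file read ports Ra and Rb, which drive the 32-bit buses BusA and BusB into the ALU (operation selected by ALUctr). The ALU result returns on Bus W and is written into register Rd (write port Rw) at the clock edge (Clk) when Reg Wr is asserted.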

Copied from [Meerbergen-Slides]
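The three register-transfer steps for add can be written as a small C function operating on a hypothetical architectural state (`struct cpu`, `exec_add`). This is a sketch, not the slides' design: the instruction word is assumed to have already been fetched from mem[PC], register 0 is kept at zero as in MIPS, and the overflow trap of the real add instruction is ignored (the result wraps as with addu).

```c
#include <stdint.h>

/* Hypothetical architectural state for a single-cycle sketch. */
struct cpu {
    uint32_t pc;
    uint32_t reg[32];   /* 32 x 32-bit register file; reg[0] stays 0 */
};

/* Executes an R-type add already fetched from mem[PC]:
   R[rd] = R[rs] + R[rt]; PC = PC + 4. */
void exec_add(struct cpu *c, uint32_t instr)
{
    unsigned rs = (instr >> 21) & 0x1Fu;   /* read port Ra */
    unsigned rt = (instr >> 16) & 0x1Fu;   /* read port Rb */
    unsigned rd = (instr >> 11) & 0x1Fu;   /* write port Rw */
    uint32_t bus_a = c->reg[rs];
    uint32_t bus_b = c->reg[rt];
    uint32_t bus_w = bus_a + bus_b;        /* ALU, ALUctr = add (wraps) */
    if (rd != 0)                           /* RegWr; $zero is never written */
        c->reg[rd] = bus_w;
    c->pc += 4;
}
```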

Critical path R-type operation

Figure: the single-cycle MIPS datapath (PC, instruction memory, register file of 32 32-bit registers, data memory, all clocked by Clk), used to trace the critical path of an R-type operation.

Copied from [Meerbergen-Slides]

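Critical path R-type operation

Timing from the clock edge until the result is written:
  • Clock-to-Q: the PC changes from its old value to its new value
  • Instruction memory access time: the fields rs, rt, rd, op and funct take their new values
  • Register file access time: Bus A and Bus B take their new values
  • ALU delay: Bus W takes its new value
  • Setup + skew: Bus W is written into the register file at the next clock edge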

Copied from [Meerbergen-Slides]
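The R-type critical path is the sum of the delays listed above. The delay values in the example are hypothetical, chosen only to illustrate the arithmetic; `struct rtype_delays` and `rtype_path` are illustrative names.

```c
/* Delays in picoseconds along the R-type path (hypothetical values). */
struct rtype_delays {
    int clk_to_q;     /* PC register clock-to-Q */
    int imem;         /* instruction memory access time */
    int rfile;        /* register file access time */
    int alu;          /* ALU delay */
    int setup_skew;   /* register file setup time plus clock skew */
};

/* Minimum clock period needed by an R-type instruction. */
int rtype_path(const struct rtype_delays *d)
{
    return d->clk_to_q + d->imem + d->rfile + d->alu + d->setup_skew;
}
```

With clock-to-Q = 30 ps, instruction memory = 200 ps, register file = 100 ps, ALU = 150 ps and setup + skew = 20 ps, an R-type instruction needs a 500 ps clock period.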

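Example 2: I-type: load word

I-type format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)

  • lw rt, imm16(rs)
    • mem[PC]: fetch the instruction
    • addr = R[rs] + ext[imm16]
    • R[rt] = mem[addr]
    • PC = PC + 4

Figure: datapath for load word. RegDst selects Rt as the write register (Rd is a don't-care); Rs drives BusA; the 16-bit immediate is extended to 32 bits by the Extender (ExtOp) and selected as the second ALU operand by ALUSrc; the ALU (ALUctr) computes the data-memory address (Adr, with MemWr deasserted); MemtoReg routes the data-memory output onto Bus W, which is written into the register file when Reg Wr is asserted.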

Copied from [Meerbergen-Slides]
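The load steps can be sketched in C in the same register-transfer style. The sketch assumes a hypothetical architectural state (`struct cpu`) and a data memory modelled as an array of 32-bit words indexed by byte address / 4; the comments name the datapath control signals from the figure that each step corresponds to. `exec_lw` is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical architectural state for a single-cycle sketch. */
struct cpu {
    uint32_t pc;
    uint32_t reg[32];   /* 32 x 32-bit register file; reg[0] stays 0 */
};

/* Executes lw rt, imm16(rs), assumed already fetched from mem[PC]. */
void exec_lw(struct cpu *c, const uint32_t *dmem, uint32_t instr)
{
    unsigned rs = (instr >> 21) & 0x1Fu;            /* drives BusA */
    unsigned rt = (instr >> 16) & 0x1Fu;            /* RegDst selects rt */
    uint16_t imm16 = (uint16_t)(instr & 0xFFFFu);
    int32_t ext = (imm16 & 0x8000u)                 /* Extender, sign mode */
                      ? (int32_t)imm16 - 0x10000
                      : (int32_t)imm16;
    uint32_t addr = c->reg[rs] + (uint32_t)ext;     /* ALU, ALUSrc = immediate */
    if (rt != 0)                                    /* MemtoReg, RegWr */
        c->reg[rt] = dmem[addr / 4];
    c->pc += 4;                                     /* PC = PC + 4 */
}
```

For example, with R[$s2] = 16, the word 0x8E510008 (lw $s1, 8($s2), opcode 35) reads byte address 24, i.e. word 6 of the data memory.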

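Critical path load operation

Timing from the clock edge until the loaded value is ready:
  • Clock-to-Q: the PC changes from its old value to its new value
  • Instruction memory access time: the fields rs, rt, rd, op and funct take their new values
  • Register file access time: Bus A and Bus B take their new values
  • ALU delay: the data-memory address takes its new value
  • Data memory access time: Bus W takes its new value
  • Setup + skew: must elapse before the next clock edge writes Bus W into the register file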

Copied from [Meerbergen-Slides]
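Since a single-cycle design uses one clock period for every instruction, that period must accommodate the slowest path, here the load. A sketch with hypothetical names and delay values (in picoseconds):

```c
/* Delays in picoseconds (hypothetical values). */
struct path_delays {
    int clk_to_q, imem, rfile, alu, dmem, setup_skew;
};

/* R-type path: no data memory access. */
int rtype_time(const struct path_delays *d)
{
    return d->clk_to_q + d->imem + d->rfile + d->alu + d->setup_skew;
}

/* The load path adds the data memory access time. */
int load_time(const struct path_delays *d)
{
    return rtype_time(d) + d->dmem;
}

/* A single-cycle design must clock at the slowest instruction's path. */
int single_cycle_period(const struct path_delays *d)
{
    int r = rtype_time(d), l = load_time(d);
    return l > r ? l : r;
}
```

With a 200 ps data memory added to the example R-type delays, the load path (700 ps) sets the clock period even though an R-type instruction needs only 500 ps.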

Architecture of the MIPS core

Figure: a load split into 5 stages, one per clock cycle: cycle 1 Ifetch, cycle 2 RF read, cycle 3 ALU, cycle 4 dmem, cycle 5 RF write.

  • Problem: a long critical path
    • The clock period is defined by the slowest instruction (load)
  • Solution: pipelining
    • Break each instruction into smaller steps
    • Make all steps have roughly the same critical path

Copied from [Meerbergen-Slides]
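With pipelining, the clock period is set by the slowest stage plus the overhead of the pipeline registers between stages. A minimal sketch with a hypothetical function name and stage delays:

```c
/* Clock period of a pipeline: slowest stage delay plus the
   pipeline-register overhead (all values in picoseconds). */
int pipelined_period(const int *stage_ps, int nstages, int reg_overhead_ps)
{
    int worst = 0;
    for (int i = 0; i < nstages; i++)
        if (stage_ps[i] > worst)
            worst = stage_ps[i];
    return worst + reg_overhead_ps;
}
```

Because the stages are never perfectly balanced and every pipeline register adds delay, the speedup over the single-cycle design is smaller than the number of stages.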

Pipelining lw instructions [Hennessy&Patterson]

Figure: three lw instructions started in consecutive clock cycles (cycles 1-7); each passes through Ifetch, RF read, ALU, dmem and RF write, one stage per cycle.

  • One instruction enters the pipeline every clock cycle
  • One instruction leaves the pipeline every clock cycle
  • => CPI = 1 (cycles per instruction) once the pipeline is full

Copied from [Meerbergen-Slides]
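CPI = 1 describes the steady state: a k-stage pipeline needs k cycles for the first instruction and one more cycle for each following one. A sketch of this arithmetic, assuming no stalls, with hypothetical function names:

```c
/* Cycles to complete n instructions (n > 0) in a k-stage pipeline with
   no stalls: k cycles for the first, then one per following instruction. */
long pipeline_cycles(long n, long k)
{
    return k + n - 1;
}

/* Effective CPI including the pipeline fill time; requires n > 0. */
double effective_cpi(long n, long k)
{
    return (double)pipeline_cycles(n, k) / (double)n;
}
```

`pipeline_cycles(3, 5)` gives 7, matching three lw instructions spread over cycles 1-7; for 1000 instructions the effective CPI is 1.004.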

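Pipelining lw instructions

Figure: five lw instructions in flight during the current CPU cycle, each in a different stage: I (Ifetch), R (RF read), A (ALU), M (data memory) and W (RF write); the figure marks the flow of instructions and data.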

Copied from [Meerbergen-Slides]

4 stages of an R-type instruction (e.g. ADD)

Figure: cycle 1 Ifetch, cycle 2 RF read, cycle 3 ALU, cycle 4 RF write (no data memory access).

Copied from [Meerbergen-Slides]

Pipelining lw and R-type instructions [Hennessy&Patterson]

Figure: lw starts in cycle 1 and passes through Ifetch, RF read, ALU, dmem and RF write; add starts in cycle 2 and passes through Ifetch, RF read, ALU and RF write. Both reach RF write in cycle 5.

  • Resource conflict on the write port of the register file

Copied from [Meerbergen-Slides]

Solution: stretch R-type to 5 stages

Figure: lw followed by two add instructions in consecutive cycles (cycles 1-7). Every instruction now passes through Ifetch, RF read, ALU, dmem and RF write, so each one writes the register file in a different cycle.

  • For an R-type instruction the dmem stage is a dummy operation (noop)

Copied from [Meerbergen-Slides]
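The conflict and its fix can be checked by computing the cycle in which each instruction reaches its register-write stage. A sketch assuming one instruction issued per cycle and register write as the last stage of every instruction; the function names are hypothetical.

```c
#include <stdbool.h>

/* Cycle in which an instruction issued in cycle `issue` (1-based) uses
   the register-file write port, assuming register write is its last stage. */
int rf_write_cycle(int issue, int nstages)
{
    return issue + nstages - 1;
}

/* True if two instructions need the single write port in the same cycle. */
bool write_port_conflict(int issue_a, int stages_a, int issue_b, int stages_b)
{
    return rf_write_cycle(issue_a, stages_a) == rf_write_cycle(issue_b, stages_b);
}
```

An lw issued in cycle 1 and a 4-stage add issued in cycle 2 both write in cycle 5; stretching add to 5 stages moves its write to cycle 6, so with every instruction at 5 stages, instructions issued in different cycles never collide.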

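Pipelined datapath [Hennessy&Patterson]

Figure: the MIPS datapath divided into the pipeline stages Ifetch, Reg/dec, exec, mem and wr. It contains the next-PC logic (+ 4, branch, flags), the program memory, the register file (Ra = Rs, Rb = Rt, BusA, BusB; write register Rw selected from Rt or Rd), the immediate extender (Imm16), the ALU and the data memory (Din, Dout). Control signals: RegWr, RegDst, ExtOp, ALUSrc, ALUop, MemWr and MemtoReg.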

Copied from [Meerbergen-Slides]

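Data dependencies: R-type instructions [Hennessy&Patterson]

Figure: five instructions issued in consecutive cycles, each passing through IM, RF (read), ALU, DM and RF (write). The first instruction writes R1 and the next four read it:

R1 = ...
… = R1 + ...
… = R1 + ...
… = R1 + ...
… = R1 + ...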

Copied from [Meerbergen-Slides]
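Which of the dependent instructions would read R1 too early can be worked out from the stage timing. The sketch assumes the five-stage pipeline (register read in stage 2, register write in stage 5, one instruction issued per cycle, no stalls) and, as in Hennessy & Patterson, a register file that is written in the first half of a cycle and read in the second half. `raw_hazard` is a hypothetical helper.

```c
#include <stdbool.h>

/* Issue cycles are 1-based; one instruction issued per cycle, no stalls. */
bool raw_hazard(int producer_issue, int consumer_issue)
{
    int write_cycle = producer_issue + 4;   /* stage 5: RF write */
    int read_cycle = consumer_issue + 1;    /* stage 2: RF read */
    /* Split-cycle register file: a read in the write cycle sees the new
       value, so only a strictly earlier read returns the old R1. */
    return read_cycle < write_cycle;
}
```

Under these assumptions the two instructions immediately after R1 = ... would read a stale R1, while the third and fourth read the new value.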

References

[Culler-Slides] D. E. Culler, Computer Architecture, lecture slides, Computer Science, UC Berkeley.

[Hamacher01] C. Hamacher, Z. Vranesic, S. Zaky, Computer Organization, McGraw-Hill Science/Engineering/Math; 5th edition, August 2, 2001.

[Patterson04] D. A. Patterson, J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann; 3rd edition, August 2, 2004.

[Shulte-Slides] M. Schulte, Computer Architecture, ECE 201, lecture slides.

The other references can be found at: www.site.uottawa.ca/~mbolic/elg6131/References.htm