Superscalar Microprocessors

Superscalar Microprocessors • Robert Hock 4/23/02

Superscalar Microprocessors • Topics Covered • Superscalar Processor Overview • MIPS R10000 • Intel IA32 • PowerPC

What does superscalar mean? • Definition: • Superscalar machines are able to issue multiple instructions for each clock cycle from a conventional linear instruction stream

In English This Time • A superscalar processor can run instructions out of program order to keep its execution units busy. Instructions take different numbers of cycles to execute, which introduces latency into program execution. By pipelining these instructions, the processor can execute several of them at once, out of sequence.

How Does it Work? • Instructions are introduced in sequence • These instructions are scheduled dynamically by the hardware • More than one instruction can be issued each clock cycle • The number of instructions issued is also set dynamically by the hardware
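
The points above can be illustrated with a toy cycle-by-cycle model. This is only a sketch under stated assumptions, not how any particular processor is built: the instruction names, the latencies, and the issue width of 2 are all made up for illustration. It shows that the number of instructions issued varies from cycle to cycle depending on which operands are ready.

```python
def simulate(instrs, width=2):
    """Toy dynamic scheduler (hypothetical model).

    instrs: list of (name, latency, deps) in program order.
    Returns the number of instructions issued in each cycle.
    """
    done_at = {}            # name -> cycle at which its result becomes available
    pending = list(instrs)  # instructions not yet issued, in program order
    issued_per_cycle = []
    cycle = 0
    while pending:
        issued = []
        for ins in pending:
            if len(issued) == width:
                break  # issue width reached for this cycle
            _, _, deps = ins
            # An instruction may issue only when all of its inputs are available.
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                issued.append(ins)
        for ins in issued:
            name, latency, _ = ins
            done_at[name] = cycle + latency
            pending.remove(ins)
        issued_per_cycle.append(len(issued))
        cycle += 1
    return issued_per_cycle

# Hypothetical program: b needs a, d needs b and c.
program = [("a", 1, []), ("b", 1, ["a"]), ("c", 1, []), ("d", 1, ["b", "c"])]
issue_counts = simulate(program)
```

Here `a` and `c` issue together in the first cycle, while `b` and `d` each wait for their inputs, so the issue count drops to one per cycle afterwards.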

Phases of the Superscalar Pipeline • Fetch • Pre-fetch • Decode • Rename • Issue • Execute • Complete • Reorder • Commit • Retire • Write-Back

Fetch & Decode • Fetching & Decoding can be done faster than Execution • Processor Fetches & Decodes more instructions than it Commits, because it discards instructions from mispredicted branch paths

Pre-Fetch & Pre-Decoding • Pre-Decoding is done when instructions are transferred from memory to the cache • The Pre-Decoded instruction is simpler than the original • The Decoder can decode this format faster than the original

Renaming • Renaming is the process of assigning physical registers to take the place of the logical registers named in the instructions
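
Renaming can be sketched in a few lines. The following is a minimal illustration, assuming three-operand instructions and an unlimited supply of physical registers (real hardware has a finite free list and recycles registers); the names `rename`, `r0`..`r3`, and `p0`.. are hypothetical.

```python
from itertools import count

def rename(instructions, num_logical=4):
    """Map logical registers to physical registers (simplified sketch).

    Each instruction is (dest, src1, src2) using logical register names.
    """
    # Assumption: logical register rN starts out mapped to physical register pN.
    mapping = {f"r{i}": f"p{i}" for i in range(num_logical)}
    fresh = count(num_logical)  # next unused physical register number
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, i.e. the most recent producer.
        s1, s2 = mapping[src1], mapping[src2]
        # Every destination gets a brand-new physical register.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed

program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r2", "r1", "r3"),  # r2 = r1 op r3 (true dependence on r1)
    ("r1", "r0", "r3"),  # r1 = r0 op r3 (reuses the name r1)
]
renamed_program = rename(program)
```

After renaming, the third instruction writes `p6` instead of reusing the register behind the first `r1`, so it no longer conflicts with the earlier instructions and only the true dependence (`p4`) remains.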

Issue • Waiting instructions are analyzed to find instructions beyond the current instruction that can be executed independently • This is “Look-Ahead” capability • Instructions can be issued in-order or out-of-order
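
The difference between in-order and out-of-order issue can be shown with a small selection routine. This is a hedged sketch, not any vendor's issue logic: the window contents, register names, and the `select_for_issue` helper are assumptions, and registers are taken to be already renamed so only true dependences matter.

```python
def select_for_issue(window, ready_regs, width=2, in_order=False):
    """Pick up to `width` instructions whose source registers are all ready.

    window: list of (name, sources) in program order.
    """
    issued = []
    for name, sources in window:
        if len(issued) == width:
            break
        if all(src in ready_regs for src in sources):
            issued.append(name)
        elif in_order:
            break  # in-order issue cannot look past a stalled instruction
        # Out-of-order issue simply skips the stalled instruction (look-ahead).
    return issued

window = [
    ("i0", ["p1", "p2"]),  # operands ready
    ("i1", ["p4"]),        # waits for a result that is not yet available
    ("i2", ["p3"]),        # independent of i1
]
ready = {"p1", "p2", "p3"}
out_of_order = select_for_issue(window, ready)
in_order = select_for_issue(window, ready, in_order=True)
```

With look-ahead, `i2` issues alongside `i0` even though `i1` is stalled; with in-order issue, only `i0` goes.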

Execute • An instruction executes in a single cycle or takes multiple cycles, depending on the operation • After Execution, the Completion phase is reached

Reorder • The Reorder logic determines whether the instruction was on a predicted branch path, and whether that prediction was correct • Execution exceptions are marked

Commit • An executed instruction is committed when: • All previous instructions required by the program have already been committed • No interrupt has occurred • If the instruction was executed on a predicted branch path, the prediction was correct
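
The commit conditions above can be sketched with a queue that is only ever drained from the front. This is a simplified illustration: the `Entry` fields and the `commit_ready` helper are hypothetical, the commit width of 4 is an assumption, and squashing of mispredicted or faulting instructions is left out (the sketch just stops at them).

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    completed: bool = False     # finished executing
    mispredicted: bool = False  # on a wrongly predicted branch path
    exception: bool = False     # execution exception was marked

def commit_ready(rob, width=4):
    """Commit completed entries from the head of the buffer, in program order."""
    committed = []
    while rob and len(committed) < width:
        head = rob[0]
        if not head.completed or head.mispredicted or head.exception:
            break  # everything behind the head must wait
        committed.append(rob.popleft().name)
    return committed

rob = deque([
    Entry("i0", completed=True),
    Entry("i1", completed=False),  # still executing
    Entry("i2", completed=True),   # finished early, but must wait behind i1
])
first = commit_ready(rob)   # only i0 can commit
rob[0].completed = True     # i1 finishes executing
second = commit_ready(rob)  # now i1 and i2 commit, in order
```

Even though `i2` finished before `i1`, it commits only after `i1` does, which keeps the architectural state consistent with program order.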

Retire • An instruction is Retired when either: • The instruction has been committed • The instruction has been removed because of a branch misprediction or an exception

Write-Back • As the name implies, final instruction data is written back

MIPS R10000 Overview • 64-bit instruction set • Can decode 4 instructions per cycle • Has 5 execution pipelines • Uses dynamic scheduling and out-of-order execution • Executes speculatively past predicted branches

MIPS R10000 Pipeline Diagram

R10000 Functional Units • Integer ALU1 • Integer ALU2 • Load/Store Unit • Float Adder • Float Multiply

R10000 Pipeline Stages • Stage 1 • Fetch 4 Instructions per cycle • Stage 2 • 4 Instructions are Decoded & Renamed • Only 1 Branch Instruction can be decoded per cycle • Stage 3 • Decoded Instructions Issued

R10000 Pipeline Stages (cont) • Stages 4-6 (dependent on instruction) • Float Multiply (3 stage pipeline) • Float Adder (3 stage pipeline) • Integer ALU1 (1 stage pipeline) • Integer ALU2 (1 stage pipeline)

Intel IA-32 Overview • 32-bit instruction set • 3-way superscalar • 12 stage pipeline • “Optimized” Scheduling, that necessitates retiring instructions in linear order

IA-32 Functional Units • Integer • Float • Load • Store1 • Store2 • Jump • MMX (Multimedia Instructions)

IA-32 Pipeline Stages • Stages 1-5 • Fetch and Predecode • Stages 6&7 • Decode • Stage 8 • Renaming

IA-32 Pipeline (cont) • Stages 9&10 • Issue • Stage 11 • Execution • Stage 12 • Retirement

IA-32 Latencies (in clock cycles) • Integer Arithmetic – 1 • Integer Mult – 4 • Float Add – 3 • Float Mult – 5 • Load & Store – 3 • MMX Arithmetic – 1 • MMX Mult – 3
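
As a worked example of what these latencies mean, the sketch below compares a chain of dependent operations with the same operations run independently. The latency values come from the slide; the operation keys, the helper names, and the assumption that enough functional units are free (and that issue and retirement overhead is ignored) are illustrative simplifications.

```python
# Latencies in clock cycles, taken from the slide above.
LATENCY = {
    "int_add": 1, "int_mul": 4, "fp_add": 3, "fp_mul": 5,
    "load": 3, "store": 3, "mmx_add": 1, "mmx_mul": 3,
}

def chain_latency(ops):
    """Cycles until the last result is ready when each op needs the previous one."""
    return sum(LATENCY[op] for op in ops)

def independent_latency(ops):
    """Cycles until all results are ready if every op can start in the same cycle."""
    return max(LATENCY[op] for op in ops)

ops = ["load", "fp_mul", "fp_add"]  # e.g. load a value, multiply it, add to it
dependent_cycles = chain_latency(ops)          # 3 + 5 + 3
independent_cycles = independent_latency(ops)  # limited by the slowest op
```

The dependent chain needs 11 cycles, while three independent operations of the same kinds could all finish within 5, which is why finding independent instructions matters so much to a superscalar design.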

PowerPC 750 Overview • 32-bit RISC Processor with a 64-bit data bus • 32-bit addressing

Functional Units • Float (3 Stage Pipeline) • Branch • Load/Store • Single Cycle Integer • Multi Cycle Integer

PowerPC Pipeline • Fetch • Issue • Integer OP (+3 Depth) • Load OP (+7 Depth) • Store OP (+5 Depth) • Float OP (+6 Depth)

Conclusion • While the R10000 and PowerPC are truly RISC based, the IA-32 has its roots in the CISC world. • The IA-32 has a deeper pipeline, allowing for higher clock speeds, which allows for increased sales. This is despite the fact that it delivers only mediocre performance.

Conclusion (cont) • For intensive numerical computation and 3D rendering the MIPS R10000 is superior • For everyday applications that would require low-voltage/heat, the PowerPC line has an edge. • For the home user, the IA-32 will be sufficient until the AMD 64-bit Hammer line is introduced.

For More Information • http://www.mips.com • http://www.intel.com • http://www.ibm.com • http://e-www.motorola.com/
