Advanced Topics in Pipelining

Advanced Topics in Pipelining
• Two methods to exploit instruction-level parallelism:
• Superpipelining: longer (deeper) pipelines.
  • The ideal speedup is equal to the number of pipeline stages.
  • 8 or more pipeline stages are common in modern processors.
• Superscalar: multiple issue (CPI can be less than one).
  • The instruction execution rate exceeds the clock rate.
  • 6 GHz four-way multiple issue → CPI = 0.25, IPC = 4
  • 24 billion instructions/second
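The arithmetic behind the 6 GHz four-way figure can be sketched in a few lines of Python. This is a minimal illustration that assumes every issue slot is filled on every cycle; the function names are made up for the example.

```python
def ideal_cpi(issue_width: int) -> float:
    """Best-case cycles per instruction when every issue slot is used."""
    return 1.0 / issue_width


def peak_instructions_per_second(clock_hz: float, issue_width: int) -> float:
    """Peak execution rate: clock rate times instructions issued per cycle (IPC)."""
    return clock_hz * issue_width


print(ideal_cpi(4))                          # 0.25
print(peak_instructions_per_second(6e9, 4))  # 24000000000.0
```

Real programs fall short of this peak because dependencies and stalls leave some issue slots empty.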

Static Multiple Issue
• Two-issue 5-stage MIPS processor
  • Each issue packet pairs one R-type (ALU) or branch instruction with one load or store.
• VLIW concept
  • The compiler removes dependencies between the two instructions in each pair.
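The pairing rule can be written as a small check. This is only a sketch: the opcode sets below are an assumed, incomplete classification chosen for illustration, and the check looks at instruction classes only, not at data dependencies between the two instructions.

```python
# Assumed (incomplete) opcode classes for the two issue slots.
ALU_OR_BRANCH = {"add", "addu", "addi", "sub", "and", "or", "slt", "beq", "bne"}
LOAD_OR_STORE = {"lw", "sw"}


def can_issue_together(first: str, second: str) -> bool:
    """True if one opcode fits the ALU/branch slot and the other the load/store slot."""
    return (first in ALU_OR_BRANCH and second in LOAD_OR_STORE) or (
        first in LOAD_OR_STORE and second in ALU_OR_BRANCH
    )


print(can_issue_together("addu", "lw"))  # True
print(can_issue_together("lw", "sw"))    # False: both need the load/store slot
```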

Static Two-Issue MIPS

Example: Static Two-Issue MIPS 1/2
• Extra read and write ports are needed on the register file.
• Data dependencies result in more serious stalls.
  • In a superscalar pipeline, the next two instructions cannot use the result of a lw instruction without stalling.
• Example:
    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          bne  $s1, $zero, Loop
• Reorder the instructions to avoid as many pipeline stalls as possible.

Example: Static Two-Issue MIPS 2/2
    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          bne  $s1, $zero, Loop
• Once scheduled, the 5 instructions issue in 4 clock cycles: CPI = 4/5 = 0.8 → IPC = 1.25

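Loop Unrolling 1/2
• Reordered loop (one array element per iteration):
    Loop: lw   $t0, 0($s1)
          addi $s1, $s1, -4
          addu $t0, $t0, $s2
          sw   $t0, 4($s1)
          bne  $s1, $zero, Loop
• Loop unrolled four times (four array elements per iteration):
    Loop: lw   $t0, 0($s1)
          addi $s1, $s1, -16
          lw   $t1, 12($s1)
          addu $t0, $t0, $s2
          lw   $t2, 8($s1)
          addu $t1, $t1, $s2
          lw   $t3, 4($s1)
          addu $t2, $t2, $s2
          sw   $t0, 16($s1)
          addu $t3, $t3, $s2
          sw   $t1, 12($s1)
          sw   $t2, 8($s1)
          sw   $t3, 4($s1)
          bne  $s1, $zero, Loop
• Register renaming: each unrolled copy uses its own temporary register ($t0 to $t3), so the copies do not depend on one another.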

Loop Unrolling 2/2
• The unrolled loop issues its 14 instructions in 8 clock cycles: CPI = 8/14 ≈ 0.57 → IPC = 1.75
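The two sets of figures can be compared directly. The sketch below takes the instruction and cycle counts from the slides (5 instructions in 4 cycles for the scheduled loop, 14 instructions in 8 cycles for the unrolled loop) and assumes the unrolled loop handles four array elements per iteration, as its four lw/sw pairs suggest.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Clock cycles per instruction."""
    return cycles / instructions


def ipc(cycles: int, instructions: int) -> float:
    """Instructions per clock cycle, the reciprocal of CPI."""
    return instructions / cycles


# Scheduled loop: 5 instructions, 4 cycles, 1 element per iteration.
print(round(cpi(4, 5), 2), round(ipc(4, 5), 2))    # 0.8 1.25
# Unrolled loop: 14 instructions, 8 cycles, 4 elements per iteration.
print(round(cpi(8, 14), 2), round(ipc(8, 14), 2))  # 0.57 1.75
# Cycles spent per array element: 4.0 before unrolling, 2.0 after.
print(4 / 1, 8 / 4)                                # 4.0 2.0
```

The cycles-per-element line shows the real gain: unrolling halves the time per array element, partly by filling more issue slots and partly by amortizing the addi and bne overhead over four elements.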

Speculation
• Guess an outcome, for example the direction of a branch, and execute instructions based on that guess.
• Can be done by the compiler or by the hardware.
  • The compiler speculates by reordering instructions.
• A recovery mechanism fixes things up when the speculation turns out to be wrong.
• Results obtained from speculative execution are kept in temporary buffers until they are no longer speculative.
  • They are committed when the speculation is correct.
  • They are discarded otherwise.
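The buffer-then-commit idea can be sketched as follows. This is a toy model, not a description of any particular processor; the class and method names are invented for the example.

```python
class SpeculativeBuffer:
    """Holds speculative register writes until the guess is resolved."""

    def __init__(self, registers: dict):
        self.registers = registers  # architectural (committed) state
        self.pending = {}           # speculative results, not yet visible

    def write(self, reg: str, value: int) -> None:
        # Speculative results go to the temporary buffer only.
        self.pending[reg] = value

    def resolve(self, prediction_correct: bool) -> None:
        # Commit on a correct guess, discard on a wrong one.
        if prediction_correct:
            self.registers.update(self.pending)
        self.pending.clear()


regs = {"$t0": 1}
buf = SpeculativeBuffer(regs)

buf.write("$t0", 99)
buf.resolve(prediction_correct=False)
print(regs)  # {'$t0': 1}

buf.write("$t0", 42)
buf.resolve(prediction_correct=True)
print(regs)  # {'$t0': 42}
```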

IA-64 Architecture
• RISC-style instruction set, much like a 64-bit MIPS
• Differences:
  • IA-64 has more registers (128 integer, 128 floating-point, 8 special registers for branches).
  • IA-64 places instructions into groups or bundles (VLIW).
  • IA-64 includes special capabilities for speculation and branch elimination.
• Predication: branch elimination
  • Loop unrolling does not help with if-then-else statements.

Predication in IA-64
• 64 1-bit predicate registers
• Example:
          CMP Ra, Rb
          JNE else
          MOV Ra, 0
          JMP end
    else: MOV Ra, Rb
    end:  whatever
• Code with predicates:
          CMPEQ Ra, Rb, P1/P2
    [P1]  MOV Ra, 0
    [P2]  MOV Ra, Rb
• If the predicate is not true, the instruction becomes a nop.
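The effect of the predicated version can be modelled in Python. The sketch below is an illustration of the semantics only: the tuple encoding of instructions and the function names are assumptions made for the example, not IA-64 syntax.

```python
def execute(program, regs, preds):
    """Run MOV instructions in order; a false predicate turns one into a nop."""
    for pred, dest, src in program:
        if pred is not None and not preds[pred]:
            continue  # predicate false: no effect
        regs[dest] = regs[src] if isinstance(src, str) else src
    return regs


def select(ra: int, rb: int) -> int:
    regs = {"Ra": ra, "Rb": rb}
    equal = regs["Ra"] == regs["Rb"]
    preds = {"P1": equal, "P2": not equal}  # CMPEQ Ra, Rb, P1/P2
    program = [
        ("P1", "Ra", 0),     # [P1] MOV Ra, 0
        ("P2", "Ra", "Rb"),  # [P2] MOV Ra, Rb
    ]
    return execute(program, regs, preds)["Ra"]


print(select(5, 5))  # 0
print(select(5, 7))  # 7
```

Both MOV instructions are always fetched and issued; only one of them takes effect, so no branch is needed.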

Predicates in ARM
• Almost all instructions can be conditionally executed.
  • Every instruction reserves a bit-field for the predicate, specifying whether that instruction should have an effect.
• Thirteen different predicates are available, each depending in some way on the four flags Carry, Overflow, Zero, and Negative.
• ARM's 16-bit Thumb instruction set has no predication, in order to save encoding space.
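How a predicate depends on the four flags can be sketched for a handful of ARM condition codes. Only a subset is shown, and the table layout and function name are choices made for this example.

```python
# A few ARM condition codes as functions of the flags
# Negative (n), Zero (z), Carry (c), and Overflow (v).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,       # equal
    "NE": lambda n, z, c, v: not z,   # not equal
    "CS": lambda n, z, c, v: c,       # carry set
    "MI": lambda n, z, c, v: n,       # negative
    "GE": lambda n, z, c, v: n == v,  # signed greater than or equal
    "LT": lambda n, z, c, v: n != v,  # signed less than
}


def executes(cond: str, n: bool, z: bool, c: bool, v: bool) -> bool:
    """True if an instruction carrying this condition would take effect."""
    return CONDITIONS[cond](n, z, c, v)


# Flags after comparing two equal values: Z set, C set, N and V clear.
print(executes("EQ", n=False, z=True, c=True, v=False))  # True
print(executes("LT", n=False, z=True, c=True, v=False))  # False
```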

IA-64 Characteristics • Itanium: 3.2 GFLOPS • Itanium: 6.67 GFLOPS

Dynamic Pipeline Scheduling
• Dynamic pipeline scheduling is a hardware mechanism for avoiding pipeline stalls.
• Example:
    lw   $t0, 20($s2)
    addu $t1, $t0, $t2
    sub  $s4, $s4, $t3
    slti $t5, $s4, 20
• Even though addu has to wait for lw to complete, the following two instructions can be started.
• Out-of-order execution => more complicated pipeline control.
• Dynamic pipeline scheduling goes past stalls to find later instructions to execute while waiting for the stall to be resolved.
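Which instructions are held up by the pending lw can be worked out from register dependencies alone. The sketch below is a simplified model: the tuple encoding and the function name are assumptions for the example, and it tracks only true (read-after-write) dependencies on the stalled result.

```python
# (opcode, destination register, source registers) for the four instructions.
program = [
    ("lw",   "$t0", ["$s2"]),
    ("addu", "$t1", ["$t0", "$t2"]),
    ("sub",  "$s4", ["$s4", "$t3"]),
    ("slti", "$t5", ["$s4"]),
]


def split_by_stall(program, stalled_index):
    """Split later instructions into those waiting on the stalled result and the rest."""
    waiting = {program[stalled_index][1]}  # registers whose value is not ready
    blocked, free = [], []
    for op, dest, sources in program[stalled_index + 1:]:
        if waiting & set(sources):
            blocked.append(op)
            waiting.add(dest)      # its result is late too
        else:
            free.append(op)
            waiting.discard(dest)  # dest now holds a value that is ready
    return blocked, free


blocked, free = split_by_stall(program, stalled_index=0)
print(blocked, free)  # ['addu'] ['sub', 'slti']
```

Note that slti still has to wait for sub, since it reads $s4, but neither of them depends on the lw.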

Dynamic Pipeline Scheduling (figure: the instruction fetch and decode unit issues in order to four reservation stations; these feed the functional units (integer, integer, floating point, load/store), which execute out of order; the commit unit, with its reorder buffers, commits in order)

Dynamic Pipeline Scheduling
• 5-10 functional units, each with reservation stations (RS) that hold the operands and the operation.
• When a reservation station holds all the operands and the unit is ready to execute, the result is calculated.
  • If necessary, the result is sent to other reservation stations that are waiting for it.
• The commit unit decides when it is safe to put the result into the register file or into memory (committing).
• Completion methods:
  • In-order completion and out-of-order completion.
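In-order commit after out-of-order execution can be illustrated with a queue kept in program order. This is a minimal sketch: the class, its methods, and the dictionary entries are invented for the example and leave out operand forwarding and reservation stations.

```python
from collections import deque


class ReorderBuffer:
    """Keeps issued instructions in program order and commits from the head only."""

    def __init__(self):
        self.entries = deque()

    def issue(self, name: str) -> dict:
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def commit(self) -> list:
        # Retire finished instructions, stopping at the first unfinished one.
        committed = []
        while self.entries and self.entries[0]["done"]:
            committed.append(self.entries.popleft()["name"])
        return committed


rob = ReorderBuffer()
lw, addu, sub = rob.issue("lw"), rob.issue("addu"), rob.issue("sub")

sub["done"] = True   # sub finishes first, out of order
print(rob.commit())  # []
lw["done"] = True
print(rob.commit())  # ['lw']
addu["done"] = True
print(rob.commit())  # ['addu', 'sub']
```

Even though sub finishes first, its result does not reach the register file until the older lw and addu have committed.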

Pentium 4
• After being fetched, IA-32 instructions are translated into microoperations.
• Microoperations are dynamically scheduled.
• Speculative pipelining
• Issue rate: three microoperations per cycle
• Deep pipelining: 20 stages
• 7 functional units
• Support for 126 outstanding operations
• Trace cache

Pentium 3 vs. Pentium 4

Pentium 4 Datapath (figure; blocks shown: instruction prefetch and decode, branch prediction, trace cache, microoperation queue, dispatch and register renaming, register file, memory operation queue, integer and floating-point operation queue, functional units (complex instruction, integer, integer, floating point, load, store), commit unit, data cache)

Datapath Comparison 1/2 (figure; axes: clock rate and IPC, each running from slower to faster; designs plotted: single-cycle, multi-cycle, pipelined, deeply pipelined, multiple-issue pipelined, multiple-issue deeply pipelined)

Datapath Comparison 2/2 (figure; axes: hardware, from shared to specialized, and latency in instructions, from 1 to several; designs plotted: single-cycle, multi-cycle, pipelined, deeply pipelined, multiple-issue pipelined, multiple-issue deeply pipelined)
