RISC Pipelining CS 147 Spring 2011 Kui Cheung


RISC Pipelining
Classic five-stage instruction pipeline:
• Fetch – fetch the instruction from memory
• Decode – determine what action is required
• Execute – execute the instruction
• Memory – access the data cache
• Writeback – write the result to a register

ARM9: Nintendo DS 5-Stage Pipeline
Using a basketball-team analogy, we can assign a position to each pipeline stage:
1) Fetch: the coach gives a play to the point guard.
2) Decode: the point guard passes the ball to the right person to execute the play.
3) Execute: the small forward (SF) or power forward (PF) continues setting up the play with some fancy moves and then passes the ball to the center.
4) Memory: the center continues the setup and passes the ball to the shooting guard (SG) for a clean shot.
5) Writeback: the SG takes the shot.
[Figure: Coach, Point Guard, Small Forward, Power Forward, Center and Shooting Guard placed on the Nintendo DS 5-stage pipeline]

ARM9: Nintendo DS 5-Stage Pipeline
1) Fetch the instruction into the instruction register (IR)
2) Determine what action to take
3) Execute the instruction
4) Access the cache if needed
5) Write the result to a register
Example: MOV Reg1, Mem1
1) Fetch the instruction (MOV Reg1, Mem1)
2) Decode it as a move from memory to a register
3) Calculate the address of the memory location to be moved
4) Fetch the data from memory
5) Write the data to Reg1
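To make the five steps concrete, here is a minimal Python sketch that walks one memory-to-register MOV through them. It assumes a toy machine in which registers and memory are plain dictionaries and memory addresses are symbolic names such as Mem1; `run_mov` and its trace strings are hypothetical illustrations, not a description of how the ARM9 hardware is built.

```python
def run_mov(instr, regs, mem):
    """Walk a memory-to-register MOV (e.g. "MOV Reg1, Mem1") through the five stages.

    Toy model (assumption): regs and mem are dicts keyed by symbolic names.
    Returns one trace line per stage.
    """
    trace = []

    # 1) Fetch: the instruction text stands in for the word loaded into the IR.
    ir = instr
    trace.append(f"fetch: {ir}")

    # 2) Decode: split opcode and operands, check it is a memory-to-register move.
    opcode, operands = ir.split(maxsplit=1)
    dest, src = (part.strip() for part in operands.split(","))
    if opcode != "MOV" or dest not in regs or src not in mem:
        raise ValueError(f"not a memory-to-register MOV: {ir}")
    trace.append(f"decode: move from memory {src} to register {dest}")

    # 3) Execute: form the memory address (here simply the symbolic name).
    address = src
    trace.append(f"execute: address = {address}")

    # 4) Memory: read the data at that address.
    data = mem[address]
    trace.append(f"memory: read {data}")

    # 5) Writeback: write the data into the destination register.
    regs[dest] = data
    trace.append(f"writeback: {dest} = {data}")
    return trace


regs = {"Reg1": 0}
mem = {"Mem1": 42}
for line in run_mov("MOV Reg1, Mem1", regs, mem):
    print(line)
```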

RISC Pipelining
• FI – fetch instruction
• DI – decode instruction
• EX – execute instruction
• MEM – data cache access
• WB – write back
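In an ideal pipeline with no stalls, a new instruction enters FI every cycle, so n instructions on a k-stage pipeline finish in k + n - 1 cycles instead of k*n. The short Python sketch below prints that timing diagram for the five stages FI, DI, EX, MEM and WB; `timing_diagram` is a hypothetical helper, and the model deliberately ignores all hazards.

```python
STAGES = ["FI", "DI", "EX", "MEM", "WB"]


def timing_diagram(n_instructions, stages=STAGES):
    """Ideal (hazard-free) pipeline: instruction i enters stage s in cycle i + s (0-based)."""
    k = len(stages)
    total_cycles = k + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["   "] * total_cycles          # three blanks = idle cell
        for s, name in enumerate(stages):
            row[i + s] = f"{name:>3}"
        rows.append(row)
    return total_cycles, rows


cycles, rows = timing_diagram(3)
print("cycles:", cycles)                     # 5 + 3 - 1 = 7
for i, row in enumerate(rows, start=1):
    print(f"I{i}: " + " ".join(row))
```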

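Pipeline Delay
(a) No data load delay in the pipeline
1) MOV Reg1, Mem1 – move data from Mem1 to Reg1
2) MOV Reg1, Reg2 – move data from Reg2 to Reg1
3) MOV Mem2, Reg1 – move data from Reg1 to Mem2
No instruction reads a register right after that register is loaded from memory, so the pipeline never has to wait for a load.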

Pipeline Delay
(b) Data dependency delay
1) MOV Reg1, Mem1 – move data from Mem1 to Reg1
2) MOV Reg2, (Reg1) – move the data at the memory address held in Reg1 to Reg2
The second instruction must wait for the data from Mem1 to be written into Reg1, so the pipeline inserts a stall (bubble).
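A pipeline's hazard-detection logic can be sketched as a check on adjacent instructions: if an instruction reads a register that the load just before it is still bringing in, a bubble is inserted. The Python sketch below assumes one bubble per load-use pair (as in a five-stage pipeline that forwards loaded data from the MEM stage); `Instr`, `count_stalls` and `total_cycles` are hypothetical names, and a pipeline without forwarding would need more bubbles.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]        # registers read
    is_load: bool = False        # True if the value of dest comes from memory


def count_stalls(program, load_use_bubbles=1):
    """Bubbles needed when an instruction reads a register loaded by the one just before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.is_load and prev.dest in cur.srcs:
            stalls += load_use_bubbles
    return stalls


def total_cycles(program, stages=5, load_use_bubbles=1):
    """Ideal k + n - 1 cycles plus any load-use bubbles."""
    return stages + len(program) - 1 + count_stalls(program, load_use_bubbles)


# (b): the second instruction uses Reg1 as an address right after it is loaded.
dependent = [
    Instr("MOV Reg1, Mem1", "Reg1", (), is_load=True),
    Instr("MOV Reg2, (Reg1)", "Reg2", ("Reg1",), is_load=True),
]
# (a): no instruction reads a register right after loading it.
independent = [
    Instr("MOV Reg1, Mem1", "Reg1", (), is_load=True),
    Instr("MOV Reg1, Reg2", "Reg1", ("Reg2",)),
    Instr("MOV Mem2, Reg1", None, ("Reg1",)),
]
print(count_stalls(dependent), count_stalls(independent))
```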

Pipeline Delay
Add a NOP (no operation) to fill the gap:
1) MOV Reg1, Mem1 – move data from Mem1 to Reg1
2) NOP – no operation performed
3) MOV Reg2, (Reg1) – move the data at the memory address held in Reg1 to Reg2
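On a RISC machine without hardware interlocks, the compiler (or assembler) has to fill such gaps itself. The Python sketch below is a hypothetical pass, `insert_nops`, that places one NOP after every load whose result is used by the very next instruction; it assumes a single delay slot, matching the one NOP on the slide.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...]        # registers read
    is_load: bool = False        # True if dest is loaded from memory


NOP = Instr("NOP", None, ())


def insert_nops(program):
    """Return a copy of program with a NOP after each load that feeds the next instruction."""
    out = []
    for i, ins in enumerate(program):
        out.append(ins)
        nxt = program[i + 1] if i + 1 < len(program) else None
        if nxt is not None and ins.is_load and ins.dest in nxt.srcs:
            out.append(NOP)
    return out


program = [
    Instr("MOV Reg1, Mem1", "Reg1", (), is_load=True),
    Instr("MOV Reg2, (Reg1)", "Reg2", ("Reg1",), is_load=True),
]
for ins in insert_nops(program):
    print(ins.text)
```

The NOP occupies the same cycle the bubble would have, so the total time does not change; what changes is that the hardware no longer needs to detect the hazard.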

(c) Control dependency delay
101 ADD Reg3, Reg2, Reg1 – add Reg2 to Reg1 and put the result in Reg3
102 NOP – no operation performed (data dependency delay)
103 BEQ Reg3, Reg4, 106 – if Reg3 = Reg4, jump to 106; else continue at 104
104 MOV Mem1, Reg3 – move Reg3 to Mem1
105 ADD Reg4, Reg1, Reg2 – add Reg2 to Reg1 and put the result in Reg4
106 MOV Mem1, Reg4 – move Reg4 to Mem1
The NOP at 102 gives the ADD time to finish, so when 103 runs, Reg3 equals Reg2 + Reg1 and the branch can compare Reg3 to Reg4 to decide whether to jump to 106. Until 103 makes that decision, the pipeline waits, because it does not yet know whether to fetch 104 or 106.
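Without any guessing, fetch simply stops after the branch until the branch is resolved. The Python sketch below computes fetch cycles for the sequence 101-106 under that policy; it assumes one fetch per cycle and that BEQ is resolved at the end of a hypothetical third stage (EX), so the branch costs two idle fetch cycles whichever way it goes. `fetch_schedule` is an illustrative name only.

```python
def fetch_schedule(addresses, branch_addr, target, taken, resolve_stage=3):
    """Return (address, fetch_cycle) pairs for an in-order pipeline without prediction.

    Assumptions: one fetch per cycle starting in cycle 1; the branch outcome is known
    only when the branch leaves stage `resolve_stage` (1 = fetch); the target is a
    forward address, so the branch is fetched only once.
    """
    schedule = []
    cycle, i = 1, 0
    while i < len(addresses):
        addr = addresses[i]
        schedule.append((addr, cycle))
        if addr == branch_addr:
            # Fetch waits until the branch is resolved, then follows the real path.
            cycle += resolve_stage
            i = addresses.index(target) if taken else i + 1
        else:
            cycle += 1
            i += 1
    return schedule


code = [101, 102, 103, 104, 105, 106]
print(fetch_schedule(code, branch_addr=103, target=106, taken=True))
print(fetch_schedule(code, branch_addr=103, target=106, taken=False))
```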

(c) Control dependency delay: guess that the branch will happen
Running the same code (101 to 106), the pipeline guesses that the branch at 103 will be taken and fetches 106 right after 103 instead of waiting. If Reg3 = Reg4, the jump to 106 does happen and no time is wasted.

(c) Control dependency delay: a wrong guess
Now 107 MOV Reg2, Mem2 (move Mem2 to Reg2) follows 106. Guessing that the branch at 103 will be taken, the pipeline fetches 106 and then 107. If Reg3 ≠ Reg4, the guess was wrong: those instructions are cleared from the pipeline and 104 is fetched next. A wrong guess therefore wastes time.
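The cost of guessing can be sketched in a few lines of Python, assuming one fetch per cycle, the branch at 103 fetched in cycle 3, and its outcome known only after a hypothetical third stage (EX). A correct "taken" guess lets 106 be fetched in the very next cycle; a wrong guess squashes the instructions already fetched from the predicted path and refetches 104 once the branch resolves. `predict_and_fetch` is an illustrative name, not part of any real tool.

```python
def predict_and_fetch(addresses, branch_addr, target, predict_taken, actually_taken,
                      resolve_stage=3):
    """Return (first correct-path address, its fetch cycle, squashed addresses).

    Assumptions: straight-line fetch from addresses[0] starting in cycle 1, one fetch
    per cycle, and the branch outcome known after stage `resolve_stage` (1 = fetch).
    """
    b = addresses.index(branch_addr)
    branch_cycle = b + 1
    predicted_next = addresses.index(target) if predict_taken else b + 1
    actual_next = addresses.index(target) if actually_taken else b + 1

    if predicted_next == actual_next:
        # Right guess: the next fetch is already on the correct path.
        return addresses[actual_next], branch_cycle + 1, []

    # Wrong guess: everything fetched before the branch resolved is squashed.
    wasted = resolve_stage - 1
    squashed = addresses[predicted_next:predicted_next + wasted]
    return addresses[actual_next], branch_cycle + resolve_stage, squashed


code = [101, 102, 103, 104, 105, 106, 107]
print(predict_and_fetch(code, 103, 106, predict_taken=True, actually_taken=True))
print(predict_and_fetch(code, 103, 106, predict_taken=True, actually_taken=False))
```

In this model a wrong guess costs the same two cycles as simply waiting for the branch, so guessing is never slower than stalling.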

Pure RISC Pipeline
• Simple, primitive instructions and addressing modes
• Instructions execute in one clock cycle
• Uniform-length instructions with a fixed instruction format
• Instructions interface with memory only through fixed mechanisms (load/store)
• Pipelining
• Orthogonal instruction set (little overlap in instruction functionality)
• Hardwired control
• Complexity pushed to the compiler
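A fixed instruction format is part of what makes hardwired decoding simple: every field sits at the same bit position in every instruction, so decoding is just shifting and masking. The Python sketch below uses a made-up 32-bit layout (6-bit opcode, three 5-bit register fields, remaining bits unused); the layout and the `encode`/`decode` helpers are assumptions for illustration, not any real ISA's encoding.

```python
# Hypothetical 32-bit format: | op (6) | rd (5) | rs1 (5) | rs2 (5) | unused (11) |
FIELDS = (("op", 26, 6), ("rd", 21, 5), ("rs1", 16, 5), ("rs2", 11, 5))  # (name, shift, width)


def encode(**values):
    """Pack named fields into one 32-bit word, checking that each value fits."""
    word = 0
    for name, shift, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode(word):
    """Every field is at a fixed position, so decoding needs no length or mode parsing."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, shift, width in FIELDS}


word = encode(op=1, rd=3, rs1=2, rs2=1)   # e.g. a hypothetical ADD Reg3, Reg2, Reg1
print(f"{word:032b}")
print(decode(word))
```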

Pure RISC Pipeline
Register-to-register instructions (two stages):
1) F: fetch the instruction from memory
2) E: execute; perform an ALU operation with register input and output
Load and store instructions (three stages):
1) F: fetch the instruction from memory
2) E: execute; calculate the memory address
3) W: memory; perform the register-to-memory or memory-to-register transfer
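Because register-to-register instructions need only F and E while loads and stores also need W, a simple cycle count shows what overlapping the stages buys. The Python sketch below compares running each instruction to completion before starting the next with starting a new instruction every cycle; it assumes no data dependencies and no memory-port conflicts, which a real pipeline would have to handle, and the helper names are hypothetical.

```python
# Stages per instruction class, as on the slide: F, E for register-to-register; F, E, W for load/store.
STAGES = {"reg": 2, "load": 3, "store": 3}


def sequential_cycles(kinds):
    """Each instruction finishes all of its stages before the next one is fetched."""
    return sum(STAGES[k] for k in kinds)


def pipelined_cycles(kinds):
    """A new instruction is fetched every cycle: instruction i (0-based) is fetched in
    cycle i + 1 and finishes in cycle i + STAGES[kind].
    Assumes no dependencies or resource conflicts between instructions."""
    return max(i + STAGES[k] for i, k in enumerate(kinds))


program = ["load", "reg", "reg", "store"]     # e.g. load, add, add, store
print(sequential_cycles(program), pipelined_cycles(program))
```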

Pure RISC Pipeline
(a) Traditional pipeline
100 MOVE Reg1, Mem1 – move Mem1 to Reg1
101 ADD 1, Reg1 – add 1 to Reg1
102 JUMP 105 – jump to 105
103 ADD Reg1, Reg2 – add Reg1 to Reg2
105 MOVE Mem2, Reg1 – move Reg1 to Mem2
When the jump executes, 103 (already fetched) is cleared from the pipeline and 105 is fetched.
F – fetch, E – execute, W – write back

Pure RISC Pipeline
(b) RISC pipeline with inserted NOP
100 MOVE Reg1, Mem1 – move Mem1 to Reg1
101 ADD 1, Reg1 – add 1 to Reg1
102 JUMP 105 – jump to 105
103 NOP – no operation
105 MOVE Mem2, Reg1 – move Reg1 to Mem2
The NOP occupies the slot after the jump, so no special circuitry is needed to clear the pipeline.
F – fetch, E – execute, W – write back
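Inserting the NOP is a compiler or assembler job. The Python sketch below is a hypothetical pass that puts a NOP right after every JUMP; it assumes jump targets are symbolic labels (written `label: instruction`, with `done` as an invented label name) so that inserting an instruction does not break any addresses, and the instruction that followed the JUMP simply moves down one slot.

```python
BRANCHES = {"JUMP"}


def opcode(ins):
    """Opcode of an instruction written as 'label: OP operands' or 'OP operands'."""
    body = ins.split(":", 1)[1] if ":" in ins else ins
    return body.split()[0]


def insert_delay_nops(program):
    """Place a NOP in the delay slot after every jump."""
    out = []
    for ins in program:
        out.append(ins)
        if opcode(ins) in BRANCHES:
            out.append("NOP")
    return out


program = [
    "MOVE Reg1, Mem1",
    "ADD 1, Reg1",
    "JUMP done",
    "ADD Reg1, Reg2",
    "done: MOVE Mem2, Reg1",
]
for ins in insert_delay_nops(program):
    print(ins)
```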

Pure RISC Pipeline
(c) Reversed instructions
100 MOVE Reg1, Mem1 – move Mem1 to Reg1
101 JUMP 105 – jump to 105
102 ADD 1, Reg1 – add 1 to Reg1
105 MOVE Mem2, Reg1 – move Reg1 to Mem2
Delayed branch: when a branch occurs, its effect is delayed so that the next instruction is fetched and executed first. Here 102 is fetched before the JUMP to 105 takes effect, so 102 executes while 105 is fetched. Swapping the ADD and the JUMP fills the slot after the jump with useful work instead of a NOP.
F – fetch, E – execute, W – write back
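The reordering can also be automated. The Python sketch below is a hypothetical delay-slot filler: when the instruction just before a JUMP is an ordinary, unlabeled instruction, it is moved into the slot after the JUMP (as 101 and 102 were swapped on the slide); otherwise a NOP is used. An unconditional jump reads no registers, so moving the preceding instruction past it cannot change where it goes; conditional branches would need an extra dependency check that this sketch leaves out.

```python
BRANCHES = {"JUMP"}


def opcode(ins):
    """Opcode of an instruction written as 'label: OP operands' or 'OP operands'."""
    body = ins.split(":", 1)[1] if ":" in ins else ins
    return body.split()[0]


def can_fill_from_previous(out, jump):
    """True if the instruction before `jump` can safely be moved into its delay slot."""
    if not out or ":" in jump or ":" in out[-1]:
        return False          # nothing before it, or a label would change what runs
    if opcode(out[-1]) in BRANCHES:
        return False          # never move a branch
    if len(out) >= 2 and opcode(out[-2]) in BRANCHES:
        return False          # out[-1] already sits in another jump's delay slot
    return True


def fill_delay_slots(program):
    out = []
    for ins in program:
        if opcode(ins) not in BRANCHES:
            out.append(ins)
        elif can_fill_from_previous(out, ins):
            prev = out.pop()
            out.extend([ins, prev])     # previous instruction now runs in the delay slot
        else:
            out.extend([ins, "NOP"])    # nothing safe to move
    return out


program = [
    "MOVE Reg1, Mem1",
    "ADD 1, Reg1",
    "JUMP done",
    "ADD Reg1, Reg2",
    "done: MOVE Mem2, Reg1",
]
for ins in fill_delay_slots(program):
    print(ins)
```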

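Superpipeline
In theory, more and shorter stages allow more instructions to be processed at the same time. But when a branch executes and the pipeline is cleared, every partially processed instruction behind it is lost, so in a deeper pipeline a branch can waste more cycles.
[Figure: branch executed and pipeline cleared]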
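A rough back-of-the-envelope model shows the trade-off. If a fraction of instructions are branches that clear the pipeline, each costing a flush penalty in cycles, then cycles per instruction is CPI = 1 + (flush fraction × penalty), and time per instruction is cycle time × CPI. The Python sketch below uses made-up numbers: halving the cycle time by doubling the stages also doubles the flush penalty, so the speedup comes out at about 1.56× rather than 2×.

```python
def time_per_instruction(cycle_ns, flush_fraction, flush_penalty_cycles):
    """Average time per instruction when a fraction of instructions flush the pipeline.

    flush_fraction: fraction of instructions that clear the pipeline (assumed value).
    flush_penalty_cycles: cycles lost per flush; grows with pipeline depth.
    """
    cpi = 1 + flush_fraction * flush_penalty_cycles
    return cycle_ns * cpi


# Hypothetical numbers for illustration only.
base = time_per_instruction(cycle_ns=1.0, flush_fraction=0.2, flush_penalty_cycles=2)
deeper = time_per_instruction(cycle_ns=0.5, flush_fraction=0.2, flush_penalty_cycles=4)
print(f"base {base:.2f} ns, superpipelined {deeper:.2f} ns, speedup {base / deeper:.2f}x")
```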

ARM11 (iPhone 3G): 8-Stage Pipeline
[Figure: ARM11 pipeline stages grouped as Fetch, Instruction Decode, Execute, Memory, Writeback]

ARM Cortex-A8 (iPhone 3GS, Samsung Galaxy S): 13-Stage Pipeline
• Fetch Instruction (2 stages)
• Decode (5 stages)
• Execute, Memory, Writeback (6 stages)
• Dynamic branch prediction with 95% accuracy
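The slide does not say how the Cortex-A8 predicts branches, so as a stand-in the Python sketch below shows a classic textbook form of dynamic branch prediction: a 2-bit saturating counter per branch address, which needs two wrong guesses in a row to flip a strongly held prediction. It is not the Cortex-A8's actual predictor. The `cpi` helper then shows, with a hypothetical branch frequency (20%) and misprediction penalty (13 cycles), why accuracy matters so much in a deep pipeline.

```python
class TwoBitPredictor:
    """2-bit saturating counter per branch address (textbook scheme, used here as an
    illustration; not the Cortex-A8's actual predictor).
    Counter values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial=1):
        self.initial = initial
        self.counters = {}

    def predict(self, pc):
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) outcomes predicted correctly, training as it goes."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


def cpi(branch_fraction, hit_rate, mispredict_penalty):
    """Average cycles per instruction if every misprediction costs the full penalty."""
    return 1 + branch_fraction * (1 - hit_rate) * mispredict_penalty


# A loop branch at address 103: taken 9 times, then falls through; the loop runs 10 times.
trace = ([(103, True)] * 9 + [(103, False)]) * 10
acc = accuracy(TwoBitPredictor(), trace)
print(f"accuracy {acc:.2f}")   # misses the very first branch and each of the 10 loop exits
# Hypothetical workload: 20% branches, 13-cycle misprediction penalty.
print(f"CPI at 95% accuracy: {cpi(0.2, 0.95, 13):.2f}")
print(f"CPI at 0% accuracy:  {cpi(0.2, 0.0, 13):.2f}")
```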

Intel Core i7 (Nehalem) Superpipeline: 14 Stages
[Figure: stages grouped as Fetch, Decode, Execute, Memory/Writeback]

