Chapter 3: Limitations on Instruction-Level Parallelism

Bernard Chen Ph.D.

University of Central Arkansas

Overcome Data Hazards with Dynamic Scheduling
  • If there is a data dependence, the hazard detection hardware stalls the pipeline
  • No new instructions are fetched or issued until the dependence is cleared
  • Dynamic scheduling: the hardware rearranges instruction execution to reduce stalls while maintaining data flow and exception behavior
RAW
  • If two instructions are data dependent, they cannot execute simultaneously or be completely overlapped
  • If a data dependence causes a hazard in the pipeline, the hazard is called a Read After Write (RAW) hazard

I: add r1,r2,r3

J: sub r4,r1,r3

Overcome Data Hazards with Dynamic Scheduling
  • Key idea: allow instructions behind a stall to proceed

DIV F0 <- F2/F4

ADD F10<- F0+F8

SUB F12<- F8-F14

Overcome Data Hazards with Dynamic Scheduling
  • Key idea: allow instructions behind a stall to proceed

DIV F0 <- F2/F4

SUB F12<- F8-F14

ADD F10<- F0+F8

  • Enables out-of-order execution and allows out-of-order completion (e.g., SUB completes before ADD)
  • In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue)
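
The effect of in-order issue with out-of-order execution can be sketched with a toy timing model. This is only an illustration under stated assumptions: the latencies (10 cycles for DIV, 2 for ADD and SUB) are hypothetical, one instruction issues per cycle, and structural, WAR, and WAW hazards are ignored. The names `LATENCY`, `PROGRAM`, and `schedule` are made up for this sketch.

```python
# Hypothetical latencies in cycles; the slides do not specify any.
LATENCY = {"DIV": 10, "ADD": 2, "SUB": 2}

# (opcode, destination, sources) in program order, taken from the slide.
PROGRAM = [
    ("DIV", "F0", ("F2", "F4")),
    ("ADD", "F10", ("F0", "F8")),
    ("SUB", "F12", ("F8", "F14")),
]

def schedule(program, latency):
    """Return (opcode, start, finish) per instruction for a toy dynamic pipeline."""
    ready_at = {}  # register -> cycle at which its newest value is available
    result = []
    for issue_cycle, (op, dest, srcs) in enumerate(program, start=1):
        # In-order issue: instruction i issues in cycle i.
        # Execution starts once the instruction has issued and every
        # source operand is ready (this models RAW dependences only).
        start = max([issue_cycle] + [ready_at.get(s, 0) for s in srcs])
        finish = start + latency[op]
        ready_at[dest] = finish
        result.append((op, start, finish))
    return result

timeline = schedule(PROGRAM, LATENCY)
for op, start, finish in timeline:
    print(f"{op}: starts cycle {start}, finishes cycle {finish}")
```

Under these assumed latencies, ADD waits for F0 from DIV, while SUB, which issued after ADD, starts and finishes earlier.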
Overcome Data Hazards with Dynamic Scheduling
  • It offers several advantages:
    • Simplifies the compiler
    • Allows code that was compiled for one pipeline to run efficiently on a different pipeline
    • Allows the processor to tolerate unpredictable delays such as cache misses
Overcome Data Hazards with Dynamic Scheduling
  • However, dynamic execution creates WAR and WAW hazards and makes exception handling harder
  • Name dependence: two instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name
  • There are two kinds of name dependence: anti-dependence, which can cause a WAR hazard, and output dependence, which can cause a WAW hazard
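WAR
  • Instr J writes an operand (r1) before Instr I reads it, so Instr I incorrectly gets the new value
  • This name dependence is an anti-dependence; if it causes a hazard in the pipeline, the hazard is called a Write After Read (WAR) hazard

I: sub r4,r1,r3

J: add r1,r2,r3

K: mul r6,r1,r7
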
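WAW
  • Instr J writes an operand (r1) before Instr I writes it, so r1 is left holding the value from Instr I instead of the value from Instr J
  • This name dependence is an output dependence; if it causes a hazard in the pipeline, the hazard is called a Write After Write (WAW) hazard

I: sub r1,r4,r3

J: add r1,r2,r3

K: mul r6,r1,r7
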
Example

DIV r0 <- r2 / r4

ADD r6 <- r0 + r8

SUB r8 <- r10 - r14

MUL r6 <- r10 * r7

OR r3 <- r5 or r9
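
One way to work through this example is to compare every pair of instructions and report which registers they share. The sketch below is a simple pairwise check, not a full dependence analysis: it does not filter out dependences that an intervening write would hide, which happens not to matter for these five instructions. The tuple encoding and the name `find_dependences` are assumptions made for illustration.

```python
# (opcode, destination, sources) in program order, taken from the example slide.
PROGRAM = [
    ("DIV", "r0", ("r2", "r4")),
    ("ADD", "r6", ("r0", "r8")),
    ("SUB", "r8", ("r10", "r14")),
    ("MUL", "r6", ("r10", "r7")),
    ("OR", "r3", ("r5", "r9")),
]

def find_dependences(program):
    """Return (kind, earlier_op, later_op, register) for each pairwise dependence."""
    found = []
    for i, (op_i, dest_i, srcs_i) in enumerate(program):
        for op_j, dest_j, srcs_j in program[i + 1:]:
            if dest_i in srcs_j:
                # The later instruction reads what the earlier one writes.
                found.append(("RAW", op_i, op_j, dest_i))
            if dest_j in srcs_i:
                # The later instruction overwrites what the earlier one reads.
                found.append(("WAR", op_i, op_j, dest_j))
            if dest_i == dest_j:
                # Both instructions write the same register.
                found.append(("WAW", op_i, op_j, dest_i))
    return found

dependences = find_dependences(PROGRAM)
for kind, earlier, later, reg in dependences:
    print(f"{kind}: {earlier} -> {later} on {reg}")
```

For this program the check reports a RAW dependence from DIV to ADD on r0, a WAR dependence from ADD to SUB on r8, and a WAW dependence from ADD to MUL on r6; OR shares no registers with the other instructions.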

For you to practice
  • DIV r0 <- r2 / r4
  • ADD r6 <- r0 + r8
  • ST r1 <- r6
  • SUB r8 <- r10 - r14
  • MUL r6 <- r10 * r8
Overcome Data Hazards with Dynamic Scheduling
  • Instructions involved in a name dependence can execute simultaneously if the name used in the instructions is changed so that they do not conflict
    • Register renaming resolves name dependences for registers
    • Renaming can be done either by the compiler or by hardware
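
Register renaming can be sketched as a single pass that gives every write a fresh register and redirects later reads to it. This is a minimal illustration, assuming an unlimited pool of physical registers named p0, p1, and so on; the program is the one from the earlier example slide, and the name `rename` is hypothetical.

```python
from itertools import count

# (opcode, destination, sources) in program order, from the earlier example slide.
PROGRAM = [
    ("DIV", "r0", ("r2", "r4")),
    ("ADD", "r6", ("r0", "r8")),
    ("SUB", "r8", ("r10", "r14")),
    ("MUL", "r6", ("r10", "r7")),
    ("OR", "r3", ("r5", "r9")),
]

def rename(program):
    """Give each destination a fresh register so only true (RAW) dependences remain."""
    fresh = (f"p{i}" for i in count())  # assumed unlimited pool of physical registers
    mapping = {}  # architectural register -> physical register holding its newest value
    renamed = []
    for op, dest, srcs in program:
        # Read the sources first, using the mapping as it was before this write.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        # Every write goes to a brand-new register, so it cannot clobber
        # a value that an earlier instruction still needs to read or write.
        new_dest = next(fresh)
        mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

renamed = rename(PROGRAM)
for op, dest, srcs in renamed:
    print(op, dest, "<-", ", ".join(srcs))
```

After renaming, ADD still reads the register written by DIV (the RAW dependence is kept), but SUB and MUL write registers that no earlier instruction touches, so the WAR and WAW conflicts disappear.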
Limits to ILP

Assumptions for ideal/perfect machine to start:

1. Register renaming – infinite virtual registers => all register WAW & WAR hazards are avoided

2. Branch prediction – perfect; no mispredictions

3. Perfect Cache

Performance beyond single thread ILP
  • There can be much higher natural parallelism in some applications
  • For example, an “online processing system” has natural parallelism among the multiple queries and updates that are presented by requests
Thread-level parallelism (TLP)
  • Thread: a process with its own instructions and data
    • A thread may be a process that is part of a parallel program of multiple processes, or it may be an independent program
    • Each thread has all the state (instructions, data, PC, register state, and so on) necessary to allow it to execute
Thread-level parallelism (TLP)
  • TLP is explicitly represented by the use of multiple threads of execution that are inherently parallel
  • Goal: Use multiple instruction streams to improve
    • Throughput of computers that run many programs
    • Execution time of multi-threaded programs
  • TLP could be more cost-effective to exploit than ILP
New Approach: Multithreaded Execution
  • Multithreading: multiple threads share the functional units of one processor via overlapping
  • The processor must duplicate the independent state of each thread, e.g., a separate copy of the register file, a separate PC, and, for running independent programs, a separate page table
New Approach: Multithreaded Execution
  • When to switch?
    • Alternate between threads on every instruction (fine grain)
    • When a thread is stalled, perhaps for a cache miss, execute another thread (coarse grain)
Fine-Grained Multithreading
  • Switches between threads on each instruction, causing the execution of multiple threads to be interleaved
  • Usually done in a round-robin fashion, skipping any stalled threads
  • CPU must be able to switch threads every clock
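
The round-robin policy described above can be sketched as a per-cycle selection loop. This is a simplified model, not a description of any particular processor: one instruction is chosen per clock, thread ids are 0-based, and the `stalled` table saying which threads are stalled in which cycle is a hypothetical input made up for the example.

```python
def fine_grained_schedule(num_threads, cycles, stalled):
    """Pick one thread per clock cycle in round-robin order, skipping stalled threads.

    stalled maps a cycle number to the set of thread ids stalled in that cycle.
    """
    order = []
    next_thread = 0
    for cycle in range(cycles):
        blocked = stalled.get(cycle, set())
        chosen = None
        # Try each thread once, starting from the one whose turn it is.
        for offset in range(num_threads):
            candidate = (next_thread + offset) % num_threads
            if candidate not in blocked:
                chosen = candidate
                break
        order.append(chosen)  # None means every thread was stalled this cycle
        if chosen is not None:
            next_thread = (chosen + 1) % num_threads
    return order

# Hypothetical scenario: four threads, thread 2 is stalled during cycles 2 and 3.
order = fine_grained_schedule(4, 8, {2: {2}, 3: {2}})
print(order)
```

In this scenario thread 2 loses its turn in cycle 2 and thread 3 issues instead, so the stall is hidden without leaving an empty issue slot.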
Multithreaded Categories

[Figure: Fine-Grained multithreading; legend: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5]

Fine-Grained Multithreading
  • Advantage: it can hide both short and long stalls, since instructions from other threads are executed when one thread stalls
  • Disadvantage: it slows down the execution of individual threads, since a thread that is ready to execute without stalls will be delayed by instructions from other threads
Coarse-Grained Multithreading
  • Switches threads only on costly stalls, such as cache misses
  • Advantages
    • Relieves the need for very fast thread switching
    • Doesn’t slow down an individual thread, since instructions from other threads are issued only when that thread encounters a costly stall
Coarse-Grained Multithreading
  • Disadvantage: it is hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs
    • Since the CPU issues instructions from one thread, when a stall occurs the pipeline must be emptied or frozen
    • The new thread must fill the pipeline before instructions can complete
  • Because of this start-up overhead, coarse-grained multithreading is better suited to reducing the penalty of high-cost stalls, where pipeline refill time << stall time
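
The trade-off between refill time and stall time can be made concrete with a small cost comparison. The numbers here are purely hypothetical (a 5-cycle pipeline refill, a 3-cycle short stall, a 100-cycle cache miss), and `cycles_lost` and `worth_switching` are names invented for this sketch.

```python
# Hypothetical cost in clock cycles; the slides give no concrete numbers.
PIPELINE_REFILL = 5

def cycles_lost(stall_cycles, switch):
    """Cycles in which no instruction completes for one stall event."""
    # Switching hides the stall but pays the pipeline start-up cost instead.
    return PIPELINE_REFILL if switch else stall_cycles

def worth_switching(stall_cycles):
    """Switch only when refilling the pipeline is cheaper than waiting out the stall."""
    return cycles_lost(stall_cycles, switch=True) < cycles_lost(stall_cycles, switch=False)

for name, stall in [("short stall", 3), ("cache miss", 100)]:
    decision = "switch" if worth_switching(stall) else "wait"
    print(f"{name} ({stall} cycles): {decision}")
```

With these assumed costs, switching on the 3-cycle stall would lose more cycles than waiting, while switching on the 100-cycle miss saves 95 cycles.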
Multithreaded Categories

[Figure: Coarse-Grained multithreading (2 clock cycle); legend: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5]
