Hardware Multithreading

Increasing CPU Performance

  • By increasing clock frequency
  • By increasing Instructions per Clock (IPC)
    • Minimising memory access impact – data cache
    • Maximising instruction issue rate – branch prediction
    • Maximising instruction issue rate – superscalar
    • Maximising pipeline utilisation – avoiding instruction dependencies – out-of-order execution
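These two levers are tied together by the classic processor performance equation; this relation is not on the slide, but it is the standard way to frame the optimisations listed above:

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times T_{\text{clock}} = \frac{\text{Instruction count}}{\text{IPC} \times f_{\text{clock}}}$$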

Increasing Parallelism
  • The amount of parallelism we can exploit is limited by the program
    • Some areas exhibit great parallelism
    • Others are essentially sequential
  • In the latter case, where can we find additional independent instructions?
    • In a different program!
Hardware Multithreading
  • Allows multiple threads to share a single processor
  • Requires replicating the independent state of each thread
  • Virtual memory can be used to share memory among threads
CPU Support for Multithreading

[Diagram: a five-stage pipeline (Fetch, Decode, Exec, Mem and Write logic) sharing the instruction cache, data cache and address translation, while per-thread state is replicated: program counters PCA and PCB, register files RegA and RegB, and virtual-address mappings VA MappingA and VA MappingB.]
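As a minimal sketch of what the diagram implies, the Python below (illustrative only; the class and field names are my own, not from the slides) separates the state that must be replicated per hardware thread from the resources the threads share:

```python
from dataclasses import dataclass, field

@dataclass
class ThreadContext:
    """State replicated for every hardware thread (PCA/PCB, RegA/RegB, VA mappings)."""
    pc: int = 0                                            # per-thread program counter
    regs: list = field(default_factory=lambda: [0] * 32)   # per-thread register file
    va_mapping: dict = field(default_factory=dict)         # per-thread VA mapping used by address translation

@dataclass
class Core:
    """Resources shared by all hardware threads on the core."""
    icache: dict = field(default_factory=dict)             # shared instruction cache
    dcache: dict = field(default_factory=dict)             # shared data cache
    threads: list = field(default_factory=list)            # the replicated ThreadContexts

# Two hardware threads (A and B) sharing one core
core = Core(threads=[ThreadContext(), ThreadContext()])
```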

Hardware Multithreading
  • Different ways to exploit this new source of parallelism
    • Coarse-grain multithreading
    • Fine-grain multithreading
    • Simultaneous Multithreading
Coarse-Grain Multithreading
  • Issue instructions from a single thread
  • Operate like a simple pipeline
  • Switch thread on an “expensive” operation:
    • E.g. I-cache miss
    • E.g. D-cache miss
Switch Threads on I-cache Miss
  • Remove Inst c and switch to the other thread
  • The next thread will continue its execution until there is another I-cache or D-cache miss
Switch Threads on D-cache Miss


  • Remove Inst a and switch to the other thread
    • Remove the rest of the instructions from the ‘blue’ thread
    • Roll back the ‘blue’ PC to point to Inst a
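A sketch of this switch-on-miss policy (illustrative Python building on the ThreadContext/Core sketch earlier; the function and argument names are my own, not from the slides):

```python
def on_dcache_miss(core, active, miss_pc, in_flight):
    """Coarse-grain reaction to a D-cache miss on the running thread.

    `active` is the index of the running thread, `miss_pc` the PC of the
    load that missed (Inst a), and `in_flight` the younger instructions
    fetched behind it, i.e. the "shadow" that has to be aborted.
    """
    in_flight.clear()                        # abort everything in the shadow of the miss
    core.threads[active].pc = miss_pc        # roll the PC back so Inst a re-executes later
    return (active + 1) % len(core.threads)  # switch to the other thread
```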
Coarse-Grain Multithreading
  • Good at compensating for infrequent but expensive pipeline disruptions
  • Minimal pipeline changes
    • Need to abort all the instructions in the “shadow” of a D-cache miss → overhead
    • Need to resume the instruction stream to recover
  • Short stalls (data/control hazards) are not solved
Fine-Grain Multithreading
  • Overlap the execution of several threads in time
  • Usually uses round robin among all the threads in a ‘ready’ state
  • Requires instantaneous thread switching
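A minimal sketch of this round-robin selection (illustrative Python; it assumes one ‘ready’ bit per hardware thread, as the following slides describe):

```python
def pick_next_thread(ready, last):
    """Round robin: starting after `last`, return the index of the first
    thread whose ready bit is set, or None if no thread is ready this
    cycle (the pipeline simply inserts a bubble)."""
    n = len(ready)
    for step in range(1, n + 1):
        candidate = (last + step) % n
        if ready[candidate]:
            return candidate
    return None
```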
Fine-Grain Multithreading

Multithreading helps alleviate fine-grain dependencies (e.g. forwarding?)

I-cache Misses in Fine-Grain Multithreading

Inst b is removed and the thread is marked as not ‘ready’

‘Blue’ thread is not ready so ‘orange’ is executed

An I-cache miss is overcome transparently

D-cache Misses in Fine-Grain Multithreading

Thread marked as not ‘ready’. Remove Inst b. Update PC.

‘Blue’ thread is not ready so ‘orange’ is executed

Mark the thread as not ‘ready’ and issue only from the other thread
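Continuing the sketch above, the miss handling amounts to clearing the thread's ready bit before the next round-robin pick (again illustrative, not taken from the slides):

```python
ready = [True, True]   # thread 0 = 'blue', thread 1 = 'orange'
last = 0

# D-cache miss on 'blue': remove Inst b, roll its PC back, and clear its ready bit
ready[0] = False
nxt = pick_next_thread(ready, last)   # -> 1, so 'orange' issues while the miss is serviced

# When the miss data returns, the memory system marks 'blue' ready again
ready[0] = True
```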

D-cache Misses in Fine-Grain Multithreading

In an out-of-order processor we may continue issuing instructions from both threads

Fine-Grain Multithreading

Improves the utilisation of pipeline resources

Impact of short stalls is alleviated by executing instructions from other threads

Single thread execution is slowed

Requires an instantaneous thread switching mechanism

Simultaneous Multi-Threading
  • The main idea is to exploit instruction-level parallelism and thread-level parallelism at the same time
  • In a superscalar processor, issue instructions from different threads in the same cycle
  • Instructions from different threads can be using the same stage of the pipeline at the same time
Simultaneous Multi-Threading

Let’s look simply at instruction issue:
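As a hedged sketch of what issuing from several threads at once could look like (illustrative Python with my own names; a real core would also check functional-unit and register availability), each cycle the issue slots of a superscalar core are filled from the heads of all thread queues rather than from a single thread:

```python
def smt_issue(issue_width, thread_queues):
    """Fill up to `issue_width` slots this cycle from the ready instructions
    of all threads. `thread_queues` maps thread id -> list of dependency-free
    instructions, oldest first."""
    issued = []
    tids = [tid for tid, q in thread_queues.items() if q]
    i = 0
    while len(issued) < issue_width and tids:
        tid = tids[i % len(tids)]
        if thread_queues[tid]:
            issued.append((tid, thread_queues[tid].pop(0)))
            i += 1
        else:
            tids.remove(tid)   # this thread has nothing left to offer this cycle
    return issued

# Example: a 4-wide core mixing instructions from a 'blue' and an 'orange' thread
print(smt_issue(4, {"blue": ["b1", "b2", "b3"], "orange": ["o1"]}))
# -> [('blue', 'b1'), ('orange', 'o1'), ('blue', 'b2'), ('blue', 'b3')]
```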

SMT Issues
  • Asymmetric pipeline stall
    • One part of the pipeline stalls – we want the rest of the pipeline to continue
  • Overtaking – we want the unstalled thread to make progress
  • Existing implementations are on out-of-order, register-renamed architectures (similar to Tomasulo)
SMT: Glimpse Into The Future?
  • Scout threads?
    • A thread to prefetch memory – reduce cache miss overhead
  • Speculative threads?
    • Allow a thread to execute speculatively way past branch/jump/call/miss/etc
    • Needs revised O-o-O logic
    • Needs extra memory support
    • See Transactional Memory
Simultaneous Multi-Threading
  • Extracts the most parallelism from instructions and threads
  • Implemented only in out-of-order processors because they are the only ones able to exploit that much parallelism
  • Has a significant hardware overhead
Benefits of Hardware Multithreading
  • All multithreading techniques improve the utilisation of processor resources and, hence, overall performance
  • If the different threads access the same input data, they may use the same regions of memory
    • Cache efficiency improves in these cases
Disadvantages of Hardware Multithreading
  • The perceived performance may be degraded compared with a single-threaded CPU
    • Multiple threads interfering with each other
  • The cache has to be shared among several threads, so each thread effectively uses a smaller cache
  • Thread scheduling at the hardware level adds significant complexity to the processor design
    • Thread state, managing priorities, OS-level information, …
Multithreading Summary

A cost-effective way of finding additional parallelism for the CPU pipeline

Available in x86, Itanium, Power and SPARC

(In most architectures) each additional CPU thread is presented to the Operating System as an additional CPU

Operating Systems Beware!!! (why?)