COMP25212 CPU Multi Threading

rahim-moses

Presentation Transcript


  1. COMP25212 CPU Multi Threading
     Learning Outcomes: to be able to:
     • Describe the motivation for multithread support in CPU hardware
     • Distinguish the benefits and implementations of coarse grain, fine grain and simultaneous multithreading
     • Explain when multithreading is inappropriate
     • Describe multithreading implementations
     • Estimate the performance of these implementations
     • State the important assumptions of this performance model

  2. Revision: Increasing CPU Performance
     [Figure: CPU pipeline (Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic) with Inst Cache, Data Cache and Clock; parts labelled a to f]
     How can throughput be increased?

  3. Increasing CPU Performance
     • By increasing clock frequency
     • By increasing Instructions per Clock
     • Minimising memory access impact – data cache
     • Maximising Inst issue rate – branch prediction
     • Maximising Inst issue rate – superscalar
     • Maximising pipeline utilisation – avoid instruction dependencies – out of order execution
     • (What does lengthening the pipeline do?)

  4. Increasing Program Parallelism
     • Keep issuing instructions after branch?
     • Keep processing instructions after cache miss?
     • Process instructions in parallel?
     • Write register while previous write pending?
     • Where can we find additional independent instructions?
     • In a different program!

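  5. Revision – Process States
     [Figure: process state diagram]
     • States: New; Ready (waiting for a CPU); Running (on a CPU); Blocked (waiting for event); Terminated
     • Ready → Running: Dispatch (scheduler)
     • Running → Ready: Pre-empted (e.g. timer)
     • Running → Blocked: Needs to wait (e.g. I/O)
     • Blocked → Ready: I/O occurs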

  6. Revision – Process Control Block
     • Process ID
     • Process State
     • PC
     • Stack Pointer
     • General Registers
     • Memory Management Info
     • Open File List, with positions
     • Network Connections
     • CPU time used
     • Parent Process ID

  7. Revision: CPU Switch
     [Figure: timeline of Process P0, Operating System and Process P1]
     • Switching from P0 to P1: save state into PCB0, load state from PCB1
     • Switching back from P1 to P0: save state into PCB1, load state from PCB0
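
The save and load steps of a CPU switch can be sketched in a few lines of Python. This is an illustrative model only, not how any real OS implements it: the `PCB` and `CPU` classes, the choice of eight general registers, and the decision to copy only the PC, stack pointer and general registers are all assumptions made for the example. The other PCB fields (open files, network connections and so on) are left out.

```python
from dataclasses import dataclass, field


def _zero_regs():
    # Hypothetical register file size; real CPUs differ.
    return [0] * 8


@dataclass
class PCB:
    """Cut-down Process Control Block holding only register state (assumed subset)."""
    pid: int
    state: str = "ready"
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=_zero_regs)


@dataclass
class CPU:
    """Register state of a single-threaded CPU."""
    pc: int = 0
    sp: int = 0
    gprs: list = field(default_factory=_zero_regs)


def context_switch(cpu: CPU, old: PCB, new: PCB) -> None:
    # Save state into the outgoing process's PCB.
    old.pc, old.sp, old.gprs = cpu.pc, cpu.sp, list(cpu.gprs)
    old.state = "ready"
    # Load state from the incoming process's PCB.
    cpu.pc, cpu.sp, cpu.gprs = new.pc, new.sp, list(new.gprs)
    new.state = "running"


if __name__ == "__main__":
    cpu = CPU(pc=100, sp=2000)
    p0 = PCB(pid=0, state="running")
    p1 = PCB(pid=1, pc=500, sp=3000)
    context_switch(cpu, p0, p1)   # P0 -> P1
    context_switch(cpu, p1, p0)   # P1 -> P0
    print(cpu.pc, cpu.sp)         # P0 resumes where it left off
```

Every switch copies the whole register set through memory, which is the cost that extra hardware thread state is meant to avoid.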

  8. What does CPU load on dispatch?
     • Process ID
     • Process State
     • PC
     • Stack Pointer
     • General Registers
     • Memory Management Info
     • Open File List, with positions
     • Network Connections
     • CPU time used
     • Parent Process ID

  9. What does CPU need to store on deschedule?
     • Process ID
     • Process State
     • PC
     • Stack Pointer
     • General Registers
     • Memory Management Info
     • Open File List, with positions
     • Network Connections
     • CPU time used
     • Parent Process ID

  10. CPU Support for Multithreading
      [Figure: pipeline (Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic) with Inst Cache, Data Cache and Address Translation; two sets of thread state: PC A, GPRs A, VA Mapping A and PC B, GPRs B, VA Mapping B]

  11. How Should the OS View an Extra Hardware Thread?
      • A variety of solutions
      • Simplest is probably to declare it as an extra CPU
      • Needs a multiprocessor-aware OS
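
One way to see the "extra CPU" view from user code is to ask the OS how many logical CPUs it reports. The sketch below uses Python's standard `os.cpu_count()`. On a machine with hardware multithreading enabled, each hardware thread is typically counted as a logical CPU, so the number may be larger than the number of physical cores. That depends on the platform and its configuration.

```python
import os

# Number of logical CPUs the OS reports; None if it cannot be determined.
logical_cpus = os.cpu_count()
print(f"Logical CPUs reported by the OS: {logical_cpus}")
```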

  12. CPU Support for Multithreading
      Design Issue: when to switch threads
      [Figure: pipeline (Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic) with Inst Cache, Data Cache and Address Translation; two sets of thread state: PC A, GPRs A, VA Mapping A and PC B, GPRs B, VA Mapping B]

  13. Coarse-Grain Multithreading
      • Switch thread on an “expensive” operation:
      • E.g. I-cache miss
      • E.g. D-cache miss
      • Some are easier than others!

  14. Switch Threads on Icache miss
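
A toy cycle-level model can make the switch-on-miss policy concrete. It is a sketch under strong assumptions and does not describe any real pipeline. Each hardware thread misses in the I-cache exactly once every `miss_interval` instructions and otherwise issues one instruction per clock. A miss takes `mem_latency` cycles to service. Switching to another ready thread costs `switch_cost` cycles. The threads do not disturb each other's cache contents. All names and default values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HwThread:
    ready_at: int = 0     # first cycle at which this thread can issue again
    since_miss: int = 0   # instructions issued since its last I-cache miss
    retired: int = 0      # instructions completed so far


def simulate(num_threads: int, total_cycles: int, miss_interval: int = 100,
             mem_latency: int = 20, switch_cost: int = 1) -> float:
    """Return instructions per clock for a switch-on-I-cache-miss pipeline."""
    threads = [HwThread() for _ in range(num_threads)]
    current = 0
    cycle = 0
    while cycle < total_cycles:
        t = threads[current]
        if t.ready_at > cycle:
            # Current thread is waiting for memory: look for a ready thread.
            ready = [i for i, other in enumerate(threads)
                     if other.ready_at <= cycle]
            if ready:
                current = ready[0]
                cycle += switch_cost   # pay for the thread switch
            else:
                cycle += 1             # nothing to run: the pipeline stalls
            continue
        # Issue one instruction from the current thread.
        t.retired += 1
        t.since_miss += 1
        cycle += 1
        if t.since_miss == miss_interval:
            # The next fetch misses in the I-cache.
            t.since_miss = 0
            t.ready_at = cycle + mem_latency
    return sum(th.retired for th in threads) / cycle


if __name__ == "__main__":
    print(f"1 thread : {simulate(1, 1200):.2f} inst/clock")
    print(f"2 threads: {simulate(2, 1010):.2f} inst/clock")
```

With one thread the pipeline idles for the whole memory latency. With two, the second thread runs while the first waits, so only the switch cost is lost.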

  15. Performance of Coarse Grain
      • Assume (conservatively):
          • 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks)
          • 1 I-cache miss per 100 instructions
          • 1 instruction per clock otherwise
      • Then, time to execute 100 instructions without multithreading:
          • 100 + 20 clock cycles
          • Inst per Clock = 100 / 120 = 0.83
      • With multithreading, time to execute 100 instructions:
          • 100 [+ 1] clock cycles
          • Inst per Clock = 100 / 101 = 0.99
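
The arithmetic on this slide can be written as two small functions, which makes it easy to vary the assumptions. The function and parameter names are made up for this sketch. It reads the bracketed "+ 1" as a one-cycle thread-switch cost, which is an interpretation of the slide. It also assumes another thread is always ready to run when a miss occurs.

```python
def ipc_without_multithreading(instructions: int, misses: int,
                               miss_penalty: int) -> float:
    """IPC when the pipeline stalls for the full memory latency on each miss."""
    cycles = instructions + misses * miss_penalty
    return instructions / cycles


def ipc_coarse_grain(instructions: int, misses: int,
                     switch_cost: int) -> float:
    """IPC when each miss costs only a thread switch (assumes a ready thread)."""
    cycles = instructions + misses * switch_cost
    return instructions / cycles


if __name__ == "__main__":
    # Slide numbers: 100 instructions, 1 I-cache miss, 20-clock memory access.
    print(round(ipc_without_multithreading(100, 1, 20), 2))  # 0.83
    print(round(ipc_coarse_grain(100, 1, 1), 2))             # 0.99
```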

  16. Switch Threads on Dcache miss
      [Figure: pipeline diagram with the annotation “Abort these”]
      • Performance: similar calculation (STATE ASSUMPTIONS!)
      • Where to restart after memory cycle?
      • I suggest instruction “a” – why?
