COMP25212 CPU Multi Threading

Learning Outcomes: to be able to:
• Describe the motivation for multithreading support in CPU hardware
• Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading
• Explain when multithreading is inappropriate
• Describe a multithreading implementation
• Estimate the performance of these implementations
• State the important assumptions of this performance model

Revision: Increasing CPU Performance
[Figure: pipelined CPU with instruction cache, data cache, and fetch, decode, exec, mem and write logic stages driven by a clock; points in the figure are labelled a to f]
How can throughput be increased?

Increasing CPU Performance
• By increasing clock frequency
• By increasing instructions per clock
• Minimising memory access impact – data cache
• Maximising instruction issue rate – branch prediction
• Maximising instruction issue rate – superscalar
• Maximising pipeline utilisation – avoid instruction dependencies – out-of-order execution
• (What does lengthening the pipeline do?)

Increasing Program Parallelism
• Keep issuing instructions after a branch?
• Keep processing instructions after a cache miss?
• Process instructions in parallel?
• Write a register while a previous write is pending?
• Where can we find additional independent instructions?
• In a different program!

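Revision – Process States
[Figure: process state diagram]
• States: New; Ready (waiting for a CPU); Running on a CPU; Blocked (waiting for event); Terminated
• Ready → Running: dispatch (scheduler)
• Running → Ready: pre-empted (e.g. timer)
• Running → Blocked: needs to wait (e.g. I/O)
• Blocked → Ready: I/O occurs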
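The state diagram can be read as a small transition table. The following is a minimal sketch: the event names (`dispatch`, `preempt`, `wait`, `io_occurs`) are shorthand for the slide's edge labels, and the `admit` and `exit` edges are assumptions, since the slide names the New and Terminated states but not the events that enter or leave them.

```python
# Transition table for the process state diagram.
# Keys are (current state, event); values are the next state.
TRANSITIONS = {
    ("new", "admit"): "ready",          # assumed edge name, not on the slide
    ("ready", "dispatch"): "running",   # Dispatch (scheduler)
    ("running", "preempt"): "ready",    # Pre-empted (e.g. timer)
    ("running", "wait"): "blocked",     # Needs to wait (e.g. I/O)
    ("blocked", "io_occurs"): "ready",  # I/O occurs
    ("running", "exit"): "terminated",  # assumed edge name, not on the slide
}


def step(state: str, event: str) -> str:
    """Return the next state, or raise if the event is illegal in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}")


# Walk one process through a typical lifetime.
state = "new"
for event in ["admit", "dispatch", "wait", "io_occurs", "dispatch", "exit"]:
    state = step(state, event)
print(state)
```

Note that a blocked process returns to Ready rather than straight to Running: it still has to wait for the scheduler to dispatch it.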

Revision – Process Control Block
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
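As an illustration, the PCB fields listed above could be modelled as a record. This is only a sketch: the field names follow the slide, but every type, default value and the register count of 16 are assumptions made for the example, not a description of any real operating system's PCB.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessControlBlock:
    """Illustrative PCB; fields mirror the slide, types are assumptions."""
    pid: int
    state: str = "new"
    pc: int = 0
    stack_pointer: int = 0
    # 16 general registers is an arbitrary choice for this sketch.
    general_registers: List[int] = field(default_factory=lambda: [0] * 16)
    memory_management_info: Dict[str, int] = field(default_factory=dict)
    # Open file list with positions: file name -> current offset.
    open_files: Dict[str, int] = field(default_factory=dict)
    network_connections: List[str] = field(default_factory=list)
    cpu_time_used: float = 0.0
    parent_pid: int = 0


pcb = ProcessControlBlock(pid=42, parent_pid=1)
pcb.open_files["log.txt"] = 128  # hypothetical file and position
print(pcb.pid, pcb.state, len(pcb.general_registers))
```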

Revision: CPU Switch
[Figure: timeline of the operating system switching the CPU between process P0 and process P1]
• P0 → P1: save state into PCB0, load state from PCB1
• P1 → P0: save state into PCB1, load state from PCB0

What does the CPU load on dispatch?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID

What does the CPU need to store on deschedule?
• Process ID
• Process State
• PC
• Stack Pointer
• General Registers
• Memory Management Info
• Open File List, with positions
• Network Connections
• CPU time used
• Parent Process ID
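One way to explore the two questions above is to simulate a switch. The sketch below assumes that the state held in the CPU itself is the PC, the general registers and the virtual address mapping, which is the state the later hardware-support slide shows per thread; the remaining PCB fields are treated as operating system bookkeeping that stays in memory. The names `HardwareContext`, `save_state` and `load_state`, and the choice of 4 registers, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HardwareContext:
    """CPU-resident state (an assumption for this sketch)."""
    pc: int = 0
    gprs: List[int] = field(default_factory=lambda: [0] * 4)
    va_mapping: Dict[int, int] = field(default_factory=dict)


def save_state(cpu: HardwareContext, pcb: HardwareContext) -> None:
    """Copy the CPU-resident state into the descheduled process's PCB."""
    pcb.pc = cpu.pc
    pcb.gprs = list(cpu.gprs)
    pcb.va_mapping = dict(cpu.va_mapping)


def load_state(cpu: HardwareContext, pcb: HardwareContext) -> None:
    """Copy the dispatched process's saved state from its PCB into the CPU."""
    cpu.pc = pcb.pc
    cpu.gprs = list(pcb.gprs)
    cpu.va_mapping = dict(pcb.va_mapping)


# P0 is running; P1 is ready, with its state held in PCB1.
cpu = HardwareContext(pc=100, gprs=[1, 2, 3, 4], va_mapping={0: 7})
pcb0 = HardwareContext()
pcb1 = HardwareContext(pc=200, gprs=[9, 9, 9, 9], va_mapping={0: 8})

save_state(cpu, pcb0)   # P0 -> P1: save state into PCB0 ...
load_state(cpu, pcb1)   # ... load state from PCB1
cpu.pc += 4             # P1 executes one (4-byte) instruction
save_state(cpu, pcb1)   # P1 -> P0: save state into PCB1 ...
load_state(cpu, pcb0)   # ... load state from PCB0

print(cpu.pc, pcb1.pc)  # 100 204
```

Every field copied here costs time on each switch, which is part of the motivation for keeping a second copy of this state in hardware.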

CPU Support for Multithreading
[Figure: pipeline (fetch, decode, exec, mem and write logic) with shared instruction cache, data cache and address translation, and per-thread state duplicated for threads A and B: GPRs, PC and VA mapping]

How Should the OS View the Extra Hardware Thread?
• A variety of solutions
• Simplest is probably to declare an extra CPU
• Needs a multiprocessor-aware OS

CPU Support for Multithreading
Design issue: when to switch threads
[Figure: pipeline (fetch, decode, exec, mem and write logic) with shared instruction cache, data cache and address translation, and per-thread GPRs, PC and VA mapping for threads A and B]

Coarse-Grain Multithreading
• Switch thread on an “expensive” operation:
• E.g. I-cache miss
• E.g. D-cache miss
• Some are easier than others!

Switch Threads on I-cache Miss

Performance of Coarse Grain
• Assume (conservatively):
  – 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks)
  – 1 I-cache miss per 100 instructions
  – 1 instruction per clock otherwise
• Then, time to execute 100 instructions without multithreading:
  – 100 + 20 clock cycles
  – Instructions per clock = 100 / 120 = 0.83
• With multithreading, time to execute 100 instructions:
  – 100 [+ 1] clock cycles
  – Instructions per clock = 100 / 101 = 0.99
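The arithmetic above can be checked with a few lines of code. This is a sketch of the slide's model only: the function names are hypothetical, the "[+ 1]" is interpreted here as a one-cycle thread-switch cost, and the multithreaded case assumes that another thread is always ready to run and that it keeps the pipeline busy for at least the 20 cycles of the miss.

```python
def ipc_single_thread(instrs_per_miss: int, miss_penalty: int) -> float:
    """IPC when the pipeline stalls for the whole miss penalty."""
    return instrs_per_miss / (instrs_per_miss + miss_penalty)


def ipc_coarse_grain(instrs_per_miss: int, switch_cost: int) -> float:
    """IPC when a miss costs only a thread switch (assumes a ready thread)."""
    return instrs_per_miss / (instrs_per_miss + switch_cost)


# Slide figures: 1 miss per 100 instructions, 20-cycle memory, 1-cycle switch.
print(round(ipc_single_thread(100, 20), 2))  # 0.83
print(round(ipc_coarse_grain(100, 1), 2))    # 0.99
```

Changing the arguments shows how sensitive the benefit is to the miss rate and the memory latency, which is why the assumptions need to be stated alongside the result.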

Switch Threads on D-cache Miss
[Figure: pipeline diagram with some in-flight instructions marked “Abort these”]
• Performance: similar calculation (STATE ASSUMPTIONS!)
• Where to restart after the memory cycle? I suggest instruction “a” – why?
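A "similar calculation" for the D-cache case might look like the sketch below, with the assumptions made explicit as parameters. All numbers here are hypothetical and not taken from the slides: one D-cache miss per 50 instructions, a 20-cycle memory, and 3 cycles of work lost to aborted in-flight instructions on each switch. It also assumes a second thread is always ready and that it runs long enough to hide the whole miss.

```python
def ipc_dcache(instrs_per_miss: int, miss_penalty: int,
               abort_cycles: int, multithreaded: bool) -> float:
    """IPC under a simple one-miss-per-N-instructions model.

    Single-threaded: every miss stalls the pipeline for miss_penalty cycles.
    Multithreaded: every miss costs only the cycles of aborted work.
    """
    lost = abort_cycles if multithreaded else miss_penalty
    return instrs_per_miss / (instrs_per_miss + lost)


# Hypothetical parameters, chosen only to illustrate the calculation.
single = ipc_dcache(50, 20, 3, multithreaded=False)
multi = ipc_dcache(50, 20, 3, multithreaded=True)
print(round(single, 2), round(multi, 2))  # 0.71 0.94
```

The abort cost is the main difference from the I-cache case: a D-cache miss is detected late in the pipeline, so instructions already fetched behind the load are discarded and must be re-executed later.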
