
CS 7810 Lecture 21








  1. CS 7810 Lecture 21
  Threaded Multiple Path Execution
  S. Wallace, B. Calder, D. Tullsen
  Proceedings of ISCA-25, June 1998

  2. Leveraging SMT
  • Recall branch fan-out from “Limits of ILP”
  • Future processors will likely have no shortage of idle thread contexts
  • Spawned threads are parallel, but have dependences with earlier instructions: registers, uncommitted stores, data cache values
  • SMT may be an ideal candidate, as threads share the same set of resources

  3. SMT vs. CMP
  • A multi-threaded workload (on an SMT) is more tolerant of branch mispredicts – TME makes most sense if there is a shortage of threads
  • Power overheads are enormous – on an SMT, we may not have the option to execute speculative threads on low-power pipelines
  • What about energy?
  • Is CMP a better candidate?

  4. Renaming Overview
  [Figure: renaming example – successive writes to r1 allocate new physical registers (p1, p5, …), with a checkpoint taken at each branch in the instruction stream]
  • Every branch causes a checkpoint of mappings, so we can recover quickly on a mispredict
  • Each thread in the SMT can have 8 checkpoints
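The checkpoint-per-branch scheme above can be sketched in a few lines. This is an illustrative model only, not the paper's hardware: the names `RenameMap` and `MAX_CHECKPOINTS` are assumptions, and a real rename table would be SRAM with shadow copies rather than Python lists.

```python
MAX_CHECKPOINTS = 8  # each SMT thread context holds up to 8 checkpoints


class RenameMap:
    """Toy model of a register rename map with per-branch checkpoints."""

    def __init__(self, num_arch_regs=32):
        # arch register index -> physical register index (identity to start)
        self.table = list(range(num_arch_regs))
        self.checkpoints = []  # saved copies of the map, one per in-flight branch

    def checkpoint(self):
        """Save the full map at a branch so a mispredict can be repaired quickly."""
        if len(self.checkpoints) >= MAX_CHECKPOINTS:
            raise RuntimeError("out of checkpoints; stall until a branch resolves")
        self.checkpoints.append(self.table.copy())

    def recover(self, checkpoint_id):
        """On a mispredict, reinstate the map saved at that branch."""
        self.table = self.checkpoints[checkpoint_id]
        del self.checkpoints[checkpoint_id:]  # younger checkpoints are squashed too
```

For example, renaming r1, checkpointing at a branch, renaming r1 again on the wrong path, and then recovering restores the pre-branch mapping in one step.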

  5. Threaded Multi-Path Execution
  Key elements in TME:
  • Identifying low-confidence branches
  • Efficient thread spawning
  • Efficient recovery on branch resolution
  • Fetch priorities for each thread on SMT

  6. Path Selection
  • Only the primary path can spawn threads (prevents an exponential increase in threads)
  • For each bpred entry, keep track of successive correct predictions (reset on mispredict) – if the counter is less than a threshold, the branch is low-confidence – note that a small counter size is more selective in picking low-confidence branches
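The counter scheme on this slide can be sketched as follows. This is a hedged sketch, not the paper's exact estimator: the class name, the dictionary keyed by branch PC, and the default parameter values are all assumptions for illustration.

```python
class ConfidenceEstimator:
    """Per-branch saturating counter of successive correct predictions."""

    def __init__(self, counter_bits=2, threshold=3):
        # A smaller counter saturates at a lower value, so fewer branches can
        # climb above the threshold transiently: the filter is more selective.
        self.max_count = (1 << counter_bits) - 1
        self.threshold = threshold
        self.counts = {}  # branch PC -> count of successive correct predictions

    def update(self, pc, prediction_correct):
        if prediction_correct:
            self.counts[pc] = min(self.counts.get(pc, 0) + 1, self.max_count)
        else:
            self.counts[pc] = 0  # reset on mispredict

    def low_confidence(self, pc):
        """Below the threshold means the branch is a candidate for TME."""
        return self.counts.get(pc, 0) < self.threshold
```

A branch starts out low-confidence, becomes high-confidence only after a run of correct predictions, and drops back to low-confidence on a single mispredict.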

  7. Register Mappings
  • In SMT, each thread can read any physical register
  • Thread spawning requires a copy of the register mappings at that branch
  • A copy involves a transfer of 32 x 9 bits – the new thread cannot begin renaming until this copy is complete – the copy may also hold up the primary thread if map table read ports are scarce
  • Alternatively, every new mapping can be placed on a bus, and idle threads can snoop and keep pace
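The bus-snooping alternative can be sketched like this: instead of a bulk copy at spawn time, the primary thread broadcasts each new mapping as it renames, and idle contexts apply the updates so their maps are always current. The `MapBus` name and list-based maps are illustrative assumptions, not the paper's mechanism.

```python
class MapBus:
    """Toy model of a mapping bus: idle contexts snoop the primary's renames."""

    def __init__(self):
        self.snoopers = []  # rename maps (lists) of idle contexts keeping pace

    def attach(self, rename_map):
        """An idle context starts snooping (its map must already match)."""
        self.snoopers.append(rename_map)

    def broadcast(self, arch_reg, phys_reg):
        """The primary thread publishes every new mapping it creates."""
        for snooped_map in self.snoopers:
            snooped_map[arch_reg] = phys_reg
```

Because the idle context's map never goes stale, a spawn at a low-confidence branch needs no 32-entry copy and no extra map-table read ports.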

  8. Spawning Algorithm

  9. Spawning Algorithm
  • When threads are idle, they keep pace and spawn a thread as soon as a low-confidence branch is encountered
  • When a thread context becomes free and a low-confidence checkpoint already exists, the new context synchronizes mappings with the primary context and executes the primary path, while the old primary context executes the alternate path after reinstating the checkpoint
  • If a newly idle thread has a low-confidence checkpoint, it starts executing the alternate path
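The two spawning cases above can be sketched as plain functions. This is a simplified model under stated assumptions: contexts are dictionaries, and the function names, field names, and FIFO ordering of pending checkpoints are all hypothetical.

```python
def spawn_on_low_confidence(idle_contexts, checkpoint):
    """Case 1: an idle context that has kept pace takes the alternate path
    immediately at a low-confidence branch."""
    if not idle_contexts:
        return None  # no free context; the checkpoint stays pending
    ctx = idle_contexts.pop()
    ctx["map"] = dict(checkpoint["map"])  # start from the branch's mappings
    ctx["path"] = "alternate"
    return ctx


def on_context_freed(freed, primary, pending_checkpoints):
    """Case 2: a newly freed context synchronizes with the primary and takes
    over the primary path, while the old primary context reinstates the
    pending low-confidence checkpoint and runs the alternate path."""
    if not pending_checkpoints:
        return primary  # nothing pending; roles are unchanged
    chk = pending_checkpoints.pop(0)
    freed["map"] = dict(primary["map"])   # synchronize mappings
    freed["path"] = "primary"
    primary["map"] = dict(chk["map"])     # reinstate the checkpoint
    primary["path"] = "alternate"
    return freed                          # the new primary context
```

Swapping roles in case 2 avoids copying state out of the old primary: its map is already positioned at the checkpointed branch, so it is the cheaper context to redirect down the alternate path.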

  10. Introduced Complexity
  • Book-keeping to manage checkpoint locations – every branch has to track the location of its checkpoint
  • Who frees a register value?
  • What about memory dependences?
  • Loads can ignore stores that are not predecessors
  • Maintain an array of bits to represent the path taken (each basic block corresponds to a bit in the array)
  • Check for memory dependences only if the store’s path is a subset of the load’s path
  [Figure: successive writes to r1 mapped to p5, p7, p8 along different paths]
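The path-subset test reduces to one bitwise operation once each path is a bit vector with one bit per basic block. The function name is an assumption; the subset logic follows directly from the slide.

```python
def must_check_dependence(store_path_bits, load_path_bits):
    """True when every basic block on the store's path is also on the load's
    path, i.e. the store is a predecessor of the load and its value may be
    forwarded; otherwise the store is on a different speculative path and
    the load can ignore it."""
    return store_path_bits & load_path_bits == store_path_bits
```

For instance, a store on path `0b0011` is a predecessor of a load on path `0b0111`, but a store on path `0b0100` is not a predecessor of a load on path `0b0011`.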

  11. Processor Parameters
  • Eight-wide processor with up to eight contexts; each context has eight checkpoints
  • 32-entry issue queues, 4Kb gshare branch predictor, 7-cycle mispredict penalty, memory latency of 62 cycles
  • ICOUNT 2.8: the first thread can bring in up to 8 instrs and the second thread fills in unused slots; occupancy in the front-end determines priority
  • Focus on branch-limited programs: compress (20%), gcc (18%), go (30%), li (6%)

  12. Results: Spare Contexts

  13. Results: Bus Latency

  14. Results: Branch Confidence

  15. Results: Path Selection

  16. Results: Fetch Policy

  17. Results: Mpred Penalty

  18. Conclusions
  • Too much complexity/power overhead, too little benefit?
  • Benefits may be higher for deeper pipelines; larger windows (this paper evaluates 8 windows of 48 instrs; does 2 x 192 yield better results?); longer memory latencies
  • There is room for improvement with better branch confidence metrics
  • CMPs will incur greater cost during thread spawning, but may be more power-efficient

