This outline covers the advances and challenges of SC++, a speculative implementation of sequential consistency (SC). It reviews earlier SC optimizations such as load forwarding and store buffering, then the speculation support that SC++ adds, and how SC++ can in principle match the performance of release consistency (RC) while keeping SC's simpler programming model. It also covers the challenges SC++ faces, such as rollback cost and rollbacks triggered by data races, false sharing, and cache conflicts, and closes with a qualitative look at how network latency, history queue size, and cache size limit its performance.
Is SC + ILP = RC? Quinn Gaumer Duke University
Outline • Motivation • Previous SC optimizations • SC++ Implementation • SC++ Analysis
SC vs. RC • SC • Easy programming model (no different from a uniprocessor) • Slower programs • RC • Faster programs (20%) • Requires software assistance
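To make the SC programming model concrete, here is a minimal sketch (not from the slides) that enumerates every SC interleaving of the classic two-thread "store buffering" litmus test. The program encoding and the names `P0`, `P1`, and `sc_outcomes` are assumptions made for illustration. Under SC both loads can never return 0; a relaxed model that lets a load bypass an earlier store to a different address can produce that outcome, which is why such models need software assistance.

```python
from itertools import combinations

# Hypothetical encoding: each op is (kind, variable, destination register).
P0 = [("st", "x", None), ("ld", "y", "r0")]
P1 = [("st", "y", None), ("ld", "x", "r1")]

def sc_outcomes(p0, p1):
    """Return the set of (r0, r1) results over all SC interleavings."""
    n = len(p0) + len(p1)
    outcomes = set()
    # Choose which global positions belong to p0; program order is preserved
    # because each thread's ops are consumed in their original sequence.
    for slots in combinations(range(n), len(p0)):
        mem = {"x": 0, "y": 0}
        regs = {}
        i0 = i1 = 0
        for pos in range(n):
            if pos in slots:
                kind, var, reg = p0[i0]
                i0 += 1
            else:
                kind, var, reg = p1[i1]
                i1 += 1
            if kind == "st":
                mem[var] = 1          # every store in this test writes 1
            else:
                regs[reg] = mem[var]  # loads see the latest store in the order
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes

outcomes = sc_outcomes(P0, P1)
```

The outcome `(0, 0)` is absent from `outcomes`: in any single global order, at least one store precedes the other thread's load.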
Previous SC optimizations • Load forwarding • Loads can return values even if other memory ops are pending • When is this good? • As long as the value is not exposed to other processors • When is it wrong? • When an invalidation is received before the speculative load is retired • Problem: ROB can still fill up due to a store at the head…
Previous SC optimizations • Store Buffering • Waiting Stores moved to LSQ • Problem: reorder buffer still must stop retiring loads if stores are pending.
SC++ • Store-Store bypassing • Speculative State for Memory • Speculation Support • Rollbacks infrequent
Store-Store Bypassing • Speculative History Queue (SHiQ) • Holds stores and completed instructions • Also holds the information needed to roll back operations • [Figure: SHiQ entries (Store, OP, Store) with head pointer]
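The queue described above can be sketched as a small software model. This is an illustrative toy, not the hardware design: the class name `SHiQ`, its methods, and the entry fields are assumptions. Instructions retire into the queue even while older stores are pending, each store records the value it overwrote, and entries leave from the head only once they are no longer speculative.

```python
from collections import deque

class SHiQ:
    """Toy speculative history queue (hypothetical model for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # oldest entry sits at the left (the head)

    def retire(self, kind, addr=None, old_value=None, done=True):
        """Speculatively retire an instruction; False means the queue is full."""
        if len(self.entries) >= self.capacity:
            return False  # the processor would have to stall here
        self.entries.append(
            {"kind": kind, "addr": addr, "old": old_value, "done": done}
        )
        return True

    def store_completed(self, addr):
        """Mark the oldest pending store to addr as performed."""
        for entry in self.entries:
            if entry["kind"] == "store" and entry["addr"] == addr and not entry["done"]:
                entry["done"] = True
                break

    def commit(self):
        """Drop entries from the head until a pending store blocks progress."""
        committed = 0
        while self.entries and self.entries[0]["done"]:
            self.entries.popleft()
            committed += 1
        return committed

q = SHiQ(capacity=4)
q.retire("store", addr=0x40, old_value=0, done=False)
q.retire("op")
q.retire("store", addr=0x80, old_value=7, done=False)
blocked = q.commit()          # head store still pending, nothing commits
q.store_completed(0x40)
committed = q.commit()        # first store and the op leave the queue
```

The fixed `capacity` mirrors the point made later in the deck that SHiQ size can limit SC++.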
Memory Order Violations • When is SC violated? • A block touched by a speculative load or store is invalidated, replaced, or read by another processor. • How is a violation detected? • Block Lookup Table (BLT) contains addresses of speculative memory ops • Invalidations, replacements, and downgrades cause a search of the BLT for the address
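The detection step can be sketched as a lookup keyed by block address. The 64-byte block size, the class name `BlockLookupTable`, and the event strings are assumptions for illustration; the slides only say that the BLT holds the addresses of speculative memory ops and is searched on invalidations, replacements, and downgrades.

```python
from collections import Counter

BLOCK_BYTES = 64  # assumed cache block size, not stated in the slides

class BlockLookupTable:
    """Toy BLT: counts speculative accesses per cache block."""

    EVENTS = ("invalidation", "replacement", "downgrade")

    def __init__(self):
        self.blocks = Counter()

    @staticmethod
    def block_of(addr):
        return addr // BLOCK_BYTES

    def add(self, addr):
        """Record a speculative load or store to addr."""
        self.blocks[self.block_of(addr)] += 1

    def remove(self, addr):
        """Forget one access once it is no longer speculative."""
        block = self.block_of(addr)
        self.blocks[block] -= 1
        if self.blocks[block] <= 0:
            del self.blocks[block]

    def conflicts(self, event, addr):
        """True if a coherence event hits a block with speculative accesses."""
        if event not in self.EVENTS:
            raise ValueError(f"unknown coherence event: {event}")
        return self.blocks.get(self.block_of(addr), 0) > 0

blt = BlockLookupTable()
blt.add(0x1008)
same_block_hit = blt.conflicts("invalidation", 0x1030)  # same 64-byte block
other_block_hit = blt.conflicts("downgrade", 0x2000)
blt.remove(0x1008)
after_remove = blt.conflicts("invalidation", 0x1000)
```

Matching at block granularity means an access to a different word in the same block still triggers a rollback, which is how false sharing enters the analysis.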
Rollback • Processor and memory state must be rolled back to the first memory operation that accessed the offending block • Guarantee forward progress? • Speculation is prohibited until all pending stores are performed. • Rollback can be slow • Requires flushing the pipeline and moving data between local caches. • Optimizations • Roll back multiple instructions per cycle • Send responses to invalidations immediately
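The memory half of a rollback can be sketched with an undo log. This is a simplified model under stated assumptions: the `history` list of `(kind, addr, old_value)` tuples, the 64-byte block size, and the function name `rollback` are hypothetical, and restoring processor state (registers, pipeline flush) is left out.

```python
BLOCK_BYTES = 64  # assumed cache block size

def rollback(memory, history, bad_block):
    """Undo speculative stores back to the first op that touched bad_block.

    history is ordered oldest first. Returns the index execution should
    restart from, or None if no speculative op touched the block.
    """
    first = None
    for i, (kind, addr, old) in enumerate(history):
        if addr is not None and addr // BLOCK_BYTES == bad_block:
            first = i
            break
    if first is None:
        return None
    # Undo youngest first so each address ends with its pre-speculation value.
    while len(history) > first:
        kind, addr, old = history.pop()
        if kind == "store":
            memory[addr] = old
    return first

# Memory after four speculative ops; the last two stores hit address 0x40.
memory = {0x80: 5, 0x40: 2}
history = [
    ("store", 0x80, 0),
    ("store", 0x40, 0),
    ("load", 0x100, None),
    ("store", 0x40, 1),
]
restart = rollback(memory, history, bad_block=0x40 // BLOCK_BYTES)
```

Everything younger than the first offending access is undone as well, even ops on unrelated blocks, because later instructions may depend on the squashed values.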
Qualitative Analysis • Must hold state to allow rollback of both processor and memory • Detect rollbacks quickly • Rollbacks are extremely slow… Does it matter? • Data races • False sharing • Cache conflicts
Results • SC++ theoretically performs as well as RC • SC++ can be physically limited in several ways • Network latency • SHiQ size • Cache size • Why does each affect the speed of SC++ (relative to SC or RC)?