
Extended Memory Semantics for Thread Synchronization

Sheng Li, Ying Zhou. Operating System Progress Report, Nov 1st, 2007.


Presentation Transcript


  1. Extended Memory Semantics for Thread Synchronization. Sheng Li, Ying Zhou. Operating System Progress Report, Nov 1st, 2007.

  2. Problems
     • Hardware multithreading is no longer a privilege of supercomputing; it is already part of the major microprocessors.
       • E.g., Sun Niagara 2 has 64 threads per chip and 256 threads per server.
     • Concurrency management is one of the biggest challenges in multithreaded systems.
     • Key requirement: low-overhead, scalable thread synchronization.
     • Synchronization mechanisms
       • Atomic primitives (Test-and-Set, Compare-and-Swap, LL-SC)
         • Software routines built on them have poor performance and scalability.
       • Empty/Full bits, which use an extension bit for each memory location to denote its empty/full state.
         • Better performance [1], but still not enough.

  3. Our Goal
     • Solve the synchronization bottleneck by using Extended Memory Semantics (EMS)
       • Better performance and scalability
     • Quantify the performance gain of EMS compared to other synchronization mechanisms (e.g., Empty/Full bits)

  4. Extended Memory Semantics
     • [Figure: a memory word with 64 bits of data/metadata plus an extension bit]
     • Memory instructions are characterized by their synchronization behavior.
     • Load.ff, Load.fe, Store.xf, Store.ef, Store.xe (f = full, e = empty, x = don't care)

  5. EMS handler
     • There is no free lunch: the EMS handler has overhead.
       • Creating the handler threads
       • Queuing up memory requests and building the data structure that holds them

  6. What we have done so far
     • Built the EMS model, covering both the architecture and OS aspects, in the Structural Simulation Toolkit (SST)
       • SST is a simulation environment for massively lightweight multithreading, developed at Notre Dame and Sandia Lab.
     • Modified glibc to use EMS
       • Especially the pthread library
     • Designed benchmarks for different categories
     • Ran the simulations to evaluate EMS performance

  7. Tightly Coupled Parallel
     • Each thread competes with the others for the only lock before updating the counter.
     • Very high contention: the worst case.

  8. Loosely Coupled Parallel
     • Each thread competes with the others for locks before updating the counters.
     • Mild contention

  9. Embarrassingly Parallel
     • No contention, no locks

  10. Embarrassingly Parallel and Loosely Coupled Parallel
     • Low synchronization overhead, guaranteed by EMS
     • EMS shows very good scalability.
     • [Figure: synchronization distribution]

  11. Tightly Coupled Parallel
     • EMS performs badly in the worst case.
     • Most of the threads are used for synchronization, not for real work.

  12. The Road Ahead
     • Build other synchronization mechanisms (e.g., Empty/Full bits) into SST
     • Modify glibc to support these other synchronization mechanisms
     • Compare performance between EMS and the other synchronization mechanisms

  13. Thank you! Questions?

  14. Bibliography
     [1] Larry Carter, John Feo, Allan Snavely. Performance and Programming Experience on the Tera MTA. PPSC, 1999.

  15. Backup Slides

  16. Lightweight Threads
     • A thread context (frame) is 32 double words (256 bytes).
       • Two double words are reserved for the thread status; the other 30 are general-purpose registers.
     • There is no other per-thread state, which makes multithreading easy.
     • Frames are stored in memory (no register file).
       • Registers are aliases for memory locations.
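
The memory operations named on slide 4 and the single-lock counter benchmark on slide 7 can be illustrated with a small software model. This is only a sketch, not the authors' SST or glibc implementation. It assumes that each two-letter suffix reads as (state required before the access, state left after it), which the slides do not spell out. The `EMSWord` class, its method names, and `run_tightly_coupled` are hypothetical names introduced for illustration, and a Python condition variable stands in for whatever queuing the EMS handler performs.

```python
import threading


class EMSWord:
    """Software model of one memory word plus a full/empty extension bit.

    Assumption (not stated on the slides): a suffix such as "fe" means
    "wait until the word is full, then leave it empty"; "x" means the
    prior state is ignored.
    """

    def __init__(self, value=0, full=False):
        self._value = value
        self._full = full
        self._cond = threading.Condition()

    def _wait_for(self, required):
        # Caller must hold self._cond. Blocks until the extension bit
        # matches `required` ("f" or "e"); "x" never blocks.
        if required == "x":
            return
        want_full = required == "f"
        while self._full != want_full:
            self._cond.wait()

    def _load(self, required, leave_full):
        with self._cond:
            self._wait_for(required)
            self._full = leave_full
            self._cond.notify_all()
            return self._value

    def _store(self, required, leave_full, value):
        with self._cond:
            self._wait_for(required)
            self._value = value
            self._full = leave_full
            self._cond.notify_all()

    def load_ff(self):          # wait for full, leave full
        return self._load("f", True)

    def load_fe(self):          # wait for full, leave empty
        return self._load("f", False)

    def store_xf(self, value):  # ignore state, leave full
        self._store("x", True, value)

    def store_ef(self, value):  # wait for empty, leave full
        self._store("e", True, value)

    def store_xe(self, value):  # ignore state, leave empty
        self._store("x", False, value)


def run_tightly_coupled(num_threads=8, increments=100):
    """All threads contend for one shared counter, as on slide 7."""
    counter = EMSWord(0, full=True)

    def worker():
        for _ in range(increments):
            value = counter.load_fe()    # take the word; other loaders block
            counter.store_ef(value + 1)  # write back and mark it full again

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load_ff()


if __name__ == "__main__":
    print(run_tightly_coupled())
```

In this model the `load_fe` / `store_ef` pair acts as the lock: a thread that empties the word holds it exclusively until it stores the new value, so the final count equals `num_threads * increments`. Every increment serializes on the same word, which is the contention pattern the tightly coupled benchmark is meant to stress.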
