
Conditional Memory Ordering


Presentation Transcript


  1. Conditional Memory Ordering. Christoph von Praun, Harold W. Cain, Jong-Deok Choi, Kyung Dong Ryu. Presented by: Renwei Yu. Published in Proceedings of the 33rd International Symposium on Computer Architecture (ISCA), 2006.

  2. Motivation • Modern multiprocessor systems require memory barrier instructions in programs to enforce memory ordering. • Conventionally, memory ordering is guaranteed by placing barriers in every lock and unlock operation, which leads to superfluous memory barriers in programs. • We need a mechanism to eliminate this unnecessary memory ordering.

  3. Redundancies of memory ordering in conventional locking algorithms • Lock operation on lock variable l • Unlock operation on lock variable l
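
To make the redundancy concrete, here is a minimal sketch of a conventional spinlock in Java; the explicit fences mark where the acquire-side and release-side ordering (isync/lwsync on Power) conceptually sit. The class is illustrative only: in real Java the compareAndSet and set calls already impose this ordering, so the fences merely annotate the cost that every lock/unlock pair pays, whether or not another thread will ever observe the protected data.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative conventional spinlock: every lock/unlock pays for ordering.
final class ConventionalSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait();        // spin until the lock is free
        }
        VarHandle.acquireFence();       // acquire ordering (isync-like) after obtaining l
    }

    void unlock() {
        VarHandle.releaseFence();       // release ordering (lwsync-like) before freeing l
        held.set(false);
    }
}
```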

  4. Sources of memory ordering redundancy • Thread confinement of lock variables: memory ordering for lock variables that are accessed solely by a single thread is redundant. • Thread locality of locking: locality of locking is the situation where consecutive acquires of a lock variable are made by the same thread. • Eager releases and repetitive acquires.

  5. CMO-conditional memory ordering • CMO is demonstrated on a lock algorithm that identifies those dynamic lock/unlock operations for which memory ordering is unnecessary, and speculatively omits the associated memory ordering instructions. • When ordering is required, this algorithm relies on a hardware mechanism for initiating a memory ordering operation on another processor.

  6. CMO-conditional memory ordering • Acquire of lock l with conditional memory ordering

  7. CMO-conditional memory ordering • Release of lock l with conditional memory ordering

  8. CMO-conditional memory ordering • The memory synchronization model is different: release synchronization is omitted at the unlock operation and "recovered" at the lock operation, but only if necessary. • Necessity is determined according to a release number that is communicated between the thread that unlocks l and the thread that subsequently locks l.
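
As a reading aid, below is a hypothetical, much-simplified Java model of the acquire and release paths sketched on slides 6-8. The relnum field, the 64-entry release vector, the bit layout, and the remoteReleaseSync() helper are illustrative assumptions standing in for the paper's hardware mechanisms ((sync conditional)/(sync remote)); they are not the paper's interface.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified model of CMO locking. All names, field widths, and
// the software stand-in for the remote sync are illustrative assumptions.
final class CmoLock {
    private final AtomicBoolean held = new AtomicBoolean(false);
    private volatile long relnum;                        // release number left by the last unlocker

    // One release counter per processor; models the (mirrored) release vector.
    private static final AtomicLongArray releaseVector = new AtomicLongArray(64);

    void lock(int myProc) {
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
        long rn   = relnum;                              // read the release number once
        int  proc = (int) (rn >>> 48);                   // processor that last released l
        long ctr  = rn & 0xFFFFFFFFFFFFL;                // its release counter at that time
        // Recover release ordering only if a different processor released l and has
        // performed no release synchronization since then.
        if (proc != myProc && releaseVector.get(proc) <= ctr) {
            remoteReleaseSync(proc);                     // "sync conditional": orders the releaser's
                                                         // stores remotely, implies acquire ordering here
        }
    }

    void unlock(int myProc) {
        // No release fence here: leave a release number with the lock instead.
        relnum = ((long) myProc << 48) | (releaseVector.get(myProc) & 0xFFFFFFFFFFFFL);
        held.set(false);
    }

    private static void remoteReleaseSync(int proc) {
        // Stand-in for the hardware remote sync: once the target processor has
        // ordered its stores, its release counter is incremented.
        releaseVector.incrementAndGet(proc);
    }
}
```

Note how the sources of redundancy from slide 4 fall out of this check: thread-confined and thread-local locking reacquire the lock on the same processor, so the remote sync is never triggered.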

  9. Release numbers • relnum ⇐ (id, release counter of current processor) • A release number is a value that combines a processor id with a counter of the release synchronization operations (the release counter) that the respective processor has performed at a given point during the execution of a program.
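
One plausible way to realize this encoding is to pack the processor id above the release counter in a single word; the 16/48-bit split below is an assumption for the sketch, not taken from the paper.

```java
// Illustrative packing of a release number: processor id in the upper 16 bits,
// release counter in the lower 48 bits. Field widths are assumed.
final class Relnum {
    static long make(int processorId, long releaseCounter) {
        return ((long) processorId << 48) | (releaseCounter & 0xFFFFFFFFFFFFL);
    }
    static int processorIdOf(long relnum)      { return (int) (relnum >>> 48); }
    static long releaseCounterOf(long relnum)  { return relnum & 0xFFFFFFFFFFFFL; }
}
```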

  10. Conditional memory ordering • Based on the release number, the system arranges that release synchronization is recovered at the processor that previously released the lock, but only if necessary. • (sync conditional) implies (sync acquire) at the processor that issues the instruction.

  11. Hardware support for CMO • Logical operation: compares release counters, taking a release vector entry and a register operand as inputs. • Release vector support: to support low-latency reads, a copy of the release vector is mirrored in local storage at each processor. • Broadcast operation. • Release hints: instruct a processor to increment its release counter as soon as the conditions are met.
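
A software sketch of the bookkeeping these bullets imply: each processor's release counter lives in a release vector, every release synchronization (or acted-upon release hint) bumps the owner's entry, and the comparison below is the check that decides whether a remote sync is still needed. The class and method names are assumptions, and AtomicLongArray stands in for the mirrored per-processor hardware storage.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of release-vector bookkeeping; a single shared array only models the
// mirrored/broadcast aspect described on this slide.
final class ReleaseVector {
    private final AtomicLongArray counters;              // one release counter per processor

    ReleaseVector(int processors) {
        counters = new AtomicLongArray(processors);
    }

    // Called when a processor performs release synchronization, e.g. after it
    // acts on a release hint.
    void recordReleaseSync(int processorId) {
        counters.incrementAndGet(processorId);
    }

    // Has processor 'proc' ordered its stores since release-counter value 'ctr'?
    boolean covered(int proc, long ctr) {
        return counters.get(proc) > ctr;
    }
}
```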

  12. Evaluation • S-CMO: a software CMO prototype. • The results show that CMO avoids memory ordering operations for the vast majority of dynamic acquire and release operations across a set of multithreaded Java workloads, leading to significant speedups for many of them. • However, performance improvements in the software prototype are hindered by the high cost of remote memory ordering.

  13. Experimental Methodology • A set of single-threaded and multi-threaded Java benchmarks from the Java Grande and SPEC benchmark suites. • The applications run on IBM's J9 production Java virtual machine. • Experiments were performed on Power4 and Power5 multiprocessor systems running AIX, with 4 and 6 processors respectively.

  14. Software CMO prototype with hardware support • Hardware-based (sync conditional) and (sync remote) implementation

  15. Software CMO prototype with hardware support • CMO performance while varying remote sync latency in the high-cost (Power4) memory ordering implementation.

  16. Software CMO prototype with hardware support • CMO performance while varying remote sync latency in the high-cost (Power5) memory ordering implementation.

  17. Future Proposals • Hardware proposal • Software proposal

  18. Summary • The paper developed an algorithm called conditional memory ordering (CMO) that eliminates redundant memory ordering operations and effectively improves system performance. • It characterizes the synchronization and memory ordering operations in lock-intensive Java workloads and demonstrates that many memory ordering operations are superfluous. • It evaluates the performance improvement of CMO. • It gives a hardware proposal for CMO and evaluates its hardware implementation using a software prototype and an analytical model.

  19. Conclusions • CMO can significantly improve the performance of multiprocessor systems. • With hardware support, CMO offers significant performance benefits across our set of Java benchmarks when assuming a reasonable remote synchronization latency.

  20. Thank you Questions?
