RCU Implementation

Based on: "The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux", Guniguntala et al.

RCU Overview
• Publish-subscribe
  • Used for insertion
  • Provides reader-writer synchronization
• Wait for pre-existing readers to complete
  • Used for deletion: change, wait for readers, then free
  • Provides safe memory reclamation
• Maintain multiple versions of updated objects
  • Serves readers that still hold an older version

RCU – List Update – Basic Strategy
• Starting list
• New node: copy B to B’ and modify B’
• Move A.next to B’
• B is still visible to pre-existing readers, but not to new readers
• Once those readers complete, remove B
(Figures from "RCU Semantics: A First Attempt", McKenney & Walpole)
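The four steps above can be sketched in user-space C11. This is a minimal illustration under stated assumptions, not the kernel implementation:

- The node layout is hypothetical.
- Writers are assumed to be already serialized.
- The `wait_for_readers` callback is an assumed stand-in for "readers complete" (in the kernel that role is played by synchronize_rcu()).

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space node for the A -> B -> ... list in the figure. */
struct node {
    int key;
    int value;
    struct node *_Atomic next;
};

/* Replace B (the successor of A) with an updated copy B'.
 * Assumes one writer at a time, and that wait_for_readers() blocks until
 * every pre-existing reader has finished (both are sketch assumptions).
 * Returns B', or NULL if there is no B or allocation fails. */
struct node *replace_next(struct node *a, int new_value,
                          void (*wait_for_readers)(void))
{
    struct node *b = atomic_load_explicit(&a->next, memory_order_relaxed);
    struct node *b2 = malloc(sizeof *b2);
    if (b == NULL || b2 == NULL) {
        free(b2);
        return NULL;
    }

    /* 1. Copy B to B' and modify the copy; no reader can see B' yet. */
    b2->key = b->key;
    b2->value = new_value;
    atomic_init(&b2->next, atomic_load_explicit(&b->next, memory_order_relaxed));

    /* 2. Move A.next to B'. The release store makes the fields of B'
     *    visible to any reader that later loads the new pointer. */
    atomic_store_explicit(&a->next, b2, memory_order_release);

    /* 3. B is still visible to pre-existing readers, but not to new ones. */
    wait_for_readers();

    /* 4. Every reader that could hold B has completed: remove it. */
    free(b);
    return b2;
}
```

The list is never modified in place. Readers therefore see either the old B or the complete B’, never a half-updated node.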

RCU – Reader, Writer, GC
• Reader-writer
  • rcu_assign_pointer()
  • rcu_dereference()
  • Memory barriers are embedded in the API
• Writer-collection
  • synchronize_rcu() blocks the caller until it is safe to collect
  • call_rcu() is the asynchronous call for collection
• Reader-collection (?)
Example (initialize, then publish):
    p->a = 1;
    p->b = 2;
    p->c = 3;
    rcu_assign_pointer(gp, p);
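The "memory barriers embedded in the API" point can be illustrated with C11 atomics. This is a hedged sketch, and the names and ordering choices are assumptions:

- `publish_foo()` and `subscribe_foo()` are hypothetical user-space stand-ins for rcu_assign_pointer() and rcu_dereference().
- A release store stands in for the publish.
- An acquire load stands in for the subscribe. It is stronger than the dependency ordering the kernel relies on, but portable.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a, b, c; };

/* Shared global pointer; NULL until something is published. */
static struct foo *_Atomic gp;

/* Analog of rcu_assign_pointer(): the release store orders all prior
 * initialization of *p before the pointer becomes visible. */
static void publish_foo(struct foo *p)
{
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Analog of rcu_dereference(): pairs with the release store above. */
static struct foo *subscribe_foo(void)
{
    return atomic_load_explicit(&gp, memory_order_acquire);
}

/* Writer side, mirroring the slide: initialize fields, then publish. */
static struct foo *writer_insert(void)
{
    struct foo *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->a = 1;
    p->b = 2;
    p->c = 3;
    publish_foo(p);
    return p;
}

/* Reader side: sees either NULL or a fully initialized object. */
static int reader_sum(void)
{
    struct foo *p = subscribe_foo();
    if (p == NULL)
        return -1;
    return p->a + p->b + p->c;
}
```

Without the release/acquire pair, a reader on a weakly ordered CPU could observe the new pointer before the stores to `a`, `b` and `c`.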

Memory Reclamation
• General issues in non-blocking and swap-free designs
  • When is it safe to free memory?
  • Memory reclamation tracking can be relatively costly
  • Expensive atomic operations / memory barriers are required
• Non-blocking queue
  • Atomic operation expense: CAS (15-25 clock cycles on P4)
  • Retry on contention
• Non-blocking synchronization
  • Atomic operation expense: store_conditional
  • Data structure copy expense
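The "retry on contention" cost can be seen in a lock-free (Treiber-style) stack push, sketched here in C11.

- The node and stack types are hypothetical.
- Each failed compare-and-swap means another thread changed `top` in the meantime, so the loop repeats the CAS.
- A weak CAS may also fail spuriously.
- Pop is deliberately omitted. It reads `top->next` while another thread may already have freed `top`, which is exactly the reclamation question this slide raises.

```c
#include <stdatomic.h>
#include <stddef.h>

struct snode {
    int value;
    struct snode *next;
};

struct stack {
    struct snode *_Atomic top;
};

/* Push n; returns how many times the CAS had to be retried. */
static unsigned push(struct stack *s, struct snode *n)
{
    unsigned retries = 0;
    struct snode *old = atomic_load_explicit(&s->top, memory_order_relaxed);
    for (;;) {
        n->next = old;   /* link to the top we expect to replace */
        if (atomic_compare_exchange_weak_explicit(&s->top, &old, n,
                                                  memory_order_release,
                                                  memory_order_relaxed))
            return retries;
        retries++;       /* CAS failed: 'old' now holds the current top */
    }
}
```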

RCU - Memory Reclamation
• With interactions between reader, writer and collector, when is it time to reclaim memory?
• The writer identifies what to collect and triggers collection (synchronously or asynchronously)
• Readers (indirectly) indicate when to collect by no longer referencing the removed object

RCU – Memory Reclamation
• One solution for the collector:
  • Track copies of the global pointer in thread-local memory
  • Each thread maintains a list of its currently active pointers
  • The collector checks the thread-local lists before reclaiming memory
• Sounds a lot like hazard pointers!
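The collector side of this scheme is sketched below in hazard-pointer style.

- The fixed `MAX_THREADS` × `SLOTS_PER_THREAD` table is a hypothetical stand-in for each thread's list of active pointers.
- `can_reclaim()` is the check the collector runs before freeing a retired object.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 4        /* hypothetical fixed thread count */
#define SLOTS_PER_THREAD 2   /* active pointers each thread may hold */

/* Each thread's currently active pointers (zero-initialized to NULL). */
static void *_Atomic hazard[MAX_THREADS][SLOTS_PER_THREAD];

static void set_hazard(int tid, int slot, void *p)
{
    atomic_store_explicit(&hazard[tid][slot], p, memory_order_seq_cst);
}

static void clear_hazard(int tid, int slot)
{
    atomic_store_explicit(&hazard[tid][slot], NULL, memory_order_release);
}

/* Collector: a retired object may be freed only if no thread currently
 * lists it as active. Cost grows with threads x slots. */
static bool can_reclaim(const void *retired)
{
    for (int t = 0; t < MAX_THREADS; t++)
        for (int s = 0; s < SLOTS_PER_THREAD; s++)
            if (atomic_load_explicit(&hazard[t][s], memory_order_seq_cst) == retired)
                return false;
    return true;
}
```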

RCU & Hazard Pointers
• Hazard pointer disadvantages:
  • Requires manual identification of hazard references
  • Expensive on the read path
    • Requires two memory barriers on the read path:
      • Copying the global pointer to a local reference
      • Entering the hazard pointer into the list
  • Every reader thread incurs this extra overhead as the cost of correct memory reclamation, which is expensive in many-reader situations
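The read-path cost listed above comes from the usual protect-and-validate loop, sketched here with a single hypothetical global pointer and one hazard slot.

- The reader copies the global pointer and publishes it as a hazard.
- It then needs a full (store-load) fence before re-reading the global pointer to confirm the pointer is still current.
- RCU readers avoid this work entirely.

```c
#include <stdatomic.h>
#include <stddef.h>

struct obj { int data; };

static struct obj *_Atomic global_ptr;   /* shared pointer (hypothetical) */
static struct obj *_Atomic my_hazard;    /* this thread's hazard slot */

/* Hazard-pointer read path: every read pays for publishing the pointer
 * and for a full fence, so the collector either sees the hazard or the
 * reader sees that the pointer changed and retries. */
static struct obj *protect(void)
{
    struct obj *p = atomic_load_explicit(&global_ptr, memory_order_acquire);
    for (;;) {
        atomic_store_explicit(&my_hazard, p, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);   /* store-load barrier */
        struct obj *again = atomic_load_explicit(&global_ptr, memory_order_acquire);
        if (again == p)
            return p;    /* still current: safe to dereference */
        p = again;       /* changed underneath us: protect the new value */
    }
}

static void unprotect(void)
{
    atomic_store_explicit(&my_hazard, NULL, memory_order_release);
}
```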

RCU – Memory Reclamation
• RCU: collection based on a 'quiescent state'
  • Threads prevent a quiescent state from occurring while their local references are alive
  • The collector indirectly observes the state of all threads to infer when it is safe to reclaim memory
• The definition chosen for 'quiescent state' significantly impacts performance
  • Best choice: infer it from operations that occur anyway

RCU – Reader - Collection
• Reader-collection
  • rcu_read_lock()
  • rcu_read_unlock()
  • Together they delimit the read-side critical section
• Non-preemptible kernel
  • Programming convention is to avoid yielding in the read-side critical section
  • Memory is reclaimed after voluntary context switches
  • rcu_read_lock()/rcu_read_unlock() do nothing in a non-preemptible kernel
Example:
    rcu_read_lock();
    retval = rcu_dereference(gbl_foo)->a;
    rcu_read_unlock();
    return retval;

RCU – Memory Reclamation
• 'Simple case': non-preemptible kernel
  • All threads use read-side critical sections with no voluntary yield
  • No context switch occurs within a read-side critical section
• The collector observes all CPUs to determine when every CPU has undergone a context switch
  • This indicates a pass through a quiescent state
  • All previous read-side critical sections are now guaranteed to have exited
  • New readers can no longer see the removed object
• Safe, conservative, imprecise, degrades real-time
  • The quiescent state is detected only some time after the last reader use
  • The collector waits for all readers to finish, even if not all of them were accessing the memory to be reclaimed
  • Real-time response is delayed because threads refuse to yield within read-side critical sections
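Quiescent-state detection in this simple case can be sketched as per-CPU context-switch counters.

- `NCPUS`, `note_context_switch()` and `struct grace_period` are hypothetical.
- `note_context_switch()` stands in for a scheduler hook.
- The grace period ends once every CPU's counter has moved since the snapshot. That happens even for CPUs that never touched the removed object, which is the "conservative, imprecise" property above.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4   /* hypothetical CPU count */

/* Per-CPU count of context switches. Because no read-side critical
 * section spans a context switch, each increment marks a quiescent state. */
static atomic_ulong qs_count[NCPUS];

/* Hypothetical scheduler hook: called by 'cpu' at every context switch. */
static void note_context_switch(int cpu)
{
    atomic_fetch_add_explicit(&qs_count[cpu], 1, memory_order_release);
}

struct grace_period {
    unsigned long snap[NCPUS];
};

/* Grace-period start (after the object has been unlinked):
 * snapshot every CPU's counter. */
static void gp_start(struct grace_period *gp)
{
    for (int c = 0; c < NCPUS; c++)
        gp->snap[c] = atomic_load_explicit(&qs_count[c], memory_order_acquire);
}

/* Ended once every CPU has context-switched since gp_start(). */
static bool gp_done(const struct grace_period *gp)
{
    for (int c = 0; c < NCPUS; c++)
        if (atomic_load_explicit(&qs_count[c], memory_order_acquire) == gp->snap[c])
            return false;
    return true;
}
```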

Preemptible kernels
• Read-side critical section
  • Readers could now be preempted in their read-side critical sections, so:
  • Disable preemption on entry and re-enable it on exit
  • Memory is freed using synchronize_sched()
    • Counts scheduler preemptions
• Benefits and trade-offs
  • Allows use of RCU with a preemptible kernel
  • Read-side critical sections cannot be preempted by RT events, which hurts RT responsiveness
  • Additional read-side work to disable/enable preemption

RCU – RT with counters
• Global counter
  • Atomic increment in rcu_read_lock()
  • Atomic decrement in rcu_read_unlock()
  • Quiescent state defined as global counter == 0
• Not practical
  • As the CPU count increases, the counter may never reach 0
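The global-counter scheme fits in a few lines of C11. The function names are hypothetical.

The sketch also shows why it scales poorly:

- Every reader on every CPU performs an atomic read-modify-write on the same cache line.
- With enough overlapping readers, `counter_quiescent()` may never return true.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One counter shared by every reader on every CPU. */
static atomic_long nreaders;

static void counter_read_lock(void)   { atomic_fetch_add(&nreaders, 1); }
static void counter_read_unlock(void) { atomic_fetch_sub(&nreaders, 1); }

/* Quiescent state: no reader anywhere is inside a critical section. */
static bool counter_quiescent(void)   { return atomic_load(&nreaders) == 0; }
```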

RCU – RT with counters
• Use a two-element array as the counter
  • Each reader atomically increments the 'current' element on entry and decrements that same element on exit (a matched pair)
  • When a grace period starts, swap the sense of 'current' and 'last'; the 'last' element then only decrements
  • The 'last' counter eventually reaches 0, marking the end of the grace period
• High overhead due to memory contention / cache misses
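A minimal sketch of the two-element counter flip follows. The `flip_*` names are hypothetical.

- The reader remembers which element it incremented, so its decrement always hits the same element even after a flip.
- The race where a reader fetches the index just before a flip and increments the old element afterwards is not handled here. A real implementation needs extra machinery for it.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long ctr[2];    /* the two counter elements */
static atomic_int cur_idx;    /* which element is 'current' (starts at 0) */

/* Reader entry: increment 'current' and remember which one it was. */
static int flip_read_lock(void)
{
    int idx = atomic_load(&cur_idx);
    atomic_fetch_add(&ctr[idx], 1);
    return idx;
}

/* Reader exit: decrement the same element (matched pair). */
static void flip_read_unlock(int idx)
{
    atomic_fetch_sub(&ctr[idx], 1);
}

/* Grace-period start: swap the sense of 'current' and 'last'.
 * New readers use the other element; 'last' can now only drain. */
static int flip_start_gp(void)
{
    int last = atomic_load(&cur_idx);
    atomic_store(&cur_idx, !last);
    return last;
}

/* Grace period ends when the 'last' element drains to zero. */
static bool flip_gp_done(int last)
{
    return atomic_load(&ctr[last]) == 0;
}
```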

RCU – Avoiding the cache-miss
• 2xN array of counters, N = thread count (2 per thread)
• Global index
  • Selects which of a thread's two counters rcu_read_lock() and rcu_read_unlock() update
• Requires a grace-period detection state machine

RCU – Avoiding the cache-miss
• Improves read-side performance
  • Avoids cache misses
  • Does not require (expensive) atomic instructions
  • Does not require (expensive) memory barriers
• Requires a state machine for grace-period detection
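The 2xN layout can be sketched as a per-thread pair of counters selected by a global index. The `pt_*` names and `NTHREADS` are hypothetical.

- Each thread writes only its own pair, so a plain load/store replaces the atomic read-modify-write.
- The updater sums the 'last' column across all threads.
- All ordering here is relaxed. The grace-period state machine that makes this safe on real hardware is omitted, so this shows only the data layout and counting logic.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 4   /* hypothetical N */

/* 2 x N counters: thread t owns ctr[t][0] and ctr[t][1]
 * (a real layout would pad each pair to its own cache line). */
static atomic_long ctr[NTHREADS][2];
static atomic_int gidx;   /* global index: which column is 'current' */

static int pt_read_lock(int tid)
{
    int idx = atomic_load_explicit(&gidx, memory_order_relaxed);
    /* Only this thread writes ctr[tid][*]: no atomic RMW needed. */
    long v = atomic_load_explicit(&ctr[tid][idx], memory_order_relaxed);
    atomic_store_explicit(&ctr[tid][idx], v + 1, memory_order_relaxed);
    return idx;
}

static void pt_read_unlock(int tid, int idx)
{
    long v = atomic_load_explicit(&ctr[tid][idx], memory_order_relaxed);
    atomic_store_explicit(&ctr[tid][idx], v - 1, memory_order_relaxed);
}

/* Updater: flip the global index and return the old ('last') column. */
static int pt_flip(void)
{
    int last = atomic_load_explicit(&gidx, memory_order_relaxed);
    atomic_store_explicit(&gidx, !last, memory_order_relaxed);
    return last;
}

/* Updater: the grace period ends when the 'last' column sums to zero. */
static bool pt_gp_done(int last)
{
    long sum = 0;
    for (int t = 0; t < NTHREADS; t++)
        sum += atomic_load_explicit(&ctr[t][last], memory_order_relaxed);
    return sum == 0;
}
```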

RCU – Priority inheritance
• Indefinite delays in read-side critical sections
  • Extend the grace period
  • Exhaust memory, since no collection can occur
  • Writers cannot allocate memory
• Need to prevent low-priority threads from being preempted indefinitely
  • Priority boosting would work, but it is relatively expensive and not required for every reader
  • Solution: defer priority boosting
    • Threads preempted in read-side critical sections are added to a list
    • The list serves as an 'aging' tracker
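Deferred boosting with an aging list might look like the following sketch. Everything in it is an assumption for illustration, not the kernel's actual mechanism:

- The task record, the priority values, and the `max_age` threshold are hypothetical.
- The periodic caller of `boost_aged_readers()` is likewise assumed.

The behavior:

- Preempted readers are appended in preemption order, so the list head is always the oldest.
- Only readers that have waited longer than `max_age` get boosted.
- A reader's priority is restored when it leaves its critical section.

```c
#include <stddef.h>

/* Hypothetical record for a task preempted inside a read-side critical section. */
struct reader_task {
    int prio;                   /* current priority (higher = more urgent) */
    int base_prio;              /* priority to restore after boosting */
    unsigned long preempted_at; /* time of preemption */
    struct reader_task *next;   /* link in the aging list */
};

/* Oldest preempted reader at the head, newest at the tail. */
struct aging_list { struct reader_task *head, *tail; };

static void note_preempted_reader(struct aging_list *l, struct reader_task *t,
                                  unsigned long now)
{
    t->preempted_at = now;
    t->next = NULL;
    if (l->tail)
        l->tail->next = t;
    else
        l->head = t;
    l->tail = t;
}

/* Called periodically: boost only readers preempted for longer than max_age.
 * Returns how many readers were boosted. */
static int boost_aged_readers(struct aging_list *l, unsigned long now,
                              unsigned long max_age, int boost_prio)
{
    int boosted = 0;
    for (struct reader_task *t = l->head; t != NULL; t = t->next) {
        if (now - t->preempted_at <= max_age)
            break;   /* list is in preemption order: the rest are younger */
        if (t->prio < boost_prio) {
            t->prio = boost_prio;
            boosted++;
        }
    }
    return boosted;
}

/* Reader leaves its critical section: unlink it and undo any boost. */
static void reader_exit(struct aging_list *l, struct reader_task *t)
{
    struct reader_task **pp = &l->head, *prev = NULL;
    while (*pp != NULL && *pp != t) {
        prev = *pp;
        pp = &(*pp)->next;
    }
    if (*pp == NULL)
        return;   /* never preempted: nothing to undo */
    *pp = t->next;
    if (l->tail == t)
        l->tail = prev;
    t->prio = t->base_prio;
}
```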

RCU – Priority Inheritance
• Issue list

RCU – Sleepable RCU
• Global definition of grace period
  • A single delayed thread in a read-side critical section can stall memory reclamation for everyone
  • The stall occurs even though the reader's data is unrelated to the memory being reclaimed
• RCU control block
  • Reader and updater invocations share defined control blocks
  • Readers won't block reclamation for unrelated control blocks
Example:
    idx = srcu_read_lock(&scb);
    /* read-side critical section */
    srcu_read_unlock(&scb, idx);
    /* collection */
    synchronize_srcu(&scb);
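The benefit of per-control-block grace periods can be sketched by giving each block its own pair of flip counters.

- The `scb_*` functions are hypothetical user-space analogs shaped like srcu_read_lock(), srcu_read_unlock() and synchronize_srcu(). They are not the kernel implementation.
- A reader holding block A does not delay a grace period on block B.
- As with any simple counter flip, the race between fetching the index and a concurrent flip is ignored here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical control block: each has its own counters, so a stalled
 * reader only delays grace periods of its own block. */
struct scb {
    atomic_long ctr[2];
    atomic_int idx;
};

static int scb_read_lock(struct scb *s)
{
    int i = atomic_load(&s->idx);
    atomic_fetch_add(&s->ctr[i], 1);
    return i;                        /* caller passes this to unlock */
}

static void scb_read_unlock(struct scb *s, int i)
{
    atomic_fetch_sub(&s->ctr[i], 1);
}

/* Start a grace period for this control block only. */
static int scb_flip(struct scb *s)
{
    int last = atomic_load(&s->idx);
    atomic_store(&s->idx, !last);
    return last;
}

static bool scb_gp_done(struct scb *s, int last)
{
    return atomic_load(&s->ctr[last]) == 0;
}

/* Blocking analog of synchronize_srcu(): wait only for pre-existing
 * readers of this block. */
static void scb_synchronize(struct scb *s)
{
    int last = scb_flip(s);
    while (!scb_gp_done(s, last))
        ;   /* a real implementation would sleep rather than spin */
}
```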

RCU Performance Comparisons
• Fast concurrent reads
• Relatively slow writers
• Preemption and RT support require increased read-side work
