Edelweiss presents a novel approach to managing mutable shared state in distributed applications by automating storage reclamation and using event logging as a core design pattern. Through a Datalog-based language, Edelweiss eliminates the need for manual mutation and deletion, allowing users to define queries that dictate which log entries are relevant for current operations. This system enhances concurrency, replication, and ease of use while significantly reducing the burden of managing garbage collection in stateful applications. The approach leads to concise, high-level programs that automatically handle storage management.
Edelweiss: Automatic Storage Reclamation for Distributed Programming Neil Conway, Peter Alvaro, Emily Andrews, Joseph M. Hellerstein University of California, Berkeley
Mutable shared state • Frequent source of bugs • Hard to scale
Event Logging Accumulate & exchange sets of immutable events • No mutation/deletion • To delete: add new event • “Event X should be ignored” • Current state: query over event log
Example: Key-Value Store
Mutable State
    tbl = Hash.new
    Insert(k, v): tbl[k] = v                # update-in-place
    Delete(k): tbl.delete(k)                # deletion
    View(): tbl
Event Logging
    i_log = Set.new
    d_log = Set.new
    Insert(k, v): i_log << [k,v]            # set union
    Delete(k): d_log << k
    View(): i_log.notin(d_log, :k => :k)    # compute “live” keys
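The event-logging column of the slide can be approximated in plain Ruby as a minimal sketch. The class name `LoggedKVS` and its method names are illustrative assumptions, and the `reject` call is only a stand-in for the `notin` operator shown on the slide, not the Edelweiss implementation.

```ruby
require 'set'

# Illustrative sketch: a key-value store whose state only grows.
# LoggedKVS is a hypothetical name, not part of Edelweiss.
class LoggedKVS
  def initialize
    @i_log = Set.new # insert events, stored as [key, value]
    @d_log = Set.new # delete events, stored as key
  end

  # Inserting is a set union: nothing is overwritten.
  def insert(k, v)
    @i_log << [k, v]
  end

  # Deleting adds a new event that says "ignore inserts for k".
  def delete(k)
    @d_log << k
  end

  # Plain-Ruby stand-in for i_log.notin(d_log, :k => :k):
  # keep the insert events whose key has no delete event.
  def view
    @i_log.reject { |k, _v| @d_log.include?(k) }.to_set
  end
end

kvs = LoggedKVS.new
kvs.insert(:a, 1)
kvs.insert(:b, 2)
kvs.delete(:a)
p kvs.view
```

In this sketch a deleted key stays hidden even if it is inserted again later, because the delete event is never removed.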
Benefits of Event Logging • Concurrency • Replication • Undo/redo • Point-in-time query, audit trails (Sometimes: performance!)
Example Applications • Multi-version concurrency control (MVCC) • Write-ahead logging (WAL) • Stream processing • Log-structured file systems Also: CRDTs, tombstones, purely functional data structures, accounting ledgers.
Observation: Logs consume unbounded storage Solution: Discard log entries that are “no longer useful” (garbage collection)
Observation: Logs consume unbounded storage Challenge: Discard log entries that are “no longer useful” (garbage collection)
Traditional Approach “No longer useful” defined by application semantics • No framework support • Every system requires custom GC logic • Reinvented many times • >25 papers propose ~same scheme!
Engineering Challenges • Difficult to implement correctly • Too aggressive: destroy live data • Too conservative: storage leak • Ongoing maintenance burden • GC scheme and application code must be updated together
Our Approach • New language: Edelweiss • Based on Datalog • No constructs for deletion or mutation! • Automatically generate safe, application-specific distributed GC protocols • Present several in-depth case studies • Reliable unicast/broadcast, key-value store, causal consistency, atomic registers
Base Data (“Event Logs”) → Query → Derived Data (“Live View”)
A log entry is useful iff it might contribute to the view. The queries define how log entries contribute to the view. Goal: Find log entries that will never contribute to the view in the future.
Semantics of Base Data • Accumulate and broadcast to other nodes • Datalog: monotonic • Set union: grows over time • CALM Theorem [CIDR’11]: event log guaranteed to be eventually consistent
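A small Ruby sketch of why accumulating base data by set union is insensitive to delivery order and duplication. The `merge` helper and the two replica variables are illustrative assumptions. This only demonstrates the set-union property and is not a statement of the CALM theorem itself.

```ruby
require 'set'

# Illustrative helper: a replica merges an incoming batch by set union.
def merge(log, incoming)
  log | incoming
end

# Three batches of insert events; the third is a duplicate delivery.
batches = [Set[[:a, 1]], Set[[:b, 2]], Set[[:a, 1]]]

# Two replicas receive the same batches in opposite orders.
replica1 = batches.reduce(Set.new) { |log, batch| merge(log, batch) }
replica2 = batches.reverse.reduce(Set.new) { |log, batch| merge(log, batch) }

p replica1 == replica2
```

Because union is commutative, associative, and idempotent, both replicas end with the same log once they have seen the same events.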
Semantics of Derived Data Grows and shrinks over time • e.g., KVS keys added and removed Hence, not monotonic
Common Pattern Live View = set difference between growing sets
Semantics of Set Difference X = Y - Z • Z grows: X shrinks • If t appears in Z, t will never again appear in X • “Anti-monotone with respect to Z”
    i_log = Set.new
    d_log = Set.new
    Insert(k, v): i_log << [k,v]
    Delete(k): d_log << k
    View(): i_log.notin(d_log, :k => :k)
Can reclaim from i_log upon match in d_log
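The reclamation rule for the positive input of a set difference can be sketched in plain Ruby. The helper names `reclaim_inserts!` and `view` are assumptions for illustration. Edelweiss derives such rules automatically, whereas this sketch hard-codes the one for the key-value store.

```ruby
require 'set'

# Stand-in for i_log.notin(d_log, :k => :k).
def view(i_log, d_log)
  i_log.reject { |k, _v| d_log.include?(k) }.to_set
end

# Remove every insert event whose key already appears in d_log.
# Since d_log only grows, such an event can never re-enter the view.
# Returns the events that were reclaimed.
def reclaim_inserts!(i_log, d_log)
  dead = i_log.select { |k, _v| d_log.include?(k) }.to_a
  dead.each { |event| i_log.delete(event) }
  dead
end

i_log = Set[[:a, 1], [:b, 2]]
d_log = Set[:a]

before = view(i_log, d_log)
reclaimed = reclaim_inserts!(i_log, d_log)
after = view(i_log, d_log)
```

The view is the same before and after reclamation, which is the safety property the analysis is meant to guarantee. Only the storage used by `i_log` changes.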
Other Analysis Techniques • Reclaim from negative notin input • Often called “tombstones” • E.g., how to reclaim from d_log in the KVS • Reclaim from join input tables • Disseminate GC metadata automatically • Exploit user knowledge for better GC • Punctuations [Tucker & Maier ’03]
Whole Program Analysis • For each query q, find condition when input t will never contribute to q’s output • “Reclamation condition” (RC) • For each tuple t, find the conjunction of the RCs for t over all queries • When all consumers no longer need t: safe to reclaim
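The conjunction of reclamation conditions can be sketched as follows. The scenario is hypothetical: it assumes two queries read `i_log`, a live-view query and a replication query that forwards inserts to peers until they acknowledge them. The names `rc_view`, `rc_replicate`, `acked`, and `peers` are illustrative and do not come from the slides.

```ruby
require 'set'

# A tuple is safe to reclaim only when every consumer's RC holds.
def reclaimable?(tuple, reclamation_conditions)
  reclamation_conditions.all? { |rc| rc.call(tuple) }
end

d_log = Set[:a, :b]                 # keys that have delete events
peers = Set[:n1, :n2]               # hypothetical peer nodes
acked = {                           # hypothetical ack bookkeeping
  [:a, 1] => Set[:n1, :n2],
  [:b, 2] => Set[:n1]
}

# RC for the view query: the key has been deleted.
rc_view = ->(t) { d_log.include?(t[0]) }
# RC for the replication query: every peer has acknowledged the tuple.
rc_replicate = ->(t) { peers.subset?(acked.fetch(t, Set.new)) }

rcs = [rc_view, rc_replicate]
p reclaimable?([:a, 1], rcs)
p reclaimable?([:b, 2], rcs)
```

Here `[:b, 2]` is deleted but not yet acknowledged by every peer, so the replication query still needs it and it is kept.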
Edelweiss Input Program (“positive” program: no deletion or state mutation) → Source-to-Source Rewriter (compute RCs, add deletion rules) → Datalog Output Program (input program + deletion rules) → Datalog Evaluator
Comparison of Program Size Only 19 rules!
Takeaways • No storage management code! • Similar to malloc/free vs. GC • Programs are concise and declarative • Developer: just compute current view • Log entries removed automatically • Reclamation logic and application code always in sync
Conclusions • Event logging: powerful design pattern • Problem: need for hand-written distributed storage reclamation code • Datalog: natural fit for event logging • Storage reclamation as a compiler rewrite? Results: • Automatic, safe GC synthesis! • High-level, declarative programs • No storage management code • Focus on solving domain problem
Future Work: Checkpoints • Closely related to simple event logging • Summarize many log entries with a single “checkpoint” record • View = last checkpoint + Query(ΔLogs) • General goal: reclaim space by structural transformation, not just discarding data
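One way to read "view = last checkpoint + query over the log delta" for the key-value store, as a rough Ruby sketch. The function `view_from` and the representation of a checkpoint as a set of live `[key, value]` pairs are assumptions. The sketch also assumes a single node where no event for an already-checkpointed key arrives late, which the slides list as future work rather than as a solved problem.

```ruby
require 'set'

# View = last checkpoint + query over the events logged since then.
# checkpoint: Set of live [key, value] pairs (assumed representation).
# delta_i / delta_d: insert and delete events logged after the checkpoint.
def view_from(checkpoint, delta_i, delta_d)
  (checkpoint | delta_i).reject { |k, _v| delta_d.include?(k) }.to_set
end

# Phase 1: log some events, then summarize them as a checkpoint.
checkpoint = view_from(Set.new, Set[[:a, 1], [:b, 2]], Set[:a])

# Phase 2: the old logs can be dropped; only the delta is kept.
current = view_from(checkpoint, Set[[:c, 3]], Set[:b])

# For comparison: the same view computed from the full, never-truncated logs.
full = view_from(Set.new, Set[[:a, 1], [:b, 2], [:c, 3]], Set[:a, :b])
```

Under the stated assumptions, `current` and `full` agree, while the checkpointed version stores one record in place of the three events from phase 1.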
Future Work: Theory • Current analysis is somewhat ad hoc • If program does not reclaim storage, two possibilities: • Program is “not reclaimable” in principle • (Possible program bug!) • Our analysis is not complete • (Possible analysis bug!) How to characterize the class of “not reclaimable” programs?
Reclaiming KVS Deletions • Good question • X.notin(Y): how to reclaim from Y? • Y is a dense ordered set; compress it. • Prove that each Y tuple matches exactly one X tuple
    i_log = Set.new
    d_log = Set.new
    Insert(k, v): i_log << [k,v]
    Delete(k): d_log << k
    View(): i_log.notin(d_log, :k => :k)
k is a key of i_log
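A sketch of the second idea on this slide, assuming `k` is a key of `i_log` (at most one insert event per key) and assuming each event is delivered exactly once, so a reclaimed insert is never seen again. Under those assumptions a delete event has done all its work once its single matching insert has been found. The helper name `reclaim_pairs!` is illustrative.

```ruby
require 'set'

# Reclaim each delete event together with its one matching insert.
# Assumes k is a key of i_log and that events are never redelivered.
# Delete events whose insert has not arrived yet are kept as tombstones.
# Returns the number of insert/delete pairs reclaimed.
def reclaim_pairs!(i_log, d_log)
  matched = i_log.select { |k, _v| d_log.include?(k) }.to_a
  matched.each do |k, v|
    i_log.delete([k, v]) # the insert can never re-enter the view
    d_log.delete(k)      # its only possible match is gone
  end
  matched.size
end

i_log = Set[[:a, 1], [:b, 2]]
d_log = Set[:a, :z] # the insert for :z has not arrived yet

pairs = reclaim_pairs!(i_log, d_log)
```

The tombstone for `:z` stays in `d_log` because its matching insert may still arrive. Without the key constraint, a tombstone could match insert events that have not been seen yet and could not be dropped this way.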