Garbage Collecting the World Bernard Lang Christian Queinnec Jose Piquer Presented by Yu-Jin Chia See also: pp 234-237 text
Objective • Fault Tolerant • Processor Independent • Non-centralized • Non-blocking • Multiple instances can run • No object passing • Complete garbage reclamation
Dynamic Group Structure • Processors cooperate to GC the group • Works faster with small groups • Groups can reorganize dynamically • Scales well. • Works well despite failures or additions. • Suitable for: large heterogeneous networks, distributed symbolic computation, distributed file systems, distributed databases.
Processor Level • Can use any kind of local GC scheme (e.g. Mark and Sweep) • Use Reference Counters to track remote objects • Terminology: Entry item (a.k.a. ‘skeleton’: a reference to a local object that another node refers to remotely), Exit item (a.k.a. ‘proxy’: a reference to a remote object). • Each entry item has an RC equal to the number of exit items referring to it. • Use a scheme like Generational RC.
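The skeleton/proxy bookkeeping described on this slide can be sketched as follows. This is only an illustration under simplifying assumptions: the `Node` class and the `add_remote_reference` / `drop_remote_reference` helpers are hypothetical names, messages are modeled as direct function calls, and each node holds at most one proxy per remote object.

```python
class Node:
    """A processor with its entry items (skeletons) and exit items (proxies)."""

    def __init__(self, name):
        self.name = name
        self.skeleton_rc = {}  # local object id -> number of remote proxies
        self.proxies = set()   # (owner node name, object id) pairs

    def export(self, obj_id):
        # Create a skeleton for a local object; no remote references yet.
        self.skeleton_rc.setdefault(obj_id, 0)


def add_remote_reference(holder, owner, obj_id):
    """Give `holder` a proxy for `owner`'s object and bump the skeleton RC."""
    key = (owner.name, obj_id)
    if key not in holder.proxies:  # assumed: one proxy per node per object
        holder.proxies.add(key)
        owner.skeleton_rc[obj_id] = owner.skeleton_rc.get(obj_id, 0) + 1


def drop_remote_reference(holder, owner, obj_id):
    """Reclaim `holder`'s proxy and send a decrement to the skeleton."""
    key = (owner.name, obj_id)
    if key in holder.proxies:
        holder.proxies.remove(key)
        owner.skeleton_rc[obj_id] -= 1


a, b, c = Node("A"), Node("B"), Node("C")
b.export("obj")
add_remote_reference(a, b, "obj")
add_remote_reference(c, b, "obj")
print(b.skeleton_rc)  # the skeleton's RC equals the number of proxies
```

In a real system the increments and decrements would travel as messages, which is where a scheme such as Generational RC comes in; that part is not modeled here.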
Group Negotiation • Each node chooses when to join a GC group, and which group to join. • Each node in a group is aware of the others. • Groups last until the algorithm terminates (a.k.a. a ‘GC cycle’) • Any group formation method suffices. • Unique identifier for each group and GC cycle.
Initial Marking - Marks are assigned by group, and only meaningful to that group. Similar scheme to ‘tracing in groups’ p. 234 text. Skeleton: soft or hard. Proxy: soft, hard, or none.
Christopher’s Algorithm • Copy the RC for each node in the group, for each skeleton. • Check all proxies in the group, decrement RCs on corresponding skeletons. • When all are done, any skeletons with a positive RC are hard, else soft.
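The three steps above can be sketched in a few lines. This is a minimal, single-process illustration, not a distributed implementation: the function name `classify_skeletons` and the data layout (a dictionary of reference counts plus a flat list of proxy targets) are assumptions made for the example.

```python
def classify_skeletons(ref_counts, group_proxies):
    """Mark each skeleton hard or soft for one group.

    ref_counts:    skeleton id -> reference count over the whole network
    group_proxies: one skeleton id per proxy held by a node inside the group
    """
    # Step 1: work on a copy so the real reference counts are untouched.
    working = dict(ref_counts)

    # Step 2: subtract every reference that comes from inside the group.
    for target in group_proxies:
        if target in working:
            working[target] -= 1

    # Step 3: a positive remainder means a reference from outside the group.
    return {s: ("hard" if rc > 0 else "soft") for s, rc in working.items()}


# Skeleton "a" has two proxies, only one of them inside the group;
# skeleton "b" is referenced only from inside the group.
marks = classify_skeletons({"a": 2, "b": 1}, ["a", "b"])
print(marks)
```

Here "a" comes out hard because one of its references originates outside the group, while "b" comes out soft and is therefore a candidate for reclamation if nothing local reaches it.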
Propagation (2-Phase Marking) Local (inside each node): • Set proxy marks to none. • Trace from hard skeletons and root nodes, marking all proxies hard. • Trace from soft skeletons, changing none to soft. [Figure: a node with a hard skeleton or root and a soft skeleton, and the resulting hard and soft proxies.]
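A minimal sketch of the two local tracing phases, assuming the node's heap is given as an adjacency dictionary. The function name `local_propagation` and the representation of skeletons, roots and proxies as plain string ids are hypothetical choices for this example.

```python
def local_propagation(edges, roots, skeleton_marks, proxies):
    """Return proxy id -> 'hard' | 'soft' | 'none' for one node.

    edges:          object id -> list of object ids it references
    roots:          ids of local roots
    skeleton_marks: skeleton id -> 'hard' | 'soft'
    proxies:        ids of the proxies resident on this node
    """
    def reachable(starts):
        # Plain depth-first traversal of the local object graph.
        seen, stack = set(), list(starts)
        while stack:
            obj = stack.pop()
            if obj in seen:
                continue
            seen.add(obj)
            stack.extend(edges.get(obj, ()))
        return seen

    # Reset: every proxy starts with no mark.
    marks = {p: "none" for p in proxies}

    # Phase 1: trace from roots and hard skeletons; reached proxies are hard.
    hard_starts = list(roots) + [s for s, m in skeleton_marks.items() if m == "hard"]
    for obj in reachable(hard_starts):
        if obj in marks:
            marks[obj] = "hard"

    # Phase 2: trace from soft skeletons; only upgrade 'none' to 'soft'.
    soft_starts = [s for s, m in skeleton_marks.items() if m == "soft"]
    for obj in reachable(soft_starts):
        if marks.get(obj) == "none":
            marks[obj] = "soft"

    return marks


edges = {"root": ["o1"], "o1": ["p1"], "s_soft": ["o2"], "o2": ["p1", "p2"]}
result = local_propagation(edges, ["root"], {"s_soft": "soft"}, ["p1", "p2", "p3"])
print(result)
```

In this example `p1` is reachable from both a root and a soft skeleton and stays hard, `p2` is reachable only from the soft skeleton, and `p3` is unreachable and keeps the mark none.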
Global Propagation • Propagation always occurs within the group under consideration. • Upon completion of a GC cycle, a node may start a new one once its proxies have received hard marks. • If a new remote reference is created, mark its skeleton hard in advance. [Figure: Node A and Node B; hard means referenced by a local hard item or a root.]
Stabilization Group Stability: • All nodes are ‘stable.’ • No messages in transit that would mark a skeleton hard. Stable: • No new data that would harden more skeletons in the group. • Can be lost if a new proxy is made, or when a skeleton is mobile. • Given no node failure and the finite number of skeletons that may be allocated, this will happen eventually.
Dead Cycles Removal • Done individually on each node, without group synchronization. • Modify soft skeletons to reference nil. • Reclaim the related proxies locally. • Send decrement messages to skeletons. • When a skeleton’s RC hits 0, reclaim. • When GC is finished, group may disband.
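The removal steps can be illustrated with a small simulation. It is a sketch under stated assumptions: `remove_dead_cycles` is a hypothetical name, all skeletons live in one dictionary instead of on separate nodes, decrement messages are applied immediately, and the proxies listed under a soft skeleton are assumed to be reachable only through it.

```python
def remove_dead_cycles(skeletons):
    """Cut soft skeletons and reclaim whatever their decrements free.

    skeletons: id -> {"mark": "hard" | "soft", "rc": int,
                      "proxies": [ids of skeletons reached via local proxies]}
    Returns the set of skeleton ids whose RC dropped to 0.
    """
    decrements = []
    for sk in skeletons.values():
        if sk["mark"] == "soft":
            # The local GC reclaims the proxies once the skeleton is cut;
            # each reclaimed proxy sends one decrement to its target.
            decrements.extend(sk["proxies"])
            sk["proxies"] = []  # the skeleton now references nil

    reclaimed = set()
    for target in decrements:
        skeletons[target]["rc"] -= 1
        if skeletons[target]["rc"] == 0:
            reclaimed.add(target)
    return reclaimed


# "x" and "y" form a cycle across two nodes; "z" is also referenced
# from outside the group, so it is hard and must survive.
skeletons = {
    "x": {"mark": "soft", "rc": 1, "proxies": ["y", "z"]},
    "y": {"mark": "soft", "rc": 1, "proxies": ["x"]},
    "z": {"mark": "hard", "rc": 2, "proxies": []},
}
freed = remove_dead_cycles(skeletons)
print(sorted(freed), skeletons["z"]["rc"])
```

The cycle between "x" and "y" is broken purely by reference-count decrements, which is why no group-wide synchronization is needed at this stage.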
Failure • Detection is done separately. • Can either wait for the node to wake up, or reorganize the group. • If the latter, keep hard marks, start at skeleton propagation. • If multiple failures, multiple groups may form from an old one.
Non-Recoverable Failure • What do we do about objects referenced by the failed node? • What do we do about references to the failed node? • What do we do about objects on the failed node that are possibly recoverable?
#1 Assuming your skeleton RCs are up to date… • Run Christopher’s Algorithm on the new group. • Those skeletons whose RCs have changed are referenced by the missing node. • Do something about it.
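One way to read this slide is as a comparison of the counts Christopher’s Algorithm leaves behind before and after the failure. The sketch below follows that reading; the names `residual_counts` and `referenced_by_missing` are hypothetical, and it assumes the residual counts from the old group were saved before the node failed.

```python
def residual_counts(ref_counts, proxies_by_node, group):
    """Christopher's Algorithm counts: RC minus references from the group."""
    working = dict(ref_counts)
    for node in group:
        for target in proxies_by_node.get(node, ()):
            if target in working:
                working[target] -= 1
    return working


def referenced_by_missing(residual_before, ref_counts, proxies_by_node, new_group):
    """Skeletons whose residual count changed once the failed node left."""
    after = residual_counts(ref_counts, proxies_by_node, new_group)
    return {s for s in after if after[s] != residual_before.get(s)}


ref_counts = {"s1": 2, "s2": 1}
proxies_by_node = {"A": ["s1"], "B": ["s2"], "C": ["s1"]}

# Counts saved while the old group {A, B, C} was intact.
before = residual_counts(ref_counts, proxies_by_node, ["A", "B", "C"])

# Node C fails; the group reorganizes as {A, B}.
suspects = referenced_by_missing(before, ref_counts, proxies_by_node, ["A", "B"])
print(suspects)
```

Only "s1" changes, because one of its two references came from the failed node C. What to do with such skeletons is left open on the slide and is not addressed here.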
A Problem [Figure: groups G1, G2 and G3.]
Simultaneous Group Collections • Don’t want to run GC at a node multiple times if it’s in more than one GC group. • Local GCs can just track the marks for each group GC. Fast if the marks tend to be similar. • However, if one group is a subgroup of another, the marks at skeletons may conflict. This can slow down group GC. • Nodes in an overlapping area between largely unrelated groups fare even worse: 50% performance on average.
Hierarchical Group Cooperation Definitions: • Universal Group: The set of all nodes. • Level Index: For each group, the number of larger groups of which it is a subgroup. This may change and must be updated. • By definition, at any time for any node, the level indexes uniquely identify the groups to which the node belongs.
Level Indexes [Figure: groups labeled Level 0, Level 1 and Level 2.]
Objective • Want each local GC to contribute to all GC groups in which it is a part. • By previous definitions, if a skeleton is marked hard in one group, it can be safely marked hard in its subgroups. • Define Markx(N) as the mark on level x, which is the lowest level index for which node N is marked hard.
Multi-Phase Marking • Propagate Mark() entries to the proxies, each of which records the lowest level (biggest group) that has a reference to it. • The proxies are now hard for this level, and any higher ones. • Stability is reached on a node for a particular level when no resident proxy will have its marking level reduced to less than or equal to that level. • When all groups stabilize for a particular level at a node, all resident skeletons with a higher level can be deleted.
Next Cycle • For next cycle, reinitialize by checking skeleton references. • If there are references external to the group, the marking level is set to that of the group, unless it’s already smaller.
True? “If a skeleton is marked hard in one group, it can be safely marked hard in its subgroups.” Not necessarily. Low markings might be incorrect due to the disappearance of a link from a larger group. Jobs aren’t the only thing being outsourced…
Keeping Reference Counts • Avoids re-running Christopher’s Algorithm if groups don’t change. • Useful for failure recovery. How? • Instead of counting all references at each skeleton, have a global count for the whole network (level 0).
The Equation • For $i \geq 1$: $\mathrm{Diff}_N[i, x] = \mathrm{Count}_N[i-1, x] - \mathrm{Count}_N[i, x]$, where N = the node, i = the level, x = the skeleton. • Vector of differences between the RC for each skeleton at this level and the next lower level (group and supergroup). • Initial = the smallest i s.t. $\mathrm{Diff}_N[i, x] \neq 0$ (or existing one if smaller). 0 if all are 0. • Fewer updates required: only the global count and one difference count change with each added proxy. • Counts often smaller than conventional RCs.
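A small worked example of the difference vector and the initial marking level for a single skeleton x on a node N. The helper names `diffs` and `initial_level`, and the convention that `counts[i]` holds the count at level i (with level 0 the global count), are assumptions made for this sketch.

```python
def diffs(counts):
    """Return {i: Diff[i]} for i >= 1, where Diff[i] = counts[i-1] - counts[i].

    counts[i] is the reference count of one skeleton at level i;
    counts[0] is the global count for the whole network.
    """
    return {i: counts[i - 1] - counts[i] for i in range(1, len(counts))}


def initial_level(counts, existing=None):
    """Smallest i with a nonzero difference, or 0 if all differences are 0.

    If an existing marking level is given and is smaller, it is kept.
    """
    nonzero = [i for i, d in diffs(counts).items() if d != 0]
    level = min(nonzero) if nonzero else 0
    if existing is not None and existing < level:
        return existing
    return level


# Five references network-wide, all of them inside the level-1 group,
# three inside the level-2 group and one inside the level-3 group.
counts = [5, 5, 3, 1]
print(diffs(counts))          # {1: 0, 2: 2, 3: 2}
print(initial_level(counts))  # 2
```

The first nonzero difference appears at level 2, meaning two references come from the level-1 group but not from the level-2 group, so 2 is the initial marking level under the rule stated on the slide.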