Epidemics

Presentation Transcript


  1. Epidemics by Charles Yang & Ted Pongthawornkamol 9/16/20 Epidemics, CS 598 IG, Fall 2004

  2. Prelude: Multicasting • Many protocols • MBONE, 6BONE, XTP, etc. • Principally designed for scalability • Fault tolerance really isn’t addressed

  3. Multicasting (cont…) • Scalable Reliable Multicast • But as the graph shows, not that scalable

  4. So now what? Epidemics! • Recap from Indy’s 1st lecture: • Definitions: • Infective – node with an update it wants to share • Susceptible – node which has not yet received the update • Removed – previously infective node which is no longer sharing

  5. Recap (cont…) • Infective node n receives a msg and forwards it with probability p to a susceptible node • Can be shown that the msg spreads quickly with high probability • Lightweight • Highly fault-tolerant

  6. Outline of Presentation • Epidemic Algorithms for Replicated Database Maintenance • Bimodal Multicast • Gossip-Based Ad Hoc Routing

  7. Epidemic Algorithms for Replicated Database Maintenance • Xerox’s Corporate Internet (CIN), Clearinghouse Servers, about 1986-1987 • Name resolution service • several hundred Ethernets, connected by gateways and phone lines • DB replication traffic was filling up the bandwidth

  8. The Problem • Inject an update at one server, and have it propagate to all other servers • how to make it robust and scale well? • important factors: • convergence time – time req’d for update to propagate to all sites • network traffic – traffic req’d to propagate a single update (want to minimize!)

  9. 3 Methods for Spreading Updates • direct mail (basically multicast or flooding) • anti-entropy (epidemic) • rumor mongering/gossiping (epidemic)

  10. CIN’s Initial Configuration • Direct mail to send updates • Anti-entropy to bring DBs into sync • Re-mailing if the previous anti-entropy found a disagreement • Anti-entropy ran once/day between 12am and 6am • Eventually, anti-entropy couldn’t complete in the allowed time due to traffic • For instance, for a domain stored at 300 sites, 90,000 messages might be introduced in 1 night

  11-13. Direct Mail [figure sequence over three slides; a site labeled s]

  14. Direct Mail Issues • a lot of b/w: n messages per update • not quite reliable: messages can be lost (crashes, buffer overflows) • s may also not have current knowledge of S (set of all sites)

  15. Anti-Entropy • Run in bg to recover from errors • initially from direct mail, later from rumor mongering • Executed periodically: FOR SOME s’ ∈ S DO ResolveDifference[s, s’] ENDLOOP
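
The loop on the slide above can be sketched in Python. This is a minimal illustration, not the Clearinghouse implementation: the database layout (a dict mapping key to a `(value, timestamp)` pair), the push-pull behaviour of `resolve_difference`, and the name `anti_entropy_cycle` are all assumptions made for the example.

```python
import random


def resolve_difference(db_a, db_b):
    """Push-pull exchange: both sites end up with the newest entry per key.

    Each database is assumed to map key -> (value, timestamp).
    """
    for key in set(db_a) | set(db_b):
        entry_a = db_a.get(key)
        entry_b = db_b.get(key)
        if entry_b is None or (entry_a is not None and entry_a[1] > entry_b[1]):
            db_b[key] = entry_a          # a's entry is newer (or b lacks it)
        elif entry_a is None or entry_b[1] > entry_a[1]:
            db_a[key] = entry_b          # b's entry is newer (or a lacks it)


def anti_entropy_cycle(sites, rng):
    """One cycle: every site s picks some other site s' uniformly at random."""
    for s in range(len(sites)):
        partner = rng.choice([j for j in range(len(sites)) if j != s])
        resolve_difference(sites[s], sites[partner])


if __name__ == "__main__":
    sites = [{"x": ("new", 2)}, {"x": ("old", 1)}, {}]
    anti_entropy_cycle(sites, random.Random(0))
    print(sites)
```

Uniform partner choice matches the assumption stated later in the deck; a spatially biased choice would only change how `partner` is drawn.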

  16-22. Anti-Entropy [figure sequence: state after direct mail, then the start and end of cycles 1, 2 and 3; a site labeled s]

  23. Anti-Entropy (cont…) • Assume s’ is chosen uniformly (talk about spatial distribs later) • slow and expensive, but reliable • since usually used as backup, the # of susceptible sites is small • Three variants: pull, push-pull, push

  24. Pull • $p_i$ is the probability that a site remains susceptible after the $i$th cycle • A site remains susceptible after the $(i+1)$st cycle if: • it was susceptible after the $i$th cycle • and it contacted a susceptible site in the $(i+1)$st cycle • Hence $p_{i+1} = (p_i)^2$, which converges rapidly to 0 when $p_i$ is small • In other words: very unlikely that susceptible sites will remain after a while

  25. Push • A site remains susceptible after the $(i+1)$st cycle if: • it was susceptible after the $i$th cycle • and no infectious site contacted it in the $(i+1)$st cycle • Hence $p_{i+1} = p_i (1 - 1/n)^{n(1 - p_i)}$ • Approximately (small $p_i$, large $n$): $p_{i+1} = p_i e^{-1}$ • Converges too, but not nearly as quickly as pull • Hence: pull or push-pull is preferred to just push
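
To make the difference between the two recurrences concrete, the sketch below simply iterates them. The function names and the starting value are illustrative choices, not part of the original slides.

```python
def pull_step(p):
    """Pull: p_{i+1} = p_i^2."""
    return p * p


def push_step(p, n):
    """Push: p_{i+1} = p_i * (1 - 1/n)^(n * (1 - p_i))."""
    return p * (1.0 - 1.0 / n) ** (n * (1.0 - p))


def iterate(step, p0, cycles):
    """Return [p_0, p_1, ..., p_cycles] for the given one-cycle update rule."""
    values = [p0]
    for _ in range(cycles):
        values.append(step(values[-1]))
    return values


if __name__ == "__main__":
    n = 1000
    for name, step in (("pull", pull_step), ("push", lambda p: push_step(p, n))):
        print(name, ["%.3g" % p for p in iterate(step, 0.1, 4)])
```

Pull squares the susceptible fraction each cycle, while push only shrinks it by roughly a constant factor near $e^{-1}$, which is why pull wins once few susceptible sites remain.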

  26. Some Anti-Entropy Optimizations • Comparing DBs is expensive, but since most DBs are pretty similar… • Could maintain a checksum of the db • compare checksums • If they don’t match, then start comparing DBs • Naïve!

  27. Optimizations (cont…) • Define a time window $\tau$ (time by which updates should have spread) • Keep a checksum of the database AND a recent update list w/ age < $\tau$ • 2 sites first exchange checksums and recent update lists • compute new checksums, and then compare • $\tau$ must be chosen well • If n grows too much • expected time for msg spread > $\tau$ • recent update lists likely to differ • Another variation: inverted index of db by timestamp • sites can exchange updates in reverse timestamp order until the checksums match
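
A minimal sketch of the checksum plus recent-update-list idea follows. The data layout (key -> `(value, timestamp)`), the use of SHA-256, and every function name are assumptions for illustration; the slides do not specify them.

```python
import hashlib


def checksum(db):
    """Order-independent digest of a database mapping key -> (value, timestamp)."""
    h = hashlib.sha256()
    for key in sorted(db):
        value, ts = db[key]
        h.update(repr((key, value, ts)).encode())
    return h.hexdigest()


def recent_updates(db, now, tau):
    """Entries younger than the time window tau."""
    return {k: (v, ts) for k, (v, ts) in db.items() if now - ts < tau}


def merge(db, updates):
    """Apply updates to db, keeping the newer timestamp per key."""
    for key, (value, ts) in updates.items():
        if key not in db or db[key][1] < ts:
            db[key] = (value, ts)


def anti_entropy(db_a, db_b, now, tau):
    """Exchange recent updates first; fall back to a full comparison only if
    the checksums still disagree. Returns which path was taken."""
    recent_a = recent_updates(db_a, now, tau)
    recent_b = recent_updates(db_b, now, tau)
    merge(db_a, recent_b)
    merge(db_b, recent_a)
    if checksum(db_a) == checksum(db_b):
        return "recent-list"
    merge(db_a, dict(db_b))   # expensive path: compare everything
    merge(db_b, dict(db_a))
    return "full-compare"
```

If $\tau$ is too small for the size of the network, the cheap path fails more often and the exchange degenerates into the full comparison, which is the failure mode the slide warns about.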

  28. Complex Epidemics / Rumor Mongering / Gossip • Replace multicasting • At the expense of slightly larger convergence time • And a distinct, though very small probability of failure • Called complex just to distinguish from simple epidemics like anti-entropy

  29. Basic (Complex) Epidemic • Susceptible site receives a hot rumor and becomes infective • Randomly shares it with another site • “Uniform at Random” • When it contacts a site that already knows the rumor • with probability 1/k it loses interest in sharing the rumor (and becomes removed) • After a while, high probability that everyone knows
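
The basic rumor-mongering rule can be simulated in a few lines. This is a sketch under stated assumptions: synchronous rounds, push only, the feedback-plus-coin variant described on the slide, and a hypothetical function name `rumor_mongering`.

```python
import random


def rumor_mongering(n, k, seed=0):
    """Simulate one rumor among n sites; return the residue (fraction of
    sites that are still susceptible when no infective site is left)."""
    rng = random.Random(seed)
    state = ["S"] * n            # S = susceptible, I = infective, R = removed
    state[0] = "I"
    infective = [0]
    while infective:
        still_infective = []
        for site in infective:
            # Pick another site uniformly at random (never the site itself).
            target = rng.randrange(n - 1)
            if target >= site:
                target += 1
            if state[target] == "S":
                state[target] = "I"
                still_infective.append(target)
                still_infective.append(site)
            elif rng.random() < 1.0 / k:
                state[site] = "R"        # useless contact: lose interest
            else:
                still_infective.append(site)
        infective = still_infective
    return state.count("S") / n


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, rumor_mongering(1000, k))
```

Larger k means a site tolerates more useless contacts before giving up, so fewer sites miss the rumor, at the cost of more traffic.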

  30. Can model with differential equations (fun!) • s+i+r=1 • Differentiate…
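
The equations on this slide did not survive extraction. The block below is a reconstruction under the assumption that the slide followed the standard rumor-spreading model of Demers et al., with $s$, $i$, $r$ the susceptible, infective and removed fractions and $1/k$ the probability of losing interest after a useless contact. It is consistent with the implicit equation for $s$ quoted on the next slide.

```latex
\begin{align*}
  s + i + r &= 1 \\
  \frac{ds}{dt} &= -s\,i \\
  \frac{di}{dt} &= s\,i - \frac{1}{k}(1 - s)\,i \\
  \frac{di}{ds} &= -\frac{k+1}{k} + \frac{1}{k s} \\
  i(s) &= -\frac{k+1}{k}\,s + \frac{1}{k}\ln s + c \\
  i(1) = 0 \;\Rightarrow\; c = \frac{k+1}{k}
    \quad&\Longrightarrow\quad
    i(s) = \frac{k+1}{k}(1 - s) + \frac{1}{k}\ln s \\
  i(s) = 0 \quad&\Longleftrightarrow\quad s = e^{-(k+1)(1 - s)}
\end{align*}
```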

  31. c is determined by $i(1-\varepsilon) = \varepsilon$ • For large n, $\varepsilon$ goes to zero… • Giving a solution: • i(s) is zero when: $s = e^{-(k+1)(1-s)}$ • Yeah, yeah… so what does it mean? • implicit equation for s • s decreases exponentially with k (1/k = prob site becomes removed) • k=1, 20% will miss • k=2, 6% will miss • So with each consecutive round, high probability there will be no susceptibles left
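
The 20% and 6% figures can be checked numerically by solving the implicit equation with fixed-point iteration. The function name `residue` and the starting guess are illustrative choices; the iteration starts below the trivial root s = 1 so that it settles on the smaller root.

```python
import math


def residue(k, start=0.5, tol=1e-12, max_iter=10_000):
    """Solve s = exp(-(k + 1) * (1 - s)) for the root with s < 1."""
    s = start
    for _ in range(max_iter):
        nxt = math.exp(-(k + 1) * (1.0 - s))
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round(residue(k), 4))
```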

  32. Can vary complex epidemics • Concerned with: • Residue – when i is zero, what’s s? (people who never heard the rumor) • Traffic • Delay • $t_{avg}$ - time for a random node to receive the msg • $t_{last}$ - time for the last node who will receive the msg, to receive it

  33. Variations (cont…) • Blind vs Feedback • blind loses interest with probability 1/k regardless of whether the contacted node already knew the msg • Counter vs Coin • With counter, a site loses interest after k unnecessary contacts • Push vs Pull • Basic used push, but can use pull • will work if there is a high number of independent updates • but when the db is quiescent, more useless overhead than push

  34. Variations (cont…) • Minimization • Use push and pull together, and if both sides know the update, then only the site with the smaller counter is incremented (if equal, both are incremented) • Connection limit • If there are a lot of updates, need a connection limit • Pull gets worse but push gets better! • Hunting • If one connection is rejected, try another

  35. So instead of mailing & anti-entropy • Use rumor mongering • And back up with anti-entropy

  36. Death Certificates • With anti-entropy, deletion doesn’t really work • absence of an entry will be replaced by an old version • Death Certificates • carry timestamps • when compared with an older entry, the older entry is deleted • they take up space • but if you delete them, you risk old data being resurrected • Enter: Dormant Death Certificates
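
The resurrection problem and the death-certificate fix can be shown with a small sketch. The tombstone marker `DEAD`, the `(value, timestamp)` layout and the helper names are hypothetical, chosen only for this illustration.

```python
DEAD = "<death-certificate>"   # hypothetical tombstone value


def merge_into(db, other):
    """Anti-entropy in one direction: keep the newer entry per key."""
    for key, (value, ts) in other.items():
        if key not in db or db[key][1] < ts:
            db[key] = (value, ts)


def visible(db):
    """What a client sees: entries not covered by a death certificate."""
    return {k: v for k, (v, ts) in db.items() if v != DEAD}


if __name__ == "__main__":
    # Naive deletion: site a just drops the key, then syncs with stale site b.
    a, b = {}, {"x": ("v", 1)}
    merge_into(a, b)
    print("naive delete:", visible(a))        # the old entry comes back

    # Death certificate: the deletion is itself a timestamped entry.
    a, b = {"x": (DEAD, 2)}, {"x": ("v", 1)}
    merge_into(a, b)
    merge_into(b, a)
    print("with certificate:", visible(a), visible(b))
```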

  37. Dormant Death Certificates • Two thresholds $\tau_1$ and $\tau_2$ • Each server retains the DC within $\tau_1$ • After $\tau_1$, most sites delete the DC, while a few keep it • If old data meets a dormant DC, propagate the DC again • After $\tau_1 + \tau_2$, delete the dormant DC

  38. Dormant DCs (cont…) • Does not scale indefinitely • when n grows so much that the time to propagate DCs exceeds $\tau_1$ • More likely to activate dormant DCs, which are then propagated, adding to overhead… • “The ultimate result is catastrophic failure.”

  39. Dormant DCs (cont…) • Don’t spread dormant DC • And if reactivated, can reset timestamp • But this is wrong (might cancel a legitimate update) • So use a second timestamp, called the activation timestamp, which is set when the DC is reactivated

  40. Spatial Distributions • networks aren’t homogeneous • some links are slower than others • can be broken up into different types of zones • we want to favor locality as we spread updates to minimize traffic

  41. Spatial Distributions (cont…) • probability of connecting to a site at distance d is proportional to $d^{-a}$, where a is to be determined • intuitively, a indicates how local your connections are going to be • So: increase in a -> increase in locality • w/ increased locality, need to compensate in order to “break out of” locality • more connections • more rounds • Also generalized to more dimensions: $d^{-2D}$
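
A distance-biased partner choice can be sketched as follows. The linear topology (sites on a line, distance |i - j|) and the function names are assumptions for the example; the real CIN topology was more complex.

```python
import random


def partner_weights(site, n, a):
    """Unnormalised weight d^(-a) for every other site on a line of n sites."""
    return [0.0 if j == site else abs(j - site) ** (-a) for j in range(n)]


def choose_partner(site, n, a, rng):
    """Pick a partner with probability proportional to d^(-a)."""
    weights = partner_weights(site, n, a)
    return rng.choices(range(n), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for a in (0, 1, 2):
        picks = [choose_partner(50, 100, a, rng) for _ in range(2000)]
        mean_dist = sum(abs(p - 50) for p in picks) / len(picks)
        print("a =", a, "mean distance ~", round(mean_dist, 1))
```

With a = 0 the choice is uniform; as a grows, nearby sites dominate, which reduces traffic on long links but slows the spread to distant regions.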

  42. Spatial Distribution • Anti-Entropy • notice Bushey (trans-Atlantic) traffic • uniform (75.74) vs a=2 (2.38) • For gossiping: • since rumors eventually become inactive, it needs to spread a lot in the beginning • hence, pump up k

  43. Summary for Demers et al • Direct Mailing • Rumor Mongering • Anti-Entropy • Issues • Research into effect of and optimizing for topology • Need to know S • Scalability with n • churn • Bimodal Multicast will address: • What about throughput stability? • What about a higher rate of msgs?

  44. Bimodal multicast • A technique that applies the epidemic concept to achieve scalable and reliable multicast • Uses the epidemic approach in the form of anti-entropy • Randomly choose members in the group • Synchronize state

  45. Two classes of multicast • strong reliability • atomicity • delivery ordering • virtual synchrony • security • real-time • more overhead, unpredictable behavior under some situations • best-effort reliability • scalable • provide no end-to-end delivery guarantee • No strong membership view • Certain level of failure discovery • SRM, MUSE, RMTP, etc.

  46. Multicast: Examples • Virtual synchrony • Strongly reliable • significant degradation with even just a few node failures • suitable for small groups, limited to short bursts of multicasts • SRM • Best-effort reliable • Prone to stochastic failures • Meltdown can occur in a large network • Neither addresses the stability problem under failures

  47. Fault-tolerance problem • Virtual synchrony performs badly under failures

  48. Bimodal multicast • Also called probabilistic broadcast (pbcast) • fills the gap between the two approaches • scalable • predictably reliable even under bad conditions • Complements existing mechanisms, such as Virtual Synchrony • Atomic • Provides stability • Throughput stability • Multicast stability

  49. Pbcast protocol • consists of two concurrent subprotocols • Optimistic dissemination protocol, such as IP-multicast • Two-phase anti-entropy protocol to deal with the synchronization problem • first phase detects message loss • second phase corrects losses
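
The two anti-entropy phases can be sketched as a digest comparison followed by a retransmission request. This is a simplified illustration, not the pbcast wire protocol: message ids, the store layout (id -> payload) and the function names are all hypothetical.

```python
def detect_losses(local_store, peer_digest):
    """Phase 1: compare a peer's digest (ids it has received) with our own
    store and report the ids we are missing."""
    return sorted(set(peer_digest) - set(local_store))


def correct_losses(local_store, peer_store, missing):
    """Phase 2: solicit the missing messages from the peer that sent the
    digest; the peer may have discarded some of them already."""
    recovered = []
    for msg_id in missing:
        if msg_id in peer_store:
            local_store[msg_id] = peer_store[msg_id]
            recovered.append(msg_id)
    return recovered


if __name__ == "__main__":
    peer = {1: "a", 2: "b", 3: "c"}
    local = {1: "a", 3: "c"}          # message 2 was lost by the multicast
    missing = detect_losses(local, peer.keys())
    print(missing, correct_losses(local, peer, missing), local)
```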

  50. Optimistic dissemination protocol • each node must possess the list of all members • generate a set of spanning trees • Simple algorithms • Randomly choose a spanning tree • every node uses the same spanning tree to forward the message • The set of spanning trees must be recalculated each time nodes join or leave
