
Topics in Reliable Distributed Systems 048961

Topics in Reliable Distributed Systems 048961. Fall 2003-2004 Dr. Idit Keidar. Course Overview. Graduate level. Prerequisite: an introductory course on distributed computing. Please see me if you didn’t take one. Format: reading group & seminar.



  1. Topics in Reliable Distributed Systems 048961 Fall 2003-2004 Dr. Idit Keidar

  2. Course Overview • Graduate level. • Prerequisite: an introductory course on distributed computing. • Please see me if you didn’t take one. • Format: reading group & seminar. • Covering “hot” topics in reliable distributed computing from recent research papers. • Discussion and evaluation of papers.

  3. So What Are Those “Hot” Topics • Peer-to-peer systems. • Application-level multicast. • Overlay networks. • Gossip-based (epidemic) protocols. • Distributed file systems. • Security (e.g., of all the above). • Solutions of the above for wireless networks.

  4. Requirements and Grading • Reading the papers (one a week). • Handing in short paper summaries – 15% • Participating in class discussions – 10% • Presenting one of the papers – 75% • Select a paper within the next 2 weeks.

  5. Reading The Papers • This is a reading group. • This means that you should read each paper before it is discussed. • Read the entire paper and be familiar with all its content. • Most will be conference papers. • You don’t need to understand everything, check previous work, or memorize details. • Hand in a short summary of the paper (unless you are presenting it) by e-mail to me the night before the lecture. • Any time before 8:00am the morning of the lecture is considered part of the night before.

  6. Paper Summaries • Total of ½ a page to 1 page long (no more!!). • One-paragraph overview. • What question is the paper trying to answer? • What are the main results? • What did you learn? • What questions remain unanswered? • What didn’t you understand? • Short discussion of the paper’s strengths and weaknesses.

  7. Evaluating Paper Strengths and Weaknesses • Is the paper answering the “right” question? • Does it make reasonable assumptions? • How novel is the solution? • Is the solution technically sound? • How well is the solution evaluated? • Expected impact. (Hard to guess). • Writing level: is the paper clearly written? Is it self-contained?

  8. Paper Presentations • You should fully understand the paper, be familiar with previous work, and be able to compare the paper with other similar work. • The presentation should include: • Summary and evaluation. • Comparison with other work. • List of topics to discuss in class. • It is highly recommended to discuss the presentation with me beforehand.

  9. Contact Me • Idit Keidar <idish@ee> • Please send me e-mail with 048961 in the subject, and I’ll add you to the course mailing list. • Office hours: Tue 10:30-11:30 Mayer 960. • Let me know in the coming two weeks what you would like to present. • Schedule will be posted on the course web page.

  10. Background: Multicast

  11. Unicast, Broadcast, Multicast • Unicast – point-to-point communication. • Most network services focus on unicast. • Broadcast – sending to all hosts. • Common and efficient in radio networks, LANs. • Inappropriate for very large widely distributed networks (e.g., the Internet). • Multicast – sending to a selected group of hosts. • mcast(group, msg) – multicast a message. • deliver(msg) – deliver a message.

  12. Multicast Groups • Hosts choose to be members of a group. • Group membership = set of group members. • Group membership is usually dynamic. • Nodes join and leave over time. • Messages should be delivered by the current members. • The term “current” is not very accurate in a distributed system.
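
The two primitives mcast(group, msg) and deliver(msg), together with dynamic group membership, can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions: the `Host` and `Group` classes and the `join`/`leave` methods are hypothetical names, there is no real network, and "current members" simply means the members at the instant `mcast` is called, which is exactly the notion that becomes imprecise in a real distributed system.

```python
class Host:
    """A hypothetical end host that records every message delivered to it."""

    def __init__(self, name):
        self.name = name
        self.delivered = []

    def deliver(self, msg):
        # deliver(msg): hand the message up to the application.
        self.delivered.append(msg)


class Group:
    """A hypothetical multicast group: the set of hosts that chose to join."""

    def __init__(self):
        self.members = set()

    def join(self, host):
        self.members.add(host)

    def leave(self, host):
        self.members.discard(host)


def mcast(group, msg):
    # mcast(group, msg): every member at the time of the call delivers msg.
    for host in list(group.members):
        host.deliver(msg)


a, b, c = Host("a"), Host("b"), Host("c")
g = Group()
g.join(a)
g.join(b)
mcast(g, "m1")   # c is not a member yet, so it does not deliver m1
g.leave(b)
g.join(c)
mcast(g, "m2")   # membership changed between the two multicasts
```

After this run, `a` has delivered both messages, `b` only `m1`, and `c` only `m2`.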

  13. Why Multicast? • Content distribution. • E.g., stock prices, live video broadcast, media-on-demand, web caching (Akamai). • Multi-user applications. • E.g., chat, multi-user games, on-line conferencing, and collaborative computing. • Replication of data/services for fault-tolerance. • Multicast to all replicas for state-machine replication. • Parallel programs running on clusters, Grid computing.

  14. Multicast Characteristics • Number of sources – • Single source – point-to-multipoint. • E.g., content distribution (one or few sources). • Multiple sources – multipoint-to-multipoint. • E.g., chat and collaborative computing. • Types of Guarantees – • Best effort – like UDP. • Reliable. What exactly does this mean? • QoS – real-time latency and guaranteed bandwidth.

  15. IP Multicast • Best effort. • Extend hosts’ IP protocol stack to support multicast addressing. • Class D addresses. • Extend routers or add multicast routers. • Use hardware broadcast where available (for efficiency). • Minimize inter-LAN traffic by forwarding over gateways only once.
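
As a small illustration of Class D addressing: IPv4 multicast addresses are those in the block 224.0.0.0/4, i.e., whose four high-order bits are 1110. The sketch below checks this with Python's standard `ipaddress` module; the helper name `is_class_d` and the sample addresses are chosen for illustration only.

```python
import ipaddress

# Class D is the block 224.0.0.0/4: addresses whose top four bits are 1110.
CLASS_D = ipaddress.ip_network("224.0.0.0/4")


def is_class_d(addr):
    """Return True if addr (a dotted-quad string) is an IPv4 multicast address."""
    return ipaddress.ip_address(addr) in CLASS_D


examples = ["224.0.0.1", "239.255.255.250", "192.168.1.10", "240.0.0.1"]
results = {addr: is_class_d(addr) for addr in examples}
```

The first two sample addresses fall inside the Class D block; the last two do not.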

  16. IGMP: Internet Group Management Protocol • Hosts can join and leave groups. • Multicast gateways keep track of which groups have subscribers in their subnet. • Use broadcast on the subnet.

  17. Multicast Routing • Based on multicast trees. • Multiple protocols for maintaining the trees. • Source-based tree for a single active source. • Optimized for shortest paths, e.g., DVMRP (Distance Vector Multicast Routing Protocol). • Shared trees (core-based) for multiple senders. • Messages sent down-stream only. • Trees change as membership changes.
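
To make the idea of a source-based tree concrete, here is a minimal sketch that builds a hop-count shortest-path tree from a source with breadth-first search and then keeps only the branches that lead to group members. It is not DVMRP or any other real routing protocol; the topology, the node names, and the functions `shortest_path_tree` and `prune_to_members` are assumptions made for the example.

```python
from collections import deque


def shortest_path_tree(graph, source):
    """BFS from the source; returns {node: parent} for hop-count shortest paths."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def prune_to_members(parent, members):
    """Keep only the tree edges that lie on a path from the source to a member."""
    edges = set()
    for member in members:
        node = member
        while parent.get(node) is not None:
            edges.add((parent[node], node))
            node = parent[node]
    return edges


# Hypothetical topology: source S, routers R1..R4, member hosts H1 and H2.
graph = {
    "S": ["R1", "R2"],
    "R1": ["S", "R3"],
    "R2": ["S", "R4"],
    "R3": ["R1", "H1"],
    "R4": ["R2", "H2"],
    "H1": ["R3"],
    "H2": ["R4"],
}
parent = shortest_path_tree(graph, "S")
tree_one_member = prune_to_members(parent, {"H1"})
tree_two_members = prune_to_members(parent, {"H1", "H2"})
```

With only `H1` in the group, the tree contains just the branch through `R1` and `R3`; once `H2` joins, the branch through `R2` and `R4` is added, which illustrates how trees change as membership changes.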

  18. MBone – The Internet’s Multicast Backbone • Hosts and subnets supporting IP multicast. • Virtual topology covering a subset of the Internet with “islands” • Routing between islands in “tunnels”. • Virtual point-to-point links encapsulating multicast messages as IP over IP. • Incremental deployment of IP multicast in islands.

  19. Status of IP Multicast • Gained popularity in the 90s. • 1992: 20 subnets on the MBone. • 1996: 2800 subnets on the MBone. • But did not “catch on” further. • Avoided by organizations fearing flooding of their networks with multicast traffic. • Now mostly unavailable.

  20. Application-Level Multicast (ALM) • The new kid on the block. • A.k.a. End-host multicast. • Do-it-yourself multicast – • No network support or multicast routers. • Usually use unicast communication only. • Hosts organize themselves in a multicast group. • Fits well with the peer-to-peer philosophy: self-organizing dynamic systems.

  21. Pros and Cons?

  22. Overlay Networks • A virtual structure imposed over the physical network (Internet). • Over the Internet, there is a (high-level) unicast link between every pair of hosts. • An overlay uses a fixed subset of these unicast links. • Most popular approach to ALM. • Many recent examples: Narada (end-host-multicast), Overcast, ScatterCast, ALMI, Scribe, Bayeux, Yoid, NICE, SelectCast, RelayCast, Jungle Monkey, I3, Bullet, SplitStream… • Some will be presented in this course.

  23. Characteristics of ALM Overlays • Most multicast on a spanning tree. • Pros and cons? • Balanced trees are important. Why? • But many have extra links in the overlay for control, and for back-up when a spanning tree link fails. (E.g., Yoid, Overcast, SelectCast, HMTP). • Most are intended for single-source multicast. • Most provide best-effort reliability. • Replacing IP Multicast. • Some provide reliability via TCP links, buffering, loss detection, and retransmissions.
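
One way to explore the question of why balance matters is to compare spanning-tree shapes over the same set of hosts. The sketch below is an illustration only: it measures the tree depth (overlay hops from the source to the farthest host) and the maximum fan-out (number of unicast copies any single host must send) for three hypothetical 7-host trees. The function `tree_stats` and the example trees are assumptions, not taken from any of the systems listed.

```python
def tree_stats(children, root):
    """Return (depth in overlay hops, max unicast copies sent by any one host)."""
    depth = 0
    max_fanout = 0
    frontier = [(root, 0)]
    while frontier:
        node, level = frontier.pop()
        depth = max(depth, level)
        kids = children.get(node, [])
        max_fanout = max(max_fanout, len(kids))
        frontier.extend((kid, level + 1) for kid in kids)
    return depth, max_fanout


# Three spanning trees over hosts 0..6, all rooted at the source, host 0.
chain = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [6]}
binary = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
star = {0: [1, 2, 3, 4, 5, 6]}

chain_stats = tree_stats(chain, 0)
binary_stats = tree_stats(binary, 0)
star_stats = tree_stats(star, 0)
```

The chain has minimal fan-out but the longest path, the star has the shortest path but loads the source with every copy, and the balanced binary tree sits between the two.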

  24. Reliable Multicast • In the “old days”, solutions based on IP Multicast. • RMTP (Reliable Multicast Transport Protocol) • Single source only. • SRM (Scalable Reliable Multicast) • Multiple sources. • Let the application decide what “reliability” is, determine policy for buffer management and retransmissions. • Not so scalable after all.
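
Loss detection and retransmission, the basic ingredients of reliable multicast, can be sketched generically with sequence numbers. The code below is not RMTP or SRM; it is a hypothetical single-source receiver (class `Receiver`, with a plain dictionary standing in for the sender's buffer) that notices gaps in the sequence numbers it has seen and asks for the missing packets.

```python
class Receiver:
    """Hypothetical receiver that detects losses from gaps in sequence numbers."""

    def __init__(self):
        self.received = {}  # sequence number -> payload

    def on_packet(self, seq, payload):
        """Store the packet; return the sequence numbers now known to be missing."""
        self.received[seq] = payload
        highest = max(self.received)
        return [s for s in range(highest + 1) if s not in self.received]

    def in_order(self):
        """Return the payloads that can be delivered in order (contiguous prefix)."""
        out = []
        seq = 0
        while seq in self.received:
            out.append(self.received[seq])
            seq += 1
        return out


sender_buffer = {0: "a", 1: "b", 2: "c", 3: "d"}  # sender keeps copies for repair

rx = Receiver()
first_gap = rx.on_packet(0, sender_buffer[0])   # nothing missing yet
missing = rx.on_packet(3, sender_buffer[3])     # packets 1 and 2 were lost
before_repair = rx.in_order()

for seq in missing:                             # retransmission of the lost packets
    rx.on_packet(seq, sender_buffer[seq])
after_repair = rx.in_order()
```

How long the sender must keep packets in `sender_buffer`, and who answers a retransmission request, are exactly the buffer-management and retransmission policies the slide refers to.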

  25. Group Communication Toolkits • Supporting strong reliability. • Virtual Synchrony model. • Addressing membership, reliability, flow control, message ordering, etc. • First in LAN only, later in WANs. • Example systems: Isis, Horus, Ensemble, Transis, Psync, Phoenix, Relacs, Newtop, Totem, NavTech, RMP, Spread, Xpand.

  26. Virtual Synchrony Semantics [Birman, Joseph 87] • Group members all see events in same order • Events: messages, process crash/join. • Basic component: group membership • Reports changes in set of group members. • Each member knows of all the others. • Powerful abstraction for fault-tolerant “state-machine” replication. • Connected members go through same states. • New members get state transfer.
  • Inherently not scalable.
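
The link between ordered delivery and state-machine replication can be shown with a toy example. The sketch assumes that some group communication layer (not modeled here) already delivers the same events in the same order to every member; the `Replica` class, the key-value events, and the way state transfer is done by copying a dictionary are all hypothetical simplifications.

```python
class Replica:
    """A hypothetical deterministic state machine: a small key-value store."""

    def __init__(self, state=None):
        self.state = dict(state or {})

    def apply(self, event):
        key, value = event
        self.state[key] = value


# Events in the single order that all group members are assumed to see.
events = [("x", 1), ("y", 2), ("x", 3)]

r1, r2 = Replica(), Replica()
for event in events[:2]:
    r1.apply(event)
    r2.apply(event)

# A new member joins: it gets a state transfer from a current member.
r3 = Replica(state=r1.state)

# From now on all three members see the same events in the same order.
for replica in (r1, r2, r3):
    replica.apply(events[2])
```

Because every replica is deterministic and applies the same events in the same order, all three end in the same state, including the member that joined late.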

  27. Gossip-Based Multicast • Spread information by gossiping about it with your friends. • Also called epidemic algorithms. • Another family of ALM systems, although often not thought of this way. • Randomized algorithms with probabilistic reliability guarantees.

  28. Gossip-Based Multicast • Each node divides its time into gossip rounds. • In each round, the node exchanges information with F random nodes. • Push-based: send contents of buffer to them. • Pull-based: ask them for their buffer content. • Push and pull can be combined. • Optimization: send digest of message names, request missing ones. Pros and cons? • Upon receipt of a message (from other node or application), insert into message buffer. • Purge old messages from buffer (different policies).
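
The push-based variant described above can be simulated in a few lines. This is a minimal sketch under simplifying assumptions: a single message, synchronized rounds, full membership knowledge (each node can pick any other node), no message loss, and no buffer purging. The function name `push_gossip` and its parameters are illustrative; `fanout` plays the role of F.

```python
import random


def push_gossip(n, fanout, rounds, seed=0):
    """Simulate push-based gossip of one message among n nodes.

    Returns the number of nodes holding the message after each round.
    """
    rng = random.Random(seed)
    has_msg = [False] * n
    has_msg[0] = True                       # node 0 is the source
    history = []
    for _ in range(rounds):
        # Only nodes that held the message at the start of the round send.
        senders = [i for i in range(n) if has_msg[i]]
        for sender in senders:
            others = [j for j in range(n) if j != sender]
            for target in rng.sample(others, fanout):
                has_msg[target] = True      # push buffer contents to the target
        history.append(sum(has_msg))
    return history


history = push_gossip(n=100, fanout=3, rounds=20)
```

Printing `history` shows the typical epidemic shape: rapid growth in the first few rounds, followed by a slower tail while the last few nodes are reached.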

  29. Requirements and Guarantees (From the Math) • In order for gossip to “work”, each node sends each message O(log N) times. • F*(num rounds in buffer) is O(log N). • This requires each node to know at least O(log N) others (partial membership view). • Reliability very close to 100%. • Graceful degradation with increasing node crash and message loss rates. • Expected latency O(log N).
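
A back-of-the-envelope argument shows why the latency is logarithmic at best. Assume idealized push gossip in which every node holding the message pushes it to F nodes per round, and let I_r (a symbol introduced here) be the number of nodes holding the message after round r. This gives only a lower bound on the number of rounds; the matching O(log N) upper bound comes from the probabilistic analysis, which is not reproduced here.

```latex
\begin{align*}
I_0 = 1, \qquad I_{r+1} \le (1+F)\,I_r
  &\;\Longrightarrow\; I_r \le (1+F)^r, \\
I_r \ge N
  &\;\Longrightarrow\; r \ge \log_{1+F} N = \frac{\ln N}{\ln(1+F)}, \\
N = 10^6,\ F = 3
  &\;\Longrightarrow\; r \ge \frac{13.8}{1.39} \approx 10 \text{ rounds}.
\end{align*}
```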

  30. A Brief History of Gossip • First used for anti-entropy in maintaining consistency of databases of mobile users. • Demers et al. 1987. • Used for probabilistic reliable multicast (pbcast) over IP Multicast in Ensemble using complete group membership. • Recent work uses gossip by itself, without IP multicast, using its own membership. • E.g., Lightweight probabilistic broadcast (lpbcast). • Not only for multicast.

  31. Preview • We’ll look closely at specific systems. • ALM overlays, reliable and best-effort. • Gossip-based algorithms. • We’ll also look at security considerations and mobility considerations of such systems. • We’ll put ALMs in the context of peer-to-peer computing.
