Reliable Multicasting with JGroups



  1. Reliable Multicasting with JGroups Bela Ban, Jan 2004 belaban@yahoo.com http://www.jgroups.org

  2. Overview • API, architecture • Protocols • Building Blocks • Performance • Future, Conclusion EBIG, Oakland Jan 21 2004

  3. What Is It? • Toolkit for reliable multicasting • Fragmentation • Message retransmission • Ordering • Group membership, membership change notification • LAN or WAN based

  4. License • JGroups is a toolkit (JAR), to be linked against an application • Open Source under LGPL • Commercial products can use JGroups without having to LGPL their code • Modifications to JGroups itself need to be LGPL'ed (if distributed) • Dual licensing in the future

  5. API • Channel: similar to java.net.MulticastSocket • plus group membership, reliability • Operations: • Create a channel with a set of properties • Connect to a group X. Everyone that connects to X will see each other • Send a message to all members of X • Send a message to a single member

  6. API • Receive a message • Retrieve membership • Be notified when members join, leave (including crashes) • Disconnect from the group • Close the channel

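  7. API

         import org.jgroups.JChannel;
         import org.jgroups.Message;

         // Create a channel configured by an XML protocol stack file
         JChannel channel=new JChannel("file://home/bela/default.xml");
         // Join the group; everyone connected to "demo-group" sees each other
         channel.connect("demo-group");
         System.out.println("members are: " + channel.getView().getMembers());
         // A null destination sends the message to all members
         Message msg=new Message(null, null, "Hello world");
         channel.send(msg);
         // Wait for the next message (timeout 0)
         Message m=(Message)channel.receive(0);
         System.out.println("received msg from " + m.getSrc() + ": " + m.getObject());
         channel.disconnect();
         channel.close();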

  8. Group topology

  9. Architecture of JGroups

  10. Demo • Draw • ReplicatedTree: shared state

  11. Stats • JGroups has ~ 90KLOC • 30KLOC protocols • 45KLOC main + building blocks • 15KLOC unit tests • ~ 90 protocols shipped with JGroups • Set of well-tested stacks (in XML files)

  12. Available protocols I • Transport • UDP, TCP, TCP_NIO, TUNNEL, JMS, LOOPBACK • Discovery • PING, TCPPING, TCPGOSSIP, UDPPING • Group membership • Reliable delivery & FIFO • NAKACK, SMACK, UNICAST

  13. Available protocols II • Failure detection • FD, FD_SOCK, FD_PID, FD_SIMPLE, FD_PROB, VERIFY_SUSPECT • Security • ENCRYPT, SSL ConnectionTable (n/a) • Fragmentation (FRAG) • State transfer (STATE_TRANSFER)

  14. Available protocols III • Ordering • FIFO, CAUSAL, TOTAL, TOTAL_TOKEN • Virtual Synchrony • FLUSH, QUEUE, VIEW_ENFORCER • Probabilistic Broadcast • PBCAST • Merging: • MERGE(2), MERGEFAST

  15. Available protocols IV • Distributed message garbage collection • STABLE • Debugging • PERF, TRACE, PRINTOBJS, SIZE, BSH • Simulation • SHUFFLE, DELAY, DISCARD, DEADLOCK, LOSS, PARTITIONER

  16. Available protocols V • Dynamic configuration • AUTOCONF • Flow control • FLOW_CONTROL, FC • Misc • PIGGYBACK, COMPRESS

  17. Transport • Task • Send messages from above to all members in the group, or to a single member • Receive messages from NW, pass up stack • UDP: multicast and multiple UDP unicast • TCP: mcast done by multiple TCP unicasts • TUNNEL: send to external router, e.g. through firewall

  18. Discovery • Task • Initial discovery of members • Used by GMS to determine coordinator to send JOIN request to • Each member returns its own addr, plus the addr of the coordinator • Typical response ({A,A}, {B,A}, {C,A}) • Wait for n milliseconds or m responses
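The discovery responses above can be turned into a JOIN target roughly as follows. This is a minimal sketch, not JGroups code: `DiscoverySketch`, `PingResponse`, and `determineCoordinator` are hypothetical names, and picking the coordinator named by the most responses is an assumption of the sketch (the slide only says each response carries the coordinator's address).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiscoverySketch {
    /** One discovery response: the responder's own address and the coordinator it reports. */
    static class PingResponse {
        final String ownAddr;
        final String coordAddr;

        PingResponse(String ownAddr, String coordAddr) {
            this.ownAddr = ownAddr;
            this.coordAddr = coordAddr;
        }
    }

    /**
     * Returns the coordinator named by the most responses, or null when there
     * are no responses (the caller would then become coordinator itself).
     * Majority voting is an assumption of this sketch.
     */
    static String determineCoordinator(List<PingResponse> responses) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (PingResponse r : responses) {
            int v = votes.merge(r.coordAddr, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = r.coordAddr;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The typical response set from the slide: ({A,A}, {B,A}, {C,A})
        List<PingResponse> responses = Arrays.asList(
                new PingResponse("A", "A"),
                new PingResponse("B", "A"),
                new PingResponse("C", "A"));
        System.out.println("JOIN goes to: " + determineCoordinator(responses));
    }
}
```

An empty response list yields null, which corresponds to the "first member becomes coordinator" case.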

  19. Discovery - UDP • Multicast discovery request • Each member responds with a unicast UDP datagram (local-addr, coord-addr), back to the sender

  20. Discovery - TCPGOSSIP • Can be used by both UDP and TCP • External GossipServer • org.jgroups.stack.GossipServer • Maintains table of <group, members> • Each member registers (groupname, own addr) • Lease based - members have to periodically renew registration • Multiple GossipServers possible

  21. Discovery - TCPGOSSIP • To obtain initial membership for a given group, TCPGOSSIP contacts the GossipServer • Membership info does not need to be accurate - only goal is to determine coord to send JOIN request to
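The lease-based <group, members> table described for the GossipServer can be illustrated with a small in-memory registry. This is a sketch under assumptions, not the API of `org.jgroups.stack.GossipServer`: the class `GossipRegistrySketch` and its methods are hypothetical, and the current time is passed in explicitly so the expiry behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GossipRegistrySketch {
    // group name -> (member address -> lease expiry time in ms)
    private final Map<String, Map<String, Long>> table = new HashMap<>();
    private final long leaseMillis;

    GossipRegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Registers a member, or renews its lease if it is already registered. */
    void register(String group, String member, long nowMillis) {
        table.computeIfAbsent(group, g -> new LinkedHashMap<>())
             .put(member, nowMillis + leaseMillis);
    }

    /** Returns the members whose lease has not expired; expired entries are dropped. */
    List<String> getMembers(String group, long nowMillis) {
        Map<String, Long> members = table.get(group);
        if (members == null) {
            return new ArrayList<>();
        }
        members.values().removeIf(expiry -> expiry <= nowMillis);
        return new ArrayList<>(members.keySet());
    }

    public static void main(String[] args) {
        GossipRegistrySketch registry = new GossipRegistrySketch(1000);
        registry.register("demo-group", "A", 0);
        registry.register("demo-group", "B", 500);
        System.out.println(registry.getMembers("demo-group", 900));
        registry.register("demo-group", "A", 900);   // A renews, B does not
        System.out.println(registry.getMembers("demo-group", 1600));
    }
}
```

Because the result only needs to lead a joiner to the coordinator, a slightly stale member list is acceptable, which is why simple lease expiry is enough here.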

  22. Discovery - TCPPING • Give a set of well known members • For discovery, those members are pinged • If at least 1 responds, we can find the coordinator • Does not require additional process

  23. Group Membership • Task • Maintain a list of members • Notify members when a new member joins, or an existing member leaves (or crashes) • Each member has the same ordered list • List can be retrieved by Channel.getView() • First (= oldest) member is coordinator • If coord crashes, 2nd oldest takes over

  24. Group Membership - JOIN • New member uses discovery to find coord • If first member -> become coord • Else: sends JOIN to coord • Coord adds new member to list, multicasts new view (member list) to all members • If 2 initial members are started at the same time, MERGE protocol merges them into a single group

  25. Group Membership - LEAVE • Member sends LEAVE to coord • Coord multicasts new view to all members

  26. Group membership - CRASH • Failure detection protocol sends up SUSPECT event • VERIFY_SUSPECT double checks • GMS multicasts new view (not containing crashed member) • If member resurfaces, it will be shunned • Has to leave and rejoin group
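The membership rules from the slides above (ordered list, oldest member is coordinator, second oldest takes over) can be sketched in a few lines. `ViewSketch` and its methods are hypothetical names for illustration; real views are installed by the GMS protocol via multicast, which this single-process sketch does not model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewSketch {
    // Members ordered by join time: index 0 is the oldest member
    private final List<String> members = new ArrayList<>();
    private int viewId = 0;

    /** The first (oldest) member acts as coordinator; null for an empty group. */
    String coordinator() {
        return members.isEmpty() ? null : members.get(0);
    }

    /** JOIN: append the new member and install a new view. */
    void join(String member) {
        if (!members.contains(member)) {
            members.add(member);
            viewId++;
        }
    }

    /** LEAVE or CRASH: remove the member and install a new view. */
    void remove(String member) {
        if (members.remove(member)) {
            viewId++;
        }
    }

    List<String> getMembers() {
        return Collections.unmodifiableList(new ArrayList<>(members));
    }

    int getViewId() {
        return viewId;
    }

    public static void main(String[] args) {
        ViewSketch view = new ViewSketch();
        view.join("A");
        view.join("B");
        view.join("C");
        System.out.println("coordinator: " + view.coordinator());
        view.remove("A"); // coordinator crashes or leaves
        System.out.println("new coordinator: " + view.coordinator()
                + ", members: " + view.getMembers());
    }
}
```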

  27. Failure detection • Task • Detect if a member has crashed and send SUSPECT event up the stack (to be handled by GMS) • Logical ring over membership • Each member pings its neighbor to the right

  28. Failure detection - FD
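The "logical ring over membership" can be read as: each member monitors the next entry in the ordered view, and the last member wraps around to the first. A minimal sketch, with the hypothetical helper `neighborToPing`:

```java
import java.util.Arrays;
import java.util.List;

class FdRingSketch {
    /**
     * Returns the member that 'self' should ping: its right-hand neighbor in
     * the ordered view, wrapping around at the end. Returns null when 'self'
     * is not in the view or has no one else to monitor.
     */
    static String neighborToPing(List<String> view, String self) {
        int i = view.indexOf(self);
        if (i < 0 || view.size() < 2) {
            return null;
        }
        return view.get((i + 1) % view.size());
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C");
        for (String member : view) {
            System.out.println(member + " pings " + neighborToPing(view, member));
        }
    }
}
```

When a view change removes a member, recomputing the neighbor from the new view closes the ring again.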

  29. Reliable delivery & FIFO • Lossless and FIFO delivery for multicast and unicast messages • Multicast: NAK and ACK • Unicast: ACK • Missing messages (gaps) are retransmitted • Sender resends or • Receiver requests retransmission
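The receiver-driven half of this (detect a gap, request retransmission, deliver in FIFO order) can be sketched with per-sender sequence numbers. This is an illustration under assumptions, not the NAKACK implementation: `NakReceiverSketch` is a hypothetical class, sequence numbers are assumed to start at 1, and only one sender is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class NakReceiverSketch {
    private long nextToDeliver = 1;                                 // next seqno expected in FIFO order
    private final TreeMap<Long, String> buffer = new TreeMap<>();   // out-of-order messages
    private final List<String> delivered = new ArrayList<>();

    /**
     * Accepts a message and delivers everything that is now contiguous.
     * Returns the seqnos still missing below the highest buffered one,
     * i.e. the ones a receiver would ask the sender to retransmit.
     */
    List<Long> receive(long seqno, String payload) {
        if (seqno >= nextToDeliver) {
            buffer.putIfAbsent(seqno, payload);   // ignore duplicates and already delivered seqnos
        }
        while (buffer.containsKey(nextToDeliver)) {
            delivered.add(buffer.remove(nextToDeliver));
            nextToDeliver++;
        }
        List<Long> missing = new ArrayList<>();
        if (!buffer.isEmpty()) {
            long highest = buffer.lastKey();
            for (long s = nextToDeliver; s < highest; s++) {
                if (!buffer.containsKey(s)) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        NakReceiverSketch receiver = new NakReceiverSketch();
        receiver.receive(1, "a");
        System.out.println("request retransmission of: " + receiver.receive(3, "c"));
        receiver.receive(2, "b"); // retransmitted message fills the gap
        System.out.println("delivered in order: " + receiver.delivered());
    }
}
```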

  30. Encryption • Uses public/private encryption to join new member and get shared group key • Shared key is used to encrypt all messages • Group key is recomputed on joins/leaves • SSL ConnectionTable • As alternative, to be used in TCP • Uses SSLSocket rather than Socket

  31. Properties configuration • Plain string format

          "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32;" +
          "mcast_send_buf_size=64000;mcast_recv_buf_size=64000):" +
          "PING(timeout=2000;num_initial_members=3):" +
          "MERGE2(min_interval=5000;max_interval=10000):" +
          "FD_SOCK:" +
          "VERIFY_SUSPECT(timeout=1500):" +
          "pbcast.NAKACK(max_xmit_size=8096;gc_lag=50;retransmit_timeout=600,1200,2400):" +
          "UNICAST(timeout=600,1200,2400,4800):" +
          "pbcast.STABLE(desired_avg_gossip=20000):" +
          "FRAG(frag_size=8096;down_thread=false;up_thread=false):" +
          "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
          "shun=false;print_local_addr=true)"

      • URL / XML
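The plain string format has a regular shape: protocols separated by ":", each with an optional "(key=value;key=value)" parameter list. The following sketch parses that shape into an ordered map. It is not the JGroups configurator; `StackConfigSketch` and `parse` are hypothetical names, and the grammar is inferred only from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StackConfigSketch {
    /** Parses "P1(k=v;k=v):P2:P3(k=v)" into protocol name -> parameters, in stack order. */
    static Map<String, Map<String, String>> parse(String props) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i <= props.length(); i++) {
            // A virtual ':' after the last character flushes the final protocol
            char c = i < props.length() ? props.charAt(i) : ':';
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {
                addProtocol(stack, props.substring(start, i));
                start = i + 1;
            }
        }
        return stack;
    }

    private static void addProtocol(Map<String, Map<String, String>> stack, String spec) {
        spec = spec.trim();
        if (spec.isEmpty()) {
            return;
        }
        int open = spec.indexOf('(');
        String name = open < 0 ? spec : spec.substring(0, open);
        Map<String, String> params = new LinkedHashMap<>();
        if (open >= 0) {
            String body = spec.substring(open + 1, spec.lastIndexOf(')'));
            for (String kv : body.split(";")) {
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    params.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
                }
            }
        }
        stack.put(name, params);
    }

    public static void main(String[] args) {
        String props = "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32):"
                + "PING(timeout=2000;num_initial_members=3):"
                + "FD_SOCK:"
                + "UNICAST(timeout=600,1200,2400,4800)";
        parse(props).forEach((name, params) -> System.out.println(name + " " + params));
    }
}
```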

  32. Advantages of protocol stacks • Each property is implemented by one protocol • Fragmentation, retransmission, ordering • Protocols are assembled into a stack • Stack has exactly the properties needed by the application / required by the network • Can't get this with java.net.Socket, which always comes with full TCP/IP

  33. Advantages of protocol stacks • Small scope: a protocol does just one job, but does it well • Protocol stacks are fashionable: • Servlet 2.3 filters • Interceptors (Corba, JBoss) • AOP: separation of concerns, e.g. fragmentation should not be an application concern
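To make the "one protocol, one job" point concrete, here is what a fragmentation layer does in isolation: split large payloads on the way down and reassemble them on the way up, so the application never sees fragments. This is a minimal sketch, not the FRAG protocol itself; `FragSketch` is a hypothetical class, and fragment headers and reordering are left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FragSketch {
    /** Down the stack: split a message into chunks of at most fragSize bytes. */
    static List<byte[]> fragment(byte[] msg, int fragSize) {
        if (fragSize <= 0) {
            throw new IllegalArgumentException("fragSize must be positive");
        }
        List<byte[]> frags = new ArrayList<>();
        for (int off = 0; off < msg.length; off += fragSize) {
            frags.add(Arrays.copyOfRange(msg, off, Math.min(msg.length, off + fragSize)));
        }
        return frags;
    }

    /** Up the stack: concatenate fragments (assumed complete and in order) into the original message. */
    static byte[] reassemble(List<byte[]> frags) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] f : frags) {
            out.write(f, 0, f.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] msg = new byte[20000];
        List<byte[]> frags = fragment(msg, 8096); // frag_size from the sample configuration
        System.out.println("fragments: " + frags.size());
        System.out.println("round trip ok: " + Arrays.equals(msg, reassemble(frags)));
    }
}
```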

  34. Benefits • Same application code, different protocol stacks (deployment issue) • Application requirements reflected in protocol stack specification • App focuses on domain specific issues

  35. Building Blocks • Replicated Cache • NotificationBus • Group RPC

  36. Replicated Cache • Shared state across a group • Any change is replicated to all members • New members acquire initial state from coord • Structures supported • Tree • Hashmap • Queues
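The two behaviours listed for the replicated cache (every change reaches all members, a new member starts from the coordinator's state) can be shown with an in-process stand-in for a group. `ReplicatedMapSketch` is a hypothetical class, not one of the JGroups building blocks; a plain loop takes the place of the multicast.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicatedMapSketch {
    // Replicas ordered by join time; index 0 plays the coordinator
    private final List<Map<String, String>> replicas = new ArrayList<>();

    /** A new member joins and acquires its initial state from the coordinator. */
    Map<String, String> join() {
        Map<String, String> replica = new HashMap<>();
        if (!replicas.isEmpty()) {
            replica.putAll(replicas.get(0)); // state transfer
        }
        replicas.add(replica);
        return Collections.unmodifiableMap(replica); // read-only view; writes go through put()
    }

    /** Any change is applied to every replica (stands in for a multicast). */
    void put(String key, String value) {
        for (Map<String, String> replica : replicas) {
            replica.put(key, value);
        }
    }

    public static void main(String[] args) {
        ReplicatedMapSketch group = new ReplicatedMapSketch();
        Map<String, String> a = group.join();
        group.put("x", "1");
        Map<String, String> b = group.join(); // b starts with x=1
        group.put("y", "2");
        System.out.println("replicas equal: " + a.equals(b));
    }
}
```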

  37. NotificationBus • Thin layer on Channel • Notifications sent to all members • Callback when notification is received • Hook for state sharing

  38. Group RPC • Invoke a method call in all members • Get a list of responses • Wait for all responses, a majority, the first, or none (use optional timeout) • Handles crashed members correctly (no blocking)
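The four waiting modes, and the rule that a crashed member must not block the caller, can be written as a single completion check. This is a sketch only: the enum `Mode`, the method `isDone`, and the exact treatment of suspected members are assumptions for illustration, not the behaviour of the JGroups RPC building block. In practice the optional timeout bounds any remaining wait.

```java
class RpcModeSketch {
    enum Mode { ALL, MAJORITY, FIRST, NONE }

    /**
     * Decides whether a group call can return.
     *
     * @param members   number of members the call was sent to
     * @param received  responses received so far
     * @param suspected members reported as crashed since the call was sent
     */
    static boolean isDone(Mode mode, int members, int received, int suspected) {
        switch (mode) {
            case ALL:
                // Do not wait for members that have crashed
                return received + suspected >= members;
            case MAJORITY:
                return received >= members / 2 + 1;
            case FIRST:
                // Give up if every member has crashed
                return received >= 1 || suspected >= members;
            default:
                return true; // NONE: fire and forget
        }
    }

    public static void main(String[] args) {
        System.out.println("ALL, 3 of 4 answered, 1 crashed: "
                + isDone(Mode.ALL, 4, 3, 1));
        System.out.println("MAJORITY, 2 of 4 answered: "
                + isDone(Mode.MAJORITY, 4, 2, 0));
    }
}
```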

  39. Serverless JMS • JMS based on JGroups • Peer-to-peer architecture rather than C/S • Client publishing to a topic • Instead of sending msg to server, and server distributes to multiple clients: publisher multicasts message • JMS Server just another member • Handles persistent messages (DB)

  40. Serverless JMS [diagram comparing publishing via a server (cost: 4 unicasts) with publishing by multicast (cost: 1 multicast)]

  41. Serverless JMS • Clients are still able to publish even when server is down • Caveat: works in scenario where client and server are in same multicast-reachable NW • Status • Topics/Queues available • No TX/XA, no durable subscriptions, no persistent messages • Download (standalone) beta at jboss.org

  42. Where is JGroups used? • JBoss • Clustering • Replication of entity beans, SLSBs and SFSBs • HA-JNDI • Cache invalidation • Session repl (integrated Tomcat, Jetty) • Serverless JMS • Cache • Replicated transactional clustered cache

  43. Where is JGroups used? • JOnAS appserver (clustering) • GroupPac (FT-CORBA impl) • GCT: port to .NET • Replicated Caching • OpenSymphony OSCache • Jakarta Turbine's JCS • SwarmCache

  44. Where is JGroups used? • Session replication • Jetty • Tomcat 4.x • Work in progress on plugin architecture for Tomcat 5.x • Unofficial ones...

  45. Performance • 4 nodes, 1 or 2 senders • 750MHz SunBlade 1000 512MB, 100MB switched ethernet • JGroups 2.1 • 8000 10K msgs, in 200 bursts of 20 (2 senders), sleep after burst = 5ms • 451 msgs/s == 4.5MB/s throughput • Resident heap size 35MB max (-Xmx128m)
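The figures on this slide are consistent with each other, as a quick check shows. The sketch assumes that "10K" means 10 KB per message, that 1 MB is counted as 1000 KB, and that each of the 2 senders sends 200 bursts of 20 messages; none of these readings is stated explicitly on the slide.

```java
class ThroughputCheck {
    /** Total messages sent when each sender sends 'bursts' bursts of 'burstSize' messages. */
    static int totalMessages(int senders, int bursts, int burstSize) {
        return senders * bursts * burstSize;
    }

    /** Throughput in MB/s for a given message rate and message size in KB (1 MB = 1000 KB). */
    static double megabytesPerSecond(int msgsPerSecond, int kilobytesPerMsg) {
        return msgsPerSecond * kilobytesPerMsg / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println("total messages: " + totalMessages(2, 200, 20));
        System.out.println("throughput MB/s: " + megabytesPerSecond(451, 10));
    }
}
```

Under these assumptions the totals come to 8000 messages and about 4.5 MB/s, matching the numbers quoted.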

  46. Performance • 1.4 billion messages total • 4 nodes, 2 senders • Message size = 10K • Average msgs/s: 350 • Max resident mem: 35M (-Xmx128m) • Tests available as part of JG distro • Includes gnuplot scripts to generate graphs

  47. Current and future projects • JBossCache, Serverless JMS • Port to J2ME (first version available on www.jgroups-me.org) • hsqldb (HyperSonic) database replication • JCache JSR 107 compliant impl (JBoss Cache) • Potential work on GroupComm JSR • jcluster project on dev.java.net

  48. Links • www.jgroups.org • "Papers and Articles": link to IBM developerWorks

  49. Questions?
