
Scalable Group Communication In Heterogeneous Cluster

Scalable Group Communication In Heterogeneous Cluster. Filip Hanik, Apache Software Foundation, June 30th, 2006.


Presentation Transcript


  1. Scalable Group Communication In Heterogeneous Cluster • Filip Hanik • Apache Software Foundation • June 30th, 2006

  2. Who am I • fhanik@apache.org • Tomcat Committer / ASF member • Responsible for session replication and clustering • Been involved with ASF since 2001

  3. What we will cover • Introduction to group communication • Challenges in group/cluster communication • Today’s Solutions • Detailed Tribes overview • Tribes – design/configuration/usage • Problems and their solutions • Q & A

  4. What is Group Communication? • 1-to-n communication between software/hardware nodes • Designed to reduce the number of packets compared to 1-to-1 (point-to-point) communication • Also referred to as broadcasting and/or multicasting • broadcast != multicast • broadcast – all nodes receive • multicast – interested (subscribed) nodes receive • Popular academic research topic!! Lots of information available
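The difference between broadcast and multicast, and the packet saving that motivates 1-to-n communication, can be sketched with a small in-memory model. Everything here (the class name, node and group names) is hypothetical: it only shows who receives a message and how many packets the sender itself emits, and it is not a network implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: it tracks who receives a message, not how packets travel.
class GroupDeliveryModel {
    private final List<String> nodes = new ArrayList<>();
    // group name -> nodes that subscribed to that group
    private final Map<String, Set<String>> subscriptions = new HashMap<>();

    void addNode(String node) {
        nodes.add(node);
    }

    void subscribe(String node, String group) {
        subscriptions.computeIfAbsent(group, g -> new HashSet<>()).add(node);
    }

    // Broadcast: every node receives the message.
    List<String> broadcastReceivers() {
        return new ArrayList<>(nodes);
    }

    // Multicast: only nodes subscribed to the group receive it (in node order).
    List<String> multicastReceivers(String group) {
        Set<String> members = subscriptions.getOrDefault(group, Set.of());
        List<String> result = new ArrayList<>();
        for (String node : nodes) {
            if (members.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }

    // Packets the sender itself emits: one per receiver with 1-to-1 unicast,
    // a single packet when the network replicates it (broadcast/multicast).
    static int senderPackets(int receivers, boolean oneToN) {
        if (receivers == 0) {
            return 0;
        }
        return oneToN ? 1 : receivers;
    }
}
```

With three receivers, point-to-point sending costs the sender three packets, while a multicast costs one; the network, not the sender, does the copying.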

  5. Challenges in Group Communication • Multicast is most commonly used • Group consistency and leadership • Delivery guarantee • Group delivery guarantee • Ordering and total ordering • Flow control • Multiple networks

  6. Today’s Solutions • Dozens, if not hundreds, of academic products • Not maintained, not supported, proprietary • Many open source projects • Appia, Spread, Erlang, JGroups… the list goes on • Most are multicast-based, to solve the 1-to-n packet reduction problem

  7. What is a uniform group model? • Nodes are identical • All nodes process, send and receive messages in the same way • All nodes have the same applications • Total ordering is based on the complete group • Note: not the official definition of uniformity in a group setting

  8. When isn’t uniformity enough? • When processes on each node are dynamic: activated, passivated, short- and long-lived • Example: Tomcat webapps • Example: heterogeneous hardware environments • Application management vs. application data replication • Messages with different priorities • Example: a session attribute being replicated vs. a 25MB war file being transferred • Need for different guarantee levels • When most messages are 1-to-m, where m < n

  9. Challenges in heterogeneous clusters • Same challenges as in homogeneous environments • Node attributes change at runtime • Nodes carry different responsibilities • Total ordering of messages that are sent 1-to-m, where m < n
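One way to get a total order for 1-to-m messages is a single sequencer that hands out per-destination sequence numbers, with each receiver holding back early arrivals until the gap before them is filled. This is a minimal sketch under that assumption (one always-available sequencer); it is not necessarily how Tribes orders messages, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sequencer: stamps each 1-to-m message with one sequence number
// per destination. All stamps are issued in a single global order.
class Sequencer {
    private final Map<String, Long> nextSeq = new HashMap<>();

    // Returns destination -> sequence number for this message.
    synchronized Map<String, Long> stamp(List<String> destinations) {
        Map<String, Long> stamps = new HashMap<>();
        for (String dest : destinations) {
            long seq = nextSeq.getOrDefault(dest, 0L);
            stamps.put(dest, seq);
            nextSeq.put(dest, seq + 1);
        }
        return stamps;
    }
}

// Hypothetical receiver: delivers strictly in its own sequence order.
class OrderedReceiver {
    private long expected = 0;
    // Messages that arrived early wait here until the gap before them is filled.
    private final TreeMap<Long, String> holdBack = new TreeMap<>();
    private final List<String> delivered = new ArrayList<>();

    void receive(long seq, String payload) {
        holdBack.put(seq, payload);
        // Deliver every consecutive message starting at the expected number.
        while (holdBack.containsKey(expected)) {
            delivered.add(holdBack.remove(expected));
            expected++;
        }
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Because every stamp comes from one sequencer, two members that both receive messages X and Y deliver them in the same relative order, even though X and Y each go to a different subset of the group. The cost of this design is that the sequencer becomes a bottleneck and a single point of failure.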

  10. What is Tribes? • Tribes is a messaging framework with group communication capabilities • 100% Java, Apache Licensed (2.0) • Born out of the cluster/session replication code from Tomcat 5.0-5.5, early 2006 • Currently alpha; will become the communication framework for Tomcat’s next cluster implementation • Ideas from 2001

  11. Why Tribes? • Many frameworks are not flexible enough • Not enough features • Messages were guaranteed, without delivery feedback • Static configurations for message delivery • Based on 1-to-m delivery, where m<n • License, license, license…

  12. Why Tribes? • Research gap - platforms are proprietary and often suggest protocols that are not standard • Opportunities for httpd & Tomcat and other ASF software integration for more advanced and intelligent clusters • Separation of communication layer • Did I say Apache License?

  13. Why not Tribes? • TCP is connection-based • When you always want to send 1-to-n • Unique scenarios where a highly customized solution might be the best fit • It's not a one-size-fits-all solution, if such a thing exists

  14. Goals • Simplify peer-to-peer and peer-to-group communication for distributed applications • Flexible enough to support a wide range of applications under one runtime configuration • Provide instant feedback on message delivery • Concurrent message delivery, even between two nodes • Parallel delivery to multiple nodes • Clean, intuitive and easy to use, even for complex tasks • All this with low overhead

  15. Feature Overview • Pluggable Modules • Guaranteed Messaging • Different Guarantee Levels • Per message delivery semantics(!) • Pluggable Interceptors (runtime) • Delivery feedback – even for async • Concurrent and parallel delivery • Fixed node hierarchy

  16. Feature: Pluggable Modules • All major components can be swapped out; simple interfaces are defined • Needed when customization is required for lower-level IO operations • Examples • Multicast not available • Proprietary network protocols • SSL • Goal: the default implementation should be enough for 80% of applications that require messaging

  17. Feature: Guaranteed Msg Delivery • Assumes 1-to-m delivery (m < n) • Default implementation is TCP-based • java.io & java.nio • In most cases, TCP (Java) will outperform UDP once flow control and ack/nack for guaranteed delivery are implemented • java.io support for platforms with poor NIO implementations • java.nio preferred

  18. Feature: Guarantee Levels • By default supports 3 levels • NO_ACK – message was sent • Relies on TCP to deliver without node feedback • ACK – message was received • Remote node replies with an ACK • SYNC_ACK – message was processed • Remote node replies with ACK/FAIL_ACK when message has been processed • Allows for message process feedback
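The practical difference between the three levels is what the sender learns and when. This sketch simulates a remote node to make that visible; the enum and class names are hypothetical and are not the Tribes option constants.

```java
import java.util.function.Consumer;

// Hypothetical names; these are not the Tribes option constants.
enum GuaranteeLevel { NO_ACK, ACK, SYNC_ACK }

enum Feedback { SENT, ACK, FAIL_ACK }

class SimulatedRemoteNode {
    private final Consumer<byte[]> processor; // application logic on the remote node

    SimulatedRemoteNode(Consumer<byte[]> processor) {
        this.processor = processor;
    }

    // Returns what the sender learns for a given guarantee level.
    Feedback deliver(byte[] message, GuaranteeLevel level) {
        switch (level) {
            case NO_ACK:
                // No reply at all: the sender only knows TCP accepted the bytes.
                processIgnoringFailure(message);
                return Feedback.SENT;
            case ACK:
                // The ACK confirms receipt only, so a processing failure is
                // never reported back to the sender.
                processIgnoringFailure(message);
                return Feedback.ACK;
            case SYNC_ACK:
                // The reply is sent after processing and reflects its outcome.
                try {
                    processor.accept(message);
                    return Feedback.ACK;
                } catch (RuntimeException e) {
                    return Feedback.FAIL_ACK;
                }
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }

    private void processIgnoringFailure(byte[] message) {
        try {
            processor.accept(message);
        } catch (RuntimeException ignored) {
            // the failure stays on the remote node
        }
    }
}
```

Only SYNC_ACK lets the sender distinguish "received" from "processed successfully", which is why it is the level that allows message process feedback.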

  19. Feature: Per-message delivery semantics • The most distinctive feature; what makes Tribes really stand out • Allows each message to be delivered differently • Per-message guarantee level • Sync vs. async • Not ordered, ordered, totally ordered • 27 flags, so 2²⁷ (134,217,728) possible combinations • Actual behavior depends on the interceptors configured • Each message with its own unique delivery guarantee
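Per-message semantics of this kind are typically encoded as bit flags in a single int, so each combination is just a different value passed with the message. The flag names and bit positions here are hypothetical (Tribes defines its own constants on the Channel interface); the sketch only shows how flags are combined and tested.

```java
// Hypothetical option bits; not the real Tribes constants or values.
final class MessageOptions {
    static final int USE_ACK      = 1 << 0;
    static final int SYNC_ACK     = 1 << 1;
    static final int ASYNCHRONOUS = 1 << 2;
    static final int ORDERED      = 1 << 3;
    static final int TOTAL_ORDER  = 1 << 4;

    private MessageOptions() {
    }

    // True if every bit of flag is set in options.
    static boolean isSet(int options, int flag) {
        return (options & flag) == flag;
    }

    // Number of distinct combinations for a given number of independent flags.
    static long combinations(int flagCount) {
        return 1L << flagCount;
    }

    // Human-readable summary of the semantics requested for one message.
    static String describe(int options) {
        StringBuilder sb = new StringBuilder();
        if (isSet(options, SYNC_ACK)) {
            sb.append("sync-ack ");
        } else if (isSet(options, USE_ACK)) {
            sb.append("ack ");
        } else {
            sb.append("no-ack ");
        }
        sb.append(isSet(options, ASYNCHRONOUS) ? "async " : "sync ");
        if (isSet(options, TOTAL_ORDER)) {
            sb.append("total-order");
        } else if (isSet(options, ORDERED)) {
            sb.append("ordered");
        } else {
            sb.append("unordered");
        }
        return sb.toString();
    }
}
```

For example, one message can be sent with `USE_ACK | SYNC_ACK | ASYNCHRONOUS | ORDERED` and the next one on the same channel with `0`; a flag only has an effect if an interceptor that reacts to it is configured.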

  20. Feature: Pluggable Interceptors • React to message attributes (flags) • If not modifying message bytes, can be inserted at runtime • Intercept any event through defined methods • ChannelInterceptorBase available to minimize redundant code for non-intercepted methods
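The point of a base class such as ChannelInterceptorBase is that every event is forwarded by default, so a concrete interceptor overrides only what it needs. This is a minimal sketch of that pattern; the class names and method signatures are invented for illustration and do not match the Tribes interfaces. The counter only reads the flags and never touches the payload, which is the kind of interceptor the slide says can be inserted at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message: option flags plus a payload.
class InterceptedMessage {
    final int options;
    final String payload;

    InterceptedMessage(int options, String payload) {
        this.options = options;
        this.payload = payload;
    }
}

// Hypothetical base class: every event is passed to the next interceptor.
abstract class InterceptorBase {
    private InterceptorBase next;

    void setNext(InterceptorBase next) {
        this.next = next;
    }

    void sendMessage(InterceptedMessage msg) {
        if (next != null) next.sendMessage(msg);
    }

    void memberAdded(String member) {
        if (next != null) next.memberAdded(member);
    }
}

// Reacts only to messages carrying its flag; overrides only sendMessage.
class FlagCounter extends InterceptorBase {
    private final int flag;
    private int count;

    FlagCounter(int flag) {
        this.flag = flag;
    }

    @Override
    void sendMessage(InterceptedMessage msg) {
        if ((msg.options & flag) == flag) {
            count++;
        }
        super.sendMessage(msg); // payload forwarded unchanged
    }

    int count() {
        return count;
    }
}

// Stand-in for the bottom of the stack; records what reached it.
class RecordingSink extends InterceptorBase {
    final List<String> sent = new ArrayList<>();
    final List<String> members = new ArrayList<>();

    @Override
    void sendMessage(InterceptedMessage msg) {
        sent.add(msg.payload);
    }

    @Override
    void memberAdded(String member) {
        members.add(member);
    }
}
```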

  21. Feature: Delivery Feedback • Tribes aims to deliver feedback for each message and each delivery semantic • NO_ACK, ACK, SYNC_ACK • Synchronous and asynchronous delivery • Asynchronous gets feedback through callback • Example, recoverable transactions can now be implemented since we always know if the remote node received the message
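Asynchronous feedback can be modeled as a send call that returns immediately and reports each destination's outcome later through a callback. In this sketch the transport is a plain hypothetical interface and the callback is a `BiConsumer`; Tribes has its own callback interface for this, whose exact shape is not reproduced here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

// Hypothetical transport: delivers bytes to one member or throws.
interface Transport {
    void deliver(String member, byte[] data) throws Exception;
}

class AsyncSender {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Transport transport;

    AsyncSender(Transport transport) {
        this.transport = transport;
    }

    // Returns immediately; the callback gets (member, null) on success and
    // (member, error) on failure, once per destination.
    CompletableFuture<Void> send(List<String> members, byte[] data,
                                 BiConsumer<String, Throwable> callback) {
        return CompletableFuture.runAsync(() -> {
            for (String member : members) {
                try {
                    transport.deliver(member, data);
                    callback.accept(member, null);
                } catch (Exception e) {
                    callback.accept(member, e);
                }
            }
        }, pool);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Knowing per destination whether delivery succeeded is what the recoverable-transaction example relies on: a failed member can be retried, or the transaction rolled back.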

  22. Feature: Concurrent & Parallel Delivery • Concurrent • More than one message sent or received at any point in time • No “message blocking”, i.e. a 10MB message with SYNC_ACK will not hold up a 10KB NO_ACK message • Parallel • Able to send a message to multiple destinations in parallel using one thread (NIO) • Prioritized • Future feature

  23. Feature: Fixed Node Hierarchy • Absolute Order algorithm • Always able to determine leadership • No message exchanges (chat-free) • Non-coordinated • Also provides a “Coordination” algorithm • Chatty, but efficient • Auto-merges groups • Enhances node discovery where multicast might glitch • Can connect different subnets when used together with the StaticMembershipInterceptor
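The chat-free property of an absolute order comes from every node applying the same deterministic sort to the same membership list and picking the first entry. The comparison keys used here (address bytes compared unsigned, then port, then a unique id) are an assumption for illustration, not necessarily the ordering Tribes implements; the class names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical member description.
class MemberInfo {
    final byte[] address;
    final int port;
    final String uniqueId;

    MemberInfo(byte[] address, int port, String uniqueId) {
        this.address = address;
        this.port = port;
        this.uniqueId = uniqueId;
    }
}

final class AbsoluteOrderSketch {
    private AbsoluteOrderSketch() {
    }

    // Same comparator on every node => same order => same leader, no messages needed.
    static final Comparator<MemberInfo> ORDER = (a, b) -> {
        // Unsigned so that e.g. 192.x sorts after 10.x (Java bytes are signed).
        int c = Arrays.compareUnsigned(a.address, b.address);
        if (c != 0) return c;
        c = Integer.compare(a.port, b.port);
        if (c != 0) return c;
        return a.uniqueId.compareTo(b.uniqueId);
    };

    static MemberInfo leader(List<MemberInfo> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("empty membership");
        }
        return Collections.min(members, ORDER);
    }
}
```

Nodes only agree while they see the same membership; if a network split gives each side a different view, each side picks its own leader, which is one reason a separate coordination algorithm with group merging is useful.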

  24. Feature: Absolute Failure Detection • Simple interceptor: TcpFailureDetector • Instant feedback on member down • No need to wait for timeout • No risk of node pings getting stuck on a busy network • Verifies timeouts against “false positives” • 3 levels • Connect • Send • Read
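Verifying a suspect instead of trusting a timeout can be sketched as three escalating checks on a fresh TCP connection: connect, send, and read. The sketch assumes the remote side echoes a single probe byte; TcpFailureDetector's actual wire protocol is not reproduced, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

class SuspectVerifier {
    enum Result { ALIVE, CONNECT_FAILED, SEND_FAILED, READ_FAILED }

    // Probes host:port; assumes the peer echoes the probe byte back.
    static Result verify(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Level 1: can we open a connection at all?
            try {
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
            } catch (IOException e) {
                return Result.CONNECT_FAILED;
            }
            socket.setSoTimeout(timeoutMs); // bounds the read below
            // Level 2: can we push bytes to it?
            try {
                OutputStream out = socket.getOutputStream();
                out.write(42);
                out.flush();
            } catch (IOException e) {
                return Result.SEND_FAILED;
            }
            // Level 3: does it answer within the timeout?
            try {
                InputStream in = socket.getInputStream();
                return in.read() == 42 ? Result.ALIVE : Result.READ_FAILED;
            } catch (IOException e) {
                return Result.READ_FAILED;
            }
        } catch (IOException e) {
            // socket setup or close failed
            return Result.CONNECT_FAILED;
        }
    }
}
```

A member that a busy network made look silent will usually still pass the connect level, so it is not declared dead on a missed heartbeat alone.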

  25. Feature: RPC Messaging • Ability to collect responses to a message • NO_REPLY, FIRST_REPLY, MAJORITY_REPLY & ALL_REPLY • Absence reply(!) rather than timeout • Callback for left-over replies that arrive after the call has returned • Support for multiple RPC channels on top of one Tribes channel
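Each reply mode boils down to a different completion rule, and an absence reply lets the caller stop waiting on a member that has nothing to say. The collector here is a hypothetical sketch of those rules, not the RpcChannel implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Mode names follow the slide; the enum itself is hypothetical.
enum ReplyMode { NO_REPLY, FIRST_REPLY, MAJORITY_REPLY, ALL_REPLY }

class ReplyCollector {
    private final ReplyMode mode;
    private final int expected;               // members the RPC was sent to
    private final List<Object> replies = new ArrayList<>();
    private int absent;                        // members that answered "no handler here"

    ReplyCollector(ReplyMode mode, int expected) {
        this.mode = mode;
        this.expected = expected;
    }

    synchronized void reply(Object value) {
        replies.add(value);
    }

    synchronized void absenceReply() {
        absent++;
    }

    // True once the caller can stop waiting.
    synchronized boolean isComplete() {
        // Absence replies count here: no further answers can arrive.
        boolean everyoneAnswered = replies.size() + absent >= expected;
        switch (mode) {
            case NO_REPLY:
                return true;
            case FIRST_REPLY:
                return !replies.isEmpty() || everyoneAnswered;
            case MAJORITY_REPLY:
                return replies.size() > expected / 2 || everyoneAnswered;
            case ALL_REPLY:
                return everyoneAnswered;
            default:
                throw new IllegalStateException("unknown mode " + mode);
        }
    }

    synchronized List<Object> replies() {
        return new ArrayList<>(replies);
    }
}
```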

  26. Feature – JNDI Channel • Ability to bind a channel into a JNDI tree • Share the channel between objects • Ideal for J2EE messaging • Coming soon: • Ability to download client stub • Out of process invocation • Not yet implemented…

  27. Architecture - Overview [Diagram: Applications on top, some using Tipis or RpcChannels; below them the Channel; then a stack of Interceptors on the TX and RX paths; and at the bottom the Coordinator, which drives Membership, Sender and Receiver]

  28. Architecture - Channel • 1 instance per Tribes runtime setup • Is the first interceptor • Holds a list of one or more ChannelListeners & MembershipListeners • Serializes and deserializes messages • Supports ByteMessage for transfer of pure byte[] data • RpcChannel implements ChannelListener
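For Serializable messages the channel has to turn objects into bytes and back; ByteMessage covers the case where the payload is already a byte[]. A minimal round trip with standard Java serialization looks like this; the helper class is hypothetical and is not the code Tribes uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: standard Java serialization in both directions.
final class PayloadCodec {
    private PayloadCodec() {
    }

    static byte[] serialize(Serializable msg) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(msg);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

The receiving side needs the message class on its classpath to deserialize, which is one practical reason to fall back to raw byte[] payloads between heterogeneous nodes.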

  29. Architecture - Interceptors • Linked list invocation • Strongly typed – one method per event • No events need to travel through the stack to coordinate interceptors • Examples • Failure detection • Static membership • Total order or per member order • Throughput measurements and statistics • Leadership election • Message data encryption • Message dispatch – asynchronous messaging • All or none delivery guarantee

  30. Architecture - Interceptors • Trigger on ChannelData.getOptions() • Pass through a ChannelData object • Using XByteBuffer – optimized byte[] handling • Membership & Message interceptions • Threadless
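Receiving code sees TCP as a byte stream, so a buffer like XByteBuffer has to collect chunks and cut out complete messages. This sketch uses a 4-byte length prefix, which is an assumption for illustration and not XByteBuffer's actual wire format; the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FrameBuffer {
    private byte[] buf = new byte[64];
    private int size = 0;

    // Prefixes a payload with its length (assumed framing, not XByteBuffer's).
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array();
    }

    // Appends whatever chunk the socket delivered.
    void append(byte[] chunk, int off, int len) {
        if (size + len > buf.length) {
            // Grow geometrically so repeated appends stay amortized O(1).
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, size + len));
        }
        System.arraycopy(chunk, off, buf, size, len);
        size += len;
    }

    // Removes and returns every complete message currently in the buffer.
    List<byte[]> extractMessages() {
        List<byte[]> out = new ArrayList<>();
        int pos = 0;
        while (size - pos >= 4) {
            int len = ByteBuffer.wrap(buf, pos, 4).getInt();
            if (size - pos - 4 < len) {
                break; // body has not fully arrived yet
            }
            out.add(Arrays.copyOfRange(buf, pos + 4, pos + 4 + len));
            pos += 4 + len;
        }
        // Keep any partial trailing message at the front of the buffer.
        System.arraycopy(buf, pos, buf, 0, size - pos);
        size -= pos;
        return out;
    }
}
```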

  31. Architecture - Coordinator • Last interceptor • Coordinates IO components • Sender • Receiver • Membership • Receiver uses a thread pool • Sender piggybacks on the application thread

  32. Code Structure • org.apache.catalina.tribes • Application and Component interfaces • group – default implementation • transport – RX/TX components • membership – membership service • group.interceptors – supplied interceptors • io – protocol utilities and optimizations • tipis – utilities on top of Tribes core

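  33. Quick Start
      import java.io.Serializable;
      import org.apache.catalina.tribes.Channel;
      import org.apache.catalina.tribes.ChannelListener;
      import org.apache.catalina.tribes.Member;
      import org.apache.catalina.tribes.MembershipListener;
      import org.apache.catalina.tribes.group.GroupChannel;

      // MyMessageListener, MyMemberListener and MyMessage are application classes
      Channel myChannel = new GroupChannel();
      ChannelListener msgListener = new MyMessageListener();
      MembershipListener mbrListener = new MyMemberListener();
      myChannel.addMembershipListener(mbrListener);
      myChannel.addChannelListener(msgListener);
      myChannel.start(Channel.DEFAULT); // start the channel; throws ChannelException on failure
      Serializable myMsg = new MyMessage();
      Member[] group = myChannel.getMembers();
      myChannel.send(group, myMsg, Channel.SEND_OPTIONS_DEFAULT); // send to all current members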

  34. Data Replication • ReplicatedMap – one to all replication • LazyReplicatedMap – primary/backup replication • Cookie based replication map • ideal for HTTP session replication • Backup location stored in cookies • Versioned delta replication • Example: org.apache.catalina.ha

  35. Tribes Demos • Demo • Code Example • Discussion around common problems and how Tribes could solve them

  36. Future Work • Security - SSL support and node authentication • Many processes – one channel • Language independent • WAN membership discovery • TCP-based multicaster for large clusters • 2*n packet reduction for the sender, not total • Intelligent membership broadcasting • httpd as a load balancer

  37. Q & A • fhanik@apache.org • http://people.apache.org/~fhanik/tribes • Tomcat SVN repository • Interested to use? • Interested to help?

