Hans-Arno Jacobsen June 23, 2011


Presentation Transcript


  1. Resource Allocation Algorithms for Publish/Subscribe Systems
     Hans-Arno Jacobsen, June 23, 2011
     Joint work with Alex King Yeung Cheung
     http://padres.msrg.org

  2. Green Resource Allocation Algorithms for Publish/Subscribe Systems
     http://padres.msrg.org

  3. Publish/Subscribe in Practice (distributed and brokered publish/subscribe)
     • GooPS
         • Google’s internal pub/sub messaging middleware to integrate applications across data centers
         • Hundreds of brokers with tens of thousands of pub/sub clients
     • Yahoo Message Broker
         • Yahoo’s pub/sub middleware
         • Used, for example, in the PNUTS key/value store (cf. VLDB’08)
     • SuperMontage
         • Tibco’s pub/sub distribution network for NASDAQ’s quote and order processing
     • GDSN (Global Data Synchronization Network)
         • A global pub/sub network that allows retailers and suppliers (e.g., Walmart, Target, Metro) to exchange timely and accurate supply chain data
     ICDCS 2011

  4. Problem
     • Input: publishers (P) and subscribers (S)
     • Output: which deployment strategy uses the least number of brokers?
     (Figure: publishers and subscribers connected through brokers, with brokers marked "Overload!")

  5. Challenges
     • Brokers have limited and heterogeneous resource capacities:
         • Computational
         • I/O or bandwidth
         • Memory and storage
     • Publishers publish at different message rates
     • Subscribers have unique interests that sink zero or more publications from zero or more publishers

  6. Challenges When Scaling Up
     • How to connect the publishers if subscribers sink traffic from more than two publishers?
     • How to connect the brokers to minimize traffic while avoiding overload?
     • How to allocate subscribers to brokers? This is an NP-complete problem!
     (Figure: publishers (P), brokers, and subscribers (S))

  7. Additional Requirements
     • Minimize:
         • Amount of processing
         • Number of messages forwarded
     • Work effectively under any workload distribution (defined or undefined)
     • Be readily adaptable to any pub/sub system by being language independent:
         • Content-based (XPath, regex, ranged, SQL, composite subscriptions, etc.)
         • Topic-based pub/sub

  8. Summary of Our Approach (a customizable framework)
     • Phase 1: Subscription (and publisher) profiling
         • Record the publications delivered to each subscription
     • Phase 2: Subscription-to-broker allocation
         • Allocate subscriptions to brokers depending on the load induced by each subscription
     • Phase 3: Broker overlay construction
         • Construct and configure the broker overlay
         • Apply publisher re-allocation (GRAPE, cf. ICDCS’2010)

  9. Phase 1: Subscription Profiling
     • A profile of each subscription, per advertisement, is maintained at the subscriber’s first broker
     • The profile is a bit vector of the publications delivered to the subscription; the message ID of the first index marks the start of the bit vector (B34-M213 in the example)
         • Example bit vector: 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0
         • Message ID column in the figure: B34-M213, B34-M215, B34-M216, B34-M217, B34-M220, B34-M222, B34-M225, B34-M226
     • Fixed vector size; shift left if the next publication is out of the bit vector’s range
     • The cardinality of the bit vector approximates the bandwidth requirement of the subscription
     • Used to compute the “closeness” between any two subscriptions in the allocation phase, based on a clustering algorithm, e.g., closeness = $|s_i \cap s_j|$
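
The profiling step above can be sketched as a fixed-size sliding bit vector. This is a minimal illustration under stated assumptions, not PADRES code: message IDs from one advertisement are reduced to integer sequence numbers (e.g., B34-M213 becomes 213), the class and function names are hypothetical, and publications older than the current window are simply ignored.

```python
class SubscriptionProfile:
    """Hypothetical fixed-size bit-vector profile for one subscription.

    Assumption: message IDs from one advertisement are reduced to integer
    sequence numbers (e.g. B34-M213 -> 213).
    """

    def __init__(self, size):
        self.size = size        # fixed number of bits in the vector
        self.first_seq = None   # sequence number stored at bit index 0
        self.bits = 0           # bit i set => publication first_seq + i was delivered

    def record(self, seq):
        """Mark publication `seq` as delivered to this subscription."""
        if self.first_seq is None:
            self.first_seq = seq
        offset = seq - self.first_seq
        if offset < 0:
            return  # assumption: publications older than the window are ignored
        if offset >= self.size:
            # Out of range: drop the oldest entries so `seq` lands on the last
            # index (a "shift left" when index 0 is drawn on the left; bit 0
            # holds index 0 here, so it is a right shift of the integer).
            shift = offset - self.size + 1
            self.bits >>= shift
            self.first_seq += shift
            offset = self.size - 1
        self.bits |= 1 << offset

    def cardinality(self):
        """Number of delivered publications: a proxy for bandwidth demand."""
        return bin(self.bits).count("1")

    def delivered(self):
        """Sequence numbers of delivered publications inside the window."""
        if self.first_seq is None:
            return set()
        return {self.first_seq + i for i in range(self.size) if (self.bits >> i) & 1}


def intersect_closeness(a, b):
    """closeness = |s_i ∩ s_j|, computed over absolute sequence numbers."""
    return len(a.delivered() & b.delivered())


p = SubscriptionProfile(8)
for seq in (213, 215, 216, 222):   # 222 is out of range and slides the window
    p.record(seq)
q = SubscriptionProfile(8)
for seq in (216, 220, 222):
    q.record(seq)
print(sorted(p.delivered()), p.cardinality(), intersect_closeness(p, q))  # [215, 216, 222] 3 2
```

Comparing absolute sequence numbers rather than raw bit positions keeps the closeness correct even when two profiles have slid their windows by different amounts.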

  10. Phase 2: Subscription Allocation Algorithms
     • MANUAL and AUTOMATIC as baselines
         • Tree with fanout of 2; random placement of clients (MANUAL)
         • Random allocation (AUTOMATIC)
     • Fastest Broker First (FBF)
         • Assign subscriptions randomly to the next most powerful broker
     • Bin Packing
         • Like FBF, but assigns the subscription with the next highest traffic
     • PAIRWISE-N, PAIRWISE-K (Riabov et al. ICDCS’02)
         • Pairwise subscription clustering where the number of clusters is specified beforehand
     • CRAM (Clustering with Resource Awareness and Minimization)
         • Dynamically determines the number of clusters
         • Utilizes a novel one-to-many clustering scheme
         • Evaluated with 4 different subscription closeness metrics, one of them derived from Banavar et al. ICDCS '99
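
The difference between FBF and Bin Packing can be illustrated with a first-fit-decreasing sketch: brokers are tried from most to least powerful, and subscriptions are placed in order of decreasing traffic. The function name, the single scalar load per subscription (for example, its bit-vector cardinality), and the single scalar capacity per broker are simplifying assumptions; the slides do not specify tie-breaking or the multi-resource case.

```python
def bin_packing(loads, capacities):
    """Hypothetical first-fit-decreasing allocation of subscriptions to brokers.

    loads:      {subscription_id: traffic}   e.g. bit-vector cardinality
    capacities: {broker_id: capacity}        in the same units as the loads
    Returns {broker_id: [subscription_id, ...]} or None if some subscription
    cannot be placed on any broker.
    """
    brokers = sorted(capacities, key=capacities.get, reverse=True)  # most powerful first
    remaining = dict(capacities)
    allocation = {}
    for sub in sorted(loads, key=loads.get, reverse=True):  # highest traffic first
        for broker in brokers:
            if loads[sub] <= remaining[broker]:
                remaining[broker] -= loads[sub]
                allocation.setdefault(broker, []).append(sub)
                break
        else:  # no broker had room for this subscription
            return None
    return allocation


result = bin_packing({"s1": 50, "s2": 30, "s3": 30, "s4": 20},
                     {"B1": 100, "B2": 60, "B3": 60})
print(result)  # {'B1': ['s1', 's2', 's4'], 'B2': ['s3']}
```

Only B1 and B2 are allocated and B3 stays free, which is the resource-aware behaviour the slides attribute to Bin Packing; FBF would instead fill the most powerful broker with randomly chosen subscriptions.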

  11. Allocation with Bin Packing (figure only: subscribers (S))

  12. Allocation Result (Bin Packing) (figure only: subscribers (S))

  13. Allocation with CRAM (Basic Version)
     • Initially, BIN PACKING is run to determine the initial allocation
     • Find and cluster the pair of subscriptions with the next highest non-zero “closeness”
     • Run BIN PACKING with the new pairing
     • The allocation fails if:
         • More brokers are allocated than without this pairing, or
         • Not all subscriptions can be allocated to brokers
     • On failure, undo and remember the incompatible pairing
     • Repeat until no more pairings can be found
     • Pairings found are combined and re-inserted into the subscription pool
     • The final subscription clustering is the last successful allocation
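
The basic CRAM loop described above can be sketched as follows. This is a hedged, simplified reading of the slide, not the authors' implementation: profiles are Python sets of delivered message IDs, a cluster's load is assumed to be the size of the union of its members' profiles (overlapping traffic counted once), IOS is used as the closeness metric, and `pack`, `ios`, and `cram_basic` are hypothetical names. The optimizations and one-to-many clustering are omitted.

```python
from itertools import combinations


def ios(a, b):
    """Intersection-over-sum closeness |a ∩ b|^2 / (|a| + |b|)."""
    total = len(a) + len(b)
    return len(a & b) ** 2 / total if total else 0.0


def pack(loads, capacities):
    """First-fit decreasing: {broker: [cluster ids]} or None on failure."""
    brokers = sorted(capacities, key=capacities.get, reverse=True)
    remaining, allocation = dict(capacities), {}
    for cid in sorted(loads, key=loads.get, reverse=True):
        broker = next((b for b in brokers if loads[cid] <= remaining[b]), None)
        if broker is None:
            return None
        remaining[broker] -= loads[cid]
        allocation.setdefault(broker, []).append(cid)
    return allocation


def cram_basic(profiles, capacities):
    """Hypothetical sketch of basic CRAM: greedy pairwise merging guarded by bin packing."""
    clusters = {sid: frozenset(p) for sid, p in profiles.items()}
    members = {sid: [sid] for sid in profiles}
    best = pack({c: len(p) for c, p in clusters.items()}, capacities)  # initial Bin Packing
    if best is None:
        raise ValueError("workload does not fit even without clustering")
    incompatible = set()
    while True:
        candidates = [(ios(clusters[a], clusters[b]), a, b)
                      for a, b in combinations(sorted(clusters), 2)
                      if frozenset((a, b)) not in incompatible]
        candidates = [c for c in candidates if c[0] > 0]  # non-zero closeness only
        if not candidates:
            break
        _, a, b = max(candidates)  # pair with the highest closeness
        merged = f"{a}+{b}"
        trial = {c: p for c, p in clusters.items() if c not in (a, b)}
        trial[merged] = clusters[a] | clusters[b]  # merged cluster re-enters the pool
        allocation = pack({c: len(p) for c, p in trial.items()}, capacities)
        if allocation is None or len(allocation) > len(best):
            incompatible.add(frozenset((a, b)))  # undo: remember the failed pairing
            continue
        clusters, best = trial, allocation
        members[merged] = members.pop(a) + members.pop(b)
    return best, members  # last successful allocation and final clustering


profiles = {"s1": range(0, 8), "s2": range(0, 8), "s3": range(100, 104)}
allocation, members = cram_basic(profiles, {"B1": 12, "B2": 12})
print(allocation)  # {'B1': ['s1+s2', 's3']}
```

In the example, plain bin packing needs two brokers, while clustering the two identical subscriptions lets their shared traffic be counted once, so everything fits on B1.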

  14. Summary of Optimizations
     • Grouping of subscriptions with equal profiles
         • Apply CRAM on groups
         • In our experiments, reductions of up to 61%
     • Limit closeness computations among groups
         • Exploit covering relationships among subscriptions
         • Disregard groups with small closeness
         • In our experiments, roughly a 20x improvement
     • One-to-many clustering
         • Cluster groups of subscriptions and covered subscriptions
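
Grouping equal profiles and detecting covered profiles are both cheap bitwise operations when profiles are stored as integers. The sketch below assumes that covering can be approximated by bit-vector containment (every publication delivered to the covered subscription was also delivered to the covering one); the slides may instead define covering on the subscription language itself. The function names are hypothetical.

```python
from collections import defaultdict


def group_equal_profiles(profiles):
    """Map each distinct bit-vector profile to the subscriptions that share it."""
    groups = defaultdict(list)
    for sub, bits in profiles.items():
        groups[bits].append(sub)
    return dict(groups)


def covers(a, b):
    """True if profile a contains every bit set in profile b (assumed notion of covering)."""
    return (a & b) == b


def covered_pairs(groups):
    """(covering, covered) pairs among distinct group profiles."""
    keys = list(groups)
    return [(a, b) for a in keys for b in keys if a != b and covers(a, b)]


profiles = {"s1": 0b1011, "s2": 0b1011, "s3": 0b0011, "s4": 0b0100}
groups = group_equal_profiles(profiles)
print(groups)                 # {11: ['s1', 's2'], 3: ['s3'], 4: ['s4']}
print(covered_pairs(groups))  # [(11, 3)]
```

Four subscriptions collapse into three groups, so closeness only has to be computed among three items, and the group with profile `0b0011` is a candidate for clustering with the group that covers it.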

  15. Closeness Metrics
     • Intersect: $|s_i \cap s_j|$ (good for highest overlap)
     • XOR: $|s_i \oplus s_j|^{-1}$, defined as MAXVAL if $|s_i \oplus s_j| = 0$ (good for least non-overlapping traffic)
     • IOS (intersection over sum): $|s_i \cap s_j|^2 / (|s_i| + |s_j|)$
     • IOU (intersection over union): $|s_i \cap s_j|^2 / |s_i \cup s_j|$
         • IOS and IOU are good for both conditions, yield 0 for empty relationships, and favour clustering higher-traffic subscriptions
     • Ideally, find subscriptions sharing the highest overlap in traffic while introducing the least amount of non-overlapping traffic
     • XOR is derived from Banavar et al. (ICDCS '99)
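
The four metrics translate directly into set operations on delivered-publication profiles. In this sketch, profiles are Python sets and MAXVAL is represented by `float("inf")`, which is an assumption since the slide does not give its value. The example reproduces the first IOS value from the one-to-one clustering slide (overlap 8, sizes 36 and 24) and shows why XOR cannot recognize an empty relationship.

```python
MAXVAL = float("inf")  # assumption: stand-in for the slide's MAXVAL


def intersect(si, sj):
    return len(si & sj)


def xor(si, sj):
    diff = len(si ^ sj)                      # non-overlapping traffic
    return MAXVAL if diff == 0 else 1 / diff


def ios(si, sj):                             # intersection over sum
    total = len(si) + len(sj)
    return len(si & sj) ** 2 / total if total else 0.0


def iou(si, sj):                             # intersection over union
    union = len(si | sj)
    return len(si & sj) ** 2 / union if union else 0.0


s1 = set(range(36))        # |s1| = 36
s2 = set(range(28, 52))    # |s2| = 24, overlapping s1 in 8 publications
print(round(ios(s1, s2), 2))       # 1.07, i.e. 8^2 / (36 + 24)

disjoint_a, disjoint_b = {1}, {2}  # no shared traffic at all
print(ios(disjoint_a, disjoint_b), xor(disjoint_a, disjoint_b))  # 0.0 0.5
```

IOS correctly scores the disjoint pair as 0, whereas XOR still returns a positive closeness, so an XOR-driven clustering may pair subscriptions that share no traffic.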

  16. Traditional One-to-One Clustering
     • Closeness: $C = |s_i \cap s_j|^2 / (|s_i| + |s_j|)$
     • Examples from the figure: $C = 8^2/(36+24) = 1.07$, $C = 4^2/(36+4) = 0.4$, $C = 1^2/(24+1) = 0.04$
     (Figure: bit vectors of S1 and S2, with labels S1a to S1c and S2a to S2h)

  17. New One-to-Many Clustering
     • Closeness: $C = |s_i \cap s_j|^2 / (|s_i| + |s_j|)$
     • Values carried over from the one-to-one figure: $C = 8^2/(36+24) = 1.07$, $C = 4^2/(36+4) = 0.4$, $C = 1^2/(24+1) = 0.04$
     • New values: $C = 12^2/(36+12) = 3$, $C = 8^2/(24+8) = 2$
     (Figure: bit vectors of S1 and S2, with labels S1a to S1c and S2a to S2h)
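
One possible reading of one-to-many clustering, sketched under assumptions: a subscription is compared against the union of a growing group of candidate subscriptions rather than against each candidate alone, and candidates are added greedily while IOS closeness keeps improving. The greedy growth rule and the function name are hypothetical; the numbers are chosen so that the group's closeness matches the slide's 12^2/(36+12) = 3.

```python
def ios(si, sj):
    """Intersection-over-sum closeness |si ∩ sj|^2 / (|si| + |sj|)."""
    total = len(si) + len(sj)
    return len(si & sj) ** 2 / total if total else 0.0


def one_to_many(anchor, candidates):
    """Greedily grow a group whose combined profile is closest to `anchor`.

    Hypothetical rule: add the candidate that maximizes IOS(anchor, union of
    group), and stop as soon as no candidate improves the closeness.
    """
    group, union, best = [], set(), 0.0
    remaining = dict(candidates)
    while remaining:
        name, score = max(((n, ios(anchor, union | p)) for n, p in remaining.items()),
                          key=lambda item: item[1])
        if score <= best:
            break
        group.append(name)
        union |= remaining.pop(name)
        best = score
    return group, best


s1a = set(range(36))
candidates = {"a": set(range(0, 4)),      # one-to-one: 4^2 / (36 + 4) = 0.4
              "b": set(range(4, 12)),     # one-to-one: 8^2 / (36 + 8), about 1.45
              "c": {100}}                 # no overlap with s1a
print(one_to_many(s1a, candidates))       # (['b', 'a'], 3.0)
```

The combined group reaches a closeness of 3.0, higher than any one-to-one pairing with `s1a`, which illustrates how a one-to-many comparison can find clusters that pairwise clustering scores poorly.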

  18. Phase 3: Broker Overlay Construction (figure only: subscribers (S))

  19. Bin Packing’s Final Overlay (figure: publishers (P) relocated with GRAPE; subscribers (S))

  20. Greedy Relocation Algorithm for Publishers of Events (GRAPE)
     • Distributed algorithm that dynamically relocates publishers to minimize:
         • Broker message rates, and/or
         • Delivery delay
     • Similar three-phase design:
         • Profile the load of subscriptions matching each publisher
         • Determine the placement strategy that minimizes the specified metric
         • Transparently migrate the publisher
     • Cf. GRAPE paper from ICDCS 2010
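
The placement phase can be illustrated with a simplified, centralized stand-in for GRAPE's cost model; this is an assumption rather than the published algorithm, which is distributed and also supports delivery-delay goals. Here the broker overlay is a tree given as an adjacency dictionary, each profiled publication is described by the set of brokers hosting a subscriber that received it, and each publication is assumed to be processed once by every broker on the subtree connecting the publisher's broker to those brokers. The candidate broker with the smallest total is chosen. All names are hypothetical.

```python
from collections import deque


def parents_from(root, adj):
    """BFS parent pointers for a tree overlay rooted at `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


def brokers_touched(root, targets, parent):
    """Number of brokers on the subtree joining `root` to every target broker."""
    touched = {root}
    for node in targets:
        while node not in touched:   # walk up until we meet the already-covered subtree
            touched.add(node)
            node = parent[node]
    return len(touched)


def total_broker_load(root, adj, deliveries):
    """Sum over profiled publications of the brokers that must process each one."""
    parent = parents_from(root, adj)
    return sum(brokers_touched(root, targets, parent) for targets in deliveries)


def best_publisher_broker(adj, deliveries):
    """Candidate broker with the lowest estimated total broker load."""
    return min(adj, key=lambda b: total_broker_load(b, adj, deliveries))


# Chain overlay B1 - B2 - B3 - B4; subscribers sit on B3 and B4.
adj = {"B1": ["B2"], "B2": ["B1", "B3"], "B3": ["B2", "B4"], "B4": ["B3"]}
deliveries = [{"B3", "B4"}, {"B4"}, {"B4"}]  # brokers that needed each publication
print(total_broker_load("B1", adj, deliveries), best_publisher_broker(adj, deliveries))  # 12 B4
```

Moving the publisher next to its heaviest subscribers (B4) lowers the estimated load in this toy overlay from 12 to 4 broker-level message-processing steps.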

  21. Evaluation (http://padres.msrg.org)
     • Implemented on the PADRES open source content-based publish/subscribe system
     • Evaluated on a cluster testbed using 80 brokers
     • Evaluated on SciNet using 1000 brokers
     • Compared against two related approaches (Riabov et al. ICDCS’02, Banavar et al. ICDCS’99)
     • Homogeneous and heterogeneous scenarios
     • The workload saturates the initial deployment (MANUAL)

  22. Output Utilization Ratio
     • Resource-aware algorithms make full use of allocated resources

  23. Broker Message Rate
     • Allocating fewer brokers does not help
     • Clustering significantly reduces message rate
     • CRAM reduced message rate by up to 92%

  24. Number of Allocated Brokers
     • Reduces number of allocated brokers by up to 91%
     • Uses all resources

  25. Computation Time
     • 91% improvement at only 30% higher computation time

  26. Impact of Publisher Relocation & Subscription Clustering
     • 50% reduction in broker message rate

  27. Broker Message Rates Using Various Closeness Metrics
     • The XOR closeness metric cannot identify empty relationships

  28. Conclusions
     • CRAM combines the benefits of:
         • Subscription clustering from PAIRWISE-N/K
         • Resource awareness from Bin Packing
     • It simultaneously reduces both of the following, to meet green IT objectives:
         • Broker message rate (up to 92%)
         • Number of allocated brokers (up to 91%)
     • By using bit vectors, CRAM is:
         • Language independent (XPath, regex, topics)
         • Effective for any workload distribution

  29. Q & A


  31. Future Work
     • React dynamically by growing and shrinking the network in incremental steps
     • Improve the runtime of the CRAM algorithm by parallelization or by reducing its computational complexity
     • Model the workload with more sophisticated methods, such as stochastic processes, to improve the accuracy of load estimation
     • Address fault resiliency

  32. Related Work - Clustering
     • Riabov et al. (ICDCS’02)
         • The number of clusters K is pre-specified
         • Each cluster is a multicast address, so there is no upper limit on its size
         • The event space is divided into grids
         • Supports only ranged subscriptions
         • Their pairwise clustering considers each subscription individually
     • Gryphon (ICDCS'99)
         • Supports only equality and * (wildcard) subscriptions
         • Each cluster is stored in memory, so an upper bound on cluster size is not a major concern
     • SUB-2-SUB (IPTPS'06)
         • Supports only ranged subscriptions
         • Each cluster is a P2P network, so there is no upper limit on the cluster size

  33. Related Work – Broker Overlay Construction, Publisher and Subscriber Placement Algorithms
     • Baldoni et al. (The Computer Journal); Jaeger et al. (SAC'07); Migliavacca et al. (DEBS’07)
         • Reconfigure the broker overlay to reduce delivery delay and broker processing load
     • Cheung et al. (Middleware’06, ICDCS’10)
         • Load balancing by relocating subscriber clients
         • Reduce delivery delay and broker processing load by relocating publisher clients

  34. Hop Count Using Various Closeness Metrics

  35. Computation Time vs. Bit Vector Size

  36. Allocated Brokers vs. Bit Vector Size

  37. Average Hop Count

  38. Computation Time Using Various Closeness Metrics
     • 108% higher computation time using the Gryphon-derived closeness metric (XOR)

  39. Delivery Delay
     • Overload with Pairwise-K
