
Constructing Scalable Overlays for Pub/Sub With Many Topics

Problems, Algorithms, and Evaluation

G. Chockler, R. Melamed, Y. Tock, IBM Haifa Research Lab

R. Vitenberg, University of Oslo


Publish/Subscribe (Pub/Sub)

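[Figure: nodes N1 to N5 attached to a message bus. Subscription(N1) = {B,C,D}; the other nodes subscribe to {A,B,C,E}, {A,D}, {A,X}, and {A,B,X}. Publish(M1, A) delivers message M1 to the subscribers of topic A.]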
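The topic-based model in the figure can be sketched in a few lines of Python. This is only an illustration: `MessageBus` is a hypothetical class, and apart from Subscription(N1) = {B,C,D}, which the slide states, the assignment of topic sets to N2 through N5 is an assumption.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-memory bus (hypothetical class, for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> names of subscribed nodes

    def subscribe(self, node, topics):
        for topic in topics:
            self._subscribers[topic].add(node)

    def publish(self, message, topic):
        """Return (node, message) deliveries, one per subscriber of the topic."""
        return [(node, message) for node in sorted(self._subscribers.get(topic, ()))]


bus = MessageBus()
bus.subscribe("N1", {"B", "C", "D"})        # stated on the slide
bus.subscribe("N2", {"A", "B", "C", "E"})   # assumed assignment
bus.subscribe("N3", {"A", "D"})             # assumed assignment
bus.subscribe("N4", {"A", "B", "X"})        # assumed assignment
bus.subscribe("N5", {"A", "X"})             # assumed assignment

deliveries = bus.publish("M1", "A")
print(deliveries)
```

Under these assumptions N1 receives nothing, because it is the only node not subscribed to topic A.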


Scalability of Pub/Sub

  • Most traditional pub/sub systems are geared towards small-scale deployment

    • E.g., Isis MDS, TIB, MQSeries, Gryphon

  • New generation of applications…

    • Large data centers: Amazon, Google, Yahoo, eBay, …

    • RSS, feed/news readers, on-line stock trading and banking

    • Web 2.0, Second Life

  • …drive dramatic growth in scale

    • 10,000s of nodes, 1000s of topics, Internet-wide distribution

  • Emerging systems address this trend using P2P techniques


Overlay-Based Pub/Sub

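  • SCRIBE

  • Corona

  • Feedtree

  • Sub-2-Sub

  • TERA

  • ...

[Figure: message (M1, A), published on topic A, is forwarded hop by hop across an overlay of nodes N1 to N5 with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}; one node is labeled "Relay".]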


Overlay Topologies for Pub/Sub

  • A “good” overlay allows efficient and simple publication routing

    • Small routing tables, low load on relays, low latency

  • Ideally, overlay is topic-connected: i.e., one connected component for each topic-induced sub-graph

    • Most existing implementations construct topic-connected overlays
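The definition of topic-connectivity can be checked mechanically: for each topic, take the sub-graph induced by its subscribers and test whether it forms a single connected component. The sketch below does this with a breadth-first search. The function name `disconnected_topics`, the node-to-subscription assignment, and the example edges are all assumptions made for illustration.

```python
from collections import deque


def disconnected_topics(interest, edges):
    """Return the sorted topics whose subscriber-induced sub-graph is not connected.

    interest: dict mapping node -> set of topics
    edges:    iterable of undirected (u, v) pairs
    """
    adjacency = {node: set() for node in interest}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    bad = []
    for topic in sorted(set().union(*interest.values())):
        members = {n for n in interest if topic in interest[n]}
        start = next(iter(members))
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            # Only follow edges that stay inside the topic's subscriber set.
            for neighbour in adjacency[node] & members:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if seen != members:
            bad.append(topic)
    return bad


# Assumed example: five nodes and three hypothetical overlay edges.
interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
edges = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]
print(disconnected_topics(interest, edges))
```

With these assumed edges, topics A and D are reported as disconnected because N3 has no neighbours; adding the edges (N1, N3) and (N2, N3) makes the overlay topic-connected.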


Topic-Connectivity

Topics B, C, X, E are connected

Topics A and D are disconnected

[Figure: example overlay over nodes N1 to N5 with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}]


Topic-Connectivity: Simple Solution

[Figure: the same nodes N1 to N5 with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}, connected so that every topic is connected]


Scalability of the Simple Solution

  • Negative impact on performance due to

    • CPU load: neighbor monitoring, message processing

    • Connection maintenance and header overhead

    • Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.

  • Scalability barrier for large systems offering a wide range of subscription choices

Can we do better?


The Min-TCO Problem

  • Minimum Topic-Connected Overlay (Min-TCO) problem:

    • For a set of nodes V, set of topics T, and Interest: V × T → {true, false}

    • Construct a topic-connected overlay G with the minimum possible number of edges (or average degree)

  • TCO (decision version):

    • Decide whether there is a topic-connected overlay consisting of k edges (for a given k)
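For very small inputs, both versions of the problem can be solved by exhaustive search, which makes the definitions concrete. The sketch below is a brute-force illustration, not a practical method: it enumerates edge subsets and so takes exponential time. The function names and the five-node instance are assumptions. Because adding edges never breaks topic-connectivity, trying subsets of exactly k edges also answers the "at most k" question.

```python
from itertools import combinations


def topic_connected(interest, edges):
    """True if every topic's subscribers form one connected component."""
    for topic in set().union(*interest.values()):
        members = [n for n in interest if topic in interest[n]]
        parent = {n: n for n in members}  # union-find over the topic's subscribers

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for u, v in edges:
            if u in parent and v in parent:
                parent[find(u)] = find(v)
        if len({find(n) for n in members}) > 1:
            return False
    return True


def tco(interest, k):
    """Decision version: is there a topic-connected overlay with k edges?"""
    candidates = list(combinations(sorted(interest), 2))
    size = min(k, len(candidates))
    return any(topic_connected(interest, chosen)
               for chosen in combinations(candidates, size))


def min_tco(interest):
    """Optimization version: the smallest k for which tco(interest, k) holds."""
    max_edges = len(interest) * (len(interest) - 1) // 2
    return next(k for k in range(max_edges + 1) if tco(interest, k))


# Assumed five-node instance (the node-to-subscription assignment is illustrative).
interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
print(min_tco(interest))
```

On this instance the search reports 5: topic A alone needs three edges among its four subscribers, and N1, which does not subscribe to A, needs two more edges to cover C and D.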


Complexity of TCO

Lemma: TCO(V, T, Interest, k) ∈ NP

Proof: Topic-connectivity is verifiable in polynomial time

Lemma: TCO(V, T, Interest, k) is NP-hard

Proof:

  • Define an auxiliary problem, Single-Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is ≤ d

  • Set Cover is polynomially reducible to SN-TCO

  • SN-TCO is polynomially reducible to TCO

Theorem: TCO is NP-complete

[Figure: example instance with nodes N1 to N5 and subscriptions {B,C,D}, {A,B}, {A,D}, {A,B,C,D}, {A,C}]


Approximating Min-TCO

  • The idea: exploiting subscription overlaps

    • Connecting the nodes with overlapping interests improves connectivity of several topics at once

  • Greedy Merge (GM) algorithm:

    • Start from a singleton connected component for each (v, t) ∈ V × T

    • At each iteration: add an edge that reduces the number of connected components for the biggest number of topics

    • Stop, once there is a single connected component for each topic
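The three steps above can be sketched directly in Python. This is a straightforward, unoptimized reading of the description rather than the authors' implementation: it keeps one union-find structure per topic, re-scores every candidate edge at each iteration, and breaks ties by taking the first edge in sorted order (a tie-breaking rule the slides do not specify). The function name and the example instance are assumptions.

```python
from itertools import combinations


def greedy_merge(interest):
    """Build a topic-connected overlay greedily; returns the list of chosen edges."""
    nodes = sorted(interest)
    topics = set().union(*interest.values())
    # One union-find forest per topic: initially every (node, topic) pair is a singleton.
    parent = {t: {n: n for n in nodes if t in interest[n]} for t in topics}

    def find(topic, x):
        p = parent[topic]
        while p[x] != x:
            p[x] = p[p[x]]
            x = p[x]
        return x

    def merged_topics(u, v):
        """Topics for which edge (u, v) would join two different components."""
        return [t for t in interest[u] & interest[v] if find(t, u) != find(t, v)]

    overlay = []
    while True:
        best = max(combinations(nodes, 2),
                   key=lambda e: len(merged_topics(*e)), default=None)
        if best is None or not merged_topics(*best):
            return overlay  # no edge helps any topic: every topic is connected
        for t in merged_topics(*best):
            parent[t][find(t, best[0])] = find(t, best[1])
        overlay.append(best)


# Assumed five-node instance (the node-to-subscription assignment is illustrative).
interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
overlay = greedy_merge(interest)
print(overlay)
```

On this instance the sketch first picks three edges that each merge components for two topics, (N1, N2), (N2, N4), and (N4, N5), and then two edges that each serve a single topic, (N1, N3) and (N2, N3).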


Greedy Merge

[Figure: step-by-step run of Greedy Merge on nodes N1 to N5 with subscriptions {B,C,D}, {A,B,C,E}, {A,D}, {A,X}, {A,B,X}]




GM Running Time

  • O(|V|4|T|)

    • At most |V|2 iterations

    • At most |V|2 edges inspected at each iteration

    • At most |T| steps to inspect an edge

  • Can be optimized to run in O(|V|2|T|)

    • For each e  V  V, weight(e) = the number of connected components merged by e

    • At each iteration, output the heaviest edge and adjust the other edge weights accordingly

    • Stop once there are no more edges with weight > 0
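The weight-based bookkeeping can be illustrated as follows. This sketch stores a weight for every candidate edge and, after each merge, decrements only the weights of edges that span the two components just joined. It is meant to show the weight updates, and it makes no attempt to reach the O(|V|^2 |T|) bound (for example, it finds the heaviest edge by a linear scan). The function name, the tie-breaking rule, and the example instance are assumptions.

```python
from itertools import combinations


def greedy_merge_weighted(interest):
    """Greedy Merge driven by explicit edge weights; returns the chosen edges."""
    nodes = sorted(interest)
    # component[t][n] is the label of n's connected component for topic t.
    component = {}
    for n in nodes:
        for t in interest[n]:
            component.setdefault(t, {})[n] = n
    # Initially every subscriber is its own component, so an edge merges
    # components for exactly the topics its two endpoints share.
    weight = {(u, v): len(interest[u] & interest[v])
              for u, v in combinations(nodes, 2)}

    overlay = []
    while weight:
        edge = max(weight, key=weight.get)  # heaviest edge, first one on ties
        if weight[edge] == 0:
            break
        u, v = edge
        overlay.append(edge)
        for t in interest[u] & interest[v]:
            comp = component[t]
            old, new = comp[u], comp[v]
            if old == new:
                continue
            side_a = [n for n in comp if comp[n] == old]
            side_b = [n for n in comp if comp[n] == new]
            # Edges between the two merged components no longer help topic t.
            for a in side_a:
                for b in side_b:
                    weight[(a, b) if a < b else (b, a)] -= 1
            for a in side_a:
                comp[a] = new
    return overlay


# Assumed five-node instance (the node-to-subscription assignment is illustrative).
interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
weighted_overlay = greedy_merge_weighted(interest)
print(weighted_overlay)
```

Each (edge, topic) pair is decremented at most once over the whole run, which is the intuition behind adjusting weights instead of recomputing them.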


Approximability Results

Lemma:

  • The number of edges in the overlay constructed by GM ≤ log(|V||T|) · OPT

    Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover

  • There exists an input on which GM’s output meets this ratio

Theorem: No polynomial-time algorithm can approximate Min-TCO within a constant factor (unless P=NP)

Proof: The existence of such an algorithm would imply a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)
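Since both results lean on Set Cover, it may help to see the classical greedy algorithm that the GM analysis is compared to: repeatedly pick the subset covering the most still-uncovered elements. The sketch below is the textbook heuristic, not code from the presentation; the function name and the small instance are illustrative assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Return names of subsets chosen greedily until the universe is covered.

    universe: iterable of elements
    subsets:  dict mapping subset name -> set of elements
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most uncovered elements (first on ties).
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance for illustration.
subsets = {
    "S1": {1, 2, 3},
    "S2": {3, 4},
    "S3": {4, 5, 6},
    "S4": {1, 6},
}
cover = greedy_set_cover(range(1, 7), subsets)
print(cover)
```

GM follows the same pattern, with edges playing the role of subsets and pairs of per-topic components still to be merged playing the role of uncovered elements.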



More Overlay Design Problems

  • Filtering: Given an upper bound d on the node degree, minimize the number of relays used to connect each topic

    • Captures the cases when full topic-connectivity is infeasible because of resource constraints

  • Diameter: Given an upper bound d on the node degree, minimize the diameter of each topic in the overlay

    • Latency optimal routing under resource constraints
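The quantity the Diameter problem tries to minimize can be measured for a given overlay. The sketch below computes, for each topic, the diameter of the sub-graph induced by its subscribers, assuming messages travel only through subscribers of that topic (no relays); this reading of "diameter of each topic", the function name, and the example overlay are assumptions.

```python
from collections import deque


def topic_diameters(interest, edges):
    """Map each topic to the diameter of its subscriber-induced sub-graph.

    A topic whose subscribers are not connected maps to None.
    """
    adjacency = {n: set() for n in interest}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    result = {}
    for topic in sorted(set().union(*interest.values())):
        members = {n for n in interest if topic in interest[n]}
        worst = 0
        for source in members:
            # Breadth-first search restricted to the topic's subscribers.
            dist = {source: 0}
            queue = deque([source])
            while queue:
                node = queue.popleft()
                for neighbour in adjacency[node] & members:
                    if neighbour not in dist:
                        dist[neighbour] = dist[node] + 1
                        queue.append(neighbour)
            if len(dist) < len(members):
                worst = None  # some subscriber is unreachable
                break
            worst = max(worst, max(dist.values()))
        result[topic] = worst
    return result


# Assumed five-node instance and a hypothetical five-edge overlay.
interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
edges = [("N1", "N2"), ("N2", "N4"), ("N4", "N5"), ("N1", "N3"), ("N2", "N3")]
diameters = topic_diameters(interest, edges)
print(diameters)
```

In this assumed overlay topic A has diameter 3 (the path N3, N2, N4, N5); lowering it would require adding edges and therefore raising some node's degree, which is the trade-off the degree bound d captures.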


Conclusions

  • Initiated formal study of the problem of designing efficient and scalable overlay topologies for pub/sub

  • Defined a representative problem (Min-TCO) capturing the cost of constructing topic-connected overlays

    • NP-Completeness, polynomial approximation, inapproximability results

  • Empirical evaluation showed effectiveness of our approximation algorithm on practical inputs


Future Directions

  • Study dynamic case

  • Investigate other overlay design problems

  • Study distributed case

    • Partial knowledge of other nodes' interests

    • Dynamically changing interest assignments


