
Constructing Scalable Overlays for Pub/Sub With Many Topics

Problems, Algorithms, and Evaluation

G. Chockler, R. Melamed, Y. Tock, IBM Haifa Research Lab

R. Vitenberg, University of Oslo

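Publish/Subscribe (Pub/Sub)

[Figure: nodes N1 to N5 connected to a message bus, each with a topic subscription, e.g. Subscription(N1) = {B,C,D}; the other subscriptions shown are {A,B,C,E}, {A,D}, {A,X} and {A,B,X}. Publish(M1, A) publishes message M1 on topic A, and the bus delivers M1 to the nodes subscribed to A.]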
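To make the delivery semantics concrete, here is a toy, centralized topic-based message bus in Python. It is a minimal sketch for illustration only: the `MessageBus` class and its methods are hypothetical, and only Subscription(N1) = {B,C,D} is stated on the slide, so pairing the other subscription sets with N2 to N5 is our assumption.

```python
from collections import defaultdict


class MessageBus:
    """Toy topic-based pub/sub bus (hypothetical, for illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # topic -> names of subscribed nodes
        self.inbox = defaultdict(list)       # node -> list of (message, topic) received

    def subscribe(self, node, topics):
        for topic in topics:
            self.subscribers[topic].add(node)

    def publish(self, message, topic):
        """Deliver `message` to every node subscribed to `topic`; return their names."""
        receivers = sorted(self.subscribers[topic])
        for node in receivers:
            self.inbox[node].append((message, topic))
        return receivers


# Only Subscription(N1) = {B,C,D} is stated on the slide; the pairing of the
# other subscription sets with N2..N5 is an assumption.
bus = MessageBus()
bus.subscribe("N1", {"B", "C", "D"})
bus.subscribe("N2", {"A", "B", "C", "E"})
bus.subscribe("N3", {"A", "D"})
bus.subscribe("N4", {"A", "B", "X"})
bus.subscribe("N5", {"A", "X"})
print(bus.publish("M1", "A"))  # ['N2', 'N3', 'N4', 'N5']
```

N1 never sees M1 because it is not subscribed to A; a single bus like this is exactly the component that stops scaling in the settings described next.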

Scalability of Pub/Sub
  • Most traditional pub/sub systems are geared towards small-scale deployments
    • E.g., Isis MDS, TIB, MQSeries, Gryphon
  • A new generation of applications…
    • Large data centers: Amazon, Google, Yahoo, eBay, …
    • RSS and feed/news readers, online stock trading and banking
    • Web 2.0, Second Life
  • …drives dramatic growth in scale
    • 10,000s of nodes, 1000s of topics, Internet-wide distribution
  • Emerging systems address this trend using P2P techniques
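Overlay-Based Pub/Sub

[Figure: the same nodes N1 to N5 (subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}) connected by overlay links instead of a message bus; the message (M1, A) is forwarded hop by hop along the links, passing through a node labeled "Relay".]

  • SCRIBE
  • Corona
  • FeedTree
  • Sub-2-Sub
  • TERA
  • ...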
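The sketch below shows why relays appear in an overlay. It floods a (message, topic) pair along overlay links, which is our simplification and not the routing scheme of any of the systems listed above. When `allow_relays` is false, a node not subscribed to the topic drops the message; the overlay links and the node-to-subscription pairing (apart from N1 = {B,C,D}) are hypothetical.

```python
from collections import deque

# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# Hypothetical overlay links; N2 reaches the other subscribers of A only through N1.
OVERLAY = [("N1", "N2"), ("N1", "N3"), ("N3", "N4"), ("N4", "N5")]


def publish(source, topic, subs, overlay, allow_relays):
    """Flood a (message, topic) pair from `source` along overlay links.

    If allow_relays is False, a node not subscribed to `topic` drops the
    message instead of forwarding it. Returns (delivered, relays).
    """
    neighbors = {v: set() for v in subs}
    for u, v in overlay:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, queue, relays = {source}, deque([source]), set()
    while queue:
        node = queue.popleft()
        if node != source and topic not in subs[node]:
            if not allow_relays:
                continue          # uninterested node drops the message
            relays.add(node)      # uninterested node forwards it anyway
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    delivered = {v for v in seen if topic in subs[v]}
    return delivered, relays


print(publish("N5", "A", SUBS, OVERLAY, allow_relays=False))
print(publish("N5", "A", SUBS, OVERLAY, allow_relays=True))
```

Without relays, N2 misses the message because its only link goes through N1, which is not subscribed to A; with relays, N1 forwards traffic it has no interest in, which is the relay load an overlay designer would like to avoid.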

Overlay Topologies for Pub/Sub
  • A “good” overlay allows efficient and simple publication routing:
    • Small routing tables, low load on relays, low latency
  • Ideally, the overlay is topic-connected, i.e., for each topic, the subgraph induced by the nodes subscribed to it forms a single connected component
    • Most existing implementations construct topic-connected overlays
Topic-Connectivity

[Figure: an overlay over the example nodes N1 to N5 with subscriptions {A,B,C,E}, {B,C,D}, {A,D}, {A,X}, {A,B,X}]

In the overlay shown, topics B, C, X and E are connected

Topics A and D are disconnected
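Topic-connectivity can be checked with one graph search per topic that only follows links between subscribers of that topic. This is a minimal sketch; the node-to-subscription pairing (apart from N1 = {B,C,D}) and the overlay, chosen so that, as on the slide, topics A and D end up disconnected, are hypothetical.

```python
# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def disconnected_topics(subs, overlay):
    """Return the topics whose subscribers do not induce a connected subgraph."""
    neighbors = {v: set() for v in subs}
    for u, v in overlay:
        neighbors[u].add(v)
        neighbors[v].add(u)
    bad = []
    for topic in sorted(set().union(*subs.values())):
        members = {v for v in subs if topic in subs[v]}
        start = min(members)
        seen, stack = {start}, [start]
        while stack:
            # Only follow links whose far end is also subscribed to `topic`.
            for nxt in neighbors[stack.pop()] & members:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != members:
            bad.append(topic)
    return bad


def is_topic_connected(subs, overlay):
    return not disconnected_topics(subs, overlay)


# Hypothetical overlay in which N3 has no links, so A and D are disconnected.
print(disconnected_topics(SUBS, [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]))  # ['A', 'D']
```

Each topic costs one linear-time search, so the whole check runs in polynomial time.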

Topic-Connectivity: Simple Solution

[Figure: the example nodes N1 to N5, connected by a separate ring (or tree) for each topic]

  • Node degree grows linearly with the subscription size
    • Roughly twice the average subscription size for rings/trees
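A ring per topic can be built by linking each topic's subscribers in a cycle and taking the union of all rings. In this sketch, ordering ring members by name is our assumption and the node-to-subscription pairing (apart from N1 = {B,C,D}) is hypothetical; the resulting average degree depends on that order and on how many links different topic rings happen to share.

```python
# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def ring_per_topic(subs):
    """Union of one ring per topic (ring members taken in sorted order)."""
    edges = set()
    for topic in sorted(set().union(*subs.values())):
        members = sorted(v for v in subs if topic in subs[v])
        if len(members) < 2:
            continue  # a single subscriber needs no link
        for i, u in enumerate(members):
            v = members[(i + 1) % len(members)]
            # Store links as sorted pairs so a link shared by several rings
            # counts once (and a 2-node "ring" yields a single link).
            edges.add(tuple(sorted((u, v))))
    return edges


def average_degree(num_nodes, edges):
    return 2 * len(edges) / num_nodes


edges = ring_per_topic(SUBS)
print(len(edges), average_degree(len(SUBS), edges))  # 8 3.2
```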

Scalability of the Simple Solution
  • Negative impact on performance due to
    • CPU load: neighbor monitoring, message processing
    • Connection maintenance and header overhead
    • Memory overhead: per-link state associated with the routing and/or compression schemes in use, etc.
  • The result is a scalability barrier for large systems that offer a wide range of subscription choices

Can we do better?

The Min-TCO Problem
  • Minimum Topic-Connected Overlay (Min-TCO) problem:
    • Given a set of nodes $V$, a set of topics $T$, and $\mathit{Interest}: V \times T \to \{\mathit{true}, \mathit{false}\}$
    • Construct a topic-connected overlay $G$ with the minimum possible number of edges (equivalently, the minimum average degree)
  • TCO (decision version):
    • Decide whether there is a topic-connected overlay with at most $k$ edges, for a given $k$
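For very small inputs, Min-TCO and its decision version can be solved exactly by trying edge sets in order of increasing size. The search is exponential in the number of node pairs and is only meant to make the definitions concrete; the instance (apart from N1 = {B,C,D}) and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def topic_connected(subs, edges):
    """True iff every topic's subscribers induce a connected subgraph."""
    neighbors = {v: set() for v in subs}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    for topic in set().union(*subs.values()):
        members = {v for v in subs if topic in subs[v]}
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbors[stack.pop()] & members:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != members:
            return False
    return True


def min_tco(subs):
    """Exact Min-TCO by exhaustive search over edge sets of increasing size."""
    pairs = list(combinations(sorted(subs), 2))
    for size in range(len(pairs) + 1):
        for edges in combinations(pairs, size):
            if topic_connected(subs, edges):
                return list(edges)  # the complete graph always qualifies


def tco(subs, k):
    """Decision version: is there a topic-connected overlay with at most k edges?"""
    return len(min_tco(subs)) <= k


print(len(min_tco(SUBS)), tco(SUBS, 4), tco(SUBS, 5))  # 5 False True
```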
Complexity of TCO

Lemma: $\mathrm{TCO}(V, T, \mathit{Interest}, k) \in \mathrm{NP}$

Proof: Given a candidate overlay with at most $k$ edges as the certificate, topic-connectivity is verifiable in polynomial time

Lemma: $\mathrm{TCO}(V, T, \mathit{Interest}, k)$ is NP-hard

Proof:

  • Define an auxiliary problem, Single Node TCO (SN-TCO): decide whether there is a topic-connected overlay in which the degree of a single given node is at most $d$
  • Set Cover is polynomially reducible to SN-TCO
  • SN-TCO is polynomially reducible to TCO

Theorem: TCO is NP-complete

[Figure: an example instance with nodes N1 to N5 and subscriptions {B,C,D}, {A,B}, {A,D}, {A,B,C,D}, {A,C}]

Approximating Min-TCO
  • The idea: exploit subscription overlaps
    • Connecting nodes with overlapping interests improves the connectivity of several topics at once
  • Greedy Merge (GM) algorithm:
    • Start from a singleton connected component for each $(v, t) \in V \times T$ with $\mathit{Interest}(v, t) = \mathit{true}$
    • At each iteration, add the edge that reduces the number of connected components for the largest number of topics
    • Stop once there is a single connected component for each topic
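The three GM steps above can be sketched directly in Python. This straightforward version recomputes every candidate edge's gain in each iteration; breaking ties by taking the first node pair in sorted order is our choice, since the slides do not specify tie-breaking, and the instance (apart from N1 = {B,C,D}) is hypothetical.

```python
from itertools import combinations

# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def greedy_merge(subs):
    """Greedy Merge: repeatedly add the link that merges components for the most topics."""
    nodes = sorted(subs)
    topics = set().union(*subs.values())
    # comp[t][v] = label of v's connected component in topic t's subgraph;
    # initially every interested node is a singleton component.
    comp = {t: {v: v for v in nodes if t in subs[v]} for t in topics}
    edges = []
    while True:
        best, best_gain = None, 0
        for u, v in combinations(nodes, 2):
            # Gain = number of topics for which u and v are in different components.
            gain = sum(1 for t in subs[u] & subs[v] if comp[t][u] != comp[t][v])
            if gain > best_gain:  # strict '>' keeps the first pair on ties
                best, best_gain = (u, v), gain
        if best is None:  # no link helps any topic: every topic is connected
            return edges
        u, v = best
        edges.append(best)
        for t in subs[u] & subs[v]:
            old, new = comp[t][v], comp[t][u]
            for x in comp[t]:
                if comp[t][x] == old:
                    comp[t][x] = new


print(greedy_merge(SUBS))
# [('N1', 'N2'), ('N2', 'N4'), ('N4', 'N5'), ('N1', 'N3'), ('N2', 'N3')]
```

On this hypothetical instance GM returns 5 links for 5 nodes, i.e., an average degree of 2, which is also the minimum for this instance.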
Greedy Merge

[Figure: Greedy Merge applied step by step to the example nodes N1 to N5 with subscriptions {B,C,D}, {A,B,C,E}, {A,D}, {A,X}, {A,B,X}]
Final overlay: average degree of 2, vs. almost 3 for ring-per-topic!

GM Running Time
  • $O(|V|^4 |T|)$:
    • At most $|V|^2$ iterations
    • At most $|V|^2$ edges inspected at each iteration
    • At most $|T|$ steps to inspect an edge
  • Can be optimized to run in $O(|V|^2 |T|)$:
    • For each $e \in V \times V$, let $\mathit{weight}(e)$ be the number of topics for which adding $e$ would merge two connected components
    • At each iteration, output the heaviest edge and adjust the other edge weights accordingly
    • Stop once there are no more edges with $\mathit{weight}(e) > 0$
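A sketch of the weight-maintenance idea: when a new edge merges the components of some topic, every node pair split across those two components loses exactly one unit of weight. Each pair loses weight at most once per topic, so all updates together cost $O(|V|^2 |T|)$. This sketch still finds the heaviest edge by a linear scan, so it does not by itself reach the stated bound; a faster way to extract the heaviest edge would be needed for that. Names, tie-breaking (first pair in sorted order), and the instance (apart from N1 = {B,C,D}) are our assumptions.

```python
from itertools import combinations

# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def greedy_merge_weighted(subs):
    nodes = sorted(subs)
    # Initially every interested node is a singleton, so weight = subscription overlap.
    weight = {(u, v): len(subs[u] & subs[v]) for u, v in combinations(nodes, 2)}
    comp = {}  # comp[t][v] = component label of node v in topic t
    for v in nodes:
        for t in subs[v]:
            comp.setdefault(t, {})[v] = v
    edges = []
    while True:
        e = max(weight, key=weight.get, default=None)  # heaviest edge, first on ties
        if e is None or weight[e] == 0:
            return edges
        edges.append(e)
        u, v = e
        for t in subs[u] & subs[v]:
            cu, cv = comp[t][u], comp[t][v]
            if cu == cv:
                continue
            side_u = [x for x in comp[t] if comp[t][x] == cu]
            side_v = [y for y in comp[t] if comp[t][y] == cv]
            # Every pair across the two merged components loses one unit for topic t.
            for x in side_u:
                for y in side_v:
                    weight[tuple(sorted((x, y)))] -= 1
            for y in side_v:
                comp[t][y] = cu


print(greedy_merge_weighted(SUBS))
# [('N1', 'N2'), ('N2', 'N4'), ('N4', 'N5'), ('N1', 'N3'), ('N2', 'N3')]
```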
Approximability Results

Lemma:

  • The number of edges in the overlay constructed by GM is at most $\log(|V||T|) \cdot \mathrm{OPT}$

Proof: Similar to the proof of the approximation ratio of the greedy algorithm for Set Cover

  • There exists an input on which GM's output meets this ratio

Theorem: No algorithm can approximate Min-TCO within a constant factor (unless P = NP)

Proof: The existence of such an algorithm would imply a constant-factor approximation for Set Cover, which is known to be impossible (unless P = NP)

More Overlay Design Problems
  • Filtering: Given an upper bound d on the node degree, minimize the number of relays used to connect each topic
    • Captures the cases where full topic-connectivity is infeasible because of resource constraints
  • Diameter: Given an upper bound d on the node degree, minimize the diameter of each topic in the overlay
    • Latency-optimal routing under resource constraints
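The two quantities these problems trade off can be computed for a given overlay: the maximum node degree (which must stay within d) and a per-topic diameter. In this sketch we assume the diameter of a topic is measured in hops within the subgraph induced by its subscribers, so it is undefined (`None` here) when the topic is not connected; the slide does not say whether paths through relays count. The instance (apart from N1 = {B,C,D}), the overlay, and the bound d = 3 are hypothetical.

```python
from collections import Counter, deque

# Hypothetical instance: only Subscription(N1) = {B,C,D} comes from the slides.
SUBS = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# Hypothetical topic-connected overlay for this instance.
OVERLAY = [("N1", "N2"), ("N2", "N4"), ("N4", "N5"), ("N1", "N3"), ("N2", "N3")]


def max_degree(overlay):
    degree = Counter()
    for u, v in overlay:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0)


def topic_diameter(subs, overlay, topic):
    """Hop diameter of the subgraph induced by `topic`'s subscribers (None if disconnected)."""
    members = {v for v in subs if topic in subs[v]}
    neighbors = {v: set() for v in members}
    for u, v in overlay:
        if u in members and v in members:
            neighbors[u].add(v)
            neighbors[v].add(u)
    diameter = 0
    for source in members:
        dist, queue = {source: 0}, deque([source])  # BFS from every member
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        if len(dist) < len(members):
            return None
        diameter = max(diameter, max(dist.values()))
    return diameter


d = 3  # hypothetical degree bound
print(max_degree(OVERLAY) <= d, {t: topic_diameter(SUBS, OVERLAY, t) for t in "ABCDEX"})
# True {'A': 3, 'B': 2, 'C': 1, 'D': 1, 'E': 0, 'X': 1}
```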
Conclusions
  • Initiated a formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
  • Defined a representative problem (Min-TCO) capturing the cost of constructing topic-connected overlays
    • NP-completeness, polynomial approximation, and inapproximability results
  • Empirical evaluation showed the effectiveness of our approximation algorithm on practical inputs
Future Directions
  • Study the dynamic case
  • Investigate other overlay design problems
  • Study the distributed case
    • Partial knowledge of other nodes' interests
    • Dynamically changing interest assignments