
### Divide and Conquer Algorithms for Pub/Sub Overlay Design

Chen Chen (1)

Joint work with Hans-Arno Jacobsen (1, 2) and Roman Vitenberg (3)

(1) Department of Electrical and Computer Engineering, University of Toronto

(2) Department of Computer Science, University of Toronto

(3) Department of Informatics, University of Oslo

ICDCS’10 Genoa, Italy

Pub/Sub

- A communication paradigm
  - Subscribers express their interests
  - Publishers disseminate messages
- Many applications and industry standards
  - Application integration, financial data dissemination, RSS feed distribution, business process management
  - WS Notifications, WS Eventing, OMG’s Real-time Data Dissemination Service
- Topic-based pub/sub
  - TIBCO RV
  - Google’s GooPS


Two components in pub/sub implementation

- Construction of overlay
  - The construction of the overlay topology such that network traffic is minimized.
  - Chockler et al., PODC’07
  - Onus et al., INFOCOM’09
- Design of routing protocols
  - The design of protocols so that publications and subscriptions are sent most efficiently across the overlay network.
  - G. Li et al., ICDCS’08
  - M. Castro et al., JSAC’02


Desirable properties for overlays

Low average node degree

Low fan-out of a node

Low diameter

Topic-connectivity

Efficiency to construct

Adaptability to churn

Ease of distributed implementation


Our contributions


Topic-connectivity

[Figure: an overlay G over five nodes with interests V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, together with its suboverlays Ga and Gb.]

Suboverlay Ga is topic-connected.

Suboverlay Gb is NOT topic-connected.
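To make the definition concrete, here is a minimal Python sketch of a topic-connectivity check: for every topic, the nodes interested in it must induce a connected subgraph of the overlay. The function name `is_topic_connected` and the dictionary representation of the interest function are assumptions made for illustration, and the node interests in the usage note are read from the five-node example (the edges of the original figure are not recoverable, so the edge list there is hypothetical).

```python
from collections import deque


def is_topic_connected(interests, edges):
    """Return True if, for every topic, the nodes interested in that topic
    induce a connected subgraph of the overlay given by `edges`."""
    adjacency = {v: set() for v in interests}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    all_topics = set().union(*interests.values())
    for topic in all_topics:
        members = {v for v, topics in interests.items() if topic in topics}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        # Breadth-first search restricted to nodes interested in `topic`.
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False
    return True
```

For example, with interests `{"v1": {"b","c","d"}, "v2": {"a"}, "v3": {"b","d"}, "v4": {"a","b"}, "v5": {"a","c"}}`, the hypothetical edge list `[("v1","v3"), ("v1","v5"), ("v3","v4"), ("v2","v4"), ("v4","v5")]` passes the check, while dropping `("v3","v4")` disconnects `v4` from the other nodes interested in topic b.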

MinAvg-TCO problem

[Figure: two topic-connected overlays, TCO1 and TCO2, over the same five nodes V1 to V5.]

TCO1 has 5 edges

TCO2 has 10 edges

MinAvg-TCO problem

[Figure: the example overlay with nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}.]

- A high-quality overlay
  - Topic-connectivity
  - Total number of edges
- Input:
  - a set of nodes V,
  - a set of topics T,
  - the interest function Int
- MinAvg-TCO(V,T,Int) (optimization version): Construct a TCO(V,T,Int,E) such that |E| is minimum.
- Avg-TCO(V,T,Int,k) (decision version): Is there a TCO(V,T,Int,E) such that |E|=k?
- Theorem: MinAvg-TCO is NP-complete


Greedy-Merge (GM) algorithm

- Greedy: always making the choice that looks best at the moment
- GM for MinAvg-TCO: always adding an edge with maximum link contribution
- Running time: O(|V|²|T|)
- Approximation ratio: O(log(|V||T|))
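The greedy rule can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes that the link contribution of an edge is the number of topics for which the edge joins two currently separate components (the slides do not define the term), and the name `greedy_merge` and the dictionary-based input are hypothetical. It recomputes every contribution in each round, so it is simpler but slower than the running time quoted above.

```python
from itertools import combinations


def greedy_merge(interests):
    """Greedy sketch: start with an empty edge set and repeatedly add the
    edge with maximum link contribution until no edge contributes."""
    nodes = sorted(interests)
    # comp[t][v] is the label of v's component in the sub-overlay of topic t.
    comp = {}
    for v in nodes:
        for t in interests[v]:
            comp.setdefault(t, {})[v] = v

    def contribution(edge):
        u, v = edge
        shared = interests[u] & interests[v]
        return sum(1 for t in shared if comp[t][u] != comp[t][v])

    candidates = list(combinations(nodes, 2))
    edges = []
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break  # every topic is connected
        u, v = best
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            for w in comp[t]:
                if comp[t][w] == old:
                    comp[t][w] = new  # merge the two components for topic t
        edges.append(best)
    return edges
```

The loop stops exactly when every pair of nodes sharing a topic already lies in the same component for that topic, which is the topic-connectivity condition.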



TCO join problem

- Given p TCOs: TCO_d (V_d, T_d, Int_d, E_d), d = 1, ..., p
- MinAvg-TCO-Join(V,T,Int,p) (optimization version): Construct a TCO(V,T,Int,E) such that |E| is minimum
- Avg-TCO-Join(V,T,Int,p,k) (decision version): Is there a TCO(V,T,Int,E) such that |E|=k?
- MinAvg-TCO is a special case of MinAvg-TCO-Join
- Theorem: MinAvg-TCO-Join is NP-complete


Solving MinAvg-TCO-Join

- MinAvg-TCO-Join could be solved by GM, but this is NOT practical:
  - Tear down all existing links
  - Rebuild the overlay from scratch using GM
- It is better to preserve all existing edges and only add edges incrementally.


Bad case for incremental addition of edges

Vall: a node interested in all topics in T

[Figure: three overlays, TCO0, TCO1 and TCO2, over nodes V1, V2, ..., Vi, ..., Vn-1, Vn and Vall. The panels contrast constructing incrementally with constructing from scratch.]

Naive Merge (NM) algorithm

GM algorithm

- Input: (V,T,Int)
- Output: one TCO
- Algorithm:
  - Start with an empty edge set;
  - Always add an edge with maximum link contribution.
- Running time: [formula lost in extraction]

NM algorithm

- Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
- Output: one TCO
- Algorithm:
  - Start with existing internal-TCO links;
  - Always add a cross-TCO edge with maximum link contribution.
- Running time: [formula lost in extraction]

NM is based on the same greedy heuristic as GM.
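The NM rule can be sketched as follows. This is a minimal sketch under stated assumptions: each input sub-overlay is already topic-connected, so all of its nodes that share a topic start in one component and its internal edges never need to be touched; link contribution is taken to be the number of topics for which an edge joins two separate components. The name `naive_merge` and the list-of-dictionaries input are hypothetical.

```python
from itertools import combinations


def naive_merge(sub_interests):
    """sub_interests: one {node: set_of_topics} dict per existing sub-TCO.
    Returns the cross-TCO edges added to obtain a single TCO; the existing
    internal edges are preserved and therefore not part of the result."""
    interests, owner = {}, {}
    for d, part in enumerate(sub_interests):
        for v, topics in part.items():
            interests[v], owner[v] = set(topics), d

    # Inside a topic-connected sub-TCO, all nodes interested in a topic are
    # already connected, so the sub-TCO index serves as the component label.
    comp = {}
    for v, topics in interests.items():
        for t in topics:
            comp.setdefault(t, {})[v] = owner[v]

    def contribution(edge):
        u, v = edge
        shared = interests[u] & interests[v]
        return sum(1 for t in shared if comp[t][u] != comp[t][v])

    # Only edges between different sub-TCOs are candidates.
    cross = [(u, v) for u, v in combinations(sorted(interests), 2)
             if owner[u] != owner[v]]
    added = []
    while cross:
        best = max(cross, key=contribution)
        if contribution(best) == 0:
            break
        u, v = best
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            for w in comp[t]:
                if comp[t][w] == old:
                    comp[t][w] = new
        added.append(best)
    return added
```

The candidate list still contains every pair of nodes from different sub-TCOs, which is why the cost stays high when the sub-TCOs are large.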

Example of NM

[Figure: NM applied to an example with nodes V0 to V14 over topics {a,b,c,d}.]

Still a prohibitively high running time!

Star set

Given a TCO (V,T,Int,E)

A Star set S is a subset of V that covers all V’s topics.

[Figure: a topic-connected overlay over nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, shown three times with different node subsets highlighted.]

{v3, v5} is a star set which covers all topics {a,b,c,d}

{v2, v3, v4} is not a star set; it only covers {a,b,d}
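The star-set condition is a plain coverage test, which a short Python sketch can make explicit. The function names and the dictionary representation of the interest function are assumptions for illustration; the node interests in the usage note are the ones from the slide's example.

```python
def covered_topics(interests, subset):
    """Union of the interests of the nodes in `subset`."""
    covered = set()
    for v in subset:
        covered |= interests[v]
    return covered


def is_star_set(interests, subset):
    """A star set covers every topic that any node in V is interested in."""
    all_topics = covered_topics(interests, interests)
    return covered_topics(interests, subset) == all_topics
```

With interests `{"v1": {"b","c","d"}, "v2": {"a"}, "v3": {"b","d"}, "v4": {"a","b"}, "v5": {"a","c"}}`, the subset `{"v3", "v5"}` passes the test, while `{"v2", "v3", "v4"}` fails because topic c is not covered.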

Star set

- Star set nodes
  - Represent the interests of all the nodes
  - Can function as bridges to determine cross-TCO links
- Observation: minimal star sets tend to be substantially smaller than the total number of nodes.
- How to find a minimum star set S* for (V,T,Int)?
  - Equivalent to the classic set cover problem: NP-complete
  - Can be approximated with a logarithmic approximation ratio
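The logarithmic approximation mentioned above is achieved by the standard greedy set cover heuristic: repeatedly pick the node that covers the most still-uncovered topics. The following Python sketch shows that heuristic applied to star sets; whether the authors use exactly this variant is not stated in the slides, and the name `greedy_star_set` is hypothetical.

```python
def greedy_star_set(interests):
    """Greedy set cover: pick the node covering the most uncovered topics
    until every topic is covered. Ties go to the smallest node name."""
    uncovered = set().union(*interests.values())
    nodes = sorted(interests)
    star = []
    while uncovered:
        best = max(nodes, key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star
```

On the five-node example from the previous slide, the heuristic first picks v1, which covers three topics, and then one node that covers the remaining topic a.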


Star Merge (SM) algorithm

NM algorithm

- Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
- Output: one TCO
- Algorithm:
  - Start with existing internal-TCO links;
  - // Do nothing;
  - Always add a cross-TCO edge with maximum link contribution.

SM algorithm

- Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
- Output: one TCO
- Algorithm:
  - Start with existing internal-TCO links;
  - Find a star set for each sub-TCO;
  - Always add a cross-Star edge with maximum link contribution.
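A minimal Python sketch of the SM steps follows. It rests on several assumptions that the slides do not spell out: every input sub-overlay is already topic-connected, star sets are found with the greedy set cover heuristic, and link contribution counts the topics for which an edge joins two separate components. The names `star_merge` and `_greedy_star_set` are hypothetical.

```python
from itertools import combinations


def _greedy_star_set(part):
    """Greedy set cover over one sub-TCO given as {node: set_of_topics}."""
    uncovered = set().union(*part.values())
    star = []
    while uncovered:
        best = max(sorted(part), key=lambda v: len(part[v] & uncovered))
        star.append(best)
        uncovered -= part[best]
    return star


def star_merge(sub_interests):
    """Join topic-connected sub-TCOs by adding edges between star nodes only.
    Returns the added cross-star edges; internal edges are left untouched."""
    interests, owner = {}, {}
    for d, part in enumerate(sub_interests):
        for v, topics in part.items():
            interests[v], owner[v] = set(topics), d
    # Component label per topic: initially the index of the owning sub-TCO.
    comp = {}
    for v, topics in interests.items():
        for t in topics:
            comp.setdefault(t, {})[v] = owner[v]

    stars = sorted(s for part in sub_interests for s in _greedy_star_set(part))
    candidates = [(u, v) for u, v in combinations(stars, 2)
                  if owner[u] != owner[v]]

    def contribution(edge):
        u, v = edge
        shared = interests[u] & interests[v]
        return sum(1 for t in shared if comp[t][u] != comp[t][v])

    added = []
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break
        u, v = best
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            for w in comp[t]:
                if comp[t][w] == old:
                    comp[t][w] = new
        added.append(best)
    return added
```

Because each star set covers all topics of its sub-TCO, any topic shared by two sub-TCOs can always be bridged by a pair of star nodes, so restricting the candidates to cross-star edges still yields a topic-connected result under the assumptions above.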

Example of SM

[Figure: SM applied to an example with nodes V0 to V14 over topics {a,b,c,d}.]

Running time largely improved because #stars << #nodes for most cases.

Divide and Conquer (DC) for MinAvg-TCO

- The number of nodes is a dominant factor for the running time of the GM algorithm.
- Divide-and-conquer
  - Divide the MinAvg-TCO problem into several sub-overlay construction problems
  - Conquer the sub-MinAvg-TCO problems independently and build sub-overlays into sub-TCOs
  - Combine these sub-TCOs into one TCO
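The three phases can be sketched end to end in Python. This is a simplified, centralized sketch under stated assumptions, not the authors' code: the node set is divided by random partitioning, each part is conquered with a greedy edge-selection loop in the spirit of GM, and the parts are combined by adding edges between greedily chosen star nodes in the spirit of SM. All function names and the `seed` parameter are hypothetical.

```python
import random
from itertools import combinations


def greedy_edges(interests, comp, candidates):
    """Add candidate edges by maximum link contribution until none helps.
    comp[t][v] is the component label of v for topic t (updated in place)."""
    def contribution(edge):
        u, v = edge
        shared = interests[u] & interests[v]
        return sum(1 for t in shared if comp[t][u] != comp[t][v])

    added = []
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break
        u, v = best
        for t in interests[u] & interests[v]:
            old, new = comp[t][v], comp[t][u]
            for w in comp[t]:
                if comp[t][w] == old:
                    comp[t][w] = new
        added.append(best)
    return added


def star_set(interests, part):
    """Greedy set cover over the nodes of one part."""
    uncovered = set().union(*(interests[v] for v in part))
    star = []
    while uncovered:
        best = max(part, key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star


def divide_and_conquer(interests, p, seed=0):
    nodes = sorted(interests)
    random.Random(seed).shuffle(nodes)
    # Divide: random partitioning of the node set into p parts.
    parts = [sorted(nodes[i::p]) for i in range(p)]
    comp = {}
    for v in nodes:
        for t in interests[v]:
            comp.setdefault(t, {})[v] = v
    edges = []
    # Conquer: build each part into a sub-TCO independently.
    for part in parts:
        edges += greedy_edges(interests, comp, list(combinations(part, 2)))
    # Combine: only edges between star nodes of different parts are considered.
    stars = [(d, s) for d, part in enumerate(parts)
             for s in star_set(interests, part)]
    cross = [(u, v) for (du, u), (dv, v) in combinations(stars, 2) if du != dv]
    edges += greedy_edges(interests, comp, cross)
    return edges
```

With `p = 1` the combine phase has no candidates and the sketch reduces to the greedy loop over all node pairs.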


Design of DC algorithm

- How to divide the node set V:
  - Node clustering vs. random partitioning
- The number of partitions p
  - The balance between conquer and combine
  - p = 1 (single partition): conquer only = GM
  - p = |V| (each node is a partition): combine only = GM
- How to decentralize DC:
  - Note that the DC algorithm as presented is fully centralized.
  - However, it is possible to decentralize it.
- Theoretical analysis: not straightforward.


Example of DC

[Figure: DC applied to an example with nodes V0 to V14 over topics {a,b,c,d}.]

- Divide the overlay based on V
- Conquer each sub-overlay by GM
- Combine the sub-TCOs into one by SM

Experiment setting

- The number of nodes: |V| = 1000, ranging from 1000 to 8000
- The number of topics: |T| = 100, ranging from 100 to 1000
- The number of topics subscribed to by a node: NodeIntSize = 20, ranging from 10 to 100
- Topic distribution: uniform, Zipf, exponential
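A workload with these parameters could be generated along the following lines. This is only a sketch of one plausible setup: the slides do not say how topics are drawn, so the weighting schemes (in particular the Zipf exponent of 1 and the exponential decay constant) and all names are assumptions.

```python
import math
import random


def topic_weights(num_topics, distribution):
    """Relative popularity of each topic (hypothetical parameter choices)."""
    if distribution == "uniform":
        return [1.0] * num_topics
    if distribution == "zipf":
        return [1.0 / (rank + 1) for rank in range(num_topics)]
    if distribution == "exponential":
        return [math.exp(-5.0 * rank / num_topics) for rank in range(num_topics)]
    raise ValueError(f"unknown distribution: {distribution}")


def generate_interests(num_nodes, num_topics, node_int_size,
                       distribution="uniform", seed=0):
    """Give every node `node_int_size` distinct topics, drawn by popularity."""
    if node_int_size > num_topics:
        raise ValueError("node_int_size cannot exceed num_topics")
    rng = random.Random(seed)
    topics = list(range(num_topics))
    weights = topic_weights(num_topics, distribution)
    interests = {}
    for node in range(num_nodes):
        chosen = set()
        # Redraw until the node has enough distinct topics.
        while len(chosen) < node_int_size:
            missing = node_int_size - len(chosen)
            chosen.update(rng.choices(topics, weights=weights, k=missing))
        interests[node] = chosen
    return interests
```

For instance, `generate_interests(1000, 100, 20, "zipf")` would correspond to the first value listed for each parameter above.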


Experiment design

- Evaluation: average node degree, running time
- Star Merge for MinAvg-TCO-Join
- DC for MinAvg-TCO
  - Random node partitioning
  - The effects of the number of nodes
  - The effects of the number of topics
  - The effects of average subscription size of a node
  - Comparison with RingPT

RingPT is an algorithm that mimics the common practice of building a separate overlay for each topic.
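As an illustration of such a per-topic baseline, the sketch below connects the nodes interested in each topic in a ring and takes the union of all rings. That the per-topic overlay is a ring is an assumption suggested by the name RingPT and not stated in the slides; the function names are hypothetical. The second helper computes the average node degree used as an evaluation metric.

```python
def ring_per_topic(interests):
    """Build one ring per topic over the nodes interested in it and return
    the union of all ring edges as a set of (smaller, larger) node pairs."""
    edges = set()
    all_topics = set().union(*interests.values())
    for topic in sorted(all_topics):
        members = sorted(v for v in interests if topic in interests[v])
        if len(members) < 2:
            continue  # a single subscriber needs no links
        for i, u in enumerate(members):
            w = members[(i + 1) % len(members)]
            edges.add((min(u, w), max(u, w)))
    return edges


def average_degree(num_nodes, edges):
    """Each undirected edge adds one to the degree of both endpoints."""
    return 2 * len(edges) / num_nodes
```

Rings built for different topics share an edge only when two topics happen to place the same pair of nodes next to each other, which is why a per-topic construction tends to need more links than a topic-connected overlay built jointly.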

Star Merge: SM vs NM vs GM

Divide-and-conquer: The effect of the number of nodes

Divide-and-conquer: DC vs GM vs RingPT

Algorithm summary

Minimal Number of Links

- A typical pub/sub system combines a number of protocols, many of which maintain per-link state
- A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)
- If the links are maintained using TCP, there is the cost of connection state for each link
- The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits
- If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table
