Divide and Conquer Algorithms for Pub/Sub Overlay Design

Chen Chen (1)

Joint work with Hans-Arno Jacobsen (1,2) and Roman Vitenberg (3)

(1) Department of Electrical and Computer Engineering, University of Toronto

(2) Department of Computer Science, University of Toronto

(3) Department of Informatics, University of Oslo

ICDCS’10 Genoa, Italy


Example: Pub/Sub

[Figure: publications labeled "boy" and "girl" and three subscribers, two with interest "boy" and one with interest "girl".]

Pub/Sub

  • A communication paradigm

    • Subscribers express their interests

    • Publishers disseminate messages

  • Many applications and industry standards

    • Application integration, financial data dissemination, RSS feed distribution, business process management

    • WS-Notification, WS-Eventing, OMG’s Data Distribution Service (DDS)

  • Topic-based pub/sub

    • TIBCO RV

    • Google’s GooPS

Two components in pub/sub implementation

  • Construction of overlay

    • The construction of the overlay topology such that network traffic is minimized.

    • Chockler et al., PODC’07

    • Onus et al., INFOCOM’09

  • Design of routing protocols

    • The design of protocols so that publications and subscriptions are sent most efficiently across the overlay network.

    • G. Li et al., ICDCS’08

    • M. Castro et al., JSAC’02

Desirable properties for overlays

Low average node degree

Low fan-out of a node

Low diameter

Topic-connectivity

Efficiency to construct

Adaptability to churn

Ease of distributed implementation

Our contributions

Topic-connectivity

[Figure: an overlay G over five nodes with interests V1 = {b,c,d}, V2 = {a}, V3 = {b,d}, V4 = {a,b}, V5 = {a,c}, shown with two of its per-topic sub-overlays.]

  • Sub-overlay Ga is topic-connected

  • Sub-overlay Gb is NOT topic-connected
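
An overlay is topic-connected if, for every topic t, the nodes interested in t induce a connected sub-overlay. The check can be sketched in Python as follows. The node interests are those of the five-node example; the edge list EDGES is a hypothetical one chosen for illustration, since the edges of the slide's figure are not recoverable from the transcript.

```python
from collections import defaultdict, deque

def is_topic_connected(interests, edges, topic):
    """True if the nodes interested in `topic` induce a connected subgraph."""
    members = {v for v, ts in interests.items() if topic in ts}
    if len(members) <= 1:
        return True
    # Keep only edges whose two endpoints are both interested in the topic.
    adj = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary interested node.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == members

def is_tco(interests, edges):
    """True if the overlay is topic-connected for every topic."""
    topics = set().union(*interests.values())
    return all(is_topic_connected(interests, edges, t) for t in topics)

INTERESTS = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
# Hypothetical edge set (an assumption, not the slide's overlay G).
EDGES = [("v2", "v4"), ("v4", "v5"), ("v1", "v5")]
```

With this hypothetical edge set, the sub-overlay for topic a (nodes v2, v4, v5) is connected, while the sub-overlay for topic b (nodes v1, v3, v4) contains no edge at all, so the overlay as a whole is not a TCO.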

MinAvg-TCO problem

[Figure: two topic-connected overlays over the same five nodes, V1 = {b,c,d}, V2 = {a}, V3 = {b,d}, V4 = {a,b}, V5 = {a,c}.]

  • TCO1 has 5 edges

  • TCO2 has 10 edges

MinAvg-TCO problem

[Figure: the five-node example with interests V1 = {b,c,d}, V2 = {a}, V3 = {b,d}, V4 = {a,b}, V5 = {a,c}.]

  • A high-quality overlay

    • Topic-connectivity

    • Total number of edges

  • Input:

    • a set of nodes V,

    • a set of topics T,

    • the interest function Int

  • MinAvg-TCO(V,T,Int) (optimization version)

    Construct a TCO(V,T,Int,E) such that |E| is minimum.

  • Avg-TCO(V,T,Int,k) (decision version)

    Is there a TCO(V,T,Int,E) such that |E|=k?

  • Theorem: MinAvg-TCO is NP-complete

Greedy-Merge (GM) algorithm

  • Greedy:

    always making the choice that looks best at the moment

  • GM for MinAvg-TCO:

    always adding an edge with maximum link contribution

  • Running Time: O(|V|2|T|)

  • Approximation Ratio: O(log(|V||T|))
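
The greedy rule can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. It assumes the usual meaning of link contribution: the number of topics shared by the two endpoints for which they still lie in different topic-connected components. The function name greedy_merge and the union-find helper are illustrative choices, and this naive version rescans all candidate edges in every round, so it does not achieve the running time stated above.

```python
from itertools import combinations

class DisjointSet:
    """Union-find over the nodes interested in one topic."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def greedy_merge(interests):
    """Build a topic-connected overlay by repeatedly adding the best edge."""
    topics = set().union(*interests.values())
    comps = {t: DisjointSet([v for v in interests if t in interests[v]])
             for t in topics}

    def contribution(u, v):
        # Number of topics whose component count drops if (u, v) is added.
        return sum(1 for t in interests[u] & interests[v]
                   if comps[t].find(u) != comps[t].find(v))

    edges = []
    candidates = list(combinations(sorted(interests), 2))
    while True:
        best = max(candidates, key=lambda e: contribution(*e), default=None)
        if best is None or contribution(*best) == 0:
            return edges  # every topic is connected
        u, v = best
        for t in interests[u] & interests[v]:
            comps[t].union(u, v)
        edges.append(best)

INTERESTS = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
```

On the five-node example, the first edge chosen is (v1, v3), the only pair sharing two topics, and the sketch stops after 5 edges, the same edge count as TCO1 in the earlier example.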

Our contributions

TCO join problem

  • Given p TCOs: TCOd (Vd,Td,Intd,Ed), d=1,..,p

  • MinAvg-TCO-Join(V,T,Int,p) (optimization version)

    Construct a TCO(V,T,Int,E) such that |E| is minimum

  • Avg-TCO-Join(V,T,Int,p,k) (decision version)

    Is there a TCO(V,T,Int,E) such that |E|=k?

  • MinAvg-TCO is a special case of MinAvg-TCO-Join:

    Theorem: MinAvg-TCO-Join is NP-complete

Solving MinAvg-TCO-Join

  • MinAvg-TCO-Join could be solved by GM,

    but NOT practical:

    • Tear down all existing links

    • Rebuild the overlay from scratch using GM

  • It is better to preserve all existing edges and only add edges incrementally.

Bad case for incremental addition of edges

Vall: a node interested in all topics in T

[Figure: three overlays over nodes V1, V2, ..., Vi, ..., Vn-1, Vn, labeled TCO0, TCO1 and TCO2; Vall appears in two of them. The panels contrast constructing incrementally with constructing from scratch.]

Naive Merge (NM) algorithm

GM algorithm

  • Input: (V,T,Int)

  • Output: one TCO

  • Algorithm:

    - Start with an empty edge set;

    - Always add an edge with maximum link contribution.

  • Running time:

NM algorithm

  • Input: (Vd,Td,Intd,Ed), d=1,...,p

  • Output: one TCO

  • Algorithm:

    - Start with existing internal-TCO links;

    - Always add a cross-TCO edge with maximum link contribution.

  • Running time:

NM is based on the same greedy heuristic as GM.

Example of NM

[Figure: NM applied to an example with 15 nodes, V0 to V14, whose interests are subsets of {a,b,c,d}.]

Still a prohibitively high running time!

Star set

Given a TCO (V,T,Int,E), a star set S is a subset of V that covers all of V’s topics.

[Figure: a topic-connected overlay over V1 = {b,c,d}, V2 = {a}, V3 = {b,d}, V4 = {a,b}, V5 = {a,c}.]

  • {v3, v5} is a star set which covers all topics {a,b,c,d}

  • {v2, v3, v4} is not a star set; it only covers {a,b,d}

Star set

  • Star set nodes

    • Represent the interests of all the nodes

    • Can function as bridges to determine cross-TCO links

  • Observation: a minimal star set tends to contain substantially fewer nodes than V.

  • How to find a minimum star set S* for (V,T,Int)?

    • Equivalent to the classic set cover problem: NP-complete

    • Can be approximated within a logarithmic approximation ratio
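
The logarithmic approximation for set cover is the standard greedy heuristic: repeatedly pick the node that covers the most still-uncovered topics. A minimal Python sketch follows; the function name greedy_star_set and the tie-breaking by node name are assumptions made for illustration.

```python
def greedy_star_set(interests):
    """Greedy set cover: pick nodes until their interests cover every topic."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        # Node covering the most uncovered topics; ties go to the smallest name.
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

INTERESTS = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
```

On the five-node example this picks v1 and then v2, a star set of size 2, the same size as the star set {v3, v5} from the earlier example.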

Star Merge (SM) algorithm

NM algorithm

  • Input: (Vd,Td,Intd,Ed), d=1,..,p

  • Output: one TCO

  • Algorithm:

    - Start with existing internal-TCO links;

    - // Do nothing;

    - Always add a cross-TCO edge with maximum link contribution.

SM algorithm

  • Input: (Vd,Td,Intd,Ed), d=1,..,p

  • Output: one TCO

  • Algorithm:

    - Start with existing internal-TCO links;

    - Find a star set for each sub-TCO;

    - Always add a cross-Star edge with maximum link contribution.
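
The two merge algorithms differ only in which nodes may receive cross-TCO edges, which the Python sketch below makes explicit with one flag. It is an illustration under stated assumptions, not the authors' code: the name merge_tcos, the use_star_sets flag, the greedy set-cover star sets and the two small sub-TCOs are all hypothetical, and link contribution is taken to be the number of shared topics whose components the edge would join. Node sets of the sub-TCOs are assumed to be disjoint.

```python
from itertools import combinations

def star_set(interests):
    """Greedy set cover: nodes whose interests jointly cover every topic."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

def merge_tcos(sub_tcos, use_star_sets=True):
    """Join sub-TCOs given as (interests, edges) pairs into one TCO.

    use_star_sets=True restricts new edges to star-set nodes (SM);
    use_star_sets=False allows any cross-TCO edge (NM).
    """
    interests, edges = {}, []
    for ints, es in sub_tcos:
        interests.update(ints)
        edges.extend(es)  # start with existing internal-TCO links
    # Union-find over (topic, node) pairs tracks topic-connected components.
    parent = {(t, v): (t, v) for v, ts in interests.items() for t in ts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def link(u, v):
        for t in interests[u] & interests[v]:
            parent[find((t, u))] = find((t, v))

    def contribution(edge):
        u, v = edge
        return sum(find((t, u)) != find((t, v)) for t in interests[u] & interests[v])

    for u, v in edges:
        link(u, v)
    bridges = [star_set(ints) if use_star_sets else sorted(ints)
               for ints, _ in sub_tcos]
    candidates = [(u, v) for g, h in combinations(bridges, 2) for u in g for v in h]
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break  # every topic is connected across the sub-TCOs
        link(*best)
        edges.append(best)
    return edges

# Two hypothetical sub-TCOs built from the five-node example.
SUB_A = ({"v1": {"b", "c", "d"}, "v3": {"b", "d"}}, [("v1", "v3")])
SUB_B = ({"v2": {"a"}, "v4": {"a", "b"}, "v5": {"a", "c"}},
         [("v2", "v4"), ("v4", "v5")])
```

Calling merge_tcos([SUB_A, SUB_B]) evaluates only two candidate edges, (v1, v4) and (v1, v5), because the star sets are {v1} and {v4, v5}; with use_star_sets=False six candidates are evaluated. Both variants end with 5 edges on this input.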

Example of SM

[Figure: SM applied to an example with 15 nodes, V0 to V14, whose interests are subsets of {a,b,c,d}.]

Running time largely improved because #stars << #nodes for most cases.

Divide and Conquer (DC) for MinAvg-TCO

  • The number of nodes is a dominant factor for the running time of the GM algorithm.

  • Divide-and-conquer

    • Divide the MinAvg-TCO problem into several sub-overlay construction problems

    • Conquer the sub-MinAvg-TCO problems independently and build sub-overlays into sub-TCOs

    • Combine these sub-TCOs to one TCO
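
The three phases can be sketched end to end in Python. This is a simplified illustration, not the authors' implementation: it assumes random partitioning into p parts, a naive greedy routine (greedy_connect, a hypothetical helper) that serves both as the conquer step over all pairs inside a partition and as the combine step over cross-star pairs, and greedy set-cover star sets.

```python
import random
from itertools import combinations

def greedy_connect(interests, edges, candidates):
    """Starting from `edges`, greedily add candidates with maximum contribution."""
    parent = {(t, v): (t, v) for v, ts in interests.items() for t in ts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def link(u, v):
        for t in interests[u] & interests[v]:
            parent[find((t, u))] = find((t, v))

    def gain(edge):
        u, v = edge
        return sum(find((t, u)) != find((t, v)) for t in interests[u] & interests[v])

    edges = list(edges)
    for u, v in edges:
        link(u, v)
    candidates = list(candidates)
    while candidates:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break
        link(*best)
        edges.append(best)
    return edges

def star_set(interests):
    """Greedy set cover over the topics of one partition."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

def divide_and_conquer(interests, p, seed=0):
    # Divide: random partition of the node set into at most p parts.
    nodes = sorted(interests)
    random.Random(seed).shuffle(nodes)
    parts = [part for part in (nodes[i::p] for i in range(p)) if part]
    subs = [{v: interests[v] for v in part} for part in parts]
    # Conquer: build each sub-TCO independently with the greedy rule.
    edges = []
    for sub in subs:
        edges += greedy_connect(sub, [], combinations(sorted(sub), 2))
    # Combine: add only edges between star-set nodes of different sub-TCOs.
    stars = [star_set(sub) for sub in subs]
    cross = [(u, v) for s, r in combinations(stars, 2) for u in s for v in r]
    return greedy_connect(interests, edges, cross)

INTERESTS = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
```

With p = 1 the combine phase has nothing to do and the result is the plain greedy overlay; with p = |V| every node is its own partition and all the work happens in the combine phase.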

Design of DC algorithm

  • How to divide the node set V:

    • Node clustering vs. random partitioning

    • The number of partitions p

  • The balance between conquer and combine

    • p = 1 (single partition): conquer only = GM

    • p = |V| (each node is a partition): combine only = GM

  • How to decentralize DC:

    • Note that the DC algorithm as presented is fully centralized.

    • However, it is possible to decentralize it.

  • Theoretical analysis: not straightforward.

Example of DC

[Figure: DC applied to an example with 15 nodes, V0 to V14, whose interests are subsets of {a,b,c,d}.]

  - Divide overlay based on V

  - Conquer each sub-TCO by GM

  - Combine TCO into one by SM

Experiment setting

  • The number of nodes

    |V| = 1000 by default, varied from 1000 to 8000

  • The number of topics

    |T| = 100 by default, varied from 100 to 1000

  • The number of topics subscribed to by a node

    NodeIntSize = 20 by default, varied from 10 to 100

  • Topic distribution: uniform, Zipf, exponential

Experiment design

  • Evaluation: average node degree, running time

    • Star Merge for MinAvg-TCO-Join

    • DC for MinAvg-TCO

      • Random node partitioning

      • The effects of the number of nodes

      • The effects of the number of topics

      • The effects of average subscription size of a node

      • Comparison with RingPT

        RingPT is an algorithm that mimics the common practice of building a separate overlay for each topic.
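
A per-topic baseline of this kind can be sketched in Python as below. The slides only say that RingPT builds a separate overlay for each topic; connecting the subscribers of each topic in a ring (suggested by the name) and the function names ring_per_topic and average_degree are assumptions made for illustration.

```python
def ring_per_topic(interests):
    """Connect the subscribers of every topic in a ring; return the edge union."""
    topics = sorted(set().union(*interests.values()))
    edges = set()
    for t in topics:
        members = sorted(v for v in interests if t in interests[v])
        if len(members) < 2:
            continue
        for i, u in enumerate(members):
            v = members[(i + 1) % len(members)]
            edges.add(frozenset((u, v)))  # undirected, so duplicates collapse
    return edges

def average_degree(edges, num_nodes):
    """Average node degree of an undirected overlay."""
    return 2 * len(edges) / num_nodes

INTERESTS = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
```

On the five-node example the per-topic rings use 7 distinct edges (average node degree 2.8), compared with the 5 edges of TCO1 (average node degree 2.0), because edges are not shared across topics by design.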

Star Merge: SM vs NM vs GM

Divide-and-conquer: the effect of the number of nodes

Divide-and-conquer: DC vs GM vs RingPT

Algorithm summary

Minimal Number of Links

  • A typical pub/sub system combines a number of protocols, many of which maintain per-link state

    • A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)

    • If the links are maintained using TCP, there is the cost of connection state for each link

    • The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits

    • If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table

