
Practical Theory Perspectives

CS598ig – Fall 04

Presented by: Mayssam Sayyadian


Publish/Subscribe System

  • Event notification system

  • Producer publishes messages

  • Consumer waits for certain types of events by placing subscriptions

  • Basic components to be defined:

    • Information space

    • Subscriptions

    • Events (event schema)

    • Notifications

  • Many applications and examples: stock information delivery, auction systems, air traffic control, news feeds, network monitoring, etc.


Research Issues

  • System architecture

  • Matching and Dispatching

  • Routing

  • Reliable message delivery

  • Security

  • Special application issues

    • Mobile environment…


Pub/Sub Systems: Examples

  • IBM – Gryphon

  • Stanford – SIFT and more…

  • CU-Boulder – Siena

  • France – Le Subscribe

  • Technische Universität Darmstadt – REBECA

  • Microsoft – Herald

  • MIT

  • Others – XMLBlaster, Elvin4, TIB, Keryx


Earlier Classification

  • Subject based (channel based)

    • System contains many channels

    • Subscriptions and notifications belong to a specific channel

    • Simple and straightforward matching

    • Restrictive

  • Content based

    • No channel

    • Notifications are sent to the subscribers based on their content

    • More generic

    • Matching suffers from a scaling problem (addressed in this paper)
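
The two models can be contrasted with a small sketch. Everything in it (the class names, the dict-based event, the callback signature) is an illustrative assumption and not the API of any system listed earlier. The content-based broker uses the naive approach of testing every subscription.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]            # hypothetical event: attribute name -> value
Callback = Callable[[Event], None]
Predicate = Callable[[Event], bool]


class SubjectBroker:
    """Subject-based: a subscription names a channel; matching is one dict lookup."""

    def __init__(self) -> None:
        self._channels: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        self._channels[channel].append(callback)

    def publish(self, channel: str, event: Event) -> int:
        subscribers = self._channels.get(channel, [])
        for callback in subscribers:
            callback(event)
        return len(subscribers)


class ContentBroker:
    """Content-based: a subscription is a predicate over the event's content."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Predicate, Callback]] = []

    def subscribe(self, predicate: Predicate, callback: Callback) -> None:
        self._subs.append((predicate, callback))

    def publish(self, event: Event) -> int:
        # Naive matching: every predicate is tested, so the cost grows
        # linearly with the number of subscriptions.
        delivered = 0
        for predicate, callback in self._subs:
            if predicate(event):
                callback(event)
                delivered += 1
        return delivered


received: List[str] = []
subject = SubjectBroker()
subject.subscribe("weather", lambda e: received.append("subject"))
content = ContentBroker()
content.subscribe(
    lambda e: e.get("city") == "LA" and e.get("temperature", 99) < 40,
    lambda e: received.append("content"),
)

event = {"city": "LA", "temperature": 35}
subject_count = subject.publish("weather", event)
content_count = content.publish(event)
```

The subject-based publisher must choose the channel up front, while the content-based subscriber can constrain any attribute of the event.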


Content Based Matching Problem

  • Naïve solution:

    • Match each incoming event against every subscription

    • Linear in the number of subscriptions

    • Not practical

  • Requirement:

    • Matching and dispatching should be sub-linear in the number of subscriptions

  • Intuition:

    • Combine parts of subscription to reduce the number of tests for each event


Event Forwarding Algorithms

  • Decision trees

    • Use a tree structure to describe the event matching information

    • Forwarding an event means passing it through the tree structure

    • Example: Gryphon

  • Hash functions

    • Use a hash function to index all components of notifications

    • Use other efficient techniques to find matching notifications

    • Example: Le Subscribe
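
As a rough illustration of the hash-based idea, the sketch below indexes equality predicates by (attribute, value) and counts satisfied predicates per subscription. This is a generic counting scheme chosen for illustration; it is not claimed to be the algorithm used by Le Subscribe, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Subscription = Dict[str, str]   # attribute -> required value (equality tests only)


class HashIndexMatcher:
    def __init__(self) -> None:
        # (attribute, value) -> ids of subscriptions containing that predicate
        self._index: Dict[Tuple[str, str], List[int]] = defaultdict(list)
        self._size: List[int] = []   # number of predicates in each subscription

    def add(self, sub: Subscription) -> int:
        if not sub:
            raise ValueError("a subscription needs at least one predicate")
        sub_id = len(self._size)
        self._size.append(len(sub))
        for attr, value in sub.items():
            self._index[(attr, value)].append(sub_id)
        return sub_id

    def match(self, event: Dict[str, str]) -> Set[int]:
        hits: Dict[int, int] = defaultdict(int)
        # One hash lookup per event attribute; only subscriptions sharing
        # at least one predicate with the event are ever touched.
        for attr, value in event.items():
            for sub_id in self._index.get((attr, value), ()):
                hits[sub_id] += 1
        # A subscription matches when all of its predicates were satisfied.
        return {s for s, count in hits.items() if count == self._size[s]}


matcher = HashIndexMatcher()
matcher.add({"city": "LA", "type": "weather"})   # id 0
matcher.add({"city": "LA"})                      # id 1
matcher.add({"city": "NY"})                      # id 2
result = matcher.match({"city": "LA", "type": "traffic"})
```

Here `result` contains only subscription 1: subscription 0 shares the `city` predicate but its `type` predicate is not satisfied.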


The Big Picture: The Information Bus

Picture from “The Information Bus – An Architecture for Extensible Distributed Systems”, Brian M. Oki et al., SOSP 1993


A Scalable Matching Algorithm

  • “Matching Events in a Content-based Subscription System”, M. K. Aguilera et al., IBM

  • Address scalability of matching algorithms

  • Sub-linear in the number of subscriptions

  • Space complexity: linear

  • Do preprocessing

  • Assumes subscriptions are updated infrequently


Matching Algorithm

  • Classification ?

    • Consider a decision tree classifier with subscriptions as set of possible classes

  • Analyze subscriptions

    • sub := pr1 ^ pr2 ^ pr3

    • Conjunction of elementary predicates: pr_i = test_i(e) → res_i

    • e.g. (city = LA) and (temperature < 40)

    • pr1 = test1(…) → LA

    • pr2 = test2(…) → “<”

    • test1 = “examine attribute city”

    • test2 = “compare attribute temperature to 40”


Matching Algorithm (Cont’d.)

  • Preprocess to make the matching tree

  • Each non-leaf node is a test

  • Each edge from test node is a possible result

  • Each leaf node is a subscription

  • Pre-process each of the subscriptions and combine the information to prepare the tree

  • On receiving events, follow the sequence of test nodes and edges till a leaf node is reached


Matching Tree

  • Don’t care tests

  • Related tests

    sub3 = (test1 → res1) ^ (test2 → res2)

    sub4 = (test3 → res3) ^ (test4 → res4)

    (test3 → res3) ⇒ (test1 → res1)


Matching Tree (Equality Tests)

Conjunction of equality tests:

sub1=(attr1=v1)^(attr2=v2)^(attr3=v3)

sub2=(attr1=v1)^(attr2=*)^(attr3=v3’)

sub3=(attr1=v1’)^(attr2=v2)^(attr3=v3)
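
A minimal sketch of a matching tree for equality tests, using the three subscriptions above. It assumes a fixed attribute order (attr1, attr2, attr3), one tree level per attribute, and a `*` edge for "don't care"; the names `Node`, `build_tree` and `match` are hypothetical, and none of the optimizations from the paper are included.

```python
from typing import Dict, List, Sequence

STAR = "*"   # "don't care" edge label


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}   # edge label (test result) -> child
        self.subscriptions: List[str] = []      # filled only at leaves


def build_tree(subs: Dict[str, Sequence[str]]) -> Node:
    """Preprocess: insert each subscription as a root-to-leaf path.

    Each subscription lists one required value per attribute, in a fixed
    attribute order; STAR means the attribute is not tested.
    """
    root = Node()
    for name, values in subs.items():
        node = root
        for value in values:
            node = node.children.setdefault(value, Node())
        node.subscriptions.append(name)
    return root


def match(root: Node, event: Sequence[str]) -> List[str]:
    """At every level, follow the edge for the event's value and the * edge."""
    frontier = [root]
    for value in event:
        next_frontier = []
        for node in frontier:
            for label in {value, STAR}:
                child = node.children.get(label)
                if child is not None:
                    next_frontier.append(child)
        frontier = next_frontier
    return sorted(name for leaf in frontier for name in leaf.subscriptions)


tree = build_tree({
    "sub1": ("v1", "v2", "v3"),
    "sub2": ("v1", STAR, "v3'"),
    "sub3": ("v1'", "v2", "v3"),
})
matched = match(tree, ("v1", "v2", "v3'"))
```

For the event (v1, v2, v3’) only sub2 is reached: the walk follows both the v2 edge and the * edge at the second level, and only the * branch has a v3’ edge. Shared prefixes such as attr1 = v1 are tested once for sub1 and sub2 together, which is where the saving over per-subscription matching comes from.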


Complexity

  • Assumptions:

    • All attributes have the same value set

    • Only equality tests being done

    • No related test in the tree

    • Events come from a uniform distribution

  • Pre-processing:

    • Time complexity: O(NK), K attributes & N subscriptions

    • Space complexity: O(NK)

  • Matching Complexity:

    • Expected time to match a random event: O(N^(1−λ)), sub-linear

    • λ = ln V / (ln V + ln K’), note 0 < λ < 1

      • V: number of possible values for each attribute

      • K’: number of attributes in the schema + 1

    • What about the worst case?
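
To get a feel for the bound, the sketch below evaluates λ and the growth term N^(1−λ). Plugging in V = 3, K = 30 and N = 25,000 is an assumption made purely for illustration, and the constant factors hidden by the O-notation are ignored.

```python
import math


def matching_exponent(num_values: int, num_attributes: int) -> float:
    """λ = ln V / (ln V + ln K'), with K' = number of attributes + 1."""
    k_prime = num_attributes + 1
    return math.log(num_values) / (math.log(num_values) + math.log(k_prime))


def expected_cost_scale(num_subscriptions: int, num_values: int,
                        num_attributes: int) -> float:
    """N^(1-λ): the growth term of the expected matching time."""
    lam = matching_exponent(num_values, num_attributes)
    return num_subscriptions ** (1.0 - lam)


lam = matching_exponent(num_values=3, num_attributes=30)
scale = expected_cost_scale(25_000, 3, 30)
```

With these values λ is roughly 0.24, so the growth term is on the order of a couple of thousand rather than 25,000. Note that λ grows with V and shrinks with K’, so the bound is most favorable when attributes have many possible values.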


Optimizations

  • Collapse a chain of * edges (60% gain)

    • Example: collapse B to A

  • Statically pre-compute successor nodes (20% gain)

  • Separate sub-trees for attributes that rarely have don’t care in subscriptions


Performance

  • Operations per Event

  • Space per Event = Edges + Successor nodes

  • Latency: 4ms for 25,000 subscriptions

  • Attributes vary in popularity, follow Zipf’s distribution

  • Tests for 30 attributes with 3 possible values

  • Distribution always got 100 matches per event

[Plots: operations per event; space (thousands of cells)]


Discussion Points

  • Topology Matters !

  • What about non-equality based subscriptions ?

    • If content based subscriptions are used with equality tests only, are there other ways to achieve sub-linear matching times?

  • Exact vs. approximate results

  • What if

    • Subscriptions change frequently over time

    • Stream of subscriptions

    • Multi dimensional events


“Computation in Networks of Passively Mobile Finite-State Sensors”, Dana Angluin, James Aspnes, Zoe Diamadi, Michael Fischer, Rene Peralta, PODC 2004.


The Problem … A Flock of Birds !

  • Birds: finite state agents (sensors with states)

  • Resource is limited

  • Passive mobility (no control)

  • Communication: How much ?

  • Problems

  • Is there a solution?

  • What are the probable solutions?


A Wider View

  • Question:

    • What computations are possible in a cooperative network of passively mobile finite-state sensors?

  • Assumptions:

    • Mobility is passive (not under sensor’s control)

    • Sufficiently rapid and unpredictable (no stable routing strategy)

    • Complete communication

    • Identical sensors: no identifier


Formal Model: Population Protocols

  • Population Protocol (A):

    • Finite input and output alphabets: X, Y

    • A finite set of states: Q

    • An input function: I : X→Q

    • An output function: O : Q →Y

    • A transition function: δ : Q × Q → Q × Q

    • Transitions: (p, q) → (p’, q’) if δ(p, q) = (p’, q’)


Formal Model (Cont’d)

  • Population protocol runs in a Population of any finite size n.

  • Population P :

    • A set A of n agents with an irreflexive relation E ⊆ A × A, interpreted as the directed edges of an interaction graph

  • Population Configuration

    • A mapping C : A → Q

    • Specifies the state of each member of the population

  • Computation:

    • A finite or infinite sequence of population configurations:

      C0, C1, C2, … such that ∀i: Ci → Ci+1
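
The model can be made concrete with a small simulation. The sketch below runs a counting protocol in the spirit of the flock-of-birds example: each agent starts with input 0 or 1, and all agents should eventually output 1 if and only if at least five inputs are 1. The threshold of five, the uniformly random choice of interacting pairs, and all function names are assumptions of this sketch.

```python
import random
from typing import List, Tuple

State = int
THRESHOLD = 5   # assumed threshold for this example


def delta(p: State, q: State) -> Tuple[State, State]:
    """Transition function δ: the initiator collects the responder's count."""
    if p + q >= THRESHOLD:
        return THRESHOLD, THRESHOLD   # alert state, which then spreads
    return p + q, 0


def run(inputs: List[int], steps: int, seed: int = 0) -> List[int]:
    """Run on a complete interaction graph with randomly chosen ordered pairs."""
    rng = random.Random(seed)
    config = list(inputs)             # input function I: symbol x -> state x
    for _ in range(steps):
        # Ordered pair (initiator, responder) of two distinct agents.
        a, b = rng.sample(range(len(config)), 2)
        config[a], config[b] = delta(config[a], config[b])
    # Output function O: 1 in the alert state, 0 otherwise.
    return [1 if state == THRESHOLD else 0 for state in config]


above = run([1] * 6 + [0] * 6, steps=20_000)
below = run([1] * 4 + [0] * 8, steps=20_000)
```

The agents never halt; the configuration simply stops changing its outputs. With four inputs equal to 1 the counts can never reach the threshold, so every output stays 0 regardless of the schedule. With six, the random schedule makes it overwhelmingly likely that all outputs are 1 well before the step limit.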


Formal Models: Computation

  • No halting but stabilizing !

  • Stabilizing is a global property of the population

    • Individual agents do not know whether they have stabilized

    • Under some stochastic assumptions, it is possible to bound the number of interactions before the outputs stabilize

  • To model computation:

    • What is the input assignment

    • What should be the output assignment

    • Definition of an output stable configuration

  • Formally define: stably computing an input-output relation by a population protocol

  • FA(x) = y for R(x, y) ⇒ A stably computes the partial function FA : X → Y


Functions

  • Population protocols compute partial functions from X to Y.

  • Need for suitable input and output encoding for functions on other domains

    • Functions with multiple arguments

    • Predicates on X

    • Integer Functions


A Stably Computable Expression Language

  • Closure properties:

    • If f and g are stably computable, then so are ¬f, f ∧ g and f ∨ g

  • Parity (whether the input contains an odd number of 1’s)

  • Majority

  • Arithmetic functions

  • Stably computable expression language

  • An upper bound on the set of stably computable predicates

    All predicates stably computable in the model with all pairs enabled are in the class NL → an exact characterization is an open problem
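
As one concrete instance, parity can be computed by a protocol in which agents are either active or passive. This is a simple construction written for illustration and is not necessarily the one used in the paper; the state encoding, the random scheduler and the function names are assumptions.

```python
import random
from typing import List, Tuple

State = Tuple[bool, int]   # (is_active, bit)


def delta(p: State, q: State) -> Tuple[State, State]:
    (p_active, p_bit), (q_active, q_bit) = p, q
    if p_active and q_active:
        total = p_bit ^ q_bit
        # The initiator keeps the combined parity; the responder retires
        # and keeps a copy of it as its current output.
        return (True, total), (False, total)
    if p_active:
        return p, (False, p_bit)       # passive agent copies the active bit
    if q_active:
        return (False, q_bit), q
    return p, q                        # two passive agents: nothing changes


def stable_parity(inputs: List[int], steps: int = 20_000, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    config: List[State] = [(True, x) for x in inputs]   # every agent starts active
    for _ in range(steps):
        a, b = rng.sample(range(len(config)), 2)        # (initiator, responder)
        config[a], config[b] = delta(config[a], config[b])
    return [bit for _, bit in config]                   # output is the stored bit


odd = stable_parity([1, 0, 1, 1, 0, 0, 0])
even = stable_parity([1, 1, 0, 0, 0, 1, 1])
```

The XOR of the bits held by active agents never changes, so once a single active agent remains it holds the parity of the whole input, and the passive agents copy it as they meet it.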


Other Issues

  • Restricted Interactions

    • Some interaction graphs permit powerful computations

    • E.g. a population whose interaction graph is a directed line → can simulate a linear-space Turing machine

  • The complete graph (discussed so far) is the weakest structure for computing predicates

    • → Any weakly connected graph can simulate it


Randomized Interactions

  • Measures other than stability

  • Let’s add probabilistic assumptions on interactions

    • Consider computations that are correct with high probability

    • Question about expected resource use

  • Benefits of a leader

    • Simulating counters: the model can simulate O(1) counters of capacity O(n)

    • How to elect a leader → use ideas from the majority and parity functions

  • The set of predicates accepted by a randomized population protocol with probability ½ + ε is contained in P ∩ RL


Discussion Points

  • So what ?!

    • Theoretic fundamentals always help

    • Consider the interaction graph as input → what interesting properties of the underlying interaction graph could be stably computed? → applications in analyzing the structure of sensor nets.

    • Consider one-way communication

    • Assume sampling models other than uniform, where does this help?

  • Formal methods + Methodology

    • Remember converting differential equations into distributed protocols

    • What do you THINK !

    • Formalizing computation → Apply methodology


“Performance Evaluation of a Communication Round over the Internet”, Omar Bakr, Idit Keidar, PODC’02

Some slides taken from Omar Bakr’s presentation


Communication Round

  • Exchange of information from all hosts to all hosts

  • Part of many distributed algorithms, systems

    • consensus, atomic commit, replication, ...

  • Evaluation → some metric

    • Number of rounds (or steps) required

    • How long is it going to take

      • Local running time of one host engaged

      • Overall running time

  • What is the best way to implement it ?

    • Centralized vs. decentralized


Example Implementations

  • All to all

  • Leader

  • Secondary Leader

[Figure: panels (a), (b), (c) illustrating the implementations]


Experiment I

  • 10 hosts: Taiwan, Korea, US academia, ISPs

  • TCP/IP (connections always up)

  • Algorithms:

    • All-to-all

    • Leader (initiator)

    • Secondary leader (not initiator)

  • Periodically initiated at each host

  • 650 times over 3.5 days


  • Overall Running Time:

    • Elapsed time from initiation (at initiator) until all hosts terminate

    • Requires estimating clock differences

      • Clocks not synchronized, drift

      • We compute difference over short intervals

      • Compute 3 different ways

      • Achieve accuracy within 20 ms. on 90% of runs

  • Overall Running Times From MIT

    • Ping-measured latencies (IP):

      • Longest link latency 240 milliseconds

      • Longest link to MIT 150 milliseconds


Measured Running Times: Runs Initiated at MIT / Taiwan


What’s going on ?

  • Loss rates on two links are very high

    • 42% and 37%

    • Taiwan to two ISPs in the US

  • Loss rates on other links up to 8%

  • Upon loss, TCP’s timeout is large

    • More than round-trip-time

  • All-to-all sends messages on lossy links

    • Often delayed by loss


Distribution of Running Times Up to 1.3 sec.


Removing Taiwan

  • Overall running times much better

    • For every initiator and algorithm, less than 10% over 2 seconds (as opposed to 55% previously)

  • All-to-all overall still worse than others!

    • either Leader or Secondary Leader best, depending on initiator

    • loss rates of 2% - 8% are not negligible

    • all-to-all sends O(n²) messages, so it suffers more from loss

  • But, all-to-all has best local running times


Probability of Delay due to Loss

  • If all links had the same latency

    • assume 1% loss on all links; 10 hosts (n=10)

    • Leader sends 3(n-1) = 27 messages

      • probability of at least one loss: 1 − 0.99^27 ≈ 24%

    • All-to-all sends n(n-1) = 90 messages

      • probability of at least one loss: 1 − 0.99^90 ≈ 60%

  • In reality, links don’t have same latency

    • only loss on long links matters

  • Each communication has a cost !
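
The arithmetic on this slide can be reproduced with a few lines. The sketch assumes independent losses with the same rate on every message, as the slide does; the function names are hypothetical.

```python
def messages_leader(n: int) -> int:
    """Leader-based round: 3(n-1) messages, as stated on the slide."""
    return 3 * (n - 1)


def messages_all_to_all(n: int) -> int:
    """All-to-all round: every host sends to every other host."""
    return n * (n - 1)


def prob_at_least_one_loss(messages: int, loss_rate: float) -> float:
    """1 - (1 - p)^m, assuming independent losses with the same rate p."""
    return 1.0 - (1.0 - loss_rate) ** messages


n, p = 10, 0.01
leader = prob_at_least_one_loss(messages_leader(n), p)
all_to_all = prob_at_least_one_loss(messages_all_to_all(n), p)
```

For n = 10 and 1% loss this gives about 24% for the leader-based round and about 60% for all-to-all, matching the figures above. The gap widens with n because the message count grows linearly in one case and quadratically in the other.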


Discussion Points and Lessons Learned

  • Internet is A VERY SPECIAL distributed system (not an ideal one !)

  • Message loss causes high variation in TCP link latencies

    • latency distribution has high variance, heavy tail

  • Latency distribution determines expected time for receiving O(n) concurrent messages

  • Secondary leader helps

    • No triangle inequality, especially for loss

  • Different for overall vs. local running times

  • Number of rounds/steps not sufficient metric

    • One-to-all and all-to-all have different costs

