
Practical Theory Perspectives

CS598ig – Fall 04

Presented by: Mayssam Sayyadian

Publish/Subscribe System
  • Event notification system
  • Producer publishes messages
  • Consumer waits for certain types of events by placing subscriptions
  • Basic components to be defined:
    • Information space
    • Subscriptions
    • Events (event schema)
    • Notifications
  • Many applications and examples: stock information delivery, auction systems, air traffic control, news feeds, network monitoring, etc.
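The components above map onto a very small API. Below is a minimal, illustrative sketch of a subject-based broker (class and method names are assumptions for illustration, not from any of the systems discussed): consumers register subscriptions, producers publish events, and the broker sends notifications.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical minimal broker: subscribers register a callback for a channel,
# producers publish events to a channel, the broker notifies subscribers.
class Broker:
    def __init__(self) -> None:
        self.subscriptions: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self.subscriptions[channel].append(callback)

    def publish(self, channel: str, event: dict) -> None:
        # Notify every consumer that subscribed to this channel.
        for callback in self.subscriptions[channel]:
            callback(event)

broker = Broker()
broker.subscribe("stock.IBM", lambda e: print("notification:", e))
broker.publish("stock.IBM", {"price": 98.5})
```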
Research Issues
  • System architecture
  • Matching and Dispatching
  • Routing
  • Reliable message sending
  • Security
  • Special application issues
    • Mobile environment…
Pub/Sub Systems: Examples
  • IBM – Gryphon
  • Stanford – SIFT and more…
  • CU-Boulder – Siena
  • France – Le Subscribe
  • Technische Universität Darmstadt – REBECA
  • Microsoft – Herald
  • MIT
  • Others – XMLBlaster, Elvin4, TIB, Keryx
Earlier Classification
  • Subject based (channel based)
    • System contains many channels
    • Subscriptions and notifications belong to special channel
    • Simple and straightforward matching
    • Restrictive
  • Content based
    • No channel
    • Notifications are sent to the subscribers based on their content
    • More generic
    • Matching suffers from a scaling problem (addressed in this paper)
Content Based Matching Problem
  • Naïve solution:
    • Match each incoming event against every subscription
    • Linear in the number of subscriptions
    • Not practical
  • Requisite:
    • Matching and dispatching should be sub-linear in the number of subscriptions
  • Intuition:
    • Combine parts of subscriptions to reduce the number of tests per event (see the sketch below)
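A minimal sketch of the naive approach (Python, with illustrative names; subscriptions are conjunctions of attribute tests, as in the paper's example on a later slide): every event is checked against every subscription, so the cost grows linearly with the number of subscriptions N.

```python
import operator

# A subscription is a conjunction of elementary predicates over event attributes.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

subscriptions = {
    "sub1": [("city", "=", "LA"), ("temperature", "<", 40)],
    "sub2": [("city", "=", "NY")],
}

def naive_match(event: dict) -> list:
    """O(N) in the number of subscriptions: every one is tested for every event."""
    matched = []
    for sub_id, predicates in subscriptions.items():
        if all(attr in event and OPS[op](event[attr], value)
               for attr, op, value in predicates):
            matched.append(sub_id)
    return matched

print(naive_match({"city": "LA", "temperature": 35}))  # ['sub1']
```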
Event Forwarding Algorithms
  • Decision trees
    • Use a tree structure to describe the event matching information
    • The forwarding process walks an event through the tree structure
    • Example: Gryphon
  • Hash functions
    • Use hash function to index all components of notifications
    • Use other efficient lookups to find matching notifications
    • Example: Le Subscribe
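A rough sketch of the hash-based idea: index each equality predicate by its (attribute, value) pair and count, per subscription, how many of its predicates an event satisfies. This is an illustrative counting scheme, not necessarily Le Subscribe's exact algorithm.

```python
from collections import defaultdict

# Index equality predicates: (attribute, value) -> subscriptions containing that predicate.
pred_index = defaultdict(set)
pred_count = {}  # subscription id -> number of predicates it must satisfy

def add_subscription(sub_id, equality_preds):
    pred_count[sub_id] = len(equality_preds)
    for attr, value in equality_preds:
        pred_index[(attr, value)].add(sub_id)

def match(event):
    """Count, per subscription, how many of its predicates the event satisfies."""
    satisfied = defaultdict(int)
    for attr, value in event.items():
        for sub_id in pred_index.get((attr, value), ()):
            satisfied[sub_id] += 1
    return [s for s, c in satisfied.items() if c == pred_count[s]]

add_subscription("sub1", [("city", "LA"), ("alert", "storm")])
add_subscription("sub2", [("city", "NY")])
print(match({"city": "LA", "alert": "storm"}))  # ['sub1']
```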
The Big Picture: The Information Bus

Picture from “The Information Bus – An Architecture for Extensible Distributed Systems”, Brian M. Oki et al., SOSP 1993

A Scalable Matching Algorithm
  • “Matching Events in a Content-based Subscription System”, M. K. Aguilera et al. (IBM), PODC 1999
  • Address scalability of matching algorithms
  • Sub-linear in the number of subscriptions
  • Space complexity: linear
  • Do preprocessing
  • Assumes subscriptions are updated infrequently (almost static)
Matching Algorithm
  • Classification ?
    • Consider a decision tree classifier with subscriptions as set of possible classes
  • Analyze subscriptions
    • sub := pr1 ^ pr2 ^ pr3
    • Conjunction of elementary predicates: pr_i = test_i(e) → res_i
    • e.g. (city = LA) and (temperature < 40)
    • pr1 = test1(…) → LA
    • pr2 = test2(…) → “<”
    • test1 = “examine attribute city”
    • test2 = “examine attribute temperature against 40”
Matching Algorithm (Cont’d.)
  • Preprocess to make the matching tree
  • Each non-leaf node is a test
  • Each edge from test node is a possible result
  • Each leaf node is a subscription
  • Pre-process each of the subscriptions and combine the information to prepare the tree
  • On receiving events, follow the sequence of test nodes and edges till a leaf node is reached
Matching Tree
  • Don’t care tests
  • Related tests

sub3 = (test1 → res1) ^ (test2 → res2)

sub4 = (test3 → res3) ^ (test4 → res4)

(test3 → res3) ⇒ (test1 → res1)

Matching Tree (Equality Tests)

Conjunction of equality tests:

sub1=(attr1=v1)^(attr2=v2)^(attr3=v3)

sub2=(attr1=v1)^(attr2=*)^(attr3=v3’)

sub3=(attr1=v1’)^(attr2=v2)^(attr3=v3)
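A minimal sketch of such a matching tree for equality tests (illustrative, not the paper's exact data structure): each level tests one attribute, edges are labelled with a value or '*' for don't care, leaves hold subscriptions, and matching follows both the value edge and the '*' edge. Using sub1–sub3 above:

```python
STAR = "*"  # don't-care edge label

# Tree node: attribute tested here, outgoing edges (value or '*' -> child), subscriptions at a leaf.
def new_node():
    return {"edges": {}, "subs": []}

def insert(root, attrs, subscription, sub_id):
    """attrs: fixed attribute order; subscription: dict attr -> value, missing attr = don't care."""
    node = root
    for attr in attrs:
        label = subscription.get(attr, STAR)
        node = node["edges"].setdefault(label, new_node())
    node["subs"].append(sub_id)

def match(node, attrs, event, depth=0):
    """Follow both the edge labelled with the event's value and the '*' edge."""
    if depth == len(attrs):
        return list(node["subs"])
    matched = []
    for label in (event.get(attrs[depth]), STAR):
        child = node["edges"].get(label)
        if child is not None:
            matched += match(child, attrs, event, depth + 1)
    return matched

attrs = ["attr1", "attr2", "attr3"]
root = new_node()
insert(root, attrs, {"attr1": "v1", "attr2": "v2", "attr3": "v3"}, "sub1")
insert(root, attrs, {"attr1": "v1", "attr3": "v3p"}, "sub2")            # attr2 is don't care
insert(root, attrs, {"attr1": "v1p", "attr2": "v2", "attr3": "v3"}, "sub3")
print(match(root, attrs, {"attr1": "v1", "attr2": "v2", "attr3": "v3"}))  # ['sub1']
```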

Complexity
  • Assumptions:
    • All attributes have the same value set
    • Only equality tests being done
    • No related test in the tree
    • Events come from a uniform distribution
  • Pre-processing:
    • Time complexity: O(NK), K attributes & N subscriptions
    • Space complexity: O(NK)
  • Matching Complexity:
    • Expected time to match a random event: O(N^(1−λ)), i.e. sub-linear
    • λ = ln V / (ln V + ln K’), note 0 < λ < 1 (a worked instance follows after this list)
      • V: number of possible values for each attribute
      • K’: number of attributes in the schema + 1
    • What about worst case ?
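As a worked instance of the formula (assuming the setup used in the performance experiments below: 30 attributes with 3 possible values each, so V = 3 and K′ = 31):

```latex
\lambda = \frac{\ln V}{\ln V + \ln K'}
        = \frac{\ln 3}{\ln 3 + \ln 31}
        \approx \frac{1.10}{1.10 + 3.43}
        \approx 0.24,
\qquad
O\!\left(N^{1-\lambda}\right) \approx O\!\left(N^{0.76}\right).
```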
Optimizations
  • Collapse a chain of * edges (60% gain)
    • Example: collapse B to A
  • Statically pre-compute successor nodes (20% gain)
  • Separate sub-trees for attributes that rarely have don’t care in subscriptions
Performance
  • Operations per Event
  • Space per Event = Edges + Successor nodes
  • Latency: 4ms for 25,000 subscriptions
  • Attributes vary in popularity, following a Zipf distribution
  • Tests for 30 attributes with 3 possible values each
  • The value distribution was chosen so that each event always got 100 matches


Discussion Points
  • Topology Matters !
  • What about non-equality based subscriptions ?
    • If content based subscriptions are used with equality tests only, are there other ways to achieve sub-linear matching times?
  • Exact vs. approximate results
  • What if
    • Subscriptions change frequently over time
    • Stream of subscriptions
    • Multi-dimensional events
“Computation in Networks of Passively Mobile Finite-State Sensors”, Dana Angluin, James Aspnes, Zoe Diamadi, Michael Fischer, Rene Peralta, PODC 2004.
The Problem … A Flock of Birds !
  • Birds: finite state agents (sensors with states)
  • Resource is limited
  • Passive mobility (no control)
  • Communication: How much ?
  • Problems
  • Is there a solution ?
  • What is a probable solution ?
A Wider View
  • Question:
    • What computations are possible in a cooperative network of passively mobile finite-state sensors ?
  • Assumptions:
    • Mobility is passive (not under sensor’s control)
    • Sufficiently rapid and unpredictable (no stable routing strategy)
    • Complete communication
    • Identical sensors: no identifier
Formal Model: Population Protocols
  • Population Protocol (A):
    • Finite input and output alphabets: X, Y
    • A finite set of states: Q
    • An input function: I : X→Q
    • An output function: O : Q →Y
    • A transition function δ: (Q × Q) → (Q × Q)
    • Transitions: (p,q) → (p’,q’) if δ(p,q) = (p’,q’)
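To make the tuple (X, Y, Q, I, O, δ) concrete, here is a minimal simulation sketch in Python for the simple predicate “does some agent have input 1?”, run under a uniformly random pairing scheduler. The protocol and scheduler are illustrative assumptions, not taken from the paper.

```python
import random

# Population protocol for the predicate "some agent's input is 1".
X = {0, 1}          # input alphabet
Y = {True, False}   # output alphabet
Q = {0, 1}          # states
I = lambda x: x                 # input function I: X -> Q
O = lambda q: q == 1            # output function O: Q -> Y

def delta(p, q):                # transition function d: Q x Q -> Q x Q
    # If either agent has already seen a 1, both remember it.
    return (1, 1) if 1 in (p, q) else (p, q)

def simulate(inputs, steps=10_000, seed=0):
    rng = random.Random(seed)
    config = [I(x) for x in inputs]          # population configuration C: A -> Q
    n = len(config)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)       # random interaction (i initiator, j responder)
        config[i], config[j] = delta(config[i], config[j])
    return [O(q) for q in config]

print(simulate([0, 0, 1, 0]))   # stabilizes (w.h.p.) to [True, True, True, True]
```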
Formal Model (Cont’d)
  • Population protocol runs in a Population of any finite size n.
  • Population P :
    • A set A of n agents with an irreflexive relation E ⊆ A × A, interpreted as the directed edges of an interaction graph
  • Population Configuration
    • A mapping C: A → Q
    • Specifies the state of each member of the population
  • Computation:
    • A finite or infinite sequence of population configurations:

C0, C1, C2, … such that ∀i: Ci → Ci+1 (each configuration follows from the previous one through a single interaction)

Formal Models: Computation
  • No halting but stabilizing !
  • Stabilizing is a global property of the population
    • Individual agents do not know if they have stabilized
    • The number of interactions before the outputs stabilize can be bounded under some stochastic assumptions
  • To model computation:
    • What is the input assignment
    • What should be the output assignment
    • Definition of an output stable configuration
  • Formally define: stably computing an input-output relation by a population protocol
  • Setting FA(x) = y whenever R(x, y) holds, A stably computes the partial function FA: X → Y
Functions
  • Population protocols compute partial functions from X to Y .
  • Need for suitable input and output encoding for functions on other domains
    • Functions with multiple arguments
    • Predicates on X
    • Integer Functions
A Stably Computable Expression Language
  • Closure properties:
    • If f and g are stably computable, then so are ¬f, f ∧ g, and f ∨ g
  • Parity (whether there is an odd number of 1’s in the input); see the sketch after this slide
  • Majority
  • Arithmetic functions
  • Stably computable expression language
  • An upper bound on the set of stably computable predicates

All predicates stably computable in the model with all pairs enabled are in the class NL; an exact characterization of the stably computable predicates is an open problem
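A hedged sketch of one standard way parity can be computed in this model: every agent starts as a leader carrying its input bit; when two leaders meet, one absorbs the other and XORs the bits, and the surviving leader's bit is eventually copied to every agent's output. This is an illustrative construction consistent with the model, not necessarily the paper's protocol.

```python
import random

# State: (is_leader, value_bit, output_bit). Every agent starts as a leader
# holding its own input bit; the XOR of all leaders' value bits is invariant.
def I(x):
    return (True, x, x)

def O(state):
    return state[2] == 1   # output: "is the number of 1-inputs odd?"

def delta(p, q):
    (pl, pv, po), (ql, qv, qo) = p, q
    if pl and ql:
        # Two leaders meet: the initiator absorbs the responder, XORing the bits.
        v = pv ^ qv
        return (True, v, v), (False, 0, v)
    if pl and not ql:
        # A leader propagates its current bit to a follower's output.
        return p, (False, 0, pv)
    if ql and not pl:
        return (False, 0, qv), q
    return p, q

def simulate(inputs, steps=20_000, seed=1):
    rng = random.Random(seed)
    config = [I(x) for x in inputs]
    n = len(config)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        config[i], config[j] = delta(config[i], config[j])
    return [O(s) for s in config]

print(simulate([1, 0, 1, 1, 0]))   # three 1s -> odd parity -> all True (w.h.p.)
```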

Other Issues
  • Restricted Interactions
    • Some interaction graphs permit powerful computations
    • E.g. a population whose interaction graph is a directed line can simulate a linear-space Turing machine
  • The complete graph (discussed so far) is the weakest structure for computing predicates
    • Any weakly connected graph can simulate it
Randomized Interactions
  • Measures other than stability
  • Let’s add probabilistic assumptions on interactions
    • Consider computations that are correct with high probability
    • Question about expected resource use
  • Benefits of a leader
    • Simulating counters: the model can simulate O(1) counters of size O(n)
    • How to elect a leader → use ideas from the majority and parity functions
  • The set of predicates accepted by a randomized population protocol with probability ½ + ε is contained in P RL
Discussion Points
  • So what ?!
    • Theoretic fundamentals always help
    • Consider the interaction graph as an input → what interesting properties of the underlying interaction graph could be stably computed ? → applications in analyzing the structure of sensor nets
    • Consider one-way communication
    • Assume sampling models other than uniform, where does this help?
  • Formal methods + Methodology
    • Remember converting differential equations into distributed protocols
    • What do you THINK !
    • Formalizing computation → apply the methodology
“Performance Evaluation of a Communication Round over the Internet”, Omar Bakr, Idit Keidar, PODC’02

Some slides taken from Omar Bakr’s presentation

Communication Round
  • Exchange of information from all hosts to all hosts
  • Part of many distributed algorithms, systems
    • consensus, atomic commit, replication, ...
  • Evaluation → some metric
    • Number of rounds (or steps) required
    • How long is it going to take
      • Local running time of one host engaged
      • Overall running time
  • What is the best way to implement it ?
    • Centralized vs. decentralized
Example Implementations

  • All-to-all
  • Leader
  • Secondary leader

[Figure: the three round structures, panels (a), (b), (c)]

Experiment I
  • 10 hosts: Taiwan, Korea, US academia, ISPs
  • TCP/IP (connections always up)
  • Algorithms:
    • All-to-all
    • Leader (initiator)
    • Secondary leader (not initiator)
  • Periodically initiated at each host
  • 650 times over 3.5 days
  • Overall Running Time:
    • Elapsed time from initiation (at initiator) until all hosts terminate
    • Requires estimating clock differences (see the sketch after this slide)
      • Clocks not synchronized, drift
      • We compute difference over short intervals
      • Compute 3 different ways
      • Achieve accuracy within 20 ms. on 90% of runs
  • Overall Running Times From MIT
    • Ping-measured latencies (IP):
      • Longest link latency 240 milliseconds
      • Longest link to MIT 150 milliseconds
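The paper estimates clock differences three different ways; as an illustration only, here is one standard round-trip (NTP-style midpoint) estimate, where remote_clock is a hypothetical RPC that returns the remote host's current clock reading.

```python
import time

def estimate_clock_offset(remote_clock):
    """One round-trip, midpoint-style offset estimate; illustrative only,
    not necessarily one of the three methods used in the paper."""
    t_send = time.time()
    t_remote = remote_clock()   # hypothetical RPC to the remote host
    t_recv = time.time()
    rtt = t_recv - t_send
    # Assume the request and the reply each took about rtt / 2:
    offset = t_remote - (t_send + rtt / 2.0)
    return offset, rtt

# Against our own clock the offset is ~0; averaging several estimates taken
# over a short interval reduces the error from asymmetric delays and drift.
print(estimate_clock_offset(time.time))
```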
What’s going on ?
  • Loss rates on two links are very high
    • 42% and 37%
    • Taiwan to two ISPs in the US
  • Loss rates on other links up to 8%
  • Upon loss, TCP’s timeout is big
    • More than round-trip-time
  • All-to-all sends messages on lossy links
    • Often delayed by loss
Removing Taiwan
  • Overall running times much better
    • For every initiator and algorithm, less than 10% over 2 seconds (as opposed to 55% previously)
  • All-to-all overall still worse than others!
    • either Leader or Secondary Leader best, depending on initiator
    • loss rates of 2% - 8% are not negligible
    • all-to-all sends O(n²) messages; suffers
  • But, all-to-all has best local running times
Probability of Delay due to Loss
  • If all links had the same latency
    • assume 1% loss on all links; 10 hosts (n=10)
    • Leader sends 3(n-1) = 27 messages
      • probability of at least one loss: 1 − 0.99^27 ≈ 24%
    • All-2-all sends n(n-1) = 90 messages
      • probability of at least one loss: 1 − 0.99^90 ≈ 60% (see the check below)
  • In reality, links don’t have same latency
    • only loss on long links matters
  • Each communication has a cost !
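A quick check of the arithmetic above (assuming independent losses at a fixed rate, which is the slide's simplifying assumption):

```python
# Probability that at least one of `messages` independent sends is lost.
def p_at_least_one_loss(messages, loss_rate=0.01):
    return 1.0 - (1.0 - loss_rate) ** messages

n = 10
print(p_at_least_one_loss(3 * (n - 1)))   # leader: 27 messages     -> ~0.24
print(p_at_least_one_loss(n * (n - 1)))   # all-to-all: 90 messages -> ~0.60
```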
Discussion Points and Lessons Learned
  • Internet is A VERY SPECIAL distributed system (not an ideal one !)
  • Message loss causes high variation in TCP link latencies
    • latency distribution has high variance, heavy tail
  • Latency distribution determines expected time for receiving O(n) concurrent messages
  • Secondary leader helps
    • No triangle inequality, especially for loss
  • Different for overall vs. local running times
  • Number of rounds/steps not sufficient metric
    • One-to-all and all-to-all have different costs