CPSC 689: Discrete Algorithms for Mobile and Wireless Systems

Spring 2009

Prof. Jennifer Welch


Lecture 33

  • Topic:

    • Data Aggregation in Sensor Networks

  • Sources:

    • Nath, Gibbons, Seshan & Anderson

    • Shrivastava, Buragohain, Agrawal & Suri

Discrete Algs for Mobile Wireless Sys


Aggregation Problem

  • How to compute the answer to a query in a sensor network that requires aggregating data from all (or many) sensors?

  • Example: Suppose the nodes take temperature readings and queries ask for min/max/average temperature

  • Data has to flow through the network to the node that issues the query

  • In some cases, data can be aggregated on the way

    • save bandwidth and energy

    • Example: to find max temp., each node propagates largest temp. it has learned about


Communication to Support Aggregation

  • Need to propagate sensor readings in some orderly way

  • Example: send data over a spanning tree rooted at the querying node

    • not robust: link or node failure will partition the tree, lose contact with sensors in subtree

  • Prefer to use multipath routing (message is sent on several paths)

    • redundancy provides more resilience

  • But duplication causes problems for aggregation

    • OK for max, but what about average?


Overview of Algorithm

  • Provides a framework for sending synopses of the data over multiple paths and then reconstructing the correct answer

  • Phase 1: aggregate query is flooded through the network and an aggregation topology is constructed

  • Phase 2: aggregate values are continually routed toward the querying node:

    • each node converts its sensor data to a synopsis (synopsis generation, SG function)

    • nodes merge two synopses into one (synopsis fusion, SF function)

    • querying node converts the final synopsis into the answer (synopsis evaluation, SE function)


Specific Aggregation Topology

  • Rings:

    • kind of like levels in breadth-first search

  • Nodes are partitioned into rings during Phase 1:

    • querying node q is in ring 0

    • a node is in ring i if it receives the query first from a node in ring i–1

  • Phase 2 is divided into epochs, one aggregate answer per epoch

    • each node in the outermost ring (farthest from q) computes s := SG(r), where r is its sensor reading, and broadcasts s

    • each node in ring i computes s := SG(r), where r is its sensor reading, updates s := SF(s,s') for each synopsis s' received from a neighbor in ring i+1, and then broadcasts s

    • querying node computes SE(s)

  • Synchronous algorithm
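A minimal Python sketch of one Phase 2 epoch on the Rings topology follows. It assumes one loss-free synchronous round per ring unless a link is listed in `lost`; the helper name `run_epoch`, the four-node topology, and the readings are hypothetical. SG, SF, and SE are passed in as plain functions, instantiated here for Max and, for contrast, for a naive Sum.

```python
def run_epoch(ring, neighbors, reading, sg, sf, se, querier, lost=frozenset()):
    """Simulate one epoch on the Rings topology.

    ring[u] is u's ring number (the querier is in ring 0); a broadcast from
    w to u is dropped if (w, u) is in `lost`.
    """
    synopsis = {}
    for i in range(max(ring.values()), -1, -1):     # outermost ring first
        for u in [x for x in ring if ring[x] == i]:
            s = sg(reading[u])
            for w in neighbors[u]:
                # Fuse every synopsis heard from a neighbor one ring farther out.
                if ring[w] == i + 1 and (w, u) not in lost:
                    s = sf(s, synopsis[w])
            synopsis[u] = s                          # u broadcasts s once per epoch
    return se(synopsis[querier])


def identity(x):
    return x


ring = {"q": 0, "A": 1, "B": 1, "C": 2}
neighbors = {"q": {"A", "B"}, "A": {"q", "C"}, "B": {"q", "C"}, "C": {"A", "B"}}
reading = {"q": 20, "A": 22, "B": 19, "C": 25}

print(run_epoch(ring, neighbors, reading, identity, max, identity, "q"))   # 25
print(run_epoch(ring, neighbors, reading, identity, max, identity, "q",
                lost={("C", "A")}))                                        # 25, via B
print(run_epoch(ring, neighbors, reading, identity,
                lambda a, b: a + b, identity, "q"))                        # 111
```

The naive Sum run shows the duplication problem: C's reading reaches q along two paths and is counted twice (111 instead of 86), while Max is unaffected and survives the lost link C to A.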


Figure

[Figure: rings R0, R1, and R2 around the querying node q (in R0), with example nodes A, B, and C]


Analysis of Framework

  • Complexity: each node broadcasts once per epoch

  • Same as spanning-tree-based approach

  • More resilient than spanning-tree-based approach


The Functions

  • What should SG, SF, and SE be in order to give the "correct" answer?

  • First, give a condition on the functions that is intuitive

  • Then show there are 4 simple checks that can be done on proposed functions

  • These conditions are necessary and sufficient to preserve correctness


ODI-Correctness

  • Final result should be independent of how the data was routed to querier:

    • same no matter in which order the readings are combined and how many times they are included (duplicated) during the routing

  • Sensor reading r : <measurement, metadata>

    • assumed to be unique

  • Suppose we have SG, SF and SE

  • Define the synopsis label SL(s) = {r} if s = SG(r), and SL(s) = SL(s1) ∪ms SL(s2) if s = SF(s1,s2), where ∪ms denotes multiset union


ODI-Correctness (cont'd)

  • What constitutes a "duplicate" depends on what is being computed

    • Ex: average temp vs. number of distinct temps

  • q : multiset of sensor readings → set of (unique) values

  • q(SL(s)) = set of unique values in all the sensor readings that formed the synopsis


ODI-Correctness Definition

  • Let {v1,…,vk} be the set of values in the label of s, i.e., q(SL(s)).

  • Then s must be same as computation on "canonical left-deep tree":

    • s := SG(v1)

    • for i = 2 to k do

      • s := SF(s,SG(vi))

  • I.e., regardless of redundancy caused by multipath routing, the final synopsis is the same as if each distinct value is included just once


ODI-Correctness Figure

[Figure: a canonical left-deep tree and an aggregation DAG, each built from SG and SF nodes over readings r1 through r5 and producing the same synopsis s]


A Simple Test for ODI-Correctness

  • duplicate preservation: q({r1}) = q({r2}) ⇒ SG(r1) = SG(r2)

    • if two readings are considered duplicates, then the same synopsis is generated

  • commutativity: SF(s1,s2) = SF(s2,s1)

  • associativity: SF(s1,SF(s2,s3)) = SF(SF(s1,s2),s3)

  • idempotence: SF(s,s) = s
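The four conditions can be spot-checked mechanically. The hypothetical helper `check_conditions` below tries every combination of a few sample readings; passing on samples is evidence, not a proof. Readings are modeled as (measurement, node id) pairs, and two readings are treated as duplicates only when they are identical (an assumption made for this illustration).

```python
from itertools import product


def check_conditions(sg, sf, readings, is_duplicate):
    """Spot-check the four conditions on all combinations of sample readings.

    Only synopses produced directly by SG are combined, so this is a sample
    test, not a proof over all fused synopses.
    """
    syn = [sg(r) for r in readings]
    return {
        "duplicate preservation": all(
            sg(a) == sg(b)
            for a, b in product(readings, repeat=2) if is_duplicate(a, b)),
        "commutativity": all(
            sf(x, y) == sf(y, x) for x, y in product(syn, repeat=2)),
        "associativity": all(
            sf(x, sf(y, z)) == sf(sf(x, y), z)
            for x, y, z in product(syn, repeat=3)),
        "idempotence": all(sf(x, x) == x for x in syn),
    }


readings = [(21, 1), (25, 2), (19, 3)]   # (measurement, node id)


def same(a, b):
    return a == b


def value(r):
    return r[0]


print(check_conditions(value, max, readings, same))                 # all True
print(check_conditions(value, lambda x, y: x + y, readings, same))  # idempotence False
```

Max passes all four checks, while plain addition fails idempotence (x + x ≠ x), which is why duplicated paths break a naive Sum.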


More About the Conditions

  • Theorem: The previous 4 conditions are necessary and sufficient for the SG and SF functions to ensure ODI-correctness.

  • Proof Sketch:

    • sufficiency: If SG and SF satisfy the 4 conditions, then show that any computation DAG can be transformed into a canonical left-deep binary tree that produces the same output

    • necessity: Argue that the 4 conditions follow from the definition of ODI-correctness.


Count Example

  • Query: What is the (approximate) total number of sensor nodes in the network?

  • Synopsis: a bit vector of length k > log N, where N is an upper bound on the number of nodes

    • N could be original number of nodes deployed, or some function of the size of the id space


SG for Count Example

  • No sensor is actually read for this example.

  • Let SG return vector s[1..k], where

    • a certain entry is 1

    • rest of the entries are 0

  • How to decide which entry should be 1:

    • entry CT(k), where CT(k) is a random variable that returns value i with probability 1/2^i for 1 ≤ i < k (and returns k with the remaining probability 1/2^(k−1))

  • How to compute CT(k):

    • Toss a fair coin until either the first head occurs or k coin tosses have occurred with no heads; return number of tosses
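The coin-tossing protocol translates directly into code. This sketch (the function name `ct` is hypothetical) uses Python's `random` module as the fair coin and estimates the distribution empirically for k = 4; the frequencies should approach 1/2, 1/4, 1/8, and 1/8.

```python
import random
from collections import Counter


def ct(k, rng=random):
    """Toss a fair coin until the first head, or until k tosses with no head;
    return the number of tosses made."""
    tosses = 0
    while tosses < k:
        tosses += 1
        if rng.random() < 0.5:   # heads
            return tosses
    return k                     # k tails in a row


rng = random.Random(0)
trials = 100_000
freq = Counter(ct(4, rng) for _ in range(trials))
for i in range(1, 5):
    print(i, freq[i] / trials)
```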


Computation of CT(k)

  • Why does the coin-tossing protocol give the desired random variable?

  • Proof by Example: Suppose k = 4.

    • First toss is H, and 1 is returned, with probability 1/2

    • Otherwise, second toss is H, and 2 is returned, with probability 1/4

    • Otherwise third toss is H and 3 is returned, with probability 1/8

    • (and then 4 is returned with probability 1/8, but the definition of CT(4) only cares about 1 through 3)


SF and SE for Count Example

  • SF(s,s'):

    • s[i] := s[i] OR s'[i], 1 ≤ i ≤ k

    • return s

  • SE(s):

    • return 2^(i−1)/0.77351, where i is the minimum index such that s[i] = 0
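Putting SG, SF, and SE together gives a Flajolet-Martin-style counting synopsis. In this hypothetical Python version, bit vectors are lists indexed from 0 (slide index i is list index i − 1), k = 16 is an assumed vector length, and each of 1000 simulated nodes calls SG exactly once. The last lines check that fusing in a different order, with some synopses delivered twice, yields the identical vector.

```python
import random
from functools import reduce

PHI = 0.77351  # constant used by SE on the slide


def sg(k, rng):
    """One-hot bit vector of length k with entry CT(k) set to 1.

    Slide index i (1-based) is stored at list index i - 1.
    """
    i = 1
    while i < k and rng.random() >= 0.5:   # tails: toss again
        i += 1
    s = [0] * k
    s[i - 1] = 1
    return s


def sf(s1, s2):
    """Bitwise OR of two synopses."""
    return [a | b for a, b in zip(s1, s2)]


def se(s):
    """2^(i-1) / PHI, where i is the minimum 1-based index with s[i] = 0."""
    i = next((j + 1 for j, bit in enumerate(s) if bit == 0), len(s) + 1)
    return 2 ** (i - 1) / PHI


rng = random.Random(42)
k = 16                                          # assumes N < 2^16, so k > log N
synopses = [sg(k, rng) for _ in range(1000)]    # each node calls SG once
fused = reduce(sf, synopses)
# Another fusion order, with 100 synopses delivered twice:
again = reduce(sf, synopses[::-1] + synopses[:100])
print(fused == again, round(se(fused)))
```

A single bit vector gives only a coarse estimate, so the printed count may differ noticeably from 1000; the point of the sketch is that the result is insensitive to fusion order and duplicates.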


Intuition for Count Synopsis Functions

  • Suppose all (live) sensors have a failure-free path to the querier.

  • The final bit vector to which SE is applied indicates which bit positions have been set by at least one node

  • The probability that none of n nodes sets the i-th bit is (1 − 2^(−i))^n, by definition of SG

  • Thus the number of (live) nodes is roughly proportional to 2^(i−1), where i is the minimum index such that s[i] = 0

    • constant of proportionality is 1/.77351


Intuition for Count Synopsis Functions

  • Alternatively…

  • We expect half the nodes to set the 1st bit, a quarter of the nodes to set the 2nd bit, an eighth of the nodes to set the 3rd bit, etc.

  • If there are n distinct nodes, then we might expect roughly the first log n bits to be set

  • I.e., if the first i = log n bits are set, then we might expect there to be about n = 2^i nodes


Count Algorithm is ODI-Correct

  • Note that ODI-correctness says nothing about the SE function, only that SE will return the same result as in the canonical tree.

    • "Clever algorithms are still required to get provably good approximations, although the task has been simplified…"

  • Commutativity, associativity, and idempotence follow from properties of Boolean OR


Count Algorithm is ODI-Correct

  • Why does SG preserve duplicates?

  • Assume each node calls SG only once.

  • Show that if sensor readings are considered duplicates, then the synopsis generated by SG is the same.

    • Since there is no actual sensor reading for this algorithm, we just use ids for the readings.

    • Assumption that each node calls SG only once ensures the property.


Implicit Acknowledgments

  • When a node broadcasts a synopsis, avoid overhead of explicit acknowledgments from receivers this way:

    • node u broadcasts its synopsis

    • node u snoops (listens to) subsequent broadcasts by its parent nodes (nodes closer to the querying node)

    • if the synopsis broadcast by a parent "effectively includes" u's synopsis, u does not need to rebroadcast, otherwise rebroadcast (or adapt the topology)


Implicit Acknowledgments (cont'd)

  • How can u accurately infer whether its broadcast was "effectively included"?

  • Suppose u's synopsis was x and the parent's was z.

  • If SF(x,z) = z, then x is effectively included.

  • Why? Since SF is commutative, associative, and idempotent, the synopses form a "semi-lattice" under SF.

    • in a semi-lattice, any two elements x and y have a least upper bound z = SF(x,y), and SF(x,z) = z = SF(y,z)

    • Count example: check whether every bit set in x is also set in z
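The implicit-acknowledgment test needs nothing beyond SF itself. This hypothetical helper works for any SF that satisfies the four conditions; it is shown with the bitwise-OR fusion of the Count example and with `max` as the Max synopsis.

```python
def effectively_included(x, z, sf):
    """u's synopsis x is covered by the parent's synopsis z iff SF(x, z) == z."""
    return sf(x, z) == z


def or_fuse(s1, s2):
    """SF of the Count example: bitwise OR."""
    return tuple(a | b for a, b in zip(s1, s2))


parent = (1, 1, 0, 1)
print(effectively_included((0, 1, 0, 0), parent, or_fuse))  # True: bit already set
print(effectively_included((0, 0, 1, 0), parent, or_fuse))  # False: u must rebroadcast
print(effectively_included(22, 25, max))                     # True for Max
```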


Error Bounds of Approximate Answers

  • Sources of error:

    • communication error: some nodes have no failure-free propagation path to querier

    • approximation error: introduced by SG, SF and SE functions.

      • defined as relative error of computed answer w.r.t. exact algorithm using the same readings

  • Argue that communication error can be made negligible by deploying sensor nodes sufficiently densely


Error Bounds of Approximate Answers (cont'd)

  • Approximation error analyses for the centralized data stream model carry over to this model, since the synopsis is ODI-correct

    • canonical left-deep tree corresponds to processing a data stream of sensor readings in a centralized location

  • Thus, e.g., the Count algorithm has the same approximation error guarantees as those analyzed by Flajolet & Martin


More Examples

  • Max and Min: easy.

    • SG is the value, SF takes larger/smaller, SE is identity

  • Sum: cf. paper by Considine et al. which adapts Count algorithm

  • Average, Standard deviation, Second Moment: cf. paper by Considine et al. which uses Sum

  • Count Distinct: modification of Count


Uniform Sample Example

  • Compute a uniform sample of a given size K of the values occurring at all nodes in the network

  • Synopsis: a sample of size K tuples (or fewer initially)

  • SG: output (val,r,id) where

    • val is the sensor reading of the node

    • r is a random number drawn uniformly from [0,1]

    • id is the node's id

  • SF(s,s'): list the tuples in s U s' in decreasing order of r-value, and output the first K (or all, if less than K total)

    • U is set union, removes duplicates

  • SE(s): output the set of values in the tuples of s
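A sketch of the uniform-sample synopsis, assuming K = 3 and representing a synopsis as a tuple of (val, r, id) triples sorted by decreasing r. Python's `random.random()` draws from [0, 1) rather than [0, 1], which makes no practical difference here. Function names mirror the slide; the readings are made up.

```python
import random
from functools import reduce

K = 3  # sample size (illustrative)


def sg(val, node_id, rng):
    """Synopsis of one reading: a single (val, r, id) tuple with random r."""
    return ((val, rng.random(), node_id),)


def sf(s1, s2):
    """Set union (removes duplicate tuples), then keep the K largest r-values."""
    merged = set(s1) | set(s2)
    return tuple(sorted(merged, key=lambda t: t[1], reverse=True)[:K])


def se(s):
    """The sampled values, one per tuple in the synopsis."""
    return [val for val, _, _ in s]


rng = random.Random(7)
readings = {node: 20 + node % 5 for node in range(10)}   # node id -> reading
synopses = [sg(val, node, rng) for node, val in readings.items()]

a = reduce(sf, synopses)
b = reduce(sf, synopses[::-1] + synopses[:4])   # other order, with duplicates
print(a == b, se(a))
```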


Uniform Sample Example (cont'd)

  • SG labels each reading with a random number, thus placing it in a random position in the global ordering of all readings

  • So taking first K in the ordering gives a uniform sample.

  • Uniform sample can then be used…


More Examples

  • Use uniform samples to compute these aggregates:

    • k-th statistical moment (k = 1 is the mean)

    • k-th percentile value (k = 50 is the median)

      with certain error and probability, by choosing the sample size appropriately (cf. Bar-Yossef et al.)

  • Compute the k most frequent values (k = 1 is the mode): run an ODI-correct Count algorithm for each value


Adapting the Topology

  • If message loss is detected as occurring "too frequently", nodes can adapt the Ring topology

  • Idea: use a heuristic that tries to assign a node u to a ring so that there are plenty of nodes in the next ring to forward u's synopsis to the querier

  • ODI-correct synopses are helpful:

    • implicit acks are used to detect message loss energy-efficiently

    • duplicates that occur during the adaptation of the topology are not a problem


Simulation Results

  • Extensive!

  • Synopsis diffusion

    • reduces answer errors in lossy environments

    • helps address challenges from correlated node failures

    • does not use significantly more power

  • What topology to use?

    • Adaptive Rings has same overhead as Rings but much better accuracy

    • Adaptive Rings gets about 90% of the sensor readings most of the time vs. 100% with Flooding, but uses much less power


Medians and Beyond [SBAS]

  • Extend the class of queries that can be answered in sensor networks beyond min/max/sum to include

    • approximate quantiles (including median)

    • most frequent data values (including consensus)

    • histogram of data distribution

    • range queries

  • Provide strict theoretical guarantees on the approximation quality of the answers in terms of message size


Comparison with Nath Paper

  • Some of the same problems are considered

  • "Medians and Beyond" is concerned with efficiency of message size and its tradeoff with quality of approximation

  • Nath paper was concerned with handling arbitrary ordering and duplicates

    • "Medians and Beyond" assumes no duplicates


Overview

  • Assume we have a tree rooted at the querying node

  • To compute Average: each node sends to its parent the sum of the data values in its subtree and the number of nodes in its subtree

    • constant size messages

  • To compute Median, need to keep track of all distinct values

    • message size and memory grow linearly with the number of distinct values

  • Trade off memory and bandwidth with accuracy of approximations
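For comparison, the tree-based Average computation is a single recursive pass in which every message is a constant-size (sum, count) pair. The spanning tree and readings in this Python sketch are hypothetical.

```python
def sum_and_count(node, children, reading):
    """(sum, count) over node's subtree: the message node sends to its parent."""
    total, count = reading[node], 1
    for child in children.get(node, []):
        child_sum, child_count = sum_and_count(child, children, reading)
        total += child_sum
        count += child_count
    return total, count


children = {"q": ["A", "B"], "A": ["C"]}      # spanning tree rooted at q
reading = {"q": 20, "A": 22, "B": 19, "C": 25}
total, count = sum_and_count("q", children, reading)
print(total / count)  # 21.5
```

A median has no such constant-size summary, which is what motivates the q-digest.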


Q-Digests

  • Assume sensor readings are integers in the range [1,s]

  • Introduce q-digest data structure to answer quantile queries with

    • messages of size m

    • error O((log s)/m)

  • Users specify message size vs. error tradeoff

  • A q-digest also tracks the maximum error accumulated so far

  • Once the q-digest has been computed, use it to compute quantiles, the data distribution,…


More on q-Digest

  • Compute a compressed view of the complete distribution of values (instead of just a function of the values)

  • Use this view of the distribution to compute approximations of various functions

  • Basic idea: Essentially compute a histogram, but

    • equally large, instead of equally spaced, buckets

    • buckets can overlap

    • size of buckets gives accuracy vs. communication tradeoff


Definition of q-Digest

  • Group values into variable-sized buckets of almost equal weights

    • size refers to range

    • weight refers to number of elements

  • q-digest consists of a set of buckets

  • Build a complete binary tree

    • 1,…,s at the leaves

    • every tree node is a bucket, its range is all the leaves in its subtree

  • At any given point, only some of the buckets are being used


Example

[Figure: a q-digest over data range 1-8 with 15 data items in 5 buckets; the bucket counts are 1, 2, 2, 4, and 6, and the leaves hold the values 1 through 8]


Definition of q-Digest

  • Given compression parameter k and number of data items n, a (tree) node v is in the q-digest iff:

    • count(v) ≤ n/k

      • node should not have a high count

    • count(v) + count(parent(v)) + count(sibling(v)) > n/k

      • if a node, its sibling, and its parent have a low total count, merge the node and its sibling into the parent (Compress algorithm)

  • For a leaf node, if count > n/k, then it is in the q-digest

  • Root only needs to satisfy first condition
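Both properties can be checked mechanically. This sketch assumes values 1..s with s a power of two and numbers the complete binary tree heap-style (root 1, children of i are 2i and 2i+1, so the leaf for value x has id s + x − 1); that numbering is an assumption of the sketch, and the helper names are hypothetical. Applied to the example digest (buckets [1:8], [5:6], [7:8], [3:3], [4:4] with counts 1, 2, 2, 4, 6, so n = 15) with k = 5, both properties hold.

```python
def node_range(node_id, s):
    """Value range [lo, hi] of a heap-numbered node (root = 1, leaves s..2s-1)."""
    depth = node_id.bit_length() - 1
    width = s >> depth                              # number of leaves below
    lo = (node_id - (1 << depth)) * width + 1
    return lo, lo + width - 1


def is_valid_qdigest(digest, s, k):
    """Check both q-digest properties for every bucket with a nonzero count."""
    n = sum(digest.values())
    for v, c in digest.items():
        if c == 0:
            continue
        is_leaf = v >= s
        # Property 1: count(v) <= n/k (leaves are exempt).
        if not is_leaf and c > n / k:
            return False
        # Property 2: count(v) + count(parent) + count(sibling) > n/k
        # (the root, id 1, is exempt).
        if v != 1:
            parent, sibling = v // 2, v ^ 1
            if c + digest.get(parent, 0) + digest.get(sibling, 0) <= n / k:
                return False
    return True


example = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
print([node_range(v, 8) for v in example])  # [(1, 8), (5, 6), (7, 8), (3, 3), (4, 4)]
print(is_valid_qdigest(example, s=8, k=5))   # True
```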


Example

[Figure: the same q-digest over data range 1-8, with bucket counts 1, 2, 2, 4, and 6]

Check that this is a valid q-digest for k = 5, so that n/k = 15/5 = 3


Centralized Construction of q-Digest

  • Go through all the tree nodes bottom up

  • Check which ones satisfy the 2 properties.

  • If a node v has a child that violates 2nd property then merge v with both its children

  • Detailed info about values which occur frequently is preserved, while less frequently occurring values are lumped into larger buckets resulting in info loss
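A sketch of the centralized construction: place each value in its leaf, then sweep the tree level by level from the leaves up, merging a sibling pair into its parent whenever count(v) + count(sibling) + count(parent) ≤ n/k (that is, whenever the second property is violated). It assumes the same heap-style numbering (root 1, children 2i and 2i+1, leaf for value x at id s + x − 1); the helper names and input values are hypothetical.

```python
from collections import Counter


def compress(counts, s, k):
    """Bottom-up Compress: merge a sibling pair into its parent whenever
    count(v) + count(sibling) + count(parent) <= n/k."""
    n = sum(counts.values())
    digest = {v: c for v, c in counts.items() if c > 0}
    level = s                                   # leaves are ids s..2s-1
    while level > 1:
        for left in range(level, 2 * level, 2):
            right, parent = left + 1, left // 2
            if left in digest or right in digest:
                total = digest.get(left, 0) + digest.get(right, 0) + digest.get(parent, 0)
                if total <= n / k:
                    digest[parent] = total
                    digest.pop(left, None)
                    digest.pop(right, None)
        level //= 2
    return digest


def build_qdigest(values, s, k):
    """Centralized construction: count each value in its leaf, then compress."""
    return compress(Counter(s + x - 1 for x in values), s, k)


values = [1] + [3] * 4 + [4] * 6 + [5, 6, 7, 8]     # n = 15 (illustrative data)
print(sorted(build_qdigest(values, s=8, k=5).items()))
# [(1, 1), (6, 2), (7, 2), (10, 4), (11, 6)]
```

With these 15 values and k = 5, the result is the five buckets [1:8], [5:6], [7:8], [3:3], [4:4] with counts 1, 2, 2, 4, 6. The values 1, 5, 6, 7, and 8 each occurred once and were lumped into wider buckets, while the frequent values 3 and 4 kept their own leaves.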


[Figure: step-by-step example of the centralized construction, compressing the complete binary tree over values 1-8 from the bottom up]


Distributed Construction of q-Digest

  • Represent a q-digest by numbering the nodes of the digest tree and sending a set of (node id, count) pairs

  • q-digests move up the spanning tree, being merged as they go.

  • To merge 2 q-digests:

    • take their union

    • add the counts of buckets with the same range

    • compress the result

  • Merging can cause information loss.
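Merging follows the three steps above. This sketch repeats the bottom-up Compress from the centralized case so the block is self-contained, and represents each q-digest as a dict from a heap-style node id (an assumed numbering: root 1, children 2i and 2i+1) to its count. The two input digests are made-up examples over s = 8 with k = 5.

```python
from collections import Counter


def compress(counts, s, k):
    """Bottom-up Compress: merge a sibling pair into its parent whenever
    count(v) + count(sibling) + count(parent) <= n/k."""
    n = sum(counts.values())
    digest = {v: c for v, c in counts.items() if c > 0}
    level = s                                   # leaves are ids s..2s-1
    while level > 1:
        for left in range(level, 2 * level, 2):
            right, parent = left + 1, left // 2
            if left in digest or right in digest:
                total = digest.get(left, 0) + digest.get(right, 0) + digest.get(parent, 0)
                if total <= n / k:
                    digest[parent] = total
                    digest.pop(left, None)
                    digest.pop(right, None)
        level //= 2
    return digest


def merge(q1, q2, s, k):
    """Union of buckets, adding counts of buckets with the same range, then compress."""
    return compress(Counter(q1) + Counter(q2), s, k)


q1 = {10: 4, 12: 1, 13: 1}          # values 3, 3, 3, 3, 5, 6 (n = 6)
q2 = {1: 1, 11: 6, 14: 1, 15: 1}    # values 1, six 4s, 7, 8 (n = 9)
merged = merge(q1, q2, s=8, k=5)
print(sorted(merged.items()))       # [(1, 1), (6, 2), (7, 2), (10, 4), (11, 6)]
```

Note that q2 had already lumped its single value 1 into the root bucket [1:8]; after the merge that value can no longer be located more precisely, which is the information loss mentioned above.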


Analysis of Q-Digest

  • Lemma 1: A q-digest with parameter k has size (number of buckets) at most 3k.

    • because the count of a node and its children can't be too small

  • Lemma 2: In a q-digest with parameter k, the maximum error in the count of any node is n(log s)/k.

    • because in the worst case the count of a node can deviate from the actual value by the sum of the counts of its ancestors

  • Lemma 3: Merging multiple q-digests gives the same error as in Lemma 2.


Quantile Queries

  • Problem Statement: Given a fraction q between 0 and 1, find the value whose rank in sorted sequence of the n values is qn.

    • Median is when q = 1/2

  • Relative error is defined to be |r – qn|/n, where r is the true rank of the returned value


Using Q-Digest to Answer a Quantile Query

  • Goal: find q-th quantile

  • Sort the nodes of the q-digest in increasing order of max values (right endpoints); break ties by putting smaller ranges first

    • this gives post-order traversal of the tree

  • Scan sorted list and add up the counts

  • Let v be the first node at which the running sum exceeds qn

  • Return the max value of node v
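The query is a sort plus a running sum. A Python sketch over a dict from heap-style node id (assumed numbering: root 1, children 2i and 2i+1, leaves s..2s−1) to count, applied to the example digest from the slides (counts 1, 2, 2, 4, 6 on [1:8], [5:6], [7:8], [3:3], [4:4]); the function names are hypothetical.

```python
def node_range(node_id, s):
    """Value range [lo, hi] of a heap-numbered node."""
    depth = node_id.bit_length() - 1
    width = s >> depth
    lo = (node_id - (1 << depth)) * width + 1
    return lo, lo + width - 1


def quantile(digest, s, q):
    """Return v.max for the first bucket, in post-order, whose running count exceeds qn."""
    n = sum(digest.values())

    def order(item):
        lo, hi = node_range(item[0], s)
        return (hi, hi - lo)          # by max value; ties: smaller range first

    ordered = sorted(digest.items(), key=order)
    running = 0
    for node_id, count in ordered:
        running += count
        if running > q * n:
            return node_range(node_id, s)[1]
    return node_range(ordered[-1][0], s)[1]   # q = 1: the largest max value


example = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
print(quantile(example, 8, 0.5))   # 4 (the median)
```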


Error Analysis

  • Answer returned is v.max

  • There are at least qn values less than or equal to v.max, by choice of v

  • Error comes from values that are less than v.max but are stored in ancestors of v (these buckets are listed after v)

  • But this error is at most n(log s)/k

  • Note that the estimate is always at least as great as the exact answer


Example

  • Find the median (q = 1/2); recall that n = 15, so look for the first running sum exceeding qn = 7.5

  • Sorted list is (j,4), (k,6), (f,2), (g,2), (a,1)

  • Running sums of counts are 4, 10: done, since 10 > 7.5

  • Return the max value in tree node k, which is 4

  • Error is at most the sum of the counts of k's ancestors (the path from k up to the root), which is 1

[Figure: the digest tree with nodes labeled a through o over leaf values 1-8; the buckets in use are j = [3:3], k = [4:4], f = [5:6], g = [7:8], and a = [1:8]]


Trading Off Error and Message Size

  • Memory and message size are controlled by the compression factor k:

    • If k is small, then fewer buckets but wider range of values are lumped together

    • If k is large, then more buckets but more fine-grained distribution of values to buckets

  • If the maximum number of buckets you can afford is m, then set k = m/3 (by Lemma 1) and get error at most ε = 3(log s)/m (by Lemmas 2 and 3)


Other Queries

  • Inverse Quantile: given a value x, determine its rank in the sorted sequence of input values

  • Algorithm:

    • construct same sorted list

    • traverse list from beginning to end

    • return as the answer the sum of the counts of buckets v for which x > v.max.

  • Reported rank is between rank(x) and rank(x) + εn


Other Queries

  • Range Query: find the number of values in the range [low,high].

  • Algorithm:

    • perform inverse quantile queries to get the ranks of low and high

    • return the difference in their ranks

  • Maximum error is 2n


Other Queries

  • Consensus Query: Given a fraction f between 0 and 1, find all values that are reported by more than fn sensors

  • Algorithm:

    • Find all unit-width (leaf) buckets with count > (f − ε)n and return their values

  • Since a leaf bucket's count has error at most εn, this finds all values with frequency more than fn

  • There may be some false positives: some values with count between (f − ε)n and fn may also be reported
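A sketch of the consensus query under the same assumed heap-style numbering (leaves are ids s..2s−1, and the leaf with id v holds value v − s + 1). ε is passed in explicitly (Lemma 2 gives ε = (log s)/k); the value 0.1 below is illustrative only, and the function name is hypothetical.

```python
def consensus(digest, s, f, eps):
    """Values of leaf buckets whose count exceeds (f - eps) * n."""
    n = sum(digest.values())
    threshold = (f - eps) * n
    # Leaves are ids s..2s-1; the leaf with id v holds value v - s + 1.
    return sorted(v - s + 1 for v, c in digest.items() if v >= s and c > threshold)


example = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
print(consensus(example, s=8, f=0.3, eps=0.1))   # [3, 4]  (threshold about 3)
print(consensus(example, s=8, f=0.3, eps=0.0))   # [4]     (threshold about 4.5)
```

Lowering the threshold by εn is what lets value 3 (count 4, below fn = 4.5) through: it may be a true consensus value whose count was underestimated, or a false positive.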


Confidence Factor

  • Worst-case error is 3 (log s)/m, but it is unlikely that an execution will be this bad

    • choosing message size m according to this constraint will be overkill and waste bandwidth

  • Instead set m to a value for which it is expected that the error bound will be met

  • This requires calculating the actual error of each q-digest, called its confidence factor

  • Define weight of a path: sum of counts of the nodes in the path

  • Define confidence factor: maximum weight of any root-to-leaf path, divided by n
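The confidence factor is a maximum over root-to-leaf paths, which the assumed heap-style numbering makes easy to walk (integer-divide an id by 2 to reach its parent). A sketch on the example digest from the slides; the function name is hypothetical.

```python
def confidence_factor(digest, s):
    """Maximum weight (sum of counts) over root-to-leaf paths, divided by n."""
    n = sum(digest.values())
    heaviest = 0
    for leaf in range(s, 2 * s):                # heap ids of the s leaves
        weight, v = 0, leaf
        while v >= 1:                           # walk up to the root (id 1)
            weight += digest.get(v, 0)
            v //= 2
        heaviest = max(heaviest, weight)
    return heaviest / n


example = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
print(confidence_factor(example, 8))            # 7/15, about 0.467
```

Here the heaviest path runs from leaf [4:4] (count 6) up to the root (count 1), giving 7/15.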


Simulation Results

  • Compared against simple scheme of keeping track of every distinct value together with its count

  • q-digest scheme works well


Open Questions

  • Continuous queries?

  • Lost messages?

  • Duplicate invariance?

  • Include spatial information?

  • Optimality of results?
