1 / 57

# CPSC 689: Discrete Algorithms for Mobile and Wireless Systems - PowerPoint PPT Presentation

CPSC 689: Discrete Algorithms for Mobile and Wireless Systems. Spring 2009 Prof. Jennifer Welch. Lecture 33. Topic: Data Aggregation in Sensor Networks Sources: Nath, Gibbons, Seshan & Anderson Shrivastava, Buragohain, Agrawal & Suri. Aggregation Problem.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' CPSC 689: Discrete Algorithms for Mobile and Wireless Systems' - lucie

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### CPSC 689: Discrete Algorithms for Mobile and Wireless Systems

Spring 2009

Prof. Jennifer Welch

Lecture 33 Systems

• Topic:

• Data Aggregation in Sensor Networks

• Sources:

• Nath, Gibbons, Seshan & Anderson

• Shrivastava, Buragohain, Agrawal & Suri

Discrete Algs for Mobile Wireless Sys

Aggregation Problem Systems

• How to compute the answer to a query in a sensor network that requires aggregating data from all (or many) sensors?

• Example: Suppose the nodes take temperature readings and queries ask for min/max/average temperature

• Data has to flow through the network to the node that issues the query

• In some cases, data can be aggregated on the way

• save bandwidth and energy

• Example: to find max temp., each node propagates largest temp. it has learned about

Discrete Algs for Mobile Wireless Sys

• Need to propagate sensor readings in some orderly way

• Example: send data over a spanning tree rooted at the querying node

• not robust: link or node failure will partition the tree, lose contact with sensors in subtree

• Prefer to use multipath routing (message is sent on several paths)

• redundancy provides more resilience

• But duplication causes problems for aggregation

• OK for max, but what about average?

Discrete Algs for Mobile Wireless Sys

Overview of Algorithm Systems

• Provides framework for synopses of the data to be sent over multiple paths and then reconstructing correct answer

• Phase 1: aggregate query is flooded through the network and an aggregation topology is constructed

• Phase 2: aggregate values are continually routed toward the querying node:

• each node converts its sensor data to a synopsis (SG function)

• nodes merge two synopses into one (SF function)

• querying node converts synopsis back to final answer (SE function)

Discrete Algs for Mobile Wireless Sys

• Rings:

• kind of like levels in breadth-first search

• Nodes are partitioned into rings during Phase 1:

• querying node q is in ring 0

• a node is in ring i if it receives the query first from a node in ring i–1

• Phase 2 is divided into epochs, one aggregate answer per epoch

• each node in outer ring (farthest distance from q) computes s := SG(r), where r is its sensor reading, and broadcasts s

• each node in ring i computes s := SG(r), where r is its sensor reading, and updates s := SF(s,s'), where s' is each synopsis received from a neighbor, then broadcasts s

• querying node computes SE(s)

• Synchronous algorithm

Discrete Algs for Mobile Wireless Sys

Figure Systems

R2

R0

R1

q

A

C

B

Discrete Algs for Mobile Wireless Sys

Analysis of Framework Systems

• Complexity: each node broadcasts once per epoch

• Same as spanning-tree-based approach

• More resilient than spanning-tree-based approach

Discrete Algs for Mobile Wireless Sys

The Functions Systems

• What should SG, SF, and SE be in order to give the "correct" answer?

• First, give a condition on the functions that is intuitive

• Then show there are 4 simple checks that can be done on proposed functions

• These conditions are necessary and sufficient to preserve correctness

Discrete Algs for Mobile Wireless Sys

ODI-Correctness Systems

• Final result should be independent of how the data was routed to querier:

• same no matter in which order the readings are combined and how many times they are included (duplicated) during the routing

• assumed to be unique

• Suppose we have SG, SF and SE

• Define synopsis label SL(s) = {r} if s = SG(r ) and SL(s) = SL(s1) Ums SL(s2) if s = SF(s1,s2)

Discrete Algs for Mobile Wireless Sys

ODI-Correctness (cont'd) Systems

• What constitutes a "duplicate" depends on what is being computed

• Ex: average temp vs. number of distinct temps

• q : multiset of sensor readings  set of (unique) values

• q(SL(s)) = set of unique values in all the sensor readings that formed the synopsis

Discrete Algs for Mobile Wireless Sys

ODI-Correctness Definition Systems

• Let {v1,…,vk} be set of values in the label of s, i.e., q(SL(s)).

• Then s must be same as computation on "canonical left-deep tree":

• s := SG(v1)

• for i = 2 to k do

• s := SF(s,SG(vi))

• I.e., regardless of redundancy caused by multipath routing, the final synopsis is the same as if each distinct value is included just once

Discrete Algs for Mobile Wireless Sys

s Systems

s

r5

SF

SF

SF

SF

SF

SF

SF

SF

SF

SF

SF

r4

SG

SG

SG

SG

SG

SG

SG

SG

SG

SG

r3

r2

r1

r5

r3

r1

r4

r2

ODI-Correctness Figure

Canonical left-deep tree

Aggregation DAG

Discrete Algs for Mobile Wireless Sys

• duplicate preservation: q({r1}) = q({r2})  SG(r1) = SG(r2)

• if two readings are considered duplicates, then the same synopsis is generated

• commutativity: SF(s1,s2) = SF(s2,s1)

• associativity: SF(s1,SF(s2,s3)) = SF(SF(s1,s2),s3)

• idempotence: SF(s,s) = s

Discrete Algs for Mobile Wireless Sys

• Theorem: The previous 4 conditions are necessary and sufficient for the SG and SF functions to ensure ODI-correctness.

• Proof Sketch:

• sufficiency: If SG and SF satisfy the 4 conditions, then show that any computation DAG can be transformed into a canonical left-deep binary tree that produces the same output

• necessity: Argue that the 4 conditions follow from the definition of ODI-correctness.

Discrete Algs for Mobile Wireless Sys

Count Example Systems

• Query: What is the (approximate) total number of sensor nodes in the network?

• Synopsis: a bit vector of length k > log N, where N is an upper bound on the number of nodes

• N could be original number of nodes deployed, or some function of the size of the id space

Discrete Algs for Mobile Wireless Sys

SG for Count Example Systems

• No sensor is actually read for this example.

• Let SG return vector s[1..k], where

• a certain entry is 1

• rest of the entries are 0

• How to decide which entry should be 1:

• entry CT(k), where CT(k) is a random variable that returns value i with probability 1/2i, 1 ≤ i < k.

• How to compute CT(k):

• Toss a fair coin until either the first head occurs or k coin tosses have occurred with no heads; return number of tosses

Discrete Algs for Mobile Wireless Sys

Computation of CT(k) Systems

• Why does the coin-tossing protocol give the desired random variable?

• Proof by Example: Suppose k = 4.

• First toss is H, and 1 is returned, with probability 1/2

• Otherwise, second toss is H, and 2 is returned, with probability 1/4

• Otherwise third toss is H and 3 is returned, with probability 1/8

• (and then 4 is returned with probability 1/8, but the definition of CT(4) only cares about 1 through 3)

Discrete Algs for Mobile Wireless Sys

SF and SE for Count Example Systems

• SF(s,s'):

• s[i] := s[i] OR s'[i], 1 ≤ i ≤ k

• return s

• SE(s):

• return 2i-1/.77351, where i is the minimum index such that s[i] = 0

Discrete Algs for Mobile Wireless Sys

• Suppose all (live) sensors have a failure-free path to the querier.

• The final bit vector to which SE is applied indicates which bit positions have been set by at least one node

• The probability of n nodes failing to set the i-th bit is (1–2i)n by definition of SG

• Thus the number of (live) nodes is proportional to 2i–1

• constant of proportionality is 1/.77351

Discrete Algs for Mobile Wireless Sys

• Alternatively…

• We expect half the nodes to set the 1st bit, a quarter of the nodes to set the 2nd bit, an eighth of the nodes to set the 3rd bit, etc.

• If there are n distinct nodes, then we might expect log n bits to be set

• I.e., if log n = i bits are set, then we might expect there to be about n = 2i nodes

Discrete Algs for Mobile Wireless Sys

• Note that ODI-correctness says nothing about the SE function, only that SE will return the same result as in the canonical tree.

• "Clever algorithms are still required to get provably good approximations, although the task has been simplified…"

• Commutativity, associativity, and idempotence follow from properties of Boolean OR

Discrete Algs for Mobile Wireless Sys

• Why does SG preserve duplicates?

• Assume each node calls SG only once.

• Show that if sensor readings are considered duplicates, then the synopsis generated by SG is the same.

• Since there is no actual sensor reading for this algorithm, we just use ids for the readings.

• Assumption that each node calls SG only once ensures the property.

Discrete Algs for Mobile Wireless Sys

Implicit Acknowledgments Systems

• node u broadcasts its synopsis

• node u snoops (listens to) subsequent broadcasts by its parent nodes (nodes closer to the querying node)

• if the synopsis broadcast by a parent "effectively includes" u's synopsis, u does not need to rebroadcast, otherwise rebroadcast (or adapt the topology)

Discrete Algs for Mobile Wireless Sys

• How can u accurately infer if its broadcasts was "effectively included"?

• Suppose u's synopsis was x and the parent's was z.

• If SF(x,z) = z, then x is effectively included.

• Why? Since SF is commutative, associative, and idempotent, it is a "semi-lattice".

• in a semi-lattice, every 2 elements x and y have a least upper bound z, and SF(x,z) = z = SF(y,z)

• Count example: check if appropriate bits are set

Discrete Algs for Mobile Wireless Sys

• Sources of error:

• communication error: some nodes have no failure-free propagation path to querier

• approximation error: introduced by SG, SF and SE functions.

• defined as relative error of computed answer w.r.t. exact algorithm using the same readings

• Argue that communication error can be made negligible by deploying sensor nodes sufficiently densely

Discrete Algs for Mobile Wireless Sys

• Approximation error analysis for the centralized data stream model work in this model, since synposis is ODI-correct

• canonical left-deep tree corresponds to processing a data stream of sensor readings in a centralized location

• Thus, e.g., Count algorithm has same approximation error guarantees as computed by Flajolet & Martin

Discrete Algs for Mobile Wireless Sys

More Examples Systems

• Max and Min: easy.

• SG is the value, SF takes larger/smaller, SE is identity

• Sum: cf. paper by Considine et al. which adapts Count algorithm

• Average, Standard deviation, Second Moment: cf. paper by Considine et al. which uses Sum

• Count Distinct: modification of Count

Discrete Algs for Mobile Wireless Sys

Uniform Sample Example Systems

• Compute a uniform sample of a given size K of the values occurring at all nodes in the network

• Synopsis: a sample of size K tuples (or fewer initially)

• SG: output (val,r,id) where

• val is the sensor reading of the node

• r is a random number drawn uniformly from [0,1]

• id is the node's id

• SF(s,s'): list the tuples in s U s' in decreasing order of r-value, and output the first K (or all, if less than K total)

• U is set union, removes duplicates

• SE(s): output the set of values in the tuples of s

Discrete Algs for Mobile Wireless Sys

• SG labels each reading with a random number, thus placing it in a random position in the global ordering of all readings

• So taking first K in the ordering gives a uniform sample.

• Uniform sample can then be used…

Discrete Algs for Mobile Wireless Sys

More Examples Systems

• Use uniform samples to compute these aggregates:

• k-th statistical moment (k = 1 is the mean)

• k-th percentile value (k = 50 is the median)

with certain error and probability, by choosing the sample size appropriately (cf. Bar-Yossef et al.)

• Compute the k most frequent values (k = 1 is the mode): run an ODI-correct Count algorithm for each value

Discrete Algs for Mobile Wireless Sys

• If message loss is detected as occurring "too frequently", nodes can adapt the Ring topology

• Idea: use a heuristic that tries to assign a node u to a ring so that there are plenty of ndoes in the next ring to forward u's synopsis to the querier

• implicit acks are used to detect message loss energy-efficiently

• duplicates that occur during the adaptation of the topology are not a problem

Discrete Algs for Mobile Wireless Sys

Simulation Results Systems

• Extensive!

• Synopsis diffusion

• reduces answer errors in lossy environments

• helps address challenges from correlated node failures

• does not use significantly more power

• What topology to use?

• Adaptive Rings has same overhead as Rings but much better accuracy

• Adaptive Rings gets about 90% of the sensor readings most of the time vs. 100% with Flooding, but uses much less power

Discrete Algs for Mobile Wireless Sys

Medians and Beyond Systems[SBAS]

• Extend beyond min/max/sum the class of queries that can be answered in sensor networks to include

• approximate quantiles (including median)

• most frequent data values (including consensus)

• histogram of data distribution

• range queries

• Provide strict theoretical guarantees on the approximation quality of the answers in terms of message size

Discrete Algs for Mobile Wireless Sys

Comparison with Nath Paper Systems

• Some of the same problems are considered

• "Medians and Beyond" is concerned with efficiency of message size and its tradeoff with quality of approximation

• Nath paper was concerned with handling arbitrary ordering and duplicates

• "Medians and Beyond" assumes no duplicates

Discrete Algs for Mobile Wireless Sys

Overview Systems

• Assume we have a tree rooted at the querying node

• To compute Average: each node sends to its parent the sum of thedata values of its descendants and its number of descendants

• constant size messages

• To compute Median, need to keep track of all distinct values

• size of messages, and memory, grows linearly

• Trade off memory and bandwidth with accuracy of approximations

Discrete Algs for Mobile Wireless Sys

Q-Digests Systems

• Assume sensor readings are integers in the range [1,s]

• Introduce q-digest data structure to answer quantile queries with

• messages of size m

• error O((log s)/m)

• Users specify message size vs. error tradeoff

• q-digest measures maximum error accumulated so far

• Once q -digest query is done, use it to compute quantiles, data distribution,…

Discrete Algs for Mobile Wireless Sys

More on q-Digest Systems

• Compute a compressed view of the complete distribution of values (instead of just a function of the values)

• Use this view of the distribution to compute approximations of various functions

• Basic idea: Essentially compute a histogram, but

• equally large, instead of equally spaced, buckets

• buckets can overlap

• size of buckets gives accuracy vs. communication tradeoff

Discrete Algs for Mobile Wireless Sys

Definition of q-Digest Systems

• Group values into variable-sized buckets of almost equal weights

• size refers to range

• weight refers to number of elements

• q-digest consists of a set of buckets

• Build a complete binary tree

• 1,…,s at the leaves

• every tree node is a bucket, its range is all the leaves in its subtree

• At any given point, only some of the buckets are being used

Discrete Algs for Mobile Wireless Sys

Example Systems

1

data range 1-8

15 data items

5 buckets

2

2

4

6

1

2

3

4

5

6

7

8

Discrete Algs for Mobile Wireless Sys

Definition of q-Digest Systems

• Given compression parameter k and number of data items n, a (tree) node v is in the q-digest iff:

• count(v) ≤ n/k

• node should not have a high count

• count(v) + count(parent(v)) + count(sibling(v)) > n/k

• if a node and its children have low total count then combine using Compress algorithm

• For a leaf node, if count > n/k, then it is in the q-digest

• Root only needs to satisfy first condition

Discrete Algs for Mobile Wireless Sys

1 Systems

2

2

4

6

1

2

3

4

5

6

7

8

Example

check that this has k = 5;

n/k = 3

Discrete Algs for Mobile Wireless Sys

• Go through all the tree nodes bottom up

• Check which ones satisfy the 2 properties.

• If a node v has a child that violates 2nd property then merge v with both its children

• Detailed info about values which occur frequently is preserved, while less frequently occurring values are lumped into larger buckets resulting in info loss

Discrete Algs for Mobile Wireless Sys

2 Systems

2

1

1

4

6

1

1

1

4

6

8

1

2

3

4

5

6

8

7

1

2

3

4

5

6

7

1

1

2

2

2

2

4

6

4

6

8

1

2

3

4

5

6

7

8

1

2

3

4

5

6

7

1

• Represent a q-digest by numbering the nodes of the digest tree and sending a set of (node id, count) pairs

• q-digests move up the spanning tree, being merged as they go.

• To merge 2 q-digests:

• take their union

• add the counts of buckets with the same range

• compress the result

• Merging can cause information loss.

Discrete Algs for Mobile Wireless Sys

Analysis of Q-Digest Systems

• Lemma 1: A q-digest with parameter k has size (number of buckets) at most 3k.

• because the count of a node and its children can't be too small

• Lemma 2: In a q-digest with parameter k, the maximum error in the count of any node is n(log s)/k.

• because in the worst case the count of a node can deviate from the actual value by the sum of the counts of its ancestors

• Lemma 3: Merging multiple q-digests gives the same error as in Lemma 2.

Discrete Algs for Mobile Wireless Sys

Quantile Queries Systems

• Problem Statement: Given a fraction q between 0 and 1, find the value whose rank in sorted sequence of the n values is qn.

• Median is when q = 1/2

• Relative error is defined to be |r – qn|/n, where r is the true rank of the returned value

Discrete Algs for Mobile Wireless Sys

• Goal: find q-th quantile

• Sort the nodes of the q-digest in increasing order of max values (right endpoints); break ties by putting smaller ranges first

• this gives post-order traversal of the tree

• Scan sorted list and add up the counts

• Let v be the first node at which the running sum exceeds qn

• Return the max value of node v

Discrete Algs for Mobile Wireless Sys

Error Analysis Systems

• There are at least qn values less than or equal to v.max, by choice of v

• Error comes from values that are less than v.max but are stored in ancestors of v (these buckets are listed after v)

• But this error is at most n(log s)/k

• Note that estimate is always at least as great as the eact answer

Discrete Algs for Mobile Wireless Sys

Example Systems

1

• Find Median (q = 1/2); recall n = 15 so look for 7.5

• Sorted list is (j,4), (k,6), (f,2), (g,2), (a,1)

• Running sums of counts are 4, 10 - done!

• Return max value in tree node k, which is 4

• Error is at most sum of counts on path from k to root, which is 1

a

a through o are the

ids of the digest tree

nodes:

j = [3:3]

k = [4:4]

f = [5:6]

g = [7:8]

a = [1:8]

b

c

2

2

d

e

g

f

4

6

k

m

o

i

j

n

l

h

1

2

3

4

5

6

7

8

Discrete Algs for Mobile Wireless Sys

• Memory and message size are controlled by the compression factor k:

• If k is small, then fewer buckets but wider range of values are lumped together

• If k is large, then more buckets but more fine-grained distribution of values to buckets

• If the maximum number of buckets you can afford is m, then set k = m/3 (by Lemma 1) and get error at most  = 3(log s)/m (by Lemmas 2 and 3)

Discrete Algs for Mobile Wireless Sys

Other Queries Systems

• Inverse Quantile: given a value x, determine its rank in the sorted sequence of input values

• Algorithm:

• construct same sorted list

• traverse list from beginning to end

• return as the answer the sum of the counts of buckets v for which x > v.max.

• Reported rank is between

rank(x) and rank(x) + n

Discrete Algs for Mobile Wireless Sys

Other Queries Systems

• Range Query: find the number of values in the range [low,high].

• Algorithm:

• perform inverse quantile queries to get the ranks of low and high

• return the difference in their ranks

• Maximum error is 2n

Discrete Algs for Mobile Wireless Sys

Other Queries Systems

• Consensus Query: Given a fraction f between 0 and 1, find all values that are reported by more than fn sensors

• Algorithm:

• Find all unit-width (leaf) buckets with count > (f–)n and return their values

• Since a leaf bucket's count has error at most n, this finds all values with frequency more than fn

• There may be some false positives: some values with count between (f–)n and fn may also be reported

Discrete Algs for Mobile Wireless Sys

Confidence Factor Systems

• Worst-case error is 3 (log s)/m, but it is unlikely that an execution will be this bad

• choosing message size m according to this constraint will be overkill and waste bandwidth

• Instead set m to a value for which it is expected that the error bound will be met

• Need to calculate the actual error in each q-digest: called confidence factor

• Define weight of a path: sum of counts of the nodes in the path

• Define confidence factor: maximum weight of any root-to-leaf path, divided by n

Discrete Algs for Mobile Wireless Sys

Simulation Results Systems

• Compared against simple scheme of keeping track of every distinct value together with its count

• q-digest scheme works well

Discrete Algs for Mobile Wireless Sys

Open Questions Systems

• Continuous queries?

• Lost messages?

• Duplicate invariance?

• Include spatial information?

• Optimality of results?

Discrete Algs for Mobile Wireless Sys