Medians and Beyond: New Aggregation Techniques for Sensor Networks

aidan-neal

Presentation Transcript


  1. Medians and Beyond: New Aggregation Techniques for Sensor Networks CS851 Seminar Presentation

  2. Outline • Motivations, State of Art, Contributions • The Q-Digest Scheme • Queries on Q-Digest • Experimental Evaluation • Conclusions Be prepared! I have questions for you!

  3. Motivations • Trade computation for communication • Transmitting one bit over radio costs at least three orders of magnitude more energy than executing a single instruction • Support aggregation queries • Need an aggregated answer, not a single raw reading • Quantile query • Nth → value • Reverse quantile query • Value → Nth • Consensus query • Most frequent value? • Histogram

  4. State of Art • TinyDB project at Berkeley and Cougar project at Cornell • Pros: • Energy-efficient in-network data aggregation • Work very well on singleton sensor values • MIN, MAX, AVERAGE, SUM, COUNT • Cons: • Do not handle complex aggregate measures • Median, Quantile, Reverse Quantile, Consensus • [Zhao et al. 2003] • Algorithms for constructing summaries like MAX, AVG • Focus more on network monitoring and maintenance • [Przydatek et al. 2003] • Secure aggregation

  5. Contributions • Propose Q-Digest for Approximated Aggregation • Provide Strict Theoretical Guarantees on the Approximation Quality of the Queries in Terms of the Message Size • Evaluate the performance of Q-Digest in Simulation

  6. Roadmap • Motivations, State of Art, Contributions • The Q-Digest Scheme • Queries on Q-Digest • Experimental Evaluation • Conclusions and Discussions

  7. Properties of Q-Digest • Each node v in the tree T is a bucket; • Its range [v.min, v.max] defines the position and width of the bucket; • It has a counter count(v); • Given the compression parameter K and n readings in total, a node v is in the q-digest iff it satisfies: • (1) Unless v is a leaf, its count is not high: count(v) ≤ ⌊n/K⌋; • (2) Unless v is the root, v together with its parent and sibling does not have a low count: count(v) + count(parent) + count(sibling) > ⌊n/K⌋; • A q-digest is a set of buckets of different sizes and their associated counts;
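The two properties can be written as a small check. This is only a sketch under assumptions not stated on the slide: buckets are numbered heap-style (root = 1, children of node i are 2i and 2i + 1, leaves are sigma .. 2*sigma - 1), the digest is a plain dict, and the threshold is floor(n/k). The function name and example digests are hypothetical.

```python
def satisfies_digest_properties(digest, n, k, sigma):
    """Check both q-digest properties for every bucket.

    digest: dict mapping node id -> count, using an assumed heap-style
    numbering (root = 1, children of i are 2i and 2i + 1, and the sigma
    leaves have ids sigma .. 2*sigma - 1).
    """
    threshold = n // k  # floor(n / k)
    for v, count in digest.items():
        is_leaf = v >= sigma
        is_root = v == 1
        # Property (1): a non-leaf bucket must not have a high count.
        if not is_leaf and count > threshold:
            return False
        # Property (2): a non-root bucket, its parent and its sibling
        # together must not have a low count.
        if not is_root:
            parent, sibling = v // 2, v ^ 1
            family = count + digest.get(parent, 0) + digest.get(sibling, 0)
            if family <= threshold:
                return False
    return True


# Hypothetical digests for n = 15 readings, k = 5, sigma = 8 (threshold 3).
valid = {10: 4, 11: 6, 2: 2, 3: 3}
invalid = {10: 4, 11: 6, 4: 2, 3: 3}  # bucket 4's family sums to 2 <= 3
```

Here `satisfies_digest_properties(valid, 15, 5, 8)` holds, while the `invalid` digest fails property (2) at bucket 4.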

  8. Building a Q-Digest • Go bottom up and check whether any node violates digest property (2) • If so, delete the node and its sibling and add their counts to the parent; • Key feature of q-digest: detailed information about frequently occurring data values is preserved in the digest, while less frequently occurring values are lumped into larger buckets, resulting in information loss.
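A minimal sketch of the bottom-up pass, assuming heap-style node numbering (root = 1, children 2i and 2i + 1, leaves sigma .. 2*sigma - 1), integer readings in 1..sigma with sigma a power of two, and a merge threshold of floor(n/k). The function name and the sample data are hypothetical, not taken from the slides.

```python
from collections import Counter


def build_q_digest(values, k, sigma):
    """Build a q-digest (dict node id -> count) from raw readings."""
    n = len(values)
    threshold = n // k
    # Start with one leaf bucket per distinct value.
    digest = Counter(sigma + v - 1 for v in values)
    # Visiting parents from the highest id down to the root processes
    # deeper levels first, i.e. the pass runs bottom up.
    for parent in range(sigma - 1, 0, -1):
        left, right = 2 * parent, 2 * parent + 1
        children = digest[left] + digest[right]
        # Property (2) is violated when parent + children is too small:
        # fold both children into the parent.
        if children > 0 and digest[parent] + children <= threshold:
            digest[parent] += children
            digest.pop(left, None)
            digest.pop(right, None)
    return dict(digest)


# Hypothetical input: 15 readings in the range 1..8.
readings = [1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7]
digest = build_q_digest(readings, k=5, sigma=8)
# digest == {10: 4, 11: 6, 2: 2, 3: 3}
```

The frequent values 3 and 4 keep their own leaf buckets (ids 10 and 11), while the rare values end up in the wide buckets 2 (values 1..4) and 3 (values 5..8).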

  9. Merging Q-Digest • The parent node merges Q1(n1, K) and Q2(n2, K) received from its children • How about merging Q1(n1, K1) and Q2(n2, K2)? • Each node has a different communication ability • Each node has a different power level • A powerful node can use a bigger K while a less powerful node uses a smaller K. Can we still get the same accuracy? Is that feasible?

  10. Space Complexity and Error Bound (1/4) • What does 3K mean? 3K bytes? • 3K means 3K <nodeID(v), count(v)> pairs • What about the root node, which does not satisfy property (2)?

  11. Space Complexity and Error Bound (2/4) What about the leaf node, which does not satisfy property (1)? It doesn’t matter, because a leaf node is not the ancestor of any node.

  12. Space Complexity and Error Bound (3/4)

  13. Space Complexity and Error Bound (4/4)

  14. Representation of a Q-Digest • To transmit the q-digest, we send a set of tuples of the form <nodeID(v), count(v)>, which requires a total of log(2σ) + log n bits for each tuple.
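As a rough illustration of the tuple size, assume nodes are numbered heap-style so that ids run from 1 to 2σ - 1 (about log(2σ) bits) and each count is at most n (about log n bits). The function names are hypothetical, and rounding each field up to whole bits is a choice made for this sketch.

```python
def bits_per_tuple(sigma, n):
    """Bits for one <nodeID(v), count(v)> tuple under the assumptions above."""
    node_id_bits = (2 * sigma - 1).bit_length()  # ids 1 .. 2*sigma - 1
    count_bits = n.bit_length()                  # counts 0 .. n
    return node_id_bits + count_bits


def digest_size_bits(num_buckets, sigma, n):
    """Total bits needed to send num_buckets tuples."""
    return num_buckets * bits_per_tuple(sigma, n)


# Hypothetical numbers: sigma = 8, n = 15, and a digest of 3K = 15 buckets.
per_tuple = bits_per_tuple(8, 15)       # 4 + 4 = 8 bits
total = digest_size_bits(15, 8, 15)     # 15 * 8 = 120 bits
```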

  15. Roadmap • Motivations, State of Art, Contributions • The Q-Digest Scheme • Queries on Q-Digest • Experimental Evaluation • Conclusions and Discussions

  16. Quantile Query (1/3) • Quantile query: • Given a fraction 0 < q < 1, find the value whose rank in the sorted sequence of the n values is qn. • Answering the query: • Sort the nodes in the q-digest by increasing v.max, breaking ties by putting smaller ranges first; • Scan the sorted list, adding up the counts of the nodes; • At some node v the running sum exceeds qn; that node's v.max is reported as the estimate of the quantile;
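The scan described above can be sketched as follows. The node numbering (heap-style, root = 1, leaves sigma .. 2*sigma - 1 standing for values 1..sigma, sigma a power of two), the helper `node_range`, and the example digest are assumptions made for illustration.

```python
def node_range(v, sigma):
    """Return (v.min, v.max) for node id v over the value range 1..sigma."""
    tree_depth = sigma.bit_length() - 1   # log2(sigma)
    node_depth = v.bit_length() - 1
    shift = tree_depth - node_depth       # levels between v and the leaves
    lo = (v << shift) - sigma + 1
    hi = ((v + 1) << shift) - sigma
    return lo, hi


def quantile(digest, q, n, sigma):
    """Estimate the value of rank q*n from a q-digest (dict id -> count)."""
    def sort_key(v):
        lo, hi = node_range(v, sigma)
        return (hi, hi - lo)              # by v.max, smaller ranges first

    running = 0
    for v in sorted(digest, key=sort_key):
        running += digest[v]
        if running > q * n:
            return node_range(v, sigma)[1]
    raise ValueError("counts in digest sum to less than q * n")


# Hypothetical digest of 15 readings over the value range 1..8.
digest = {10: 4, 11: 6, 2: 2, 3: 3}
median_estimate = quantile(digest, 0.5, 15, 8)   # 4
```

For this digest the buckets are scanned in the order 10, 11, 2, 3; the running sum passes 7.5 at bucket 11, whose v.max is 4.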

  17. Quantile Query (2/3) • The confidence factor • Why do we need it? • The theoretical error bound is a worst-case estimate, which only occurs for very pathological inputs • What is it? • The confidence factor is defined as: (maximum weight of any path from root to leaf in Q)/n

  18. Confidence Factor Example • n = 15, k = 5, σ = 8 • [Figure: example q-digest tree; extracted labels: 1 1 5 7 3 3 3 3] • (maximum weight of any path from root to leaf in Q)/n = 7/15 ≤ 3 log σ / (3k) = (3 · 3)/(3 · 5) = 9/15
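The arithmetic on this slide can be reproduced with exact fractions. The sketch below also computes a path weight from a digest, assuming heap-style node numbering (root = 1, parent of v is v // 2) and counting every bucket on the path, leaf buckets included; whether leaves should count is not stated on the slide. The digest used here is hypothetical and is not the one in the slide's figure.

```python
from fractions import Fraction


def confidence_factor(digest, n):
    """(Maximum weight of any root-to-leaf path in the digest) / n."""
    heaviest = 0
    for v in digest:
        weight, node = 0, v
        while node >= 1:                  # walk from v up to the root
            weight += digest.get(node, 0)
            node //= 2
        heaviest = max(heaviest, weight)
    return Fraction(heaviest, n)


def worst_case_bound(sigma, k):
    """3 * log2(sigma) / (3 * k), for sigma a power of two."""
    log_sigma = sigma.bit_length() - 1
    return Fraction(3 * log_sigma, 3 * k)


# Slide numbers: n = 15, k = 5, sigma = 8.
bound = worst_case_bound(8, 5)            # 9/15, stored as 3/5
slide_confidence = Fraction(7, 15)        # value quoted on the slide

# Hypothetical digest (not the slide's figure) with the same n, k, sigma.
example = confidence_factor({10: 4, 11: 6, 2: 2, 3: 3}, 15)   # 8/15
```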

  19. Roadmap • Motivations, State of Art, Contributions • The Q-Digest Scheme • Queries on Q-Digest • Experimental Evaluation • Conclusions and Discussions

  20. Performance Evaluation • Settings • Routing tree • Breadth-first search tree • Sensor field • 1000 x 1000 area with 1000 sensor nodes • 2000 x 2000 area with 4000 sensor nodes • Sensor values • Random • Correlated: United States Geological Survey data • Compared with the List scheme: • List: report all (value, count) pairs back to the base station; no in-network aggregation;

  21. Error and Message Size • A 160-byte message size gives 5% error • A 400-byte message size gives 2% error

  22. Total Data Transmission • Q-digest transmits less data than List • Random input needs more transmission than correlated data

  23. Residual Power • Every byte transmitted depletes one unit of power, out of 40000 units. • (How about reception?) • In List, 0.02% of nodes have a residual power fraction less than ½. • (???)

  24. Conclusions • Propose Q-Digest for Approximated Aggregation • Provide Strict Theoretical Guarantees on the Approximation Quality of the Queries in Terms of the Message Size • Evaluate the performance of Q-Digest in Simulation

  25. Thank you!
