
Probabilistic Data Aggregation

Ling Huang, Ben Zhao, Anthony Joseph

Sahara Retreat

January, 2004

Motivation
  • Definition of data aggregation
  • An important function for network infrastructures
  • Exact results are not achievable in the face of loss and faults
  • Low-overhead, accurate approximation is crucial in
    • Sensor networks
    • P2P networks
    • Network monitoring and intrusion detection systems
  • But it is difficult to achieve
    • Existing approaches have many problems

  • Aggregate functions
    • MIN, MAX, AVG, COUNT, etc.
  • In-network hierarchical processing
    • Reduces overhead
    • Query propagation
    • Tree construction
    • Aggregate calculation
  • Addressing fault tolerance
    • Multi-root
    • Multi-tree
    • Reliable transmission
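In-network aggregation works because many aggregates can be carried up the tree as small partial states and merged at each parent. The sketch below illustrates the idea and is not the authors' code. The names `Partial` and `aggregate_subtree` are assumptions, and so is the choice of (count, sum, min, max) as the partial state. AVG is recovered at the root as sum / count, so each tree edge carries one fixed-size record regardless of subtree size.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Partial:
    """Hypothetical partial-aggregate record forwarded from a subtree to its parent."""
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "Partial") -> "Partial":
        # Merging is associative and commutative, so children can be combined in any order.
        return Partial(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


def aggregate_subtree(node: str, children: Dict[str, List[str]],
                      readings: Dict[str, float]) -> Partial:
    """Compute the partial aggregate of `node`'s subtree bottom-up (hypothetical helper)."""
    value = readings[node]
    state = Partial(count=1, total=value, minimum=value, maximum=value)
    for child in children.get(node, []):
        state = state.merge(aggregate_subtree(child, children, readings))
    return state


# Example tree: root -> {a, b}, a -> {c, d}; readings are made-up sensor values.
children = {"root": ["a", "b"], "a": ["c", "d"]}
readings = {"root": 20.0, "a": 22.0, "b": 18.0, "c": 25.0, "d": 15.0}
result = aggregate_subtree("root", children, readings)
average = result.total / result.count  # AVG derived from SUM and COUNT at the root
```

This exact scheme assumes every record arrives. Losing a single record silently removes an entire subtree from the result.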
Problems in Existing Approaches
  • Few approaches are designed to handle data loss and corruption
    • Only simple algorithms for data loss recovery
  • Fragile for large process groups
    • Require participation from all relevant nodes
  • Difficult to trade accuracy for communication overhead
    • Many applications need this tradeoff
    • They need only an approximation
    • But they must minimize resource consumption
Our Approach
  • Probabilistic data aggregation: a scalable and robust approach
    • Model loss on links and failures on nodes
    • Apply statistical learning theory (SLT) to aggregation
    • Develop a protocol that handles loss and failures as an essential part of normal operation
      • Self-repairing algorithm for aggregation tree maintenance
      • Nodes participate in aggregation and communication according to a statistical sampling algorithm
      • In the absence of data, estimate values using a statistical learning algorithm
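The slides do not specify the self-repairing algorithm. The following is one possible sketch. It assumes that failure detection, for example via missed heartbeats, has already flagged a failed parent. The orphaned node then picks the live neighbor with the shortest intact path to the root. It skips any neighbor whose own path leads back through the orphan, because choosing that neighbor would create a cycle. The names `hops_to_root` and `repair_parent` and the shortest-path rule are hypothetical.

```python
from typing import Dict, List, Optional, Set


def hops_to_root(start: str, parent: Dict[str, str], root: str,
                 excluded: Set[str]) -> Optional[int]:
    """Hop count from `start` to `root` along parent pointers, or None if the
    path reaches an excluded node, loops, or dead-ends."""
    hops = 0
    current: Optional[str] = start
    seen: Set[str] = set()
    while current != root:
        if current is None or current in excluded or current in seen:
            return None
        seen.add(current)
        current = parent.get(current)
        hops += 1
    return hops


def repair_parent(node: str, neighbors: Dict[str, List[str]], parent: Dict[str, str],
                  root: str, failed: Set[str]) -> Optional[str]:
    """Re-attach `node` after its parent failed; returns the new parent or None."""
    # Excluding `node` itself rejects neighbors inside its own subtree (no cycles).
    excluded = failed | {node}
    best: Optional[str] = None
    best_hops: Optional[int] = None
    for candidate in neighbors.get(node, []):
        if candidate in failed:
            continue
        hops = hops_to_root(candidate, parent, root, excluded)
        if hops is not None and (best_hops is None or hops < best_hops):
            best, best_hops = candidate, hops
    if best is not None:
        parent[node] = best
    return best


# Tree: A and B hang off R, C hangs off A, D hangs off C. A fails.
parent = {"A": "R", "B": "R", "C": "A", "D": "C"}
neighbors = {"C": ["A", "B", "D"]}
new_parent = repair_parent("C", neighbors, parent, "R", failed={"A"})
```

D keeps C as its parent, so C's whole subtree follows it to the new attachment point under B.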
Design & System Architecture
  [Figure: system architecture with three components: Tree Constructor, Distribution Estimator, and Data Predictor]
  • Building blocks
    • Spanning tree with fault detection and a self-repairing algorithm for tree construction and maintenance
    • Statistical sampling for low overhead and scalability without much loss of accuracy
    • Distribution estimation to provide information for workload analysis, data prediction, and outlier detection
    • Data prediction to compensate for data lost to sampling, as well as uncontrolled loss on links
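The slides do not say which distribution estimator is used. One minimal stand-in assumes that a per-child mean and variance are enough. Welford's streaming algorithm updates both in constant memory per report. The z-score it produces can then feed workload analysis or a simple outlier flag. `RunningStats`, `is_suspicious`, and the threshold of 3 standard deviations are illustrative assumptions.

```python
import math


class RunningStats:
    """Streaming mean/variance of one child's reports (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until at least two values are seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def z_score(self, x: float) -> float:
        std = math.sqrt(self.variance)
        return 0.0 if std == 0.0 else (x - self.mean) / std

    def is_suspicious(self, x: float, threshold: float = 3.0) -> bool:
        # Flags a value for closer inspection; it does not discard anything.
        return self.n >= 2 and abs(self.z_score(x)) > threshold


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

The estimator only flags values and never drops them, which matches the caution about discarding data in the protocol.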
Statistical Sampling
  • A simple approach: sampling on the aggregation tree
    • Every child node reports the aggregate of its subtree to its parent with a certain probability, which is the design parameter of the algorithm
    • Low control-traffic overhead and easy to implement
    • Might result in high data loss close to the root, since one skipped report there withholds an entire subtree's aggregate
  • Distribution of sampling rates on the tree
    • Uniform distribution on each level
    • Linear distribution on each level
    • Proportional to the number of nodes in its subtree
    • Value-based sampling
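The cost of losses near the root can be shown with a simplified model. This model is an assumption, not the authors' analysis. Each non-root node reports independently with probability `prob[node]` per epoch, and no prediction fills the gaps. A reading therefore reaches the root only if its node and every ancestor below the root all report. The sketch computes two quantities: the expected number of readings that reach the root, and the expected number of messages per epoch. It compares a uniform rate with a rate proportional to subtree size. The function names and the scaling constant `k` are hypothetical.

```python
from typing import Dict, List


def subtree_sizes(root: str, children: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of nodes in each node's subtree, including the node itself."""
    sizes: Dict[str, int] = {}

    def visit(node: str) -> int:
        sizes[node] = 1 + sum(visit(c) for c in children.get(node, []))
        return sizes[node]

    visit(root)
    return sizes


def uniform_rates(root: str, children: Dict[str, List[str]], p: float) -> Dict[str, float]:
    # Every non-root node reports with the same probability p.
    return {n: p for n in subtree_sizes(root, children) if n != root}


def proportional_rates(root: str, children: Dict[str, List[str]], k: float) -> Dict[str, float]:
    # Report probability grows with subtree size, capped at 1.
    sizes = subtree_sizes(root, children)
    return {n: min(1.0, k * s) for n, s in sizes.items() if n != root}


def expected_coverage(root: str, children: Dict[str, List[str]],
                      prob: Dict[str, float]) -> float:
    """Expected number of readings (the root's included) that reach the root in one epoch."""
    def visit(node: str, reach: float) -> float:
        return reach + sum(visit(c, reach * prob[c]) for c in children.get(node, []))

    return visit(root, 1.0)


# A four-node chain root -> a -> b -> c, the deepest possible four-node tree.
chain = {"root": ["a"], "a": ["b"], "b": ["c"]}
uniform = uniform_rates("root", chain, 0.5)
proportional = proportional_rates("root", chain, 0.25)
```

On this chain both policies send 1.5 messages per epoch in expectation, which is the sum of the rates. The proportional policy, however, delivers 2.21875 readings on average instead of 1.875, because nodes that relay more data report more reliably.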
Prediction Algorithm
  • Naive algorithm: use the value from the previous epoch as the current one
  • Linear prediction: a linear algorithm with minimum mean square error (MMSE) estimation
  • More sophisticated algorithms, such as the Kalman filter, can be used to achieve better predictions
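The naive and linear predictors from this slide can each be sketched in a few lines. The linear version assumes a first-order model x_t ≈ a·x_{t-1} + b. Its coefficients are fit by least squares over the child's history, which gives the empirical minimum-mean-square-error linear fit. The slides do not give the model order or the exact estimator, and the function names are hypothetical. A Kalman filter is not shown.

```python
from typing import Sequence


def predict_naive(history: Sequence[float]) -> float:
    """Naive predictor: reuse the value from the previous epoch."""
    return history[-1]


def predict_linear(history: Sequence[float]) -> float:
    """First-order linear predictor x_t = a * x_{t-1} + b, with (a, b) fit by
    least squares over consecutive pairs in the history."""
    if len(history) < 3:
        return predict_naive(history)  # too few pairs for a meaningful fit
    xs = history[:-1]  # x_{t-1}
    ys = history[1:]   # x_t
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        return predict_naive(history)  # constant history: slope is undefined
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a * history[-1] + b


history = [1.0, 2.0, 3.0, 4.0, 5.0]
naive = predict_naive(history)    # 5.0
linear = predict_linear(history)  # 6.0
```

On a steadily rising series the naive predictor lags one step behind, while the linear fit extrapolates the trend.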
The Protocol
  • Tree construction and query propagation start from the root of the query
  • Aggregates are computed bottom-up in each epoch
  • When a node receives data from a child, it updates the distribution statistics using the distribution estimator
  • If a node receives data from all its children in the epoch, it performs normal data aggregation
  • If a node has not received data from a child by the end of the epoch, it predicts a value for that child and then performs the aggregation
  • Aggregates are reported from children to parents with a certain probability
  • If necessary, a node may perform outlier detection on the data from a child. However:
    • Discarding data is very dangerous
    • Assuming neighboring nodes have physical locality, a parent can use both temporal and spatial statistics for outlier detection
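The per-epoch rules above can be combined into one node-side routine. This is a minimal sketch under several assumptions. The aggregate is SUM and the predictor is the naive last-value rule. Storing the last received value stands in for the distribution-estimator update. A child with no history yet is simply left out. `AggregationNode`, `run_epoch`, and `report_prob` are hypothetical names. The random generator is passed in so that runs are reproducible.

```python
import random
from typing import Dict, List, Optional


class AggregationNode:
    """Hypothetical per-epoch logic of one tree node computing a SUM aggregate."""

    def __init__(self, children: List[str], report_prob: float, rng: random.Random) -> None:
        self.children = list(children)
        self.report_prob = report_prob  # probability of reporting to the parent each epoch
        self.rng = rng
        self.last_seen: Dict[str, float] = {}  # stand-in for per-child statistics
        self.predicted: List[str] = []  # children whose value was predicted this epoch

    def run_epoch(self, own_reading: float, received: Dict[str, float]) -> Optional[float]:
        """Aggregate one epoch; return the value sent upward, or None if the report is skipped."""
        total = own_reading
        self.predicted = []
        for child in self.children:
            if child in received:
                value = received[child]
                self.last_seen[child] = value  # update statistics on receipt
            elif child in self.last_seen:
                value = self.last_seen[child]  # naive prediction: previous epoch's value
                self.predicted.append(child)
            else:
                continue  # nothing to predict from yet
            total += value
        # Probabilistic reporting: send with probability report_prob.
        return total if self.rng.random() < self.report_prob else None


node = AggregationNode(["a", "b"], report_prob=1.0, rng=random.Random(0))
first = node.run_epoch(1.0, {"a": 2.0, "b": 3.0})  # both children reported
second = node.run_epoch(1.0, {"a": 4.0})           # b is missing and gets predicted
```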
Future Work
  • Integrated optimization by combining tree construction with statistical learning theory
    • Sampling on the graph before tree construction
  • Non-linear estimation algorithms for data prediction
  • Evaluation of the outlier detector in data aggregation
  • System implementation
  • System deployment and evaluation in a real environment