A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays

Presented by: ZHANG Xiaofei, March 2, 2011.


  1. A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays Presented by: ZHANG Xiaofei March 2, 2011

  2. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  3. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  4. Motivation • Multidimensional arrays • Well suited to scientific and engineering applications • Logically equivalent to relational tables <A1,A2,…,An> (figure: array with dimension axes D1 and D2) • A cell of a multidimensional array: (A1,A2,…,Ak, D1,D2,…,Dd)

  5. Motivation (Cont’d) • Uncertain data • Inevitable • Two categories

  6. Motivation (Cont’d) • Correlated uncertain data • Example: geographically distributed sensors • Further application examples include router network traffic analysis and quantization of images or sound

  7. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  8. Modeling Correlated Uncertainty • PGM: Probabilistic Graphical Model • Bayesian network • Limitations: requires prior knowledge and initial probabilities; significant computational cost (NP-hard)

  9. Modeling Correlated Uncertainty (Cont’d) • PGM: Probabilistic Graphical Model • Markov Random Fields: a graphical model in which a set of random variables has a Markov property described by an undirected graph • Pros: can represent cyclic dependencies • Cons: cannot represent induced dependencies; NP-hard to compute

  10. Modeling Correlated Uncertainty (Cont’d) • Considering the locality of correlation • E.g. a 2-dimensional array

  11. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  12. Construction of A*-tree • Basic A*-structure • k-ary tree: k=2^d, where d is the number of correlated dimensions • Each leaf contains the joint distribution of the four neighboring cells it maps to • The joint distribution at each internal node is recursively defined
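
The recursive partitioning behind the basic structure can be sketched as follows. This is a minimal illustration, assuming every dimension has the same power-of-two length and that a leaf covers 2 cells per side (four cells when d=2); the names `Node`, `build` and `count_leaves` are hypothetical, and the joint distributions themselves are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                      # inclusive lower corner of the region
    hi: tuple                      # exclusive upper corner of the region
    children: list = field(default_factory=list)

def build(lo, hi):
    """Split a region in half along every dimension until each side is 2 cells."""
    node = Node(lo, hi)
    if all(h - l <= 2 for l, h in zip(lo, hi)):
        return node                # leaf: maps to 2^d neighboring cells
    d = len(lo)
    mids = [(l + h) // 2 for l, h in zip(lo, hi)]
    for mask in range(2 ** d):     # k = 2^d children, one per half-space combination
        c_lo = tuple(mids[i] if (mask >> i) & 1 else lo[i] for i in range(d))
        c_hi = tuple(hi[i] if (mask >> i) & 1 else mids[i] for i in range(d))
        node.children.append(build(c_lo, c_hi))
    return node

def count_leaves(node):
    """Count the leaves below (and including) a node."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)
```

For an 8×8 array, `build((0, 0), (8, 8))` gives a root with four children and sixteen 2×2 leaves.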

  13. Construction of A*-tree (Cont’d) • Joint distribution at a node with children X1, X2, X3, X4 • Y=(X1+X2+X3+X4)/4 • Xi=Y(1+Fi) • Fi has range k, the distribution table has r entries, and l bits are used to represent each probability
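
The relation between a node value Y and its children Xi can be checked numerically. The sketch below only illustrates the two formulas on this slide for fixed (already sampled) values; the function names are hypothetical, and it assumes Y is nonzero so the factors are defined.

```python
def to_mean_and_factors(xs):
    """Return Y = mean(xs) and the factors F_i such that X_i = Y * (1 + F_i)."""
    y = sum(xs) / len(xs)
    factors = [x / y - 1 for x in xs]   # assumes y != 0
    return y, factors

def from_mean_and_factors(y, factors):
    """Recover the child values X_i from Y and the factors F_i."""
    return [y * (1 + f) for f in factors]
```

For example, the children 10, 12, 8, 10 give Y = 10 and factors close to 0, 0.2, -0.2, 0. The factors always sum to zero (up to rounding), because the Xi sum to 4Y.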

  14. Construction of A*-tree (Cont’d) • Extension of A*-tree • Uneven dimension sizes • A dimension of size 2k+1 is partitioned into k and k+1 • The shorter dimension stops partitioning first, while partitioning of the longer dimension continues
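
One way to picture these two rules is a per-dimension split. In the sketch below, the stopping threshold `min_side=2` and the function name `split_region` are assumptions made for illustration, not details taken from the slides.

```python
from itertools import product

def split_region(lo, hi, min_side=2):
    """Return the child regions of [lo, hi), or [] if the region is a leaf.

    A dimension of odd length 2k+1 is split into k and k+1; a dimension that
    has already shrunk to min_side cells is left alone while the others
    keep splitting.
    """
    per_dim = []
    for l, h in zip(lo, hi):
        n = h - l
        if n <= min_side:
            per_dim.append([(l, h)])              # this dimension has stopped
        else:
            mid = l + n // 2                      # 2k+1 -> k and k+1
            per_dim.append([(l, mid), (mid, h)])
    if all(len(p) == 1 for p in per_dim):
        return []                                 # nothing left to split
    return [(tuple(r[0] for r in combo), tuple(r[1] for r in combo))
            for combo in product(*per_dim)]
```

With these assumptions, a 2×8 region splits only along its longer dimension, into two 2×4 children, and a one-dimensional region of length 7 splits into lengths 3 and 4.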

  15. Construction of A*-tree (Cont’d) • Extension of A*-tree • Basic uncertainty blocks of arbitrary shapes • Intuitively, each cell is a basic uncertainty block, but this granularity may be too fine • The initial identification of uncertainty blocks is user- and application-specified

  16. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  17. Analysis of A*-tree • Natural mapping from A*-tree to Bayesian Network

  18. Analysis of A*-tree (Cont’d) • How the A*-tree model expresses neighboring correlation • For a random query, the average level at which cell correlation is encoded is low (efficient inference and accurate modeling)

  19. Analysis of A*-tree (Cont’d) • Neighboring cells and clustering distance • Definition

  20. Analysis of A*-tree (Cont’d) • Neighboring cells and clustering distance

  21. Analysis of A*-tree (Cont’d) • CD (Clustering Distance) • For any query that may return q pairs of neighboring cells: expected average CD • E.g. for a 1024×1024 array, h=10, so E(avgCD) ≈ 1.01
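
The transcript does not include the definition of CD or the formula behind the expected value, so the following is only an illustrative proxy, not a reproduction of the slide's result. It assumes a 2^h-wide array, leaves covering 2 cells per side, and measures for each pair of adjacent cells in a row how many levels above the leaf their lowest common node sits. The function names are hypothetical.

```python
def shared_node_height(x):
    """Levels above the leaf at which cells x and x+1 first share a tree node.

    Assumes halving splits and leaves that cover 2 cells per side, so the
    answer is the index of the highest bit in which x and x+1 differ.
    """
    return (x ^ (x + 1)).bit_length() - 1

def average_shared_height(h):
    """Average of shared_node_height over all adjacent pairs in a row of 2^h cells."""
    n = 2 ** h
    return sum(shared_node_height(x) for x in range(n - 1)) / (n - 1)
```

Under these assumptions the average for h=10 is 1013/1023, just under 1: half of all neighboring pairs already share a leaf, a quarter meet one level up, and so on. This is consistent with the claim that neighboring correlation is encoded low in the tree.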

  22. Analysis of A*-tree (Cont’d) • Accuracy vs. Efficiency • Double “flip” • Polynomial time scan O(d*n) • Consider basic uncertainty block

  23. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  24. Query Processing • Monte Carlo based query processing • Sampling • Q: SELECT AVG(brightness) FROM space_image WHERE Dis(x,y,z,322,108,251) < 50
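
A Monte Carlo answer to a query of this shape can be sketched as: draw many possible worlds, evaluate the aggregate on each, and average the results. The example below is a toy under stated assumptions. The three cells, the value ranges and the sampler `sample_image` are all hypothetical stand-ins for sampling from an A*-tree, and `Dis` is assumed to be Euclidean distance.

```python
import math
import random

CENTER = (322, 108, 251)   # query point from the slide
RADIUS = 50                # distance bound from the slide

# Hypothetical toy data: three cells of a 3-D image, keyed by (x, y, z).
CELLS = [(320, 100, 250), (321, 100, 250), (400, 100, 250)]

def sample_image(rng):
    """Draw one possible world: a shared mean Y and a per-cell factor F_i."""
    y = rng.uniform(90.0, 110.0)
    return {c: y * (1 + rng.choice([-0.1, 0.0, 0.1])) for c in CELLS}

def in_region(cell):
    """WHERE clause: Euclidean distance to the query point below RADIUS."""
    return math.dist(cell, CENTER) < RADIUS

def monte_carlo_avg(n_samples=1000, seed=0):
    """Estimate AVG(brightness) over the selected cells by repeated sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        world = sample_image(rng)
        selected = [v for c, v in world.items() if in_region(c)]
        total += sum(selected) / len(selected)   # toy data always selects 2 cells
    return total / n_samples
```

Because every cell value in a sample shares the same Y, the cells are correlated within each possible world, which is the property the sampling has to preserve. `math.dist` requires Python 3.8 or later.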

  25. Query Processing (Cont’d) • Compared with MRF • MRF requires sequential rounds of sampling • Each sampled node is computed from all the nodes

  26. Query Processing (Cont’d) • Other queries • COUNT, AVG and SUM • Minimum Set Cover • Built-in cell-count function • Effective query answering

  27. Outline • Motivation • Modeling correlated uncertainty • Construction of A*-tree • Analysis of A*-tree • Query processing • Experiments

  28. Experiments • Data set description • Evaluations • Accuracy of modeling the underlying joint distribution • Execution time • Aggregate query • Space cost

  29. Experiments (Cont’d) • Accuracy

  30. Experiments (Cont’d) • Accuracy

  31. Experiments (Cont’d) • Execution time

  32. Experiments (Cont’d) • Aggregate query and space cost

  33. Thank you! Q&A
