A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays

### A*-tree: A Structure for Storage and Modeling of Uncertain Multidimensional Arrays

Presented by: ZHANG Xiaofei

March 2, 2011

Outline

- Motivation
- Modeling correlated uncertainty
- Construction of A*-tree
- Analysis of A*-tree
- Query processing
- Experiments

Motivation

- Multidimensional arrays
- Well suited to scientific and engineering applications
- Logically equivalent to relational tables

[Figure: a two-dimensional array with dimensions D1 and D2; each cell holds the attributes <A1,A2,…,An>]

A cell of the multidimensional array: (A1,A2,…,Ak, D1,D2,…,Dd)

Motivation (Cont’d)

- Uncertain data
- Inevitable
- Two categories

Motivation (Cont’d)

- Correlated uncertain data
- Examples: Geographically distributed sensors

Further application examples include router network traffic analysis and quantization of images or sound.

Modeling Correlated Uncertainty

- PGM: Probabilistic Graphical Model
- Bayesian network

Limitations:

Prior knowledge and initial probabilities are required

Significant computational cost (NP-hard)

Modeling Correlated Uncertainty (Cont’d)

- PGM: Probabilistic Graphical Model
- Markov Random Fields

A graphical model in which a set of random variables have a Markov property described by an undirected graph

Pros: can represent cyclic dependencies

Cons: cannot represent induced dependencies; NP-hard to compute

Modeling Correlated Uncertainty (Cont’d)

- Considering the locality of correlation
- E.g. a 2-dimensional array

Construction of A*-tree

- Basic A*-structure

k-ary tree: k=2^d, where d is the number of correlated dimensions

Each leaf contains the joint distribution of four neighboring cells it maps to

The joint distribution at each internal node is recursively defined
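As a rough illustration of the tree shape described above, the sketch below computes the fan-out, number of levels, and number of leaves for a cube-shaped array. It assumes the side length is a power of two, that each leaf covers 2 cells per correlated dimension, and that the height counts levels as log2(side); the function name `astar_tree_shape` is hypothetical and not taken from the slides.

```python
def astar_tree_shape(side, d=2):
    """Return (fanout, levels, leaves) for a side^d array.

    Assumptions (not from the slides): side is a power of two, and
    each leaf covers 2 cells along every correlated dimension.
    """
    if side < 2 or side & (side - 1):
        raise ValueError("side must be a power of two, at least 2")
    fanout = 2 ** d                  # k = 2^d children per internal node
    levels = side.bit_length() - 1   # log2(side) halvings down to single cells
    leaves = (side // 2) ** d        # one leaf per block of 2^d neighboring cells
    return fanout, levels, leaves


if __name__ == "__main__":
    print(astar_tree_shape(1024))    # 2-D case: a quad-tree
```

With d = 2 this gives a quad-tree, which matches the "four neighboring cells" per leaf mentioned above.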

Construction of A*-tree (Cont’d)

- Joint distribution at a node

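[Figure: four neighboring cells X1, X2, X3, X4 and their parent value Y]

$$Y = \frac{X_1 + X_2 + X_3 + X_4}{4}, \qquad X_i = Y\,(1 + F_i), \quad i = 1, \dots, 4$$

Fi has range k, the distribution table has r entries, and l bits are used to represent each probability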
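A small worked example may help make the relations Y = (X1+X2+X3+X4)/4 and Xi = Y(1+Fi) concrete. The sketch below uses fixed numbers rather than distributions, and the helper names `encode_block` and `decode_block` are hypothetical. Note that the two relations together imply that the Fi sum to zero whenever Y is nonzero.

```python
def encode_block(xs):
    """Turn child values X_i into the parent mean Y and relative offsets F_i."""
    y = sum(xs) / len(xs)
    if y == 0:
        raise ValueError("F_i is undefined when the mean Y is zero")
    fs = [x / y - 1 for x in xs]     # from X_i = Y * (1 + F_i)
    return y, fs


def decode_block(y, fs):
    """Recover the child values from Y and the offsets F_i."""
    return [y * (1 + f) for f in fs]


if __name__ == "__main__":
    y, fs = encode_block([10.0, 12.0, 8.0, 10.0])
    print(y, fs)                     # mean and relative offsets
    print(decode_block(y, fs))       # approximately the original values
```

In the structure described here, a node would store a distribution over such offsets instead of a single set of values.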

Construction of A*-tree (Cont’d)

- Extension of A*-tree
- Uneven dimensional size
- A dimension of length 2k+1 is partitioned into lengths k and k+1
- The shorter dimension stops partitioning first, while partitioning of the longer dimension continues
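One partition step under these rules could be sketched as follows. The stopping threshold `min_len=2` and the function names are assumptions made for illustration; the slides do not state exactly when a dimension stops being split.

```python
from itertools import product


def split_length(n):
    """Split a length into two near-equal parts, so 2k+1 becomes (k, k+1)."""
    return n // 2, n - n // 2


def split_box(shape, min_len=2):
    """Return the child shapes produced by one partition step.

    A dimension longer than min_len (an assumed threshold) is halved;
    a shorter one is left whole, so it stops partitioning first.
    """
    per_dim = [split_length(n) if n > min_len else (n,) for n in shape]
    return [tuple(child) for child in product(*per_dim)]


if __name__ == "__main__":
    print(split_box((5, 2)))   # only the longer dimension is split
    print(split_box((5, 4)))   # both dimensions are split
```

A node whose dimensions have all stopped splitting yields a single child shape equal to itself, which can serve as the recursion's base case.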

Construction of A*-tree (Cont’d)

- Extension of A*-tree
- Basic uncertainty blocks of arbitrary shapes
- Intuitively, each cell is a basic uncertainty block, but this granularity may be too fine
- The initial identification of uncertainty blocks is user- and application-specified

Analysis of A*-tree

- Natural mapping from A*-tree to Bayesian Network

Analysis of A*-tree (Cont’d)

- How the A*-tree model expresses neighboring correlation
- For a random query, the average tree level at which the correlation between cells is encoded is low, which allows efficient inference and accurate modeling

Analysis of A*-tree (Cont’d)

- Neighboring cells and clustering distance
- Definition

Analysis of A*-tree (Cont’d)

- Neighboring cells and clustering distance

Analysis of A*-tree (Cont’d)

- CD (Clustering Distance)
- For any query that may return q pairs of neighboring cells

Expected average CD

e.g. for a 1024 × 1024 array, h = log2(1024) = 10, and

E(avg CD) ≈ 1.01

Analysis of A*-tree (Cont’d)

- Accuracy vs. Efficiency
- Double “flip”
- Polynomial time scan O(d*n)
- Consider basic uncertainty block

Query Processing

- Monte Carlo based query processing
- Sampling

Q: SELECT AVG(brightness) FROM space_image WHERE Dis(x, y, z, 322, 108, 251) < 50
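A minimal sketch of Monte Carlo evaluation for an AVG query like the one above is shown below. It assumes that sampling proceeds top-down (draw the root value Y, then draw offsets Fi at each node to obtain the children), that each node stores a table of (offset vector, probability) pairs, and that the query region is given as a list of cell indices rather than a distance predicate. All names here (`sample_block`, `mc_avg`, the tuple layout of a node) are hypothetical.

```python
import random


def sample_block(y, node, rng):
    """Expand the value y at a node into sampled cell values.

    node = (table, children): table is a list of (offsets, probability)
    pairs; children is None for a leaf, else one sub-node per offset.
    Cells are returned in tree order.
    """
    table, children = node
    offsets = rng.choices([f for f, _ in table],
                          weights=[p for _, p in table])[0]
    values = [y * (1 + f) for f in offsets]      # X_i = Y * (1 + F_i)
    if children is None:
        return values
    cells = []
    for value, child in zip(values, children):
        cells.extend(sample_block(value, child, rng))
    return cells


def mc_avg(root_dist, tree, selected, n_samples, seed=0):
    """Estimate the expected AVG over the selected cell indices."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.choices([v for v, _ in root_dist],
                        weights=[p for _, p in root_dist])[0]
        cells = sample_block(y, tree, rng)
        total += sum(cells[i] for i in selected) / len(selected)
    return total / n_samples


if __name__ == "__main__":
    leaf = ([((0.0, 0.2, -0.2, 0.0), 0.5), ((0.1, -0.1, 0.0, 0.0), 0.5)], None)
    print(mc_avg([(10.0, 0.5), (20.0, 0.5)], leaf, [1, 2], 1000))
```

Each sample here needs only one pass from the root to the leaves; increasing `n_samples` trades running time for a tighter estimate.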

Query Processing (Cont’d)

- Compared with MRF
- MRFs require sequential rounds of sampling
- Each sampled node is computed from all the nodes

Query Processing (Cont’d)

- Other queries
- COUNT, AVG and SUM

- Minimum Set Cover
- Built-in cell-count function
- Effective query answering
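One way to read the points above is that a range query can be covered by a small set of tree nodes, with each node's cell count turning a stored mean into a partial SUM. The sketch below illustrates that reading for a 2-D array whose side is a power of two. It is an assumption about how the cover might work, not the presenter's algorithm, and `cover`, `cell_count`, and `block_sum` are hypothetical names.

```python
def cover(x0, y0, size, qx0, qy0, qx1, qy1):
    """Return the largest quad-tree blocks (x, y, size) that lie fully
    inside the half-open query rectangle [qx0, qx1) x [qy0, qy1)."""
    if x0 >= qx1 or y0 >= qy1 or x0 + size <= qx0 or y0 + size <= qy0:
        return []                                  # disjoint from the query
    if qx0 <= x0 and qy0 <= y0 and x0 + size <= qx1 and y0 + size <= qy1:
        return [(x0, y0, size)]                    # fully contained: stop here
    half = size // 2
    blocks = []
    for dx in (0, half):
        for dy in (0, half):
            blocks += cover(x0 + dx, y0 + dy, half, qx0, qy0, qx1, qy1)
    return blocks


def cell_count(blocks):
    """COUNT of cells covered by the blocks."""
    return sum(size * size for _, _, size in blocks)


def block_sum(blocks, mean_of):
    """SUM over the blocks, given a callable returning each block's mean."""
    return sum(mean_of(b) * b[2] * b[2] for b in blocks)


if __name__ == "__main__":
    blocks = cover(0, 0, 4, 0, 0, 2, 4)
    print(blocks, cell_count(blocks))
```

AVG then follows as `block_sum(...) / cell_count(...)`, so the three aggregates can share one traversal.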

Experiments

- Data set description
- Evaluations
- Accuracy of modeling the underlying joint distribution
- Execution time
- Aggregate query
- Space cost

Experiments (Cont’d)

- Accuracy

Experiments (Cont’d)

- Accuracy

Experiments (Cont’d)

- Execution time

Experiments (Cont’d)

- Aggregate query and space cost

Q&A
