A Generic Framework for Handling Uncertain Data with Local Correlations

Xiang Lian and Lei Chen

Department of Computer Science and Engineering

The Hong Kong University of Science and Technology

Clear Water Bay, Kowloon

Hong Kong, China

{xlian, leichen}@cse.ust.hk

VLDB 2011 @ Seattle

Motivation Example
  • Forest monitoring application
  • Sensory data: <temperature, light>

Motivation Example (cont'd)
  • Samples s_i are collected from sensor node n_i

Motivation Example (cont'd)
  • Sensory data are uncertain and imprecise
    • Modeled by uncertainty regions

Motivation Example (cont'd)
  • 3 monitoring areas
    • Spatially close sensors
    • Sensors far away

Locally Correlated Sensory Data
  • Monitoring areas: Area 1, Area 2, Area 3
  • Goal: efficient query answering on locally correlated uncertain data

Outline
  • Introduction
  • Model for Locally Correlated Uncertain Data
  • Problem Definition
  • Query Answering on Uncertain Data With Local Correlations
  • Experimental Evaluation
  • Conclusions

Introduction
  • Uncertain data are pervasive in real applications
    • Sensor networks
    • RFID networks
    • Location-based services
    • Data integration
  • Existing works often assume independence among uncertain objects
    • In practice, uncertain objects exhibit correlations, in particular local correlations

Data Model for Local Correlations
  • Data Model
    • Uncertain objects contain several locally correlated partitions (LCPs)
      • Uncertain objects within each LCP are correlated with each other
      • Uncertain objects from distinct LCPs are independent of each other

Data Model for Local Correlations (cont'd)
  • Bayesian network
    • Each vertex corresponds to a random variable
    • Each vertex is associated with a conditional probability table (CPT)

Data Model for Local Correlations (cont'd)
  • The joint probability of variables
    • Join tuples in CPTs and multiply conditional probabilities
    • Variable elimination
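
The two steps above can be sketched on a toy network. This is a minimal sketch, assuming a hypothetical chain A → B → C over binary variables; the variable names, CPT values, and function names are made up for illustration and do not come from the slides.

```python
# Hypothetical chain-shaped Bayesian network A -> B -> C (all values made up).
p_a = {0: 0.6, 1: 0.4}                        # CPT of A: P(A)
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1,      # CPT of B, key = (b, a): P(B=b | A=a)
               (0, 1): 0.2, (1, 1): 0.8}
p_c_given_b = {(0, 0): 0.7, (1, 0): 0.3,      # CPT of C, key = (c, b): P(C=c | B=b)
               (0, 1): 0.5, (1, 1): 0.5}

def joint(a, b, c):
    """Joint probability: join the matching CPT rows and multiply them."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

def marginal_c_by_enumeration(c):
    """P(C=c) by summing the full joint over A and B."""
    return sum(joint(a, b, c) for a in (0, 1) for b in (0, 1))

def marginal_c_by_elimination(c):
    """P(C=c) by eliminating A first, then B, without building the full joint."""
    # Eliminate A: the result is an intermediate factor over B only.
    f_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}
    # Eliminate B using the intermediate factor.
    return sum(f_b[b] * p_c_given_b[(c, b)] for b in (0, 1))
```

Both functions return the same marginal; elimination only ever materializes a factor over one variable at a time, which is what keeps the computation manageable on larger networks.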

Definition of LC-PNN Query
  • Probabilistic Nearest Neighbor Query on Uncertain and Locally Correlated Data, LC-PNN
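
The transcript does not preserve the formal definition, so the following is only a brute-force illustration of what a nearest-neighbor probability means under local correlations. It assumes, hypothetically, that each LCP is given as a small list of joint outcomes (a probability plus one position per object) and that distinct LCPs are independent, as in the data model. The input layout and the function name are assumptions.

```python
from itertools import product
from math import dist, prod

def pnn_probabilities(lcps, q):
    """Probability of each object being the nearest neighbor of q.

    lcps: list of LCPs; each LCP is a list of (probability, {object_id: point})
          joint outcomes (hypothetical layout, chosen for this sketch).
    """
    result = {}
    # Enumerate every combination of one joint outcome per LCP (a possible world).
    for combo in product(*lcps):
        world_prob = prod(p for p, _ in combo)   # distinct LCPs are independent
        world = {}
        for _, placement in combo:
            world.update(placement)
        nearest = min(world, key=lambda o: dist(q, world[o]))
        result[nearest] = result.get(nearest, 0.0) + world_prob
    return result

# Toy data: o1 and o2 are correlated (same LCP); o3 is in its own LCP.
lcps = [
    [(0.5, {"o1": (1.0, 0.0), "o2": (3.0, 0.0)}),
     (0.5, {"o1": (4.0, 0.0), "o2": (5.0, 0.0)})],
    [(0.7, {"o3": (2.0, 0.0)}),
     (0.3, {"o3": (6.0, 0.0)})],
]
probs = pnn_probabilities(lcps, q=(0.0, 0.0))
```

Enumerating possible worlds is exponential in the number of LCPs, which is the cost that the pruning and filtering methods are meant to avoid.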

Challenges & Solutions
  • Challenges
    • Straightforward method of linear scan is costly
    • Computation cost of integration is expensive
    • Dealing with data correlations
  • Filtering Methods
    • Index pruning
    • Candidate filtering with pre-computations

Index Pruning
  • Basic idea
    • Let best_so_far be the smallest maximum distance from query point q to any uncertain objects seen so far
    • Then, any objects/nodes e having mindist(q, e) > best_so_far can be safely pruned
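
The pruning rule can be sketched as follows. This is a minimal sketch, assuming each uncertain object is bounded by an axis-aligned rectangle and scanning a flat dictionary in ascending mindist order instead of traversing an actual R-tree; the function names and the toy rectangles are hypothetical.

```python
from math import hypot

def mindist(q, rect):
    """Smallest possible distance from point q to rectangle (lo, hi)."""
    lo, hi = rect
    return hypot(*(max(l - x, 0.0, x - h) for x, l, h in zip(q, lo, hi)))

def maxdist(q, rect):
    """Largest possible distance from point q to rectangle (lo, hi)."""
    lo, hi = rect
    return hypot(*(max(abs(x - l), abs(x - h)) for x, l, h in zip(q, lo, hi)))

def index_pruning(q, regions):
    """Return the ids of regions that cannot be pruned by best_so_far."""
    best_so_far = float("inf")
    candidates = []
    # Visit regions from nearest to farthest by mindist.
    for obj_id, rect in sorted(regions.items(), key=lambda kv: mindist(q, kv[1])):
        if mindist(q, rect) > best_so_far:
            break  # this region and all later ones are farther than best_so_far
        candidates.append(obj_id)
        best_so_far = min(best_so_far, maxdist(q, rect))
    return candidates

regions = {
    "A": ((1.0, 1.0), (2.0, 2.0)),
    "B": ((2.0, 0.0), (3.0, 1.0)),
    "C": ((5.0, 5.0), (6.0, 6.0)),
}
survivors = index_pruning((0.0, 0.0), regions)
```

In this toy input, region C lies entirely farther from q than the farthest point of region A, so some other object is always closer and C can be dropped without computing any probability.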

Candidate Filtering with Pre-Computations
  • Basic idea
    • Obtain an upper bound, UB_PrLC-PNN(q, o_i), of the LC-PNN probability
    • Object o_i can be safely pruned if UB_PrLC-PNN(q, o_i) < α, where α is the probability threshold of the query
  • How to obtain the probability upper bound?
    • It is derived from the formula of the LC-PNN probability upper bound, using pivots

Derivation of Probability Upper Bound
  • Figure labels: pivot piv, sample s5, l

Range [min_l, max_l] of l
  • l=
  • Let min_l = and

max_l =

  • If l computed online (at query time) is smaller than min_l, then JPo(s5) = 1
  • If l computed online is greater than max_l, then JPo(s5) = 0
  • Thus, we do not need to store pre-computations with l outside the range [min_l, max_l]
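
The effect of the range on storage can be sketched as a lookup. The transcript does not preserve the formulas for l, min_l, and max_l, so this sketch takes them as inputs. It further assumes, hypothetically, that values are pre-computed at a few sorted values of l inside [min_l, max_l], that the first stored value of l equals min_l, and that the stored quantity does not increase as l grows (suggested by the boundary values 1 and 0), so the entry at the largest stored l not exceeding the online l is a valid upper bound. All names are illustrative.

```python
from bisect import bisect_right

def lookup_jp_upper_bound(l, stored_ls, stored_values, min_l, max_l):
    """Upper bound on the pre-computed probability for an online value l.

    stored_ls:     sorted values of l with pre-computations, stored_ls[0] == min_l
    stored_values: pre-computed probabilities, assumed non-increasing in l
    """
    if l < min_l:
        return 1.0   # below the stored range: the probability is 1
    if l > max_l:
        return 0.0   # above the stored range: the probability is 0
    # Largest stored l that does not exceed the online l.
    idx = bisect_right(stored_ls, l) - 1
    return stored_values[idx]

# Toy pre-computations (made-up numbers).
stored_ls = [1.0, 2.0, 3.0]
stored_values = [0.9, 0.5, 0.1]
bound = lookup_jp_upper_bound(2.4, stored_ls, stored_values, min_l=1.0, max_l=3.0)
```

Only the three entries inside the range are stored; every online value outside it is answered by the two constant branches.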

Selection of Pivot Positions
  • We provide a cost model to formalize the filtering and refinement costs, and obtain a good value of parameter d to achieve low query cost

LC-PNN Query Procedure
  • Index uncertain objects containing LCPs in an R-tree based index
  • For an LC-PNN query
    • When traversing the index, apply the index pruning and candidate filtering methods to remove false alarms
  • Refine candidates and return true query answers
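
The procedure follows a filter-and-refine pattern, which can be sketched as below. This is a minimal sketch, not the authors' implementation: the three callables are hypothetical stand-ins for index pruning, the pivot-based upper bound, and the exact probability computation; a flat list replaces the R-tree traversal; and reporting objects whose probability is at least a threshold `alpha` is an assumption.

```python
def lc_pnn_query(q, objects, alpha, passes_index_pruning, upper_bound, exact_probability):
    """Filter-and-refine sketch; all three callables are hypothetical stand-ins."""
    # Step 1: index pruning (a per-object test here, not a real index traversal).
    candidates = [o for o in objects if passes_index_pruning(q, o)]
    # Step 2: candidate filtering with a cheap probability upper bound.
    candidates = [o for o in candidates if upper_bound(q, o) >= alpha]
    # Step 3: refinement with the exact (expensive) probability.
    return [o for o in candidates if exact_probability(q, o) >= alpha]

# Toy usage with made-up numbers; `refined` records which objects reach step 3.
refined = []
ub = {"o1": 0.9, "o2": 0.3, "o3": 0.6}
exact = {"o1": 0.65, "o3": 0.35}

def toy_exact(q, o):
    refined.append(o)
    return exact[o]

answers = lc_pnn_query(
    q=(0.0, 0.0),
    objects=["o1", "o2", "o3", "o4"],
    alpha=0.5,
    passes_index_pruning=lambda q, o: o != "o4",
    upper_bound=lambda q, o: ub[o],
    exact_probability=toy_exact,
)
```

In the toy run, o4 is removed by index pruning and o2 by its upper bound, so the expensive exact computation runs only for o1 and o3, and o3 is then discarded as a false alarm.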

Experimental Evaluation
  • Data Sets
    • Real data: California road network
    • Synthetic data: lUeU, lUeG, lSeU, and lSeG
      • Generate center locations of LCPs with Uniform or Skew distribution
      • Produce extent lengths of LCPs with Uniform or Gaussian distribution
      • Within LCPs, randomly generate locally correlated uncertain objects with Bayesian networks
  • Competitor
    • Basic method [Cheng et al., SIGMOD 2003]
      • Assuming uncertain objects are independent
  • Measures
    • Wall clock time
    • Speed-up ratio

LC-PNN Performance vs. α
  • Settings: extent length of LCP = [1, 3], data size N = 150K, average number of uncertain objects in an LCP = 5

Conclusions
  • We formulated the problem of querying locally correlated uncertain data, in particular the LC-PNN query, which is important in real applications
  • We designed the index pruning method and, based on a proposed cost model, presented the candidate filtering method via offline pre-computations w.r.t. pivots
  • We provided efficient query processing techniques to answer LC-PNN queries on locally correlated uncertain data, and discussed applying the same framework to other query types

Thank you!

Q/A
