Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments

Xiao Zhang¹, Wang-Chien Lee¹, Prasenjit Mitra¹,², Baihua Zheng³

¹ Department of Computer Science and Engineering

² College of Information Science and Technology

The Pennsylvania State University

³ School of Information Systems, Singapore Management University

EDBT, Nantes, France, 03/28/2008

- Background
- Problem Analysis
- New TNN Algorithms
- Optimization
- Experiments
- Conclusions & Future Work

- What is TNN?
- S is a set of banks
- R is a set of restaurants
- TNN distance = 5+1 = 6

- What is TNN?
- Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) such that ∀(s’, r’)∈S×R,
dis(p, s) + dis(s, r) ≤ dis(p, s’) + dis(s’, r’)

where dis(p,s) is the Euclidean distance between p and s.
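The definition can be read as a brute-force search over S×R. The sketch below is only an illustration of the definition, not one of the broadcast algorithms; the coordinates are hypothetical and were chosen so that the result matches the 5+1 = 6 example.

```python
import math
from itertools import product

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tnn_brute_force(p, S, R):
    """Return (transitive distance, s, r) minimizing dis(p, s) + dis(s, r)."""
    return min(
        ((dis(p, s) + dis(s, r), s, r) for s, r in product(S, R)),
        key=lambda t: t[0],
    )

# Hypothetical coordinates, for illustration only.
p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0)]   # e.g. banks
R = [(3.0, 5.0), (0.0, 1.0)]    # e.g. restaurants
best = tnn_brute_force(p, S, R)
print(best)  # (6.0, (3.0, 4.0), (3.0, 5.0))
```

Note that the answer pair does not use p's nearest restaurant (0, 1): the transitive distance is minimized as a whole.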

- First proposed by Zheng, Lee and Lee [1].

[1] B. Zheng, K. C. Lee, and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006.

- The server holds all the data and broadcasts it as radio signals over channels.
- Mobile clients (cell phones and PDAs) tune in to broadcast channels, download necessary data and process queries.

- Broadcast vs. on-demand
- Support an arbitrary number of mobile devices to have simultaneous access
- Efficient use of limited bandwidth
- Light workload on the server side

- Assumption:
- Zheng, Lee and Lee assumed a single broadcast channel.
- Based on existing technology (dual-mode, dual-standby cell phone), we assume multiple channels.
- A mobile client can access information in multiple channels simultaneously

- Challenges:
- How to utilize the parallel processing ability of mobile clients to facilitate query processing?
- How to reduce access time?
- How to reduce energy consumption?

- 1. We developed two new algorithms for TNN queries in multi-channel access environments.
- 2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) so that our new algorithms efficiently reduce search cost.
- 3. We proposed an optimization technique to reduce energy consumption.

- 1. Two broadcast channels, for S and R
- 2. 2-dim points
- 3. Air-indexing: R-tree[2]
- 4. Broadcast in depth-first order, in order to avoid back-tracking
- 5. (1, m) interleaving [3]
- 6. Performance metrics (in number of pages):
- Access time
- Tune-in time
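To make the (1, m) interleaving and the two metrics concrete, here is a minimal sketch. It assumes the usual reading of (1, m) interleaving, in which a full copy of the index precedes each of m data segments. The page names and the helper `probe` are hypothetical, and the client model is simplified: the client reads a whole index copy, and the initial probe page is not counted.

```python
def build_cycle(index_pages, data_pages, m):
    """Lay out one broadcast cycle: a full index copy precedes each of
    m roughly equal data segments."""
    n = len(data_pages)
    cycle = []
    for k in range(m):
        start, end = k * n // m, (k + 1) * n // m
        cycle.extend(index_pages)
        cycle.extend(data_pages[start:end])
    return cycle

def probe(cycle, index_len, arrival, wanted):
    """Return (access time, tune-in time) in pages for a client that tunes in
    at slot `arrival`, dozes until the next index copy, downloads that copy,
    then dozes again until the page `wanted` is on air and downloads it."""
    if wanted not in cycle:
        raise ValueError("page is never broadcast")
    n = len(cycle)
    t = arrival
    while cycle[t % n] != cycle[0]:   # doze until an index copy starts
        t += 1
    t += index_len                    # download the whole index copy
    while cycle[t % n] != wanted:     # doze until the data page arrives
        t += 1
    return t + 1 - arrival, index_len + 1

index = ["I0", "I1"]                          # hypothetical index pages
data = ["D0", "D1", "D2", "D3", "D4", "D5"]   # hypothetical data pages
cycle = build_cycle(index, data, m=3)
access_time, tune_in_time = probe(cycle, len(index), arrival=3, wanted="D4")
```

Access time counts every page slot that elapses, whereas tune-in time counts only the pages actually downloaded, which is why tune-in time is the proxy for energy consumption.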

[2] A. Guttman. R-trees: A dynamic index structure for spatial searching. SIGMOD 1984.

[3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: Organization and access. TKDE 1997.

- Choose ANY pair of objects (s’, r’) and use its transitive distance as the search range
- Guaranteed to enclose the answer pair (s, r)

- Theorem [1]:
- The transitive distance determined by any pair of objects (s, r) is an upper bound on the TNN distance.
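One way to see why such a range encloses the answer pair is the short derivation below. It is a sketch based only on the triangle inequality; the symbol d for the transitive distance of the arbitrary pair is introduced here for convenience.

```latex
\begin{align*}
d &= \mathrm{dis}(p, s') + \mathrm{dis}(s', r')
  && \text{for an arbitrary pair } (s', r') \in S \times R \\
\mathrm{dis}(p, s) + \mathrm{dis}(s, r) &\le d
  && \text{since } (s, r) \text{ minimizes the transitive distance} \\
\mathrm{dis}(p, s) &\le d
  && \text{since } \mathrm{dis}(s, r) \ge 0 \\
\mathrm{dis}(p, r) &\le \mathrm{dis}(p, s) + \mathrm{dis}(s, r) \le d
  && \text{by the triangle inequality}
\end{align*}
```

Both s and r therefore lie inside the circle of radius d centered at p.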

- General ideas of answering TNN queries:
- Estimate: find a search range from the query point p by searching the index
- Filter: discard unqualified data objects in the search range determined earlier, then find the pair of objects with minimum transitive distance.

- Deficiencies of existing algorithms:
- Approximate-TNN-Search:
- Uses an equation to estimate the search range in the first step
- Search range may be too large or too small

- Window-Based-TNN-Search:
- Two sequential NN searches in estimation step
- Search range estimation is done in sequential order
- Large access time

- Algo 1: Double-NN-Search
- Issue two NN queries in estimation step
- p’s NN in S, and p’s NN in R
- (s1, r2)
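A minimal in-memory sketch of the Double-NN idea follows. It assumes the search range is the transitive distance of the pair formed by the two NN results, which the upper-bound theorem permits. It ignores the broadcast index entirely (the two NN queries would run in parallel on the two channels), and the small tolerance `eps` is an addition to guard against floating-point rounding.

```python
import math
from itertools import product

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def double_nn_search(p, S, R, eps=1e-9):
    # Estimate step: two independent NN queries from p.
    s_nn = min(S, key=lambda s: dis(p, s))
    r_nn = min(R, key=lambda r: dis(p, r))
    bound = dis(p, s_nn) + dis(s_nn, r_nn)   # upper bound on the TNN distance

    # Filter step: only objects within `bound` of p can be in the answer pair.
    S_c = [s for s in S if dis(p, s) <= bound + eps]
    R_c = [r for r in R if dis(p, r) <= bound + eps]
    return min(
        ((dis(p, s) + dis(s, r), s, r) for s, r in product(S_c, R_c)),
        key=lambda t: t[0],
    )
```

The filter step here is a plain scan of the candidates; in the broadcast setting it would be a range query on each channel's index.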

- Hybrid-NN-Search
- Increases interaction between two channels
- Uses the result of the finished NN search to guide the unfinished one, reducing the search range
- Uses new distance metrics to perform branch-and-bound
- Treats the TNN distance as a whole

- NN in Channel 1 finishes first
- Already found s=p.NN(S)
- Looking for r2, instead of r1

- NN in channel 2 finishes first
- Already found r=p.NN(R)
- Looking for s2 instead of s1
- Use new criteria when searching the index
- Need new distance metrics for branch-and-bound

- MinTransDist:
- Lower bound on the transitive distance from p, through any point of an MBR, to r.

- MinMaxTransDist:
- Upper bound on the transitive distance from p, through at least one object enclosed by an MBR, to r.

- Details given in the paper.

- Algorithm description:
- While neither NN search has finished, follow the Double-NN algorithm.
- If the NN search in Channel 1 (dataset S) finishes first, let s=p.NN(S), use s as the new query point, and perform NN search on the remaining portion of the R-tree for dataset R.
- If the NN search in Channel 2 (dataset R) finishes first, switch distance metrics: use MinTransDist and MinMaxTransDist to perform branch-and-bound and find the s that minimizes the transitive distance.
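The two asymmetric cases can be sketched on plain point lists as below. This is a simplified illustration, not the broadcast algorithm: it searches the whole of each dataset rather than only the remaining portion of the R-tree, and the parameter `first_finished` is a hypothetical stand-in for whichever channel completes its NN search first.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def hybrid_estimate(p, S, R, first_finished):
    """Return (search range, s, r) for the estimate step."""
    if first_finished == "S":
        # Channel 1 finished first: s = p.NN(S) is known,
        # so the NN search on R is re-centered on s.
        s = min(S, key=lambda x: dis(p, x))
        r = min(R, key=lambda x: dis(s, x))
    else:
        # Channel 2 finished first: r = p.NN(R) is known,
        # so pick the s that minimizes the transitive distance as a whole.
        r = min(R, key=lambda x: dis(p, x))
        s = min(S, key=lambda x: dis(p, x) + dis(x, r))
    return dis(p, s) + dis(s, r), s, r
```

When the whole dataset is searched, as in this sketch, either case yields a search range no larger than the one obtained from the pair (p.NN(S), p.NN(R)).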

- Updating and pruning strategy
- Use a queue to hold candidate MBRs, ordered by arrival time
- Case 2 (s=p.NN(S) finishes first):
- Switch the NN query point to s
- Initial upper bound update
- If there is an intermediate result r’, update the upper bound with dis(p, s)+dis(s, r’)
- Scan the queue of MBRs, applying the distance metrics of traditional NN queries.

- Updating and pruning strategy (cont.)
- Case 3 (r=p.NN(R) finishes first):
- If there is an intermediate result s’, use dis(p, s’)+dis(s’, r) as the new upper bound
- Then scan all the MBRs in the queue and use z = min_{M_i ∈ MBR_queue} { MinMaxTransDist(p, M_i, r) } to update the upper bound.
- During traversal, use MinMaxTransDist to update the upper bound; use MinTransDist for pruning

- Example for pruning:

- Goal: reduce energy consumption
- Analysis:
- Previous algorithms minimize the search range in the Estimate Step by issuing an “exact” search
- Energy consumption in Filter Step is low
- Energy consumption in Estimate Step is high

- Approach:
- use “approximate” search in Estimate Step to save energy in this step

- Approximate Search:
- Relax the pruning condition
- Use ratio of overlapping area to estimate the probability
- Compare the ratio with a threshold α

- How to determine α?
- Factors:
- R-tree height and node depth
- Use small α on the root and large α on leaves

- Difference in densities of the two datasets involved
- Small α, or 0, on the dataset with smaller density

[Figure: scale of α from 0 to 1, labeled "exact search" and "approximate search"]
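The relaxed pruning test might look like the sketch below. Several details are assumptions, because the slides do not give them: the overlap is measured against the bounding square of the search circle rather than the circle itself, a node is visited only when the ratio exceeds α (so α = 0 behaves like exact search), and the per-depth threshold follows a linear schedule. All function names are hypothetical.

```python
def overlap_ratio(mbr, p, radius):
    """Fraction of the MBR's area covered by the bounding square of the
    search circle (an approximation of the circle/MBR overlap)."""
    x1, y1, x2, y2 = mbr
    w = max(0.0, min(x2, p[0] + radius) - max(x1, p[0] - radius))
    h = max(0.0, min(y2, p[1] + radius) - max(y1, p[1] - radius))
    area = (x2 - x1) * (y2 - y1)
    if area == 0:
        return 1.0   # degenerate MBR: stay conservative, never prune by ratio
    return (w * h) / area

def alpha_for_depth(depth, height, alpha_max):
    """Assumed linear schedule: 0 at the root (depth 0), alpha_max at the
    leaf level (depth height - 1)."""
    if height <= 1:
        return alpha_max
    return alpha_max * depth / (height - 1)

def should_visit(mbr, p, radius, alpha):
    """Visit the node only if enough of its MBR overlaps the search range."""
    return overlap_ratio(mbr, p, radius) > alpha

# Hypothetical node: 16% of its MBR overlaps the search range.
mbr, p, radius = (0.0, 0.0, 10.0, 10.0), (12.0, 5.0), 4.0
visit_exact = should_visit(mbr, p, radius, alpha=0.0)    # visited
visit_approx = should_visit(mbr, p, radius, alpha=0.2)   # skipped
```

Skipping low-overlap nodes saves tune-in time in the Estimate Step at the risk of a looser search range; the Filter Step still returns a correct answer because any pair yields a valid upper bound.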

- Dataset 1:
- 39,000 * 39,000 square region
- Densities: 10^-7.0, 10^-6.6, 10^-6.2, 10^-5.8, 10^-5.4, 10^-5.0, 10^-4.6, 10^-4.2
- # of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969
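The point counts follow from the densities if density is read as points per unit area, which is an inference from the numbers rather than something the slide states. A quick check:

```python
side = 39_000                         # side length of the square region
exponents = [-7.0, -6.6, -6.2, -5.8, -5.4, -5.0, -4.6, -4.2]

# points = area * density, rounded to the nearest integer
counts = [round(side * side * 10 ** e) for e in exponents]
print(counts)
# [152, 382, 960, 2411, 6055, 15210, 38206, 95969]
```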

- Dataset 2:
- 39,000 * 39,000 square region
- # of points: 2,000 – 30,000 with 2,000 increment

- R-tree as air index
- Broadcast in depth-first order
- STR packing algorithm [4]
- (1, m) interleaving [3]
- 1,000 query points generated for each of the experiments

[4] S. Leutenegger, M. Lopez, and J. Edgington. STR: A simple and efficient algorithm for R-tree packing. ICDE 1997.

- Algorithms with exact search:
- Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than Window-Based
- 1/40 ≤ size(S)/size(R) ≤ 1.8

- Algorithms with exact search:
- Tune-in time: when 0.01 ≤ size(S)/size(R) ≤ 0.4, Hybrid-NN gives the best tune-in time

- ANN vs. eNN
- Improvement in tune-in time ranges from 11% to 20%

- Hybrid algorithm with ANN:

- Double-NN and Hybrid-NN effectively reduce access time
- Cases in which our algorithms reduce tune-in time are stated and discussed
- Optimization technique effectively reduces tune-in time of all three algorithms

- Generalized TNN queries in broadcast environments:
- More than 2 datasets are involved
- Visiting order not specified
- Complete route query

- Using the new distance metrics in disk-based environments

- Any questions?

- Def 1: (MinTransDist)
- Given two points p and r, and an MBR M_S, MinTransDist(p, M_S, r) finds a point s on M_S such that MinTransDist(p, M_S, r) = dis(p, s) + dis(s, r) and, for any point s’ ≠ s with s’ ∈ M_S,
dis(p, s’) + dis(s’, r) ≥ MinTransDist(p, M_S, r)

- Def 2: (MaxDist)
- Given two points p and r, and a line segment ℓ, MaxDist(p, ℓ, r) = max_{i=1,2} { dis(p, v_i) + dis(v_i, r) }, where v_1 and v_2 are the two endpoints of ℓ
- MaxDist(p, ℓ, r) gives a tight upper bound for all the transitive distances from p to any point on ℓ, to r.

[Figure: points p and r and line segment ℓ]

- Def 3: (MinMaxTransDist)
- Given two points p and r, and an MBR M_S, MinMaxTransDist(p, M_S, r) = min_{1≤i≤4} { MaxDist(p, ℓ_i, r) }, where ℓ_i (1≤i≤4) are the four sides of MBR M_S

- Lemma:
- Given a starting point p, an ending point r, and an MBR M_S enclosing a point dataset S, ∃s ∈ S such that dis(p, s)+dis(s, r) ≤ MinMaxTransDist(p, M_S, r)
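Definitions 2 and 3 translate directly into code, and the lemma can be checked numerically on a small example. The sketch below assumes axis-aligned MBRs stored as (x_min, y_min, x_max, y_max); the helper `mbr_of` and the sample points are hypothetical.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def max_dist(p, segment, r):
    """Def 2: the larger transitive distance through the two endpoints."""
    v1, v2 = segment
    return max(dis(p, v1) + dis(v1, r), dis(p, v2) + dis(v2, r))

def min_max_trans_dist(p, mbr, r):
    """Def 3: the smallest MaxDist over the four sides of the MBR."""
    x1, y1, x2, y2 = mbr
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    sides = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return min(max_dist(p, side, r) for side in sides)

def mbr_of(points):
    """Minimum bounding rectangle of a non-empty point set."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical data for a numeric check of the lemma.
p, r = (0.0, 0.0), (6.0, 6.0)
S = [(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)]
upper = min_max_trans_dist(p, mbr_of(S), r)
best_in_S = min(dis(p, s) + dis(s, r) for s in S)
lemma_holds = best_in_S <= upper + 1e-9
```

The lemma holds because every side of a minimum bounding rectangle touches at least one data object, and dis(p, x) + dis(x, r) is convex in x, so its maximum along a side is attained at an endpoint.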