Approximate Range Searching in External Memory
Micha Streppel, TU Eindhoven / NCIM-Groep, the Netherlands, and Ke Yi, AT&T Labs, USA / HKUST, Hong Kong
Range Searching
• A set S of N points in R^d
• Build a data structure such that, given a query range Q, S ∩ Q can be returned efficiently
• This talk focuses on range reporting; range aggregation is covered in the paper
External Memory Model
• Memory size: M
• I/O block size: B
• Disk size: infinite
Range Searching in External Memory
• 1D: B-tree
  • Size: O(N/B), Query: O(log_B(N/B) + k/B)
• 2D:
  • Half planes [Agarwal et al. 2000]
    • Size: O(N/B), Query: O(log_B(N/B) + k/B)
  • Orthogonal rectangles [Arge et al. 1999]
    • Size: O(N/B), Query: Θ((N/B)^ε + k/B)
    • Query: O(log_B(N/B) + k/B), Size: Θ((N/B) log(N/B) / log log_B N)
  • kdB-tree [Robinson 1981]
    • Size: O(N/B), Query: O((N/B)^(1/2) + k/B)
Exact range searching is difficult!
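To get a feel for the gap between the logarithmic and the square-root query bounds above, the two expressions can be evaluated for concrete values. This is only a rough sketch: constant factors are dropped, k is taken to be the number of reported points, and the function names and the sample values of N, B and k are made up for illustration.

```python
import math

def btree_query_ios(N, B, k):
    """1D B-tree bound log_B(N/B) + k/B, with constant factors dropped."""
    return math.log(N / B, B) + k / B

def kdb_query_ios(N, B, k):
    """kdB-tree bound (N/B)^(1/2) + k/B, with constant factors dropped."""
    return math.sqrt(N / B) + k / B

# Hypothetical sizes: a billion points, 1000 points per block, 10000 results.
N, B, k = 10**9, 10**3, 10**4
print(round(btree_query_ios(N, B, k), 1))  # 12.0
print(round(kdb_query_ios(N, B, k), 1))    # 1010.0
```

With these numbers the output term k/B is the same in both bounds, so the whole difference comes from the search term: about 2 block reads versus about 1000.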
Approximate Range Searching
(Figure label: radius = ε · diam(Q))
• Internal memory:
  • BBD-tree [Arya and Mount, 1995]
  • BAR-tree [Duncan et al. 2001]
  • Size: O(N), Query: O(log N + 1/ε + k_ε) for any convex Q
• External memory: this paper!
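The following sketch illustrates what an ε-approximate answer may contain, for the special case of a disk-shaped query. It assumes the usual one-sided reading of the figure: Q_ε is Q grown by ε · diam(Q), points in Q must be reported, points outside Q_ε must not, and points in between are optional. The function name and the three labels are invented for this example.

```python
import math

def classify(point, center, radius, eps):
    """Classify a point for an eps-approximate query with a disk Q.

    Assumption: Q is the disk (center, radius) and Q_eps is the
    concentric disk of radius radius + eps * diam(Q).
    """
    diam = 2 * radius
    dist = math.dist(point, center)
    if dist <= radius:
        return "must report"        # inside Q
    if dist <= radius + eps * diam:
        return "may report"         # inside Q_eps but outside Q
    return "must not report"        # outside Q_eps

for p in [(0.5, 0.0), (1.1, 0.0), (2.0, 0.0)]:
    print(p, classify(p, center=(0.0, 0.0), radius=1.0, eps=0.1))
```

Here Q_ε has radius 1.2, so the three sample points fall into the three different classes.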
Externalization, previously
Query bounds of linear structures in internal/external memory
Externalizing the kd-tree
(Figure label: B = 3)
Query bounds for orthogonal rectangle ranges:
• Internal memory: O(N^(1/2) + k)
• External memory: O((N/B)^(1/2) + k/B)
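The transcript keeps only the label B = 3 from the figure, not the layout rule itself. As a minimal sketch of what packing a balanced tree into disk blocks can look like, the code below uses a plain breadth-first packing: each block is filled top-down with up to B nodes, and every node left on the frontier starts a new block. The packing rule, the function name and the dictionary representation of the tree are assumptions made for illustration, not taken from the slides.

```python
from collections import deque

def bfs_blocking(children, root, B):
    """Pack tree nodes into blocks of at most B nodes each, top-down.

    children maps a node to the list of its children (leaves may be absent).
    Each block is a connected piece of the tree grown breadth-first.
    """
    blocks = []
    block_roots = deque([root])
    while block_roots:
        start = block_roots.popleft()
        block, frontier = [], deque([start])
        while frontier and len(block) < B:
            u = frontier.popleft()
            block.append(u)
            frontier.extend(children.get(u, []))
        blocks.append(block)
        block_roots.extend(frontier)  # nodes that did not fit start new blocks
    return blocks

# Complete binary tree on nodes 1..15 in heap numbering, with B = 3.
children = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}
print(bfs_blocking(children, 1, 3))
# [[1, 2, 3], [4, 8, 9], [5, 10, 11], [6, 12, 13], [7, 14, 15]]
```

In this example every root-to-leaf path of four nodes touches only two blocks. That works because the tree is balanced; an unbalanced tree needs more careful rules.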
The BAR-Tree [Duncan et al. 2001]
• A space-partitioning scheme
  • Similar to the kd-tree
  • But also uses diagonal cuts
• All cells are convex and fat
• Some cuts have to be unbalanced
  • But no two consecutive cuts are unbalanced
• Height: O(log N)
• A query range intersects O(log N + 1/ε + k_ε) cells (any convex range)
Blocking the BAR-Tree
(Figure label: B = 8)
• Top-down blocking
• Rules for a node u:
  • Check u's two subtrees T1, T2
  • Add u if both have ≥ B/2 nodes
  • If T1 is small, check whether the entire T1 fits
    • If it does, add T1
    • Otherwise, do not add u
  • T1 and T2 cannot both be small
Blocking the BAR-Tree
• Any subtree T_u is stored in O(|T_u|/B + 1) blocks
I/O Analysis of a Query
• Nodes completely inside Q_ε
  • Total #: O(k_ε), organized in O(1/ε) subtrees
  • Total I/O: O(1/ε + k_ε/B)
• Nodes intersecting both Q and ∂Q_ε
  • Total #: O(1/ε)
  • Total I/O: O(1/ε)
I/O Analysis of a Query
• There are O(log N) such nodes, but we would like O(log_B N) I/Os
Current Blocking Not Sufficient
(Figure label: size = B/2 − 1)
Regrouping Shallow Subtrees
(Figure label: size = B/2 − 1)
• Identify shallow nodes top-down
  • u is shallow if there is a path of length log(B) beneath u that is stored in more than c blocks
• For such a u
  • Do a BFS for log(B) levels
  • Move these nodes from their original blocks to a new block
Achieving the desired query I/O: O(log_B(N/B) + 1/ε + k_ε/B)
Construction and Update
• Construction: O(N/B · log_{M/B}(N/B)) I/Os
  • Same as sorting
• Insertions and deletions
  • Use partial rebuilding
  • O(log_B N + 1/B · log_{M/B}(N/B) · log(N/B)) I/Os amortized
Extension to Objects
• S: a collection of objects
• The density of S is the smallest number λ such that any ball b is intersected by at most λ objects o in S with radius(o) ≥ radius(b) [de Berg et al. 1997]
(Figure labels: low density, high density, high density)
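The density definition can be checked for one ball at a time. The sketch below counts, for a given ball b, the objects that are at least as large as b and intersect it. It assumes the objects are disks, so that radius(o) is simply the disk radius. The function name and the sample data are hypothetical. Since the density λ is the maximum of this count over all balls, a single call only gives a lower bound on λ.

```python
import math

def large_objects_hitting(ball, disks):
    """Count disks o with radius(o) >= radius(ball) that intersect ball.

    ball and each disk are given as ((x, y), radius).
    """
    (bx, by), br = ball
    count = 0
    for (ox, oy), orad in disks:
        big_enough = orad >= br
        # Two disks intersect when their centers are at most r1 + r2 apart.
        intersects = math.hypot(ox - bx, oy - by) <= orad + br
        if big_enough and intersects:
            count += 1
    return count

disks = [((0, 0), 1), ((0.5, 0), 1), ((10, 10), 1), ((0, 0), 0.1)]
print(large_objects_hitting(((0, 0), 0.5), disks))  # 2
```

The tiny disk of radius 0.1 intersects the ball but is ignored because it is smaller than the ball, which is what keeps many small objects from inflating the density.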
Extension to Objects
• The object-BAR-tree (using guarding sets [de Berg et al. 2003])
  • Size: O(λN/B)
  • Query: O(log_B(N/B) + λ/B · 1/ε + λ · k_ε/B)
  • Construction: O(λN/B · log_{M/B}(N/B))
Remarks
• Extends to d dimensions
  • Query becomes O(log_B(N/B) + 1/ε^(d−1) + k_ε/B)
• Non-convex query ranges
  • Query becomes O(log_B(N/B) + 1/ε^d + k_ε/B)
• The construction and query processes do not depend on ε
  • The actual cost is O(log_B(N/B) + min_ε {1/ε^(d−1) + k_ε/B})
• Open problems
  • How to update the object-BAR-tree efficiently?
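The min over ε in the remark above trades off two terms: a smaller ε makes 1/ε^(d−1) larger, while a larger ε can pull more points into Q_ε and so increase k_ε/B. The sketch below evaluates that expression for a disk query over a handful of candidate ε values. It is an illustration only: constants and the log_B(N/B) term are dropped, k_ε is counted by brute force, and the function name, candidate list and sample points are all made up. The structure itself never needs to pick ε, since the slide states that construction and querying do not depend on it.

```python
import math

def best_eps(points, center, radius, B, d=2,
             candidates=(0.05, 0.1, 0.2, 0.5, 1.0)):
    """Return (cost, eps) minimizing 1/eps^(d-1) + k_eps/B over candidates.

    k_eps is the number of points within radius + eps * diam(Q) of center.
    """
    diam = 2 * radius
    best = None
    for eps in candidates:
        reach = radius + eps * diam
        k_eps = sum(1 for p in points if math.dist(p, center) <= reach)
        cost = 1 / eps ** (d - 1) + k_eps / B
        if best is None or cost < best[0]:
            best = (cost, eps)
    return best

# 10 points inside Q and a dense cluster of 1000 points just outside it.
points = [(0.0, 0.0)] * 10 + [(1.5, 0.0)] * 1000
cost, eps = best_eps(points, center=(0.0, 0.0), radius=1.0, B=10)
print(eps)  # 0.2
```

In this example ε = 0.2 is the largest candidate whose expanded range still misses the cluster at distance 1.5, so it gives the smallest value of the expression.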
The End
Thank you!