Approximate NN queries on Streams with Guaranteed Error/performance Bounds


Nick Koudas @ AT&T labs-research

Beng Chin Ooi , Kian-Lee Tan , Rui Zhang

@ National University of Singapore

Problem
  • Problem: kNN search.
  • Environment: data stream (one scan; memory constraint).
  • Approximate Solution: e-approximate kNN (ekNN).
  • Motivation: Applications in which absolute error is preferable or more straightforward.

IP: 137.132.48.120, 137.132.48.121
  • Two Optimization Problems:
    • memory optimization for a given error bound: given an error bound e, use as little memory as possible to answer ekNN queries.
    • error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for ekNN queries.
  • Requirements:
    • One scan algorithm.
    • Satisfies the constraints.
    • Efficient updates and query processing.
A Framework
  • Divide space into equal square-shaped cells.
  • Maintain at most K points in each cell.
  • For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance within a cell. For Euclidean distance over a unit-length data space: $d_M = \sqrt{d}/u$, where d is the dimensionality and u is the number of cells each dimension is divided into.
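
As a rough illustration of the grid framework, the sketch below assumes the data space has been normalized to the unit cube [0, 1]^d (an assumption, not stated on the slide). Under that assumption each cell has side 1/u, so the largest Euclidean distance inside a cell is its diagonal, sqrt(d)/u. The function names `cell_index` and `max_cell_distance` are hypothetical.

```python
import math

def cell_index(point, u):
    """Map a point in the unit cube to integer cell coordinates (0 .. u-1 per dimension)."""
    # min(...) keeps a coordinate of exactly 1.0 inside the last cell.
    return tuple(min(int(x * u), u - 1) for x in point)

def max_cell_distance(d, u):
    """Diagonal of one cell: the largest Euclidean distance between two points in it."""
    return math.sqrt(d) / u

# Two points in the same cell are never farther apart than the cell diagonal.
p, q = (0.30, 0.26), (0.49, 0.49)
u = 4
if cell_index(p, u) == cell_index(q, u):
    assert math.dist(p, q) <= max_cell_distance(2, u)
```

Doubling u halves the bound, at the cost of up to 2^d times as many cells to store points for.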

Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC)
  • Cells are not explicitly maintained, only points.
  • Cells linearized according to Z-curve.
  • Z-value of the cell is the key of a point.
  • Points maintained in a B*-tree.
  • An efficient merge-cell algorithm possible.
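
The linearization step can be sketched as follows. This is a minimal illustration of a Z-curve (Morton) key obtained by bit interleaving, not necessarily the exact encoding the authors use; the function name `z_value` and the choice of interleaving order (first dimension most significant) are assumptions.

```python
def z_value(cell, m):
    """Interleave the m low-order bits of each cell coordinate into one integer key."""
    z = 0
    for bit in range(m - 1, -1, -1):          # most significant bit first
        for coord in cell:                    # one bit from every dimension per round
            z = (z << 1) | ((coord >> bit) & 1)
    return z

# With m = 2 there are 4 cells per dimension; cell (2, 1) is binary (10, 01).
key = z_value((2, 1), 2)                      # interleaved bits: 1 0 0 1

# Dropping the last d bits yields the key of the enclosing cell at order m - 1,
# which is what makes merging cells cheap on a key-ordered structure.
parent = key >> 2
assert parent == z_value((1, 0), 1)
```

Because all points of a cell share one key, a key-ordered tree keeps them adjacent without storing the cells themselves.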
Algorithm: Build index
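  • m: the order of the Z-curve, giving $2^m$ cells in each dimension (so $u = 2^m$).
  • If e is given, requiring $d_M = \sqrt{d}/2^m \le e$, we get $m \ge \log_2(\sqrt{d}/e)$.

$m_e$ is an integer, so $m_e = \lceil \log_2(\sqrt{d}/e) \rceil$.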

  • If memory constraint given, set a large enough m.
  • Build index
    • Initialize m
    • Read a record P, calculate Z-value, search the B*-tree and find out Nc: number of existing points in the cell P belongs to.
    • If Nc < K
      • Insert P to the B*-tree.
    • Else
      • Discard one and insert P.
    • If memory runs out //this only happens for the error minimization problem
      • Merge cells and let m=m-1
    • Go back to Step 2 (Read next record)
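
The build loop above can be sketched in a few lines. This is a simplified illustration under several assumptions that are not in the slides: the data lie in the unit cube, a Python dictionary keyed by Z-value stands in for the B*-tree, the memory budget is a point count (`max_points`), and the point to discard is the oldest one in the cell (the slide only says "discard one"). All function names are hypothetical.

```python
import math
from collections import defaultdict

def order_for_error(e, d):
    """Smallest Z-curve order m such that sqrt(d) / 2**m <= e."""
    return max(0, math.ceil(math.log2(math.sqrt(d) / e)))

def z_value(point, m):
    """Z-value of the cell containing a unit-cube point at order m."""
    u = 1 << m
    cell = [min(int(x * u), u - 1) for x in point]
    z = 0
    for bit in range(m - 1, -1, -1):
        for c in cell:
            z = (z << 1) | ((c >> bit) & 1)
    return z

def build_index(stream, d, K, m, max_points=None):
    """One pass over the stream; returns (cells, final order m)."""
    cells = defaultdict(list)                 # stand-in for the B*-tree
    total = 0
    for p in stream:
        bucket = cells[z_value(p, m)]
        if len(bucket) < K:                   # Nc < K: room left in the cell
            bucket.append(p)
            total += 1
        else:                                 # cell full: discard one, insert P
            bucket.pop(0)                     # assumption: evict the oldest point
            bucket.append(p)
        # Memory exhausted (error-minimization problem only): coarsen the grid.
        while max_points is not None and total > max_points and m > 0:
            merged = defaultdict(list)
            for z, pts in cells.items():
                merged[z >> d].extend(pts)    # 2**d old cells map to one new cell
            # assumption: keep the K most recently listed points per merged cell
            cells = defaultdict(list, {z: pts[-K:] for z, pts in merged.items()})
            total = sum(len(pts) for pts in cells.values())
            m -= 1
    return cells, m
```

For the memory-optimization problem one would call `build_index(stream, d, K, order_for_error(e, d))` and leave `max_points` unset; for the error-minimization problem one would start from a large `m` and pass the budget.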
Algorithm: Merge Cells
  • General Merge-Cell
    • Apply to any structure.
    • For each new cell, find all the points of the old cells in it, and merge them.
  • Bulk Merge-Cell
    • Only apply to DISC.
    • Scan all the leaf pages once.
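
One plausible reading of the bulk merge is sketched below; it is an illustration under assumptions, not the authors' implementation. It assumes the leaf level can be read as a list of `(z, point)` entries sorted by Z-value, and that an order-m key becomes an order-(m-1) key by dropping its last d bits. Since that shift preserves key order, the old cells that form one new cell are contiguous and a single sequential pass suffices. The function name `bulk_merge` and the rule for which points survive (the first K encountered) are hypothetical.

```python
from itertools import groupby

def bulk_merge(sorted_entries, d, K):
    """Single pass over (z, point) entries sorted by z; returns entries at order m - 1."""
    out = []
    # Entries sharing z >> d belong to the same new (coarser) cell.
    for parent, group in groupby(sorted_entries, key=lambda entry: entry[0] >> d):
        pts = [p for _, p in group]
        # assumption: keep the first K points of each merged cell
        out.extend((parent, p) for p in pts[:K])
    return out

leaf_entries = [(0, "a"), (1, "b"), (3, "c"), (4, "d"), (9, "e")]
merged = bulk_merge(leaf_entries, d=2, K=2)
```

The output is still sorted by key, so it could be written back as new leaf pages in the same scan, whereas the general merge has to look up the old cells of each new cell separately.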
Algorithm: kNN search
  • W: a window query centered at the center of the cell containing the query point Q, with gradually increasing side length s.
  • Find the kNN of Q within W.
    • If the kNN distance is no larger than the distance from Q to the nearest side of W, the search terminates;
    • Else increase s by 1/u and repeat.
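
The expanding-window search can be sketched as below. This is a simplified illustration: a linear filter over a list of stored points stands in for the window query on the index, the data space is assumed to be the unit cube, and the cap that stops the loop once the window covers the whole space is an added safeguard that is not on the slide. The function name `knn_search` is hypothetical.

```python
import math

def knn_search(points, q, k, u):
    """Return the k stored points nearest to q, growing the window W step by step."""
    d = len(q)
    # Center of the cell that contains q.
    center = [(min(int(x * u), u - 1) + 0.5) / u for x in q]
    s = 1.0 / u                               # initial side length: one cell
    while True:
        half = s / 2
        inside = [p for p in points
                  if all(abs(p[i] - center[i]) <= half for i in range(d))]
        inside.sort(key=lambda p: math.dist(p, q))
        # Distance from q to the nearest side of W.
        border = min(half - abs(q[i] - center[i]) for i in range(d))
        # Any point outside W is farther than `border`, so the answer is final.
        if len(inside) >= k and math.dist(inside[k - 1], q) <= border:
            return inside[:k]
        if s >= 2.0:                          # W now covers the whole unit cube
            return inside[:k]
        s += 1.0 / u

stored = [(0.1, 0.1), (0.15, 0.1), (0.9, 0.9), (0.5, 0.5)]
nearest = knn_search(stored, q=(0.12, 0.1), k=2, u=4)
```

The result is exact with respect to the stored points; the approximation error relative to the full stream comes only from the points discarded while building the index.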