Approximate NN queries on Streams with Guaranteed Error/performance Bounds

Nick Koudas @ AT&T labs-research

Beng Chin Ooi , Kian-Lee Tan , Rui Zhang

@ National University of Singapore


Problem

  • Problem: kNN search.

  • Environment: data stream (one scan; memory constraint).

  • Approximate Solution: ε-approximate kNN (εkNN), where the absolute error of the returned kNN distance is at most ε.

  • Motivation: applications in which an absolute error bound is preferable or more straightforward.

    • Example: IP addresses such as 137.132.48.120 and 137.132.48.121.


  • Two Optimization Problems:

    • memory optimization for a given error bound: given an error bound ε, use as little memory as possible to answer εkNN queries.

    • error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for εkNN queries.

  • Requirements:

    • One-scan algorithm.

    • Satisfies the given constraint (error bound or memory size).

    • Efficient updates and query processing.


A Framework

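  • Divide the data space into equal-sized square cells (hypercubes in d dimensions).

  • Maintain at most K points in each cell.

  • For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance between two points within a cell. For Euclidean distance: $d_M = \sqrt{d}/u$,

    where d is the dimensionality and u is the number of cells into which each dimension is divided (with the data space normalized to $[0,1]^d$, each cell has side length $1/u$).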
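The bound can be checked numerically with a small sketch. It assumes the data space is normalized to [0,1]^d and keeps cells in a Python dict; `GridSummary`, `cell_of`, `max_cell_distance`, and `kth_nn_distance` are hypothetical names. Which K points a full cell keeps does not affect the bound, so this sketch simply ignores later arrivals for a full cell.

```python
import math
from collections import defaultdict


def max_cell_distance(d: int, u: int) -> float:
    """d_M = sqrt(d) / u: the diagonal of one cell when [0,1]^d is cut into u cells per dimension."""
    return math.sqrt(d) / u


def cell_of(point, u):
    """Integer grid coordinates of the cell containing `point` (coordinates assumed in [0, 1])."""
    # min(..., u - 1) keeps a coordinate equal to 1.0 inside the last cell
    return tuple(min(int(x * u), u - 1) for x in point)


class GridSummary:
    """Keeps at most K points per cell (hypothetical helper; later points for a full cell are ignored)."""

    def __init__(self, u, K):
        self.u, self.K = u, K
        self.cells = defaultdict(list)   # cell coordinates -> retained points

    def add(self, point):
        bucket = self.cells[cell_of(point, self.u)]
        if len(bucket) < self.K:
            bucket.append(point)

    def retained(self):
        return [p for bucket in self.cells.values() for p in bucket]


def kth_nn_distance(points, q, k):
    """Exact k-th nearest-neighbour distance from q over `points` (brute force)."""
    return sorted(math.dist(q, p) for p in points)[k - 1]


# Usage: compare the exact k-th NN distance with the one computed from the summary.
stream = [((i * 0.137) % 1.0, (i * 0.291) % 1.0) for i in range(300)]
summary = GridSummary(u=8, K=3)
for p in stream:
    summary.add(p)
q = (0.5, 0.5)
for k in (1, 2, 3):
    exact = kth_nn_distance(stream, q, k)
    approx = kth_nn_distance(summary.retained(), q, k)
    print(k, round(approx - exact, 4), "<=", round(max_cell_distance(2, 8), 4))
```

Because the retained points are a subset of the stream, the approximate kNN distance is never smaller than the exact one; the framework's claim is that it exceeds it by at most d_M for every k ≤ K.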


Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC)

  • Cells are not maintained explicitly; only points are.

  • Cells are linearized according to a Z-curve.

  • The Z-value of the cell containing a point is used as the point's key.

  • Points are maintained in a B*-tree.

  • This makes an efficient merge-cell algorithm possible.
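A minimal sketch of this keying scheme, assuming m-bit integer cell coordinates and a Python sorted list (maintained with `bisect`) as a stand-in for the B*-tree. `z_value` and `ZIndex` are hypothetical names, and letting dimension 0 supply the highest bit of each interleaved group is an arbitrary convention.

```python
import bisect


def z_value(cell, m):
    """Interleave the m-bit integer coordinates of `cell` into one Z-order (Morton) key.

    Bits are taken from the most significant level down; within a level,
    dimension 0 contributes the highest bit (an arbitrary convention).
    """
    key = 0
    for b in range(m - 1, -1, -1):
        for c in cell:
            key = (key << 1) | ((c >> b) & 1)
    return key


class ZIndex:
    """Points kept in Z-value order; a sorted list stands in for the B*-tree (hypothetical helper)."""

    def __init__(self, m):
        self.m = m          # order of the Z-curve: 2**m cells per dimension
        self.keys = []      # sorted Z-values, one per stored point
        self.points = []    # points, parallel to self.keys

    def key_of(self, point):
        u = 1 << self.m
        cell = tuple(min(int(x * u), u - 1) for x in point)   # point assumed in [0,1]^d
        return z_value(cell, self.m)

    def count_in_cell(self, key):
        """Nc: how many stored points share this cell key."""
        return bisect.bisect_right(self.keys, key) - bisect.bisect_left(self.keys, key)

    def insert(self, point):
        key = self.key_of(point)
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.points.insert(pos, point)


idx = ZIndex(m=2)
for p in [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]:
    idx.insert(p)
print(idx.keys, idx.count_in_cell(idx.key_of((0.15, 0.15))))
```

Because each level's bits are grouped together, dropping the lowest d bits of a key (`z >> d`) gives the key of the enclosing cell at order m − 1, which is why merging cells is cheap on a Z-ordered index.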


Algorithm: Build index

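  • m: the order of the Z-curve, with $2^m$ cells in each dimension (so $u = 2^m$).

  • If ε is given, requiring $d_M = \sqrt{d}/2^m \le \varepsilon$ gives $m \ge \log_2(\sqrt{d}/\varepsilon)$.

    Since $m_\varepsilon$ is an integer, $m_\varepsilon = \lceil \log_2(\sqrt{d}/\varepsilon) \rceil$.

  • If a memory constraint is given, set m large enough.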

  • Build index

    • Initialize m

    • Read a record P, calculate the Z-value of its cell, and search the B*-tree to find Nc, the number of existing points in the cell that P belongs to.

    • If Nc < K

      • Insert P into the B*-tree.

    • Else

      • Discard one of the cell's existing points and insert P.

    • If memory runs out (this only happens for the error minimization problem)

      • Merge cells and let m = m − 1.

    • Go back to Step 2 (read the next record).
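Choosing m and running the build loop can be sketched together in Python. `z_order_for_error` returns the smallest m with √d/2^m ≤ ε (clamped at 0); it loops instead of calling ceil(log2(...)) so that exact powers of two are not disturbed by floating-point rounding. In `build_index`, several details are assumptions not stated on the slide: a dict of deques keyed by Z-value stands in for the B*-tree, memory is counted as a number of points (`capacity`), a full cell evicts its oldest point, and a merge keeps the first K points of each new cell in child-key order. All function names are hypothetical.

```python
import math
from collections import deque


def z_order_for_error(eps, d):
    """Smallest Z-curve order m with d_M = sqrt(d) / 2**m <= eps (never below 0)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    m = 0
    while math.sqrt(d) / 2 ** m > eps:   # loop avoids float rounding in ceil(log2(...))
        m += 1
    return m


def z_value(cell, m):
    """Z-order key of an integer cell at curve order m (bit interleaving)."""
    key = 0
    for b in range(m - 1, -1, -1):
        for c in cell:
            key = (key << 1) | ((c >> b) & 1)
    return key


def build_index(stream, d, K, m, capacity=None):
    """One scan over `stream`; at most K points per cell; merge cells when over `capacity`.

    Returns the cells (Z-value -> points) and the final order m.
    """
    cells = {}
    size = 0
    for p in stream:                                   # read a record P
        u = 1 << m
        key = z_value(tuple(min(int(x * u), u - 1) for x in p), m)
        bucket = cells.setdefault(key, deque())
        if len(bucket) < K:                            # Nc < K: insert P
            bucket.append(p)
            size += 1
        else:                                          # Nc == K: discard one (oldest), insert P
            bucket.popleft()
            bucket.append(p)
        while capacity is not None and size > capacity and m > 0:
            # Memory ran out: each group of 2**d sibling cells becomes one cell.
            merged = {}
            for k in sorted(cells):
                merged.setdefault(k >> d, []).extend(cells[k])
            cells = {k: deque(pts[:K]) for k, pts in merged.items()}
            size = sum(len(b) for b in cells.values())
            m -= 1                                     # let m = m - 1
    return cells, m


points = [((i * 0.137) % 1.0, (i * 0.291) % 1.0) for i in range(200)]

# Memory optimization: m is fixed by the error bound and never changes.
m_eps = z_order_for_error(0.01, 2)
cells_fixed, m_fixed = build_index(points, d=2, K=2, m=m_eps)

# Error minimization: start with a large m and merge whenever memory runs out.
cells_small, m_small = build_index(points, d=2, K=2, m=8, capacity=10)
print(m_eps, m_fixed, m_small, sum(len(b) for b in cells_small.values()))
```

As a worked case, for d = 2 and ε = 0.01, √2/0.01 ≈ 141.4, so m_ε = 8 (256 cells per dimension, d_M ≈ 0.0055). Passing `capacity=None` corresponds to the memory optimization problem, where no merge ever happens.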


Algorithm: Merge Cells

  • General Merge-Cell

    • Applies to any structure.

    • For each new cell, find all the points of the old cells it contains and merge them.

  • Bulk Merge-Cell

    • Applies only to DISC.

    • Scans all the leaf pages once.
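Bulk Merge-Cell can work in one pass because, on a Z-curve, the 2^d children of a cell at order m − 1 have consecutive Z-values at order m, so they sit next to each other in the leaf pages. The sketch below models the leaf level as a sorted list of `(z_value, point)` pairs; the function name `bulk_merge` and the policy of keeping the first K points met in each merged cell are assumptions.

```python
def bulk_merge(entries, d, K):
    """Bulk Merge-Cell over leaf entries sorted by Z-value, in a single sequential pass.

    `entries` is a list of (z_value, point) pairs at curve order m; the result is
    the same kind of list at order m - 1, with at most K points per merged cell.
    """
    merged = []
    current_parent, kept = None, 0
    for z, point in entries:
        parent = z >> d            # key of the enclosing cell at order m - 1
        if parent != current_parent:
            current_parent, kept = parent, 0
        if kept < K:
            merged.append((parent, point))
            kept += 1
    return merged


leaves = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (7, "f"), (9, "g")]
print(bulk_merge(leaves, d=2, K=2))
```

The output stays sorted by the new keys, so it can be written back as the leaf level without re-sorting; General Merge-Cell, by contrast, would look up the old cells of each new cell separately.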


Algorithm: kNN search

  • W: a window query centered at the center of the cell containing the query point Q, with a gradually increasing side length s.

  • Find the kNN of Q within W.

    • If the kNN distance is no larger than the distance from Q to the nearest side of W, the search terminates;

    • Else, increase s by 1/u and repeat.
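A sketch of the window search, assuming the unit data space [0,1]^d and an initial side length s = 1/u (one cell); neither the initial s nor the name `knn_window_search` comes from the slides. Points inside W are found by a linear scan for brevity, whereas DISC would issue range queries on its B*-tree. A second stopping rule (W covers the whole space) is added so the loop always terminates, even when fewer than k points exist.

```python
import heapq
import math


def knn_window_search(points, q, k, u):
    """kNN of query q over `points` using a square window W that grows by 1/u per round."""
    centre = [(min(int(x * u), u - 1) + 0.5) / u for x in q]   # centre of Q's cell
    s = 1.0 / u                                               # assumed initial side length
    while True:
        lo = [c - s / 2 for c in centre]
        hi = [c + s / 2 for c in centre]
        in_w = [p for p in points if all(l <= x <= h for x, l, h in zip(p, lo, hi))]
        best = heapq.nsmallest(k, in_w, key=lambda p: math.dist(q, p))
        # distance from Q to the nearest side of W
        margin = min(min(x - l, h - x) for x, l, h in zip(q, lo, hi))
        if len(best) == k and math.dist(q, best[-1]) <= margin:
            return best                      # no point outside W can be closer
        if all(l <= 0.0 and h >= 1.0 for l, h in zip(lo, hi)):
            return best                      # W covers the whole space: every point was examined
        s += 1.0 / u                         # otherwise enlarge W by 1/u


pts = [((i * 0.137) % 1.0, (i * 0.291) % 1.0) for i in range(100)]
print(knn_window_search(pts, (0.5, 0.5), k=3, u=8))
```

When the stopping test succeeds, every point outside W is at least as far from Q as the nearest side of W, hence no closer than the k-th neighbour found, so the answer is exact with respect to the retained points; the εkNN error comes only from points discarded during maintenance.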


Experiments


Questions?

