
Lecture 11: Nearest Neighbor Search

This lecture covers Nearest Neighbor Search (NNS) and introduces sketching: compressing points into short bit-strings that still allow us to distinguish between close and far points. Several approaches to NNS are discussed, including linear scan, exhaustive storage, and Locality-Sensitive Hashing (LSH). The lecture also presents LSH algorithms for Hamming space, Euclidean space, and other distance metrics.


Presentation Transcript


  1. Lecture 11: Nearest Neighbor Search

  2. Plan • Distinguished Lecture • Quantum Computing • Oct 19, 11:30am, Davis Aud in CEPSR • Nearest Neighbor Search • Scribe?

  3. Sketching • sk(p): a short bit-string • given sk(q) and sk(p), can distinguish between: • Close: ‖q − p‖ ≤ r • Far: ‖q − p‖ > (1 + ε)r • With high success probability: only small failure probability δ • For the ℓ1, ℓ2 norms: O(ε⁻² · log n) bits • Example: given the sketches 010110 and 010101, answer "is ‖q − p‖ ≤ r?" Yes: close; No: far
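
One concrete way to realize such short bit-strings (illustrative only, not necessarily the construction intended on the slide) is SimHash: each sketch bit is the sign of a random Gaussian projection, so close points disagree on few bits and far points on many. All dimensions, bit counts, and perturbation sizes below are arbitrary illustrative choices.

```python
# Illustrative sketching via SimHash: one sign-of-random-projection bit
# per sketch position. Close pairs disagree on few bits, far pairs on many.
import numpy as np

def make_sketcher(dim, num_bits, seed=0):
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((num_bits, dim))   # one random direction per bit
    return lambda x: (R @ x > 0)               # sketch = vector of sign bits

def sketch_distance(s1, s2):
    return int(np.count_nonzero(s1 != s2))     # Hamming distance of sketches

sk = make_sketcher(dim=128, num_bits=256)
p = np.random.default_rng(1).standard_normal(128)
close = p + 0.05 * np.random.default_rng(2).standard_normal(128)
far = np.random.default_rng(3).standard_normal(128)
print(sketch_distance(sk(p), sk(close)))   # few disagreeing bits -> "close"
print(sketch_distance(sk(p), sk(far)))     # many disagreeing bits -> "far"
```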

  4. NNS: approaches • Sketch sk: uses O(ε⁻² · log n) bits per point • 1: Linear scan++ • Precompute sk(p) for every data point p • Given the query q, compute sk(q) • For each data point p, estimate the distance using sk(q) and sk(p) • 2: Exhaustive storage++ • For each possible query sketch σ, precompute and store the answer point A[σ] • On query q, output A[sk(q)] • Space: 2^{O(ε⁻² · log n)} = n^{O(ε⁻²)} • Can we get near-linear space and sub-linear query time?
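
A minimal sketch of the "linear scan++" idea for Hamming data, under the assumption that a sketch is simply k randomly sampled bits, so that (d/k) times the Hamming distance between two sketches is an unbiased estimate of the true distance. Function names and parameter values are illustrative.

```python
# Hedged "linear scan++" sketch: precompute a k-bit sample per data point,
# then answer queries with one cheap sketch comparison per point.
import random

def make_bit_sampler(d, k, seed=0):
    rng = random.Random(seed)
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda p: [p[j] for j in coords]

def estimated_distance(s1, s2, d, k):
    # each sampled coordinate differs with prob. Ham(p, q)/d, so rescale
    return sum(a != b for a, b in zip(s1, s2)) * d / k

def build_sketches(points, sk):
    return [sk(p) for p in points]            # precompute sk(p) for every p

def nearest_by_sketch(q, points, sketches, sk, d, k):
    sq = sk(q)                                # compute sk(q) at query time
    best = min(range(len(points)),
               key=lambda i: estimated_distance(sq, sketches[i], d, k))
    return points[best]
```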

  5. Locality Sensitive Hashing [Indyk-Motwani’98] • Random hash function h on the data space satisfying: • for a close pair p, q (when ‖q − p‖ ≤ r): Pr[h(q) = h(p)] is "high", at least P1 • for a far pair p, q′ (when ‖q − p′‖ > cr): Pr[h(q) = h(p′)] is "small" (though only "not-so-small"), at most P2 • Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2) < 1

  6. LSH for Hamming space • The hash function g is usually a concatenation of "primitive" functions: g(p) = ⟨h1(p), h2(p), …, hk(p)⟩ • Fact 1: ρ(g) = ρ(h) • Example: Hamming space {0,1}^d • h(p) = p_j, i.e., choose the bit at a random coordinate j • g chooses k bits at random
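
A small self-contained check of the collision probabilities behind this primitive family (all parameters below are assumed for illustration): for h(p) = p_j with a random coordinate j, Pr[h(p) = h(q)] = 1 − Ham(p, q)/d, and for the concatenation g of k such bits, Pr[g(p) = g(q)] = (1 − Ham(p, q)/d)^k.

```python
# Empirical check of the bit-sampling collision probability for g = k bits.
import random

d, k, trials = 100, 10, 200_000
rng = random.Random(0)
p = [rng.randint(0, 1) for _ in range(d)]
q = p[:]
for j in rng.sample(range(d), 20):          # make q differ from p in 20 bits
    q[j] ^= 1

hits = 0
for _ in range(trials):
    coords = [rng.randrange(d) for _ in range(k)]   # a fresh g each trial
    if all(p[j] == q[j] for j in coords):
        hits += 1
print(hits / trials, (1 - 20 / d) ** k)     # both ≈ 0.8^10 ≈ 0.107
```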

  7. Full Algorithm • Data structure is just L hash tables: • Each hash table uses a fresh random function g_i(p) = ⟨h_{i,1}(p), …, h_{i,k}(p)⟩ • Hash all dataset points into the table • Query: • Check for collisions in each of the L hash tables • until we encounter a point within distance cr • Guarantees: • Space: O(nL), plus O(nd) space to store the points • Query time: O(L · (k + d)) (in expectation) • 50% probability of success.
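
A hedged end-to-end sketch of the data structure on this slide, specialized to Hamming space with the bit-sampling family g_i(p) = ⟨p_{j1}, …, p_{jk}⟩; the class name, dictionary-based tables, and parameter names are illustrative choices rather than the lecture's own code.

```python
# Illustrative L-table LSH index for Hamming space.
import random
from collections import defaultdict

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

class LSHIndex:
    def __init__(self, points, d, k, L, seed=0):
        rng = random.Random(seed)
        self.points = points
        # one fresh random function g_i per table: k random coordinates
        self.gs = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, p in enumerate(points):
            for g, table in zip(self.gs, self.tables):
                table[tuple(p[j] for j in g)].append(idx)

    def query(self, q, cr):
        # probe the query's bucket in each table; stop at the first point
        # within distance cr (a cr-near neighbor)
        for g, table in zip(self.gs, self.tables):
            for idx in table.get(tuple(q[j] for j in g), ()):
                if hamming(q, self.points[idx]) <= cr:
                    return self.points[idx]
        return None
```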

  8. Choice of parameters k, L? • L hash tables with g = ⟨h1, …, hk⟩ • Pr[collision of a far pair] = P2^k • Pr[collision of a close pair] = P1^k • Success probability for one hash table: P1^k • L ≈ 1/P1^k tables should suffice • Runtime as a function of P2^k? • Hence set k s.t. P2^k = 1/n, i.e., k = log n / log(1/P2), and then L = 1/P1^k = n^ρ
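
A worked instance of this parameter choice with assumed illustrative values (n = 10^6, P1 = 0.9, P2 = 0.5):

```python
# Pick k so that P2^k = 1/n, then L = 1/P1^k ≈ n^rho tables.
import math

n, P1, P2 = 10**6, 0.9, 0.5
k = math.ceil(math.log(n) / math.log(1 / P2))   # P2^k <= 1/n   ->  k = 20
L = math.ceil((1 / P1) ** k)                    # ≈ n^rho       ->  L = 9
rho = math.log(1 / P1) / math.log(1 / P2)       # ≈ 0.15
print(k, L, rho)
```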

  9. Analysis: correctness • Let p* be an r-near neighbor of q • If p* does not exist, the algorithm can output anything • Algorithm fails when: the near neighbor p* is not in the searched buckets g1(q), …, gL(q) • Probability of failure: • Probability that q and p* do not collide in one hash table: at most 1 − P1^k • Probability they do not collide in all L hash tables: at most (1 − P1^k)^L = (1 − 1/L)^L ≤ 1/e for L = 1/P1^k
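
A quick numeric check of this failure bound, reusing the illustrative parameters from the previous snippet (P1 = 0.9, k = 20, L = 9):

```python
# Probability the near neighbor is missed in all L tables.
P1, k, L = 0.9, 20, 9
p_fail = (1 - P1**k) ** L
print(p_fail)      # ≈ 0.31 <= 1/e, so success probability is at least ~63%
```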

  10. Analysis: Runtime • Runtime dominated by: • Hash function evaluation: O(L · k) time • Distance computations to points in the probed buckets • Distance computations: • Care only about far points, at distance > cr • In one hash table, we have at most n far points • Probability a far point collides with the query is at most P2^k = 1/n • Expected number of far points in a bucket: n · (1/n) = 1 • Over L hash tables, the expected number of far points is L • Total: O(L · (k + d)) = O(n^ρ · (d + log n)) in expectation
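
The same illustrative parameters, plus an assumed dimension d = 128, make the expected query cost concrete:

```python
# Expected query work: with P2^k = 1/n, each probed bucket holds about one
# far point in expectation, so roughly L far points overall.
n, P2, k, L, d = 10**6, 0.5, 20, 9, 128
far_per_table = n * (P2 ** k)       # ≈ 1 expected far collision per table
print(far_per_table * L)            # ≈ L distance computations to far points
print(L * (k + d))                  # total expected work, O(L·(k + d))
```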

  11. LSH in practice (safety not guaranteed) • If we want exact NNS, what is c? • Can choose any parameters k, L • Correct as long as the near neighbor collides with the query in some table with good probability • Performance: • trade-off between the number of tables and false positives: a larger k means fewer false positives, a smaller k means fewer tables • The best k, L will depend on the dataset "quality"

  12. LSH Algorithms • Hamming space: space n^{1+ρ}, query time n^ρ, with ρ = 1/c [IM’98] • Euclidean space: space n^{1+ρ}, query time n^ρ, with ρ ≈ 1/c² [AI’06] • Table does not include: • O(nd) additive space • O(d · log n) factor in query time

  13. LSH Zoo • Hamming distance [IM’98] • h: pick a random coordinate (or several) • ℓ1 (Manhattan) distance [AI’06] • h: cell in a randomly shifted grid • Jaccard similarity between sets: J(A, B) = |A ∩ B| / |A ∪ B| • h: pick a random permutation π on the universe and map each set to its minimum element under π: min-wise hashing [Bro’97] • Claim: Pr[collision] = J(A, B) • Running example: "to be or not to be" vs. "to sketch or not to sketch", viewed as the sets {be, not, or, to} and {not, or, to, sketch}
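
A minimal min-wise hashing sketch on the slide's own example sets (helper names are illustrative); over many random permutations the collision rate approaches J(A, B) = 3/5:

```python
# Min-wise hashing: one random permutation per hash function;
# Pr[h(A) = h(B)] equals the Jaccard similarity |A ∩ B| / |A ∪ B|.
import random

def make_minhash(universe, rng):
    perm = rng.sample(sorted(universe), len(universe))   # random permutation
    order = {x: i for i, x in enumerate(perm)}
    return lambda S: min(S, key=order.get)   # first element of S under the permutation

A = {"to", "be", "or", "not"}                # "to be or not to be"
B = {"to", "sketch", "or", "not"}            # "to sketch or not to sketch"
universe = A | B
trials = 10_000
hits = sum(1 for s in range(trials)
           if (h := make_minhash(universe, random.Random(s)))(A) == h(B))
print(hits / trials)                         # ≈ 0.6 = J(A, B)
```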

  14. LSH for Euclidean distance [Datar-Immorlica-Indyk-Mirrokni’04] • LSH function h(p): • pick a random line ℓ and quantize • project the point onto ℓ: h(p) = ⌊(p · v + b) / w⌋ • v is a random Gaussian vector • b is random in [0, w) • w is a parameter (e.g., 4) • Claim: ρ ≈ 1/c
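
A short sketch of this hash family (dim = 64 and w = 4 are illustrative choices): project onto a random Gaussian direction, add a random shift b in [0, w), and quantize into cells of width w.

```python
# One hash function of the Euclidean LSH family h(p) = floor((p·v + b)/w).
import numpy as np

def make_euclidean_lsh(dim, w=4.0, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)        # random Gaussian vector
    b = rng.uniform(0.0, w)             # random shift in [0, w)
    return lambda p: int(np.floor((np.dot(v, p) + b) / w))

h = make_euclidean_lsh(dim=64)
p = np.random.default_rng(1).standard_normal(64)
print(h(p), h(p + 0.01))                # nearby points usually share a cell
```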
