
Similarity Search without Tears: the OMNI-Family of All-Purpose Access Methods

Michael Kelleher

Kiyotaka Iwataki

The Department of Computer and Information Science and Engineering, University of Florida


Outline

  • Problem/Solution

  • Background

  • The Omni-concept

  • Members of the Omni-family

  • Experimental Results


Problem

  • Diverse and complex data

  • How to search such data efficiently

  • Expensive distance calculations


Solution

  • Reduce the number of distance calculations

  • The Omni-Concept/Family

    • Select a set of foci

    • Describe every other object by its distances to the foci

    • The foci allow many distance calculations to be pruned

    • Scalable


Background: Metric Spaces

  • Given a set of objects S = {s1, s2, s3, …, sn} from a domain 𝕊, a distance function d() has the following properties:

    • Symmetry: d(s1,s2) = d(s2,s1)

    • Non-negativity: 0 < d(s1,s2) < ∞ for s1 ≠ s2, and d(s1,s1) = 0

    • Triangle inequality: d(s1,s3) ≤ d(s1,s2) + d(s2,s3)

  • A metric space is a pair M = ⟨𝕊, d()⟩

  • Spatial datasets compared with an Lp distance function are special cases of metric spaces.
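To make the three properties concrete, here is a small Python sketch of the Lp (Minkowski) distance together with a brute-force check of the metric properties on a finite sample. The helper names `lp_distance` and `check_metric` are illustrative and do not come from the slides. A finite check can only refute, never prove, that a function is a metric.

```python
import itertools
import math


def lp_distance(a, b, p=2.0):
    """Minkowski (Lp) distance between two equal-length vectors; a metric for p >= 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def check_metric(points, d, tol=1e-9):
    """Test symmetry, non-negativity and the triangle inequality on a finite sample."""
    for s1, s2 in itertools.product(points, repeat=2):
        if abs(d(s1, s2) - d(s2, s1)) > tol:
            return False  # symmetry violated
        if s1 == s2 and d(s1, s2) > tol:
            return False  # d(s, s) must be 0
        if s1 != s2 and not (0 < d(s1, s2) < math.inf):
            return False  # distinct objects need a positive, finite distance
    for s1, s2, s3 in itertools.product(points, repeat=3):
        if d(s1, s3) > d(s1, s2) + d(s2, s3) + tol:
            return False  # triangle inequality violated
    return True


sample = [(0, 0), (3, 4), (1, 1), (-2, 5)]
print(lp_distance((0, 0), (3, 4)))        # 5.0 (Euclidean, p = 2)
print(check_metric(sample, lp_distance))  # True
```

The squared Euclidean distance fails this check. Take the points (0,0), (1,0) and (2,0): the squared distance between the outer points is 4, but the path through (1,0) costs only 1 + 1.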


Range and NN Queries

  • Range: given a query object sq and a maximum search distance rq: Rquery(sq, rq) = {si | si ∈ S: d(si, sq) ≤ rq}

  • NN: Given a query object sq ∈ S: NNquery(sq)= {sn ∈ S | ∀si ∈ S: d(sn,sq) ≤ d(si,sq)}
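A plain sequential scan can answer both query types by computing one distance per object. That N-distance baseline is the cost the Omni-family tries to reduce. The following is a minimal sketch with hypothetical function names. It uses `math.dist` (Euclidean, Python 3.8+) as the example metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def range_query(S: Sequence[Point], sq: Point, rq: float, d: Distance) -> List[Point]:
    """Rquery(sq, rq): every si in S with d(si, sq) <= rq (one distance per object)."""
    return [si for si in S if d(si, sq) <= rq]


def nn_query(S: Sequence[Point], sq: Point, d: Distance) -> List[Point]:
    """NNquery(sq): every sn in S whose distance to sq is minimal (ties are all kept)."""
    if not S:
        return []
    dists = [d(si, sq) for si in S]
    best = min(dists)
    return [si for si, dist in zip(S, dists) if dist == best]


S = [(0, 0), (1, 0), (0, 2), (5, 5)]
print(range_query(S, (0, 0), 1.5, math.dist))  # [(0, 0), (1, 0)]
print(nn_query(S, (0, 1), math.dist))          # [(0, 0), (0, 2)]
```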


Current solutions

  • Metric tree of Uhlmann

  • Vantage-point tree

  • Generalized hyper-plane tree

  • Multi-vantage point tree

  • Geometric Near-neighbor Access Tree (GNAT)

  • The M-tree


Intrinsic Dimensionality

  • Some approaches assume that the embedding dimensionality of a dataset determines the behavior of a query.

  • Datasets can inhabit only a small portion of their embedding space.

  • Intrinsic dimensionality gives more precise selectivity estimates.

  • Use the correlation fractal dimension D2 as an approximation of the intrinsic dimension.
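Pair counting is one common way to estimate D2. For a range of radii r, count PC(r), the number of object pairs within distance r. D2 is then the slope of log PC(r) plotted against log r. The sketch below assumes that estimator with a least-squares fit; the function name is illustrative. The caller chooses the radii, and that choice strongly affects the estimate.

```python
import bisect
import itertools
import math


def correlation_dimension(points, radii, d=math.dist):
    """Estimate D2 as the least-squares slope of log PC(r) versus log r."""
    # All pairwise distances, sorted so PC(r) is a binary search per radius.
    pair_dists = sorted(d(a, b) for a, b in itertools.combinations(points, 2))
    xs, ys = [], []
    for r in radii:
        pc = bisect.bisect_right(pair_dists, r)  # number of pairs with distance <= r
        if pc > 0:
            xs.append(math.log(r))
            ys.append(math.log(pc))
    if len(xs) < 2:
        raise ValueError("need at least two radii with non-zero pair counts")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


n = 200
line = [(i / n, 2 * i / n) for i in range(n)]    # points on a line embedded in 2-D
radii = [0.05 * 2 ** (i / 2) for i in range(5)]  # 0.05 up to 0.2
print(correlation_dimension(line, radii))        # close to 1, not 2
```

The example shows why the embedding dimension (2) misleads: the points occupy a one-dimensional subset of the plane.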


Omni-concepts

  • Omni-foci base (F): given a metric space M, F = {f1, f2, …, fl | fk ∈ S, fk ≠ fj, l ≤ N}

  • Omni-coordinates (Ci) of an object si: Ci = {⟨fk, d(fk, si)⟩ | fk ∈ F}

  • mbOr (minimum bounding Omni-region): given F and a collection of objects A = {x1, x2, …, xn} ⊂ S, the mbOr is the intersection of the metric intervals, $R_A = \bigcap_{i=1}^{l} I_i$, where $I_i = [\min_{1 \le j \le n} d(x_j, f_i), \max_{1 \le j \le n} d(x_j, f_i)]$ for $1 \le i \le l$.
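In code, an object's Omni-coordinates are a tuple of its distances to the foci. The mbOr then becomes an axis-aligned box in that coordinate space, with one [min, max] interval per focus. The sketch below uses hypothetical helper names and `math.dist` as the example metric.

```python
import math


def omni_coordinates(s, foci, d):
    """Ci: distance from object s to every focus, in the order of the foci base F."""
    return tuple(d(fk, s) for fk in foci)


def mbor(objects, foci, d):
    """mbOr of a collection A: one interval [min_j d(xj, fi), max_j d(xj, fi)] per focus."""
    coords = [omni_coordinates(x, foci, d) for x in objects]
    return [(min(c[i] for c in coords), max(c[i] for c in coords))
            for i in range(len(foci))]


def in_mbor(ci, region):
    """True if Omni-coordinates ci fall inside every interval of the region."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(ci, region))


foci = [(0, 0), (10, 0)]
A = [(3, 4), (6, 8)]
region = mbor(A, foci, math.dist)
print(region)
print(in_mbor(omni_coordinates((4.5, 6), foci, math.dist), region))  # True
```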


[Figure: mbOr of a set of objects, bounded by distance intervals to foci f1 and f2 (labels df1a, df1b, df2a, df2b)]


Cardinality of F

  • A good cardinality for F lies between ⌈D2⌉ + 1 and 2·⌈D2⌉ + 1, where ⌈D2⌉ is the smallest integer not less than the intrinsic dimension.
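The rule is simple arithmetic. For example, D2 = 2.3 gives ⌈D2⌉ = 3, which suggests between 4 and 7 foci. As a one-function sketch (the function name is illustrative):

```python
import math


def foci_count_range(d2):
    """Suggested bounds for |F| given the correlation fractal dimension D2."""
    c = math.ceil(d2)  # smallest integer >= D2
    return c + 1, 2 * c + 1


print(foci_count_range(2.3))  # (4, 7)
```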


How to choose foci: HF-Algorithm

[Figure: worked example of the HF-algorithm on objects s1 to s6, with pairwise distances labeled on the edges]


HF-Algorithm

  • The HF-algorithm is practical: O(N)

  • It requires l·N distance calculations

  • Finding the best set of foci by exhaustive search costs O(N!/(N−l)!)
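The slides show the HF-algorithm only as a figure. The sketch below follows one reading of the procedure, and its exact selection rule is an assumption to check against the original paper:

1. Pick a random object.
2. Take the object farthest from it as f1.
3. Take the object farthest from f1 as f2.
4. Repeatedly add the object whose distances to the already-chosen foci are closest to the edge length d(f1, f2).

Caching one distance column per focus keeps the cost at l·N distance calculations. The function name and `seed` parameter are hypothetical.

```python
import math
import random


def hf_foci(S, l, d, seed=0):
    """Choose l foci from S with an HF-style farthest-object heuristic (sketch)."""
    n = len(S)
    if not 2 <= l <= n:
        raise ValueError("need 2 <= l <= len(S)")
    start = random.Random(seed).randrange(n)                # random starting object
    f1 = max(range(n), key=lambda i: d(S[start], S[i]))      # farthest from the start
    cols = [[d(S[f1], S[i]) for i in range(n)]]              # cached distances per focus
    f2 = max((i for i in range(n) if i != f1), key=lambda i: cols[0][i])  # farthest from f1
    edge = cols[0][f2]                                       # reference length d(f1, f2)
    foci = [f1, f2]
    if l > 2:
        cols.append([d(S[f2], S[i]) for i in range(n)])
    while len(foci) < l:
        # Next focus: the object whose distances to all chosen foci are closest to `edge`.
        rest = [i for i in range(n) if i not in foci]
        nxt = min(rest, key=lambda i: sum(abs(edge - col[i]) for col in cols))
        foci.append(nxt)
        if len(foci) < l:  # a column is only needed if another focus will be chosen
            cols.append([d(S[nxt], S[i]) for i in range(n)])
    return [S[i] for i in foci]


square = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
print(hf_foci(square, 3, math.dist))  # two opposite corners, then a third corner
```

The computation counts as follows: N distances to find f1, then one N-length column for each focus that still has a successor, for a total of l·N.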


Omni-sequential

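  • Omni-sequential

    Calculate Ci for every object in the dataset

    Precede each distance calculation d(si, sq) with:

    for fk ∈ F

    if | dfk(si) − dfk(sq) | > rq

    then skip the distance calculation (by the triangle inequality, d(si, sq) > rq)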
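The following Python sketch implements the Omni-sequential range query and assumes the Omni-coordinates have already been computed. It also returns the number of full distance calculations performed, so the effect of pruning is visible. The function name is illustrative.

```python
import math


def omni_sequential_range(S, coords, foci, sq, rq, d):
    """Range query over a sequential file, skipping d(si, sq) when a focus rules si out.

    coords[i] holds the precomputed Omni-coordinates of S[i] (its distance to each focus).
    Returns (answers, number of distance calculations performed for this query).
    """
    cq = [d(fk, sq) for fk in foci]  # Omni-coordinates of the query: |F| calculations
    answers, computed = [], len(foci)
    for si, ci in zip(S, coords):
        # Triangle inequality: d(si, sq) >= |d(fk, si) - d(fk, sq)| for every focus fk
        if any(abs(c - q) > rq for c, q in zip(ci, cq)):
            continue  # pruned without computing d(si, sq)
        computed += 1
        if d(si, sq) <= rq:
            answers.append(si)
    return answers, computed


grid = [(x, y) for x in range(10) for y in range(10)]
foci = [(0, 0), (9, 0)]
coords = [tuple(math.dist(fk, s) for fk in foci) for s in grid]
answers, computed = omni_sequential_range(grid, coords, foci, (2, 2), 1.5, math.dist)
print(len(answers), computed)  # 9 answers; computed stays well below 100 + 2
```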


OmniB+-tree

  • Store the Omni-coordinates Ci in l B+-trees, one per focus, keyed on dfk(si)

  • For a range query, each B+-tree returns a subset Ik ⊂ S; together these subsets generate the query's mbOr.

  • Ik holds the objects whose distance to fk lies between dfk(sq) − rq and dfk(sq) + rq

  • Compute the actual distance from sq to each object in the intersection of all Ik.
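The sketch below imitates the OmniB+-tree in memory. Each focus gets a sorted list searched with `bisect`, which stands in for the B+-tree range scan; it is not a disk-based B+-tree, and the class name is hypothetical. Each list yields one set Ik. Only objects in the intersection of all Ik are compared with sq.

```python
import bisect
import math


class SortedListOmniIndex:
    """Stand-in for the OmniB+-tree: one sorted (distance, id) list per focus."""

    def __init__(self, S, foci, d):
        if not foci:
            raise ValueError("need at least one focus")
        self.S, self.foci, self.d = S, foci, d
        # One "B+-tree" per focus, keyed on dfk(si).
        self.lists = [sorted((d(fk, s), i) for i, s in enumerate(S)) for fk in foci]

    def range_query(self, sq, rq):
        candidates = None
        for fk, entries in zip(self.foci, self.lists):
            dq = self.d(fk, sq)
            lo = bisect.bisect_left(entries, (dq - rq, -1))
            hi = bisect.bisect_right(entries, (dq + rq, len(self.S)))
            ik = {i for _, i in entries[lo:hi]}  # Ik: dfk(si) within [dq - rq, dq + rq]
            candidates = ik if candidates is None else candidates & ik
        # Real distance calculations only for objects in the intersection of all Ik.
        return [self.S[i] for i in sorted(candidates) if self.d(self.S[i], sq) <= rq]


grid = [(x, y) for x in range(10) for y in range(10)]
index = SortedListOmniIndex(grid, [(0, 0), (9, 0)], math.dist)
print(index.range_query((2, 2), 1.5))
```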


OmniR-tree

  • The algorithms for insertion, node partitioning, and range queries are the same as in the R-tree.

  • k-NN queries require the NN algorithm used in metric trees: a depth-first search is first performed to find k candidates, then the search radius shrinks whenever the farthest candidate is replaced, until every entry that overlaps the query radius has been tested.
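The shrinking-radius idea works without the tree. The sketch below scans objects sequentially and keeps the k best candidates in a max-heap. It skips any object whose Omni-coordinate lower bound, max over k of |dfk(si) − dfk(sq)|, exceeds the distance to the current farthest candidate. This is a sequential stand-in, not the depth-first R-tree traversal the slide describes, and the function name is illustrative.

```python
import heapq
import math


def omni_knn(S, coords, foci, sq, k, d):
    """k nearest neighbors of sq, pruning with Omni-coordinates and a shrinking radius."""
    if k < 1:
        raise ValueError("k must be at least 1")
    cq = [d(fk, sq) for fk in foci]
    heap = []  # max-heap via negated distances: (-d(si, sq), i) for the k best so far
    for i, (si, ci) in enumerate(zip(S, coords)):
        if len(heap) == k:
            radius = -heap[0][0]  # current search radius: distance to the farthest candidate
            lower_bound = max((abs(c - q) for c, q in zip(ci, cq)), default=0.0)
            if lower_bound > radius:
                continue  # d(si, sq) >= lower_bound > radius, so si cannot enter the answer
        dist = d(si, sq)
        if len(heap) < k:
            heapq.heappush(heap, (-dist, i))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, i))  # farthest replaced, so the radius shrinks
    return [S[i] for _, i in sorted(heap, key=lambda e: (-e[0], e[1]))]


grid = [(x, y) for x in range(10) for y in range(10)]
foci = [(0, 0), (9, 0)]
coords = [tuple(math.dist(fk, s) for fk in foci) for s in grid]
print(omni_knn(grid, coords, foci, (2.2, 2.1), 3, math.dist))  # [(2, 2), (3, 2), (2, 3)]
```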


OmniR-tree

  • Requires an R-tree to store the Omni-coordinates Ci

  • Requires a paged, direct-access file to store the objects of the dataset

  • When an R-tree leaf is retrieved, the actual distance is calculated only for the objects whose Ci qualify.
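In the OmniR-tree, a range query becomes a box in Omni-coordinate space, with one interval [dfk(sq) − rq, dfk(sq) + rq] per focus. The sketch below assumes node MBRs and leaf entries are plain Python lists and shows the two tests involved:

- whether a node's MBR overlaps the query box, and
- which leaf entries qualify before the actual distance is computed.

It does not model the R-tree itself or the paged object file; in-memory objects stand in for page reads. All names are hypothetical.

```python
import math


def query_box(sq, rq, foci, d):
    """Range query region in Omni-coordinate space: one interval per focus."""
    box = []
    for fk in foci:
        dq = d(fk, sq)
        box.append((dq - rq, dq + rq))
    return box


def mbr_overlaps(mbr, box):
    """Visit an R-tree node only if its MBR (list of (lo, hi)) overlaps the query box."""
    return all(lo <= qhi and qlo <= hi for (lo, hi), (qlo, qhi) in zip(mbr, box))


def scan_leaf(entries, box, sq, rq, d):
    """entries: (Ci, obj) pairs; obj stands in for reading the object from the paged file."""
    result = []
    for ci, obj in entries:
        if all(qlo <= c <= qhi for c, (qlo, qhi) in zip(ci, box)):  # Ci qualifies
            if d(obj, sq) <= rq:                                     # actual distance
                result.append(obj)
    return result


foci = [(0, 0), (10, 0)]
objs = [(3, 4), (3.5, 4), (0, 0), (3, 6)]
leaf = [(tuple(math.dist(fk, o) for fk in foci), o) for o in objs]
box = query_box((3, 4), 1.0, foci, math.dist)
print(scan_leaf(leaf, box, (3, 4), 1.0, math.dist))  # [(3, 4), (3.5, 4)]
```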


The graphs show that the intrinsic dimensionality of the data is a good reference for the number of foci.


Review

  • Reduce the number of distance calculations

  • The Omni-Family

    • Select a set of foci

    • Describe every other object by its distances to the foci

    • The foci allow many distance calculations to be pruned

    • Scalable

