
# Similarity Search without Tears: the OMNI-Family of All-Purpose Access Methods



Michael Kelleher

Kiyotaka Iwataki

The Department of Computer and Information Science and Engineering, University of Florida

Outline

• Problem/Solution

• Background

• The Omni-concept

• Members of the Omni-family

• Experimental Results

Problem

• Diverse and complex data

• How to search

• Expensive distance calculations

Solution

• Reduce the number of distance calculations

• The Omni-Concept/Family

• Select a set of foci

• Gauge all other objects with their distance from this set

• The foci increase the pruning of distance calculations

• Scalable

Background: Metric Spaces

• Given a set of objects S = {s1, s2, s3, …, sn} from a domain 𝕊, a distance function d() is a metric if it has the following properties:

• Symmetry: d(s1,s2) = d(s2,s1)

• Non-negativity: 0 < d(s1,s2) < ∞ for s1 ≠ s2, and d(s1,s1) = 0

• Triangle inequality: d(s1,s3) ≤ d(s1,s2) + d(s2,s3)

• A metric space is a pair M = <S,d()>

• Spatial datasets following an Lp distance function are special cases of metric spaces.

Range and NN Queries

• Range: Given a query object sq, and a max search distance rq: Rquery(sq,rq)= {si | si ∈ S: d(si,sq) ≤ rq}

• NN: Given a query object sq ∈ S: NNquery(sq)= {sn ∈ S | ∀si ∈ S: d(sn,sq) ≤ d(si,sq)}
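
The two query types can be written directly from their definitions. The following is a minimal brute-force sketch in Python, assuming 2-D points and the Euclidean distance as the metric; the function names `range_query` and `nn_query` are illustrative and do not come from the slides.

```python
from math import dist  # Euclidean (L2) distance, Python 3.8+

def range_query(S, d, sq, rq):
    """Return every object of S within distance rq of the query object sq."""
    return [si for si in S if d(si, sq) <= rq]

def nn_query(S, d, sq):
    """Return one object of S closest to sq (ties go to the earliest object)."""
    return min(S, key=lambda si: d(si, sq))

# Illustrative data, not taken from the presentation.
S = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
sq = (0.0, 1.0)
print(range_query(S, dist, sq, 1.5))
print(nn_query(S, dist, sq))
```

Both functions call d() once per object, which is the cost the Omni-family sets out to reduce.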

Current solutions

• Metric tree of Uhlmann

• Vantage-point tree

• Generalized hyper-plane tree

• Multi-vantage point tree

• Geometric Near-neighbor Access Tree (GNAT)

• The M-tree

Intrinsic Dimensionality

• Some methods assume that the embedding dimensionality of a dataset determines how a query behaves.

• Datasets can inhabit only a small portion of the embedding space.

• The intrinsic dimensionality gives a more precise estimate of selectivity.

• Use the correlation fractal dimension D2 as an approximation of the intrinsic dimension.
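
One common way to estimate D2 is to count the pairs of objects within distance r for several radii and take the slope of the log-log plot. The sketch below assumes that approach; the slides do not say how D2 was computed, and the function names and the choice of radii are illustrative. The radii must be large enough that at least one pair falls inside the smallest one.

```python
from math import dist, log

def pair_count(S, d, r):
    """Number of unordered pairs of objects at distance <= r."""
    return sum(1 for i in range(len(S)) for j in range(i + 1, len(S))
               if d(S[i], S[j]) <= r)

def correlation_dimension(S, d, radii):
    """Least-squares slope of log(pair_count) against log(r)."""
    xs = [log(r) for r in radii]
    ys = [log(pair_count(S, d, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# 200 points on a straight line embedded in 3-D space (illustrative data):
# the embedding dimension is 3, but the estimate should come out close to 1.
S = [(i / 200, i / 200, i / 200) for i in range(200)]
d2 = correlation_dimension(S, dist, [0.05, 0.1, 0.2, 0.4])
print(round(d2, 2))
```

The example illustrates the point made above: the data occupies a one-dimensional portion of a three-dimensional embedding space.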

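Omni-concepts

• Omni-foci base (F): Given a metric space M = <S,d()>, the Omni-foci base is a set F = {f1, f2, …, fl | fk ∈ S, fk ≠ fj for k ≠ j, l ≤ N}, where each fk is a focus and N is the number of objects in S.

• Omni-coordinates (Ci): the Omni-coordinates of an object si are Ci = {<fk, d(fk,si)>, for all fk ∈ F}

• mbOr (minimum bounding Omni region): Given F and a collection of objects A = {x1, x2, …, xn} ⊂ S, the mbOr is the intersection of the metric intervals RA = I1 ∩ I2 ∩ … ∩ Il, where Ii = [min(d(xj,fi)), max(d(xj,fi))], with the minimum and maximum taken over 1 ≤ j ≤ n, for 1 ≤ i ≤ l.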
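
As a small illustration of these definitions, the sketch below computes Omni-coordinates and the per-focus intervals of an mbOr. It assumes 2-D points with the Euclidean distance, stores the coordinates as a plain list in focus order, and uses illustrative names (`omni_coordinates`, `mbor`) that are not from the slides.

```python
from math import dist

def omni_coordinates(F, d, s):
    """Distances from object s to every focus in F, in focus order."""
    return [d(f, s) for f in F]

def mbor(F, d, A):
    """One (min, max) distance interval per focus over the objects in A."""
    coords = [omni_coordinates(F, d, x) for x in A]
    return [(min(col), max(col)) for col in zip(*coords)]

# Illustrative foci and objects, not taken from the presentation.
F = [(0.0, 0.0), (10.0, 0.0)]
A = [(3.0, 4.0), (6.0, 8.0), (6.0, 0.0)]
print(omni_coordinates(F, dist, A[0]))
print(mbor(F, dist, A))
```

An object can lie inside the region only if each of its Omni-coordinates falls within the interval of the corresponding focus.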

[Figure: a region bounded by the distances df1a and df1b from focus f1 and df2a and df2b from focus f2]

Cardinality of F

• A good cardinality for F lies between ⌈D2⌉ + 1 and 2·⌈D2⌉ + 1, where ⌈D2⌉ is the intrinsic dimension D2 rounded up to the next integer.
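
As a worked example of this rule, the short sketch below evaluates both bounds. The function name and the value D2 = 2.3 are illustrative assumptions, not figures from the presentation.

```python
from math import ceil

def foci_count_range(d2):
    """Suggested (lower, upper) bounds on the number of foci for a given D2."""
    base = ceil(d2)
    return base + 1, 2 * base + 1

print(foci_count_range(2.3))  # (4, 7): ceil(2.3) = 3, so 3 + 1 and 2*3 + 1
```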

How to choose foci: HF-Algorithm

[Figure: six objects s1 to s6 with labeled distances (2, 3, 3, 5, 5.5, 6, 6, 7, 10), illustrating how foci are chosen]

HF-Algorithm

• The HF-Algorithm is practical: O(N)

• Requires l*N distance calculations

• An algorithm that finds the best foci would be O(N!/(N-l)!)
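
The transcript does not list the steps of the HF-Algorithm, so the sketch below is an assumption: it follows the farthest-point heuristic usually described for it. Start from a random object, take the object farthest from it as f1 and the object farthest from f1 as f2, then repeatedly add the object whose distances to the chosen foci are closest to the edge d(f1,f2). The name `hf_foci` and the `seed` parameter are hypothetical, and l is assumed not to exceed the number of distinct objects.

```python
import random
from math import dist

def hf_foci(S, d, l, seed=0):
    """Pick l foci from S with a farthest-point heuristic (assumed HF steps)."""
    rng = random.Random(seed)
    start = rng.choice(S)
    f1 = max(S, key=lambda s: d(start, s))   # farthest from a random object
    f2 = max(S, key=lambda s: d(f1, s))      # farthest from f1
    foci = [f1, f2][:l]
    edge = d(f1, f2)
    # error[i] accumulates |edge - d(f, S[i])| over the foci chosen so far
    error = [abs(edge - d(f1, s)) + abs(edge - d(f2, s)) for s in S]
    while len(foci) < l:
        candidates = [i for i, s in enumerate(S) if s not in foci]
        best = min(candidates, key=lambda i: error[i])
        foci.append(S[best])
        for i, s in enumerate(S):
            error[i] += abs(edge - d(S[best], s))
    return foci

# Illustrative data: three near-equidistant corners plus three interior points.
S = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66), (5.0, 3.0), (4.0, 2.0), (6.0, 2.0)]
print(hf_foci(S, dist, 3))
```

For brevity the sketch recomputes some distances; caching the distance from each focus to each object would keep the total near the l*N distance calculations stated above.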

Omni-sequential

• Omni-sequential

Calculate Ci for every object si, and the Omni-coordinates of the query object sq

Precede each distance calculation d(si,sq) by

for each fk ∈ F

if |dfk(si) - dfk(sq)| > rq

then skip the distance calculation for si

where dfk(s) = d(fk,s)
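
The pruning test can be sketched as a sequential range query in Python. This is a minimal illustration, assuming 2-D points, the Euclidean distance, precomputed Omni-coordinates held in a list parallel to S, and dfk(s) read as d(fk, s); the function name and the `computed` counter are illustrative.

```python
from math import dist

def omni_sequential_range(S, coords, F, d, sq, rq):
    """Range query that skips d(si, sq) whenever one focus rules si out."""
    q = [d(f, sq) for f in F]                # Omni-coordinates of the query
    hits, computed = [], 0
    for si, ci in zip(S, coords):
        if any(abs(c - qc) > rq for c, qc in zip(ci, q)):
            continue                         # pruned by the triangle inequality
        computed += 1                        # a real distance calculation
        if d(si, sq) <= rq:
            hits.append(si)
    return hits, computed

# Illustrative data, not taken from the presentation.
F = [(0.0, 0.0), (10.0, 0.0)]
S = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (5.0, 5.0), (8.0, 1.0)]
coords = [[dist(f, s) for f in F] for s in S]  # computed once, at load time
hits, computed = omni_sequential_range(S, coords, F, dist, (1.0, 2.0), 1.5)
print(hits, computed)
```

The test is safe because the triangle inequality gives |d(fk,si) - d(fk,sq)| ≤ d(si,sq), so an object that fails it for any focus cannot be within rq of the query.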

OmniB+-tree

• Store the Ci in l B+-trees, one for each focus

• Subsets Ik ⊂ S are retrieved from the corresponding B+-trees and used to generate the mbOr.

• Ik is the set of objects si with dfk(sq) - rq ≤ dfk(si) ≤ dfk(sq) + rq

• Calculate the distance from sq to each object in the intersection of the Ik.
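
The same query flow can be imitated in memory. The sketch below is not a B+-tree implementation: it substitutes one sorted Python list per focus, searched with `bisect`, to show how the per-focus candidate sets are retrieved, intersected, and then verified. All names are illustrative, and at least one focus is assumed.

```python
from bisect import bisect_left, bisect_right
from math import dist

def build_indexes(S, F, d):
    """One sorted list of (distance to focus, object id) per focus."""
    return [sorted((d(f, s), i) for i, s in enumerate(S)) for f in F]

def omni_index_range(S, indexes, F, d, sq, rq):
    """Intersect the per-focus candidate sets Ik, then verify with d()."""
    candidates = None
    for f, index in zip(F, indexes):
        dq = d(f, sq)
        lo = bisect_left(index, (dq - rq, -1))        # first entry >= dq - rq
        hi = bisect_right(index, (dq + rq, len(S)))   # past last entry <= dq + rq
        ik = {i for _, i in index[lo:hi]}
        candidates = ik if candidates is None else candidates & ik
    return [S[i] for i in sorted(candidates) if d(S[i], sq) <= rq]

# Illustrative data, not taken from the presentation.
F = [(0.0, 0.0), (10.0, 0.0)]
S = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (5.0, 5.0), (8.0, 1.0)]
indexes = build_indexes(S, F, dist)
print(omni_index_range(S, indexes, F, dist, (1.0, 2.0), 1.5))
```

Only the objects that survive the intersection reach the final distance check.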

OmniR-tree

• The algorithms for insertion, node partitioning, and range queries are the same as in the R-tree.

• kNN queries require the nearest-neighbor algorithm used in metric trees. A depth-first search is performed first to find k candidates. The search then keeps reducing the query radius whenever the farthest neighbor is replaced, until every entry that overlaps the query radius has been tested.
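
The shrinking-radius idea can be illustrated without a tree. The sketch below is a simplification that scans a flat list instead of traversing R-tree nodes: it keeps the k best candidates in a heap, uses the distance of the current k-th neighbor as the radius, and skips any object whose Omni-coordinates already place it outside that radius. The names are illustrative; k ≥ 1 and at least one focus are assumed.

```python
import heapq
from math import dist, inf

def omni_knn(S, coords, F, d, sq, k):
    """k nearest neighbors of sq, pruning with Omni-coordinates."""
    q = [d(f, sq) for f in F]
    heap = []        # max-heap via negated distances: (-distance, index)
    radius = inf     # shrinks once k candidates have been collected
    for i, (si, ci) in enumerate(zip(S, coords)):
        lower = max(abs(c - qc) for c, qc in zip(ci, q))
        if lower > radius:
            continue                     # cannot beat the current k-th neighbor
        dist_i = d(si, sq)
        if len(heap) < k:
            heapq.heappush(heap, (-dist_i, i))
        elif dist_i < -heap[0][0]:
            heapq.heapreplace(heap, (-dist_i, i))  # farthest neighbor replaced
        if len(heap) == k:
            radius = -heap[0][0]
    return [S[i] for _, i in sorted((-nd, i) for nd, i in heap)]

# Illustrative data, not taken from the presentation.
F = [(0.0, 0.0), (10.0, 0.0)]
S = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (5.0, 5.0), (8.0, 1.0)]
coords = [[dist(f, s) for f in F] for s in S]
print(omni_knn(S, coords, F, dist, (1.0, 2.0), 2))
```

The result is ordered from nearest to farthest.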

OmniR-tree

• Requires an R-tree to store the Ci

• Requires a paged direct-access file to store the objects of the dataset.

• When a leaf of the R-tree is retrieved and the Ci stored in that node qualify an object, the actual distance to that object is calculated.

The graphs indicate that the intrinsic dimensionality of the data is a good reference for the number of foci.

Review

• Reduce the number of distance calculations

• The Omni-Family

• Select a set of foci

• Gauge all other objects with their distance from this set

• The foci increase the pruning of distance calculations

• Scalable