
Access Structures for Angular Similarity Queries

Access Structures for Angular Similarity Queries. Tan Apaydin and Hakan Ferhatosmanoglu. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 18, NO. 11, NOVEMBER 2006.


Presentation Transcript


  1. Access Structures for Angular Similarity Queries Tan Apaydin and Hakan Ferhatosmanoglu IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 18, NO. 11, NOVEMBER 2006

  2. Motivation • Angular similarity measures have been utilized by several database applications to define semantic similarity between various data types such as text documents, time-series, images, and scientific data. • Because of a mismatch of geometry, current techniques are either inapplicable or perform poorly. • This brings up the need for effective indexing methods for angular similarity queries.

  3. We propose access structures to enable efficient execution of queries seeking angular similarity. • We explore quantization-based indexing, which scales well with the dimensionality, and propose techniques that are better suited to angular measures than the conventional techniques.

  4. Vector Approximation file (VA-file)

  5. Round-robin manner • The approach would slice the major pyramids in a round-robin manner. • For instance, the $x_1 = 1$ major pyramid is sliced according to $x_2$, $x_2 = 1$ according to $x_3$, and $x_3 = 1$ according to $x_1$ (in a cyclic manner)

  6. Equi-populated Equi-volumed

  7. A particular point $P(p_1, \ldots, p_d)$ is contained in major pyramid $x_j = 1$, where $j$ is the dimension with the greatest corresponding value, i.e., $p_j \ge p_r$ for all $r$. • For instance, in three dimensions, P(0.7, 0.3, 0.2) will be in “x1 = 1 major pyramid” since 0.7 ($x_1$) is greater than both 0.3 and 0.2.
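
The assignment rule on this slide can be sketched in a few lines of Python. The function name `major_pyramid`, the 0-based indexing, and the tie-breaking rule are assumptions of this sketch, not something the slides specify.

```python
def major_pyramid(point):
    """Return the 0-based index j of the largest coordinate of `point`.

    The point then lies in the "x_(j+1) = 1" major pyramid. Ties are broken
    in favour of the lowest index, which is an assumption of this sketch.
    """
    if not point:
        raise ValueError("point must have at least one coordinate")
    return max(range(len(point)), key=lambda j: point[j])


# The slide's example: 0.7 is the largest value, so the index is 0,
# which corresponds to the "x1 = 1" major pyramid.
print(major_pyramid((0.7, 0.3, 0.2)))  # 0
```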

  8. Filtering Step • The easiest way to decide whether an approximation intersects the range query space is to look at the boundaries of the unit square that do not pass through the origin.

  9. [Figure: query Q with its min and max values]

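  10. If a feature vector is represented as $x(x_1, x_2, \ldots, x_d)$ and the query as $q(q_1, q_2, \ldots, q_d)$, the cosine angle is defined by the following formula: $\cos\theta = \frac{\sum_{i=1}^{d} x_i q_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} q_i^2}}$ • If we assume the query point to be normalized, then the formula can be simplified to $\cos\theta = \frac{\sum_{i=1}^{d} x_i u_i}{\sqrt{\sum_{i=1}^{d} x_i^2}}$ • where $U(u_1, u_2, \ldots, u_d)$ is the unit normalized query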
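
As a minimal sketch of the two forms of the cosine formula, the Python below computes the general expression and the simplified one for a unit-length query, and checks that they agree. The function names and the sample vectors are illustrative assumptions; vectors are assumed to be non-zero.

```python
import math


def normalize(q):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(qi * qi for qi in q))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [qi / norm for qi in q]


def cosine(x, q):
    """General form: dot(x, q) / (|x| * |q|)."""
    dot = sum(xi * qi for xi, qi in zip(x, q))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_x * norm_q)


def cosine_unit_query(x, u):
    """Simplified form for a unit-length query u: dot(x, u) / |x|."""
    dot = sum(xi * ui for xi, ui in zip(x, u))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    return dot / norm_x


x = [0.7, 0.3, 0.2]   # sample feature vector (made up for illustration)
q = [2.0, 1.0, 0.5]   # sample query (made up for illustration)
u = normalize(q)
# Both forms should agree up to floating-point rounding.
print(math.isclose(cosine(x, q), cosine_unit_query(x, u)))
```

Normalizing the query once up front removes one square root and one division from every per-vector comparison, which is the point of the simplification.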

  11. Let Q be a three-dimensional query point and $u = (u_1, u_2, u_3)$ be the unit vector which is the normalization of the query vector. • The expression for an equivalence conic surface in angular space is the following equation: $\cos\theta = \frac{x_1 u_1 + x_2 u_2 + x_3 u_3}{\sqrt{x_1^2 + x_2^2 + x_3^2}}$

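  12. Lagrange’s multipliers approach • For $x_1 = 1$, the closed form of the ellipse equation is $(u_1 + u_2 x_2 + u_3 x_3)^2 = \cos^2\theta \, (1 + x_2^2 + x_3^2)$ • To maximize or minimize $f(x_2, x_3)$ subject to the constraint $g(x_2, x_3) = 0$, the following system of equations is solved: $\nabla f(x_2, x_3) = \lambda \nabla g(x_2, x_3)$, $g(x_2, x_3) = 0$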
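
The following is a hedged worked example of the Lagrange approach, not the authors' implementation. It assumes a three-dimensional unit query $u = (u_1, u_2, u_3)$ with $u_1 > 0$, an angular range $\theta$, and the face $x_1 = 1$, so that the constraint is $g(x_2, x_3) = (u_1 + u_2 x_2 + u_3 x_3)^2 - \cos^2\theta\,(1 + x_2^2 + x_3^2) = 0$. Taking $f(x_2, x_3) = x_2$, the condition $\nabla f = \lambda \nabla g$ forces $\partial g / \partial x_3 = 0$, which reduces the system to a quadratic in $x_2$. The function names are hypothetical.

```python
import math


def x2_extremes_on_face(u, theta):
    """Min and max of x2 on the conic cut from the plane x1 = 1.

    u     -- unit query vector (u1, u2, u3), assumed already normalized
    theta -- angular range in radians

    With f = x2, grad f = lambda * grad g forces dg/dx3 = 0, so
    x3 = u3 * (u1 + u2*x2) / (cos(theta)**2 - u3**2). Substituting into
    g = 0 leaves (u1 + u2*x2)**2 = (cos(theta)**2 - u3**2) * (1 + x2**2),
    i.e. the quadratic a*x2**2 + b*x2 + c = 0 solved below.
    """
    u1, u2, u3 = u
    c2 = math.cos(theta) ** 2
    a = u2 * u2 + u3 * u3 - c2
    b = 2.0 * u1 * u2
    c = u1 * u1 + u3 * u3 - c2
    if u1 <= 0.0 or a >= 0.0:
        # The cut is not a closed ellipse on this face; not handled here.
        raise ValueError("conic on x1 = 1 is not an ellipse for this query")
    # Guard against a tiny negative discriminant caused by rounding.
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return tuple(sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))))


def x3_at(u, theta, x2):
    """x3 coordinate of an extreme point, from the condition dg/dx3 = 0."""
    u1, u2, u3 = u
    c2 = math.cos(theta) ** 2
    return u3 * (u1 + u2 * x2) / (c2 - u3 * u3)


# Query pointing along the x1 axis: the cut is a circle of radius tan(theta),
# so the extremes of x2 are -tan(theta) and +tan(theta).
lo, hi = x2_extremes_on_face((1.0, 0.0, 0.0), 0.3)
print(lo, hi)
```

The extremes of $x_3$ follow by symmetry, swapping the roles of the second and third coordinates.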

  13. To compute the extreme values for on , take f() = • To compute the extreme values for on , take f() = • To compute the extreme values for on , take f() = ,) = (+ ) ,) = (+ ) ,) = (+ )

  14. Filter Approximations • Once we have the min-max values, we can use them to retrieve the relevant approximations. • These are the approximations in the specified range neighborhood of the query.

  15. Identifying feature vectors • In the pruning step, we compute the angular distance of every candidate point to the query point and, if a point is within the given range, we output that point in the result set.
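
A minimal sketch of this pruning step, assuming the candidates have already been produced by the filtering step and that the range is given as an angle `theta` in radians. The names `angular_distance` and `refine`, and the sample data, are illustrative.

```python
import math


def angular_distance(x, q):
    """Angle in radians between two non-zero vectors."""
    dot = sum(xi * qi for xi, qi in zip(x, q))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_x * norm_q)))
    return math.acos(cos_angle)


def refine(candidates, query, theta):
    """Keep only the candidates whose angle to the query is at most theta."""
    return [p for p in candidates if angular_distance(p, query) <= theta]


# Made-up candidate set: the second point is far from the query direction
# and is dropped; the other two are within 0.2 radians and are kept.
candidates = [(0.7, 0.3, 0.2), (0.1, 0.9, 0.1), (0.6, 0.35, 0.25)]
result = refine(candidates, query=(0.7, 0.3, 0.2), theta=0.2)
print(result)
```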

  16. CONE-SHELL QUANTIZER (CS-Q) • Uses cone partitions, rather than pyramids, and is organized as shells instead of the sweep approach followed by AS-Q.

  17. Angular Approximations based on Equal Populations • $N$ is the number of data points • $O$ is the reference point • $S_a$ is the set of all approximations • $S_a^i$ is the $i$th approximation. • 1) For each data point $P_k$, $1 \le k \le N$, calculate the angular distance between $P_k$ and $O$. • 2) Sort the data points in nondecreasing order based on their angular distances to $O$. • 3) Assume $t$ is the given population for each approximation. Assign the first $t$ points in sorted order to $S_a^1$, the second $t$ points to $S_a^2$, and so on.
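
The three steps above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, points are plain tuples, and the last group is allowed to hold fewer than `t` points when the data size is not a multiple of `t` (the slide does not say how a remainder is handled).

```python
import math


def angular_distance(x, ref):
    """Angle in radians between two non-zero vectors."""
    dot = sum(xi * ri for xi, ri in zip(x, ref))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_r = math.sqrt(sum(ri * ri for ri in ref))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / (norm_x * norm_r))))


def equal_population_approximations(points, reference, t):
    """Group points into shells of t points each by angle to the reference."""
    if t <= 0:
        raise ValueError("population t must be positive")
    # Steps 1 and 2: compute each angular distance and sort nondecreasingly.
    ordered = sorted(points, key=lambda p: angular_distance(p, reference))
    # Step 3: consecutive runs of t points form the approximations.
    return [ordered[i:i + t] for i in range(0, len(ordered), t)]


points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
shells = equal_population_approximations(points, reference=(1.0, 0.0), t=2)
for i, shell in enumerate(shells, start=1):
    print(i, shell)
```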

  18. EXPERIMENTAL RESULTS

  19. AS-Q

  20. CS-Q

  21. Scalability test results
