Design of New Index Structures for the ISAT Algorithm


By Biswanath Panda, Mirek Riedewald, Paul Chew, and Johannes Gehrke




Outline

  • Last Time

    • New API For ISATAB

    • Looked at some preliminary results

  • Today

    • Detailed look at the experiments

    • Two new index structures

    • Open questions


Index Structures

  • Linked List

  • Linked List + Point RTree

  • RTree

  • Random Projection RTree

    • Explained on next slide

  • Ellipsoidal RTree

    • Use ellipsoids rather than bounding boxes
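
As a point of reference for the structures listed above, the simplest of them (the linked list) amounts to a linear scan with a point-in-ellipsoid test. The sketch below is only an illustration. The ellipsoid representation (a center c plus a symmetric positive-definite matrix A, with membership defined by (x - c)^T A (x - c) <= 1) and the names `Ellipsoid` and `linear_scan` are assumptions, not taken from the ISAT or ISATAB code.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Ellipsoid:
    center: Sequence[float]
    shape: Sequence[Sequence[float]]  # assumed symmetric positive-definite matrix A

    def contains(self, x: Sequence[float]) -> bool:
        # The point is inside when (x - c)^T A (x - c) <= 1.
        diff = [xi - ci for xi, ci in zip(x, self.center)]
        quad = sum(
            diff[i] * self.shape[i][j] * diff[j]
            for i in range(len(diff))
            for j in range(len(diff))
        )
        return quad <= 1.0


def linear_scan(index: List[Ellipsoid], query: Sequence[float]) -> Optional[int]:
    """Return the position of the first ellipsoid containing the query, or None."""
    for pos, ell in enumerate(index):
        if ell.contains(query):
            return pos
    return None
```

Each containment test costs O(d^2) for a d-dimensional query, so a full scan over n ellipsoids costs O(n d^2). The tree-based structures in the list try to avoid most of these tests.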


Random Projection RTree

  • Idea

    • Project the ellipsoids onto a sufficient number of random lines

    • If a query point lies within every projection, then it probably lies in the ellipsoid; a point outside any projection is certainly outside

    • Similar to work by Kleinberg on finding nearest neighbors

  • Method

    • Project ellipsoids onto d*log d lines

    • Use these projections as the sides of a bounding box in a (d*log d)-dimensional RTree
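
The method above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ellipsoid is written as {c + M y : |y| <= 1} for a matrix M (so its projection onto a unit vector u is the interval centered at u.c with half-width |M^T u|), the logarithm in d*log d is taken to be the natural log, and all function names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    # Normalised Gaussian samples give a uniformly random direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def num_lines(d):
    # d * log d lines, as on the slide; the natural log is an assumption.
    return max(1, math.ceil(d * math.log(d)))


def projection_interval(center, axes, u):
    """Interval covered on line u by the ellipsoid {center + M y : |y| <= 1}.

    `axes` is the matrix M, given as a list of rows.
    """
    d = len(center)
    mid = sum(u[i] * center[i] for i in range(d))
    # Half-width of the projection is the norm of M^T u.
    mt_u = [sum(axes[i][j] * u[i] for i in range(d)) for j in range(d)]
    half = math.sqrt(sum(c * c for c in mt_u))
    return mid - half, mid + half


def to_box(center, axes, lines):
    """Bounding box in the projected space: one (lo, hi) side per random line."""
    return [projection_interval(center, axes, u) for u in lines]


def may_contain(box, lines, query):
    """Filter step: False means the query is certainly outside the ellipsoid."""
    for (lo, hi), u in zip(box, lines):
        p = sum(ui * qi for ui, qi in zip(u, query))
        if p < lo or p > hi:
            return False
    return True
```

The boxes returned by `to_box` are what would be inserted into the higher-dimensional RTree. A query that passes `may_contain` still needs an exact containment test, since passing every projection is necessary but not sufficient.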


Smaller Experiment Setup

  • Methane Simulation

    • 33 dimensions

    • Error tolerance : 5e-4

    • Number of queries : 6e+6

    • Pruning Factor : Number of objects looked at for growing
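
One way to read the pruning factor defined above is as a cap on how many stored ellipsoids are examined as grow candidates for a query. The sketch below assumes the candidates are the ellipsoids whose centers are nearest to the query; that selection rule and the name `grow_candidates` are hypothetical and are not stated in the slides.

```python
import heapq
import math


def grow_candidates(centers, query, pruning_factor):
    """Return indices of the `pruning_factor` centers nearest to the query.

    Choosing candidates by center distance is an assumption made for
    illustration only.
    """
    def dist(i):
        return math.dist(centers[i], query)

    return heapq.nsmallest(pruning_factor, range(len(centers)), key=dist)
```

A larger pruning factor examines more candidates per query, which trades extra search time for a better chance of finding an ellipsoid that can be grown.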


Total Time

  • Original : 1000s

  • LinkedList : 821s

  • LinkedList+Rtree : 857s


Retrieves

  • Best case : 3500 more retrieves

  • The Rtree converges fastest but is slower in running time


Grows

  • Original : 29851

  • LinkedList+Rtree : 20857

  • LinkedList+Rtree converges fastest


Adds

  • Original : 2184

  • Rtree+LinkedList : 2003

  • Rtree+LinkedList converges fastest
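
To put the counts reported for the smaller experiment on a common scale, the short calculation below expresses each one as a percentage reduction relative to the original index. The numbers are copied from the slides; the helper name `percent_reduction` and the dictionary layout are just for illustration.

```python
def percent_reduction(original, new):
    """Reduction of `new` relative to `original`, in percent."""
    return 100.0 * (original - new) / original


# (original, new) pairs as reported on the slides; times are in seconds.
results = {
    "total time, LinkedList": (1000, 821),
    "total time, LinkedList+Rtree": (1000, 857),
    "grows, LinkedList+Rtree": (29851, 20857),
    "adds, Rtree+LinkedList": (2184, 2003),
}

for name, (original, new) in results.items():
    print(f"{name}: {percent_reduction(original, new):.1f}% lower than the original")
```

This works out to roughly 18% and 14% less total time, about 30% fewer grows, and about 8% fewer adds.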


Takeaways

  • Need a model to limit how much to grow and search

    • Number of adds and grows decreases over time

    • How much to search?

  • Small Index

    • LinkedList seems best for searching

    • Point Rtree best for growing


Larger Index

  • Methane Simulation

    • 33 dimensions

    • Error tolerance : 5e-5

    • Queries : 6e+6

    • Pruning Factor : Number of objects looked at for growing

    • Index now grows to around 30000 ellipsoids


Total Time

  • Rtree does better than a linked list search

  • Total time increases as you search more


Performance of List In Retrieves

  • Performance of list is not so good


Adds

  • 1000 fewer adds than the original index


Grows

  • Significantly lower number of grows

  • We do nearly 100000 more retrieves


Takeaways

  • Caching still useful but effect reduced

  • Need for a model reinforced

    • Adverse effect on overall running time

  • Fewer adds, grows, and retrieves

  • Bad news

    • We are still slower than ISAT in overall running time.


Pruning Power Of Indexes

  • ISATAB does 50% primary retrieves

  • Possible reason for time difference


Other Experiments

  • Initial results on Random Projections and Ellipsoidal RTree

  • Random Projections

    • Better pruning

    • Similar number of retrieves, adds, and grows

    • A little slower : Possibly because of the larger dimensionality of the RTree


Other Experiments (Contd.)

  • Ellipsoidal Rtree

    • Very slow for the simulation with 33000 ellipsoids

    • Grows are a problem

      • Nearest neighbor search is very slow

        • Finding the minimum distance between a query point and the ellipsoid

      • Finding covering ellipsoids

    • Both methods show promise in terms of searching but the costs need to be understood
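
To illustrate why the minimum distance between a query point and an ellipsoid is costly, the sketch below computes it for the simplest case. It assumes the ellipsoid is centered at the origin and already expressed in its principal-axis frame, with semi-axes a_i; a general ellipsoid would first need a translation and rotation. The closest point has the form x_i = a_i^2 p_i / (a_i^2 + t), where t > 0 is the root of a one-dimensional equation found here by bisection. The function name and the bisection approach are illustrative choices, not the method used in the experiments.

```python
import math


def distance_to_ellipsoid(point, semi_axes, iterations=100):
    """Distance from `point` to the solid ellipsoid sum((x_i / a_i)^2) <= 1.

    Assumes an origin-centered, axis-aligned ellipsoid (an assumption made
    to keep the sketch short).
    """
    pairs = list(zip(point, semi_axes))
    if sum((p / a) ** 2 for p, a in pairs) <= 1.0:
        return 0.0  # the query is already inside

    def g(t):
        # g is strictly decreasing for t >= 0; its root gives the closest point.
        return sum((a * p / (a * a + t)) ** 2 for p, a in pairs) - 1.0

    lo = 0.0  # g(lo) > 0 because the point is outside
    hi = math.sqrt(sum((a * p) ** 2 for p, a in pairs))  # g(hi) < 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    # point - closest_point has components t * p_i / (a_i^2 + t).
    return math.sqrt(sum((t * p / (a * a + t)) ** 2 for p, a in pairs))
```

Unlike the distance to a bounding box, which has a closed form, this needs an iterative root search for every ellipsoid visited, which is consistent with the slow nearest neighbor search noted above.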


Open Question

  • Why does ISATAB show such great pruning?

    • Does not know about extent of ellipsoids

  • Understand the space

    • How do the ellipsoids look?

    • What arrangements of ellipsoids are possible?

    • How often do ellipsoids straddle planes?
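
The straddling question can be measured directly once the ellipsoids and planes are available. The sketch below assumes the planes are hyperplanes of the form n.x = b (presumably the cutting planes of a tree-based index, though the slides do not say so) and that each ellipsoid is given as {c + M y : |y| <= 1}. An ellipsoid straddles a plane when the signed offset of its center is smaller in magnitude than its reach |M^T n| along the normal. All names are hypothetical.

```python
import math


def straddles(center, axes, normal, offset):
    """True if the ellipsoid {center + M y : |y| <= 1} crosses normal . x = offset.

    `axes` is the matrix M as a list of rows; `normal` need not be unit length.
    """
    d = len(center)
    signed = sum(n * c for n, c in zip(normal, center)) - offset
    mt_n = [sum(axes[i][j] * normal[i] for i in range(d)) for j in range(d)]
    reach = math.sqrt(sum(v * v for v in mt_n))
    return abs(signed) < reach


def straddle_fraction(ellipsoids, planes):
    """Fraction of (ellipsoid, plane) pairs in which the ellipsoid straddles the plane."""
    total = len(ellipsoids) * len(planes)
    if total == 0:
        return 0.0
    hits = sum(
        1
        for center, axes in ellipsoids
        for normal, offset in planes
        if straddles(center, axes, normal, offset)
    )
    return hits / total
```

A high fraction would suggest that a point-based tree, which does not know the extent of the ellipsoids, has to visit both sides of many planes to find a containing ellipsoid.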


Size Distribution Of Ellipsoids

  • Some ellipsoids are very large

  • The ellipsoids added at the beginning are the largest

  • What are the implications of this?


Conclusion

  • We do not need a generic index structure

    • Need to understand the properties of the space we are indexing

  • Model

    • Need to understand how to model the different search parameters

