This document discusses new index structures designed to improve the efficiency of the ISAT algorithm, focusing on two new types: the Ellipsoidal R-Tree and the Random Projection R-Tree. It presents detailed experimental results that compare the original index with the proposed structures on a 33-dimensional methane simulation. The LinkedList and RTree combinations converge fastest in retrieves, grows, and adds, but the new indexes are still slower than ISAT in overall running time, and a better understanding of the properties of the indexed space is needed. Open questions remain about why ISATAB prunes so well and what searching ellipsoidal structures costs.
Design Of New Index Structures For The ISAT Algorithm By Biswanath Panda, Mirek Riedewald, Paul Chew, Johannes Gehrke
Outline • Last Time • New API For ISATAB • Looked at some preliminary results • Today • Detailed look at the experiments • Two new index structures • Open questions
Index Structures • Linked List • Linked List + Point RTree • RTree • Random Projection RTree • Explained on next slide • Ellipsoidal RTree • Use ellipsoids rather than bounding boxes
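As a rough illustration of the difference between keying an RTree on bounding boxes and keying it on ellipsoids, the Python sketch below assumes each ellipsoid is stored as a center c and a symmetric positive-definite matrix M, with E = {x : (x - c)^T M (x - c) <= 1}. The tightest axis-aligned box around E has half-width sqrt((M^-1)_ii) along axis i. All function names are hypothetical; `linked_list_retrieve` stands for the plain linear scan of the Linked List index, not the authors' code.

```python
import numpy as np


def in_ellipsoid(q, center, M):
    # Assumed representation: E = {x : (x - c)^T M (x - c) <= 1}, M SPD
    v = q - center
    return float(v @ M @ v) <= 1.0


def bounding_box(center, M):
    # Tightest axis-aligned box around E: half-width sqrt((M^-1)_ii) on axis i
    half = np.sqrt(np.diag(np.linalg.inv(M)))
    return center - half, center + half


def in_box(q, lo, hi):
    # The test an RTree over bounding boxes applies when pruning
    return bool(np.all((lo <= q) & (q <= hi)))


def linked_list_retrieve(q, ellipsoids):
    # Linear scan: index of the first ellipsoid covering q, or None
    for i, (c, M) in enumerate(ellipsoids):
        if in_ellipsoid(q, c, M):
            return i
    return None
```

For M = diag(1, 0.25) the box is [-1, 1] x [-2, 2], and the corner point (0.9, 1.9) passes the box test but fails the ellipsoid test. That slack is what an index keyed on ellipsoids rather than boxes tries to remove.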
Random Projection RTree • Idea • Project the ellipsoids onto sufficiently many random lines • If a query point lies within all of an ellipsoid's projections, then it is likely to lie in the ellipsoid (the test can give false positives, but never false negatives) • Similar to work by Kleinberg on nearest-neighbor search • Method • Project the ellipsoids onto d log d lines • Use these projections as the sides of a bounding box in a (d log d)-dimensional RTree
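The projection step above can be sketched in Python. Assuming each ellipsoid is E = {x : (x - c)^T M (x - c) <= 1}, its projection onto a unit vector u is the interval u·c ± sqrt(u^T M^-1 u), so the k intervals form the sides of a box in k dimensions. The slide does not give the base of the logarithm in d log d; natural log is an assumption here, and the Gaussian-direction sampling and function names are hypothetical.

```python
import math

import numpy as np


def num_lines(d: int) -> int:
    # d log d lines, as on the slide; natural log is an assumption
    return math.ceil(d * math.log(d))


def random_directions(d, k, rng):
    # k random unit vectors in R^d (normalized Gaussian samples)
    U = rng.standard_normal((k, d))
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def project_ellipsoid(center, M, U):
    # Projection of E onto row u of U: [u.c - r, u.c + r], r = sqrt(u^T M^-1 u)
    Minv = np.linalg.inv(M)
    mid = U @ center
    r = np.sqrt(np.einsum("ij,jk,ik->i", U, Minv, U))
    return mid - r, mid + r  # lower and upper corners of the k-dim box


def in_all_projections(q, lo, hi, U):
    # Necessary condition for q in E; sufficient only approximately
    p = U @ q
    return bool(np.all((lo <= p) & (p <= hi)))


def in_ellipsoid(q, center, M):
    v = q - center
    return float(v @ M @ v) <= 1.0
```

For d = 33 this gives 116 lines, so the RTree indexes 116-dimensional boxes instead of 33-dimensional ones; any point inside E passes `in_all_projections`, while points outside E usually fail it.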
Smaller Experiment Setup • Methane Simulation • 33 dimensions • Error tolerance: 5e-4 • Number of queries: 6e+6 • Pruning Factor: Number of objects looked at for growing
Total Time • Original: 1000 s, LinkedList: 821 s, LinkedList+RTree: 857 s
Retrieves • Best case: 3500 more retrieves • Fastest convergence for the RTree, but slower in time
Grows • Original: 29851, LinkedList+RTree: 20857 • Fastest convergence for LinkedList+RTree
Adds • Original: 2184, RTree+LinkedList: 2003 • RTree+LinkedList converges fastest
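The relative improvements in the smaller experiment can be checked with a few lines of arithmetic. The sketch below only reuses the numbers reported on the Total Time, Grows and Adds slides; the helper name is hypothetical.

```python
def relative_reduction(original, new):
    """Fraction by which `new` is below `original` (positive = improvement)."""
    return (original - new) / original


# Figures copied from the Total Time, Grows and Adds slides (smaller experiment)
results = {
    "time, LinkedList": relative_reduction(1000, 821),
    "time, LinkedList+RTree": relative_reduction(1000, 857),
    "grows, LinkedList+RTree": relative_reduction(29851, 20857),
    "adds, RTree+LinkedList": relative_reduction(2184, 2003),
}

for name, r in results.items():
    print(f"{name}: {100 * r:.1f}% lower than original")
```

It prints reductions of 17.9% and 14.3% in total time, 30.1% in grows and 8.3% in adds.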
Takeaways • Need a model to decide (prune) how much to grow and search - The number of adds and grows decreases over time - How much should we search? • Small index • LinkedList seems best for searching • Point RTree best for growing
Larger Index • Methane Simulation • 33 dimensions • Error tolerance: 5e-5 • Queries: 6e+6 • Pruning Factor: Number of objects looked at for growing • Index now grows to around 30000 ellipsoids
Total Time • Rtree does better than a linked list search • Total time increases as you search more
Performance of the List in Retrieves • The linked list performs poorly on retrieves
Adds • 1000 fewer adds than the original index
Grows • Significantly fewer grows • We do nearly 100000 more retrieves
Takeaways • Caching is still useful, but its effect is reduced • Need for a model reinforced • Adverse effect on overall running time • Fewer adds, grows, and retrieves • Bad news • We are still slower than ISAT in overall running time.
Pruning Power of Indexes • About 50% of ISATAB's retrieves are primary retrieves • Possible reason for the difference in running time
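A primary retrieve is cheap because it touches a single ellipsoid. In Pope's original ISAT the table is a binary tree whose internal nodes are cutting planes: a primary retrieve descends to one leaf and tests only that leaf's ellipsoid, and a miss falls back to a more expensive secondary search. The sketch below assumes ISATAB's primary retrieve works this way and that each plane is perpendicular to the segment between two leaf centers and passes through its midpoint; the classes and functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union

import numpy as np


@dataclass
class Leaf:
    center: np.ndarray  # tabulated query point x0
    M: np.ndarray       # ellipsoid of accuracy: (x - x0)^T M (x - x0) <= 1


@dataclass
class Node:
    v: np.ndarray       # normal of the cutting plane v . x = a
    a: float
    left: "Tree"        # side with v . x <= a
    right: "Tree"       # side with v . x > a


Tree = Union[Leaf, Node]


def make_node(l1: Leaf, l2: Leaf) -> Node:
    # Assumed ISAT-style plane: perpendicular to c1 -> c2, through the midpoint
    v = l2.center - l1.center
    a = float(v @ (l1.center + l2.center) / 2)
    return Node(v, a, left=l1, right=l2)


def primary_retrieve(q: np.ndarray, tree: Tree) -> Optional[Leaf]:
    # Descend by the side of each plane, then test only the leaf reached
    node = tree
    while isinstance(node, Node):
        node = node.right if node.v @ q > node.a else node.left
    d = q - node.center
    return node if d @ node.M @ d <= 1.0 else None
```

Note that the descent never looks at the extent of any ellipsoid: a query covered by an ellipsoid on the other side of a plane is a primary-retrieve miss.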
Other Experiments • Initial results on Random Projections and the Ellipsoidal RTree • Random Projections • Better pruning • Similar numbers of retrieves, adds, and grows • A little slower, possibly because of the larger dimensionality of the RTree
Other Experiments (cont.) • Ellipsoidal RTree • Very slow for the simulation with 33000 ellipsoids • Grows are a problem • Nearest-neighbor search is very slow • Finding the minimum distance between a query point and an ellipsoid • Finding covering ellipsoids • Both methods show promise for searching, but their costs need to be understood
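Part of why nearest-neighbor search over ellipsoids is slow is that even the minimum distance from a query point to one ellipsoid has no closed form. A standard approach, sketched below under the assumption E = {x : (x - c)^T M (x - c) <= 1}, diagonalizes M = Q diag(λ) Q^T; the closest boundary point is then x_i = p_i / (1 + tλ_i), where t > 0 is the root of a monotone scalar equation, found here by bisection. This is a generic technique, not necessarily the one used in the experiments, and the function name is hypothetical.

```python
import numpy as np


def distance_to_ellipsoid(q, center, M, iters=200):
    # Ellipsoid E = {x : (x - c)^T M (x - c) <= 1}, M symmetric positive definite
    p0 = q - center
    if p0 @ M @ p0 <= 1.0:
        return 0.0  # query point lies inside E
    lam, Q = np.linalg.eigh(M)  # M = Q diag(lam) Q^T with lam > 0
    p = Q.T @ p0                # query in the ellipsoid's principal axes

    # Lagrange condition gives x_i = p_i / (1 + t*lam_i), with t > 0 solving
    # g(t) = sum_i lam_i p_i^2 / (1 + t*lam_i)^2 - 1 = 0; g is decreasing in t
    def g(t):
        return np.sum(lam * p**2 / (1.0 + t * lam) ** 2) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) > 0:  # g(0) > 0 and g -> -1, so this bracket always closes
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    x = p / (1.0 + t * lam)  # closest point on the boundary, principal axes
    return float(np.linalg.norm(x - p))
```

For example, a point at (3, 0, 0) is at distance 2 from the unit sphere, and (5, 0) is at distance 3 from the ellipse with semi-axes 2 and 1. Each call costs an eigendecomposition plus a root search, which adds up quickly over tens of thousands of ellipsoids.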
Open Questions • Why does ISATAB show such great pruning? • It does not know about the extent of the ellipsoids • Understand the space • What do the ellipsoids look like? • What arrangements of ellipsoids are possible? • How often do ellipsoids straddle planes?
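One way to start measuring how often ellipsoids straddle planes: assuming E = {x : (x - c)^T M (x - c) <= 1}, the extent of E along a direction v is v·c ± sqrt(v^T M^-1 v), so E crosses the plane v·x = a exactly when |v·c - a| < sqrt(v^T M^-1 v). The sketch below counts this over a set of ellipsoids; the function names are hypothetical, and which planes to test (for example ISATAB's cutting planes) is left to the caller.

```python
import numpy as np


def straddles(center, M, v, a):
    # Half-extent of E along v; v need not be unit length
    half = np.sqrt(v @ np.linalg.solve(M, v))
    return bool(abs(v @ center - a) < half)


def straddle_fraction(ellipsoids, v, a):
    # Fraction of (center, M) pairs whose ellipsoid crosses the plane v . x = a
    hits = sum(straddles(c, M, v, a) for c, M in ellipsoids)
    return hits / len(ellipsoids)
```

For the plane x = 0.5, a unit circle at the origin straddles it while a unit circle centered at (3, 0) does not, giving a fraction of 0.5.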
Size Distribution Of Ellipsoids • Some ellipsoids are very large • The ellipsoids added at the beginning are the largest • What are the implications of this?
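A hedged way to quantify the size distribution is to compare log-volumes and semi-axis lengths, again assuming E = {x : (x - c)^T M (x - c) <= 1}. The volume is V_d / sqrt(det M), where V_d = π^(d/2) / Γ(d/2 + 1) is the volume of the unit ball; in 33 dimensions raw volumes easily underflow or overflow, so the sketch works with logarithms via `numpy.linalg.slogdet` and `math.lgamma`. Function names are hypothetical.

```python
import math

import numpy as np


def log_volume(M):
    # log vol(E) = log V_d - 0.5 * log det M
    d = M.shape[0]
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("M must be positive definite")
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return log_unit_ball - 0.5 * logdet


def semi_axes(M):
    # Semi-axis lengths 1/sqrt(eigenvalue), largest first
    return np.sort(1.0 / np.sqrt(np.linalg.eigvalsh(M)))[::-1]
```

Recording these two quantities in insertion order would show directly whether the earliest ellipsoids dominate in size and along which axes.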
Conclusion • We do not need a generic index structure • Need to understand the properties of the space we are indexing • Model • Need to understand how to model the different search parameters