### Design Of New Index Structures For The ISAT Algorithm

By

Biswanath Panda, Mirek Riedewald, Paul Chew, Johannes Gehrke

Outline

- Last Time
  - New API for ISATAB
  - Looked at some preliminary results
- Today
  - Detailed look at the experiments
  - Two new index structures
  - Open questions

Index Structures

- Linked List
- Linked List + Point RTree
- RTree
- Random Projection RTree
  - Explained on next slide
- Ellipsoidal RTree
  - Use ellipsoids rather than bounding boxes

Random Projection RTree

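- Idea
  - Project the ellipsoids onto a sufficient number of random lines
  - A query point inside an ellipsoid always lies within all of its projections
  - With enough lines the converse approximately holds: a point within all projections is likely to lie in (or near) the ellipsoid
  - Similar to Kleinberg's work on nearest-neighbor search

- Method
  - Project each ellipsoid onto d log d random lines (d = dimensionality)
  - Use these projections as the sides of a bounding box in a (d log d)-dimensional RTree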
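A minimal Python sketch of the projection step, under stated assumptions: each ellipsoid is represented as {c + L z : |z| <= 1} with a lower-triangular matrix L (a hypothetical representation chosen for the example), "d log d" is read as d times the natural log rounded up, and the random directions are normalized Gaussian vectors. The projection of such an ellipsoid onto a unit direction u is the interval u.c +/- |L^T u|, so the k intervals form the box that would be stored in the k-dimensional RTree. All function names are illustrative.

```python
import math
import random

# Assumed representation: ellipsoid = {c + L z : |z| <= 1}, L lower-triangular (list of rows).

def num_projections(d):
    """Number of random lines: d * ln(d), rounded up, at least 1."""
    return max(1, math.ceil(d * math.log(d)))

def random_unit_vectors(d, k, seed=0):
    """k random directions, uniformly distributed on the unit sphere in R^d."""
    rng = random.Random(seed)
    dirs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        dirs.append([x / norm for x in v])
    return dirs

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_ellipsoid(center, L, u):
    """Interval [u.c - |L^T u|, u.c + |L^T u|] covered by the ellipsoid along u."""
    d = len(center)
    lt_u = [sum(L[i][j] * u[i] for i in range(d)) for j in range(d)]
    half = math.sqrt(dot(lt_u, lt_u))
    mid = dot(center, u)
    return (mid - half, mid + half)

def projected_box(center, L, dirs):
    """One interval per direction: the box inserted into the k-dimensional RTree."""
    return [project_ellipsoid(center, L, u) for u in dirs]

def point_in_box(q, dirs, box):
    """Filter test: necessary, but not sufficient, for q to be in the ellipsoid."""
    return all(lo <= dot(q, u) <= hi for u, (lo, hi) in zip(dirs, box))

def point_in_ellipsoid(q, center, L):
    """Exact test: solve L z = q - c by forward substitution, then check |z| <= 1."""
    d = len(center)
    r = [qi - ci for qi, ci in zip(q, center)]
    z = [0.0] * d
    for i in range(d):
        z[i] = (r[i] - sum(L[i][j] * z[j] for j in range(i))) / L[i][i]
    return dot(z, z) <= 1.0
```

With the two coordinate axes as directions, the ellipse with semi-axes 2 and 1 projects to the box [-2, 2] x [-1, 1]; the point (1.9, 0.9) passes the box test but fails the exact test, so candidates returned by the RTree would still need an exact containment check.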

Smaller Experiment Setup

- Methane Simulation
- 33 dimensions
- Error tolerance : 5e-4
- Number of queries : 6e+6
- Pruning Factor : Number of objects looked at for growing
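To make the pruning factor concrete, here is a hypothetical Python sketch of a list-based index. Two details are assumptions, not taken from the slides: the list is kept in most-recently-used order (one plausible form of the caching mentioned in the takeaways), and growing simply examines the first `pruning_factor` records. The containment test is passed in as a function, and all class and method names are illustrative.

```python
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]

class LinkedListIndex:
    """Hypothetical list-based index; records kept in most-recently-used order."""

    def __init__(self, contains: Callable[[object, Point], bool]):
        self.contains = contains          # exact "query inside ellipsoid" test
        self.records: List[object] = []

    def add(self, record: object) -> None:
        """New records start at the front of the list."""
        self.records.insert(0, record)

    def retrieve(self, q: Point) -> Optional[object]:
        """Linear scan; on a hit, move the record to the front (assumed caching)."""
        for i, rec in enumerate(self.records):
            if self.contains(rec, q):
                self.records.insert(0, self.records.pop(i))
                return rec
        return None

    def grow_candidates(self, q: Point, pruning_factor: int) -> List[object]:
        """Only the first `pruning_factor` records are looked at for growing."""
        return self.records[:pruning_factor]
```

A small pruning factor bounds the cost of each grow, at the risk of missing a better record further down the list.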

Total Time

- Original: 1000 s
- LinkedList: 821 s
- LinkedList+RTree: 857 s

Retrieves

- Best case: 3500 more retrieves
- The RTree converges fastest, but is slower in total time

Grows

- Original: 29851
- LinkedList+RTree: 20857
- LinkedList+RTree converges fastest

Adds

- Original: 2184
- LinkedList+RTree: 2003
- LinkedList+RTree converges fastest
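The relative changes in the smaller experiment follow directly from the figures on the Total Time, Grows, and Adds slides; this short Python calculation works them out and adds no new measurements.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline

# (original, new) pairs copied from the smaller experiment (error tolerance 5e-4).
small_experiment = {
    "total time, LinkedList (s)": (1000, 821),
    "total time, LinkedList+RTree (s)": (1000, 857),
    "grows, LinkedList+RTree": (29851, 20857),
    "adds, LinkedList+RTree": (2184, 2003),
}

for name, (original, new) in small_experiment.items():
    print(f"{name}: {percent_reduction(original, new):.1f}% lower than original")
```

This gives 17.9% and 14.3% less total time for LinkedList and LinkedList+RTree, 30.1% fewer grows, and 8.3% fewer adds.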

Takeaways

- Need a model to decide how much to search and how many objects to examine for growing
  - The number of adds and grows decreases over time
  - How much to search?

- Small Index
  - LinkedList seems best for searching
  - Point RTree best for growing

Larger Index

- Methane Simulation
- 33 dimensions
- Error tolerance : 5e-5
- Queries : 6e+6
- Pruning Factor : Number of objects looked at for growing
- Index now grows to around 30000 ellipsoids

Total Time

- RTree search is faster than linked-list search
- Total time increases as more of the index is searched

Performance of the List in Retrieves

- The linked list performs poorly on retrieves

Adds

- 1000 fewer adds than the original index

Grows

- Significantly fewer grows
- Nearly 100000 more retrieves

Takeaways

- Caching is still useful, but its effect is reduced
- Need for a model reinforced
  - Adverse effect on overall running time

- Fewer adds, grows and retrieves
- Bad news
  - We are still slower than ISAT in overall running time

Pruning Power Of Indexes

- ISATAB performs 50% primary retrieves
  - Possible reason for the time difference
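For context on primary retrieves, here is a simplified Python sketch of the binary-tree lookup used in ISAT as it is usually described: each internal node stores a cutting plane v.x = a, a query descends to exactly one leaf, and a primary retrieve succeeds when that leaf's ellipsoid contains the query. This is an illustrative assumption about the structure, not the ISATAB code; the class and function names are hypothetical, and 1-D intervals stand in for ellipsoids.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Union

@dataclass
class Leaf:
    record: object                 # e.g. a stored ellipsoid of accuracy

@dataclass
class Node:
    v: Sequence[float]             # normal of the cutting plane v . x = a
    a: float                       # offset of the cutting plane
    left: "Tree"                   # subtree for v . q <= a
    right: "Tree"                  # subtree for v . q > a

Tree = Union[Leaf, Node]

def traverse(tree: Tree, q: Sequence[float]) -> Leaf:
    """Descend to one leaf using only the cutting planes."""
    while isinstance(tree, Node):
        s = sum(vi * qi for vi, qi in zip(tree.v, q))
        tree = tree.right if s > tree.a else tree.left
    return tree

def primary_retrieve(tree: Tree, q: Sequence[float],
                     contains: Callable[[object, Sequence[float]], bool]) -> Optional[object]:
    """Return the leaf's record if it covers q, otherwise None (no primary retrieve)."""
    leaf = traverse(tree, q)
    return leaf.record if contains(leaf.record, q) else None

if __name__ == "__main__":
    covers = lambda r, q: r[0] <= q[0] <= r[1]
    tree = Node([1.0], 1.5, Leaf((0.0, 1.0)), Leaf((2.0, 3.0)))
    print(primary_retrieve(tree, [0.5], covers))         # (0.0, 1.0)
    # [0.0, 2.0] straddles the plane x = 1.5: q = 1.8 is covered but routed right.
    straddling = Node([1.0], 1.5, Leaf((0.0, 2.0)), Leaf((2.5, 3.0)))
    print(primary_retrieve(straddling, [1.8], covers))   # None
```

The traversal consults only the planes and never the extent of the stored ellipsoids, so a query covered by an ellipsoid that straddles a plane can be routed to the wrong leaf.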

Other Experiments

- Initial results on the Random Projection RTree and the Ellipsoidal RTree
- Random Projection RTree
  - Better pruning
  - Similar numbers of retrieves, adds and grows
  - A little slower: possibly because of the higher dimensionality of the RTree

Other Experiments (Contd.)

- Ellipsoidal RTree
  - Very slow for the simulation with 33000 ellipsoids
  - Grows are a problem
  - Nearest-neighbor search is very slow
    - Requires finding the minimum distance between a query point and an ellipsoid
  - Finding covering ellipsoids

- Both methods show promise for searching, but their costs need to be better understood
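To illustrate why nearest-neighbor search over ellipsoids is costly, here is a Python sketch of the point-to-ellipsoid distance for an axis-aligned ellipsoid centered at the origin, using the standard Lagrange-multiplier formulation solved by bisection. A general ellipsoid would first have to be translated and rotated into this frame (an eigendecomposition), which adds further cost; the function name and parameters are illustrative.

```python
import math
from typing import Sequence

def distance_to_ellipsoid(y: Sequence[float], axes: Sequence[float],
                          iterations: int = 200) -> float:
    """Euclidean distance from y to the ellipsoid sum (x_i / a_i)^2 <= 1.

    Assumes an origin-centered, axis-aligned ellipsoid with semi-axes a_i > 0.
    Returns 0.0 when y lies inside.
    """
    if sum((yi / ai) ** 2 for yi, ai in zip(y, axes)) <= 1.0:
        return 0.0

    # Closest point: x_i = a_i^2 y_i / (a_i^2 + t), where t > 0 is the root of
    # F(t) = sum (a_i y_i / (a_i^2 + t))^2 - 1, which is decreasing in t.
    def F(t: float) -> float:
        return sum((ai * yi / (ai * ai + t)) ** 2 for yi, ai in zip(y, axes)) - 1.0

    # F(0) > 0 for an outside point, and F(hi) < 0 for this choice of hi.
    lo, hi = 0.0, math.sqrt(sum((ai * yi) ** 2 for yi, ai in zip(y, axes)))
    for _ in range(iterations):          # bisection keeps F(lo) > 0 >= F(hi)
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    x = [ai * ai * yi / (ai * ai + t) for yi, ai in zip(y, axes)]
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

Each distance is an iterative root-find rather than the closed-form computation used for bounding boxes, which is consistent with the slowdown reported above.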

Open Question

- Why does ISATAB show such strong pruning?
  - It does not know about the extent of the ellipsoids

- Understand the space
  - What do the ellipsoids look like?
  - What arrangements of ellipsoids are possible?
  - How often do ellipsoids straddle planes?
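Whether an ellipsoid straddles a given plane v.x = a can be checked in closed form. This Python sketch assumes the hypothetical representation {c + L z : |z| <= 1}: the ellipsoid's extent along v is v.c +/- |L^T v|, so it straddles the plane exactly when |v.c - a| < |L^T v|. The function name is illustrative.

```python
import math
from typing import Sequence

def straddles(center: Sequence[float], L: Sequence[Sequence[float]],
              v: Sequence[float], a: float) -> bool:
    """True if the ellipsoid {c + L z : |z| <= 1} crosses the plane v . x = a."""
    d = len(center)
    # |L^T v| is the half-width of the ellipsoid's extent along v.
    lt_v = [sum(L[i][j] * v[i] for i in range(d)) for j in range(d)]
    half = math.sqrt(sum(x * x for x in lt_v))
    offset = sum(ci * vi for ci, vi in zip(center, v)) - a
    return abs(offset) < half
```

Counting, for each plane, the fraction of stored ellipsoids for which this returns True would be one way to measure how often straddling occurs.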

Size Distribution Of Ellipsoids

- Some ellipsoids are very large
- The ellipsoids added at the beginning are the largest
- What are the implications of this?
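For measuring the size distribution, volumes in 33 dimensions are best compared on a log scale, since a product of 33 small semi-axes can underflow double precision. A minimal Python sketch, again assuming ellipsoids stored as {c + L z : |z| <= 1} with lower-triangular L, so the volume is |det L| times the unit-ball volume pi^(d/2) / Gamma(d/2 + 1). Function names are illustrative.

```python
import math
from typing import Sequence

def log_unit_ball_volume(d: int) -> float:
    """log of the unit-ball volume in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)

def log_ellipsoid_volume(L: Sequence[Sequence[float]]) -> float:
    """log volume of {c + L z : |z| <= 1}; det L is the product of L's diagonal."""
    d = len(L)
    log_det = sum(math.log(abs(L[i][i])) for i in range(d))
    return log_det + log_unit_ball_volume(d)
```

Plotting these log-volumes against insertion order is one way to check how much larger the early ellipsoids are.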

Conclusion

- We do not need a generic index structure
  - Need to understand the properties of the space we are indexing

- Model
  - Need to understand how to model the different search parameters

