
Index Structures For ISAT Work In Progress

By Biswanath Panda, Mirek Riedewald, Paul Chew & Johannes Gehrke.


Presentation Transcript


  1. Index Structures For ISAT: Work In Progress. By Biswanath Panda, Mirek Riedewald, Paul Chew & Johannes Gehrke

  2. Where Do We Fit In? • ISAT uses a tabulation method to find approximate function values • Currently indexed by a simple binary tree • Goal: try existing and new index structures • A lot of work exists on indexing high-dimensional data • Very little of it deals with real applications

  3. Outline Of Talk • New API for ISATAB • Description of indexes • Experimental results • Ongoing work • Discussion

  4. API Design For ISAT • Why did we need it? • Old API • ISATAB logic and index logic separate • Index logic and ISATAB operations on the index still together • ISATAB operations: snepQuery (containment search), snepQueryList (proximity search), nepAdd, nepUpdate, nepRemove

  5. New API Design • General Index Template • ISATAB API: snepQuery (containment search), snepQueryList (proximity search), nepAdd, nepUpdate, nepRemove • General Index API: containmentIterator, proximityIterator, insertDataItem, deleteDataItem, updateDataItem • Specific Index: implements index logic • ISATAB Packaging Object: transforms ISATAB objects into index objects
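
The layering on this slide can be illustrated with a minimal Python sketch. Only the operation names come from the slide; the method signatures, the `Ball` record (a ball standing in for an ellipsoid), the `LinearScanIndex` and the `IsatabFacade` class are assumptions made for illustration and are not the authors' code.

```python
import itertools
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Ball:
    """Hypothetical stand-in for an ISATAB record (a ball, not an ellipsoid)."""
    center: Tuple[float, ...]
    radius: float

    def contains(self, x: Sequence[float]) -> bool:
        return math.dist(self.center, x) <= self.radius


class GeneralIndex(ABC):
    """Assumed shape of the 'General Index API' named on the slide."""

    @abstractmethod
    def containment_iterator(self, x: Sequence[float]) -> Iterator[Ball]: ...

    @abstractmethod
    def proximity_iterator(self, x: Sequence[float]) -> Iterator[Ball]: ...

    @abstractmethod
    def insert_data_item(self, item: Ball) -> None: ...

    @abstractmethod
    def delete_data_item(self, item: Ball) -> None: ...

    @abstractmethod
    def update_data_item(self, old: Ball, new: Ball) -> None: ...


class LinearScanIndex(GeneralIndex):
    """A trivial 'specific index': it holds the index logic (here, a scan)."""

    def __init__(self) -> None:
        self._items: List[Ball] = []

    def containment_iterator(self, x):
        return (b for b in self._items if b.contains(x))

    def proximity_iterator(self, x):
        # Items ordered by distance from the query to each center.
        return iter(sorted(self._items, key=lambda b: math.dist(b.center, x)))

    def insert_data_item(self, item):
        self._items.append(item)

    def delete_data_item(self, item):
        self._items.remove(item)

    def update_data_item(self, old, new):
        self._items[self._items.index(old)] = new


class IsatabFacade:
    """ISATAB-side operations expressed only through the general index API."""

    def __init__(self, index: GeneralIndex) -> None:
        self._index = index

    def snep_query(self, x) -> Optional[Ball]:
        return next(self._index.containment_iterator(x), None)

    def snep_query_list(self, x, k: int) -> List[Ball]:
        return list(itertools.islice(self._index.proximity_iterator(x), k))

    def nep_add(self, item: Ball) -> None:
        self._index.insert_data_item(item)

    def nep_update(self, old: Ball, new: Ball) -> None:
        self._index.update_data_item(old, new)

    def nep_remove(self, item: Ball) -> None:
        self._index.delete_data_item(item)
```

In this sketch, swapping `LinearScanIndex` for another `GeneralIndex` subclass would leave `IsatabFacade` unchanged, at the price of one extra call per operation.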

  6. Benefits And Losses • Advantages • General Index Template does not need to be changed • Can use any existing index that follows API • Disadvantages • Different layers cannot talk easily • Extra overheads of multiple function calls

  7. Indexes For ISATAB • Initial results showed usefulness of caching • LRU List • Two broad categories of indexes • Point Indexes • Current binary tree • Rtree of points • Ellipsoid Indexes • Rtree of rectangles • LRU list
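
One way to picture an LRU list placed in front of an index is the sketch below. The slides do not describe how the list was implemented, so the class `LruFrontedIndex`, the record layout `(key, containment_test)` and the `fallback` callable are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Point = Sequence[float]
# Hypothetical record: an identifier plus a containment test for its region.
Record = Tuple[Hashable, Callable[[Point], bool]]


class LruFrontedIndex:
    """Scan a small list of recently hit records before asking the index."""

    def __init__(self, fallback: Callable[[Point], Optional[Record]],
                 capacity: int) -> None:
        self._fallback = fallback      # e.g. a tree search (assumed interface)
        self._capacity = capacity
        self._lru: "OrderedDict[Hashable, Record]" = OrderedDict()
        self.cache_hits = 0
        self.index_hits = 0

    def cached_keys(self) -> List[Hashable]:
        """Keys from least to most recently used."""
        return list(self._lru)

    def query(self, x: Point) -> Optional[Record]:
        # Most recently used records are checked first.
        for key in list(reversed(self._lru)):
            record = self._lru[key]
            if record[1](x):
                self._lru.move_to_end(key)
                self.cache_hits += 1
                return record
        record = self._fallback(x)
        if record is not None:
            self.index_hits += 1
            self._lru[record[0]] = record
            self._lru.move_to_end(record[0])
            if len(self._lru) > self._capacity:
                self._lru.popitem(last=False)  # evict least recently used
        return record
```

The sketch covers retrieval only. A grow changes a cached region, so a real list would also need to refresh or invalidate the affected entry.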

  8. What Is An Rtree?

  9. Rtree Properties • Balanced tree • Each node must have a minimum occupancy • Overlapping bounding boxes degrade search performance • Delete operation: removes underfull nodes and reinserts their entries

  10. Rtree For ISAT • Point Rtree • Indexes centers of ellipsoids • Find nearest neighbors both for queries and growing • No delete operation • Bounding Box Rtree • Take the bounding box of the ellipsoids • Check for containment in bounding box for queries • Find nearest neighbor to bounding box for a grow • Delete operation in grow
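
The bounding-box variant can be sketched as a cheap box filter followed by an exact ellipsoid test. The parameterization is an assumption chosen for convenience: an ellipsoid is the set `{center + L u : |u| <= 1}` with `L` lower triangular and a nonzero diagonal. The slides do not say how ISATAB stores its ellipsoids, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
Box = Tuple[List[float], List[float]]


def bounding_box(center: Vector, L: Matrix) -> Box:
    """Tightest axis-aligned box around {center + L u : |u| <= 1}.

    Along axis i the ellipsoid extends by the Euclidean norm of row i of L.
    """
    half = [math.sqrt(sum(v * v for v in row)) for row in L]
    lo = [c - h for c, h in zip(center, half)]
    hi = [c + h for c, h in zip(center, half)]
    return lo, hi


def in_box(x: Vector, box: Box) -> bool:
    lo, hi = box
    return all(l <= xi <= h for l, xi, h in zip(lo, x, hi))


def in_ellipsoid(x: Vector, center: Vector, L: Matrix) -> bool:
    """Exact test: solve L u = x - center by forward substitution."""
    d = [xi - ci for xi, ci in zip(x, center)]
    u: List[float] = []
    for i, row in enumerate(L):
        partial = sum(row[j] * u[j] for j in range(i))
        u.append((d[i] - partial) / row[i])
    return sum(v * v for v in u) <= 1.0


def contains(x: Vector, center: Vector, L: Matrix) -> bool:
    """Box check first (what the Rtree can prune on), exact check second."""
    return in_box(x, bounding_box(center, L)) and in_ellipsoid(x, center, L)
```

A point can pass the box check and still fail the exact test, so every candidate the Rtree returns has to be verified against its ellipsoid.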

  11. Experiments • Methane Simulation with 32 species

  12. Takeaways • Caching is good for searching, not for growing • Fast Scan does seem to do well • Point Rtree + list and Original ISAT do the best • Original ISAT achieves only 50% primary retrieves • Rtree does well but is expensive • What do we not understand? • Our code always does more grows

  13. Another Example • Methane simulation with 55 species • Large number of grows: on the order of 10^5 grows in 2x10^6 queries • PointRtree + list = 65% hits • Original ISATAB = 90% hits (44% secondary retrieves) • Rtree = 86% hits (could only reach 10^5 queries) • Simple caching is not going to work • Grow Cost • A grow is more expensive for the Rtree than for the point Rtree • Searching for growable ellipsoids and growing them dominates the simulation

  14. Summary Of Experiments • Different simulations have very different characteristics • Concept of growing still not clear • Definitely needed • When, what and how much to grow? • Simple caching definitely not good • Tradeoff discovered • Indexing ellipsoids helps search but growing may dominate costs • Indexing points helps updates but search suffers.

  15. Ongoing Work • Hybrid index structures • Transition from update-friendly to search-friendly indexes • Dynamically change index parameters • Study statistics as the simulation proceeds • Rtree with ellipsoidal bounding region • Random Projections • Project ellipsoids on random lines • If a point's projection falls outside an ellipsoid's projection on any line, the point lies outside the ellipsoid (falling within every projection is necessary but not sufficient for containment) • Simple averaging of nearest neighbors • Error Analysis with different index structures • First paper to introduce the problem to the database community
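
The random-projection idea can be sketched as a one-sided filter. The sketch assumes, purely for illustration, that an ellipsoid is the set `{center + L u : |u| <= 1}`. Its shadow on a line with unit direction `v` is then the interval centered at `v · center` with half-width `|L^T v|`. A point whose projection misses that interval on any line is certainly outside the ellipsoid, while a point that passes every line still needs an exact check. All names below are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
Projection = Tuple[List[float], float, float]  # (direction, midpoint, half-width)


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Uniformly random direction, built from independent Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def project_ellipsoid(center: Vector, L: Matrix, v: Vector) -> Tuple[float, float]:
    """Interval covered by {center + L u : |u| <= 1} along unit direction v."""
    dim = len(v)
    mid = sum(vi * ci for vi, ci in zip(v, center))
    lt_v = [sum(L[i][j] * v[i] for i in range(dim)) for j in range(dim)]  # L^T v
    half = math.sqrt(sum(a * a for a in lt_v))
    return mid, half


def make_projections(center: Vector, L: Matrix, count: int,
                     rng: random.Random) -> List[Projection]:
    """Precompute the ellipsoid's shadow on `count` random lines."""
    result: List[Projection] = []
    for _ in range(count):
        v = random_unit_vector(len(center), rng)
        mid, half = project_ellipsoid(center, L, v)
        result.append((v, mid, half))
    return result


def may_contain(x: Vector, projections: Sequence[Projection]) -> bool:
    """False means x is definitely outside; True means 'not ruled out'."""
    for v, mid, half in projections:
        t = sum(vi * xi for vi, xi in zip(v, x))
        if abs(t - mid) > half:
            return False
    return True
```

Each projection costs one dot product per query, so the filter is cheap. How many lines are needed before it rejects most non-containing ellipsoids is the open question for such a scheme.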
