
Indexing For Function Approximation

Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew. Cornell University.


Presentation Transcript


  1. Indexing For Function Approximation. Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew. Cornell University. VLDB 2006, Seoul.

  2. Motivation
     • Simulations are important in science
     • Large simulations are computationally infeasible
       • Driven by complex mathematical models
       • Require solutions to complex differential equations
     • Approximation techniques speed up simulations
       • Bounded error in the simulation
       • Approximate simulation steps using information from previous steps

  3. Outline
     • Example scientific application
       • Combustion simulation
     • Function approximation problem
       • Formulation
       • Hardness
       • Algorithm
     • Indexing problem

  4. Combustion Simulation
     [Figure labels: High Dimensional Composition Vector, Air, Methane, Air + Methane, Inflow, Mixing & Reaction, Outflow]

  5. Properties Of Simulation
     • Composition dimensionality
       • 9 for simple hydrogen simulations
       • >50 for complex methane simulations
     • Cost of reaction function evaluation: 30 ms
     • Number of function evaluations: 10^8 to 10^10
     • Total simulation time
       • 10^8 function evaluations ≈ 35 days
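The total-time figure on this slide follows from multiplying the number of evaluations by the per-evaluation cost. A minimal sketch of that arithmetic, assuming the simulation time is dominated by reaction function evaluations (the function name `simulation_days` is illustrative, not from the slides):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def simulation_days(num_evaluations, seconds_per_evaluation=0.030):
    """Days spent purely on reaction function evaluations (30 ms each by default)."""
    return num_evaluations * seconds_per_evaluation / SECONDS_PER_DAY

# The two ends of the range quoted on the slide.
for n in (1e8, 1e10):
    print(f"{n:.0e} evaluations: about {simulation_days(n):,.1f} days")
```

At 10^8 evaluations this gives roughly 34.7 days, which matches the "≈ 35 days" quoted above; at 10^10 evaluations the same arithmetic gives several thousand days.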

  6. Function Approximation
     • Approximate the reaction function
     • Approach
       • Use previous function evaluations to approximate future function evaluations
       • ISAT (In Situ Adaptive Tabulation) [Pope '97]
     • Definition: ε-approximation of f(x)
       • Let f: R^m → R^n be a function, let x ∈ R^m and ε ∈ R. f*(x) is an ε-approximation of f(x) if ||f*(x) - f(x)|| < ε

  7. Example
     [Figure: plot of f; label: Cost]

  8. Example: an ε-Local Region R_{f,f*}(x, ε) ⊆ R^m
     • Linear approximation anchored at (x, f(x)): f*(x2) = f(x) + s * (x2 - x)
     [Figure: f and f* with error bound ε; points x1, x2; labels: Original Cost, Cost]
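The linear approximation f*(x2) = f(x) + s * (x2 - x) and the ε test from the definition can be tried out in one dimension. The sketch below is illustrative only: it assumes a scalar function, uses `math.exp` as a hypothetical stand-in for the reaction function, and takes the slope s to be the derivative at x.

```python
import math

def linear_approximation(x, f_x, slope, x2):
    """f*(x2) = f(x) + s * (x2 - x), anchored at the stored point (x, f(x))."""
    return f_x + slope * (x2 - x)

def is_eps_approximation(approx, exact, eps):
    """Scalar version of ||f*(x) - f(x)|| < eps."""
    return abs(approx - exact) < eps

f = math.exp                   # hypothetical stand-in for the reaction function
x = 0.0
f_x, slope = f(x), math.exp(x) # the derivative of exp is exp
eps = 0.01

for x2 in (0.1, 0.5):
    approx = linear_approximation(x, f_x, slope, x2)
    print(x2, is_eps_approximation(approx, f(x2), eps))
```

With these assumed values, x2 = 0.1 lies inside the ε-Local Region around x = 0 (error about 0.005) while x2 = 0.5 lies outside it (error about 0.15).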

  9. Example
     [Figure: f with approximations f1*, f2*, f3* and query points x1 to x6; labels: Original Cost, Cost]

  10. Example: When should a local region be added?
      [Figure: f with approximations f1*, f2*, f3* and query points x1 to x6]

  11. Example: Each query point can be covered by several Local Regions
      [Figure: f with approximations f1* to f4* and query points x1 to x8]

  12. Challenges
      • Finding good f*s and corresponding Local Regions
      • Computing a set of Local Regions
      • Data management: storing Local Regions for future use
      • Problem: Minimize total simulation time by computing and storing a set of Local Regions

  13. Finding The Optimal Set Of Local Regions
      • Simplified cost model
        • Both the function value and the Local Region at a point can be obtained at some constant cost that is equal across all regions
        • Approximations have zero cost
      • Offline problem
        • Given a set X = {x1, x2, …, xn} of query points, find the smallest set L = {l1, l2, …, lk} of Local Regions such that for each xi ∈ X there is an lj ∈ L which contains xi
        • NP-complete: reduction from Geometric Covering By Discs
      • Online problem
        • No online algorithm is competitive
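The offline problem can be made concrete on a tiny instance. The sketch below is not the authors' method: it assumes Local Regions are discs in the plane drawn from a fixed candidate list (both assumptions are illustrative), and it finds a smallest cover by brute force, which takes exponential time in the number of candidates and is only meant to show what is being minimized.

```python
from itertools import combinations
from math import dist

def covers(disc, point):
    """A disc is (center, radius); it covers a point within that radius."""
    center, radius = disc
    return dist(center, point) <= radius

def smallest_cover(points, candidate_discs):
    """Try subsets of increasing size; return the first that covers every point."""
    if not points:
        return []
    for k in range(1, len(candidate_discs) + 1):
        for subset in combinations(candidate_discs, k):
            if all(any(covers(d, p) for d in subset) for p in points):
                return list(subset)
    return None  # no subset of the candidates covers all points

# Hypothetical query points and candidate regions.
points = [(0, 0), (1, 0), (4, 0), (5, 0)]
discs = [((0.5, 0), 0.6), ((4.5, 0), 0.6), ((0, 0), 0.1), ((2.5, 0), 1.0)]
print(smallest_cover(points, discs))
```

No single candidate covers all four points here, so the smallest cover uses two discs.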

  14. Algorithm Illustration
      [Figure: f with approximations f1* to f4* and query points x1 to x8]

  15. Algorithm
      • Initialize S
      • Retrieve: for each query point x from the simulation, look up x in S
      • Local Region found?
        • Y: return approximation
        • N: evaluate function at x
      • Add: add new region containing x to S
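The retrieve/evaluate/add loop in this flowchart can be sketched in a few lines. This is a simplified illustration under loud assumptions: queries are scalars, each Local Region is an interval of a fixed, hypothetical radius around its center, S is a plain list scanned linearly, and the slope is supplied by a derivative function `df`. None of these choices come from the slides.

```python
from dataclasses import dataclass

@dataclass
class LocalRegion:
    center: float
    value: float   # f(center)
    slope: float   # derivative of f at center
    radius: float  # assumed fixed size of the region

    def contains(self, x):
        return abs(x - self.center) <= self.radius

    def approximate(self, x):
        return self.value + self.slope * (x - self.center)

def simulate(queries, f, df, radius):
    """Answer each query from S when possible; otherwise evaluate f and add a region."""
    S = []            # Initialize S
    results = []
    evaluations = 0
    for x in queries:
        region = next((r for r in S if r.contains(x)), None)  # Retrieve
        if region is not None:
            results.append(region.approximate(x))             # return approximation
        else:
            f_x = f(x)                                        # evaluate function at x
            evaluations += 1
            S.append(LocalRegion(x, f_x, df(x), radius))      # Add
            results.append(f_x)
    return results, evaluations

results, evaluations = simulate(
    [1.0, 1.05, 2.0, 1.02, 2.04], f=lambda x: x * x, df=lambda x: 2 * x, radius=0.1
)
print(evaluations, results)
```

In this run only the queries at 1.0 and 2.0 trigger a real function evaluation; the other three are answered from stored regions.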

  16. Possible Instantiation Of Local Regions
      • Local Regions can be approximated using high-dimensional ellipsoids [Pope '97]
        • Based on Taylor expansion of the function
      • Two-step approach
        • Initial conservative approximation
        • Grow
      [Figure: ellipsoid around x, with point x1]
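When Local Regions are ellipsoids, deciding whether a query point falls inside one is a quadratic-form test. The sketch below assumes the common representation of an ellipsoid as the set of points x with (x - c)^T A (x - c) <= 1 for a center c and a symmetric positive-definite matrix A; the slides do not state which representation is used, so treat this as one possible choice.

```python
def in_ellipsoid(x, center, A):
    """True if (x - center)^T A (x - center) <= 1, with A given as a list of rows."""
    d = [xi - ci for xi, ci in zip(x, center)]
    A_d = [sum(a * dj for a, dj in zip(row, d)) for row in A]
    return sum(di * adi for di, adi in zip(d, A_d)) <= 1.0

# Axis-aligned ellipse with semi-axes 2 (first coordinate) and 1 (second coordinate).
A = [[1 / 4, 0.0],
     [0.0, 1.0]]
center = (0.0, 0.0)

print(in_ellipsoid((1.5, 0.0), center, A))  # inside along the long axis
print(in_ellipsoid((0.0, 1.5), center, A))  # outside along the short axis
```

The test costs O(m^2) per ellipsoid in m dimensions, which is one reason the number of ellipsoids examined per query matters at the dimensionalities mentioned earlier.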

  17. Example
      [Figure: region around x with error ε' < ε; points x1, x2]

  18. Example
      [Figure: region around x with error ε' < ε; points x'1, x'2]

  19. Example
      [Figure: region around x with errors ε' < ε and ε; points x'1, x'2]

  20. Updating Existing Regions
      • N (no Local Region found): evaluate function at x
      • Can existing region contain x?
        • Y (Grow): update existing regions to contain x
        • N: add new region containing x to S
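The grow-or-add decision after a miss can be sketched as follows. This is an illustrative one-dimensional version under stated assumptions: a region "can contain x" when its linear approximation at x is within ε of the freshly computed f(x), and growing simply widens an interval. The real regions are high-dimensional ellipsoids, and the names `grow_or_add` and `initial_radius` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LocalRegion:
    center: float
    value: float   # f(center)
    slope: float   # derivative of f at center
    radius: float

    def approximate(self, x):
        return self.value + self.slope * (x - self.center)

def grow_or_add(regions, x, f_x, slope_x, eps, initial_radius):
    """After a miss at x (f_x already computed), grow a region if one qualifies, else add."""
    for region in regions:
        if abs(region.approximate(x) - f_x) < eps:
            # Grow: widen the region just enough to contain x.
            region.radius = max(region.radius, abs(x - region.center))
            return "grow"
    regions.append(LocalRegion(x, f_x, slope_x, initial_radius))
    return "add"

# Hypothetical example with f(x) = x^2 and one conservative region around x = 1.
regions = [LocalRegion(center=1.0, value=1.0, slope=2.0, radius=0.05)]
first = grow_or_add(regions, 1.08, 1.08 ** 2, 2 * 1.08, eps=0.01, initial_radius=0.05)
second = grow_or_add(regions, 1.5, 1.5 ** 2, 2 * 1.5, eps=0.01, initial_radius=0.05)
print(first, second, len(regions))
```

This version grows only the first qualifying region; growing several regions per step is a separate design choice with its own cost.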

  21. Outline
      • Example scientific application
        • Combustion simulation
      • Function approximation problem
        • Formulation
        • Hardness
        • Algorithm
      • Indexing problem

  22. Indexing Problem
      • Workload
        • Retrieve: find ellipsoid containing query point

  23. Indexing Problem
      • Workload
        • Retrieve: find ellipsoid containing query point
        • Grow
          • Find ellipsoids to be grown
          • Update grown ellipsoids

  24. Indexing Problem
      • Workload
        • Retrieve: find ellipsoid containing query point
        • Grow
          • Find ellipsoids to be grown
          • Update grown ellipsoids
        • Add: insert a new ellipsoid

  25. New Indexing Problem
      • Shape of regions
      • Updates and queries interleaved
      • Additional costs: ellipsoid maintenance costs
      • Overall aim: reduce total simulation time
        • Retrieve/grow/add are all optional
        • Tuning parameters at each step

  26. Outline
      • Example scientific application
        • Combustion simulation
      • Function approximation problem
        • Formulation
        • Hardness
        • Algorithm
      • Indexing problem
        • Cost structure, tuning parameters and effects
        • Index structures and experiments

  27. Grow Effects
      C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add
      • Tuning parameter: Ell_g
        • Limit on number of ellipsoids examined for growing
        • No pruning criteria
        • Affects
          • t_growsearch
          • Chance of finding a growable ellipsoid
      • Tuning parameter: N_grown
        • Number of ellipsoids grown per step
        • Affects
          • C_grow
          • Structure of the index (overlapping ellipsoids)

  28. Retrieve Effects
      C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss
      • Tuning parameter: Ell_r
        • Limit on number of ellipsoids examined during retrieve
        • Limits how much of the index is searched
        • Affects
          • t_search
          • Chances of a current retrieve and also future retrieves
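The two cost formulas, C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss and C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add, compose into a per-query cost. The sketch below assumes that I_ret and I_grow are 0/1 indicators for a successful retrieve and a successful grow, and that t_la is the time to apply the stored approximation; the slides do not spell these out. All numbers except the 30 ms function evaluation are made up for illustration.

```python
def miss_cost(t_f, t_growsearch, grew, c_grow, c_add):
    """C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add."""
    i_grow = 1 if grew else 0
    return t_f + t_growsearch + i_grow * c_grow + (1 - i_grow) * c_add

def total_cost(t_search, retrieved, t_la, c_miss):
    """C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss."""
    i_ret = 1 if retrieved else 0
    return t_search + i_ret * t_la + (1 - i_ret) * c_miss

# Times in milliseconds; only t_f = 30 comes from the slides.
c_miss_grow = miss_cost(t_f=30.0, t_growsearch=2.0, grew=True, c_grow=5.0, c_add=8.0)
c_miss_add = miss_cost(t_f=30.0, t_growsearch=2.0, grew=False, c_grow=5.0, c_add=8.0)

print(total_cost(t_search=0.5, retrieved=True, t_la=0.01, c_miss=c_miss_grow))
print(total_cost(t_search=0.5, retrieved=False, t_la=0.01, c_miss=c_miss_grow))
print(total_cost(t_search=0.5, retrieved=False, t_la=0.01, c_miss=c_miss_add))
```

Because a miss always pays t_f, spending somewhat more on t_search can pay off if it raises the fraction of queries that are retrieved, which is the trade-off the Ell_r parameter controls.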

  29. Add Effects
      C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add
      • Tuning parameter: indirectly controlled by retrieves and grows
      • Affects
        • Should query point be covered by an add or grow?
          (-) Computing new ellipsoids is expensive
          (-) New ellipsoids cover smaller part of the domain
          (+) May lead to better ellipsoid distribution

  30. Candidate Index Structures
      • Bounding Box Rtree
      • Point Rtree
      • Ellipsoid Rtree
      • Random Projection Rtree
      • Binary Tree
      • MRU List + Rtree

  31. Binary Tree: Primary Retrieve
      [Figure: binary tree with nodes 1, 2 and ellipsoids A, B, C; query point q]

  32. Binary Tree: Secondary Retrieve
      [Figure: binary tree with nodes 1, 2 and ellipsoids A, B, C; query point q]

  33. Binary Tree
      [Figure: binary tree with nodes 1, 2 and ellipsoids A, B, C]

  34. Binary Tree: Secondary Retrieve now Primary Retrieve
      [Figure: binary tree with nodes 1, 2, 3 and ellipsoids A, B, C, D]

  35. Effects In Action: Binary Tree
      • 32-dimensional methane simulation
      • 6 × 10^6 queries
      • Windows XP machine (2.4 GHz, 2 GB)

  36. MRU List + Rtree
      • MRU List for retrieving
        • High locality
      • Rtree for searching growable ellipsoids
      [Figure labels: Rtree, MRU List]
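The retrieve side of this structure can be sketched as a most-recently-used list: regions that answered recent queries are checked first, which helps when consecutive queries fall in the same regions. The sketch below makes several assumptions not in the slides: regions are one-dimensional intervals `(lo, hi)`, the list has a fixed capacity, and the Rtree used for finding growable ellipsoids is left out entirely.

```python
class MRUList:
    """Regions ordered from most to least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed bound on list length
        self.items = []

    def retrieve(self, x):
        """Scan from the front; on a hit, move the region to the front."""
        for i, (lo, hi) in enumerate(self.items):
            if lo <= x <= hi:
                region = self.items.pop(i)
                self.items.insert(0, region)
                return region
        return None

    def add(self, region):
        """Insert at the front and drop the least recently used entries."""
        self.items.insert(0, region)
        del self.items[self.capacity:]

mru = MRUList(capacity=2)
mru.add((0.0, 1.0))
mru.add((2.0, 3.0))
hit = mru.retrieve(0.5)      # moves (0.0, 1.0) to the front
mru.add((4.0, 5.0))          # evicts (2.0, 3.0), now least recently used
miss = mru.retrieve(2.5)
print(hit, miss, mru.items)
```

A miss in the list is where a secondary structure would take over; here it simply returns `None`.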

  37. Effects In Action: MRU List + Rtree
      • Effects very different from Binary Tree

  38. Total Simulation Times

  39. Conclusion & Future Work
      • Formulated the function approximation problem
      • New class of applications for high-dimensional indexing
      • Understand index selection for function approximation
      • Future work
        • Dynamic parameter settings
        • New benchmark for index structures
        • Evaluation of other index structures
        • Comparison with other function approximation techniques

  40. Questions?
