
Loading a Cache with Query Results

Laura Haas (IBM Almaden), Donald Kossmann (Univ. Passau), Ioana Ursu (IBM Almaden)



Presentation Transcript


  1. Loading a Cache with Query Results
     Laura Haas, IBM Almaden; Donald Kossmann, Univ. Passau; Ioana Ursu, IBM Almaden

  2. Background & Motivation
  • Applications invoke queries and methods
  • Queries select relevant objects
  • Methods work with relevant objects
  • Example: find hotels and reserve rooms
  • Other examples: CAX, SAP R/3, Web

      foreach h in (select oid from hotels h where city = 'Edinburgh')
          h.requestRoom(3, Sep-6, Sep-12);

  3. Background and Motivation
  • Traditional client-server systems:
      • methods are executed by clients with caching
      • queries are executed by clients and servers
      • query processing is independent of caching
  • Problems:
      • data must be fetched twice
      • objects are faulted in individually
  • Terrible performance in many environments
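The two problems can be made concrete with a toy model of the traditional setup. This is a sketch only. Everything in it is hypothetical: the names `Server`, `query_oids`, `fault_in` and `traditional_app`, and the hotel data. A round-trip counter stands in for network cost.

```python
from typing import Dict

# Hypothetical server-side data: oid -> hotel object
HOTELS: Dict[int, dict] = {
    1: {"name": "apex", "stars": 3, "city": "Edinburgh"},
    2: {"name": "carlton", "stars": 2, "city": "Edinburgh"},
    3: {"name": "ritz", "stars": 5, "city": "Paris"},
}


class Server:
    """Toy server that counts client-server round trips."""

    def __init__(self, hotels: Dict[int, dict]) -> None:
        self.hotels = hotels
        self.round_trips = 0

    def query_oids(self, city: str) -> list:
        # Query processing: the server reads the objects but ships only oids.
        self.round_trips += 1
        return [oid for oid, h in self.hotels.items() if h["city"] == city]

    def fault_in(self, oid: int) -> dict:
        # Object faulting: one round trip per object, which is read again.
        self.round_trips += 1
        return dict(self.hotels[oid])


def traditional_app(server: Server, city: str) -> Dict[int, dict]:
    cache: Dict[int, dict] = {}  # client cache, filled independently of the query
    for oid in server.query_oids(city):
        if oid not in cache:
            cache[oid] = server.fault_in(oid)
        cache[oid]["reserved"] = True  # stand-in for h.reserveRoom()
    return cache


server = Server(HOTELS)
cache = traditional_app(server, "Edinburgh")
print(server.round_trips)  # 3: one query plus one fault-in per relevant hotel
```

The cost grows with the number of relevant objects, because each one is shipped in its own round trip after the query has already touched it on the server.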

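  4. Traditional System
  [Figure: the client runs the loop "foreach h in (select oid from ...) h.reserveRoom();". The query processor receives hotel tuples such as <apex, ***, ...> and <carlton, **, ...> from the server, and <apex, ***, ...> is shipped from the server a second time into the client cache.]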

  5. Goal & Solution
  • Load the cache as a by-product of queries:
      • copy relevant objects while executing the query
      • Cache operators do the copying
  • Extend the query optimizer:
      • which collections should be cached?
      • when to copy?
  • Assumption: caching at the granularity of objects
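Here is a minimal sketch of a Cache operator, assuming an iterator (generator) model of query execution. The hypothetical `cache_op` passes rows through unchanged. As a side effect, it copies each object into a client-side dictionary. It probes the dictionary first, so objects that are already cached are not copied again. The operator names and data are illustrative, not Garlic's actual interfaces.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Obj = Dict[str, object]
Row = Tuple[int, Obj]  # (oid, object)


def scan(collection: Dict[int, Obj]) -> Iterator[Row]:
    for oid, obj in collection.items():
        yield oid, obj


def select(rows: Iterable[Row], pred: Callable[[Obj], bool]) -> Iterator[Row]:
    for oid, obj in rows:
        if pred(obj):
            yield oid, obj


def cache_op(rows: Iterable[Row], client_cache: Dict[int, Obj]) -> Iterator[Row]:
    """Hypothetical Cache operator: copy objects into the client cache
    while the query runs, and pass every row through unchanged."""
    for oid, obj in rows:
        if oid not in client_cache:        # probe
            client_cache[oid] = dict(obj)  # copy only if not yet cached
        yield oid, obj


def project_oid(rows: Iterable[Row]) -> Iterator[int]:
    for oid, _ in rows:
        yield oid


hotels: Dict[int, Obj] = {
    1: {"name": "apex", "city": "Edinburgh"},
    2: {"name": "carlton", "city": "Edinburgh"},
    3: {"name": "ritz", "city": "Paris"},
}
client_cache: Dict[int, Obj] = {}
# select oid from hotels h where city = 'Edinburgh', with a Cache operator in the plan
plan = project_oid(cache_op(select(scan(hotels), lambda h: h["city"] == "Edinburgh"), client_cache))
oids = list(plan)
print(oids, sorted(client_cache))  # [1, 2] [1, 2]
```

After the plan has run, the method loop finds both Edinburgh hotels in `client_cache` without any fault-ins.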

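  6. [Figure: the same application loop, "foreach h in (select oid from ...) h.reserveRooms();", now driven by a query plan that joins Hotels and Cities and contains a Cache operator. While the server's tuples <apex, ***, ...> and <carlton, **, ...> flow through the plan, the Cache operator copies <apex, ***, ...> into the client cache.]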

  7. Tradeoffs
  • What to cache?
      • the cost of a Cache operator must be smaller than the savings obtained by this kind of pre-caching
  • When to cache?
      • late, so that only relevant objects are cached
      • early, so that other operators are not affected
  • N.B. Cache operators affect the cost of other (lower) operators in the plan

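  8. Early vs. Late Cache Operators: Copying Irrelevant Objects
  [Figure: a Cache operator placed directly above Hotels, below the Join with Cities, copies every hotel it sees (<apex, ...>, <ritz, ...>, <carlton, ...>, <plaza, ...>). This includes hotels such as <ritz, *****, Paris> that the Join later eliminates, alongside relevant ones such as <apex, ***, Edinburgh>.]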

  9. Early vs. Late Cache Operators: Late Projections
  [Figure: two plans joining Hotels and Cities.
   Early Cache, cheap Join: the Cache operator sits directly above Hotels, so the Join only processes projected tuples such as <apex> and <ritz>.
   Late Cache, expensive Join: the Cache operator sits above the Join, so whole objects such as <apex, ***, Edinburgh> and <ritz, *****, Paris> must flow through the Join.]
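The tradeoff shown in slides 8 and 9 can be quantified with a toy model. This is a sketch under simplifying assumptions:

- the Join is modeled as a membership test against Cities after the selection city = 'Edinburgh';
- "cost" is simply the number of attribute values the Join has to process;
- the data and the function `run` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data: the query wants hotels in Edinburgh (via a join with Cities).
HOTELS: List[dict] = [
    {"oid": 1, "name": "apex", "stars": 3, "city": "Edinburgh"},
    {"oid": 2, "name": "ritz", "stars": 5, "city": "Paris"},
    {"oid": 3, "name": "carlton", "stars": 2, "city": "Edinburgh"},
    {"oid": 4, "name": "plaza", "stars": 5, "city": "New York"},
]
CITIES: List[dict] = [{"name": "Edinburgh"}]  # Cities after the selection


def run(early: bool) -> Tuple[List[int], int]:
    """Return (oids copied into the cache, attribute values the Join processes)."""
    cache: Dict[int, dict] = {}
    if early:
        # Cache directly above Hotels: every hotel is copied, then projected early.
        for h in HOTELS:
            cache.setdefault(h["oid"], dict(h))
        join_input = [{"oid": h["oid"], "city": h["city"]} for h in HOTELS]
    else:
        # Cache above the Join: whole objects must flow through the Join.
        join_input = [dict(h) for h in HOTELS]
    city_names = {c["name"] for c in CITIES}
    result = [t for t in join_input if t["city"] in city_names]
    if not early:
        for t in result:
            cache.setdefault(t["oid"], t)
    carried = sum(len(t) for t in join_input)
    return sorted(cache), carried


print(run(early=True))   # ([1, 2, 3, 4], 8): irrelevant hotels copied, narrow join input
print(run(early=False))  # ([1, 3], 16): only relevant hotels copied, wide join input
```

Neither placement wins on both counts. This is why the later slides either fix a policy (heuristics) or compare placements by cost.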

  10. Alternative Approaches
  • Determine candidate collections for caching, i.e. what to cache:
      • carry out data flow analysis
      • analyze the select clause of the query; cache if the oid is returned
  • Determine when to cache candidate objects:
      • heuristics
      • cost-based approach
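The select-clause analysis can be sketched in a few lines. This assumes the select clause has already been parsed into (alias, attribute) pairs. The function name and input shapes are hypothetical, and the sketch does not attempt the data flow analysis alternative.

```python
from typing import Dict, List, Tuple


def candidate_collections(select_list: List[Tuple[str, str]], aliases: Dict[str, str]) -> List[str]:
    """A collection is a caching candidate if the query returns the oid of its objects.

    select_list: (alias, attribute) pairs, e.g. [("h", "oid"), ("c", "name")]
    aliases:     alias -> collection, e.g. {"h": "hotels", "c": "cities"}
    """
    return sorted({aliases[alias] for alias, attr in select_list if attr == "oid"})


# select h.oid, c.name from hotels h, cities c where h.city = c.name
print(candidate_collections([("h", "oid"), ("c", "name")], {"h": "hotels", "c": "cities"}))
# ['hotels']
```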

  11. Caching at the Top Heuristics
  • Policy
      • cache all candidate collections
      • cache no irrelevant objects (i.e., late caching)
  • Algorithm
      • generate the query plan for the select * query
      • place a Cache operator at the top of the plan
      • push the Cache operator down through non-reductive operations
  • N.B.: simulates the "external" approach

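  12. Cache Operator Push Down
  • A Cache operator may be pushed down through non-reductive operations
  [Figure: three plans labeled Initial Plan, 1. Push Down and 2. Push Down.
   Initial Plan: Cache(h,c) sits at the top, above a Sort over the Join of Hotels and Cities.
   1. Push Down: the Cache operator moves below the Sort.
   2. Push Down: the Cache operator moves through the Join and is split into Cache(h) above Hotels and Cache(c) above Cities.]
  • Push-down reduces the cost of non-reductive operations without causing irrelevant objects to be copied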
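One way to sketch the Caching at the Top algorithm together with its push-down step uses a simple tree representation. Each operator is flagged as reductive (it may drop rows) or not. `Node`, `push_down`, `show` and `example_plan` are hypothetical names. Whether a particular Join counts as non-reductive is supplied by hand here rather than derived by an optimizer.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    op: str                                     # "Scan", "Sort", "Join", "Cache", ...
    children: List["Node"] = field(default_factory=list)
    reductive: bool = True                      # may this operator drop rows?
    collections: FrozenSet[str] = frozenset()   # Scan: its collection; Cache: what it caches

    def produces(self) -> FrozenSet[str]:
        """Collections whose objects flow out of this subtree."""
        if self.op == "Scan":
            return self.collections
        out: FrozenSet[str] = frozenset()
        for child in self.children:
            out |= child.produces()
        return out


def push_down(cache: Node) -> Node:
    """Push a Cache node below its child while the child is non-reductive;
    return the new root of the subtree."""
    child = cache.children[0]
    if child.op == "Scan" or child.reductive:
        return cache  # stop: going lower could copy irrelevant objects
    new_children = []
    for grandchild in child.children:
        wanted = cache.collections & grandchild.produces()
        if wanted:  # split the Cache operator per input
            grandchild = push_down(Node("Cache", [grandchild], collections=wanted))
        new_children.append(grandchild)
    child.children = new_children
    return child


def show(n: Node) -> str:
    if n.op == "Scan":
        return next(iter(n.collections))
    label = f"Cache({','.join(sorted(n.collections))})" if n.op == "Cache" else n.op
    return f"{label}[{', '.join(show(c) for c in n.children)}]"


def example_plan(join_reductive: bool) -> Node:
    """Caching at the Top: Cache(h,c) placed above Sort(Join(Hotels, Cities))."""
    join = Node("Join", [Node("Scan", collections=frozenset({"h"})),
                         Node("Scan", collections=frozenset({"c"}))],
                reductive=join_reductive)
    return Node("Cache", [Node("Sort", [join], reductive=False)],
                collections=frozenset({"h", "c"}))


print(show(push_down(example_plan(join_reductive=True))))
# Sort[Cache(c,h)[Join[h, c]]]
print(show(push_down(example_plan(join_reductive=False))))
# Sort[Join[Cache(h)[h], Cache(c)[c]]]
```

The second call mirrors the two push-down steps in the figure. The first call shows the sketch's stopping rule: a reductive Join blocks further push-down.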

  13. Caching at the Bottom Heuristics
  • Policy
      • cache all candidate collections
      • increase the cost of other operations as little as possible (i.e., early caching)
  • Algorithm
      • extend the optimizer to produce a plan with Cache operators as low as possible (details in paper)
      • pull Cache operators up through the pipeline
  • Pull-up reduces the number of irrelevant objects that are cached without increasing the cost of pipelined operators
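The pull-up step can be sketched on a linear plan listed bottom-up. The sketch assumes each operator is tagged as either pipelined or blocking. The Cache operator moves above its parent only while that parent is pipelined. So it ends up above a selection, which then keeps irrelevant hotels out of the cache, but below a Sort. The function name and operator labels are illustrative.

```python
from typing import List, Tuple

Op = Tuple[str, bool]  # (operator label, is_pipelined); plans are listed bottom-up


def pull_up_cache(plan: List[Op]) -> List[Op]:
    """Move the Cache operator above its parent for as long as that
    parent is a pipelined operator."""
    plan = list(plan)
    i = next(k for k, (label, _) in enumerate(plan) if label == "Cache")
    while i + 1 < len(plan) and plan[i + 1][1]:
        plan[i], plan[i + 1] = plan[i + 1], plan[i]
        i += 1
    return plan


# Cache placed as low as possible, directly above the scan of Hotels.
bottom_up = [
    ("Scan Hotels", True),
    ("Cache", True),
    ("Filter city = 'Edinburgh'", True),  # pipelined: drops irrelevant hotels
    ("Sort", False),                      # blocking: Cache stays below it
]
print([label for label, _ in pull_up_cache(bottom_up)])
# ['Scan Hotels', "Filter city = 'Edinburgh'", 'Cache', 'Sort']
```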

  14. Cost-based Cache Operator Placement
  • Try to find the best possible plan
      • use Cache operators only if they are beneficial
      • find the best place for Cache operators in the plan
      • join order and site selection depend on caching
  • Extend the query optimizer
      • enumerate all possible caching plans
      • estimate the cost and benefit of Cache operators
      • extended pruning condition for dynamic programming

  15. Enumerating all Caching Plans
  [Figure: example plans for the join of Hotels and Cities.
   Plans with Join at the Server: Cache(h,c) or Cache(h) above the Join, or no Cache operator.
   Plans with Join at the Client: combinations of Cache(h) and Cache(c) placed below or above the Join.]
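The growth of the search space can be illustrated by enumerating Cache placements for a single two-way join. For simplicity, this sketch assumes that Cache operators always run at the client:

- With the Join at the server, a Cache operator can only sit above the Join.
- With the Join at the client, each candidate collection may be cached above its scan, above the Join, or not at all.

The function name and position labels are hypothetical. A real optimizer would combine this with join-order enumeration and pruning.

```python
from itertools import product
from typing import Dict, List, Tuple

Plan = Tuple[str, Dict[str, str]]  # (join site, {collection: Cache position})


def enumerate_caching_plans(candidates: List[str]) -> List[Plan]:
    plans: List[Plan] = []
    for site in ("server", "client"):
        # Assumption: Cache operators run at the client only.
        if site == "server":
            positions = ["none", "above join"]
        else:
            positions = ["none", "above scan", "above join"]
        for choice in product(positions, repeat=len(candidates)):
            plans.append((site, dict(zip(candidates, choice))))
    return plans


plans = enumerate_caching_plans(["h", "c"])
print(len(plans))  # 13 = 2**2 (join at server) + 3**2 (join at client)
```

Under these assumptions, n candidate collections yield 2**n + 3**n placements for one join, before join orders are even considered.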

  16. Costing of Cache Operators
  • Overhead of Cache operators
      • cost to probe a hash table for every object
      • cost to copy objects that are not yet cached
  • Benefit of Cache operators
      • savings: relevant objects are not refetched
      • savings depend on the cost to fault in an object and on the current state of the cache
  • Cost = Overhead - Benefit
      • only Cache operators with Cost < 0 are useful
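The cost formula can be written down directly. This is a minimal sketch. The parameter names and the per-object unit costs (say, microseconds) are hypothetical. The benefit counts only relevant objects that are not already in the cache, since only those would otherwise be faulted in.

```python
def cache_operator_cost(
    n_tuples: int,             # objects flowing through the Cache operator
    n_uncached: int,           # ... of which not yet in the client cache
    n_relevant_uncached: int,  # ... of which the application will actually use
    probe_cost: int,           # per-object hash-table probe
    copy_cost: int,            # per-object copy into the cache
    fault_cost: int,           # per-object fault-in from the server
) -> int:
    assert 0 <= n_relevant_uncached <= n_uncached <= n_tuples
    overhead = n_tuples * probe_cost + n_uncached * copy_cost
    benefit = n_relevant_uncached * fault_cost
    return overhead - benefit  # Cost = Overhead - Benefit


def is_useful(cost: int) -> bool:
    return cost < 0


# Expensive client-server interaction: caching pays off.
print(cache_operator_cost(100, 80, 20, probe_cost=1, copy_cost=10, fault_cost=5000))  # -99100
# Cheap interaction: the same operator is not worth it.
print(cache_operator_cost(100, 80, 20, probe_cost=1, copy_cost=10, fault_cost=20))    # 500
```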

  17. Summary of Approaches
  • Heuristics
      • simple to implement
      • not much additional optimization overhead
      • poor plans in certain situations
  • Cost-based
      • very good plans
      • huge search space, slows down the query optimizer

  18. Performance Experiments
  • Test Environment
      • Garlic heterogeneous database system
      • UDB, Lotus Notes, WWW servers
  • Benchmark
      • relational BUCKY benchmark database
      • simple queries to multi-way cross-source joins
      • simple accessor methods

  19. Application Run Time (secs): single-table query + accessor method

  20. Application Run Time (secs): three-way joins + accessor method

  21. Query Optimization Times (secs): varying the number of candidate collections

  22. Conclusions
  • Loading the cache with query results can result in huge wins
      • for search & work applications
      • if client-server interaction is expensive
  • Use the cost-based approach for simple queries
      • four or fewer candidate collections
  • Use heuristics for complex queries
  • The Caching at the Bottom heuristic is always at least as good as the traditional, do-nothing approach

  23. Future Work
  • Explore the full range of possible approaches
      • e.g. cost-based Cache operator pull-up and push-down
  • Consider the tradeoff between optimization time and application run time (meta optimization)
      • invest in optimization time only if high gains in application run time can be expected
      • consider the state of the cache, dynamic optimization
